
Unlocking the Potential of Clustering and Classification Approaches: Navigating Supervised and Unsupervised Chemical Similarity

Kamel Mansouri, Kyla Taylor, Scott Auerbach, Stephen Ferguson, Rachel Frawley, Jui-Hua Hsieh, Gloria Jahnke, Nicole Kleinstreuer, Suril Mehta, José T Moreira-Filho, Fred Parham, Cynthia Rider, Andrew A Rooney, Amy Wang, Vicki Sutherland
Published in: Environmental Health Perspectives (2024)
Understanding similarity is pivotal in toxicological research involving clustering and classification approaches (CCAs). The effectiveness of these approaches depends on an appropriate definition and measure of similarity, which varies with the context and objectives of the study. This choice is influenced by how chemical structures are represented and, where applicable, by the labels indicating biological activity. The distinction between unsupervised clustering and supervised classification methods is vital, because the former requires end point-agnostic and the latter end point-specific definitions of similarity. Using these methods separately or in combination requires careful consideration to prevent bias and to ensure relevance to the goal of the study. Unsupervised methods use end point-agnostic similarity measures to uncover general structural patterns and relationships, aiding hypothesis generation and facilitating exploration of datasets without predefined labels or explicit guidance. Conversely, supervised techniques demand end point-specific similarity to group chemicals into predefined classes or to train classification models, allowing accurate predictions for new chemicals. Misuse can arise when unsupervised methods are applied in end point-specific contexts, such as analog selection in read-across, leading to erroneous conclusions. This commentary discusses the significance of similarity and its role in supervised classification and unsupervised clustering approaches. https://doi.org/10.1289/EHP14001.
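The contrast between end point-agnostic and end point-specific similarity can be illustrated with a small toy sketch. It is not the authors' method: the fingerprints, activity labels, and the assumption that bits 7 and 8 encode a structural alert for the end point are all hypothetical; fingerprints are modelled as plain sets of "on" bit indices rather than output from a cheminformatics toolkit; and the weighted Tanimoto score is only one possible way to make similarity end point-specific. The helpers `tanimoto`, `butina_clusters`, and `nearest_analog` are illustrative names written for this sketch. Label-free Butina-style clustering on plain Tanimoto similarity groups each active chemical with an inactive one, and a 1-nearest-neighbour read-across picks an inactive analog under the agnostic measure but an active analog once the assumed alert bits are up-weighted.

```python
"""Toy contrast between end point-agnostic and end point-specific similarity.

Fingerprints are sets of "on" bit indices. All data below are made up.
"""
from typing import Dict, FrozenSet, List, Mapping, Optional

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint,
             weights: Optional[Mapping[int, float]] = None) -> float:
    """(Weighted) Tanimoto: weight of shared bits / weight of bits set in either.

    With weights=None every bit counts 1, giving the usual |A & B| / |A | B|.
    Two empty fingerprints get similarity 0.0 by convention in this sketch.
    """
    def w(bit: int) -> float:
        return 1.0 if weights is None else weights.get(bit, 1.0)

    union = a | b
    if not union:
        return 0.0
    return sum(w(x) for x in a & b) / sum(w(x) for x in union)


def butina_clusters(fps: Mapping[str, Fingerprint], cutoff: float) -> List[List[str]]:
    """Unsupervised, label-free Butina clustering on plain Tanimoto similarity.

    Chemicals are neighbours if similarity >= cutoff. Candidates are visited in
    order of decreasing neighbour count; each unassigned one becomes a centroid
    and takes its still-unassigned neighbours into its cluster.
    """
    names = list(fps)
    neighbours = {n: {m for m in names if m != n and tanimoto(fps[n], fps[m]) >= cutoff}
                  for n in names}
    order = sorted(names, key=lambda n: (-len(neighbours[n]), n))
    assigned: set = set()
    clusters: List[List[str]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        members = [centroid] + sorted(neighbours[centroid] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


def nearest_analog(query: Fingerprint, fps: Mapping[str, Fingerprint],
                   weights: Optional[Mapping[int, float]] = None) -> str:
    """1-nearest-neighbour analog under the chosen similarity (ties -> first name)."""
    return max(sorted(fps), key=lambda n: tanimoto(query, fps[n], weights))


# Hypothetical training chemicals and end point labels (1 = active, 0 = inactive).
chemicals: Dict[str, Fingerprint] = {
    "A": frozenset({1, 2, 7, 20, 21}),
    "B": frozenset({1, 2, 3, 4, 5, 6}),
    "C": frozenset({7, 8, 10, 11}),
    "D": frozenset({10, 11, 12, 13}),
}
labels = {"A": 1, "B": 0, "C": 1, "D": 0}
query = frozenset({1, 2, 3, 4, 5, 7})

# Hypothetical end point knowledge: bits 7 and 8 are assumed to be a structural alert.
alert_weights = {7: 5.0, 8: 5.0}

if __name__ == "__main__":
    # Structural clusters ignore labels; each cluster mixes an active and an inactive.
    print(butina_clusters(chemicals, cutoff=0.2))  # [['A', 'B'], ['C', 'D']]
    agnostic = nearest_analog(query, chemicals)                 # plain Tanimoto
    specific = nearest_analog(query, chemicals, alert_weights)  # alert-weighted
    print(agnostic, labels[agnostic])  # B 0
    print(specific, labels[specific])  # A 1
```

Only the similarity passed to `nearest_analog` differs between the two read-across outcomes, which echoes the commentary's point that an end point-agnostic measure can select a misleading analog. In a real study the end point-specific weighting would have to be justified by data or expert knowledge rather than fixed by hand as in this toy.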