Machine Learning Distinguishes with High Accuracy between Pan-Assay Interference Compounds That Are Promiscuous or Represent Dark Chemical Matter.

Swarit Jasial, Erik Gilberg, Thomas Blaschke, Jürgen Bajorath
Published in: Journal of Medicinal Chemistry (2018)
Assay interference compounds give rise to false positives and cause substantial problems in medicinal chemistry. Nearly 500 compound classes have been designated as pan-assay interference compounds (PAINS), which typically occur as substructures in other molecules. The structural environment of PAINS substructures is likely to play an important role in their potential reactivity. Given the large number of PAINS and their highly variable structural contexts, it is difficult to study context dependence on the basis of expert knowledge. Hence, we applied machine learning to predict PAINS that are promiscuous and distinguish them from others that are mostly inactive. Surprisingly accurate models can be derived using different methods such as support vector machines, random forests, or deep neural networks. Moreover, structural features that favor correct predictions have been identified, mapped, and categorized, shedding light on the structural context dependence of PAINS effects. The machine learning models presented herein further extend the capacity of PAINS filters.
Keyphrases
  • machine learning
  • neural network
  • high throughput
  • artificial intelligence
  • big data
  • mental health
  • climate change
  • risk assessment
  • single cell
  • human health
  • drug discovery