COVER: conformational oversampling as data augmentation for molecules.

Jennifer Hemmerich, Ece Asilar, Gerhard F Ecker
Published in: Journal of Cheminformatics (2020)
Training neural networks on small, imbalanced datasets often leads to overfitting and to the minority class being disregarded. Predictive toxicology, however, needs models with a good balance between sensitivity and specificity. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for toxicity prediction. Conformational oversampling enlarges a dataset by generating multiple conformations of each molecule. These conformations can be used both to balance and to oversample a dataset, increasing its size without the need for artificial samples. We show that conformational oversampling facilitates the training of neural networks and yields state-of-the-art results on the Tox21 dataset.
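The balancing idea can be illustrated with a minimal sketch of how many conformers to generate per molecule so that each class ends up with roughly the same number of samples. This is an assumption-laden illustration, not the authors' exact procedure: the helpers `conformers_per_class` and `oversample` and the parameter `majority_confs` are hypothetical, and conformers are represented only by an index. In practice, the 3D structures would come from a conformer generator such as RDKit's `AllChem.EmbedMultipleConfs`.

```python
import math
from collections import Counter


def conformers_per_class(labels, majority_confs=1):
    """Hypothetical helper: conformers to generate per molecule in each class.

    The majority class gets `majority_confs` conformers per molecule (the
    oversampling factor); smaller classes get proportionally more, so every
    class reaches at least the majority class's sample count.
    """
    counts = Counter(labels)
    target = max(counts.values()) * majority_confs
    return {cls: math.ceil(target / n) for cls, n in counts.items()}


def oversample(dataset, majority_confs=1):
    """Expand (molecule_id, label) pairs into (molecule_id, conformer_idx, label).

    Each output row stands for one conformation of a real molecule, so no
    synthetic molecules are created; only the number of conformers differs.
    """
    k = conformers_per_class([y for _, y in dataset], majority_confs)
    return [
        (mol_id, conf_idx, y)
        for mol_id, y in dataset
        for conf_idx in range(k[y])
    ]


if __name__ == "__main__":
    # Toy data: 8 inactive (0) and 2 active (1) molecules.
    data = [(f"neg{i}", 0) for i in range(8)] + [("pos0", 1), ("pos1", 1)]
    expanded = oversample(data, majority_confs=2)
    print(conformers_per_class([y for _, y in data], 2))
    print(Counter(y for *_, y in expanded))
```

With two conformers per inactive molecule and eight per active molecule, the toy set grows from 10 imbalanced samples to 32 balanced ones. Because all conformers of a molecule share its label, one sensible design choice is to assign every conformer of a molecule to the same train/test split, so that near-duplicate conformations do not leak between them.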