Login / Signup

Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast.

Yuyang WangRishikesh MagarChen LiangAmir Barati Farimani
Published in: Journal of chemical information and modeling (2022)
Deep learning has been a prevalence in computational chemistry and widely implemented in molecular property predictions. Recently, self-supervised learning (SSL), especially contrastive learning (CL), has gathered growing attention for the potential to learn molecular representations that generalize to the gigantic chemical space. Unlike supervised learning, SSL can directly leverage large unlabeled data, which greatly reduces the effort to acquire molecular property labels through costly and time-consuming simulations or experiments. However, most molecular SSL methods borrow the insights from the machine learning community but neglect the unique cheminformatics (e.g., molecular fingerprints) and multilevel graphical structures (e.g., functional groups) of molecules. In this work, we propose iMolCLR, i mprovement of Mol ecular C ontrastive L earning of R epresentations with graph neural networks (GNNs) in two aspects: (1) mitigating faulty negative contrastive instances via considering cheminformatics similarities between molecule pairs and (2) fragment-level contrasting between intramolecule and intermolecule substructures decomposed from molecules. Experiments have shown that the proposed strategies significantly improve the performance of GNN models on various challenging molecular property predictions. In comparison to the previous CL framework, iMolCLR demonstrates an averaged 1.2% improvement of ROC-AUC on eight classification benchmarks and an averaged 10.1% decrease of the error on six regression benchmarks. On most benchmarks, the generic GNN pretrained by iMolCLR rivals or even surpasses supervised learning models with sophisticated architectures and engineered features. Further investigations demonstrate that representations learned through iMolCLR intrinsically embed scaffolds and functional groups that can reason molecule similarities.
Keyphrases
  • machine learning
  • deep learning
  • single molecule
  • healthcare
  • artificial intelligence
  • computed tomography
  • neural network
  • climate change
  • big data
  • risk factors
  • mental health
  • risk assessment
  • convolutional neural network