FG-BERT: a generalized and self-supervised functional group-based molecular representation learning framework for properties prediction.

Biaoshun Li, Mujie Lin, Tiegen Chen, Ling Wang
Published in: Briefings in Bioinformatics (2023)
Artificial intelligence-based molecular property prediction plays a key role in the design of molecules such as bioactive compounds and functional materials. In this study, we propose a self-supervised pretraining deep learning (DL) framework, called functional group bidirectional encoder representations from transformers (FG-BERT), pretrained on ~1.45 million unlabeled drug-like molecules, to learn meaningful representations of molecules from functional groups (FGs). The pretrained FG-BERT framework can be fine-tuned to predict molecular properties. Compared with state-of-the-art (SOTA) machine learning and DL methods, we demonstrate the high performance of FG-BERT in evaluating molecular properties in tasks involving physical chemistry, biophysics and physiology across 44 benchmark datasets. In addition, FG-BERT utilizes attention mechanisms to focus on FG features that are critical to the target properties, thereby providing excellent interpretability for downstream training tasks. Collectively, FG-BERT does not require any hand-crafted features as input and has excellent interpretability, providing an out-of-the-box framework for developing SOTA models for a variety of molecular discovery tasks, especially drug discovery.
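The abstract does not spell out the pretraining objective, so the following is only a minimal sketch of the general BERT-style idea it alludes to: hide some functional-group tokens and keep the originals as prediction targets. The token names, the `mask_tokens` helper and the 15% mask rate (borrowed from the original BERT recipe) are illustrative assumptions, not details confirmed for FG-BERT.

```python
import random

MASK = "[MASK]"  # placeholder token; the name is an assumption


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Build one masked-prediction example from a list of tokens.

    Returns (masked, labels): `masked` has some tokens replaced by MASK,
    and `labels[i]` holds the original token at masked positions, else None.
    The 0.15 default follows the original BERT recipe (assumed here).
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Mask at least one token so every example carries a training signal.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else None for i, t in enumerate(tokens)]
    return masked, labels


# Hypothetical functional-group tokens for a single molecule.
fg_tokens = ["benzene_ring", "carboxylic_acid", "ester", "methyl"]
masked, labels = mask_tokens(fg_tokens, seed=0)
print(masked)
print(labels)
```

During pretraining, a model would be trained to recover the entries of `labels` from `masked`; fine-tuning would then replace that prediction head with one for the target property.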
Keyphrases
  • machine learning
  • artificial intelligence
  • deep learning
  • drug discovery
  • big data
  • single molecule