GraphSite: Ligand Binding Site Classification with Deep Graph Learning.
Wentao Shi, Manali Singha, Limeng Pu, Gopal Srivastava, Jagannathan Ramanujam, Michal Brylinski
Published in: Biomolecules (2022)
The binding of small organic molecules to protein targets is fundamental to a wide array of cellular functions. It is also routinely exploited to develop new therapeutic strategies against a variety of diseases. On that account, the ability to effectively detect and classify ligand binding sites in proteins is of paramount importance to modern structure-based drug discovery. These complex and non-trivial tasks require sophisticated algorithms from the field of artificial intelligence to achieve a high prediction accuracy. In this communication, we describe GraphSite, a deep learning-based method utilizing a graph representation of local protein structures and a state-of-the-art graph neural network to classify ligand binding sites. Using neural weighted message passing layers to effectively capture the structural, physicochemical, and evolutionary characteristics of binding pockets mitigates model overfitting and improves the classification accuracy. Indeed, comprehensive cross-validation benchmarks on a large dataset of binding pockets belonging to 14 diverse functional classes demonstrate that GraphSite achieves a class-weighted F1-score of 81.7%, outperforming other approaches such as molecular docking and binding site matching. It also generalizes well to unseen data, with an F1-score of 70.7%, which is the expected performance in real-world applications. We also discuss new directions to improve and extend GraphSite in the future.
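The class-weighted F1-score reported above can be illustrated with a small sketch. The abstract does not spell out the exact weighting, so this assumes the common convention in which each class's F1 is weighted by its share of the true labels (the same convention as scikit-learn's `average="weighted"`). The function name, class labels, and toy predictions below are hypothetical and are not taken from the GraphSite dataset.

```python
from collections import Counter


def class_weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support.

    Assumption: "class-weighted" means support-weighted averaging; the
    abstract does not define the weighting explicitly.
    """
    support = Counter(y_true)          # number of true samples per class
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        # True positives: samples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1
    return score


# Toy example with three hypothetical pocket classes.
y_true = ["ATP", "ATP", "ATP", "heme", "heme", "glucose"]
y_pred = ["ATP", "ATP", "heme", "heme", "heme", "ATP"]
print(f"{class_weighted_f1(y_true, y_pred):.3f}")  # prints 0.600
```

In this toy case the rare class ("glucose") is never predicted correctly, yet it lowers the score only in proportion to its support. This is why a class-weighted F1 is a natural summary for a 14-class benchmark whose classes are unlikely to be equally sized.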
Keyphrases
- deep learning
- neural network
- artificial intelligence
- convolutional neural network
- machine learning
- molecular docking
- big data
- drug discovery
- binding protein
- magnetic resonance
- high resolution
- dna binding
- protein protein
- molecular dynamics simulations
- contrast enhanced
- network analysis
- working memory
- magnetic resonance imaging
- amino acid
- current status
- radiation induced
- mass spectrometry
- computed tomography
- gene expression
- radiation therapy
- data analysis
- transcription factor