True Accuracy of Fast Scoring Functions to Predict High-Throughput Screening Data from Docking Poses: The Simpler the Better.
Viet-Khoa Tran-NguyenGuillaume BretDidier RognanPublished in: Journal of chemical information and modeling (2021)
Hundreds of fast scoring functions have been developed over the last 20 years to predict binding free energies from three-dimensional structures of protein-ligand complexes. Despite numerous statistical promises, we believe that none of them has been properly validated for daily prospective high-throughput virtual screening studies, mostly because in silico screening challenges usually employ artificially built and biased datasets. We here carry out a fully unbiased evaluation of four scoring functions (Pafnucy, ΔvinaRF20, IFP, and GRIM) on an in-house developed data collection of experimental high-confidence screening data (LIT-PCBA) covering about 3 million data points on 15 diverse pharmaceutical targets. All four scoring functions were applied to rescore the docking poses of LIT-PCBA compounds in conditions mimicking exactly standard drug discovery scenarios and were compared in terms of propensity to enrich true binders in the top 1%-ranked hit lists. Interestingly, rescoring based on simple interaction fingerprints or interaction graphs outperforms state-of-the-art machine learning and deep learning scoring functions in most of the cases. The current study notably highlights the strong tendency of deep learning methods to predict affinity values within a very narrow range centered on the mean value of samples used for training. Moreover, it suggests that knowledge of pre-existing binding modes is the key to detecting the most potent binders.
Keyphrases
- deep learning
- machine learning
- big data
- electronic health record
- high throughput
- drug discovery
- protein protein
- artificial intelligence
- molecular dynamics
- molecular dynamics simulations
- healthcare
- climate change
- high resolution
- molecular docking
- small molecule
- binding protein
- density functional theory
- dna binding
- case control