TorsionNet: A Deep Neural Network to Rapidly Predict Small-Molecule Torsional Energy Profiles with the Accuracy of Quantum Mechanics.

Brajesh K Rai, Vishnu Sresht, Qingyi Yang, Ray Unwalla, Meihua Tu, Alan M Mathiowetz, Gregory A Bakken
Published in: Journal of Chemical Information and Modeling (2022)
Fast and accurate assessment of small-molecule dihedral energetics is crucial for molecular design and optimization in medicinal chemistry. Yet, accurate prediction of torsion energy profiles remains challenging as the current molecular mechanics (MM) methods are limited by insufficient coverage of drug-like chemical space and accurate quantum mechanical (QM) methods are too expensive. To address this limitation, we introduce TorsionNet, a deep neural network (DNN) model specifically developed to predict small-molecule torsion energy profiles with QM-level accuracy. We applied active learning to identify nearly 50k fragments (with elements H, C, N, O, F, S, and Cl) that maximized the coverage of our corporate compound library and leveraged massively parallel cloud computing resources for density functional theory (DFT) torsion scans of these fragments, generating a training data set of 1.2 million DFT energies. After training TorsionNet on this data set, we obtain a model that can rapidly predict the torsion energy profile of typical drug-like fragments with DFT-level accuracy. Importantly, our method also provides an uncertainty estimate for the predicted profiles without any additional calculations. In this report, we show that TorsionNet can accurately identify the preferred dihedral geometries observed in crystal structures. Our TorsionNet-based analysis of a diverse set of protein-ligand complexes with measured binding affinity shows a strong association between high ligand strain and low potency. We also present practical applications of TorsionNet that demonstrate how consideration of DNN-based strain energy leads to substantial improvement in existing lead discovery and design workflows. TorsionNet500, a benchmark data set comprising 500 chemically diverse fragments with DFT torsion profiles (12k MM- and DFT-optimized geometries and energies), has been created and is made publicly available.
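To make the notion of torsional strain concrete, here is a minimal sketch of how a strain value might be read off a torsion energy profile: the energy at the observed dihedral angle minus the profile minimum. It is not TorsionNet code or its API. The function name `torsion_strain`, the 15° scan grid, the kcal/mol units, and the synthetic twofold cosine profile are all assumptions made for illustration; in practice the profile would come from a DFT scan or a model prediction.

```python
import numpy as np

def torsion_strain(angles_deg, energies, observed_deg):
    """Energy at an observed dihedral relative to the profile minimum.

    Hypothetical helper, not part of TorsionNet. Energies are assumed
    to be in kcal/mol and sampled on a grid of dihedral angles in degrees.
    """
    angles = np.asarray(angles_deg, dtype=float)
    relative = np.asarray(energies, dtype=float) - np.min(energies)
    # period=360 makes the linear interpolation wrap around at +/-180 degrees.
    return float(np.interp(observed_deg, angles, relative, period=360.0))

# Synthetic profile (assumption): twofold barrier of 4 kcal/mol,
# minima at +/-90 degrees, sampled every 15 degrees.
angles = np.arange(-180.0, 180.0, 15.0)
energies = 2.0 * (1.0 + np.cos(np.radians(2.0 * angles)))

for observed in (0.0, 60.0, 90.0):
    print(observed, round(torsion_strain(angles, energies, observed), 2))
```

A ligand conformation whose dihedral sits near a profile minimum gets a strain close to zero, while one near a barrier gets a strain close to the barrier height.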