OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy.
Anders S. Christensen, Sai Krishna Sirumalla, Zhuoran Qiao, Michael B. O'Connor, Daniel G. A. Smith, Feizhi Ding, Peter J. Bygrave, Animashree Anandkumar, Matthew Welborn, Frederick R. Manby, Thomas F. Miller III
Published in: The Journal of Chemical Physics (2021)
We present OrbNet Denali, a machine learning model for electronic structure that is designed as a drop-in replacement for ground-state density functional theory (DFT) energy calculations. The model is a message-passing graph neural network that uses symmetry-adapted atomic orbital features from a low-cost quantum calculation to predict the energy of a molecule. OrbNet Denali is trained on a dataset of 2.3 × 10⁶ DFT calculations on molecules and geometries. This dataset covers the most common elements in biochemistry and organic chemistry (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, and I) and includes charged molecules. OrbNet Denali is demonstrated on several well-established benchmark datasets, and we find that it provides accuracy on par with modern DFT methods while offering a speedup of up to three orders of magnitude. For the GMTKN55 benchmark set, OrbNet Denali achieves WTMAD-1 and WTMAD-2 scores of 7.19 and 9.84, respectively, on par with modern DFT functionals. For several GMTKN55 subsets, which contain chemical problems that are not present in the training set, OrbNet Denali produces mean absolute errors comparable to those of DFT methods. For the Hutchison conformer benchmark set, OrbNet Denali has a median correlation coefficient of R² = 0.90 relative to the reference DLPNO-CCSD(T) calculations and R² = 0.97 relative to the method used to generate the training data (ωB97X-D3/def2-TZVP), exceeding the performance of any other method with a similar cost. Similarly, the model reaches chemical accuracy for non-covalent interactions in the S66x10 dataset. For torsional profiles, OrbNet Denali reproduces the ωB97X-D3/def2-TZVP results with an average mean absolute error of 0.12 kcal/mol over the potential energy surfaces of the diverse fragments in the TorsionNet500 dataset.
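The abstract reports two kinds of agreement statistics: a mean absolute error against reference energies, and a median R² taken over the molecules of a conformer benchmark. The sketch below shows one way such numbers could be computed from relative energies. It is illustrative only and is not the authors' evaluation code: the function names are hypothetical, R² is assumed here to be the squared Pearson correlation, and the energies in the usage example are invented placeholders, not data from the paper.

```python
from statistics import mean, median


def mean_absolute_error(predicted, reference):
    """Mean absolute deviation between two equal-length energy lists (e.g. kcal/mol)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return mean([abs(p - r) for p, r in zip(predicted, reference)])


def r_squared(predicted, reference):
    """Squared Pearson correlation (an assumed definition of R^2) for one molecule."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    mp, mr = mean(predicted), mean(reference)
    cov = sum((p - mp) * (r - mr) for p, r in zip(predicted, reference))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_r = sum((r - mr) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("R^2 is undefined when one series is constant")
    return cov * cov / (var_p * var_r)


def median_r_squared(per_molecule):
    """Median of per-molecule R^2 values.

    per_molecule: list of (predicted, reference) pairs, one pair per molecule,
    each holding the relative energies of that molecule's conformers.
    """
    return median([r_squared(p, r) for p, r in per_molecule])


if __name__ == "__main__":
    # Invented placeholder energies for two molecules (not from the paper).
    toy = [
        ([0.0, 1.1, 2.3, 3.0], [0.0, 1.0, 2.5, 3.2]),
        ([0.0, 0.4, 1.9], [0.0, 0.6, 1.7]),
    ]
    print("median R^2:", median_r_squared(toy))
    print("MAE (first molecule):", mean_absolute_error(*toy[0]))
```

Taking the median of per-molecule R² values, rather than pooling all conformers into one correlation, keeps a few molecules with large energy ranges from dominating the statistic.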
Keyphrases
- density functional theory
- molecular dynamics
- machine learning
- neural network
- low cost
- artificial intelligence
- deep learning
- convolutional neural network