MIM-ML: A Novel Quantum Chemical Fragment-Based Random Forest Model for Accurate Prediction of NMR Chemical Shifts of Nucleic Acids.

Sruthy K. Chandy, Krishnan Raghavachari
Published in: Journal of Chemical Theory and Computation (2023)
We developed a random forest machine learning (ML) model for the prediction of ¹H and ¹³C NMR chemical shifts of nucleic acids. Our ML model is trained entirely to reproduce computed chemical shifts obtained previously for 10 nucleic acids using a Molecules-in-Molecules (MIM) fragment-based density functional theory (DFT) protocol including microsolvation effects. The model includes structural descriptors as well as electronic descriptors from an inexpensive low-level semiempirical calculation (GFN2-xTB), and it is trained on a relatively small number of DFT chemical shifts (2080 ¹H chemical shifts and 1780 ¹³C chemical shifts for the 10 nucleic acids). The ML model is then used to make chemical shift predictions for 8 new nucleic acids ranging in size from 600 to 900 atoms, and these predictions are compared directly to experimental data. Although no experimental data were used in the training, the performance of our model is excellent (mean absolute deviation of 0.34 ppm for ¹H chemical shifts and 2.52 ppm for ¹³C chemical shifts for the test set), even though the test set contains some nonstandard structures. A simple analysis suggests that both structural and electronic descriptors are critical for achieving reliable predictions. This is the first attempt to combine ML with fragment-based DFT calculations to predict experimental chemical shifts accurately, making the MIM-ML model a valuable tool for NMR predictions of nucleic acids.
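The accuracy figures quoted above are mean absolute deviations (MAD) between predicted and experimental chemical shifts. As a minimal sketch of how such a metric can be computed, the Python snippet below averages the absolute differences over a set of nuclei. The function name `mean_absolute_deviation` and the three shift values are hypothetical, chosen only for illustration; they are not taken from the paper and do not represent the authors' code.

```python
def mean_absolute_deviation(predicted, reference):
    """Return the mean of |predicted - reference| in the units of the inputs (ppm here)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one chemical shift is required")
    # Sum the absolute per-nucleus errors, then divide by the number of nuclei.
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Made-up 1H shifts in ppm, for illustration only (not data from the paper).
predicted_shifts = [7.5, 5.0, 2.0]
experimental_shifts = [7.0, 5.5, 2.0]

mad = mean_absolute_deviation(predicted_shifts, experimental_shifts)
print(f"MAD = {mad:.2f} ppm")
```

In practice the metric would be evaluated separately for ¹H and ¹³C, since the two nuclei span very different chemical shift ranges.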
Keyphrases
  • density functional theory
  • machine learning
  • high resolution
  • magnetic resonance
  • molecular dynamics
  • randomized controlled trial
  • climate change
  • computed tomography
  • molecular docking
  • mass spectrometry
  • deep learning