Leveraging DFT and Molecular Fragmentation for Chemically Accurate pKa Prediction Using Machine Learning.
Alec J Sanchez, Sarah Maier, Krishnan Raghavachari
Published in: Journal of chemical information and modeling (2024)
We present a quantum mechanical/machine learning (ML) framework based on random forest to accurately predict the pKa values of complex organic molecules using inexpensive density functional theory (DFT) calculations. By including physics-based features from low-level DFT calculations and structural features from our connectivity-based hierarchy (CBH) fragmentation protocol, we can correct the systematic error associated with DFT. The generalizability and performance of our model are evaluated on two benchmark sets (SAMPL6 and Novartis). We believe the carefully curated input of physics-based features lessens the model's data dependence and its need for complex deep learning architectures, without compromising accuracy on the test sets. As a point of novelty, our work extends the applicability of CBH by employing it to generate viable molecular descriptors for ML.
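The general idea of correcting a systematic DFT error with a random forest can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the three integer columns are only hypothetical stand-ins for CBH fragment counts, and training on the residual between reference and DFT pKa is an assumed design, since the abstract does not state the learning target. The sketch assumes NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_molecules = 400

# Synthetic "reference" pKa values (illustrative only, not real data).
pka_true = rng.uniform(0.0, 14.0, size=n_molecules)

# Hypothetical structural features: counts of three fragment types.
frag_counts = rng.integers(0, 4, size=(n_molecules, 3))

# Synthetic low-level "DFT" pKa with a deliberate systematic error
# (slope, offset, fragment-dependent shift) plus small random noise.
pka_dft = (
    1.3 * pka_true
    + 2.0
    + 0.5 * frag_counts[:, 0]
    + rng.normal(0.0, 0.3, size=n_molecules)
)

# Features: one physics-based column (the DFT pKa) plus structural columns.
X = np.column_stack([pka_dft, frag_counts])
# Assumed target: the correction that maps the DFT value to the reference.
y = pka_true - pka_dft

X_train, X_test, y_train, y_test, true_train, true_test = train_test_split(
    X, y, pka_true, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Corrected prediction = raw DFT pKa + learned correction.
pka_corrected = X_test[:, 0] + model.predict(X_test)

mae_dft = mean_absolute_error(true_test, X_test[:, 0])
mae_corrected = mean_absolute_error(true_test, pka_corrected)
print(f"MAE of raw DFT pKa:       {mae_dft:.2f}")
print(f"MAE of corrected pKa:     {mae_corrected:.2f}")
```

Learning the correction rather than the pKa itself lets the inexpensive DFT value carry most of the chemistry, so the model only has to capture the remaining systematic deviation. That is one plausible reading of how physics-based features could reduce data dependence.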
Keyphrases
- density functional theory
- molecular dynamics
- machine learning
- deep learning
- big data
- randomized controlled trial
- artificial intelligence
- high resolution
- electronic health record
- white matter
- single molecule
- functional connectivity
- multiple sclerosis
- mass spectrometry
- convolutional neural network
- water soluble
- crystal structure