The SAMPL6 challenge on predicting aqueous pKa values from EC-RISM theory.
Nicolas Tielker, Lukas Eberlein, Stefan Güssregen, Stefan M Kast
Published in: Journal of computer-aided molecular design (2018)
The "embedded cluster reference interaction site model" (EC-RISM) integral equation theory is applied to the problem of predicting aqueous pKa values for drug-like molecules based on an ensemble of tautomers. EC-RISM is based on self-consistent calculations of a solute's electronic structure and the distribution function of surrounding water. Following up on the workflow developed after the SAMPL5 challenge on cyclohexane-water distribution coefficients, we extended and improved the methodology by taking into account exact electrostatic solute-solvent interactions taken from the wave function in solution. As before, the model is calibrated against Gibbs energies of hydration from the "Minnesota Solvation Database" and a public dataset of acidity constants of organic acids and bases by adjusting a total of 4 parameters, of which only 3 are relevant for predicting pKa values. While the best-performing training model yields a root-mean-square error (RMSE) of 1 pK unit, the corresponding test set prediction on the full SAMPL6 dataset of macroscopic pKa values using the same level of theory exhibits a slightly larger error (1.7 pK units) than the best test set model submitted (1.7 pK units for the corresponding training set vs. a test set performance of 1.6). Post-submission analysis revealed a number of physical optimization options regarding the numerical treatment of electrostatic interactions and conformational sampling. While the experimental test set data revealed after submission were not used for reparametrizing the methodology, the best physically optimized models consequently result in RMSEs of 1.5 if only improved electrostatic interactions are considered and of 1.1 if, in addition, conformational sampling accounts for quantum-chemically derived rankings. We conclude that these numbers are probably near the ultimate accuracy achievable with the simple 3-parameter model using a single or the two best-ranking conformations per tautomer or microstate.
Finally, relations of the present macrostate approach to microstate pKa results are discussed and some illustrative results for microstate populations are presented.
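The relation between a tautomer ensemble, microstate populations, and a macroscopic pKa can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that each protonation macrostate is described by a list of per-tautomer Gibbs energies in solution (in kJ/mol), that these are combined through a Boltzmann-weighted partition function, and that the resulting Gibbs energy difference is mapped to a pKa by a hypothetical linear calibration with parameters `slope` and `intercept`. The function names, the two-parameter calibration form, and all numerical inputs are illustrative assumptions; the abstract does not specify the form of the actual 3-parameter model.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)
T_REF = 298.15         # assumed temperature in K


def ensemble_gibbs(energies, temperature=T_REF):
    """Effective Gibbs energy of a set of states: -RT ln(sum_i exp(-G_i/RT))."""
    rt = R_KJ * temperature
    g_min = min(energies)  # shift by the minimum for numerical stability
    z = sum(math.exp(-(g - g_min) / rt) for g in energies)
    return g_min - rt * math.log(z)


def populations(energies, temperature=T_REF):
    """Boltzmann populations of the individual states (tautomers/microstates)."""
    rt = R_KJ * temperature
    g_min = min(energies)
    weights = [math.exp(-(g - g_min) / rt) for g in energies]
    z = sum(weights)
    return [w / z for w in weights]


def macroscopic_pka(acid_states, base_states, slope, intercept,
                    temperature=T_REF):
    """Macroscopic pKa from two tautomer ensembles.

    `slope` and `intercept` are hypothetical calibration parameters that
    would be fitted to experimental data; they are not taken from the paper.
    """
    rt_ln10 = R_KJ * temperature * math.log(10.0)
    delta_g = ensemble_gibbs(base_states, temperature) - \
        ensemble_gibbs(acid_states, temperature)
    return slope * delta_g / rt_ln10 + intercept


def rmse(predicted, reference):
    """Root-mean-square error between predicted and reference values."""
    n = len(predicted)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)
```

For example, `populations([0.0, 0.0])` gives equal weights for two degenerate tautomers, and `rmse` is the error measure quoted throughout the abstract. Working with ensemble Gibbs energies rather than single structures is what allows several tautomers (or conformations) per macrostate to contribute to one macroscopic pKa.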