Benchmarking In Silico Tools for Cysteine pKa Prediction.
Ernest Awoonor-Williams, Andrei A. Golosov, Viktor Hornak. Published in: Journal of Chemical Information and Modeling (2023)
Accurate estimation of the pKa values of cysteine residues in proteins could inform targeted approaches in hit discovery. The pKa of a targetable cysteine residue in a disease-related protein is an important physicochemical parameter in covalent drug discovery, as it influences the fraction of nucleophilic thiolate amenable to chemical protein modification. Traditional structure-based in silico tools predict cysteine pKa values less accurately than those of other titratable residues. Additionally, few comprehensive benchmark assessments exist for cysteine pKa prediction tools. This raises the need for extensive assessment and evaluation of methods for cysteine pKa prediction. Here, we report the performance of several computational pKa methods, including single-structure and ensemble-based approaches, on a diverse test set of experimental cysteine pKa values retrieved from the PKAD database. The dataset consisted of 16 wildtype and 10 mutant proteins with experimentally measured cysteine pKa values. Our results show that these methods vary in their overall predictive accuracy. Among the test set of wildtype proteins evaluated, the best method (MOE) yielded a mean absolute error of 2.3 pK units, highlighting the need for improvement of existing pKa methods for accurate cysteine pKa estimation. Given the limited accuracy of these methods, further development is needed before these approaches can be routinely employed to drive design decisions in early drug discovery efforts.
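To make the link between pKa and thiolate fraction concrete, the following minimal Python sketch applies the Henderson-Hasselbalch relation and computes a mean absolute error. It is an illustration only and not taken from the paper: the function names, the pH of 7.4, and the example pKa of 8.5 are assumptions chosen for the demonstration.

```python
def thiolate_fraction(pka: float, ph: float = 7.4) -> float:
    """Fraction of a thiol present as thiolate at a given pH.

    Uses the Henderson-Hasselbalch relation: f = 1 / (1 + 10**(pKa - pH)).
    The default pH of 7.4 is an assumed physiological value.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def mean_absolute_error(predicted, experimental):
    """Mean absolute difference between paired predicted and measured pKa values."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) for p, e in zip(predicted, experimental))
    return total / len(predicted)


if __name__ == "__main__":
    # Hypothetical cysteine pKa, shifted by the 2.3 pK unit error reported above.
    reference_pka = 8.5
    for pka in (reference_pka - 2.3, reference_pka, reference_pka + 2.3):
        print(f"pKa {pka:4.1f}: thiolate fraction {thiolate_fraction(pka):.4f}")
```

Because the thiolate fraction depends exponentially on the difference between pKa and pH, an error of 2.3 pK units in either direction changes the estimated fraction by orders of magnitude, which is why this level of accuracy limits use in design decisions.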