Comparative Performance of High-Throughput Methods for Protein pKa Predictions
Wanlei Wei, Hervé Hogues, Traian Sulea
Published in: Journal of Chemical Information and Modeling (2023)
The medically relevant field of protein-based therapeutics has created a demand for protein engineering across the different pH environments of biological relevance. In silico engineering workflows typically rely on high-throughput screening campaigns, which must evaluate large sets of protein residues and point mutations with fast yet accurate computational algorithms. Several high-throughput pKa prediction methods exist, but their accuracies are unclear because no current comprehensive benchmark is available. Here, seven fast, efficient, and accessible approaches (PROPKA3, DeepKa, PKAI, PKAI+, DelPhiPKa, MCCE2, and H++) were systematically tested on a nonredundant subset of 408 measured protein residue pKa shifts from the pKa database (PKAD). Although statistical bootstrapping showed that no method outperformed the null hypotheses with confidence, DeepKa, PKAI+, PROPKA3, and H++ still showed practical utility. More specifically, DeepKa performed consistently well in tests across multiple and individual amino acid residue types, as reflected by lower errors, higher correlations, and improved classifications. Arithmetic averaging of the best empirical predictors into simple consensuses improved overall transferability and accuracy, reaching a root-mean-square error (RMSE) as low as 0.76 pKa units and a correlation coefficient (R²) of 0.45 against experimental pKa shifts. This analysis should provide a basis for further methodological developments and guide future applications that require embedding computationally inexpensive pKa prediction methods, such as the optimization of antibodies for pH-dependent antigen binding.
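The scoring and consensus steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' benchmarking code: all function names are hypothetical, R² is assumed to be the squared Pearson correlation, the null model is assumed to predict a zero pKa shift for every residue, and the bootstrap compares RMSE only. The toy numbers are invented for illustration and are not PKAD data.

```python
import math
import random

# Hypothetical helpers for scoring pKa-shift predictors; not the paper's code.


def rmse(pred, obs):
    """Root-mean-square error between predicted and observed pKa shifts."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def r_squared(pred, obs):
    """Squared Pearson correlation (assumed meaning of R^2 here).

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)


def consensus(*predictors):
    """Residue-by-residue arithmetic mean of several predictors."""
    return [sum(values) / len(values) for values in zip(*predictors)]


def bootstrap_win_rate(pred, obs, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples (residues drawn with replacement)
    in which `pred` has a lower RMSE than an assumed null model that
    predicts a zero pKa shift for every residue."""
    rng = random.Random(seed)
    n = len(obs)
    null = [0.0] * n
    wins = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        p = [pred[i] for i in idx]
        o = [obs[i] for i in idx]
        if rmse(p, o) < rmse(null, o):
            wins += 1
    return wins / n_boot


if __name__ == "__main__":
    # Invented toy pKa shifts for five residues (not PKAD data).
    observed = [0.5, -1.2, 2.0, 0.1, -0.4]
    method_a = [0.7, -0.8, 1.5, 0.3, -0.1]
    method_b = [0.2, -1.5, 2.4, -0.2, -0.6]
    combined = consensus(method_a, method_b)
    for name, pred in [("A", method_a), ("B", method_b), ("consensus", combined)]:
        print(name, round(rmse(pred, observed), 3),
              round(r_squared(pred, observed), 3),
              bootstrap_win_rate(pred, observed))
```

In these invented numbers the errors of methods A and B partly cancel, so their average scores a lower RMSE than either input. Real predictors only benefit from averaging to the extent that their errors are not strongly correlated.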