Redefining the Protein-Protein Interface: Coarse Graining and Combinatorics for an Improved Understanding of Amino Acid Contributions to the Protein-Protein Binding Affinity.
Josh K. Smith, Shaoyi Jiang, Jim Pfaendtner
Published in: Langmuir: the ACS journal of surfaces and colloids (2017)
The ability to intervene in biological pathways has for decades been limited by the lack of a quantitative description of protein-protein interactions (PPIs). Herein we generate and compare millions of simple PPI models for insight into the mechanisms of specific recognition and binding. We use a coarse-grained approach whereby amino acids are counted in the interface, and these counts are used as binding affinity predictors. We perform lasso regression, a modern regression technique aimed at interpretability, with every possible amino acid combination (over 10⁶ unique feature sets) to select only those amino acid predictors that provide more information than noise. This approach circumvents arbitrary binning and assumptions about the binding environment that obscure other binding affinity models. Aggregated analysis of these models trained at various interfacial cutoff distances informs the roles of specific amino acids in different binding contexts. We find that a simple amino acid count model outperforms detailed intermolecular contact and binned residue type models. We identify the prevalence of serine, glycine, and tryptophan in the interface as particularly important for predicting binding affinity across a range of distance cutoffs. Although current sample size limitations prevent a robust consensus model for binding affinity prediction, our approach underscores the relevance of a residue-based description of the protein-protein interface to increase our understanding of specific interactions.
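The approach summarized above (count interfacial residues by amino acid type at a distance cutoff, then fit a lasso model on each amino acid subset and keep the predictors with nonzero coefficients) can be sketched in plain Python. This is a minimal, hypothetical sketch, not the authors' code. The input format (one representative coordinate per residue), the rule that a residue is interfacial if it lies within the cutoff of any partner residue, the lasso scaling, and the regularization strength `alpha` are all assumptions. The abstract does not say how `alpha` was chosen or which affinity measure serves as the target `y`. The helper names `interface_counts`, `lasso`, `all_subsets` and `selected_residues` are illustrative only.

```python
# Minimal sketch under stated assumptions; NOT the published pipeline.
import itertools
import math

# Feature order: the 20 standard amino acids by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_feature_sets(n_types=len(AMINO_ACIDS)):
    """Non-empty subsets of amino acid types: 2**20 - 1 = 1,048,575."""
    return 2 ** n_types - 1


def all_subsets(n_types=len(AMINO_ACIDS)):
    """Yield every non-empty subset of feature indices, smallest first."""
    for k in range(1, n_types + 1):
        yield from itertools.combinations(range(n_types), k)


def interface_counts(chain_a, chain_b, cutoff):
    """Count interfacial residues per amino acid type (hypothetical input format).

    Each chain is a list of (one_letter_code, (x, y, z)) with one representative
    coordinate per residue, e.g. the C-alpha atom. A residue counts as
    interfacial if it lies within `cutoff` of any residue on the partner chain.
    """
    counts = [0] * len(AMINO_ACIDS)
    for own, partner in ((chain_a, chain_b), (chain_b, chain_a)):
        for aa, xyz in own:
            if any(math.dist(xyz, other) <= cutoff for _, other in partner):
                counts[AMINO_ACIDS.index(aa)] += 1
    return counts


def lasso(X, y, alpha, n_sweeps=500):
    """Lasso via cyclic coordinate descent on centered features, minimizing
    (1 / 2n) * ||y - b0 - X b||^2 + alpha * ||b||_1.
    Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residual while all coefficients are 0
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j, col in enumerate(cols):
            z = sum(c * c for c in col) / n
            if z == 0:
                continue  # a constant count carries no information
            # Correlation of feature j with the residual, feature j added back.
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            # Soft-thresholding: |rho| <= alpha drives the coefficient to zero.
            new = math.copysign(max(abs(rho) - alpha, 0.0), rho) / z
            delta = new - b[j]
            if delta:
                resid = [r - c * delta for c, r in zip(col, resid)]
                b[j] = new
    intercept = y_mean - sum(m * bj for m, bj in zip(means, b))
    return intercept, b


def selected_residues(X, y, subset, alpha):
    """Fit the lasso on one amino acid subset; return the types it keeps."""
    _, coefs = lasso([[row[j] for j in subset] for row in X], y, alpha)
    return [AMINO_ACIDS[j] for j, c in zip(subset, coefs) if abs(c) > 1e-8]
```

Looping `selected_residues` over every subset from `all_subsets()` reproduces the scale of the search described above (2^20 - 1, about 1.05 × 10⁶ fits per cutoff), which is slow in pure Python; a compiled implementation such as scikit-learn's `Lasso`, whose objective uses the same 1/(2n) scaling, is the practical choice. Repeating the loop at several cutoff values yields one model collection per cutoff for aggregated analysis.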