Login / Signup

Beyond Thermodynamic Constraints: Evolutionary Sampling Generates Realistic Protein Sequence Variation.

Qian JiangAshley I TeufelEleisha L JacksonClaus O Wilke
Published in: Genetics (2018)
Biological evolution generates a surprising amount of site-specific variability in protein sequences. Yet, attempts at modeling this process have been only moderately successful, and current models based on protein structural metrics explain, at best, 60% of the observed variation. Surprisingly, simple measures of protein structure, such as solvent accessibility, are often better predictors of site-specific variability than more complex models employing all-atom energy functions and detailed structural modeling. We suggest here that these more complex models perform poorly because they lack consideration of the evolutionary process, which is, in part, captured by the simpler metrics. We compare protein sequences that are computationally designed to sequences that are computationally evolved using the same protein-design energy function and to homologous natural sequences. We find that, by a wide variety of metrics, evolved sequences are much more similar to natural sequences than are designed sequences. In particular, designed sequences are too conserved on the protein surface relative to natural sequences, whereas evolved sequences are not. Our results suggest that evolutionary simulation produces a realistic sampling of sequence space. By contrast, protein design-at least as currently implemented-does not. Existing energy functions seem to be sufficiently accurate to correctly describe the key thermodynamic constraints acting on protein sequences, but they need to be paired with realistic sampling schemes to generate realistic sequence alignments.
Keyphrases