Aggregation prone regions in human proteome: Insights from large-scale data analyses.
R Prabakaran, Dhruv Goel, Sandeep Kumar, M Michael Gromiha
Published in: Proteins (2017)
Protein aggregation leads to several burdensome human maladies, but a molecular-level understanding of how the human proteome has tackled the threat of aggregation is currently lacking. In this work, we survey the human proteome for the incidence of aggregation prone regions (APRs), using sequences of experimentally validated amyloid-fibril forming peptides and computational predictions. While approximately 30 human proteins are currently known to be amyloidogenic, we found that 260 proteins (∼1% of the human proteome) contain at least one experimentally validated amyloid-fibril forming segment. Computer predictions suggest that more than 80% of human proteins contain at least one potential APR and approximately two-thirds (65%) contain two or more APRs, spanning 3-5% of their sequences. Sequence randomizations show that this apparently high incidence of APRs has actually been significantly reduced by the unique amino acid composition and sequence patterning of human proteins. The human proteome has utilized a wide repertoire of sequence-structural optimization strategies, most of them already known, to minimize the deleterious consequences of APRs while simultaneously taking advantage of their order-promoting properties. This survey also found that APRs tend to be located near the active and ligand-binding sites in human proteins, but not near post-translational modification sites. The APRs in human proteins are also preferentially found at heterotypic interfaces rather than homotypic ones. Interestingly, this survey reveals that APRs play multiple, often opposing, roles in human protein sequence-structure-function relationships. Insights gained from this work have several interesting implications for novel drug discovery and development. Proteins 2017; 85:1099-1118. © 2017 Wiley Periodicals, Inc.
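The abstract does not describe the survey protocol in detail, so the following Python sketch only illustrates the general idea of the two steps it mentions: scanning a sequence for known fibril-forming segments, and comparing the observed count with the average over randomized sequences. It assumes exact substring matching and composition-preserving shuffling, which may differ from the authors' actual procedure. The function names, the toy protein sequence, and the choice of two segments are hypothetical and serve illustration only.

```python
import random


def count_segment_hits(sequence, segments):
    """Count occurrences of any known segment in the sequence (overlaps allowed)."""
    hits = 0
    for seg in segments:
        start = sequence.find(seg)
        while start != -1:
            hits += 1
            start = sequence.find(seg, start + 1)
    return hits


def shuffled(sequence, rng):
    """Return a shuffled copy of the sequence; amino acid composition is preserved."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def randomization_mean(sequence, segments, n_shuffles=1000, seed=0):
    """Mean hit count over shuffled copies of the sequence (assumed control, not the paper's)."""
    rng = random.Random(seed)
    total = sum(
        count_segment_hits(shuffled(sequence, rng), segments)
        for _ in range(n_shuffles)
    )
    return total / n_shuffles


if __name__ == "__main__":
    # Toy inputs for illustration only; not data from the paper.
    segments = ["VQIVYK", "KLVFFA"]
    toy_protein = "MSTAAVQIVYKPGSGKLVFFAEDLT"
    observed = count_segment_hits(toy_protein, segments)
    expected = randomization_mean(toy_protein, segments)
    print("observed hits:", observed)
    print("mean hits after shuffling:", expected)
```

Because shuffling keeps the amino acid composition fixed, a comparison of this kind can only probe the effect of sequence patterning; testing the effect of composition itself would require a different null model, such as sampling residues from background frequencies.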