Where Informatics Lags Chemistry Leads.
Rahul Kaushik, Ankita Singh, Bhyravabhotla Jayaram
Published in: Biochemistry (2017)
That amino acid sequences dictate the tertiary structures of proteins has been known for more than five decades. While the molecular pathways to tertiary structure are still being worked out, computational methods continue to be developed in parallel. These methods rest on the axiom that similar sequences adopt similar structures, and they use the Protein Data Bank structural repository and homologue detection strategies to predict the structures of sequences of interest. The success of this approach is limited by the ability to uncover hidden similarities among amino acid sequences. Here we consider the 20 amino acids as a complete set of chemical templates in the physicochemical space of proteins and propose a new structural and chemical classification of amino acids. Integrating this perspective into conventional evolutionary methods of similarity detection leads to an unprecedented increase in the accuracy of homologue detection, resulting in improved protein structure prediction. The performance is validated on a large data set of 11716 unique proteins, and the results are benchmarked against conventional methods. The availability of good-quality protein structures helps in structure-based drug design endeavors and in establishing protein structure-function correlations.
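The idea of "hidden similarities" can be illustrated with a minimal sketch: two pre-aligned sequences may share few identical residues yet agree closely once each residue is reduced to a physicochemical class. The grouping below is a generic textbook-style partition chosen purely for illustration; it is an assumption and is not the classification proposed in the paper. The names `CLASSES`, `identity`, and `class_similarity` are likewise hypothetical.

```python
# Illustrative grouping only (an assumption): a generic partition of the
# 20 amino acids by side-chain chemistry, NOT the paper's classification.
CLASSES = {
    "nonpolar": "GAVLIMP",
    "aromatic": "FWY",
    "polar": "STCNQ",
    "positive": "KRH",
    "negative": "DE",
}

# Map each one-letter residue code to its class name.
AA_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}


def _check(a: str, b: str) -> None:
    """Require two non-empty, equal-length (already aligned, ungapped) sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions with exactly the same residue."""
    _check(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def class_similarity(a: str, b: str) -> float:
    """Fraction of aligned positions whose residues fall in the same class."""
    _check(a, b)
    return sum(AA_CLASS[x] == AA_CLASS[y] for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    seq_a, seq_b = "AKDFS", "VRETS"  # toy sequences, not real proteins
    print(f"identity:         {identity(seq_a, seq_b):.2f}")
    print(f"class similarity: {class_similarity(seq_a, seq_b):.2f}")
```

In this toy pair only one position in five is identical, while four in five fall in the same class. A real homologue detection pipeline would work with gapped alignments and scoring profiles rather than a simple match fraction.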