Estimating Change in Foldability Due to Multipoint Deletions in Protein Structures.
Anupam BanerjeeAmit KumarKushal Kanti GhoshPralay MitraPublished in: Journal of chemical information and modeling (2020)
Insertions/deletions of amino acids in the protein backbone potentially result in altered structural/functional specifications. They can either contribute positively to the evolutionary process or can result in disease conditions. Despite being the second most prevalent form of protein modification, there are no databases or computational frameworks that delineate harmful multipoint deletions (MPD) from beneficial ones. We introduce a positive unlabeled learning-based prediction framework (PROFOUND) that utilizes fold-level attributes, environment-specific properties, and deletion site-specific properties to predict the change in foldability arising from such MPDs, both in the non-loop and loop regions of protein structures. In the absence of any protein structure dataset to study MPDs, we introduce a dataset with 153 MPD instances that lead to native-like folded structures and 7650 unlabeled MPD instances whose effect on the foldability of the corresponding proteins is unknown. PROFOUND on 10-fold cross-validation on our newly introduced dataset reports a recall of 82.2% (86.6%) and a fall out rate (FR) of 14.2% (20.6%), corresponding to MPDs in the protein loop (non-loop) region. The low FR suggests that the foldability in proteins subject to MPDs is not random and necessitates unique specifications of the deleted region. In addition, we find that additional evolutionary attributes contribute to higher recall and lower FR. The first of a kind foldability prediction system owing to MPD instances and the newly introduced dataset will potentially aid in novel protein engineering endeavors.