Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting.

Xuan Lv, Jianwen Chen, Yutong Lu, Zhiguang Chen, Nong Xiao, Yuedong Yang
Published in: Journal of Chemical Information and Modeling (2020)
Accurately predicting the impact of point mutations on protein stability is crucial for protein design and engineering. In this study, we proposed a novel method (BoostDDG) that predicts stability changes upon point mutations from protein sequences using extreme gradient boosting. We extracted a comprehensive set of features from evolutionary information and predicted structures, and performed feature selection by sequential forward selection. The features and parameters were optimized by homologue-based cross-validation to avoid overfitting. We found that 14 features from six groups led to the highest Pearson correlation coefficient (PCC) of 0.535, consistent with the PCC of 0.540 on an independent test set. Our method consistently outperformed other sequence-based methods on three precompiled test sets and on 7363 variants of two proteins (PTEN and TPMT). These results indicate that BoostDDG is a powerful tool for predicting stability changes upon point mutations from protein sequences.
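The selection procedure described in the abstract (sequential forward selection scored by PCC under homologue-based cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the regressor is a small ridge model standing in for extreme gradient boosting so the example needs only the standard library, the data are synthetic, the `groups` list is an assumed stand-in for homologue cluster identifiers, and every function name is hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def fit_ridge(X, y, lam=1e-3):
    """Stand-in regressor (NOT XGBoost): ridge regression via normal equations."""
    rows = [r + [1.0] for r in X]  # append an intercept column
    d = len(rows[0])
    # Augmented system [A^T A + lam*I | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(d)] + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(d):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][d] / M[i][i] for i in range(d)]


def predict(w, X):
    """Apply the fitted weights; w[-1] is the intercept."""
    return [sum(wi * xi for wi, xi in zip(w, r)) + w[-1] for r in X]


def grouped_cv_pcc(X, y, groups, cols, n_folds=5):
    """PCC of pooled out-of-fold predictions; whole groups are held out together,
    so no homologue cluster appears in both training and test data."""
    uniq = sorted(set(groups))
    preds = [0.0] * len(y)
    for k in range(n_folds):
        held = set(uniq[k::n_folds])
        tr = [i for i, g in enumerate(groups) if g not in held]
        te = [i for i, g in enumerate(groups) if g in held]
        w = fit_ridge([[X[i][c] for c in cols] for i in tr], [y[i] for i in tr])
        for i, p in zip(te, predict(w, [[X[i][c] for c in cols] for i in te])):
            preds[i] = p
    return pearson(preds, y)


def forward_select(X, y, groups):
    """Greedily add the feature that most improves grouped-CV PCC; stop when none does."""
    n_feat = len(X[0])
    selected, best = [], float("-inf")
    while len(selected) < n_feat:
        scores = {c: grouped_cv_pcc(X, y, groups, selected + [c])
                  for c in range(n_feat) if c not in selected}
        c, s = max(scores.items(), key=lambda kv: kv[1])
        if s <= best:
            break
        selected.append(c)
        best = s
    return selected, best


# Synthetic data: 10 assumed homologue clusters x 20 variants, 6 candidate features.
# Only features 0 and 3 carry signal; the target plays the role of a stability change.
random.seed(0)
groups = [g for g in range(10) for _ in range(20)]
X = [[random.gauss(0, 1) for _ in range(6)] for _ in groups]
y = [2.0 * r[0] - 1.5 * r[3] + random.gauss(0, 0.3) for r in X]

selected, best = forward_select(X, y, groups)
print("selected features:", selected, "grouped-CV PCC: %.3f" % best)
```

Holding out whole clusters rather than individual variants is what keeps the cross-validated PCC from being inflated by near-identical sequences appearing on both sides of a split.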
Keyphrases
  • protein protein
  • amino acid
  • binding protein
  • machine learning
  • healthcare
  • small molecule
  • dna methylation
  • mass spectrometry
  • signaling pathway
  • diffusion weighted imaging