Login / Signup

Evaluation of sequence-based predictors for phase-separating protein.

Shaofeng LiaoYujun ZhangYifei QiZhuqing Zhang
Published in: Briefings in bioinformatics (2023)
Liquid-liquid phase separation (LLPS) of proteins and nucleic acids underlies the formation of biomolecular condensates in cell. Dysregulation of protein LLPS is closely implicated in a range of intractable diseases. A variety of tools for predicting phase-separating proteins (PSPs) have been developed with the increasing experimental data accumulated and several related databases released. Comparing their performance directly can be challenging due to they were built on different algorithms and datasets. In this study, we evaluate eleven available PSPs predictors using negative testing datasets, including folded proteins, the human proteome, and non-PSPs under near physiological conditions, based on our recently updated LLPSDB v2.0 database. Our results show that the new generation predictors FuzDrop, DeePhase and PSPredictor perform better on folded proteins as a negative test set, while LLPhyScore outperforms other tools on the human proteome. However, none of the predictors could accurately identify experimentally verified non-PSPs. Furthermore, the correlation between predicted scores and experimentally measured saturation concentrations of protein A1-LCD and its mutants suggests that, these predictors could not consistently predict the protein LLPS propensity rationally. Further investigation with more diverse sequences for training, as well as considering features such as refined sequence pattern characterization that comprehensively reflects molecular physiochemical interactions, may improve the performance of PSPs prediction.
Keyphrases
  • endothelial cells
  • amino acid
  • protein protein
  • binding protein
  • machine learning
  • stem cells
  • small molecule
  • induced pluripotent stem cells
  • mesenchymal stem cells
  • drug induced
  • electronic health record