Machine-Learning-Based Predictor of Human-Bacteria Protein-Protein Interactions by Incorporating Comprehensive Host-Network Properties.
Xianyi Lian, Shiping Yang, Hong Li, Chen Fu, Ziding Zhang. Published in: Journal of Proteome Research (2019)
The large-scale identification of protein-protein interactions (PPIs) between humans and bacteria remains a crucial step in systematically understanding the underlying molecular mechanisms of bacterial infection. Computational prediction approaches are playing an increasingly important role in accelerating the identification of PPIs. Here, we developed a new machine-learning-based predictor of human-Yersinia pestis PPIs. First, three conventional sequence-based encoding schemes and two host network-property-related encoding schemes (i.e., NetTP and NetSS) were introduced. Motivated by previous human-pathogen PPI network analyses, we designed NetTP to systematically characterize the host proteins' network topology properties and designed NetSS to reflect the molecular mimicry strategy used by pathogen proteins. Subsequently, individual predictive models for each encoding scheme were inferred by Random Forest. Finally, through the noisy-OR algorithm, the five individual models were integrated into a final powerful model with an AUC value of 0.922 in the 5-fold cross-validation. Stringent benchmark experiments further revealed that our model could achieve a better performance than two state-of-the-art human-bacteria PPI predictors. In addition to the selection of a suitable computational framework, the success of our proposed approach could be largely attributed to the introduction of two comprehensive host network-property-related feature sets. To facilitate the community, a web server implementing our proposed method has been made freely accessible at http://systbio.cau.edu.cn/intersppiv2/ or http://zzdlab.com/intersppiv2/.
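The integration step can be illustrated with the textbook noisy-OR rule, which combines per-model probabilities p_i as 1 - prod(1 - p_i). The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say whether the individual model outputs were weighted or rescaled before integration, and the function name `noisy_or` and the five example scores are hypothetical.

```python
from math import prod


def noisy_or(probabilities):
    """Combine per-model interaction probabilities with the unweighted noisy-OR rule.

    Assumption: each input is the probability, from one encoding-specific model,
    that a given human-bacteria protein pair interacts.
    """
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # The pair is scored as non-interacting only if every model "misses" it,
    # so the combined score is one minus the product of the miss probabilities.
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical scores for one protein pair from five individual models
# (three sequence-based encodings plus NetTP and NetSS).
example_scores = [0.6, 0.3, 0.5, 0.2, 0.4]
combined_score = noisy_or(example_scores)
print(f"combined score: {combined_score:.4f}")
```

Under this rule a single confident model is enough to push the combined score high, which is one plausible reason to favor it when the individual encodings capture complementary signals.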