Signal-3L 2.0: A Hierarchical Mixture Model for Enhancing Protein Signal Peptide Prediction by Incorporating Residue-Domain Cross-Level Features.

Yi-Ze Zhang, Hong-Bin Shen
Published in: Journal of Chemical Information and Modeling (2017)
Signal peptides play key roles in the targeting and translocation of integral membrane proteins and secretory proteins. However, signal peptides present several challenges for automatic prediction methods. One challenge is that it is difficult to discriminate signal peptides from transmembrane helices, as both the H-region of the peptides and the transmembrane helices are hydrophobic. Another is that it is difficult to identify the cleavage site between signal peptides and mature proteins, as cleavage motifs or patterns are still unclear for most proteins. To solve these problems and further enhance automatic signal peptide recognition, we report a new predictor, Signal-3L 2.0. The new model is constructed with a hierarchical protocol in which it first determines whether a signal peptide exists. For this step, we propose a new residue-domain cross-level feature-driven approach, and we demonstrate that protein functional domain information is particularly useful for discriminating between transmembrane helices and signal peptides, because the two perform different functions. Next, in order to accurately identify the unique signal peptide cleavage site along the sequence, we designed a top-down approach in which a subset of potential cleavage sites is screened using statistical learning rules, and then a final unique site is selected according to its evolutionary conservation score. Because this mixed approach utilizes both statistical learning and evolutionary analysis, it shows a strong capacity for recognizing cleavage sites. Signal-3L 2.0 has been benchmarked on multiple data sets, and the experimental results have demonstrated its accuracy. The online server is available at www.csbio.sjtu.edu.cn/bioinf/Signal-3L/.
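The two-stage cleavage-site step described in the abstract (screen a subset of candidate sites, then keep the single most conserved one) can be illustrated with a minimal Python sketch. This is not the Signal-3L 2.0 implementation. The function names, the length window, and the simple "small residue at positions -1 and -3" screening rule are assumptions standing in for the learned statistical rules, and the per-residue conservation scores are assumed to be supplied by some external alignment-based tool.

```python
from typing import List, Optional, Sequence

# Assumption: a crude stand-in for the statistical screening rules,
# requiring a small residue at positions -1 and -3 of the cleavage site.
SMALL_RESIDUES = set("AGSTC")


def screen_candidates(seq: str, min_len: int = 15, max_len: int = 40) -> List[int]:
    """Return candidate signal peptide lengths n (cleavage between n and n+1)."""
    candidates = []
    start = max(min_len, 3)               # position -3 must exist
    stop = min(max_len, len(seq) - 1)     # leave at least one mature residue
    for n in range(start, stop + 1):
        if seq[n - 1] in SMALL_RESIDUES and seq[n - 3] in SMALL_RESIDUES:
            candidates.append(n)
    return candidates


def pick_site(candidates: Sequence[int], conservation: Sequence[float]) -> Optional[int]:
    """Keep the candidate whose -1 residue has the highest conservation score."""
    if not candidates:
        return None
    return max(candidates, key=lambda n: conservation[n - 1])


def predict_cleavage(seq: str, conservation: Sequence[float],
                     has_signal_peptide: bool,
                     min_len: int = 15, max_len: int = 40) -> Optional[int]:
    """Hierarchical flow: existence decision first, then site selection.

    `has_signal_peptide` is assumed to come from a separate first-stage
    classifier, which is not modeled here.
    """
    if not has_signal_peptide:
        return None
    return pick_site(screen_candidates(seq, min_len, max_len), conservation)


if __name__ == "__main__":
    toy_seq = "MLLLLASASADEKK"            # toy sequence, not real data
    toy_scores = [0.1] * len(toy_seq)     # hypothetical conservation scores
    toy_scores[8] = 0.9
    print(predict_cleavage(toy_seq, toy_scores, True, min_len=5, max_len=10))
```

In this sketch, screening keeps recall high by admitting several plausible sites, and the conservation score acts only as a tie-breaker among them, which mirrors the top-down order described in the abstract.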
Keyphrases
  • amino acid
  • randomized controlled trial
  • machine learning
  • transcription factor
  • dna binding
  • mental health
  • drug delivery
  • climate change
  • neural network
  • data analysis