Core column prediction for protein multiple sequence alignments.

Dan DeBlasio, John Kececioglu
Published in: Algorithms for Molecular Biology: AMB (2017)
We develop for the first time a predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and hence better estimate the alignment's accuracy. Our approach to predicting coreness is similar to nearest-neighbor classification from machine learning, except we transform nearest-neighbor distances into a coreness prediction via a regression function, and we learn an appropriate distance function through a new optimization formulation that solves a large-scale linear programming problem. We apply our coreness predictor to parameter advising, the task of choosing parameter values for an aligner's scoring function to obtain a more accurate alignment of a specific set of sequences. We show that for this task, our predictor strongly outperforms other column-confidence estimators from the literature, and affords a substantial boost in alignment accuracy.
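The nearest-neighbor idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature vectors, the weighted L1 distance (standing in for the distance function the authors learn by linear programming), the exponential-decay regression function, and the names `weighted_distance`, `nearest_core_distance`, and `predict_coreness`.

```python
import math

def weighted_distance(x, y, weights):
    """Weighted L1 distance between two column feature vectors.

    The weights are hypothetical stand-ins for a learned distance function.
    """
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))

def nearest_core_distance(column, core_examples, weights):
    """Distance from a column to its nearest known core training example."""
    return min(weighted_distance(column, e, weights) for e in core_examples)

def predict_coreness(column, core_examples, weights, scale=1.0):
    """Map the nearest-neighbor distance to a coreness value in (0, 1].

    Exponential decay is an illustrative regression function only:
    distance 0 gives coreness 1, and larger distances give lower coreness.
    """
    d = nearest_core_distance(column, core_examples, weights)
    return math.exp(-d / scale)

# Hypothetical two-feature training examples and weights.
core_examples = [[1.0, 0.0], [0.5, 0.5]]
weights = [1.0, 2.0]

exact_match = predict_coreness([1.0, 0.0], core_examples, weights)
farther_column = predict_coreness([0.0, 0.0], core_examples, weights)
```

In this sketch a column identical to a core example receives the maximum value, and the prediction falls off smoothly as the column moves away from every core example. Averaging such per-column predictions over an alignment would give one possible accuracy estimate for comparing alignments produced under different parameter choices.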
Keyphrases
  • machine learning
  • liquid chromatography
  • deep learning
  • mass spectrometry
  • protein protein
  • high resolution
  • magnetic resonance imaging
  • small molecule
  • contrast enhanced