UniAligner: a parameter-free framework for fast sequence alignment.

Andrey V Bzikadze, Pavel A Pevzner
Published in: Nature Methods (2023)
Even though recent advances in 'complete genomics' have revealed previously inaccessible genomic regions, the analysis of variation in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge, since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, classical alignment approaches, such as the Smith-Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner, a parameter-free sequence alignment algorithm with sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. UniAligner prioritizes matches of rare substrings, which are more likely to be relevant to the evolutionary relationship between two sequences. We apply UniAligner to estimate the mutation rates in human centromeres and to quantify the extremely high rate of large duplications and deletions in centromeres. This high rate suggests that centromeres may represent some of the most rapidly evolving regions of the human genome with respect to their structural organization.
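The idea of prioritizing rare substrings can be illustrated with a small sketch. This is not the UniAligner implementation and does not reproduce its scoring. It is a toy example under stated assumptions: fixed-length k-mers stand in for "substrings", the weight 1 / (count in a × count in b) is an illustrative rarity score, and the function names (kmer_positions, weighted_matches, best_chain) are hypothetical. Note also that UniAligner is described as parameter-free, whereas this sketch takes k as an input for simplicity.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each k-mer of seq to the list of positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def weighted_matches(a, b, k):
    """Return sorted (i, j, weight) tuples for every k-mer shared by a and b.

    Assumed rarity score: a k-mer occurring ca times in a and cb times in b
    gets weight 1 / (ca * cb), so unique k-mers weigh 1.0 and repeats less.
    """
    pos_a, pos_b = kmer_positions(a, k), kmer_positions(b, k)
    matches = []
    for kmer in pos_a.keys() & pos_b.keys():
        weight = 1.0 / (len(pos_a[kmer]) * len(pos_b[kmer]))
        for i in pos_a[kmer]:
            for j in pos_b[kmer]:
                matches.append((i, j, weight))
    return sorted(matches)


def best_chain(matches):
    """Pick a co-linear chain of matches with maximum total weight.

    Simple O(n^2) dynamic programming over matches sorted by (i, j);
    a match may follow another only if both coordinates strictly increase.
    """
    if not matches:
        return 0.0, []
    best = [w for _, _, w in matches]   # best chain weight ending at each match
    prev = [-1] * len(matches)          # back-pointers for chain recovery
    for t, (i, j, w) in enumerate(matches):
        for s in range(t):
            pi, pj, _ = matches[s]
            if pi < i and pj < j and best[s] + w > best[t]:
                best[t] = best[s] + w
                prev[t] = s
    t = max(range(len(matches)), key=best.__getitem__)
    score = best[t]
    chain = []
    while t != -1:
        chain.append(matches[t][:2])
        t = prev[t]
    return score, chain[::-1]


if __name__ == "__main__":
    matches = weighted_matches("AAAT", "AAAT", 2)
    print(best_chain(matches))
```

In this toy input the repeated k-mer "AA" receives weight 0.25 per match, while the unique k-mer "AT" receives weight 1.0, so the chain is anchored mainly by the rare match. A fixed scoring scheme such as the one used in classical Smith-Waterman alignment would treat both kinds of match identically.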