Login / Signup

Parameterized syncmer schemes improve long-read mapping.

Abhinav DuttaDavid PellowRon Shamir
Published in: PLoS computational biology (2022)
We introduce parameterized syncmer schemes (PSS), a generalization of syncmers, and provide a theoretical analysis for multi-parameter schemes. By combining PSS with downsampling or minimizers we can achieve any desired compression and window guarantee. We implemented the use of PSS in the popular minimap2 and Winnowmap2 mappers. In tests on simulated and real long-read data from a variety of genomes, the PSS-based algorithms, with scheme parameters selected on the basis of our theoretical analysis, reduced unmapped reads by 20-60% at high compression while usually using less memory. The advantage was more pronounced at low sequence identity. At sequence identity of 75% and medium compression, PSS-minimap had only 37% as many unmapped reads, and 8% fewer of the reads that did map were incorrectly mapped. Even at lower compression and error rates, PSS-based mapping mapped more reads than the original minimizer-based mappers as well as mappers using the original syncmer schemes. We conclude that using PSS can improve mapping of long reads in a wide range of settings.
Keyphrases
  • high resolution
  • high density
  • machine learning
  • working memory
  • electronic health record
  • deep learning
  • big data
  • amino acid
  • data analysis
  • artificial intelligence