SPUMONI 2: improved classification using a pangenome index of minimizer digests.

Omar Y Ahmed, Massimiliano Rossi, Travis Gagie, Christina Boucher, Ben Langmead
Published in: Genome Biology (2023)
Genomics analyses use large reference sequence collections, like pangenomes or taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both short and long reads. It performs multi-class classification using a novel sampled document array. By incorporating minimizers, SPUMONI 2's index is 65 times smaller than minimap2's for a mock community pangenome. SPUMONI 2 achieves a speed improvement of 3-fold compared to SPUMONI and 15-fold compared to minimap2. We show SPUMONI 2 achieves an advantageous mix of accuracy and efficiency in practical scenarios such as adaptive sampling, contamination detection and multi-class metagenomics classification.
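To illustrate the general idea behind a minimizer digest, here is a minimal Python sketch. It is not SPUMONI 2's implementation: the function name `minimizer_digest`, the parameters `k` and `w`, the lexicographic ordering of k-mers, and the leftmost tie-breaking rule are all assumptions chosen for illustration. The sketch slides a window of `w` bases over the sequence, picks the smallest k-mer in each window, and records it only when the selected position changes, so the digest is typically much shorter than the input.

```python
def minimizer_digest(seq: str, k: int, w: int) -> list[str]:
    """Return the minimizers selected from each length-w window of seq.

    Assumptions (illustrative only): k-mers are ordered lexicographically,
    ties go to the leftmost k-mer, and a minimizer is emitted once per
    distinct position rather than once per window.
    """
    if not 0 < k <= w:
        raise ValueError("require 0 < k <= w")

    digest: list[str] = []
    last_pos = -1  # absolute position of the most recently emitted minimizer

    for start in range(len(seq) - w + 1):
        # Enumerate every k-mer in the current window with its absolute position.
        candidates = (
            (seq[start + i:start + i + k], start + i)
            for i in range(w - k + 1)
        )
        # Smallest k-mer wins; the leftmost position breaks ties.
        kmer, pos = min(candidates)
        if pos != last_pos:
            digest.append(kmer)
            last_pos = pos

    return digest


if __name__ == "__main__":
    print(minimizer_digest("ACGTACGTACGT", k=2, w=4))
```

Indexing the concatenated digest instead of the full sequence is one way a minimizer-based scheme can shrink an index; the actual size and speed trade-offs depend on the choice of `k`, `w`, and the k-mer ordering.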