Lightweight taxonomic profiling of long-read metagenomic datasets with Lemur and Magnet.

Nicolae Sapoval, Yunxi Liu, Kristen D Curry, Bryce Kille, Wenyu Huang, Natalie Kokroko, Michael G Nute, Alona Tyshaieva, Alexander T Dilthey, Erin K Molloy, Todd J Treangen
Published in: bioRxiv : the preprint server for biology (2024)
The advent of long-read sequencing of microbiomes necessitates the development of new taxonomic profilers tailored to long-read shotgun metagenomic datasets. Here, we introduce Lemur and Magnet, a pair of tools optimized for lightweight and accurate taxonomic profiling of long-read shotgun metagenomic datasets. Lemur is a marker-gene-based method that leverages an expectation-maximization (EM) algorithm to reduce false positive calls while preserving true positives; Magnet is a whole-genome, read-mapping-based method that provides detailed presence and absence calls for bacterial genomes. We demonstrate that Lemur and Magnet can run in minutes to hours on a laptop with 32 GB of RAM, even for large inputs, a crucial feature given the portability of long-read sequencing machines. Furthermore, the marker gene database used by Lemur is only 4 GB and contains information from over 300,000 RefSeq genomes. Lemur and Magnet are open-source and available at https://github.com/treangenlab/lemur and https://github.com/treangenlab/magnet.
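To illustrate the general idea of using EM to suppress false positive taxa, the following is a minimal, generic sketch of EM-based abundance estimation from ambiguous read assignments. It is not Lemur's implementation: the function names `em_abundance` and `prune`, the toy likelihood matrix, and the pruning threshold are all hypothetical and chosen only for illustration. Each row holds assumed per-taxon likelihoods for one read, and the loop alternates between fractionally assigning reads to taxa and re-estimating taxon proportions.

```python
def em_abundance(likelihoods, n_iter=200, tol=1e-9):
    """Estimate taxon proportions from per-read likelihoods.

    likelihoods[r][t] is an assumed P(read r | taxon t); hypothetical input,
    not the scoring used by any particular tool.
    """
    n_taxa = len(likelihoods[0])
    theta = [1.0 / n_taxa] * n_taxa  # start from a uniform abundance profile
    for _ in range(n_iter):
        counts = [0.0] * n_taxa
        # E-step: split each read across taxa in proportion to theta * likelihood
        for row in likelihoods:
            weights = [theta[t] * row[t] for t in range(n_taxa)]
            total = sum(weights)
            if total == 0.0:
                continue  # read is incompatible with every taxon; skip it
            for t in range(n_taxa):
                counts[t] += weights[t] / total
        assigned = sum(counts)
        if assigned == 0.0:
            break
        # M-step: new proportions are the normalized fractional read counts
        new_theta = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


def prune(theta, threshold=1e-4):
    """Zero out taxa below a (hypothetical) abundance threshold and renormalize."""
    kept = [p if p >= threshold else 0.0 for p in theta]
    total = sum(kept)
    return [p / total for p in kept] if total > 0.0 else kept


# Toy example: taxon B is supported only by reads that also match taxon A.
reads = [[1.0, 0.0, 0.0]] * 6 + [[0.5, 0.5, 0.0]] * 2 + [[0.0, 0.0, 1.0]] * 2
theta = em_abundance(reads)
profile = prune(theta)
print(profile)
```

In this toy input, taxon B has no uniquely supporting reads, so successive iterations shift the ambiguous reads toward taxon A and B's estimated abundance shrinks toward zero, at which point the pruning step drops it. This is the sense in which an EM step can remove false positive calls while retaining taxa with unique support.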
Keyphrases
  • single molecule
  • single cell
  • rna seq
  • machine learning
  • high resolution
  • copy number
  • antibiotic resistance genes
  • deep learning
  • genome wide
  • gene expression
  • mass spectrometry
  • genome wide identification