Lightweight taxonomic profiling of long-read sequenced metagenomes with Lemur and Magnet.

Nicolae Sapoval, Yunxi Liu, Kristen D Curry, Bryce Kille, Wenyu Huang, Natalie Kokroko, Michael G Nute, Alona Tyshaieva, Alexander T Dilthey, Erin K Molloy, Todd J Treangen
Published in: bioRxiv : the preprint server for biology (2024)
Taxonomic profiling is a ubiquitous task in the analysis of clinical and environmental microbiomes. The advent of long-read sequencing of microbiomes necessitates the development of new taxonomic profilers tailored to long-read shotgun metagenomic datasets. Here, we introduce Lemur and Magnet, a pair of tools optimized for lightweight and accurate taxonomic profiling from long-read shotgun metagenomic datasets. Lemur is a marker-gene-based method that leverages an expectation-maximization (EM) algorithm to reduce false positive calls while preserving true positives; Magnet makes detailed presence/absence calls for bacterial genomes based on whole-genome read mapping. The tools work in sequence: Lemur estimates abundances conservatively, and Magnet operates on the genomes of identified organisms to filter out likely false positive taxa. The result is an increase in precision of as much as 70%, which far exceeds competing methods. Because it operates only on marker genes, Lemur is comparatively lightweight. We demonstrate that it can run in minutes to hours on a laptop with 32 GB of RAM, even for large inputs, a crucial feature given the portability of long-read sequencing machines. Furthermore, the marker gene database used by Lemur is only 4 GB and contains information from over 300,000 RefSeq genomes. The reference is available at https://zenodo.org/records/10802546, and the software is open-source and available at https://github.com/treangenlab/lemur.
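
To illustrate how an EM algorithm can suppress false positive taxa, the sketch below runs a generic mixture-model EM over a read-by-taxon likelihood matrix. It is not Lemur's implementation: the function name `em_abundance`, the input layout, and the toy likelihood values are all assumptions made for illustration. The idea it shows is that reads mapping ambiguously to several taxa are gradually reassigned toward taxa that also have unambiguous support.

```python
import numpy as np

def em_abundance(likelihoods, n_iter=200, tol=1e-9):
    """Generic EM for mixture proportions (hypothetical helper, not Lemur's code).

    likelihoods: (n_reads, n_taxa) array, entry [r, t] ~ P(read r | taxon t).
    Returns an array of estimated relative abundances that sums to 1.
    """
    L = np.asarray(likelihoods, dtype=float)
    n_taxa = L.shape[1]
    theta = np.full(n_taxa, 1.0 / n_taxa)  # start from a uniform profile

    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each taxon.
        weighted = L * theta
        row_sums = weighted.sum(axis=1)
        valid = row_sums > 0  # ignore reads with no support under current theta
        if not valid.any():
            raise ValueError("no read has non-zero likelihood for any taxon")
        resp = weighted[valid] / row_sums[valid, None]

        # M-step: abundance is the mean responsibility across usable reads.
        new_theta = resp.mean(axis=0)

        converged = np.abs(new_theta - theta).sum() < tol
        theta = new_theta
        if converged:
            break
    return theta

# Toy input (assumed values): 8 reads hit only taxon 0, 2 reads hit
# taxa 0 and 1 equally well, and no read supports taxon 2.
toy = np.array([[1.0, 0.0, 0.0]] * 8 + [[1.0, 1.0, 0.0]] * 2)
profile = em_abundance(toy)
```

In this toy case the two ambiguous reads end up attributed almost entirely to taxon 0, so taxon 1, which has no unique evidence, receives an abundance near zero and could then be dropped by a threshold.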
Keyphrases
  • single molecule
  • single cell
  • rna seq
  • machine learning
  • genome wide
  • high resolution
  • emergency department
  • gene expression
  • healthcare
  • dna methylation
  • data analysis
  • genome wide analysis
  • wastewater treatment
  • adverse drug