Lightweight taxonomic profiling of long-read metagenomic datasets with Lemur and Magnet.
Nicolae Sapoval, Yunxi Liu, Kristen D Curry, Bryce Kille, Wenyu Huang, Natalie Kokroko, Michael G Nute, Alona Tyshaieva, Alexander T Dilthey, Erin K Molloy, Todd J Treangen
Published in: bioRxiv: the preprint server for biology (2024)
The advent of long-read sequencing of microbiomes necessitates the development of new taxonomic profilers tailored to long-read shotgun metagenomic datasets. Here, we introduce Lemur and Magnet, a pair of tools optimized for lightweight and accurate taxonomic profiling of long-read shotgun metagenomic datasets. Lemur is a marker-gene-based method that uses an expectation-maximization (EM) algorithm to reduce false-positive calls while preserving true positives; Magnet is a whole-genome, read-mapping-based method that provides detailed presence and absence calls for bacterial genomes. We demonstrate that Lemur and Magnet can run in minutes to hours on a laptop with 32 GB of RAM, even for large inputs, a crucial feature given the portability of long-read sequencing machines. Furthermore, the marker gene database used by Lemur is only 4 GB and contains information from over 300,000 RefSeq genomes. Lemur and Magnet are open-source and available at https://github.com/treangenlab/lemur and https://github.com/treangenlab/magnet.
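
To illustrate how an EM algorithm can suppress false-positive taxa, here is a minimal, generic sketch of EM for estimating relative abundances from per-read likelihoods. It is not Lemur's actual implementation: the function name `em_abundances`, the input layout (one dict per read mapping a taxon label to P(read | taxon)), and the toy data are all assumptions made for illustration.

```python
def em_abundances(likelihoods, n_iter=100, tol=1e-9):
    """Estimate relative abundances with a basic EM loop.

    likelihoods: list of dicts, one per read, mapping a taxon label to
    P(read | taxon). This input layout is a hypothetical choice for the
    sketch, not a format used by any particular tool.
    """
    taxa = sorted({t for read in likelihoods for t in read})
    if not taxa:
        raise ValueError("no candidate taxa in input")
    # Start from a uniform prior over all candidate taxa.
    pi = {t: 1.0 / len(taxa) for t in taxa}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in taxa}
        # E-step: split each read fractionally across its candidate taxa.
        for read in likelihoods:
            weights = {t: pi[t] * p for t, p in read.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read carries no information; skip it
            for t, w in weights.items():
                counts[t] += w / total
        n = sum(counts.values())
        if n == 0.0:
            break  # nothing to update from
        # M-step: new abundances are the normalized fractional read counts.
        new_pi = {t: counts[t] / n for t in taxa}
        delta = max(abs(new_pi[t] - pi[t]) for t in taxa)
        pi = new_pi
        if delta < tol:
            break
    return pi


# Toy data: 8 reads match only taxon "A"; 2 reads match "A" and "B" equally.
reads = [{"A": 1.0}] * 8 + [{"A": 0.5, "B": 0.5}] * 2
profile = em_abundances(reads)
```

In this toy input, taxon "B" is supported only by ambiguous reads, so each iteration reassigns more of those reads to "A" and the estimated share of "B" shrinks toward zero. A profiler could then drop taxa whose estimated abundance falls below a chosen threshold, which is one way an EM step can remove spurious calls while keeping well-supported ones.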