How clear is our current view on microbial dark matter? (Re-)assessing public MAG & SAG datasets with MDMcleaner.
John Vollmers, Sandra Wiegand, Florian Lenk, Anne-Kristin Kaster
Published in: Nucleic Acids Research (2022)
As of today, the majority of environmental microorganisms remain uncultured and are therefore referred to as 'microbial dark matter' (MDM). Hence, genomic insights into these organisms are limited to cultivation-independent approaches such as single-cell genomics and metagenomics. However, without access to cultured representatives for verifying taxon assignments, MDM genomes may lead to misleading conclusions based on misclassified or contaminant contigs, thereby obscuring our view of the uncultured microbial majority. Moreover, gradual contamination of databases by past genome submissions can propagate errors that affect present as well as future comparative genome analyses. Consequently, strict contamination detection and filtering need to be applied, especially in the case of uncultured MDM genomes. Current genome reporting standards, however, emphasize completeness over purity, and the de facto gold-standard genome assessment tool, CheckM, discriminates against uncultured taxa and fragmented genomes. To tackle these issues, we present a novel contig classification, screening, and filtering workflow and a corresponding open-source Python implementation called MDMcleaner, which was tested and compared to other tools on mock and real datasets. MDMcleaner revealed substantial contamination overlooked by current screening approaches and sensitively detected misattributed contigs in both novel genomes and the underlying reference databases, thereby greatly improving our view of 'microbial dark matter'.