Database size positively correlates with the loss of species-level taxonomic resolution for the 16S rRNA and other prokaryotic marker genes.
Seth Commichaux, Tu Luan, Harihara Subrahmaniam Muralidharan, Mihai Pop
Published in: bioRxiv: the preprint server for biology (2023)
The use of reference databases for assigning taxonomic labels to genomic and metagenomic sequences is a fundamental bioinformatic task in the characterization of microbial communities. The increasing accessibility of high-throughput sequencing has led to a rapid increase in the size of reference databases and in the number of sequences they contain. This growth has improved our understanding of global microbial genetic diversity. However, there is evidence that as microbial diversity is more densely sampled, increasingly long genomic segments are needed to differentiate between distinct species. The scientific community needs to be aware of this issue and to develop methods that better account for it when assigning taxonomic labels to metagenomic sequences from microbial communities.
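The effect described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' method: the function name `species_specific_fraction`, the species labels, and the short sequences are all hypothetical, invented for illustration. It measures what fraction of distinct length-k segments in a small labeled database occur in only one species, before and after a close relative is added.

```python
from collections import defaultdict

def species_specific_fraction(db, k):
    """Fraction of distinct k-mers in `db` that occur in exactly one species.

    `db` maps a species label to a list of sequences (toy data, hypothetical).
    """
    owners = defaultdict(set)  # k-mer -> set of species containing it
    for species, seqs in db.items():
        for seq in seqs:
            for i in range(len(seq) - k + 1):
                owners[seq[i:i + k]].add(species)
    if not owners:
        return 0.0
    unique = sum(1 for labels in owners.values() if len(labels) == 1)
    return unique / len(owners)

# Hypothetical toy database with two clearly distinct "species".
small_db = {
    "sp_A": ["ACGTACGTAA"],
    "sp_B": ["TTGCATGCAT"],
}

# The same database after a close relative of sp_A is sampled:
# sp_C differs from sp_A only at the final base.
large_db = dict(small_db, sp_C=["ACGTACGTAC"])

for k in (4, 10):
    print(k, species_specific_fraction(small_db, k),
          species_specific_fraction(large_db, k))
```

In this toy case every 4-mer is species-specific in the two-species database, but once the close relative is added only 6 of the 10 distinct 4-mers remain so. Full-length 10-mers still separate all three species, which mirrors the claim that denser sampling requires longer segments for species-level resolution.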
Keyphrases
- genetic diversity
- high throughput sequencing
- microbial community
- antibiotic resistance genes
- copy number
- big data
- healthcare
- mental health
- genome wide
- single molecule
- emergency department
- machine learning
- gene expression
- artificial intelligence
- dna methylation
- deep learning
- loop mediated isothermal amplification
- genome wide analysis
- electronic health record
- anaerobic digestion