BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities.

Brendan A Daisley, Gregor Reid
Published in: mSystems (2021)
High-throughput 16S rRNA gene sequencing technologies have robust potential to improve our understanding of bee (Hymenoptera: Apoidea)-associated microbial communities and their impact on hive health and disease. Although recent computational algorithms now permit exact inference of high-resolution amplicon sequence variants (ASVs), the taxonomic classification of these ASVs remains a challenge due to inadequate reference databases. To address this, we assemble a comprehensive data set of all publicly available bee-associated 16S rRNA gene sequences, systematically annotate poorly resolved identities via inclusion of 618 placeholder labels for uncultivated microbial dark matter, and correct for phylogenetic inconsistencies using a complementary set of distance-based and maximum likelihood correction strategies. To benchmark the resultant database (BEExact), we compare performance against all existing reference databases in silico using a variety of classifier algorithms to produce probabilistic confidence scores. We also validate realistic classification rates on an independent set of ∼234 million short-read sequences derived from 32 studies encompassing 50 different bee types (36 eusocial and 14 solitary). Species-level classification rates on short-read ASVs range from 80 to 90% using BEExact (with ∼20% due to "bxid" placeholder names), whereas only ∼30% at best can be resolved with current universal databases. A series of data-driven recommendations are developed for future studies.
We conclude that BEExact (https://github.com/bdaisley/BEExact) enables accurate and standardized microbiota profiling across a broad range of bee species. These two factors are of key importance to reproducibility and meaningful knowledge exchange within the scientific community and together can enhance the overall utility and ecological relevance of routine 16S rRNA gene-based sequencing endeavors.
IMPORTANCE The failure of current universal taxonomic databases to support the rapidly expanding field of bee microbiota research has led many investigators to rely on "in-house" reference sets or manual classification of sequence reads (usually based on BLAST searches), often with vague identity thresholds and subjective taxonomy choices. This time-consuming, error- and bias-prone process lacks standardization, cripples the potential for comparative cross-study analysis, and in many cases is likely to incorrectly sway study conclusions. BEExact builds on several complementary bioinformatic techniques to enable refined inference of bee host-associated microbial communities without requiring any other methodological modifications. It also bridges the gap between current practical outcomes (i.e., phylotype-to-genus level constraints with 97% operational taxonomic units [OTUs]) and the theoretical resolution (i.e., species-to-strain level classification with 100% ASVs) attainable in future microbiota investigations. Other niche habitats could also likely benefit from customized database curation via implementation of the novel approaches introduced in this study.
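The species-level classification rates quoted in the abstract (80 to 90% overall, with roughly 20% attributable to "bxid" placeholder names) can be illustrated with a minimal Python sketch. It is not the authors' benchmarking code. The function name `classification_rates`, the example labels, and the assumption that a placeholder can be recognized by the substring "bxid" in the species label are all hypothetical; the actual label format should be checked against the BEExact repository.

```python
from typing import Dict, Iterable, Optional


def classification_rates(
    species_labels: Iterable[Optional[str]],
    placeholder_tag: str = "bxid",  # assumption: placeholders contain this substring
) -> Dict[str, float]:
    """Return the fraction of ASVs with any species label, with a
    placeholder label, and with a formally named species label."""
    labels = list(species_labels)
    total = len(labels)
    if total == 0:
        raise ValueError("no ASVs supplied")

    # An ASV counts as classified if its species label is non-empty.
    classified = [label for label in labels if label]
    # Placeholder labels are a subset of the classified ASVs.
    placeholder = [l for l in classified if placeholder_tag in l.lower()]

    return {
        "classified": len(classified) / total,
        "placeholder": len(placeholder) / total,
        "named": (len(classified) - len(placeholder)) / total,
    }


# Hypothetical species assignments for five ASVs (None or "" = unresolved).
example = [
    "Lactobacillus apis",
    "Gilliamella bxid0123",  # made-up placeholder label for illustration
    None,
    "Snodgrassella alvi",
    "",
]
rates = classification_rates(example)
```

Reporting the placeholder fraction separately from the named fraction keeps the two sources of species-level resolution distinguishable, which mirrors how the abstract reports them.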