Species-level resolution for the vaginal microbiota with short amplicons.
Wei Qing, Yiya Shi, Rongdan Chen, Yin'ai Zou, Cancan Qi, Yingxuan Zhang, Zuyi Zhou, Shanshan Li, Yi Hou, Hong-Wei Zhou, Muxuan Chen
Published in: mSystems (2024)
Specific bacterial species have been found to play important roles in the human vagina, so achieving high species-level resolution is vital for analyzing vaginal microbiota data. However, methodological studies have reached contradictory conclusions, and a more comprehensive evaluation is needed to determine an optimal pipeline for vaginal microbiota. Using sequences of vaginal bacterial species downloaded from NCBI, we simulated amplification with various primer sets targeting different 16S regions and taxonomically classified the resulting amplicons with different combinations of algorithms (BLAST+, VSEARCH, and Sklearn) and reference databases (Greengenes2, SILVA, and RDP). Vaginal swabs were collected from participants with different vaginal microecology to construct mock communities sequenced over the full-length 16S gene. Both computational and experimental amplifications were performed on the mock samples, and the classification accuracy of each pipeline was determined. Microbial profiles were compared between the full-length and partial 16S sequencing samples. The optimal pipeline was further validated in a multicenter cohort against the PCR results for common STI pathogens. Pipeline V1-V3_Sklearn_Combined had the highest accuracy for classifying amplicons generated from both the NCBI-downloaded data (84.20% ± 2.39%) and the full-length sequencing data (95.65% ± 3.04%). Vaginal samples amplified and sequenced for the V1-V3 region, using only the forward reads (223 bp) and classified with the optimal pipeline, most closely resembled the mock communities. The pipeline demonstrated high F1-scores for detecting STI pathogens within the validation cohort.
We have determined an optimal pipeline to achieve high species-level resolution for vaginal microbiota with short amplicons, which will facilitate future studies.

IMPORTANCE
In vaginal microbiota studies, diverse 16S rRNA gene regions have been used for amplification and sequencing, which affects both the comparability between studies and the species-level resolution of taxonomic classification. We comprehensively evaluated the methods that influence the accuracy of taxonomic classification and established an optimal pipeline to achieve high species-level resolution for vaginal microbiota with short amplicons, which will facilitate future studies.
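
The simulated amplification and the F1-score validation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the primer sequences, the mismatch tolerance, or the software used for in silico PCR, so the function names (`simulate_amplicon`, `f1_score`), the toy primers, and the exact-match treatment of IUPAC degenerate bases are all assumptions made for illustration. The sketch also assumes that primers are trimmed from the amplicon and that a forward read is modeled by simply truncating the amplicon to a fixed length.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Translate a degenerate primer into a regex (exact match, no mismatches)."""
    return "".join(IUPAC[base] for base in primer)


def simulate_amplicon(template, fwd_primer, rev_primer, read_len=None):
    """Return the region between the primers (primers trimmed), or None.

    If read_len is given, the amplicon is truncated to that many bases to
    mimic using only the forward read (an assumption of this sketch).
    """
    fwd_hit = re.search(primer_regex(fwd_primer), template)
    if fwd_hit is None:
        return None
    downstream = template[fwd_hit.end():]
    # The reverse primer anneals to the opposite strand, so search the
    # template for its reverse complement.
    rev_hit = re.search(primer_regex(reverse_complement(rev_primer)), downstream)
    if rev_hit is None:
        return None
    amplicon = downstream[:rev_hit.start()]
    return amplicon[:read_len] if read_len else amplicon


def f1_score(predicted, truth):
    """F1-score of binary detections against a reference (e.g. PCR results)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data only: these are not real 16S primers or sequences.
toy_template = "GGACGATTTTCCCCCAAGCTTAA"
full = simulate_amplicon(toy_template, "ACGM", "AAGCTTG")
forward_only = simulate_amplicon(toy_template, "ACGM", "AAGCTTG", read_len=4)
score = f1_score([True, True, False, True], [True, False, False, True])
```

In practice, in silico PCR tools usually allow a small number of primer mismatches, so an exact-match search like this one would tend to underestimate how many reference sequences a primer set can amplify.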
Keyphrases
- double blind
- machine learning
- deep learning
- big data
- case control
- electronic health record
- single molecule
- endothelial cells
- single cell
- current status
- men who have sex with men
- microbial community
- gram negative
- cancer therapy
- transcription factor
- multidrug resistant
- induced pluripotent stem cells
- antimicrobial resistance