Rapid and accurate taxonomic classification of cpn60 amplicon sequence variants.
Qingyi Ren, Janet E. Hill. Published in: ISME Communications (2023)
The "universal target" region of the gene encoding the 60 kDa chaperonin protein (cpn60, also known as groEL or hsp60) is a proven sequence barcode for bacteria and a useful target for marker gene amplicon-based studies of complex microbial communities. To date, identification of cpn60 sequence variants from microbiome studies has been accomplished by alignment of queries to a reference database. Naïve Bayesian classifiers offer an alternative identification method that provides variable rank classification and shorter analysis times. We curated a set of cpn60 barcode sequences to train the RDP classifier and tested its performance on data from previous human microbiome studies. Results showed that sequences accounting for 79%, 86% and 92% of the observations (read counts) in saliva, vagina and infant stool microbiome data sets were classified to the species rank. We also trained the QIIME 2 q2-feature-classifier on cpn60 sequence data and demonstrated that it gives results consistent with the standalone RDP classifier. Successful implementation of a naïve Bayesian classifier for cpn60 sequences will facilitate future microbiome studies and open opportunities to integrate cpn60 amplicon sequence identification into existing analysis pipelines.
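The abstract contrasts alignment against a reference database with naïve Bayesian classification, which can stop at whatever taxonomic rank the evidence supports. The Python sketch below illustrates that idea with a simplified k-mer classifier. It is loosely modelled on the general approach of the RDP classifier. It is not the code used in the study, and it is not the RDP Classifier or q2-feature-classifier API. Everything in it is an assumption made for illustration: the class `KmerNaiveBayes`, the helper `words`, the toy sequences, and the parameter values. Those values are 8-mer words, 100 bootstrap trials that each draw one eighth of the query words, and a 0.8 support threshold.

```python
import math
import random
from collections import Counter, defaultdict


def words(seq, k=8):
    """Set of overlapping k-mers ("words") in a sequence (8-mers assumed by default)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class KmerNaiveBayes:
    """Hypothetical, simplified RDP-style naive Bayesian classifier over k-mer presence."""

    def __init__(self, k=8):
        self.k = k

    def fit(self, seqs, lineages):
        # Each lineage is a tuple of ranks, e.g. ("Bacillota", "Lactobacillus", "Lactobacillus iners").
        self.lineage = {}                        # leaf label -> full lineage (training order kept)
        self.word_counts = defaultdict(Counter)  # leaf label -> training sequences containing each word
        self.n_seqs = Counter()                  # leaf label -> number of training sequences
        seqs_with_word = Counter()               # word -> training sequences containing it
        for seq, lineage in zip(seqs, lineages):
            leaf, ws = lineage[-1], words(seq, self.k)
            self.lineage[leaf] = tuple(lineage)
            self.word_counts[leaf].update(ws)
            self.n_seqs[leaf] += 1
            seqs_with_word.update(ws)
        n_total = len(seqs)
        # Word-specific prior P_w = (n(w) + 0.5) / (N + 1); words unseen in training get 0.5 / (N + 1).
        self.prior = {w: (c + 0.5) / (n_total + 1) for w, c in seqs_with_word.items()}
        self.unseen_prior = 0.5 / (n_total + 1)
        return self

    def _log_score(self, leaf, query_words):
        # log P(query | taxon) = sum over words of log((m(w) + P_w) / (M + 1))
        m_total, counts = self.n_seqs[leaf], self.word_counts[leaf]
        return sum(
            math.log((counts[w] + self.prior.get(w, self.unseen_prior)) / (m_total + 1))
            for w in query_words
        )

    def _best_leaf(self, query_words):
        # Ties go to the first taxon in training order (max keeps the first maximum).
        return max(self.lineage, key=lambda leaf: self._log_score(leaf, query_words))

    def classify(self, seq, n_boot=100, threshold=0.8, seed=0):
        """Return ((rank_name, support), ...) truncated at the first rank below threshold."""
        qw = sorted(words(seq, self.k))
        if not qw:
            return ()
        best = self.lineage[self._best_leaf(qw)]
        rng = random.Random(seed)
        subset = max(1, len(qw) // 8)  # each trial draws 1/8 of the words, with replacement
        agree = [0] * len(best)
        for _ in range(n_boot):
            hit = self.lineage[self._best_leaf([rng.choice(qw) for _ in range(subset)])]
            for r in range(len(best)):
                if r >= len(hit) or hit[r] != best[r]:
                    break
                agree[r] += 1
        result = []
        for name, votes in zip(best, agree):
            if votes / n_boot < threshold:
                break
            result.append((name, votes / n_boot))
        return tuple(result)


# Toy references (NOT real cpn60 sequences); no 8-mer is shared between them.
TOY_SEQS = ["A" * 10 + "C" * 10, "G" * 10 + "T" * 10, "AC" * 10]
TOY_LINEAGES = [
    ("Bacillota", "Lactobacillus", "Lactobacillus crispatus"),
    ("Bacillota", "Lactobacillus", "Lactobacillus iners"),
    ("Actinomycetota", "Gardnerella", "Gardnerella vaginalis"),
]

if __name__ == "__main__":
    clf = KmerNaiveBayes().fit(TOY_SEQS, TOY_LINEAGES)
    print(clf.classify(TOY_SEQS[0]))                # exact match to a reference
    print(clf.classify(TOY_SEQS[0] + TOY_SEQS[1]))  # words split between two Lactobacillus references
```

On the toy data, the exact reference match keeps all three ranks at full support. For the concatenated query, genus support is 1.0 by construction, while species support is split between the two Lactobacillus references. Truncating the lineage at the deepest well-supported rank is what gives variable rank output. Rescoring every reference in every bootstrap trial keeps the sketch short; a practical tool would precompute per-taxon log-probability tables.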