Serovar-level identification of bacterial foodborne pathogens from full-length 16S rRNA gene sequencing.
Dmitry Grinevich, Lyndy Harden, Siddhartha Thakur, Benjamin J Callahan. Published in: mSystems (2024)
The resolution of variation within species is critical for interpreting and acting on many microbial measurements. In the key foodborne pathogens Salmonella and Escherichia coli, the primary subspecies classification scheme is serotyping: differentiating variants within these species by their surface antigen profiles. Where whole-genome sequencing (WGS) of isolates is available, serotype prediction from WGS is now seen as comparable or preferable to traditional laboratory methods. However, both laboratory and WGS methods depend on an isolation step that is time-consuming and incompletely represents the sample when multiple strains are present. Community sequencing approaches that skip the isolation step are, therefore, of interest for pathogen surveillance. Here, we evaluated the viability of amplicon sequencing of the full-length 16S rRNA gene for serotyping Salmonella enterica and E. coli. We developed a novel algorithm for serotype prediction, implemented as an R package (Seroplacer), which takes full-length 16S rRNA gene sequences as input and outputs serovar predictions after phylogenetic placement into a reference phylogeny. We achieved over 89% accuracy in predicting Salmonella serotypes on in silico test data and identified key pathogenic serovars of Salmonella and E. coli in isolate and environmental test samples. Although serotype prediction from 16S rRNA gene sequences is not as accurate as serotype prediction from WGS of isolates, the potential to identify dangerous serovars directly from amplicon sequencing of environmental samples is intriguing for pathogen surveillance.
The capabilities developed here are also broadly relevant to other applications where resolving intraspecies variation and sequencing directly from environmental samples could be valuable.

IMPORTANCE

To prevent and stop outbreaks of foodborne pathogens, it is important that we can detect when pathogenic bacteria are present in a food or food-associated site and identify connections between specific pathogenic bacteria present in different samples. In this work, we develop a new computational technology that allows the important foodborne pathogens Escherichia coli and Salmonella enterica to be serotyped (a subspecies-level classification) from sequencing of a single marker gene, the 16S rRNA gene, which is often used to surveil bacterial communities. Our results suggest current limitations to serotyping from 16S rRNA gene sequencing alone but set the stage for further progress, which we consider likely given the rapid advances in the long-read sequencing technologies and genomic databases that our work leverages. If this research direction succeeds, it could enable better detection of foodborne pathogens before they reach the public and speed the resolution of foodborne pathogen outbreaks.
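To make the idea of predicting a serovar from a 16S rRNA gene sequence more concrete, here is a minimal Python sketch. It is not Seroplacer's algorithm. Seroplacer performs phylogenetic placement into a reference phylogeny, whereas this sketch swaps in a much simpler nearest-reference rule: Hamming distance against pre-aligned, equal-length reference sequences. The function names, the toy sequences in `TOY_REFERENCES`, and the tie-sharing score are all hypothetical illustrations. The sketch returns every serovar tied at the minimum distance because closely related serovars can carry identical or nearly identical 16S sequences, which is one reason 16S-based prediction is less resolving than WGS.

```python
from collections import Counter

# Hypothetical toy reference set: (serovar label, aligned 16S fragment).
# Real references would be full-length, curated, and aligned; these are invented.
TOY_REFERENCES = [
    ("Typhimurium", "ACGTACGTAC"),
    ("Typhimurium", "ACGTACGTAA"),
    ("Enteritidis", "ACGTTCGTAC"),
    ("Heidelberg", "ACGAACGTAC"),
]


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def predict_serovar(query: str, references: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Nearest-reference stand-in for placement-based serovar prediction (hypothetical).

    Returns the serovars of all references at the minimum Hamming distance,
    each with the fraction of those tied references that carry it.
    """
    distances = [(hamming(query, seq), label) for label, seq in references]
    best = min(d for d, _ in distances)
    tied = Counter(label for d, label in distances if d == best)
    total = sum(tied.values())
    # Rank candidate serovars by how many equally close references carry them.
    return [(label, n / total) for label, n in tied.most_common()]


if __name__ == "__main__":
    print(predict_serovar("ACGTACGTAC", TOY_REFERENCES))  # -> [('Typhimurium', 1.0)]
    # This query is equally close to a Typhimurium and an Enteritidis reference,
    # so both serovars are reported with a 0.5 share each.
    print(predict_serovar("ACGTTCGTAA", TOY_REFERENCES))
```

A genuine placement method would instead locate the query on the branches of a reference tree under a substitution model, so that evolutionary distance rather than raw mismatch counts drives the assignment. The nearest-reference rule is used here only because it fits in a few self-contained lines.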