Annotated Whole-Genome Multilocus Sequence Typing Schema for Scalable High-Resolution Typing of Streptococcus pyogenes.
A Friães, R Mamede, M Ferreira, J Melo-Cristino, Mario Ramirez
Published in: Journal of Clinical Microbiology (2022)
Streptococcus pyogenes is a major human pathogen with high genetic diversity, largely created by recombination and horizontal gene transfer, making it difficult to use single nucleotide polymorphism (SNP)-based genome-wide analyses for surveillance. Using a gene-by-gene approach on 208 complete genomes of S. pyogenes, a novel whole-genome multilocus sequence typing (wgMLST) schema was developed, comprising 3,044 target loci. The schema was used for core-genome MLST (cgMLST) analyses of previously published data sets and 265 newly sequenced draft genomes with other molecular and phenotypic typing data. Clustering based on cgMLST data supported the genetic heterogeneity of many emm types and correlated poorly with pulsed-field gel electrophoresis macrorestriction profiling, superantigen gene profiling, and MLST sequence type, highlighting the limitations of older typing methods. While 763 loci were present in all isolates of a data set representative of S. pyogenes genetic diversity, the proposed schema allows scalable cgMLST analysis, which can include more loci for an increased resolution when typing closely related isolates. The cgMLST and PopPUNK clusters were broadly consistent in this diverse population. The cgMLST analyses presented results comparable to those of SNP-based methods in the identification of two recently emerged sublineages of emm 1 and emm 89 and the clarification of the genetic relatedness among isolates recovered in outbreak contexts. The schema was thoroughly annotated and made publicly available on the chewie-NS online platform (https://chewbbaca.online/species/1/schemas/1), providing a framework for high-resolution typing and analyzing the genetic variability of loci of particular biological interest.
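The idea of scalable cgMLST described above (a strict core of loci present in all isolates, extended with more loci for closely related isolates) can be illustrated with a minimal sketch. It assumes allelic profiles are held as plain dictionaries mapping locus name to allele number, with `None` for a missing locus. The function names, the toy isolates, and the presence thresholds are all hypothetical; this is not the chewBBACA or chewie-NS API and not necessarily the authors' procedure.

```python
from itertools import combinations

MISSING = None  # assumption: a locus that was not called in an isolate


def core_loci(profiles, threshold=1.0):
    """Return loci called in at least `threshold` fraction of isolates."""
    isolates = list(profiles)
    loci = sorted({locus for p in profiles.values() for locus in p})
    keep = []
    for locus in loci:
        present = sum(
            1 for iso in isolates if profiles[iso].get(locus) is not MISSING
        )
        if present / len(isolates) >= threshold:
            keep.append(locus)
    return keep


def allelic_distance(a, b, loci):
    """Count loci where both isolates have a call and the alleles differ."""
    return sum(
        1
        for locus in loci
        if a.get(locus) is not MISSING
        and b.get(locus) is not MISSING
        and a[locus] != b[locus]
    )


def distance_matrix(profiles, loci):
    """Pairwise allelic distances, keyed by sorted isolate-name pairs."""
    return {
        (x, y): allelic_distance(profiles[x], profiles[y], loci)
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical toy profiles (allele numbers are arbitrary).
profiles = {
    "iso1": {"locusA": 1, "locusB": 1, "locusC": 2, "locusD": 5},
    "iso2": {"locusA": 1, "locusB": 2, "locusC": 2, "locusD": 6},
    "iso3": {"locusA": 3, "locusB": 2, "locusC": 2, "locusD": MISSING},
}

strict = core_loci(profiles, threshold=1.0)   # loci present in every isolate
relaxed = core_loci(profiles, threshold=0.6)  # also admits locusD (2 of 3)

strict_dist = distance_matrix(profiles, strict)
relaxed_dist = distance_matrix(profiles, relaxed)
```

In this toy data, relaxing the presence threshold admits one more locus and increases the distance between `iso1` and `iso2`, which is the sense in which including more loci can raise resolution among closely related isolates. Missing calls are ignored in pairwise comparisons here; other treatments of missing data are possible and would change the distances.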
Keyphrases
- genetic diversity
- genome wide
- high resolution