Whole genome-based comparative analysis of the genus Streptomyces reveals many misclassifications.

Marieke Mispelaere, Anne-Sofie De Rop, Cedric Hermans, Sofie L De Maeseneire, Wim K Soetaert, Maarten Lieven De Mol, Paco Hulpiau
Published in: Applied microbiology and biotechnology (2024)
Streptomyces species are experts in the production of bioactive secondary metabolites; however, their taxonomy has fallen victim to the tremendous interest shown by the scientific community, as is evident from the numerous synonyms found in public repositories. Based on genomic data from NCBI Datasets and nomenclature from the LPSN database, we compiled a dataset of 600 Streptomyces species along with their annotations and metadata. To pinpoint the most suitable taxonomic classification method, we conducted a comprehensive assessment comparing multiple methodologies, including analysis of 16S rRNA, individual housekeeping genes, multilocus sequence analysis (MLSA), and Fast Average Nucleotide Identity (FastANI), on a subset of 409 species with complete data. Because 16S rRNA offered insufficient resolution and individual housekeeping genes gave inconsistent results, we performed a more in-depth analysis comparing only FastANI and MLSA, which expanded our dataset to 502 species. With FastANI validated as the preferred method, we conducted pairwise analysis on the entire dataset, identifying 59 non-unique species among the 600, and subsequently refined the dataset to 541 unique species. Additionally, we collected data on 724 uncharacterized Streptomyces strains to investigate the potential for unique species within the unannotated fraction of the Streptomyces genus. Using FastANI, 289 strains could be successfully classified into one of the 541 Streptomyces species.
KEY POINTS:
  • Evaluation of taxonomic classification methods for Streptomyces species.
  • Whole genome analysis, specifically FastANI, has been chosen as the preferred method.
  • Various reclassifications are proposed within the Streptomyces genus.
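The pairwise FastANI step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes FastANI's standard tab-separated output (query, reference, ANI, mapped fragments, total fragments) and a 95% ANI cutoff, a commonly used species boundary that the abstract itself does not state. The genome file names and the helper names `parse_fastani` and `group_by_ani` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumed species boundary; the abstract does not give the cutoff used.
ANI_THRESHOLD = 95.0


def parse_fastani(handle):
    """Yield (query, reference, ani) from FastANI-style tab-separated rows."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 3:
            yield row[0], row[1], float(row[2])


def group_by_ani(pairs, threshold=ANI_THRESHOLD):
    """Single-linkage grouping: genomes linked by any pair at or above threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, reference, ani in pairs:
        root_q, root_r = find(query), find(reference)
        if ani >= threshold and root_q != root_r:
            parent[root_r] = root_q  # merge the two groups

    groups = defaultdict(set)
    for genome in list(parent):
        groups[find(genome)].add(genome)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up rows for illustration only (not data from the study).
example = io.StringIO(
    "sp_A.fna\tsp_B.fna\t98.7\t1500\t1600\n"
    "sp_B.fna\tsp_A.fna\t98.6\t1490\t1580\n"
    "sp_A.fna\tsp_C.fna\t84.2\t900\t1600\n"
)
groups = group_by_ani(parse_fastani(example))
for members in groups:
    print(members)
```

Groups with more than one member are candidates for synonymy under the assumed cutoff. Single-linkage grouping is the simplest choice and can chain borderline genomes together, so a real analysis would likely inspect such groups by hand.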