Towards estimating the number of strains that make up a natural bacterial population.

Tomeu Viver, Roth E Conrad, Luis M Rodriguez-R, Ana Sofia Ramírez, Stephanus N Venter, Jairo Rocha-Cárdenas, Mercè Llabrés, Rudolf I Amann, Konstantinos T Konstantinidis, Ramon Rosselló-Móra
Published in: Nature Communications (2024)
What a strain is and how many strains make up a natural bacterial population remain elusive concepts despite their apparent importance for assessing the role of intra-population diversity in disease emergence or response to environmental perturbations. To advance these concepts, we sequenced 138 randomly selected Salinibacter ruber isolates from two solar salterns and assessed these genomes against companion short-read metagenomes from the same samples. The genome-aggregate average nucleotide identity (ANI) values among these isolates followed a bimodal distribution, with four-fold lower occurrence of values between 99.2% and 99.8% relative to ANI >99.8% or <99.2%, revealing a natural "gap" in the sequence space within species. Accordingly, we used this ANI gap to define genomovars, and a higher ANI value of >99.99% together with shared gene content >99.0% to define strains. Using these thresholds and extrapolating from how many metagenomic reads each genomovar uniquely recruited, we estimated that, although our 138 isolates represented about 80% of the Sal. ruber population, the total population in one saltern pond is composed of 5,500 to 11,000 genomovars, the great majority of which appear to be rare in situ. These data also revealed that the most frequently recovered isolate in lab media was often not the most abundant genomovar in situ, suggesting that cultivation biases are significant even in cases where cultivation procedures are thought to be robust. The methodology and ANI thresholds outlined here should represent a useful guide for future microdiversity surveys of additional microbial species.
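As an illustration of how the thresholds stated above might be applied to a single pair of genomes, the following is a minimal sketch and not the authors' pipeline. The function name `classify_pair`, the label strings, and the choice to report values inside the 99.2% to 99.8% gap as a separate "within ANI gap" category are assumptions made here for illustration; the abstract does not say how pairs falling inside the gap were assigned. The ANI and shared gene-content percentages are assumed to have been computed beforehand by some external tool.

```python
# Thresholds taken from the abstract (all values are percentages).
STRAIN_ANI = 99.99           # ANI must exceed this for "same strain"
STRAIN_SHARED_GENES = 99.0   # shared gene content must exceed this for "same strain"
GAP_LOW = 99.2               # lower edge of the observed ANI gap
GAP_HIGH = 99.8              # upper edge of the observed ANI gap


def classify_pair(ani: float, shared_gene_content: float) -> str:
    """Label a genome pair from its ANI and shared gene content (both in %).

    Hypothetical helper: the category names and the handling of
    in-gap values are assumptions, not taken from the paper.
    """
    if ani > STRAIN_ANI and shared_gene_content > STRAIN_SHARED_GENES:
        return "same strain"
    if ani > GAP_HIGH:
        return "same genomovar"
    if ani >= GAP_LOW:
        # Assumption: in-gap pairs are flagged rather than forced into a class.
        return "within ANI gap"
    return "different genomovars"


if __name__ == "__main__":
    # Made-up example values, not data from the study.
    examples = [(99.995, 99.6), (99.995, 98.0), (99.85, 97.0), (99.5, 95.0), (98.7, 90.0)]
    for ani, shared in examples:
        print(f"ANI={ani}%, shared genes={shared}% -> {classify_pair(ani, shared)}")
```

Note that a pair with ANI above 99.99% but shared gene content at or below 99.0% falls back to "same genomovar" in this sketch, because the abstract requires both conditions to hold for a strain-level match.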