A novel taxon selection method, aimed at minimizing recombination, clarifies the discovery of a new sub-population of Helicobacter pylori from Australia.
Binit Lamichhane, Michael J Wise, Eng Guan Chua, Barry J Marshall, Alfred Chin-Yen Tay
Published in: Evolutionary Applications (2019)
We present a novel method for taxon selection, aimed at minimizing the problems that arise in analyses of highly recombinant species such as Helicobacter pylori. H. pylori has accompanied modern-human migration out of Africa and is marked by a phylogeographic strain distribution, which has been exploited to add an extra layer of information about human migrations to that obtained from human sources. However, H. pylori's genome has high sequence heterogeneity combined with a very high rate of recombination, causing major allelic diversification across strains. On the other hand, recombination events that have become preserved in sub-populations are a useful source of phylogenetic information. This makes it difficult both to select representative strains for particular genetic or phylogeographic clusters and, more generally, to reduce the impact of extensive low-level recombination on analyses. To address this issue, we perform multiple population structure-based analyses on core genomes to select exemplar strains, called 'quintessents', which exhibit limited recombination. In essence, quintessent strains are representative of their specific phylogenetic clades and can be used to refine the current MLST concatenation-based population structure classification system. The use of quintessents reduces the noise due to local recombination events, while preserving recombination events that have become fixed in sub-populations. We illustrate the method with an analysis of core genome concatenations from 185 H. pylori strains, which reveals a recent speciation event resulting from the recombination of strains from phylogeographic clade hpSahul, carried by Aboriginal Australians, and hpEurope, carried by some of the people who arrived in Australia over the past 200 years. The signal is much clearer in the analysis based on quintessent strains, and is absent from the analysis based on MLST concatenations.
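The idea of keeping only strains with limited recombination can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so everything here is an assumption: the function name `select_quintessents`, the input format (per-strain ancestry proportions, as a population structure analysis might produce), and the `min_major` cutoff are all hypothetical, and the authors combine multiple analyses rather than applying a single threshold.

```python
from collections import defaultdict


def select_quintessents(ancestry, min_major=0.95):
    """Group strains by dominant population, keeping only low-admixture ones.

    ancestry:  dict mapping strain name -> {population: ancestry proportion}
               (hypothetical input format, not taken from the paper)
    min_major: hypothetical cutoff on the dominant ancestry proportion
    """
    exemplars = defaultdict(list)
    for strain, proportions in ancestry.items():
        # Find the population contributing the largest share of this strain.
        population, share = max(proportions.items(), key=lambda kv: kv[1])
        # Keep the strain only if that share meets the cutoff.
        if share >= min_major:
            exemplars[population].append(strain)
    return {pop: sorted(strains) for pop, strains in exemplars.items()}


# Made-up toy proportions, for illustration only.
toy = {
    "strainA": {"hpSahul": 0.98, "hpEurope": 0.02},
    "strainB": {"hpSahul": 0.55, "hpEurope": 0.45},
    "strainC": {"hpSahul": 0.03, "hpEurope": 0.97},
}
result = select_quintessents(toy)
```

In this toy input, `strainB` is dropped because no single population dominates it, while `strainA` and `strainC` are kept as exemplars of their respective populations.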