A fine-scale map of genome-wide recombination in divergent Escherichia coli population.
Yu Kang, Lina Yuan, Xing Shi, Yanan Chu, Zilong He, Xinmiao Jia, Qiang Lin, Qin Ma, Jian Wang, Jingfa Xiao, Songnian Hu, Zhancheng Gao, Fei Chen, Jun Yu
Published in: Briefings in Bioinformatics (2021)
Recombination is one of the most important molecular mechanisms of prokaryotic genome evolution, but its exact roles are still debated. Here we infer genome-wide recombination within a species, using a dataset of 149 complete genomes of Escherichia coli from diverse animal hosts and geographic origins, including 45 sequenced in-house on the single-molecule real-time platform. Two major clades, identified from physiological, clinical and ecological characteristics, form distinct genetic lineages, as shown by the scarcity of interclade gene exchange. By defining gene-based syntenies for genomic segments within and between the two clades, we build a fine-scale recombination map for this representative global E. coli population. The map suggests extensive within-clade recombination that often breaks physical linkages among individual genes but seldom disrupts the genome's organizational frameworks or the primary metabolic portfolios that depend on framework integrity, possibly because of strong natural selection for both physiological compatibility and ecological fitness. In contrast, between-clade recombination declines drastically as phylogenetic distance increases, with up to a 10-fold reduction observed, establishing a firm genetic barrier between clades. Our empirical data suggest a critical role for such recombination events in the early stage of speciation, where recombination rate is associated with phylogenetic distance in addition to sequence and gene variations. The extensive intraclade recombination binds sister strains into a quasisexual group and optimizes genes or alleles to streamline physiological activities, whereas the sharply reduced interclade recombination splits the population into clades adapted to divergent ecological niches.
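The abstract does not spell out how the gene-based syntenies are defined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each genome is given as an ordered list of ortholog identifiers (the function name `synteny_blocks`, the `min_len` parameter and the gene labels are all hypothetical), that each identifier occurs once per genome, and it ignores strand and chromosome circularity. A block is a run of genes that are adjacent and in the same order in both genomes; boundaries between blocks mark places where physical linkage differs between the two strains.

```python
def synteny_blocks(genome_a, genome_b, min_len=2):
    """Return maximal runs of genes that are adjacent and co-oriented
    in both gene orders. Illustrative only; assumes unique gene IDs."""
    # Position of every gene in genome B for constant-time lookup.
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    blocks, current = [], []

    def close_block():
        # Keep the open run only if it is long enough to count as a block.
        if len(current) >= min_len:
            blocks.append(list(current))
        current.clear()

    for gene in genome_a:
        if gene not in pos_b:
            # Gene absent from B: linkage is interrupted here.
            close_block()
            continue
        if current and pos_b[gene] == pos_b[current[-1]] + 1:
            # Gene directly follows the previous one in B as well.
            current.append(gene)
        else:
            close_block()
            current.append(gene)
    close_block()
    return blocks


if __name__ == "__main__":
    # Hypothetical gene orders for two strains.
    strain_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
    strain_b = ["g1", "g2", "g3", "x9", "g6", "g7", "g4", "g5"]
    blocks = synteny_blocks(strain_a, strain_b)
    print(blocks)
    # Number of boundaries between consecutive shared blocks.
    print(len(blocks) - 1)
```

Under these assumptions, comparing block boundaries across many strain pairs gives a rough picture of where gene order is conserved and where it is broken; attributing a given break to recombination would need further evidence that this sketch does not model.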
Keyphrases
- genome wide
- dna repair
- dna damage
- escherichia coli
- dna methylation
- copy number
- early stage
- physical activity
- climate change
- genome wide identification
- air pollution
- mental health
- oxidative stress
- risk assessment
- pseudomonas aeruginosa
- squamous cell carcinoma
- magnetic resonance imaging
- rectal cancer
- big data
- klebsiella pneumoniae
- artificial intelligence
- electronic health record