Pangenome-spanning epistasis and coselection analysis via de Bruijn graphs.
Juri Kuronen, Samuel T Horsfield, Anna Kaarina Pöntinen, Sudaraka Mallawaarachchi, Sergio Arredondo-Alonso, Harry A Thorpe, Rebecca A Gladstone, Rob J L Willems, Stephen D Bentley, Nicholas J Croucher, Johan Pensar, John A Lees, Gerry Q Tonkin-Hill, Jukka Corander
Published in: Genome Research (2024)
Studies of bacterial adaptation and evolution are hampered by the difficulty of measuring traits such as virulence, drug resistance, and transmissibility in large populations. In contrast, it is now feasible to obtain high-quality complete assemblies of many bacterial genomes thanks to scalable high-accuracy long-read sequencing technologies. To exploit this opportunity, we introduce a phenotype- and alignment-free method for discovering coselected and epistatically interacting genomic variation from genome assemblies covering both core and accessory parts of genomes. Our approach uses a compact colored de Bruijn graph to approximate the intragenome distances between pairs of loci for a collection of bacterial genomes to account for the impacts of linkage disequilibrium (LD). We demonstrate the versatility of our approach to efficiently identify associations between loci linked with drug resistance and adaptation to the hospital niche in the major human bacterial pathogens Streptococcus pneumoniae and Enterococcus faecalis.
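The idea of approximating intragenome distances on a colored de Bruijn graph can be illustrated with a toy sketch. This is not the authors' implementation: it assumes an uncompacted, k-mer-level graph, ignores reverse complements, and measures distance as the number of k-mer steps found by breadth-first search within one genome's subgraph. The function names (`build_colored_dbg`, `neighbors`, `graph_distance`) and the toy sequences are hypothetical.

```python
from collections import defaultdict, deque


def build_colored_dbg(genomes, k):
    """Map each k-mer to the set of genome ids (its 'colors') that contain it."""
    colors = defaultdict(set)
    for gid, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(gid)
    return colors


def neighbors(kmer, colors, gid):
    """Yield k-mers of genome `gid` that overlap `kmer` by k-1 bases (either direction)."""
    for base in "ACGT":
        for cand in (kmer[1:] + base, base + kmer[:-1]):
            if cand != kmer and gid in colors.get(cand, ()):
                yield cand


def graph_distance(src, dst, colors, gid):
    """BFS distance in k-mer steps inside the subgraph of genome `gid`.

    Returns None if either k-mer is absent from that genome or no path exists.
    """
    if gid not in colors.get(src, ()) or gid not in colors.get(dst, ()):
        return None
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in neighbors(node, colors, gid):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Toy assemblies (assumed for illustration): genome B lacks a short segment present in A.
genomes = {"A": "ACGTTGCAAGGCT", "B": "ACGTTGGCT"}
colors = build_colored_dbg(genomes, k=4)

dist_a = graph_distance("ACGT", "GGCT", colors, "A")  # 9 steps in genome A
dist_b = graph_distance("ACGT", "GGCT", colors, "B")  # 5 steps in genome B
```

The same pair of k-mers is farther apart in genome A than in genome B, which is why distances are evaluated per genome rather than on the union graph. A method working at scale would operate on compacted unitigs instead of individual k-mers.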