Inferring strain-level mutational drivers of phage-bacteria interaction phenotypes.
Adriana Lucia-Sanz, Shengyun Peng, Chung Yin Joey Leung, Animesh Gupta, Justin R Meyer, Joshua S Weitz
Published in: bioRxiv : the preprint server for biology (2024)
The enormous diversity of bacteriophages and their bacterial hosts makes it challenging to predict which phages infect a focal set of bacteria. Infection is largely determined by the complementary (and largely uncharacterized) genetics of adsorption, injection, and cell take-over. Here we present a machine learning (ML) approach to predict phage-bacteria interactions, trained on the genome sequences of, and phenotypic interactions among, 51 Escherichia coli strains and 45 phage λ strains that coevolved in laboratory conditions for 37 days. Leveraging multiple inference strategies and without a priori knowledge of driver mutations, this framework predicts both who infects whom and the quantitative level of infection across a suite of 2,295 potential interactions. The most effective ML approach inferred interaction phenotypes as independent contributions of phage and bacterial mutations, predicting phage host range with 86% mean classification accuracy while reducing the relative error in the estimated strength of the infection phenotype by 40%. Further, transparent feature selection in the predictive model revealed 18 of 176 phage λ mutations and 6 of 18 E. coli mutations that have a significant influence on the outcome of phage-bacteria interactions, corroborating sites previously known to affect phage λ infections and identifying mutations in genes of unknown function not previously shown to influence bacterial resistance. While the genetic variation studied was limited to a focal, coevolved phage-bacteria system, the method's success at recapitulating strain-level infection outcomes provides a path forward towards developing strategies for inferring interactions in non-model systems, including those of therapeutic significance.
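To make the idea of "independent contributions from phage and bacteria mutations" concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a plain logistic regression fitted by gradient descent, binary mutation indicators, and a toy infection rule invented purely for illustration; the names `pair_features`, `fit_logistic`, and `toy_label` are hypothetical, and none of the data comes from the study.

```python
import math
from itertools import product

# Scale of the study described above: every host paired with every phage.
N_HOSTS, N_PHAGES = 51, 45
n_pairs = N_HOSTS * N_PHAGES  # 2,295 potential interactions


def pair_features(host_muts, phage_muts):
    """Additive encoding: host and phage mutation indicators side by side,
    with no host-by-phage cross terms."""
    return list(host_muts) + list(phage_muts)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.8, epochs=5000):
    """Batch gradient descent on the mean logistic loss (assumed model)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x):
    """Return 1 (infects) or 0 (does not infect)."""
    return int(b + sum(wj * xj for wj, xj in zip(w, x)) > 0)


def toy_label(host, phage):
    """Invented rule: host mutation 0 blocks infection unless the phage
    carries mutation 0. Purely illustrative, not taken from the study."""
    return 0 if host[0] == 1 and phage[0] == 0 else 1


# Toy strains: every combination of two hypothetical mutations per side.
hosts = list(product([0, 1], repeat=2))
phages = list(product([0, 1], repeat=2))
pairs = [(h, p) for h in hosts for p in phages]

X = [pair_features(h, p) for h, p in pairs]
y = [toy_label(h, p) for h, p in pairs]

w, b = fit_logistic(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
```

In this toy setup the fitted weight on the first host mutation is negative (it confers resistance) and the weight on the first phage mutation is positive (it restores infection). Reading per-mutation weights in this way is one simple form of the transparent feature selection mentioned above; the study's actual models and selection procedure may differ.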
Keyphrases
- pseudomonas aeruginosa
- escherichia coli
- machine learning
- single cell
- healthcare
- cystic fibrosis
- deep learning
- biofilm formation
- genome wide
- high resolution
- artificial intelligence
- mass spectrometry
- staphylococcus aureus
- climate change
- risk assessment
- skeletal muscle
- insulin resistance
- ultrasound guided
- genetic diversity