Virus-host interactions predictor (VHIP): Machine learning approach to resolve microbial virus-host interaction networks.

G Eric Bastien, Rachel N Cable, Cecelia Batterbee, A J Wing, Luis Zaman, Melissa B Duhaime
Published in: PLoS computational biology (2024)
Viruses of microbes are ubiquitous biological entities that reprogram their hosts' metabolisms during infection to produce viral progeny, impacting the ecology and evolution of microbiomes with broad implications for human and environmental health. Advances in genome sequencing have led to the discovery of millions of novel viruses and an appreciation for the great diversity of viruses on Earth. Yet, with knowledge of only "who is there?" we fall short in our ability to infer the impacts of viruses on microbes at population, community, and ecosystem scales. To do this, we need a more explicit understanding of "who do they infect?" Here, we developed a novel machine learning (ML) model, Virus-Host Interaction Predictor (VHIP), to predict virus-host interactions (infection/non-infection) from input virus and host genomes. This ML model was trained and tested on a high-value, manually curated set of 8849 virus-host pairs and their corresponding sequence data. The resulting dataset, 'Virus Host Range network' (VHRnet), is core to VHIP functionality. Each data point that underlies the VHIP training and testing represents a lab-tested virus-host pair in VHRnet, from which meaningful signals of viral adaptation to host were computed from genomic sequences. VHIP departs from existing virus-host prediction models in its ability to predict multiple interactions rather than predicting a single most likely host or host clade. As a result, VHIP is able to infer the complexity of virus-host networks in natural systems. VHIP has an 87.8% accuracy rate at predicting interactions between virus-host pairs at the species level and can be applied to novel viral and host population genomes reconstructed from metagenomic datasets.
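The pairwise framing described in the abstract can be illustrated with a minimal Python sketch. It is not the VHIP implementation: the two features (GC content difference and k-mer profile distance), the function names, and the 0.5 threshold are all assumptions chosen for illustration, and the abstract does not state which genomic signals VHIP actually uses. The sketch shows how features can be computed per virus-host pair, and how thresholding per-pair scores yields a many-to-many network, whereas keeping only the top-scoring host yields a single host per virus.

```python
from collections import Counter
from itertools import product
import math


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_profile(seq: str, k: int = 3) -> dict:
    """Normalized frequency of each ACGT k-mer among overlapping k-mers."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {m: counts[m] / total for m in kmers}


def pair_features(virus_seq: str, host_seq: str, k: int = 3) -> dict:
    """Hypothetical per-pair features (illustrative, not VHIP's feature set)."""
    v, h = kmer_profile(virus_seq, k), kmer_profile(host_seq, k)
    kmer_dist = math.sqrt(sum((v[m] - h[m]) ** 2 for m in v))
    gc_diff = abs(gc_content(virus_seq) - gc_content(host_seq))
    return {"gc_diff": gc_diff, "kmer_dist": kmer_dist}


def single_best_host(scores: dict) -> dict:
    """Single-host framing: keep only the top-scoring host per virus."""
    best = {}
    for (virus, host), score in scores.items():
        if virus not in best or score > scores[(virus, best[virus])]:
            best[virus] = host
    return best


def interaction_network(scores: dict, threshold: float = 0.5) -> list:
    """Pairwise framing: every pair at or above the (assumed) threshold."""
    return sorted(pair for pair, score in scores.items() if score >= threshold)


# Toy scores standing in for a trained classifier's per-pair output.
toy_scores = {
    ("v1", "hA"): 0.9, ("v1", "hB"): 0.7,
    ("v2", "hA"): 0.2, ("v2", "hB"): 0.6,
}
best_hosts = single_best_host(toy_scores)
network = interaction_network(toy_scores)
```

With these toy scores, the single-host view assigns v1 only to hA, while the network view also retains the v1-hB pair, which is the kind of multi-host structure the abstract says VHIP is designed to recover.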
Keyphrases
  • machine learning
  • healthcare
  • sars cov
  • small molecule
  • gene expression
  • mental health
  • big data
  • climate change
  • wastewater treatment
  • artificial intelligence
  • body composition
  • disease virus