Deciphering Phage-Host Specificity Based on the Association of Phage Depolymerases and Bacterial Surface Glycan with Deep Learning.

Yiyan Yang, Keith Dufault-Thompson, Wei Yan, Tian Cai, Lei Xie, Xiaofang Jiang
Published in: bioRxiv : the preprint server for biology (2023)
Phage tailspike proteins are depolymerases that target diverse bacterial surface glycans with high specificity, determining the host specificity of numerous phages. Because their sequence diversity makes tailspike proteins difficult to identify, we developed SpikeHunter, an approach based on the ESM-2 protein language model. Using SpikeHunter, we identified 231,965 tailspike proteins from a dataset comprising 8,434,494 prophages found within 165,365 genomes of five common pathogens. Of these, 143,035 tailspike proteins displayed strong associations with serotypes. Moreover, we observed highly similar tailspike proteins in species that share closely related serotypes. We found extensive domain swapping in all five species, with the C-terminal domain being significantly associated with host serotype, highlighting its role in host range determination. Our study presents a comprehensive cross-species analysis of associations between tailspike proteins and serotypes, providing insights applicable to phage therapy and biotechnology.
Keyphrases
  • pseudomonas aeruginosa
  • deep learning
  • dengue virus
  • stem cells
  • escherichia coli
  • autism spectrum disorder
  • bone marrow
  • binding protein
  • mass spectrometry
  • artificial intelligence
  • cell therapy
  • molecularly imprinted