Deep Evolutionary Forecasting identifies highly-mutated SARS-CoV-2 variants via functional sequence-landscape enumeration.
Mireia Solà Colom, Jelena Vucinic, Jared Adolf-Bryfogle, James W Bowman, Sébastien Verel, Isabelle Moczygemba, Thomas Schiex, David Simoncini, Christopher D Bahl
Published in: Research Square (2022)
Host-pathogen interactions drive an evolutionary game of cat-and-mouse between a pathogen's protein virulence factors, the host's adaptive immune system, and therapeutics targeting the pathogen. There is an urgent need for treatments and prophylactics that remain effective as a pathogen evolves, and the ability to predict pathogen evolution is a longstanding challenge. Therefore, a common strategy has been to target conserved epitopes, but strong selective pressures can drive pathogens to evolve resistance nonetheless. Here, we report a novel, generally applicable approach called Deep Evolutionary Forecasting that predicts protein evolution using artificial intelligence and molecular modeling. The first step is to perform a complete enumeration of the functional sequence landscape in silico for a target protein. Then, we construct a graph in which the edges between sequence variants are weighted by evolutionary probability. Protein evolution is forecasted by traversing this graph. We chose the SARS-CoV-2 receptor binding domain (RBD) as a model system because highly mutated viral variants that escape available therapeutics and vaccines have continued to emerge. The RBD variants that we forecasted carry up to 11 concurrent amino acid substitutions at the host receptor binding site. Pseudoviruses harboring forecasted RBDs are active and escape binding and neutralization by FDA-approved monoclonal antibody therapeutics. We identified bottlenecks in the evolutionary landscape of SARS-CoV-2 that are promising targets for therapeutics that preempt evolution.
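
The three steps named in the abstract (enumerate the functional landscape, build a probability-weighted graph, traverse it) can be illustrated with a toy sketch. This is not the authors' implementation, which the abstract does not describe in detail. Everything here is an assumption made for illustration: the two-letter alphabet, the `NONFUNCTIONAL` set standing in for a functional filter, the `POSITION_PROB` edge weights, and the choice of Dijkstra's algorithm on negative log-probabilities as the traversal.

```python
import heapq
import math
from itertools import product

# All values below are hypothetical toy inputs, not data from the paper.
ALPHABET = "AB"                    # toy alphabet, not real amino acids
LENGTH = 3                         # toy sequence length
NONFUNCTIONAL = {"BAA", "ABA"}     # stand-in for a functional filter
POSITION_PROB = [0.3, 0.2, 0.1]    # assumed substitution probability per position


def enumerate_functional():
    """Enumerate every sequence and keep those that pass the toy filter."""
    all_seqs = ("".join(p) for p in product(ALPHABET, repeat=LENGTH))
    return {s for s in all_seqs if s not in NONFUNCTIONAL}


def build_graph(seqs):
    """Link functional sequences that differ by exactly one substitution."""
    graph = {s: {} for s in seqs}
    for s in seqs:
        for i in range(LENGTH):
            for letter in ALPHABET:
                if letter == s[i]:
                    continue
                t = s[:i] + letter + s[i + 1:]
                if t in seqs:
                    graph[s][t] = POSITION_PROB[i]  # edge weight = probability
    return graph


def most_probable_path(graph, start, goal):
    """Dijkstra on -log(p): the shortest path is the most probable one."""
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, prob in graph[node].items():
            new_cost = cost - math.log(prob)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return None, 0.0
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, math.exp(-best[goal])


if __name__ == "__main__":
    landscape = enumerate_functional()
    graph = build_graph(landscape)
    path, prob = most_probable_path(graph, "AAA", "BBB")
    print(path, prob)
```

In this toy landscape every route from `AAA` to `BBB` has to pass through `AAB`, because the other two single-substitution neighbors of `AAA` are filtered out. That is a loose analogue of the bottlenecks mentioned in the abstract; how the paper actually defines and detects them is not stated here.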
Keyphrases
- sars cov
- amino acid
- genome wide
- artificial intelligence
- copy number
- small molecule
- binding protein
- candida albicans
- protein protein
- respiratory syndrome coronavirus
- monoclonal antibody
- machine learning
- escherichia coli
- single cell
- gene expression
- biofilm formation
- pseudomonas aeruginosa
- molecular docking
- magnetic resonance
- transcription factor
- circulating tumor cells
- deep learning
- antimicrobial resistance
- dna binding
- computed tomography
- coronavirus disease
- contrast enhanced
- molecular dynamics simulations