Unsupervisedly Prompting AlphaFold2 for Accurate Few-Shot Protein Structure Prediction.

Jun Zhang, Sirui Liu, Mengyun Chen, Haotian Chu, Min Wang, Zidong Wang, Jialiang Yu, Ningxi Ni, Fan Yu, Dechin Chen, Yi Isaac Yang, Boxin Xue, Lijiang Yang, Yuan Liu, Yi Qin Gao
Published in: Journal of Chemical Theory and Computation (2023)
Data-driven predictive methods that can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and medical development. Determining an accurate folding landscape using coevolutionary information is fundamental to the success of modern protein structure prediction methods. As the state of the art, AlphaFold2 has dramatically raised the accuracy without performing explicit coevolutionary analysis. Nevertheless, its performance still depends strongly on the availability of sequence homologues. Based on an investigation into the cause of this dependence, we present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 on targets with poor multiple sequence alignments (MSAs). By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in the low-data regime and even achieve encouraging performance with single-sequence predictions. Being able to make accurate predictions with few-shot MSAs not only helps AlphaFold2 generalize better to orphan sequences but also democratizes its use for high-throughput applications. In addition, EvoGen combined with AlphaFold2 yields a probabilistic structure generation method that could explore alternative conformations of protein sequences, and the task-aware differentiable algorithm for sequence generation will benefit other related tasks, including protein design.
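The abstract's "poor MSA targets" and "low-data regime" refer to how many informative homologues an alignment contains. As a hedged illustration that is not taken from the paper, one widely used measure of MSA depth is the effective number of sequences (Neff): each sequence is down-weighted by the number of sequences sharing at least a threshold identity with it (80% is a common convention), so redundant homologues add little. The sketch below assumes pre-aligned, equal-length sequences and a simple column-identity definition; the function names `pairwise_identity` and `effective_num_sequences` are hypothetical.

```python
from typing import List


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of alignment columns where both sequences share the same residue.

    Assumption: sequences are pre-aligned to equal length, and gap
    characters ('-') never count as matches.
    """
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def effective_num_sequences(msa: List[str], threshold: float = 0.8) -> float:
    """Hypothetical Neff: sum over sequences of 1 / (size of its identity cluster)."""
    if not msa:
        raise ValueError("MSA must contain at least one sequence")
    length = len(msa[0])
    if length == 0 or any(len(s) != length for s in msa):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    neff = 0.0
    for i, seq in enumerate(msa):
        # Each sequence always counts itself, plus every other sequence
        # whose identity to it meets the threshold.
        n_similar = 1 + sum(
            1
            for j, other in enumerate(msa)
            if j != i and pairwise_identity(seq, other) >= threshold
        )
        neff += 1.0 / n_similar
    return neff


if __name__ == "__main__":
    # Two identical homologues collapse to one effective sequence;
    # the unrelated third sequence counts fully.
    toy_msa = ["ACDEFGHIKL", "ACDEFGHIKL", "MNPQRSTVWY"]
    print(effective_num_sequences(toy_msa))  # 2.0
```

Under this measure, a single-sequence input has Neff = 1, which is the extreme of the few-shot setting the abstract describes; the exact depth metric used in the paper may differ.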