Unsupervised evolution of protein and antibody complexes with a structure-informed language model.
Varun R. Shanker, Theodora U. J. Bruun, Brian L. Hie, Peter S. Kim
Published in: Science (New York, N.Y.) (2024)
Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without the need to model individual functional tasks. We also demonstrate that ESM-IF1, which was trained only on single-chain structures, can be extended to engineer protein complexes. Using this approach, we screened about 30 variants of two therapeutic clinical antibodies used to treat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We achieved up to 25-fold improvement in neutralization and 37-fold improvement in affinity against the antibody-escaped viral variants of concern BQ.1.1 and XBB.1.5, respectively. These findings highlight the advantage of integrating structural information to identify efficient protein evolution trajectories without requiring any task-specific training data.
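The workflow the abstract describes, scoring candidate substitutions with a structure-conditioned language model and experimentally testing only those the model prefers over wild type, can be illustrated with the publicly released ESM-IF1 weights in the fair-esm package. The sketch below is a minimal, hypothetical single-chain version: the PDB path, chain ID, the `rank_substitutions` helper, and the acceptance rule (keep mutants whose mean log-likelihood under the fixed backbone exceeds the wild-type score) are illustrative assumptions rather than the authors' exact protocol, and the paper's extension of scoring to multi-chain complexes is not reproduced here.

```python
# Sketch: ranking single-residue substitutions with ESM-IF1 (fair-esm).
# Assumes `pip install fair-esm` plus the inverse-folding extras (biotite,
# torch_geometric); the file name and chain ID below are placeholders.
import torch
import esm
from esm.inverse_folding.util import (
    load_structure,
    extract_coords_from_structure,
    score_sequence,
)

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def rank_substitutions(pdb_path: str, chain_id: str, top_k: int = 10):
    """Score every single-residue substitution of one chain against its own
    backbone coordinates and return the top_k variants that the model
    prefers over the wild-type sequence."""
    model, alphabet = esm.pretrained.esm_if1_gvp4_t16_142M_UR50()
    model = model.eval()

    structure = load_structure(pdb_path, chain_id)
    coords, wt_seq = extract_coords_from_structure(structure)

    # Mean per-residue log-likelihood of the wild-type sequence given the backbone.
    wt_ll, _ = score_sequence(model, alphabet, coords, wt_seq)

    candidates = []
    with torch.no_grad():
        # Exhaustive single-mutant scan: 19 * len(wt_seq) forward passes.
        for i, wt_aa in enumerate(wt_seq):
            for aa in AMINO_ACIDS:
                if aa == wt_aa:
                    continue
                mut_seq = wt_seq[:i] + aa + wt_seq[i + 1:]
                mut_ll, _ = score_sequence(model, alphabet, coords, mut_seq)
                if mut_ll > wt_ll:  # structure-informed model favors the mutant
                    candidates.append((f"{wt_aa}{i + 1}{aa}", mut_ll - wt_ll))

    candidates.sort(key=lambda x: x[1], reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    for mutation, delta_ll in rank_substitutions("antibody_complex.pdb", "H"):
        print(f"{mutation}\t{delta_ll:+.3f}")
```

In practice, the handful of highest-ranked substitutions would be expressed and assayed (for example, for binding affinity or neutralization), which mirrors the small screening budget of roughly 30 variants per antibody reported in the abstract.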
Keyphrases
- amino acid
- respiratory syndrome coronavirus
- protein protein
- sars cov
- machine learning
- binding protein
- coronavirus disease
- high resolution
- deep learning