Evolutionary-scale prediction of atomic-level protein structure with a language model.

Zeming Lin, Halil Akin, Roshan Rao, Brian L Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan Dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives
Published in: Science (New York, N.Y.) (2023)
Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.
Keyphrases
  • high resolution
  • amino acid
  • machine learning
  • autism spectrum disorder
  • binding protein
  • healthcare
  • gene expression
  • antibiotic resistance genes
  • microbial community