Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning.

Tobias Vornholt, Mojmír Mutný, Gregor W Schmidt, Christian Schellhaas, Ryo Tachibana, Sven Panke, Thomas R Ward, Andreas Krause, Markus Jeschek
Published in: ACS Central Science (2024)
Tailored enzymes are crucial for the transition to a sustainable bioeconomy. However, enzyme engineering is laborious and failure-prone due to its reliance on serendipity. The efficiency and success rates of engineering campaigns may be improved by applying machine learning to map the sequence-activity landscape based on small experimental data sets. Yet, it often proves challenging to reliably model large sequence spaces while keeping the experimental effort tractable. To address this challenge, we present an integrated pipeline combining large-scale screening with active machine learning, which we applied to engineer an artificial metalloenzyme (ArM) catalyzing a new-to-nature hydroamination reaction. Combining lab automation and next-generation sequencing, we acquired sequence-activity data for several thousand ArM variants. We then used Gaussian process regression to model the activity landscape and guide further screening rounds. Critical characteristics of our pipeline include the cost-effective generation of information-rich data sets, the integration of an explorative round to improve the model's performance, and the inclusion of experimental noise. Our approach led to an order-of-magnitude boost in the hit rate while making efficient use of experimental resources. Search strategies like this should find broad utility in enzyme engineering and accelerate the development of novel biocatalysts.
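To make the modeling step more concrete, the following is a minimal sketch of Gaussian process regression with an explicit noise term, used to rank untested variants for the next screening round. It is not the authors' implementation: the one-hot encoding, the RBF kernel, the upper-confidence-bound selection rule, and all names and hyperparameter values (`one_hot`, `gp_posterior`, `select_batch`, `noise_var`, `length_scale`, `beta`) are assumptions chosen for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode the residues at the mutated positions as a flat one-hot vector."""
    vec = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        vec[i, AMINO_ACIDS.index(aa)] = 1.0
    return vec.ravel()


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale ** 2)


def gp_posterior(x_train, y_train, x_test, noise_var=0.1, length_scale=1.0):
    """Posterior mean and variance of the latent activity at x_test.

    noise_var models measurement noise (assumed value); it is added to the
    diagonal of the training covariance so noisy data are not interpolated.
    """
    y_mean = y_train.mean()  # constant prior mean, estimated from the data
    k_tt = rbf_kernel(x_train, x_train, length_scale)
    k_tt = k_tt + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, length_scale)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = y_mean + k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = 1.0 - (v ** 2).sum(0)  # the RBF prior variance is 1
    return mean, np.maximum(var, 0.0)


def select_batch(mean, var, batch_size, beta=2.0):
    """Pick the variants with the highest upper confidence bound (assumed rule)."""
    ucb = mean + beta * np.sqrt(var)
    return np.argsort(-ucb)[:batch_size]


if __name__ == "__main__":
    # Toy data: two mutated positions, made-up activities.
    screened = ["AA", "AC", "WC"]
    activity = np.array([1.0, 3.0, 2.5])
    candidates = ["AW", "WW", "CC", "WA"]

    x_train = np.array([one_hot(s) for s in screened])
    x_test = np.array([one_hot(s) for s in candidates])
    mean, var = gp_posterior(x_train, activity, x_test)
    for idx in select_batch(mean, var, batch_size=2):
        print(candidates[idx], mean[idx], var[idx])
```

In this sketch a larger `beta` favors variants with high predictive uncertainty, which loosely corresponds to an explorative round, while a smaller `beta` favors variants with high predicted activity.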
Keyphrases
  • machine learning
  • big data
  • electronic health record
  • artificial intelligence
  • single cell
  • amino acid
  • high resolution
  • air pollution
  • dna methylation
  • mass spectrometry
  • cell free