Optimization of C-to-G base editors with sequence context preference predictable by machine learning methods.

Tanglong Yuan, Nana Yan, Tianyi Fei, Jitan Zheng, Juan Meng, Nana Li, Jing Liu, Haihang Zhang, Long Xie, Wenqin Ying, Di Li, Lei Shi, Yongsen Sun, Yongyao Li, Yi-Xue Li, Yi-Di Sun, Erwei Zuo

Published in: Nature Communications (2021)

Efficient and precise base editors (BEs) for C-to-G transversion are highly desirable. However, the sequence context affecting editing outcome remains largely unclear. Here we report engineered C-to-G BEs of high efficiency and fidelity, with the sequence context predictable via machine-learning methods. By changing the species origin and relative position of uracil-DNA glycosylase and deaminase, together with codon optimization, we obtain optimized C-to-G BEs (OPTI-CGBEs) for efficient C-to-G transversion. The motif preference of OPTI-CGBEs for editing 100 endogenous sites is determined in HEK293T cells. Using an sgRNA library comprising 41,388 sequences, we develop a deep-learning model that accurately predicts the OPTI-CGBE editing outcome for targeted sites with specific sequence context. These OPTI-CGBEs are further shown to be capable of efficient base editing in mouse embryos for generating Tyr-edited offspring. Thus, these engineered CGBEs are useful for efficient and precise base editing, with outcomes predictable based on the sequence context of targeted sites.

Keyphrases