Optimization of C-to-G base editors with sequence context preference predictable by machine learning methods.
Tanglong YuanNana YanTianyi FeiJitan ZhengJuan MengNana LiJing LiuHaihang ZhangLong XieWenqin YingDi LiLei ShiYongsen SunYongyao LiYi-Xue LiYi-Di SunErwei ZuoPublished in: Nature communications (2021)
Efficient and precise base editors (BEs) for C-to-G transversion are highly desirable. However, the sequence context affecting editing outcome largely remains unclear. Here we report engineered C-to-G BEs of high efficiency and fidelity, with the sequence context predictable via machine-learning methods. By changing the species origin and relative position of uracil-DNA glycosylase and deaminase, together with codon optimization, we obtain optimized C-to-G BEs (OPTI-CGBEs) for efficient C-to-G transversion. The motif preference of OPTI-CGBEs for editing 100 endogenous sites is determined in HEK293T cells. Using a sgRNA library comprising 41,388 sequences, we develop a deep-learning model that accurately predicts the OPTI-CGBE editing outcome for targeted sites with specific sequence context. These OPTI-CGBEs are further shown to be capable of efficient base editing in mouse embryos for generating Tyr-edited offspring. Thus, these engineered CGBEs are useful for efficient and precise base editing, with outcome predictable based on sequence context of targeted sites.