Controlling gene expression with deep generative design of regulatory DNA.
Jan Zrimec, Xiaozhi Fu, Azam Sheikh Muhammad, Christos Skrekas, Vykintas Jauniskis, Nora K Speicher, Christoph S Börlin, Vilhelm Verendel, Morteza Haghir Chehreghani, Devdatt Dubhashi, Verena Siewers, Florian David, Jens B Nielsen, Aleksej Zelezniak
Published in: Nature Communications (2022)
Design of de novo synthetic regulatory DNA is a promising avenue to control gene expression in biotechnology and medicine. Mutagenesis-based design typically requires screening sizable random DNA libraries, which limits the designs to a short section of the promoter and restricts their control of gene expression. Here, we prototype a deep learning strategy based on generative adversarial networks (GANs) that learns directly from genomic and transcriptomic data. Our ExpressionGAN can traverse the entire regulatory sequence-expression landscape in a gene-specific manner, generating regulatory DNA with prespecified target mRNA levels spanning the whole gene regulatory structure, including coding and adjacent non-coding regions. Despite high sequence divergence from natural DNA, in vivo measurements show that 57% of the highly expressed synthetic sequences surpass the expression levels of highly expressed natural controls. This demonstrates the applicability and relevance of deep generative design to expand our knowledge and control of gene expression regulation in any desired organism, condition or tissue.
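
For readers unfamiliar with how DNA is presented to a generative model, the following is a minimal sketch of one-hot encoding, a common numeric representation for sequence models. It is not taken from the paper: the alphabet order and the helper names `one_hot` and `decode` are assumptions made for illustration, and the abstract does not state which encoding ExpressionGAN uses.

```python
# Illustrative sketch only. ALPHABET order, one_hot and decode are
# hypothetical names chosen for this example, not the paper's code.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of rows, one 4-element 0/1 row per base."""
    rows = []
    for base in seq.upper():
        if base not in ALPHABET:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1.0 if base == b else 0.0 for b in ALPHABET])
    return rows


def decode(matrix):
    """Turn rows of per-base scores back into a DNA string.

    Each row is mapped to the base with the highest score, which is how
    a generator's continuous output could be read out as a sequence.
    """
    return "".join(
        ALPHABET[max(range(len(ALPHABET)), key=row.__getitem__)]
        for row in matrix
    )


if __name__ == "__main__":
    encoded = one_hot("GATTACA")
    print(len(encoded), "positions x", len(encoded[0]), "channels")
    print(decode(encoded))
```

Decoding by taking the highest-scoring base per position is one simple choice; a real pipeline might sample from the scores instead.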