Prediction of CRISPR sgRNA Activity Using a Deep Convolutional Neural Network.
Li Xue, Bin Tang, Wei Chen, Jiesi Luo
Published in: Journal of Chemical Information and Modeling (2018)
The CRISPR-Cas9 system, derived from adaptive immunity in bacteria and archaea, has been developed into a powerful tool for genome engineering with wide-ranging applications. Optimizing single-guide RNA (sgRNA) design to improve the efficiency of target cleavage is a key step for successful gene editing with CRISPR-Cas9. Because not all sgRNAs that target a given gene are equally effective, computational tools have been developed from experimental data to increase the likelihood of selecting effective sgRNAs. Despite considerable effort, accurately predicting functional sgRNAs directly from large-scale sequence data remains a major challenge. We propose DeepCas9, a deep-learning framework based on a convolutional neural network (CNN), to automatically learn sequence determinants and enable the identification of functional sgRNAs for the CRISPR-Cas9 system. We show that the CNN method outperforms previous methods in both (i) the ability to correctly identify highly active sgRNAs in experiments not used in training and (ii) the ability to accurately predict the target efficacies of sgRNAs in different organisms. We also visualize the convolutional kernels and show that the identified sequence signatures match known nucleotide preferences. Finally, we demonstrate the application of our method to the design of next-generation genome-scale CRISPRi and CRISPRa libraries targeting the human and mouse genomes. We expect that DeepCas9 will help reduce the number of sgRNAs that must be experimentally validated, enabling more effective and efficient genetic screens and genome engineering. DeepCas9 is freely available at https://github.com/lje00006/DeepCas9.
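The idea of a convolutional kernel learning a sequence signature can be illustrated with a minimal sketch. This is not the DeepCas9 architecture or its trained weights: the function names, the example sequence, and the hand-written "GG" kernel below are all hypothetical, chosen only to show how a one-hot encoded sgRNA is scanned by a kernel, passed through a ReLU, and max-pooled into a single feature.

```python
# Illustrative sketch only: not DeepCas9's actual architecture or weights.
NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-element rows ordered A, C, G, T."""
    rows = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1.0 if base == n else 0.0 for n in NUCLEOTIDES])
    return rows


def conv1d(encoded, kernel, bias=0.0):
    """Slide one kernel (width x 4) along the sequence and apply ReLU."""
    width = len(kernel)
    feature_map = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        feature_map.append(max(0.0, total))  # ReLU activation
    return feature_map


def global_max_pool(feature_map):
    """Keep the strongest response of the kernel anywhere in the sequence."""
    return max(feature_map)


# Hypothetical kernel that responds to the dinucleotide "GG".
# A trained network would learn many such kernels from data.
GG_KERNEL = [
    [0.0, 0.0, 1.0, 0.0],  # position 1: weight on G
    [0.0, 0.0, 1.0, 0.0],  # position 2: weight on G
]

sgrna = "GACGTTAGCCATGGATCCAA"  # arbitrary 20-nt example, not from the paper
features = conv1d(one_hot(sgrna), GG_KERNEL, bias=-1.0)
score = global_max_pool(features)
```

With the bias set to -1.0, the kernel only fires where both positions match, so `features` is non-zero only at the single "GG" in the example sequence. In a trained model, inspecting the learned kernel weights in this way is what allows sequence signatures to be compared against known nucleotide preferences.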
Keyphrases
- convolutional neural network
- crispr cas
- deep learning
- genome wide
- genome editing
- dna methylation
- big data
- copy number
- electronic health record
- machine learning
- endothelial cells
- high throughput
- gene expression
- cancer therapy
- social media
- induced pluripotent stem cells
- heat shock
- amino acid
- gram negative
- drug delivery
- decision making
- transcription factor
- data analysis
- health information
- multidrug resistant
- dna binding