Quantum biological insights into CRISPR-Cas9 sgRNA efficiency from explainable-AI driven feature engineering.
Jaclyn M Noshay, Tyler Walker, William G Alexander, Dawn M Klingeman, Jonathon Romero, Angelica M Walker, Erica Prates, Carrie Eckert, Stephan Irle, David Kainer, Daniel A Jacobson
Published in: Nucleic acids research (2023)
CRISPR-Cas9 tools have transformed genetic manipulation capabilities in the laboratory. Empirical rules-of-thumb have been developed for only a narrow range of model organisms, and mechanistic underpinnings for sgRNA efficiency remain poorly understood. This work establishes a novel feature set and new public resource, produced with quantum chemical tensors, for interpreting and predicting sgRNA efficiency. Feature engineering for sgRNA efficiency is performed using an explainable-artificial intelligence model: iterative Random Forest (iRF). By encoding quantitative attributes of position-specific sequences for Escherichia coli sgRNAs, we identify important traits for sgRNA design in bacterial species. Additionally, we show that expanding positional encoding to quantum descriptors of base-pair, dimer, trimer, and tetramer sequences captures intricate interactions in local and neighboring nucleotides of the target DNA. These features highlight variation in CRISPR-Cas9 sgRNA dynamics between E. coli and H. sapiens genomes. These novel encodings of sgRNAs enhance our understanding of the elaborate quantum biological processes involved in CRISPR-Cas9 machinery.
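The position-specific encoding of monomer, dimer, trimer, and tetramer sequences described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the quantum chemical descriptors are not reproduced here, and each position-specific k-mer is represented only by a binary presence flag as a stand-in. The function names (`positional_kmer_features`, `feature_names`, `encode`) and the feature-naming scheme are hypothetical and chosen purely for illustration.

```python
from itertools import product

BASES = "ACGT"


def positional_kmer_features(seq, k_values=(1, 2, 3, 4)):
    """Map each k-mer observed at each position to a presence flag of 1.

    Assumption: a binary flag stands in for the quantitative (e.g. quantum
    chemical) descriptor that a real feature set would attach to the k-mer.
    """
    seq = seq.upper()
    if set(seq) - set(BASES):
        raise ValueError("sequence must contain only A, C, G, T")
    features = {}
    for k in k_values:
        # Slide a window of width k over the sequence, one position at a time.
        for i in range(len(seq) - k + 1):
            features[f"k{k}_pos{i}_{seq[i:i + k]}"] = 1
    return features


def feature_names(length, k_values=(1, 2, 3, 4)):
    """List every possible position-specific k-mer feature in a fixed order."""
    names = []
    for k in k_values:
        for i in range(length - k + 1):
            for kmer in product(BASES, repeat=k):
                names.append(f"k{k}_pos{i}_{''.join(kmer)}")
    return names


def encode(seq, k_values=(1, 2, 3, 4)):
    """Return a fixed-length 0/1 vector for one sequence."""
    present = positional_kmer_features(seq, k_values)
    return [present.get(name, 0) for name in feature_names(len(seq), k_values)]


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt spacer, not from the paper
    vector = encode(guide)
    print(len(vector), sum(vector))
```

Vectors of this kind, stacked row-wise for many guides, form the sort of feature matrix that a tree-based model such as a random forest could take as input. Swapping the binary flag for a numeric descriptor per k-mer would only require changing the value stored in `positional_kmer_features`.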
Keyphrases
- crispr cas
- artificial intelligence
- genome editing
- machine learning
- deep learning
- escherichia coli
- molecular dynamics
- big data
- genome wide
- energy transfer
- neural network
- immune response
- gene expression
- cell free