Nucleic Transformer: Classifying DNA Sequences with Self-Attention and Convolutions
Shujun He, Baizhen Gao, Rushant Sabnis, Qing Sun. Published in: ACS Synthetic Biology (2023)
Much work has been done to apply machine learning and deep learning to genomics tasks, but these applications usually require extensive domain knowledge, and the resulting models provide very limited interpretability. Here, we present the Nucleic Transformer, a conceptually simple but effective and interpretable model architecture that excels in the classification of DNA sequences. The Nucleic Transformer employs self-attention and convolutions on nucleic acid sequences, leveraging two prominent deep learning strategies commonly used in computer vision and natural language analysis. We demonstrate that the Nucleic Transformer can be trained without much domain knowledge to achieve high performance in Escherichia coli promoter classification, viral genome identification, enhancer classification, and chromatin profile predictions.
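To make the architectural idea concrete, below is a minimal PyTorch sketch of a hybrid convolution-plus-self-attention classifier in the spirit the abstract describes. It is not the authors' released implementation: the k-mer window size, model dimension, pooling strategy, and all other hyperparameters are illustrative assumptions, and positional encodings are omitted for brevity.

```python
# Minimal sketch (NOT the authors' code) of the convolution + self-attention
# idea from the abstract: a 1-D convolution aggregates k-mer-like local motifs
# from one-hot encoded DNA, and a Transformer encoder models long-range context.
# All hyperparameters (k=7, d_model=128, etc.) are illustrative assumptions.
import torch
import torch.nn as nn

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq: str) -> torch.Tensor:
    """Encode a DNA string as a (4, len) one-hot tensor."""
    x = torch.zeros(4, len(seq))
    for i, base in enumerate(seq):
        x[BASES[base], i] = 1.0
    return x

class NucleicTransformerSketch(nn.Module):
    def __init__(self, d_model=128, k=7, n_layers=4, n_heads=8, n_classes=2):
        super().__init__()
        # Convolution over k-mer-sized windows: each output position becomes
        # a learned embedding of its local k-mer context.
        self.kmer_conv = nn.Conv1d(4, d_model, kernel_size=k, padding=k // 2)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                # x: (batch, 4, seq_len)
        h = self.kmer_conv(x)            # (batch, d_model, seq_len)
        h = h.transpose(1, 2)            # (batch, seq_len, d_model)
        h = self.encoder(h)              # self-attention across positions
        return self.head(h.mean(dim=1))  # mean-pool, then classify

# Example: score a toy two-class batch (e.g., promoter vs. non-promoter).
model = NucleicTransformerSketch()
batch = torch.stack([one_hot("ACGTACGTACGTACGTACGTACGTACGT")])
logits = model(batch)                    # shape: (1, n_classes)
print(logits.shape)
```

In this sketch the 1-D convolution acts as a learned k-mer featurizer, while the Transformer encoder lets distant positions in the sequence attend to one another, matching the division of labor between convolutions and self-attention that the abstract highlights.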
Keyphrases
- deep learning
- nucleic acid
- machine learning
- artificial intelligence
- escherichia coli
- convolutional neural network
- transcription factor
- gene expression
- genome wide
- sars cov