SeqEnhDL: sequence-based classification of cell type-specific enhancers using deep learning models.
Yupeng Wang, Rosario B Jaime-Lara, Abhrarup Roy, Ying Sun, Xinyue Liu, Paule V Joseph. Published in: BMC Research Notes (2021)
We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of "strong enhancer" chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For each DNA sequence, positional k-mer fold changes (k = 5, 7, 9 and 11), computed relative to randomly selected non-coding sequences at each nucleotide position, were used as features for the deep learning models. Three deep learning models were implemented: a multi-layer perceptron (MLP), a convolutional neural network (CNN) and a recurrent neural network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers (including gkm-SVM and DanQ) in distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL can directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue specificity can be accurately identified from their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
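The abstract does not spell out how the positional k-mer fold changes are computed, so the Python sketch below shows one plausible reading. Its assumptions are ours, not the paper's: k-mer frequencies are pooled over all positions of the enhancer set and of the random non-coding background, smoothed with a pseudocount of 1 and compared as a log2 ratio; each nucleotide position of a query sequence then receives, for every k, the fold change of the k-mer that starts there, giving a length × 4 feature matrix. The helper names (`kmer_counts`, `fold_change_table`, `positional_features`) are hypothetical and do not come from the SeqEnhDL repository.

```python
import math
from collections import Counter

# k values taken from the abstract (k = 5, 7, 9 and 11).
KS = (5, 7, 9, 11)


def kmer_counts(seqs, k):
    """Count every k-mer in a set of sequences, pooled over all positions."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def fold_change_table(fg_seqs, bg_seqs, k, pseudo=1.0):
    """Hypothetical helper: log2 fold change of each k-mer's frequency in
    enhancer (foreground) vs. random non-coding (background) sequences.

    Returns (table, default), where `default` is the value assigned to
    k-mers never seen in either set.
    """
    fg, bg = kmer_counts(fg_seqs, k), kmer_counts(bg_seqs, k)
    n_kmers = 4 ** k  # number of possible ACGT k-mers, used for smoothing
    fg_denom = sum(fg.values()) + pseudo * n_kmers
    bg_denom = sum(bg.values()) + pseudo * n_kmers
    table = {
        kmer: math.log2(((fg[kmer] + pseudo) / fg_denom)
                        / ((bg[kmer] + pseudo) / bg_denom))
        for kmer in set(fg) | set(bg)
    }
    default = math.log2((pseudo / fg_denom) / (pseudo / bg_denom))
    return table, default


def positional_features(seq, tables):
    """Map a sequence to a len(seq) x len(tables) matrix: row i holds, for
    each k, the fold change of the k-mer starting at position i. Positions
    too close to the end of the sequence to fit a full k-mer are padded
    with 0.0 (no enrichment), an assumption of this sketch."""
    seq = seq.upper()
    rows = []
    for i in range(len(seq)):
        row = []
        for k, (table, default) in tables.items():
            kmer = seq[i:i + k]
            row.append(table.get(kmer, default) if len(kmer) == k else 0.0)
        rows.append(row)
    return rows


if __name__ == "__main__":
    enhancers = ["AAAAAAAAAAAA"]   # toy "enhancer" set
    background = ["CCCCCCCCCCCC"]  # toy random non-coding set
    tables = {k: fold_change_table(enhancers, background, k) for k in KS}
    for row in positional_features("AAAAAAA", tables):
        print([round(x, 2) for x in row])
```

A matrix of this shape could be flattened for an MLP or read position by position by a CNN or RNN; the SeqEnhDL code is the authority on the exact feature definition, k-mer alignment and normalization it actually uses.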