An effective deep learning-based approach for splice site identification in gene expression.
Mohsin Ali, Dilawar Shah, Shahid Qazi, Izaz Ahmad Khan, Mohammad Abrar, Sana Zahir
Published in: Science Progress (2024)
A crucial stage in eukaryotic gene expression is mRNA splicing, carried out by a protein assembly known as the spliceosome. This step contributes significantly to producing a correctly functioning gene product. Because eukaryotic genes are interrupted by non-coding introns, splicing removes the introns and joins the exons to create a functional mRNA molecule. However, accurately locating splice sites with molecular biology techniques and other biological approaches is complex and time-consuming. This paper presents a precise and reliable computer-aided diagnosis (CAD) technique for the rapid and correct identification of splice site sequences. The proposed deep learning-based framework uses long short-term memory (LSTM) to extract distinct patterns from RNA sequences, enabling rapid and accurate point mutation sequence mapping. The network takes one-hot encoded sequences as input and learns sequential patterns that effectively identify splice sites. A thorough ablation study of traditional machine learning, one-dimensional convolutional neural network (1D-CNN), and recurrent neural network (RNN) models was conducted. The proposed LSTM network outperformed existing state-of-the-art approaches, improving accuracy by 3% and 2% on the acceptor and donor site datasets, respectively.
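The abstract does not give the network's architecture, window length, or hyperparameters, so the following is only a minimal sketch of the two ideas it names: one-hot encoding of a nucleotide sequence and a single LSTM layer that reads the encoded sequence step by step. The function names, the hidden size, the random untrained weights, and the toy input sequence are all assumptions made for illustration; they are not the authors' implementation.

```python
import math
import random

ALPHABET = "ACGU"  # assumed RNA alphabet; a DNA-style "T" is mapped to "U"


def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide; unknown bases (e.g. N) become all zeros."""
    rows = []
    for base in sequence.upper().replace("T", "U"):
        rows.append([1.0 if base == symbol else 0.0 for symbol in ALPHABET])
    return rows


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, seed=0):
    """Random, untrained weights for the four LSTM gates (hypothetical initialisation)."""
    rng = random.Random(seed)
    width = input_size + hidden_size
    params = {}
    for gate in ("input", "forget", "candidate", "output"):
        params[gate] = {
            "W": [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                  for _ in range(hidden_size)],
            "b": [0.0] * hidden_size,
        }
    return params


def affine(gate_params, z):
    """Compute W @ z + b for one gate using plain Python lists."""
    return [sum(w * v for w, v in zip(row, z)) + bias
            for row, bias in zip(gate_params["W"], gate_params["b"])]


def lstm_forward(rows, params, hidden_size):
    """Run a standard LSTM cell over the encoded sequence; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in rows:
        z = x + h  # concatenate the current input with the previous hidden state
        i = [sigmoid(a) for a in affine(params["input"], z)]
        f = [sigmoid(a) for a in affine(params["forget"], z)]
        g = [math.tanh(a) for a in affine(params["candidate"], z)]
        o = [sigmoid(a) for a in affine(params["output"], z)]
        c = [f[k] * c[k] + i[k] * g[k] for k in range(hidden_size)]
        h = [o[k] * math.tanh(c[k]) for k in range(hidden_size)]
    return h


if __name__ == "__main__":
    window = "CAGGUAAGU"  # toy sequence window, not taken from the paper's datasets
    encoded = one_hot_encode(window)
    state = lstm_forward(encoded, init_lstm(4, 8), hidden_size=8)
    print(len(encoded), "positions encoded; final hidden state has", len(state), "values")
```

In a trained classifier, the final hidden state would typically be passed to a small output layer that scores the window as splice site or non-splice site. A framework such as PyTorch or Keras would normally replace the hand-written cell.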
Keyphrases
- deep learning
- neural network
- convolutional neural network
- gene expression
- machine learning
- bioinformatics analysis
- artificial intelligence
- DNA methylation
- genome wide
- high resolution
- binding protein
- coronary artery disease
- amino acid
- oxidative stress
- loop mediated isothermal amplification
- genome wide identification
- working memory
- RNA-seq
- protein-protein
- small molecule
- DNA damage
- high density
- transcription factor
- energy transfer
- genome wide analysis
- sensitive detection