A Grid Search-Based Multilayer Dynamic Ensemble System to Identify DNA N4-Methylcytosine Using Deep Learning Approach.
Rajib Kumar Halder, Mohammed Nasir Uddin, Md Ashraf Uddin, Sunil Aryal, Md Aminul Islam, Fahima Hossain, Nusrat Jahan, Ansam Khraisat, Ammar Alazab
Published in: Genes (2023)
DNA (deoxyribonucleic acid) N4-methylcytosine (4mC), a kind of epigenetic modification of DNA, is important for modifying gene functions, such as protein interactions, conformation, and stability in DNA, as well as for the control of gene expression throughout cell development and genomic imprinting. It also plays a crucial role in the restriction-modification system. To further understand the function and regulation mechanism of 4mC, it is essential to precisely locate 4mC sites and detect their chromosomal distribution. This research aims to design an efficient, high-throughput, discriminative intelligent computational system that uses the natural language processing method "word2vec" and a multi-configured 1D convolutional neural network (1D CNN) to predict 4mC sites. In this article, we propose a grid search-based multi-layer dynamic ensemble system (GS-MLDS) that can enhance the existing knowledge at each level. Each layer uses a grid search-based weight searching approach to find the optimal accuracy while minimizing computation time and the number of additional layers. We used eight publicly available benchmark datasets collected from different sources to test the proposed model's efficiency. The test accuracies obtained were 0.978, 0.954, 0.944, 0.961, 0.950, 0.973, 0.948, 0.952, 0.961, and 0.980. The proposed model was also compared with 16 distinct models, and the comparison indicates that it can accurately predict 4mC sites.
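The grid search-based weight searching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' GS-MLDS implementation: it shows only a single ensemble layer, the two "models" are hard-coded toy probability lists standing in for 1D CNN outputs, and every name here (`weighted_average`, `accuracy`, `grid_search_weights`, `steps`) is hypothetical. It assumes each base model outputs a probability that a sequence contains a 4mC site and that a small labeled validation set is available.

```python
from itertools import product


def weighted_average(prob_lists, weights):
    """Combine per-model probabilities into one probability per sample."""
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(prob_lists[0]))
    ]


def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def grid_search_weights(prob_lists, labels, steps=5):
    """Try every weight combination on a coarse grid; keep the most accurate."""
    grid = [i / (steps - 1) for i in range(steps)]  # 0.0, 0.25, ..., 1.0
    best_weights, best_acc = None, -1.0
    for weights in product(grid, repeat=len(prob_lists)):
        if sum(weights) == 0:
            continue  # all-zero weights give no usable prediction
        acc = accuracy(weighted_average(prob_lists, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Toy validation data (assumed, not taken from the paper).
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1]   # stand-in for one CNN's outputs
model_b = [0.6, 0.4, 0.7, 0.3, 0.3, 0.45]  # stand-in for another CNN's outputs

best_weights, best_acc = grid_search_weights([model_a, model_b], labels)
print(best_weights, best_acc)
```

The number of candidates grows as `steps ** n_models`, so an exhaustive grid is only practical for a handful of base models or a coarse grid; this may be one reason to keep both the grid and the number of layers small.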
Keyphrases
- neural network
- circulating tumor
- gene expression
- cell free
- single molecule
- high throughput
- convolutional neural network
- deep learning
- dna methylation
- copy number
- single cell
- nucleic acid
- body mass index
- healthcare
- mesenchymal stem cells
- physical activity
- cell therapy
- machine learning
- drinking water
- genome wide
- body weight
- amino acid
- bone marrow