Critical evaluation of web-based DNA N6-methyladenine site prediction tools.
Md Mehedi Hasan, Watshara Shoombuatong, Hiroyuki Kurata, Balachandran Manavalan
Published in: Briefings in Functional Genomics (2021)
DNA N6-methyladenine (6mA) is an epigenetic modification that plays pivotal roles in various biological processes. Accurate genome-wide identification of 6mA sites remains a challenging but important step toward understanding its biological functions. Over the last 5 years, a number of bioinformatics approaches and tools for 6mA site prediction have been established, and some of them are easily accessible as web applications. These tools implement diverse encoding schemes, machine learning algorithms and feature selection methods, yet few systematic performance comparisons of 6mA site predictors have been reported. In this review, 11 publicly available 6mA predictors were evaluated with seven different species-specific datasets (Arabidopsis thaliana, Tolypocladium, Diospyros lotus, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans and Escherichia coli). Of those, few of the species are close homologs, and the remaining datasets are distant sequences. Our independent validation tests demonstrated that the Meta-i6mA and MM-6mAPred models achieved excellent overall performance for A. thaliana, Tolypocladium, S. cerevisiae and D. melanogaster when compared with their counterparts. However, none of the existing methods were suitable for E. coli, C. elegans and D. lotus. The feasibility of the existing predictors is also discussed for the seven species. Our evaluation provides useful guidelines for the development of 6mA site predictors and helps biologists select suitable prediction tools.
Keyphrases
- machine learning
- escherichia coli
- genome wide identification
- saccharomyces cerevisiae
- arabidopsis thaliana
- drosophila melanogaster
- dna methylation
- transcription factor
- deep learning
- single molecule
- gene expression
- lymph node
- artificial intelligence
- rna seq
- genome wide
- multidrug resistant
- single cell
- pseudomonas aeruginosa
- biofilm formation
- nucleic acid