Exploring functional conservation in silico: a new machine learning approach to RNA-editing.
Michał Zawisza-Álvarez, Jesús Peñuela-Melero, Esteban Vegas, Ferran Reverter, Jordi Garcia-Fernàndez, Carlos Herrera-Úbeda
Published in: Briefings in Bioinformatics (2024)
Around 50 years ago, molecular biology opened the path to understanding changes in form, adaptation, complexity, and the basis of human disease through a myriad of reports on gene birth, gene duplication, regulation of gene expression and splicing, and other mechanisms behind gene function. Here, taking advantage of big data and artificial intelligence (AI), we focus on an elusive and intriguing mechanism of gene function regulation, RNA editing, in which a single nucleotide of an RNA molecule is changed, markedly increasing the complexity of the transcriptome and proteome. We present a new-generation approach to assess the functional conservation of the RNA-editing targeting mechanism using two machine learning algorithms, random forest (RF) and bidirectional long short-term memory (biLSTM) neural networks with an attention layer. Combined with RNA-editing data from databases and from variant calling on same-individual RNA-seq and DNA-seq experiments in different species, these algorithms allowed us to predict RNA-editing events from both primary sequence and secondary structure. We then devised a method for assessing conservation or divergence in the molecular mechanisms of editing completely in silico: the cross-testing analysis. This novel method not only helps to understand the conservation of the editing mechanism through evolution but could also set the basis for a better understanding of the adenosine-targeting mechanism in other fields.
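The cross-testing idea can be illustrated with a minimal sketch: fit one model per species, then score each model on held-out sites from every species, so that high cross-species accuracy suggests a shared targeting signal and low accuracy suggests divergence. Everything below is an assumption made for illustration and is not taken from the paper: the species names (`sp_X`, `sp_Y`, `sp_Z`), the synthetic 5-nucleotide windows, the planted upstream-base motifs, and the per-position log-odds scorer, which stands in for the RF and biLSTM models described in the abstract.

```python
import math
import random

BASES = "ACGU"

def make_windows(motif_base, rng, n):
    """Synthetic 5-nt windows centred on an adenosine (toy data, not real sites).
    Edited sites (label 1) usually carry `motif_base` at position -1;
    unedited sites (label 0) never do."""
    others = [b for b in BASES if b != motif_base]
    windows, labels = [], []
    for i in range(n):
        label = i % 2
        w = [rng.choice(BASES) for _ in range(5)]
        w[2] = "A"  # candidate edited adenosine
        if label == 1 and rng.random() < 0.9:
            w[1] = motif_base
        elif label == 0:
            w[1] = rng.choice(others)
        windows.append("".join(w))
        labels.append(label)
    return windows, labels

def train(windows, labels, pseudo=1.0):
    """Per-position log-odds of each base in edited vs. unedited windows
    (a simple stand-in classifier, not the paper's RF or biLSTM)."""
    length = len(windows[0])
    counts = {y: [{b: pseudo for b in BASES} for _ in range(length)] for y in (0, 1)}
    for seq, y in zip(windows, labels):
        for i, b in enumerate(seq):
            counts[y][i][b] += 1
    model = []
    for i in range(length):
        tot1 = sum(counts[1][i].values())
        tot0 = sum(counts[0][i].values())
        model.append({b: math.log((counts[1][i][b] / tot1) / (counts[0][i][b] / tot0))
                      for b in BASES})
    return model

def predict(model, seq):
    """Call a site edited when its summed log-odds score is positive."""
    return 1 if sum(col[b] for col, b in zip(model, seq)) > 0 else 0

def accuracy(model, windows, labels):
    return sum(predict(model, s) == y for s, y in zip(windows, labels)) / len(labels)

def cross_test(datasets):
    """Train on each species, test on the held-out split of every species."""
    models = {sp: train(*d["train"]) for sp, d in datasets.items()}
    return {(a, b): accuracy(models[a], *datasets[b]["test"])
            for a in datasets for b in datasets}

rng = random.Random(0)
# Hypothetical species: sp_X and sp_Y share a motif, sp_Z uses a different one.
datasets = {
    name: {"train": make_windows(base, rng, 400), "test": make_windows(base, rng, 200)}
    for name, base in [("sp_X", "U"), ("sp_Y", "U"), ("sp_Z", "G")]
}
matrix = cross_test(datasets)
for (a, b), acc in sorted(matrix.items()):
    print(f"train {a} -> test {b}: {acc:.2f}")
```

In this toy setup the off-diagonal entries of `matrix` carry the signal of interest: a model trained on one species transfers well to a species with the same planted motif and poorly to one with a different motif.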
Keyphrases
- artificial intelligence
- CRISPR-Cas
- big data
- machine learning
- deep learning
- genome wide
- gene expression
- nucleic acid
- copy number
- neural network
- working memory
- genome wide identification
- RNA-seq
- pregnant women
- cancer therapy
- emergency department
- drug delivery
- single molecule
- data analysis
- circulating tumor
- induced pluripotent stem cells
- adverse drug
- protein kinase
- pregnancy outcomes
- circulating tumor cells