Tailored machine learning models for functional RNA detection in genome-wide screens.
Christopher Klapproth, Siegfried Zötzsche, Felix Kühnl, Jörg Fallmann, Peter F Stadler, Sven Findeiß
Published in: NAR Genomics and Bioinformatics (2023)
The in silico prediction of non-coding and protein-coding genetic loci has received considerable attention in comparative genomics, which aims in particular to identify properties of nucleotide sequences that are informative of their biological role in the cell. We present a software framework for the alignment-based training, evaluation and application of machine learning models with user-defined parameters. Rather than following the one-size-fits-all approach of pervasive in silico annotation pipelines, the framework supports the structured generation and evaluation of models based on arbitrary features and input data, with an emphasis on stable and explainable results. We showcase the software package in a full-genome screen of Drosophila melanogaster and evaluate the results against the well-known but much less flexible program RNAz.
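To make the idea of models built on arbitrary, user-defined alignment features more concrete, the following is a minimal sketch of how such features might be computed for one alignment window. It is not the API of the software package described above. The function names, the `FEATURES` registry and the toy alignment are hypothetical and chosen only for illustration, and the three features (GC content, mean pairwise identity, gap fraction) are generic examples rather than the feature set used by the authors.

```python
from itertools import combinations


def gc_content(alignment):
    """Fraction of G/C among all non-gap characters in the alignment."""
    bases = [c for seq in alignment for c in seq.upper() if c != "-"]
    return sum(c in "GC" for c in bases) / len(bases) if bases else 0.0


def mean_pairwise_identity(alignment):
    """Mean fraction of identical columns over all sequence pairs.

    Columns in which both sequences carry a gap are ignored.
    """
    scores = []
    for a, b in combinations(alignment, 2):
        cols = [(x, y) for x, y in zip(a.upper(), b.upper())
                if not (x == "-" and y == "-")]
        if cols:
            scores.append(sum(x == y for x, y in cols) / len(cols))
    return sum(scores) / len(scores) if scores else 0.0


def gap_fraction(alignment):
    """Fraction of gap characters among all alignment cells."""
    total = sum(len(seq) for seq in alignment)
    return sum(seq.count("-") for seq in alignment) / total if total else 0.0


# Hypothetical registry: any callable that maps an alignment to a number
# can be added here, which is what "user-defined features" means in this sketch.
FEATURES = {
    "gc_content": gc_content,
    "mean_pairwise_identity": mean_pairwise_identity,
    "gap_fraction": gap_fraction,
}


def feature_vector(alignment, features=FEATURES):
    """Evaluate every registered feature on one alignment window."""
    return {name: fn(alignment) for name, fn in features.items()}


# Toy alignment window (made-up sequences, equal length, '-' marks a gap).
window = ["GGCAUCG-AU", "GGCAUCGGAU", "GGUAUCG-AU"]
vec = feature_vector(window)
print(vec)
```

In a genome-wide screen, one such vector would be computed per alignment window and passed to a classifier of the user's choice. Keeping the features in a plain dictionary is only one possible design, but it makes it easy to see which inputs a trained model depends on.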
Keyphrases
- genome wide
- machine learning
- dna methylation
- drosophila melanogaster
- single cell
- big data
- molecular docking
- copy number
- high throughput
- artificial intelligence
- data analysis
- cell therapy
- quality improvement
- electronic health record
- gene expression
- working memory
- deep learning
- loop mediated isothermal amplification
- smoking cessation
- mesenchymal stem cells
- stem cells
- protein protein
- label free
- amino acid
- genome wide association study
- bone marrow
- genome wide association