Manatee: detection and quantification of small non-coding RNAs from next-generation sequencing data.
Joanna E Handzlik, Spyros Tastsoglou, Ioannis S Vlachos, Artemis G Hatzigeorgiou
Published in: Scientific Reports (2020)
Small non-coding RNAs (sncRNAs) play important roles in health and disease. Next Generation Sequencing (NGS) technologies are considered the most powerful and versatile methodologies for exploring small RNA (sRNA) transcriptomes in diverse experimental and clinical studies. Small RNA-Seq (sRNA-Seq) data analysis has proved challenging because of the non-unique genomic origin, short length, and abundant post-transcriptional modifications of sRNA species. Here, we present Manatee, an algorithm for the quantification of sRNA classes and the detection of novel expressed non-coding loci. Manatee combines prior annotation of sRNAs with reliable alignment density information and extensive rescue of usually neglected multimapped reads to provide accurate transcriptome-wide sRNA expression quantification. Comparison of Manatee against state-of-the-art implementations using real and simulated data demonstrates its high accuracy across diverse sRNA classes. Manatee also goes beyond common pipelines by identifying and quantifying expression from unannotated loci and microRNA isoforms (isomiRs). It is user-friendly, can be easily incorporated into pipelines, and provides a simplified output suitable for direct use in downstream analyses and functional studies.
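The idea of rescuing multimapped reads with alignment density information can be illustrated with a minimal sketch. This is not Manatee's actual procedure, which the abstract does not specify; it is a generic proportional-allocation scheme in which each multimapped read is split among its candidate loci in proportion to the uniquely aligned reads already observed there. The function name `rescue_multimappers`, its inputs, and the locus names are all hypothetical.

```python
from collections import defaultdict


def rescue_multimappers(unique_counts, multimapped_reads):
    """Illustrative proportional rescue of multimapped reads (an assumption,
    not the published Manatee algorithm).

    unique_counts: dict mapping locus -> number of uniquely aligned reads.
    multimapped_reads: list of lists; each inner list holds the candidate
        loci of one multimapped read.
    Returns a dict mapping locus -> estimated (possibly fractional) count.
    """
    estimates = defaultdict(float)

    # Start from the evidence given by uniquely aligned reads.
    for locus, count in unique_counts.items():
        estimates[locus] += count

    for candidates in multimapped_reads:
        if not candidates:
            continue  # nothing to assign
        # Weight each candidate locus by its unique-read density.
        weights = [unique_counts.get(locus, 0) for locus in candidates]
        total = sum(weights)
        if total == 0:
            # No density evidence at any candidate: split the read evenly.
            share = 1.0 / len(candidates)
            for locus in candidates:
                estimates[locus] += share
        else:
            for locus, weight in zip(candidates, weights):
                estimates[locus] += weight / total

    return dict(estimates)


# Hypothetical example data.
unique = {"miR-A": 30, "miR-B": 10}
multi = [["miR-A", "miR-B"]] * 4 + [["tRNA-X", "tRNA-Y"]]
result = rescue_multimappers(unique, multi)
```

In this toy input, the four reads shared by `miR-A` and `miR-B` are divided 3:1, following the unique-read counts, while the read mapping to two loci without unique support is split evenly. The total estimated count equals the total number of reads, so no read is discarded.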
Keyphrases
- rna seq
- single cell
- data analysis
- genome wide
- copy number
- poor prognosis
- public health
- electronic health record
- healthcare
- gene expression
- machine learning
- loop mediated isothermal amplification
- high resolution
- real time pcr
- dna methylation
- mental health
- health information
- deep learning
- binding protein
- genome wide association study
- transcription factor
- circulating tumor
- climate change
- long non coding rna
- social media
- health promotion
- human health
- heat stress