UClncR: Ultrafast and comprehensive long non-coding RNA detection from RNA-seq.
Zhifu Sun, Asha Nair, Xianfeng Chen, Naresh Prodduturi, Junwen Wang, Jean-Pierre Kocher. Published in: Scientific Reports (2017)
Long non-coding RNAs (lncRNAs) are a large class of gene transcripts with regulatory functions discovered in recent years. Many more are expected to be revealed with the accumulation of RNA-seq data from diverse types of normal and diseased tissues. However, discovering novel lncRNAs and accurately quantifying known lncRNAs from massive RNA-seq data is not trivial. Herein we describe UClncR, an Ultrafast and Comprehensive lncRNA detection pipeline to tackle the challenge. UClncR takes a standard RNA-seq alignment file, performs transcript assembly, predicts lncRNA candidates, quantifies and annotates both known and novel lncRNA candidates, and generates a convenient report for downstream analysis. The pipeline accommodates both un-stranded and stranded RNA-seq so that lncRNAs overlapping with other genes can be predicted and quantified. UClncR is fully parallelized in a cluster environment yet allows users to run samples sequentially without a cluster. The pipeline can process a typical RNA-seq sample in a matter of minutes and complete hundreds of samples in a matter of hours. Analysis of predicted lncRNAs from two test datasets demonstrated UClncR's accuracy and the relevance of the predicted lncRNAs to the samples' clinical phenotypes. UClncR should significantly facilitate researchers' discovery of novel lncRNAs and is publicly available at http://bioinformaticstools.mayo.edu/research/UClncR .
Keyphrases
- rna seq
- long non coding rna
- single cell
- poor prognosis
- genome wide identification
- genome wide analysis
- high throughput
- transcription factor
- electronic health record
- network analysis
- genome wide
- machine learning
- gene expression
- small molecule
- loop mediated isothermal amplification
- long noncoding rna
- dna methylation
- quantum dots
- artificial intelligence
- bioinformatics analysis