MOCCASIN: a method for correcting for known and unknown confounders in RNA splicing analysis.
Barry Slaff, Caleb M Radens, Paul Jewell, Anupama Jha, Nicholas F Lahens, Gregory R Grant, Andrei Thomas-Tikhonenko, Kristen W Lynch, Yoseph Barash
Published in: Nature Communications (2021)
The effects of confounding factors on gene expression analysis have been extensively studied following the introduction of high-throughput microarrays and, subsequently, RNA sequencing. In contrast, there is a lack of equivalent analysis and tools for RNA splicing. Here we first assess the effect of confounders on both expression and splicing quantifications in two large public RNA-Seq datasets (TARGET, ENCODE). We show that quantifications of splicing variations are affected at least as much as those of gene expression, revealing unwanted sources of variation in both datasets. Next, we develop MOCCASIN, a method to correct the effect of both known and unknown confounders on RNA splicing quantification, and demonstrate MOCCASIN's effectiveness on both synthetic and real data. Code, synthetic data, and corrected datasets are all made available as resources.
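To make the idea of correcting for a known confounder concrete, the sketch below regresses a nuisance covariate out of a features-by-samples matrix using ordinary least squares. It is a generic illustration and not MOCCASIN's algorithm, which this abstract does not describe. The function name `regress_out_known_confounder` and its parameters are hypothetical, and the sketch assumes a single numeric confounder and ignores that splicing quantifications such as PSI are bounded in [0, 1].

```python
import numpy as np


def regress_out_known_confounder(values, confounder, keep):
    """Remove a known confounder from a features x samples matrix by OLS.

    Hypothetical helper for illustration only (not the MOCCASIN API).

    values:     (n_features, n_samples) quantifications
    confounder: (n_samples,) known nuisance covariate, e.g. a 0/1 batch label
    keep:       (n_samples,) covariate of interest whose effect is preserved
    """
    values = np.asarray(values, dtype=float)
    confounder = np.asarray(confounder, dtype=float)
    keep = np.asarray(keep, dtype=float)

    n_samples = values.shape[1]
    # Design matrix columns: intercept, covariate of interest, confounder.
    design = np.column_stack([np.ones(n_samples), keep, confounder])
    # Fit all features at once: design @ coef approximates values.T,
    # so coef has shape (3, n_features).
    coef, *_ = np.linalg.lstsq(design, values.T, rcond=None)
    # Subtract only the fitted confounder term; the intercept and the
    # covariate of interest are left in the data.
    return values - np.outer(coef[2], confounder)
```

Including the covariate of interest in the design matrix is a deliberate choice here: if it were omitted and happened to correlate with the batch label, part of the signal of interest would be attributed to the confounder and removed along with it.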
Keyphrases
- rna seq
- single cell
- high throughput
- gene expression
- healthcare
- randomized controlled trial
- poor prognosis
- magnetic resonance
- dna methylation
- systematic review
- mental health
- copy number
- magnetic resonance imaging
- genome wide
- computed tomography
- big data
- drinking water
- machine learning
- genome wide identification
- electronic health record
- artificial intelligence
- deep learning
- binding protein
- solid state