
RNA splicing analysis using heterogeneous and large RNA-seq datasets.

Jorge Vaquero-Garcia, Joseph K Aicher, San Jewell, Matthew R Gazzara, Caleb M Radens, Anupama Jha, Scott S Norton, Nicholas F Lahens, Gregory R Grant, Yoseph Barash
Published in: Nature Communications (2023)
The ubiquity of RNA-seq has led to many methods that use RNA-seq data to analyze variations in RNA splicing. However, available methods are not well suited for handling heterogeneous and large datasets. Such datasets scale to thousands of samples across dozens of experimental conditions, exhibit increased variability compared to biological replicates, and involve thousands of unannotated splice variants, resulting in increased transcriptome complexity. We describe here a suite of algorithms and tools implemented in the MAJIQ v2 package to address challenges in detection, quantification, and visualization of splicing variations from such datasets. Using both large-scale synthetic data and GTEx v8 as benchmark datasets, we assess the advantages of MAJIQ v2 compared to existing methods. We then apply the MAJIQ v2 package to analyze differential splicing across 2,335 samples from 13 brain subregions, demonstrating its ability to offer insights into brain subregion-specific splicing regulation.
Keyphrases
  • rna seq
  • single cell
  • white matter
  • electronic health record
  • machine learning
  • resting state
  • big data
  • deep learning
  • multiple sclerosis
  • gene expression
  • copy number
  • cerebral ischemia
  • label free
  • real time pcr