Efficient differential expression analysis of large-scale single cell transcriptomics data using dreamlet.
Gabriel E Hoffman, Donghoon Lee, Jaroslav Bendl, Prashant Fnu, Aram Hong, Clara Casey, Marcela Alvia, Zhiping Shao, Stathis Argyriou, Karen Therrien, Sanan Venkatesh, Georgios Voloudakis, Vahram Haroutunian, John F Fullard, Panagiotis Roussos
Published in: Research Square (2023)
Advances in single-cell and single-nucleus transcriptomics have enabled the generation of increasingly large-scale datasets from hundreds of subjects and millions of cells. These studies promise unprecedented insight into the cell-type-specific biology of human disease. Yet performing differential expression analyses across subjects remains difficult, both because these complex studies are hard to model statistically and because the analyses must scale to large datasets. Our open-source R package dreamlet (DiseaseNeurogenomics.github.io/dreamlet) uses a pseudobulk approach based on precision-weighted linear mixed models to identify genes differentially expressed with traits across subjects for each cell cluster. Designed for data from large cohorts, dreamlet is substantially faster and uses less memory than existing workflows, while supporting complex statistical models and controlling the false positive rate. We demonstrate computational and statistical performance on published datasets and on a novel dataset of 1.4M single nuclei from postmortem brains of 150 Alzheimer's disease cases and 149 controls.
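To make the pseudobulk step concrete, the following is a minimal sketch of the general idea: raw counts from all cells that share a subject and a cell cluster are summed into one profile per (subject, cluster) pair. It is written in Python with NumPy purely for illustration and is not dreamlet's API (dreamlet is an R package). The function name `pseudobulk`, its arguments, and the toy data are all hypothetical.

```python
import numpy as np


def pseudobulk(counts, subject_ids, cluster_ids):
    """Sum raw counts over cells sharing a (subject, cluster) pair.

    Hypothetical helper for illustration only, not part of dreamlet.

    counts      : (n_cells, n_genes) array of non-negative counts
    subject_ids : length n_cells sequence of subject labels
    cluster_ids : length n_cells sequence of cell cluster labels

    Returns (keys, sums, n_cells), where keys is a sorted list of
    (subject, cluster) tuples, sums has one row per key, and n_cells
    holds the number of cells that contributed to each row.
    """
    counts = np.asarray(counts)
    pairs = list(zip(subject_ids, cluster_ids))
    keys = sorted(set(pairs))
    index = {key: i for i, key in enumerate(keys)}

    sums = np.zeros((len(keys), counts.shape[1]), dtype=counts.dtype)
    n_cells = np.zeros(len(keys), dtype=int)
    for row, pair in zip(counts, pairs):
        sums[index[pair]] += row      # accumulate counts per gene
        n_cells[index[pair]] += 1     # track how many cells were pooled
    return keys, sums, n_cells


# Toy data (made up): 5 cells, 3 genes, 2 subjects, 2 clusters.
counts = np.array([
    [1, 0, 3],
    [2, 1, 0],
    [0, 4, 1],
    [5, 0, 0],
    [1, 1, 1],
])
subjects = ["s1", "s1", "s1", "s2", "s2"]
clusters = ["neuron", "neuron", "glia", "neuron", "neuron"]

keys, sums, n_cells = pseudobulk(counts, subjects, clusters)
for key, total, n in zip(keys, sums, n_cells):
    print(key, total.tolist(), n)
```

The number of pooled cells is returned alongside the sums because a profile built from more cells is generally measured more precisely, which is the kind of information a precision-weighted model can use. How dreamlet actually computes its weights is not described in this abstract, so no weighting formula is shown here.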
Keyphrases
- single cell
- rna seq
- big data
- high throughput
- electronic health record
- endothelial cells
- induced apoptosis
- genome wide
- case control
- working memory
- cell cycle arrest
- magnetic resonance
- magnetic resonance imaging
- cognitive decline
- stem cells
- cell proliferation
- computed tomography
- randomized controlled trial
- artificial intelligence
- machine learning
- oxidative stress
- bioinformatics analysis
- dna methylation
- transcription factor
- bone marrow
- mild cognitive impairment