eSVD-DE: Cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings.
Kevin Z. Lin, Yixuan Qiu, Kathryn Roeder. Published in: bioRxiv: the preprint server for biology (2023)
Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes in such datasets with numerous individuals. While many methods exist to find DE genes in scRNA data from a limited number of individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to the substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely observed genes. In this paper, we develop eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test for differences in mean expression between case and control individuals. In general, differential testing after dimension reduction inflates the Type I error rate. We overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. Using simulated data, we benchmark our method's accuracy and power gains against other DE methods typically repurposed for analyzing cohort-wide differential expression. We then use eSVD-DE to study cohort-wide differential expression in idiopathic pulmonary fibrosis, ulcerative colitis of varying severity, and autism spectrum disorder. Altogether, eSVD-DE offers a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction.
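The key idea in the final testing step is to compare case and control groups at the level of individuals rather than cells. A simplified sketch of this idea follows. It is not the eSVD-DE procedure: it skips the matrix factorization and covariate adjustment entirely. The Gamma-Poisson conjugate model for each individual's mean expression, the Welch t statistic, and the function names `posterior_mean`, `welch_t`, and `cohort_de_stat` are illustrative assumptions, not the paper's method.

```python
from math import sqrt
from statistics import mean, variance


def posterior_mean(counts, alpha=1.0, beta=1.0):
    """Posterior mean of one individual's Poisson rate for a single gene.

    Assumption (not from the paper): cell counts are i.i.d. Poisson with a
    Gamma(alpha, beta) prior (shape/rate). The posterior is
    Gamma(alpha + sum(y), beta + n), whose mean is returned.
    """
    return (alpha + sum(counts)) / (beta + len(counts))


def welch_t(a, b):
    """Welch's two-sample t statistic; needs >= 2 values per group."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def cohort_de_stat(case_cells, control_cells, alpha=1.0, beta=1.0):
    """Hypothetical helper: compare per-individual posterior means.

    Each argument is a list with one entry per individual, and each entry
    is that individual's list of per-cell counts for one gene. Cells are
    summarized within each individual first, so the test compares
    individuals rather than treating every cell as an independent sample.
    """
    case_means = [posterior_mean(c, alpha, beta) for c in case_cells]
    ctrl_means = [posterior_mean(c, alpha, beta) for c in control_cells]
    return welch_t(case_means, ctrl_means)


if __name__ == "__main__":
    # Toy counts for one gene: three case and three control individuals.
    case = [[3, 4, 5], [4, 4, 4], [5, 6, 4]]
    control = [[1, 0, 2], [1, 1, 1], [0, 2, 1]]
    print(cohort_de_stat(case, control))  # prints 10.0
```

Summarizing each individual before testing means the effective sample size is the number of individuals, not the number of cells. This is the quantity that matters when human variation dominates.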