A unified model for interpretable latent embedding of multi-sample, multi-condition single-cell data.
Ariel Madrigal, Tianyuan Lu, Larisa M. Soto, Hamed S. Najafabadi
Published in: Nature Communications (2024)
Single-cell analysis across multiple samples and conditions requires quantitative modeling of the interplay between the continuum of cell states and the technical and biological sources of sample-to-sample variability. We introduce GEDI, a generative model that identifies latent space variations in multi-sample, multi-condition single-cell datasets and attributes them to sample-level covariates. GEDI enables cross-sample cell state mapping on par with state-of-the-art integration methods, cluster-free differential gene expression analysis along the continuum of cell states, and machine learning-based prediction of sample characteristics from single-cell data. GEDI can also incorporate gene-level prior knowledge to infer pathway and regulatory network activities in single cells. Finally, GEDI extends all these concepts to previously unexplored modalities that require joint consideration of dual measurements, such as the joint analysis of exon inclusion/exclusion reads to model alternative cassette exon splicing, or spliced/unspliced reads to model the mRNA stability landscapes of single cells.
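To make "attributing latent-space variations to sample-level covariates" concrete, here is a toy sketch in Python with NumPy. It assumes a simple linear-Gaussian model in which each sample s has its own gene-by-factor loading matrix Z_s = Z + Σ_c x_{s,c} R_c. Here Z is shared across samples, and each R_c is a hypothetical covariate effect that shifts the loadings in proportion to covariate c. This is not GEDI's actual formulation or software, and the function names (`sample_loadings`, `simulate_cells`, `attribute_to_covariates`) are illustrative only. For simplicity, the attribution step regresses known per-sample loadings on the covariates. A real model would instead have to infer those loadings from the cells themselves.

```python
import numpy as np

# Toy illustration only (assumed linear-Gaussian model, not GEDI's implementation).

def sample_loadings(Z, R, X):
    """Build per-sample loadings Z_s = Z + sum_c X[s, c] * R[c].

    Z: (G, K) shared loadings; R: (C, G, K) hypothetical covariate effects;
    X: (S, C) sample-level covariates. Returns an array of shape (S, G, K).
    """
    return Z[None, :, :] + np.einsum("sc,cgk->sgk", X, R)


def simulate_cells(Zs, n_cells, noise_sd, rng):
    """Draw cells y = Z_s b + eps with b ~ N(0, I) for every sample s."""
    S, G, K = Zs.shape
    data = []
    for s in range(S):
        B = rng.standard_normal((K, n_cells))               # latent cell states
        eps = noise_sd * rng.standard_normal((G, n_cells))  # Gaussian noise
        data.append(Zs[s] @ B + eps)                        # (G, n_cells) expression
    return data


def attribute_to_covariates(Zs, X):
    """Least-squares fit of vec(Z_s) on [1, X_s]; returns (Z_hat, R_hat)."""
    S, G, K = Zs.shape
    design = np.hstack([np.ones((S, 1)), X])  # intercept + covariates, (S, 1 + C)
    targets = Zs.reshape(S, G * K)            # one flattened loading matrix per sample
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    Z_hat = coef[0].reshape(G, K)             # intercept row -> shared loadings
    R_hat = coef[1:].reshape(-1, G, K)        # remaining rows -> covariate effects
    return Z_hat, R_hat


rng = np.random.default_rng(0)
G, K, S, C = 50, 3, 6, 2
Z = rng.standard_normal((G, K))
R = 0.5 * rng.standard_normal((C, G, K))
X = rng.standard_normal((S, C))
Zs = sample_loadings(Z, R, X)
cells = simulate_cells(Zs, n_cells=100, noise_sd=0.1, rng=rng)
Z_hat, R_hat = attribute_to_covariates(Zs, X)
```

Regressing each sample's flattened loadings on an intercept plus the covariates means the intercept row estimates the shared loadings and each remaining row estimates one covariate's effect. With noise-free loadings and a full-rank design (at least C + 1 samples), the recovery is exact.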