Supervised discovery of interpretable gene programs from single-cell data.
Russell Z Kunes, Thomas Walle, Max Land, Tal Nawy, Dana Pe'er
Published in: Nature Biotechnology (2023)
Factor analysis decomposes single-cell gene expression data into a minimal set of gene programs that correspond to processes executed by cells in a sample. However, matrix factorization methods are prone to technical artifacts and poor factor interpretability. We address these concerns with Spectra, an algorithm that combines user-provided gene programs with the detection of novel programs that together best explain expression covariation. Spectra incorporates existing gene sets and cell-type labels as prior biological information, explicitly models cell type and represents input gene sets as a gene-gene knowledge graph using a penalty function to guide factorization toward the input graph. We show that Spectra outperforms existing approaches in challenging tumor immune contexts, as it finds factors that change under immune checkpoint therapy, disentangles the highly correlated features of CD8+ T cell tumor reactivity and exhaustion, finds a program that explains continuous macrophage state changes under therapy and identifies cell-type-specific immune metabolic programs.
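The idea of guiding a factorization toward a gene-gene graph can be illustrated with a toy example. The sketch below is not the Spectra implementation and does not reproduce its model: it swaps in a plain graph-regularized non-negative matrix factorization with multiplicative updates, and it leaves out cell-type modeling and the detection of novel programs. All names (`gene_set_graph`, `objective`, `graph_guided_factorization`, `lam`) are hypothetical and chosen only for illustration. It assumes a non-negative cells-by-genes matrix `X` and gene sets given as lists of column indices.

```python
import numpy as np

def gene_set_graph(gene_sets, n_genes):
    """Adjacency matrix linking genes that share at least one input gene set."""
    A = np.zeros((n_genes, n_genes))
    for genes in gene_sets:
        idx = np.asarray(genes)
        A[np.ix_(idx, idx)] = 1.0
    np.fill_diagonal(A, 0.0)  # no self-edges
    return A

def objective(X, A, W, H, lam):
    """Reconstruction error plus a graph penalty on the gene loadings H."""
    L = np.diag(A.sum(axis=1)) - A  # graph Laplacian of the gene-gene graph
    return np.sum((X - W @ H) ** 2) + lam * np.trace(H @ L @ H.T)

def graph_guided_factorization(X, A, n_factors, lam=1.0, n_iter=200, seed=0):
    """Toy graph-regularized NMF: X (cells x genes) ~ W (cells x factors) @ H (factors x genes).

    The penalty lam * tr(H L H^T) is small when genes connected in the graph
    receive similar loadings, which pulls factors toward the input gene sets.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    D = np.diag(A.sum(axis=1))  # degree matrix
    W = rng.random((n_cells, n_factors))
    H = rng.random((n_factors, n_genes))
    eps = 1e-10  # guards against division by zero
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H
```

A possible usage, again under the assumptions above: build the graph with `A = gene_set_graph([[0, 1, 2], [3, 4]], n_genes=6)`, then call `W, H = graph_guided_factorization(X, A, n_factors=2)` and inspect the rows of `H` as candidate gene programs. Larger values of `lam` tie the loadings more tightly to the input gene sets, while `lam = 0` reduces the sketch to ordinary NMF.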
Keyphrases
- genome wide
- copy number
- gene expression
- single cell
- genome wide identification
- public health
- healthcare
- dna methylation
- machine learning
- rna seq
- magnetic resonance
- adipose tissue
- electronic health record
- computed tomography
- high throughput
- magnetic resonance imaging
- cell death
- genome wide analysis
- density functional theory
- big data
- deep learning
- quality improvement
- quantum dots
- long non coding rna
- bone marrow
- artificial intelligence
- smoking cessation
- molecular dynamics
- cell cycle arrest
- replacement therapy