sccomp: Robust differential composition and variability analysis for single-cell data.

Stefano Mangiola, Alexandra J Roth-Schulze, Marie Trussart, Enrique Zozaya-Valdés, Mengyao Ma, Zijie Gao, Alan F Rubin, Terence P Speed, Heejung Shim, Anthony T Papenfuss
Published in: Proceedings of the National Academy of Sciences of the United States of America (2023)
Cellular omics such as single-cell genomics, proteomics, and microbiomics allow the characterization of tissue and microbial community composition, which can be compared between conditions to identify biological drivers. This strategy has been critical to revealing markers of disease progression, such as cancer and pathogen infection. A dedicated statistical method for differential variability analysis is lacking for cellular omics data, and existing methods for differential composition analysis do not model some compositional data properties, suggesting there is room to improve model performance. Here, we introduce sccomp, a method for differential composition and variability analyses that jointly models data count distribution, compositionality, group-specific variability, and proportion mean-variability association, being aware of outliers. sccomp provides a comprehensive analysis framework that offers realistic data simulation and cross-study knowledge transfer. Here, we demonstrate that mean-variability association is ubiquitous across technologies, highlighting the inadequacy of the very popular Dirichlet-multinomial distribution. We show that sccomp accurately fits experimental data, significantly improving performance over state-of-the-art algorithms. Using sccomp, we identified differential constraints and composition in the microenvironment of primary breast cancer.
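To make the point about the Dirichlet-multinomial distribution concrete, the following is a minimal sketch and not sccomp's model or code. It relies only on the standard Dirichlet result that each proportion's variance is mu_i(1 - mu_i)/(alpha_0 + 1), so once the means and one shared precision alpha_0 are fixed, no freedom remains to fit a separate mean-variability relationship. The four-component composition, the precision value, and the helper names `sample_dirichlet` and `dirichlet_variance` are all hypothetical and chosen for illustration.

```python
import random
import statistics


def sample_dirichlet(alpha, rng):
    """Draw one proportion vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_variance(alpha):
    """Closed-form variance of each proportion under Dirichlet(alpha)."""
    a0 = sum(alpha)
    return [(a / a0) * (1 - a / a0) / (a0 + 1) for a in alpha]


# Hypothetical composition of four cell types (illustrative values only).
mean_props = [0.6, 0.25, 0.1, 0.05]
precision = 50.0  # the single shared concentration, alpha_0
alpha = [m * precision for m in mean_props]

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample_dirichlet(alpha, rng) for _ in range(20000)]

# Variance of each component across the simulated samples.
empirical_var = [statistics.pvariance(column) for column in zip(*samples)]
theoretical_var = dirichlet_variance(alpha)

for m, e, t in zip(mean_props, empirical_var, theoretical_var):
    print(f"mean={m:.2f}  empirical_var={e:.5f}  theoretical_var={t:.5f}")
```

The empirical variances should track the closed-form values closely. Data in which variability depends on the mean in some other way, or differs between groups, cannot be matched by adjusting the one precision parameter alone.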
Keyphrases
  • single cell
  • microbial community
  • electronic health record
  • big data
  • rna seq
  • machine learning
  • healthcare
  • squamous cell carcinoma
  • stem cells
  • young adults
  • wastewater treatment
  • lymph node metastasis
  • anaerobic digestion