Differential abundance testing on single-cell data using k-nearest neighbor graphs.
Emma Dann, Neil C Henderson, Sarah A Teichmann, Michael D Morgan, John C Marioni
Published in: Nature biotechnology (2021)
Current computational workflows for comparative analyses of single-cell datasets typically use discrete clusters as input when testing for differential abundance among experimental conditions. However, clusters do not always provide the appropriate resolution and cannot capture continuous trajectories. Here we present Milo, a scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighborhoods on a k-nearest neighbor graph. Using simulations and single-cell RNA sequencing (scRNA-seq) data, we show that Milo can identify perturbations that are obscured by discretizing cells into clusters, that it maintains false discovery rate control across batch effects and that it outperforms alternative differential abundance testing strategies. Milo identifies the decline of a fate-biased epithelial precursor in the aging mouse thymus and identifies perturbations to multiple lineages in human cirrhotic liver. As Milo is based on a cell-cell similarity structure, it might also be applicable to single-cell data other than scRNA-seq. Milo is provided as an open-source R software package at https://github.com/MarioniLab/miloR.
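To make the idea of partially overlapping neighborhoods on a k-nearest neighbor graph more concrete, the following is a minimal Python sketch, not the miloR implementation. It builds a brute-force k-nearest neighbor graph over toy 2D coordinates, picks a few index cells at random, and counts how many cells from each sample fall into each neighborhood. The function names (`knn_graph`, `neighbourhood_counts`), the random choice of index cells, the toy data, and the sample labels are all assumptions made for illustration; the differential abundance test and false discovery rate control described in the abstract are not reproduced here.

```python
import math
import random
from collections import Counter


def knn_graph(points, k):
    """Map each cell index to the indices of its k nearest cells (Euclidean distance)."""
    graph = {}
    for i, p in enumerate(points):
        # Brute-force distances to every other cell; fine for a toy example.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph


def neighbourhood_counts(points, sample_of_cell, k=5, n_index=4, seed=0):
    """Count cells per sample in neighbourhoods around randomly chosen index cells.

    Assumption: index cells are drawn uniformly at random; this is a
    simplification chosen for the sketch, not a statement about miloR.
    """
    graph = knn_graph(points, k)
    rng = random.Random(seed)
    index_cells = rng.sample(range(len(points)), n_index)
    hoods = {}
    for c in index_cells:
        # A neighbourhood is the index cell plus its k nearest neighbours,
        # so neighbourhoods of nearby index cells can share members.
        members = [c] + graph[c]
        hoods[c] = Counter(sample_of_cell[m] for m in members)
    return hoods


# Hypothetical toy data: 40 cells in 2D, alternating between two samples.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
sample_of_cell = ["ctrl_1" if i % 2 == 0 else "treated_1" for i in range(40)]

hoods = neighbourhood_counts(points, sample_of_cell, k=5, n_index=4)
for index_cell, counts in hoods.items():
    print(index_cell, dict(counts))
```

Each row printed is one neighborhood with its per-sample cell counts. A table of this shape (neighborhoods by samples) is the kind of input a count-based differential abundance test would operate on, in place of the per-cluster counts used by cluster-based workflows.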
Keyphrases
- single cell
- rna seq
- high throughput
- induced apoptosis
- antibiotic resistance genes
- electronic health record
- cell cycle arrest
- genome wide
- big data
- endothelial cells
- small molecule
- data analysis
- endoplasmic reticulum stress
- depressive symptoms
- cell proliferation
- molecular dynamics
- cell death
- stem cells
- machine learning
- oxidative stress
- signaling pathway
- wastewater treatment
- microbial community
- anaerobic digestion
- bone marrow
- convolutional neural network
- induced pluripotent stem cells
- pluripotent stem cells