Binomial models uncover biological variation during feature selection of droplet-based single-cell RNA sequencing.
Breanne Sparta, Timothy Hamilton, Gunalan Natesan, Samuel D Aragones, Eric J Deeds
Published in: PLoS computational biology (2024)
Effective analysis of single-cell RNA sequencing (scRNA-seq) data requires a rigorous distinction between technical noise and biological variation. In this work, we propose a simple feature selection model, termed "Differentially Distributed Genes" or DDGs, where a binomial sampling process for each mRNA species produces a null model of technical variation. Using scRNA-seq data where cell identities have been established a priori, we find that the DDG model of biological variation outperforms existing methods. We demonstrate that DDGs distinguish a validated set of real biologically varying genes, minimize neighborhood distortion, and enable accurate partitioning of cells into their established cell-type groups.
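To make the idea of a binomial null model concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each transcript captured in a cell belongs to gene g with a pooled probability p_g, so that a cell with sequencing depth n_c shows zero counts for that gene with probability (1 - p_g)^n_c. Genes with many more zero-count cells than this null predicts are flagged as candidates for biological variation. The function name `binomial_null_zero_scores`, the z-score statistic, and the simulated data are all illustrative assumptions and do not come from the paper.

```python
import numpy as np


def binomial_null_zero_scores(counts):
    """Score genes by excess zeros relative to a binomial sampling null.

    counts: (cells x genes) array of non-negative integer counts.
    Returns one z-score per gene; large positive values mean the gene is
    absent from more cells than technical sampling alone would predict.
    Hypothetical helper for illustration only.
    """
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1)             # n_c: total counts per cell
    p = counts.sum(axis=0) / counts.sum()  # p_g: pooled gene fraction

    # Null probability that cell c has zero counts of gene g: (1 - p_g)^n_c,
    # computed in log space for numerical stability.
    q = np.exp(depth[:, None] * np.log1p(-p)[None, :])

    expected_zeros = q.sum(axis=0)
    variance = (q * (1.0 - q)).sum(axis=0)  # sum of independent Bernoullis
    observed_zeros = (counts == 0).sum(axis=0)

    return (observed_zeros - expected_zeros) / np.sqrt(np.maximum(variance, 1e-12))


# Simulated example (assumed data): 500 cells, 50 genes with equal abundance.
rng = np.random.default_rng(0)
n_cells, n_genes = 500, 50
depth = rng.integers(100, 400, size=n_cells)
p_true = np.full(n_genes, 1.0 / n_genes)
counts = np.array([rng.multinomial(n, p_true) for n in depth])

# Make gene 0 "biologically variable": switched off in half of the cells.
counts[: n_cells // 2, 0] = 0

z = binomial_null_zero_scores(counts)
top_gene = int(np.argmax(z))
print("gene with largest excess of zeros:", top_gene)
```

In this toy setting the gene that was switched off in half of the cells stands out, while genes whose counts arise from sampling alone score near zero. A real analysis would need a principled significance threshold and multiple-testing correction, which this sketch leaves out.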
Keyphrases
- single cell
- rna seq
- high throughput
- genome wide
- machine learning
- induced apoptosis
- electronic health record
- big data
- air pollution
- deep learning
- physical activity
- high resolution
- gene expression
- neural network
- bioinformatics analysis
- cell death
- dna methylation
- mass spectrometry
- cell therapy
- artificial intelligence
- bone marrow
- oxidative stress