Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes.
Haynes Heaton, Arthur M Talman, Andrew J Knights, Maria Imaz, Daniel J Gaffney, Richard Durbin, Martin Hemberg, Mara N K Lawniczak
Published in: Nature Methods (2020)
Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA. Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.
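The idea of grouping cells by the variants seen in their reads can be illustrated with a toy sketch. This is not souporcell's implementation, whose details the abstract does not give. It is a minimal hard-assignment clustering under assumed conditions: per-cell alternate and reference read counts at a handful of variants are already available, the number of donors k is known, and there are no doublets and no ambient RNA. The function names, the pseudocounts and the farthest-first seeding are all illustrative assumptions.

```python
import math


def allele_fraction(alt_count, ref_count):
    # Smoothed alt-allele fraction; the pseudocounts keep log() finite.
    return (alt_count + 1.0) / (alt_count + ref_count + 2.0)


def cluster_by_genotype(alt, ref, k, n_iter=50):
    """Toy genotype clustering (hypothetical helper, not souporcell's API).

    alt[c][v] and ref[c][v] are read counts for cell c at variant v.
    Returns one cluster label per cell.
    """
    n_cells, n_vars = len(alt), len(alt[0])
    profiles = [[allele_fraction(alt[c][v], ref[c][v]) for v in range(n_vars)]
                for c in range(n_cells)]

    def dist(p, q):
        return sum(abs(x - y) for x, y in zip(p, q))

    # Deterministic farthest-first seeding: start at cell 0, then add the
    # cell whose profile is farthest from every seed chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n_cells),
                         key=lambda c: min(dist(profiles[c], profiles[s])
                                           for s in seeds)))
    frac = [list(profiles[s]) for s in seeds]

    labels = None
    for _ in range(n_iter):
        # Assign each cell to the cluster with the highest binomial
        # log-likelihood of its observed alt/ref counts.
        new_labels = []
        for c in range(n_cells):
            scores = [sum(alt[c][v] * math.log(frac[j][v])
                          + ref[c][v] * math.log(1.0 - frac[j][v])
                          for v in range(n_vars))
                      for j in range(k)]
            new_labels.append(max(range(k), key=scores.__getitem__))
        if new_labels == labels:
            break
        labels = new_labels
        # Re-estimate each cluster's alt-allele fraction from pooled counts.
        for j in range(k):
            for v in range(n_vars):
                a = sum(alt[c][v] for c in range(n_cells) if labels[c] == j)
                r = sum(ref[c][v] for c in range(n_cells) if labels[c] == j)
                frac[j][v] = allele_fraction(a, r)
    return labels


# Made-up counts: cells 0, 1, 4 carry the alt allele at variants 0-1,
# cells 2, 3, 5 carry it at variants 2-3. Coverage is sparse, as in scRNA-seq.
alt = [[3, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0],
       [0, 0, 0, 3], [2, 1, 0, 0], [0, 0, 1, 2]]
ref = [[0, 0, 2, 0], [0, 0, 0, 3], [2, 0, 0, 0],
       [0, 3, 0, 0], [0, 0, 1, 1], [1, 2, 0, 0]]

labels = cluster_by_genotype(alt, ref, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Cells 0 and 1 share no covered variant with informative overlap, yet they end up together because both are far more likely under the same pooled cluster profile. A realistic tool would also have to handle heterozygous sites, doublets and ambient RNA, which this sketch ignores.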
Keyphrases
- single cell
- rna seq
- air pollution
- induced apoptosis
- high throughput
- particulate matter
- loop mediated isothermal amplification
- cell cycle arrest
- real time pcr
- label free
- gene expression
- transcription factor
- electronic health record
- endoplasmic reticulum stress
- climate change
- big data
- drinking water
- cell proliferation
- oxidative stress
- nucleic acid
- heat shock
- stem cells
- data analysis
- cell therapy
- machine learning