Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis.

Abhishek K Sarkar, Matthew Stephens
Published in: Nature Genetics (2021)
The high proportion of zeros in typical single-cell RNA sequencing datasets has led to widespread but inconsistent use of terminology such as dropout and missing data. Here, we argue that much of this terminology is unhelpful and confusing, and outline simple ideas to help to reduce confusion. These include: (1) observed single-cell RNA sequencing counts reflect both true gene expression levels and measurement error, and carefully distinguishing between these contributions helps to clarify thinking; and (2) method development should start with a Poisson measurement model, rather than more complex models, because it is simple and generally consistent with existing data. We outline how several existing methods can be viewed within this framework and highlight how these methods differ in their assumptions about expression variation. We also illustrate how our perspective helps to address questions of biological interest, such as whether messenger RNA expression levels are multimodal among cells.
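The separation of measurement and expression described above can be illustrated with a minimal simulation sketch. It assumes a Poisson measurement model in which the observed count for a cell is Poisson with mean equal to a per-cell size factor times the true relative expression level, and it compares two hypothetical expression models: constant expression across cells, and Gamma-distributed expression (which gives negative binomial counts marginally). The function name, parameter names, and numeric values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def simulate_counts(n_cells=5000, mean_expr=1e-4, size_factor=10_000,
                    gamma_shape=None, seed=0):
    """Simulate observed counts for one gene (illustrative sketch only).

    Expression model (assumed): true relative expression is either constant
    across cells (gamma_shape=None) or Gamma-distributed with mean mean_expr.
    Measurement model (assumed): observed count ~ Poisson(size_factor * expression).
    """
    rng = np.random.default_rng(seed)
    if gamma_shape is None:
        # No expression variation: every cell has the same true level.
        expr = np.full(n_cells, mean_expr)
    else:
        # Gamma expression variation with mean mean_expr.
        expr = rng.gamma(shape=gamma_shape, scale=mean_expr / gamma_shape,
                         size=n_cells)
    # Poisson measurement error applied on top of the true expression.
    counts = rng.poisson(size_factor * expr)
    return expr, counts


# Same measurement model, two different expression models.
_, counts_const = simulate_counts(gamma_shape=None)
_, counts_gamma = simulate_counts(gamma_shape=0.5)

zero_frac_const = float(np.mean(counts_const == 0))
zero_frac_gamma = float(np.mean(counts_gamma == 0))

# Under constant expression the expected zero fraction is exp(-s * lambda).
expected_const = float(np.exp(-10_000 * 1e-4))

print(f"zeros, constant expression: {zero_frac_const:.3f} "
      f"(Poisson expectation {expected_const:.3f})")
print(f"zeros, Gamma expression:    {zero_frac_gamma:.3f}")
```

In this sketch both settings share the same measurement process and the same mean expression, so any difference in the fraction of zeros comes from the assumed expression variation rather than from a separate "dropout" mechanism.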
Keyphrases
  • single cell
  • rna seq
  • poor prognosis
  • gene expression
  • high throughput
  • electronic health record
  • binding protein
  • induced apoptosis
  • dna methylation
  • big data
  • long non coding rna
  • cell cycle arrest
  • signaling pathway