Accelerated dimensionality reduction of single-cell RNA sequencing data with fastglmpca.

Eric Weine, Peter Carbonetto, Matthew Stephens
Published in: bioRxiv : the preprint server for biology (2024)
Motivated by theoretical and practical issues that arise when applying Principal Components Analysis (PCA) to count data, Townes et al. introduced "Poisson GLM-PCA", a variation of PCA adapted to count data, as a tool for dimensionality reduction of single-cell RNA sequencing (RNA-seq) data. However, fitting GLM-PCA is computationally challenging. Here we study this problem and show that a simple algorithm, which we call "Alternating Poisson Regression" (APR), produces better-quality fits in less time than existing algorithms. APR is also memory-efficient and lends itself to parallel implementation on multi-core processors, both of which help when handling large single-cell RNA-seq data sets. We illustrate the benefits of this approach on two published single-cell RNA-seq data sets. The new algorithms are implemented in an R package, fastglmpca.
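The alternating idea behind APR can be illustrated with a small sketch. It assumes the basic Poisson GLM-PCA model Y_ij ~ Poisson(λ_ij) with log λ_ij = Σ_k L_ik F_jk, with no row or column offsets or fixed covariates. With F held fixed, each row of L is the coefficient vector of an ordinary Poisson regression of Y[i, :] on F, and the roles are swapped to update F. The function names below are hypothetical, and each regression is solved here with damped Newton steps; this is not the fastglmpca implementation or its solver. Because each row's regression depends only on that row of Y and the fixed factor matrix, the row updates within a half-step are independent, which is what makes them natural to parallelize.

```python
import numpy as np


def poisson_loglik(Y, L, F):
    """Poisson log-likelihood of Y under log(Lambda) = L @ F.T (log(y!) constant dropped)."""
    eta = L @ F.T
    return float(np.sum(Y * eta - np.exp(eta)))


def fit_poisson_regression(y, X, b, n_iter=5):
    """Improve coefficients b of a Poisson GLM (log link, no intercept) by damped Newton steps."""
    def loglik(beta):
        with np.errstate(over="ignore"):
            eta = X @ beta
            return y @ eta - np.exp(eta).sum()

    current = loglik(b)
    for _ in range(n_iter):
        mu = np.exp(X @ b)
        grad = X.T @ (y - mu)               # gradient of the log-likelihood
        info = X.T @ (mu[:, None] * X)      # negative Hessian (Fisher information)
        step = np.linalg.solve(info + 1e-8 * np.eye(len(b)), grad)
        t = 1.0
        # Backtracking: halve the step until the log-likelihood does not decrease.
        while t > 1e-10:
            candidate = b + t * step
            value = loglik(candidate)
            if value >= current:
                b, current = candidate, value
                break
            t /= 2
        else:
            break  # no acceptable step found; stop early
    return b


def fit_glmpca_apr(Y, K, n_outer=50, seed=0):
    """Hypothetical alternating Poisson regression for Poisson GLM-PCA (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    L = 0.01 * rng.standard_normal((n, K))
    F = 0.01 * rng.standard_normal((m, K))
    history = [poisson_loglik(Y, L, F)]
    for _ in range(n_outer):
        # With F fixed, each row of L is an independent Poisson regression of Y[i, :] on F.
        for i in range(n):
            L[i] = fit_poisson_regression(Y[i], F, L[i])
        # With L fixed, each row of F is an independent Poisson regression of Y[:, j] on L.
        for j in range(m):
            F[j] = fit_poisson_regression(Y[:, j], L, F[j])
        history.append(poisson_loglik(Y, L, F))
    return L, F, history
```

Since every row update can only increase that row's share of the log-likelihood and leaves the other rows' terms unchanged, the recorded `history` should be non-decreasing up to floating-point error. The simple backtracking rule is used here only to guarantee that property in a short example.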
Keyphrases
  • single cell
  • rna seq
  • electronic health record
  • high throughput
  • big data
  • machine learning
  • primary care
  • deep learning
  • randomized controlled trial
  • data analysis