Interpreting structure in sequence count data with differential expression analysis allowing for grades of membership.
Peter Carbonetto, Kaixuan Luo, Abhishek K Sarkar, Anthony Hung, Karl Tayeb, Sebastian Pott, Matthew Stephens
Published in: bioRxiv : the preprint server for biology (2023)
"Parts-based" representations of data, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual "parts" remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing the cells to have partial membership in multiple groups (or topics). We call this new approach grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
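To make the idea of partial membership concrete, here is a minimal sketch for a single gene. It assumes a Poisson model in which cell i has count x_i ~ Poisson(s_i * sum_k L[i, k] * p[k]), where s_i is the cell's total count, L[i, k] is its membership in topic k, and p[k] is the gene's expression rate in topic k. The model form, the EM fitting scheme, and the names `fit_gom_rates` and `lfc` are assumptions made for illustration; the abstract does not specify the authors' model or implementation, and this sketch omits the uncertainty estimates a real analysis would need.

```python
import numpy as np

def fit_gom_rates(x, s, L, n_iter=200):
    """Estimate per-topic rates p for one gene by EM (illustrative sketch).

    Assumed model: x[i] ~ Poisson(s[i] * sum_k L[i, k] * p[k]).
    x: counts per cell, s: size factors per cell, L: cells x topics memberships.
    Assumes every topic has some membership mass (column sums of L are positive).
    """
    n, K = L.shape
    # Start every topic at the pooled rate (small offset avoids an all-zero start).
    p = np.full(K, (x.sum() + 1e-8) / s.sum())
    # Total exposure of each topic: sum_i s[i] * L[i, k].
    exposure = (s[:, None] * L).sum(axis=0)
    for _ in range(n_iter):
        mu = L @ p  # expected rate per unit size factor, per cell
        # E-step: split each cell's count among topics in proportion to L[i, k] * p[k].
        z = (x / np.maximum(mu, 1e-12))[:, None] * L * p
        # M-step: attributed counts divided by exposure.
        p = z.sum(axis=0) / exposure
    return p

# Simulated example: two topics, true log2 fold change of 3 between them.
rng = np.random.default_rng(0)
n = 2000
L = rng.dirichlet([0.5, 0.5], size=n)            # partial memberships
s = rng.integers(1000, 5000, size=n).astype(float)
p_true = np.array([0.001, 0.008])
x = rng.poisson(s * (L @ p_true)).astype(float)

p_hat = fit_gom_rates(x, s, L)
lfc = np.log2(p_hat[1] / p_hat[0])               # topic 2 versus topic 1
print("estimated rates:", p_hat, "log2 fold change:", lfc)
```

When every row of L is one-hot (each cell belongs to exactly one group), the update reduces to the total count in each group divided by the total size factor in that group, which is the usual group-wise rate estimate. In that sense the sketch treats standard differential expression as the hard-membership special case.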