
The art of seeing the elephant in the room: 2D embeddings of single-cell data do make sense.

Jan Lause, Philipp Berens, Dmitry Kobak
Published in: bioRxiv : the preprint server for biology (2024)
A recent paper in PLOS Computational Biology (Chari and Pachter, 2023) claimed that t-SNE and UMAP embeddings of single-cell datasets fail to capture true biological structure. The authors argued that such embeddings are as arbitrary and as misleading as forcing the data into an elephant shape. Here we show that this conclusion was based on inadequate and limited metrics of embedding quality. More appropriate metrics quantifying neighborhood and class preservation reveal the elephant in the room: while t-SNE and UMAP embeddings of single-cell data do not represent high-dimensional distances, they can nevertheless provide biologically relevant information.
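To make "neighborhood and class preservation" concrete, here is a minimal sketch of two metrics commonly used for this purpose: kNN recall (how many high-dimensional nearest neighbors stay neighbors in the embedding) and leave-one-out kNN classification accuracy in the embedding. The abstract does not give the exact definitions used in the paper, so the function names (`knn_indices`, `knn_recall`, `knn_accuracy`), the Euclidean distance, and the default `k=10` are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_indices(points, k):
    """For each point, indices of its k nearest neighbors (brute force, Euclidean).

    Hypothetical helper for illustration; O(n^2 log n), so only for small inputs.
    """
    neighbors = []
    for i, p in enumerate(points):
        # Exclude the point itself, then sort the rest by distance to it.
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def knn_recall(high_dim, embedding, k=10):
    """Neighborhood preservation: mean fraction of each point's k nearest
    neighbors in the original space that are also among its k nearest
    neighbors in the embedding (assumed definition)."""
    nn_high = knn_indices(high_dim, k)
    nn_low = knn_indices(embedding, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return sum(shared) / (k * len(shared))


def knn_accuracy(embedding, labels, k=10):
    """Class preservation: leave-one-out accuracy of a majority-vote
    kNN classifier that sees only the embedding coordinates."""
    nn = knn_indices(embedding, k)
    correct = 0
    for i, neigh in enumerate(nn):
        predicted = Counter(labels[j] for j in neigh).most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(labels)
```

Both functions return a value between 0 and 1. An embedding identical to the input has a kNN recall of exactly 1, while an embedding unrelated to the input has a recall near k / (n - 1) for n points. Neither metric depends on whether pairwise distances are reproduced, which is the distinction the abstract draws.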
Keyphrases
  • single cell
  • RNA-seq
  • electronic health record
  • high throughput
  • big data
  • physical activity
  • healthcare
  • gene expression
  • DNA methylation
  • HIV infected
  • data analysis
  • artificial intelligence
  • deep learning