
Fast model-free standardization and integration of single-cell transcriptomics data.

Yang Xu, Rafael Kramann, Rachel Patton McCord, Sikander Hayat
Published in: Research square (2023)
Single-cell transcriptomics datasets from the same anatomical sites generated by different research labs are becoming increasingly common. However, fast and computationally inexpensive tools for standardizing cell-type annotation and integrating data are still needed to increase research inclusivity. To standardize cell-type annotation and integrate single-cell transcriptomics datasets, we have built a fast model-free integration method named MASI (Marker-Assisted Standardization and Integration). MASI first identifies putative cell-type markers from reference data through an ensemble approach. It then uses these markers to convert the gene expression matrix into a cell-type score matrix, which serves both cell-type annotation and data integration. Because it integrates data through cell-type markers instead of model inference, MASI can annotate approximately one million cells on a personal laptop, providing a computationally inexpensive alternative for the single-cell community. We benchmark MASI against other well-established methods and demonstrate that it outperforms them in speed, while its performance on both data integration and cell-type annotation is comparable or even superior to that of these existing methods. To harness knowledge from single-cell atlases, we present three case studies that cover integration across biological conditions, surveyed participants, and research groups, respectively.
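The conversion from a gene expression matrix to a cell-type score matrix can be illustrated with a minimal sketch. This is not MASI's implementation: the scoring rule (mean log-normalized expression over each cell type's markers), the function names `cell_type_scores` and `annotate`, and the toy marker genes are all assumptions made for illustration, and the ensemble marker-identification step is skipped by supplying the markers directly.

```python
import numpy as np

def cell_type_scores(expr, genes, markers):
    """Convert a cells x genes count matrix into a cells x cell-types score matrix.

    Assumed scoring rule (not taken from the paper): mean log-normalized
    expression over the marker genes of each cell type.
    """
    expr = np.asarray(expr, dtype=float)
    gene_index = {g: i for i, g in enumerate(genes)}

    # Library-size normalization to 10,000 counts per cell, then log1p.
    totals = expr.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    norm = np.log1p(expr / totals * 1e4)

    cell_types = sorted(markers)
    scores = np.zeros((expr.shape[0], len(cell_types)))
    for j, ct in enumerate(cell_types):
        # Keep only the markers that are present in this dataset.
        cols = [gene_index[g] for g in markers[ct] if g in gene_index]
        if cols:
            scores[:, j] = norm[:, cols].mean(axis=1)
    return scores, cell_types

def annotate(scores, cell_types):
    """Assign each cell the cell type with the highest score."""
    return [cell_types[k] for k in scores.argmax(axis=1)]

# Toy data: 3 cells x 3 genes, with hypothetical one-gene marker lists.
genes = ["CD3E", "CD19", "LYZ"]
markers = {"T cell": ["CD3E"], "B cell": ["CD19"], "Monocyte": ["LYZ"]}
expr = [[10, 0, 1],
        [0, 8, 0],
        [1, 0, 12]]

scores, cell_types = cell_type_scores(expr, genes, markers)
labels = annotate(scores, cell_types)
```

Because every dataset is projected onto the same cell-type axes, score matrices from different datasets share one feature space, which is what allows marker-based integration without fitting a model.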
Keyphrases
  • single cell
  • rna seq
  • high throughput
  • electronic health record
  • gene expression
  • big data
  • healthcare
  • mental health
  • induced apoptosis
  • oxidative stress
  • deep learning
  • working memory