scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer.
Jing Xu, De-Shuang Huang, Xiujun Zhang
Published in: Advanced science (Weinheim, Baden-Wurttemberg, Germany) (2024)
Transformer-based models have revolutionized single-cell RNA-seq (scRNA-seq) data analysis. However, their applicability is challenged by the complexity and scale of single-cell multi-omics data. Here, a novel single-cell multi-modal/multi-task transformer (scmFormer) is proposed to fill the existing gap in integrating single-cell proteomics with other omics data. Systematic benchmarking demonstrates that scmFormer excels at integrating large-scale single-cell multimodal data and heterogeneous multi-batch paired multi-omics data, while preserving shared information across batches and distinct biological information. scmFormer achieves a 54.5% higher average F1 score than the second-best method in transferring cell-type labels from single-cell transcriptomics to proteomics data. On COVID-19 datasets, scmFormer successfully integrates over 1.48 million cells on a personal computer. Moreover, scmFormer outperforms existing methods in generating the unmeasured modality and is well suited to spatial multi-omics data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single-cell multi-omics data.
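The label-transfer result above is reported as an average F1 score. The abstract does not say how the average is taken, so the following is only a minimal sketch under the assumption that it is a macro average, that is, the unweighted mean of per-cell-type F1 scores. The function name `macro_f1` and the toy cell-type labels are hypothetical illustrations, not taken from scmFormer.

```python
from collections import Counter


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class F1 over the classes present in true_labels.

    Assumption: "average F1" means macro averaging; the abstract does not specify.
    """
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            true_pos[truth] += 1
        else:
            false_pos[pred] += 1   # predicted class gains a false positive
            false_neg[truth] += 1  # true class gains a false negative

    scores = []
    for cell_type in sorted(set(true_labels)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * true_pos[cell_type] + false_pos[cell_type] + false_neg[cell_type]
        scores.append(2 * true_pos[cell_type] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: labels transferred from a transcriptomics reference to proteomics cells
true_types = ["B", "B", "T", "T", "NK", "NK"]
transferred = ["B", "B", "T", "NK", "NK", "NK"]
score = macro_f1(true_types, transferred)
print(round(score, 4))
```

Macro averaging weights every cell type equally, so rare populations count as much as abundant ones. A cell-count-weighted average would be an equally plausible reading of the abstract.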
Keyphrases
- single cell
- rna seq
- high throughput
- electronic health record
- data analysis
- big data
- mass spectrometry
- coronavirus disease
- oxidative stress
- gene expression
- machine learning
- deep learning
- induced apoptosis
- endoplasmic reticulum stress
- pain management
- dna methylation
- health information
- cell death
- artificial intelligence
- cell proliferation