Imputing abundance of over 2500 surface proteins from single-cell transcriptomes with context-agnostic zero-shot deep ensembles.
Ruoqiao Chen, Jiayu Zhou, Bin Chen
Published in: bioRxiv : the preprint server for biology (2024)
Cell surface proteins serve as primary drug targets and cell identity markers. The emergence of techniques like CITE-seq has enabled simultaneous quantification of surface protein abundance and transcript expression for multimodal data analysis within individual cells. The published data have been used to train machine learning models that predict surface protein abundance based solely on transcript expression. However, the small number of proteins predicted and the poor generalization of these computational approaches across diverse contexts, such as different tissues or disease states, impede their widespread adoption. Here we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA-seq), a context-agnostic zero-shot deep ensemble model, which enables the large-scale prediction of cell surface protein abundance and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER, including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer.
Keyphrases
- single cell
- rna seq
- cell surface
- high throughput
- data analysis
- antibiotic resistance genes
- binding protein
- machine learning
- poor prognosis
- protein protein
- gene expression
- amino acid
- induced apoptosis
- electronic health record
- big data
- emergency department
- long non coding rna
- chronic pain
- oxidative stress
- artificial intelligence
- drug induced
- cell cycle arrest
- bioinformatics analysis