SCINA: A Semi-Supervised Subtyping Algorithm of Single Cells and Bulk Samples.
Ze Zhang, Danni Luo, Xue Zhong, Jin Huk Choi, Yuanqing Ma, Stacy Wang, Elena Mahrt, Wei Guo, Eric W Stawiski, Zora Modrusan, Somasekar Seshagiri, Payal Kapur, Gary C Hon, James Brugarolas, Tao Wang
Published in: Genes (2019)
Advances in single-cell RNA sequencing (scRNA-Seq) have allowed for comprehensive analyses of single-cell data. However, current analyses of scRNA-Seq data usually start from unsupervised clustering or visualization. These methods ignore prior knowledge of transcriptomes and the probable structures of the data. Moreover, cell identification relies heavily on subjective and possibly inaccurate human inspection afterwards. To address these analytical challenges, we developed SCINA (Semi-supervised Category Identification and Assignment), a semi-supervised model that exploits previously established gene signatures using an expectation-maximization (EM) algorithm. SCINA is applicable to scRNA-Seq and flow cytometry/CyTOF data, as well as other data of similar format. We applied SCINA to a wide range of datasets and showed its accuracy, stability and efficiency, which exceeded those of most popular unsupervised approaches. SCINA discovered an intermediate stage of oligodendrocytes from mouse brain scRNA-Seq data. SCINA also detected immune cell population changes in cytometry data in a genetically-engineered mouse model. Furthermore, SCINA performed well with bulk gene expression data. Specifically, we identified a new kidney tumor clade with similarity to FH-deficient tumors (FHD), which we refer to as FHD-like tumors (FHDL). Overall, SCINA provides both methodological advances and biological insights from perspectives different from traditional analytical methods.
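To make the idea of signature-guided EM concrete, the following is a minimal sketch and not the SCINA implementation. It assumes a simplified model in which each signature gene follows a Gaussian with a "high" mean in cells of its own type and a "low" mean in all other cells, with one variance per gene. The function name `em_signature_assign`, its parameters, and the initialization from quantiles are all hypothetical choices made for illustration; details of the published method, such as how cells matching no signature are handled, are not modeled here.

```python
import numpy as np

def em_signature_assign(expr, signatures, n_iter=50):
    """Assign cells to types with a toy signature-guided EM (illustrative only).

    expr:       (n_genes, n_cells) expression matrix restricted to signature genes.
    signatures: list of lists; signatures[k] holds row indices of type k's genes.
    Returns (labels, resp), where resp[k, n] is the posterior of type k for cell n.
    """
    n_genes, n_cells = expr.shape
    n_types = len(signatures)

    # ind[g, k] = 1 if gene g belongs to the signature of type k.
    ind = np.zeros((n_genes, n_types))
    for k, genes in enumerate(signatures):
        ind[genes, k] = 1.0

    # Assumed initialization: low/high modes from per-gene quantiles.
    mu_low = np.quantile(expr, 0.1, axis=1)
    mu_high = np.quantile(expr, 0.9, axis=1)
    var = expr.var(axis=1) + 1e-6
    pi = np.full(n_types, 1.0 / n_types)

    for _ in range(n_iter):
        # E-step: posterior probability of each type for each cell.
        mean = ind * mu_high[:, None] + (1.0 - ind) * mu_low[:, None]  # (G, K)
        diff = expr[:, None, :] - mean[:, :, None]                     # (G, K, N)
        loglik = -0.5 * (diff ** 2 / var[:, None, None]).sum(axis=0)   # (K, N)
        logp = np.log(pi + 1e-12)[:, None] + loglik
        logp -= logp.max(axis=0, keepdims=True)  # stabilize the softmax
        resp = np.exp(logp)
        resp /= resp.sum(axis=0, keepdims=True)

        # M-step: re-estimate modes, variances, and type proportions.
        w_high = ind @ resp          # (G, N): weight of the "high" mode
        w_low = 1.0 - w_high
        mu_high = (w_high * expr).sum(axis=1) / (w_high.sum(axis=1) + 1e-12)
        mu_low = (w_low * expr).sum(axis=1) / (w_low.sum(axis=1) + 1e-12)
        sq = (w_high * (expr - mu_high[:, None]) ** 2
              + w_low * (expr - mu_low[:, None]) ** 2)
        var = sq.sum(axis=1) / n_cells + 1e-6
        pi = resp.mean(axis=1)

    return resp.argmax(axis=0), resp
```

In this sketch the prior knowledge enters only through `ind`, which fixes which genes are expected to be elevated in which type. That is the sense in which the procedure is semi-supervised rather than a plain mixture-model clustering.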
Keyphrases
- single cell
- rna seq
- machine learning
- electronic health record
- big data
- gene expression
- genome wide
- high throughput
- mouse model
- flow cytometry
- dna methylation
- stem cells
- healthcare
- mesenchymal stem cells
- induced apoptosis
- data analysis
- oxidative stress
- depressive symptoms
- transcription factor
- bone marrow
- mass spectrometry
- cell therapy
- endoplasmic reticulum stress
- artificial intelligence
- wild type