Clustering single-cell RNA sequencing data via iterative smoothing and self-supervised discriminative embedding.
Jinxin Xie, Shanshan Ruan, Mingyan Tu, Zhen Yuan, Jianguo Hu, HongLin Li, Shiliang Li
Published in: Oncogene (2024)
Single-cell transcriptome sequencing (scRNA-seq) is a high-throughput technique for studying gene expression at the single-cell level. Clustering is a commonly used step in scRNA-seq data analysis that helps researchers identify cell types and uncover interactions between cells. However, choosing a robust similarity metric for clustering remains an open challenge because of the complex underlying structure of the data and the noise inherent in data acquisition. Here, we propose a deep clustering method for scRNA-seq data called scRISE (scRNA-seq Iterative Smoothing and self-supervised discriminative Embedding model) to address this challenge. The model consists of two main modules: an iterative smoothing module based on graph autoencoders, which alternately denoises the data and refines the pairwise similarity so that cell structural features are gradually incorporated and the data information is enriched; and a self-supervised discriminative embedding module with an adaptive similarity threshold, which partitions samples into the correct clusters. On seventeen scRNA-seq datasets, our approach improved the quality of data representation and clustering compared with several state-of-the-art deep learning clustering methods. Furthermore, applying scRISE to biological analysis of the head and neck squamous cell carcinoma (HNSCC) dataset revealed 62 informative genes, highlighting their potential roles as therapeutic targets and biomarkers.
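The two-module idea described above can be illustrated with a deliberately simplified sketch. This is not the scRISE implementation: the graph autoencoder is swapped for plain k-nearest-neighbour averaging, the similarity is assumed to be cosine similarity, and the function names (`iterative_smoothing`, `confident_pairs`) and the fixed thresholds are hypothetical choices made for illustration. The abstract does not specify how the adaptive threshold is scheduled, so the thresholds here are simply passed in by the caller.

```python
import numpy as np


def cosine_similarity(X):
    """Pairwise cosine similarity between rows (cells) of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)  # guard against all-zero rows
    return Xn @ Xn.T


def knn_graph(S, k):
    """Row-normalized, symmetrized kNN adjacency built from a similarity matrix."""
    n = S.shape[0]
    A = np.zeros((n, n), dtype=float)
    # Take the k+1 most similar cells per row (a cell is normally its own top hit).
    idx = np.argsort(-S, axis=1)[:, : k + 1]
    rows = np.repeat(np.arange(n), k + 1)
    A[rows, idx.ravel()] = 1.0
    A = np.maximum(A, A.T)  # make the graph undirected
    return A / A.sum(axis=1, keepdims=True)


def iterative_smoothing(X, k=5, n_iter=3):
    """Alternate between refining the similarity graph and smoothing the data.

    Assumption: neighbour averaging stands in for the graph autoencoder
    described in the abstract.
    """
    H = np.asarray(X, dtype=float)
    for _ in range(n_iter):
        S = cosine_similarity(H)   # refine pairwise similarity
        P = knn_graph(S, k)        # rebuild the cell-cell graph
        H = P @ H                  # denoise by averaging over neighbours
    return H, cosine_similarity(H)


def confident_pairs(S, upper, lower):
    """Self-supervised pair labels: similar if S >= upper, dissimilar if S <= lower.

    Pairs that fall between the two thresholds are left unlabeled.
    """
    return S >= upper, S <= lower
```

In a training loop, the positive and negative masks would act as pseudo-labels for a discriminative embedding, and one plausible (assumed) adaptive scheme would move `upper` and `lower` toward each other as training progresses so that more pairs become labeled.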
Keyphrases
- single cell
- rna seq
- high throughput
- data analysis
- electronic health record
- gene expression
- big data
- machine learning
- deep learning
- dna methylation
- healthcare
- magnetic resonance imaging
- stem cells
- minimally invasive
- endoplasmic reticulum stress
- computed tomography
- quality improvement
- pi3k akt
- bone marrow
- cell proliferation
- convolutional neural network
- transcription factor
- network analysis
- social media
- human health