Attention-based deep clustering method for scRNA-seq cell type identification.
Shenghao Li, Hui Guo, Simai Zhang, Yizhou Li, Menglong Li
Published in: PLoS Computational Biology (2023)
Single-cell RNA sequencing (scRNA-seq) technology provides higher resolution of cellular differences than bulk RNA sequencing and reveals cellular heterogeneity in biological research. Analysis of scRNA-seq datasets is premised on assigning cells to subpopulations. When an appropriate reference, such as specific marker genes or a single-cell reference atlas, is not available, unsupervised clustering becomes the predominant option. However, the inherent sparsity and high dimensionality of scRNA-seq datasets pose specific analytical challenges to traditional clustering methods. Various deep learning-based methods have therefore been proposed to address these challenges. Because each of these methods addresses only part of the problem, a more comprehensive method is needed. In this article, we propose a novel scRNA-seq data clustering method named AttentionAE-sc (Attention fusion AutoEncoder for single-cell). It combines two different scRNA-seq clustering strategies through an attention mechanism: zero-inflated negative binomial (ZINB)-based methods, which deal with the impact of dropout events, and graph autoencoder (GAE)-based methods, which rely on information from neighboring cells to guide dimension reduction. Through an iterative fusion of denoising and topological embeddings, AttentionAE-sc acquires clustering-friendly cell representations in which similar cells lie closer together in the hidden embedding. Compared with several state-of-the-art baseline methods, AttentionAE-sc demonstrated excellent clustering performance on 16 real scRNA-seq datasets without the need to specify the number of groups. Additionally, AttentionAE-sc learned improved cell representations and exhibited enhanced stability and robustness. Furthermore, AttentionAE-sc achieved remarkable identification in a breast cancer single-cell atlas dataset and provided valuable insights into the heterogeneity among different cell subtypes.
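To make the ZINB component mentioned above concrete, the following is a minimal sketch of the standard zero-inflated negative binomial log-likelihood for a single gene's counts, written in plain Python. It is not taken from the AttentionAE-sc code: the function names (`nb_log_pmf`, `zinb_log_pmf`, `zinb_nll`) and the parameter names (`mu` for the mean, `theta` for the dispersion, `pi` for the dropout probability) are assumptions chosen for illustration, and the parameters are fixed scalars here, whereas a ZINB-based autoencoder would typically predict them per cell and per gene.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu and dispersion theta (mean/dispersion form)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a ZINB model.

    With probability pi the count is a structural zero (dropout);
    otherwise it is drawn from the negative binomial.
    """
    if x == 0:
        # A zero can come from dropout or from the NB itself.
        nb_zero = math.exp(nb_log_pmf(0, mu, theta))
        return math.log(pi + (1.0 - pi) * nb_zero)
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


def zinb_nll(counts, mu, theta, pi):
    """Mean negative log-likelihood of a list of counts (illustrative loss)."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts) / len(counts)


if __name__ == "__main__":
    # Hypothetical sparse counts for one gene across eight cells.
    example_counts = [0, 0, 0, 3, 0, 1, 0, 7]
    print(zinb_nll(example_counts, mu=2.0, theta=1.5, pi=0.3))
```

The extra mass `pi` placed on zero is what lets the model treat many observed zeros as dropout events rather than as evidence of truly absent expression.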
Keyphrases
- single cell
- rna seq
- working memory
- high throughput
- deep learning
- induced apoptosis
- convolutional neural network
- healthcare
- genome wide
- machine learning
- cell death
- health information
- dna methylation
- electronic health record
- gene expression
- magnetic resonance
- bone marrow
- single molecule
- cell cycle arrest
- mass spectrometry
- signaling pathway
- oxidative stress
- cell proliferation
- image quality
- pi3k akt
- data analysis