scLEGA: an attention-based deep clustering method with a tendency for low expression of genes on single-cell RNA-seq data.
Zhenze Liu, Yingjian Liang, Guohua Wang, Tianjiao Zhang
Published in: Briefings in Bioinformatics (2024)
Single-cell RNA sequencing (scRNA-seq) enables the exploration of biological heterogeneity among different cell types within tissues at single-cell resolution. Inferring cell types within tissues is foundational for downstream research. Most existing methods for cell type inference from scRNA-seq data use highly variable genes (HVGs) with higher expression levels as clustering features, overlooking the contribution of HVGs with lower expression levels. To address this, we have designed a novel cell type inference method for scRNA-seq data, termed scLEGA. scLEGA employs a novel zero-inflated negative binomial (ZINB) loss function that fully considers the contribution of genes with lower expression levels, and it combines two distinct scRNA-seq clustering strategies through a multi-head attention mechanism. The first is a low-expression-optimized denoising autoencoder, based on the novel ZINB model, which extracts low-dimensional features and handles dropout events. The second is a graph autoencoder (GAE) built on a graph convolutional network (GCN), which leverages neighbor information to guide dimensionality reduction. The iterative fusion of the denoising and topological embeddings in scLEGA yields cluster-friendly cell representations in the hidden embedding, where similar cells are brought closer together. Compared with 12 state-of-the-art cell type inference methods on 15 scRNA-seq datasets, scLEGA demonstrates superior performance in clustering accuracy, scalability, and stability. The scLEGA code is freely available at https://github.com/Masonze/scLEGA-main.
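To make the role of the ZINB loss concrete, the following is a minimal sketch of the standard zero-inflated negative binomial negative log-likelihood for a single count. It is not the modified, low-expression-oriented loss that scLEGA introduces, whose exact form the abstract does not give. The function names (`nb_log_pmf`, `zinb_nll`) and the parameter names (`mu` for the mean, `theta` for the dispersion, `pi` for the dropout probability) are illustrative assumptions.

```python
import math


def nb_log_pmf(x, mu, theta, eps=1e-10):
    """Log-probability of count x under a negative binomial
    with mean mu and dispersion theta (assumed parameterization)."""
    log_theta_mu = math.log(theta + mu + eps)
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * (math.log(theta + eps) - log_theta_mu)
        + x * (math.log(mu + eps) - log_theta_mu)
    )


def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Negative log-likelihood of count x under a standard ZINB model.

    pi is the probability that an observed zero is a dropout
    (a structural zero) rather than a draw from the NB component.
    """
    log_nb = nb_log_pmf(x, mu, theta, eps)
    if x == 0:
        # A zero is either a dropout or a genuine NB zero.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb) + eps)
    # A non-zero count can only come from the NB component.
    return -(math.log(1.0 - pi + eps) + log_nb)


if __name__ == "__main__":
    # Toy values: a lowly expressed gene (mu = 0.5) observed as 0 and as 3.
    for count in (0, 3):
        print(count, zinb_nll(count, mu=0.5, theta=1.5, pi=0.3))
```

In a denoising autoencoder of the kind described above, `mu`, `theta`, and `pi` would typically be produced per gene and per cell by the decoder, and the loss would be averaged over all entries of the count matrix. How scLEGA reweights this loss toward lowly expressed genes is specified in the paper and repository, not here.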
Keyphrases
- single cell
- RNA-seq
- poor prognosis
- high throughput
- genome wide
- binding protein
- electronic health record
- gene expression
- working memory
- big data
- computed tomography
- induced apoptosis
- convolutional neural network
- magnetic resonance imaging
- oxidative stress
- machine learning
- magnetic resonance
- stem cells
- deep learning
- single molecule
- bone marrow
- genome wide identification
- artificial intelligence
- cell cycle arrest
- endoplasmic reticulum stress
- PI3K/Akt
- neural network