Dimensionality Reduction of Single-Cell RNA Sequencing Data by Combining Entropy and Denoising AutoEncoder.

Xiaoshu Zhu, Jian Li, Yongchang Lin, Liquan Zhao, Jianxin Wang, Xiaoqing Peng
Published in: Journal of computational biology : a journal of computational molecular cell biology (2022)
ABSTRACT Single-cell RNA sequencing (scRNA-seq) can present cellular heterogeneity at higher resolution when measuring the gene expression in an individual cell. However, there are still some computational problems in scRNA-seq data, including high dimensionality, high sparseness, and high noise. To solve them, dimensionality reduction is essential as it reduces dimensions and also removes most of the zeros and noise. Therefore, we propose a hybrid dimensionality reduction algorithm for scRNA-seq data by integrating binning-based entropy and a denoising autoencoder, named ScEDA. In ScEDA, a novel binning-based entropy estimation method is performed to select efficient genes, while removing noise. For each gene, binning-based entropy is designed to describe the differences in its expression across all cells, that is, the distribution of expression of each gene in all cells. Genes are regarded as inefficient and removed when they achieve low binning-based entropy. Moreover, by combining Kullback-Leibler (KL) divergence with the autoencoder, the objective function is reconstructed to maximize the similarity in distribution between input data and reconstructed data. Furthermore, by adding Poisson-distributed noise to the original input data, the denoising autoencoder is used to improve robustness. Compared with three other clustering methods, ScEDA provides superior average performance on 16 real scRNA-seq datasets, with obvious enhancement in large-scale datasets.
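The three ingredients named in the abstract (binning-based entropy for gene selection, Poisson-distributed input noise, and a KL-divergence term comparing input and reconstruction) can be illustrated with a minimal NumPy sketch. This is not the authors' ScEDA implementation. The equal-width binning, the number of bins, the entropy threshold, the additive form of the noise, the rate `lam`, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def binning_entropy(expr, n_bins=10):
    """Shannon entropy (in nats) of one gene's expression across cells,
    estimated from an equal-width histogram (binning scheme is an assumption)."""
    counts, _ = np.histogram(expr, bins=n_bins)
    p = counts[counts > 0] / counts.sum()  # drop empty bins to avoid log(0)
    return float(-(p * np.log(p)).sum())


def select_genes(X, n_bins=10, threshold=0.1):
    """X has shape (cells, genes). Returns a boolean mask that keeps genes
    whose binning entropy exceeds a hypothetical threshold, plus the entropies."""
    ent = np.array([binning_entropy(X[:, j], n_bins) for j in range(X.shape[1])])
    return ent > threshold, ent


def add_poisson_noise(X, lam=1.0, seed=None):
    """Corrupt the input with additive Poisson noise, as a denoising
    autoencoder would see it during training (additive form is an assumption)."""
    rng = np.random.default_rng(seed)
    return X + rng.poisson(lam, size=X.shape)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two non-negative vectors after normalizing each
    to sum to 1; eps guards against division by zero and log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float((p * np.log(p / q)).sum())


# Toy matrix: gene 0 is constant across cells, gene 1 varies widely.
X = np.column_stack([np.zeros(100), np.arange(100.0)])
mask, ent = select_genes(X)
X_selected = X[:, mask]               # only the informative gene survives
X_noisy = add_poisson_noise(X_selected, lam=1.0, seed=0)
```

In this toy case the constant gene falls into a single bin and gets zero entropy, so it is filtered out, while the evenly spread gene reaches the maximum entropy for ten bins. In a full pipeline, `X_noisy` would be the encoder input and a term such as `kl_divergence` would be added to the reconstruction loss; how ScEDA weights and combines these terms is not stated in the abstract.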