SUSCC: Secondary Construction of Feature Space based on UMAP for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data.
Hai-Yun Wang, Jian-Ping Zhao, Chun-Hou Zheng
Published in: Interdisciplinary sciences, computational life sciences (2021)
Clustering is a common method for identifying cell types in single-cell analysis, but the growing size of scRNA-seq datasets makes single-cell clustering increasingly challenging. There is therefore an urgent need for a faster and more accurate clustering method for large-scale scRNA-seq data. In this paper, we propose a new method for single-cell clustering. First, a count matrix is constructed through normalization and gene filtering. Second, the raw data of the gene expression matrix are projected onto a feature space built by a secondary construction of the feature space based on UMAP (Uniform Manifold Approximation and Projection). Third, the resulting low-dimensional matrix is randomly divided, according to a given proportion, into two sub-matrices, one for clustering and one for classification. Finally, one subset is clustered with the k-means algorithm, and the other subset is then classified with the k-nearest neighbor algorithm based on the clustering results. Experimental results show that our method clusters scRNA-seq datasets effectively.
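The last two steps (random split, k-means on one subset, k-nearest neighbor classification of the other) can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the function name `cluster_then_classify`, the split proportion, the number of neighbors, and the seed are all assumptions, and the low-dimensional embedding is taken as an already computed input rather than produced by the UMAP-based construction described above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier


def cluster_then_classify(embedding, n_clusters, cluster_fraction=0.5,
                          n_neighbors=5, seed=0):
    """Label every cell: k-means on a random subset, k-NN for the rest.

    `embedding` is an (n_cells, n_dims) array in the low-dimensional
    feature space. All default values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_cells = embedding.shape[0]

    # Randomly divide the cells into a clustering subset and a
    # classification subset according to `cluster_fraction`.
    order = rng.permutation(n_cells)
    n_cluster_cells = int(round(cluster_fraction * n_cells))
    cluster_idx = order[:n_cluster_cells]
    classify_idx = order[n_cluster_cells:]

    # Cluster the first subset with k-means.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_labels = kmeans.fit_predict(embedding[cluster_idx])

    labels = np.empty(n_cells, dtype=int)
    labels[cluster_idx] = cluster_labels

    # Classify the remaining cells with k-NN trained on the k-means labels.
    if classify_idx.size:
        knn = KNeighborsClassifier(n_neighbors=n_neighbors)
        knn.fit(embedding[cluster_idx], cluster_labels)
        labels[classify_idx] = knn.predict(embedding[classify_idx])
    return labels


# Usage on synthetic data: three well-separated groups in 2-D.
demo_rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
demo_embedding = np.vstack(
    [c + 0.5 * demo_rng.standard_normal((100, 2)) for c in centers]
)
demo_labels = cluster_then_classify(demo_embedding, n_clusters=3)
```

Only the clustering subset goes through k-means, so in this sketch the cost of that step depends on the chosen fraction of cells rather than on the full dataset; the remaining cells receive labels through nearest-neighbor lookup.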
Keyphrases
- single cell
- rna seq
- machine learning
- deep learning
- high throughput
- gene expression
- big data
- electronic health record
- neural network
- dna methylation
- high resolution
- mesenchymal stem cells
- genome wide
- stem cells
- data analysis
- bone marrow
- magnetic resonance imaging
- mass spectrometry
- computed tomography
- high density
- copy number
- cell therapy