FEED: a feature selection method based on gene expression decomposition for single cell clustering.
Chao Zhang, Zhi-Wei Duan, Yun-Pei Xu, Jin Liu, Hong-Dong Li
Published in: Briefings in Bioinformatics (2023)
Single-cell clustering is a critical step in downstream biological analysis, and clustering performance can be improved by extracting cell-type-specific genes. State-of-the-art feature selection methods usually score the importance of each gene individually, without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and the heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance and obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED to various scRNA-seq datasets, including large ones, followed by different common clustering algorithms, significantly improves the accuracy of cell-type identification. The source code for FEED is freely available at https://github.com/genemine/FEED.
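The first step described above, decomposing a gene's expression levels into Gaussian components, can be illustrated with a generic one-dimensional Gaussian mixture fitted by expectation-maximization. This is a minimal sketch and not the FEED implementation: the function name `fit_gmm_1d`, the choice of two components, the quantile-based initialization, and the simulated expression values are all assumptions made for illustration. The abstract does not specify how FEED chooses or fits its components.

```python
import math
import random


def fit_gmm_1d(values, k=2, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Hypothetical helper for illustration only (not FEED's code).
    Returns a list of (weight, mean, variance) tuples sorted by mean.
    """
    n = len(values)
    ordered = sorted(values)
    # Assumed initialization: means at evenly spaced quantiles,
    # a shared variance, and equal mixing weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(values) / n
    var0 = max(sum((v - grand_mean) ** 2 for v in values) / n, 1e-6)
    variances = [var0] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [
                w * math.exp(-((v - m) ** 2) / (2 * s)) / math.sqrt(2 * math.pi * s)
                for w, m, s in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            variances[j] = max(
                sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj,
                1e-6,  # variance floor keeps components from collapsing
            )

    return sorted(zip(weights, means, variances), key=lambda c: c[1])


# Simulated expression of one gene across 400 cells: a "low" and a "high"
# subpopulation (made-up values, standing in for normalized expression).
rng = random.Random(0)
expression = [rng.gauss(0.0, 0.5) for _ in range(200)]
expression += [rng.gauss(5.0, 0.5) for _ in range(200)]

components = fit_gmm_1d(expression, k=2)
for weight, mean, variance in components:
    print(f"weight={weight:.2f} mean={mean:.2f} var={variance:.2f}")
```

A gene whose fitted components are well separated, as in this simulated case, has a clearly multimodal expression distribution, which is the kind of distribution-level information the abstract says single-gene importance scores overlook. How FEED turns such components into gene correlations and importance thresholds is described in the paper and the linked repository, not in this sketch.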
Keyphrases
- single cell
- rna seq
- genome wide
- genome wide identification
- gene expression
- dna methylation
- machine learning
- genome wide analysis
- bioinformatics analysis
- poor prognosis
- copy number
- deep learning
- high throughput
- transcription factor
- electronic health record
- neural network
- long non coding rna
- healthcare
- artificial intelligence
- big data