Minimal gene set discovery in single-cell mRNA-seq datasets with ActiveSVM.
Xiaoqiao Chen, Sisi Chen, Matt Thomson. Published in: Nature Computational Science (2022)
Sequencing costs currently prohibit the application of single-cell mRNA-seq to many biological and clinical analyses. Targeted single-cell mRNA-seq reduces sequencing costs by profiling reduced gene sets that capture biological information with a minimal number of genes. Here we introduce an active learning method that identifies minimal but highly informative gene sets that enable the identification of cell types, physiological states and genetic perturbations in single-cell data using a small number of genes. Our active feature selection procedure generates minimal gene sets from single-cell data by employing an active support vector machine (ActiveSVM) classifier. We demonstrate that ActiveSVM feature selection identifies gene sets that enable ~90% cell-type classification accuracy across datasets such as cell atlases and disease-characterization studies. The discovery of small but highly informative gene sets should reduce the number of measurements needed to apply single-cell mRNA-seq to clinical tests, therapeutic discovery and genetic screens.
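The abstract describes the procedure only at a high level. As a rough illustration of the idea, here is a minimal sketch of greedy, active-learning-style gene selection. It assumes binary labels (+1/-1), a pure-Python linear SVM trained by Pegasos-style subgradient descent on the hinge loss, and a simple scoring rule: at each round, the candidate gene whose addition correctly classifies the most currently misclassified cells is added. The function names (`train_linear_svm`, `predict`, `active_gene_selection`), their parameters and the scoring rule are hypothetical; this is not the authors' ActiveSVM implementation, which handles multiple cell types and whose exact gene-selection criterion is not reproduced here.

```python
import random
from typing import List, Optional, Sequence

Matrix = List[List[float]]  # rows = cells, columns = genes


def train_linear_svm(X: Matrix, y: Sequence[int], genes: Sequence[int],
                     lam: float = 0.01, epochs: int = 20, seed: int = 0) -> List[float]:
    """Pegasos-style subgradient descent on the hinge loss (hypothetical helper).

    Uses only the columns listed in `genes`, plus a constant 1.0 feature that
    acts as a (regularized) bias term. Labels must be +1 / -1.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(genes) + 1)
    for t in range(1, epochs * len(X) + 1):
        i = rng.randrange(len(X))
        x = [X[i][g] for g in genes] + [1.0]
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regularizer
        if margin < 1.0:  # hinge loss active: step toward y_i * x_i
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, x)]
    return w


def predict(w: List[float], X: Matrix, genes: Sequence[int]) -> List[int]:
    """Sign of the linear score; the last weight is the bias."""
    preds = []
    for row in X:
        score = sum(wj * row[g] for wj, g in zip(w, genes)) + w[-1]
        preds.append(1 if score >= 0 else -1)
    return preds


def active_gene_selection(X: Matrix, y: Sequence[int], n_genes: int, **svm_kwargs) -> List[int]:
    """Greedily add genes, scoring candidates only on currently misclassified cells."""
    n_cells, n_total = len(X), len(X[0])
    selected: List[int] = []
    while len(selected) < min(n_genes, n_total):
        active: List[int] = []
        if selected:
            w = train_linear_svm(X, y, selected, **svm_kwargs)
            preds = predict(w, X, selected)
            active = [i for i in range(n_cells) if preds[i] != y[i]]
        if not active:  # first round, or nothing misclassified: use every cell
            active = list(range(n_cells))
        best_gene: Optional[int] = None
        best_score = -1
        for g in range(n_total):
            if g in selected:
                continue
            genes = selected + [g]
            w = train_linear_svm(X, y, genes, **svm_kwargs)
            preds = predict(w, X, genes)
            score = sum(preds[i] == y[i] for i in active)
            if score > best_score:
                best_gene, best_score = g, score
        selected.append(best_gene)
    return selected


# Toy data (made up for illustration): only gene 2 separates the two cell types;
# genes 0 and 1 are uninformative and gene 3 is constant.
y = [1, 1, 1, 1, -1, -1, -1, -1]
X = [
    [1, 1, 2, 0.5], [-1, 1, 2, 0.5], [1, -1, 2, 0.5], [-1, -1, 2, 0.5],
    [1, 1, -2, 0.5], [-1, 1, -2, 0.5], [1, -1, -2, 0.5], [-1, -1, -2, 0.5],
]
print(active_gene_selection(X, y, n_genes=2))  # gene 2 is selected first
```

Restricting the score to misclassified cells is what makes the loop "active": each new gene is chosen for the cells the current gene set still gets wrong, rather than for the data set as a whole. Retraining one SVM per candidate gene is simple but expensive, so a practical version for datasets with thousands of genes would need a cheaper scoring step.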