Comprehensive single-cell RNA-seq analysis using deep interpretable generative modeling guided by biological hierarchy knowledge.
He-Gang ChenYuyin LuZhiming DaiYuedong YangQing LiYanghui RaoPublished in: Briefings in bioinformatics (2024)
Recent advances in microfluidics and sequencing technologies allow researchers to explore cellular heterogeneity at single-cell resolution. In recent years, deep learning frameworks, such as generative models, have brought great changes to the analysis of transcriptomic data. Nevertheless, relying on the potential space of these generative models alone is insufficient to generate biological explanations. In addition, most of the previous work based on generative models is limited to shallow neural networks with one to three layers of latent variables, which may limit the capabilities of the models. Here, we propose a deep interpretable generative model called d-scIGM for single-cell data analysis. d-scIGM combines sawtooth connectivity techniques and residual networks, thereby constructing a deep generative framework. In addition, d-scIGM incorporates hierarchical prior knowledge of biological domains to enhance the interpretability of the model. We show that d-scIGM achieves excellent performance in a variety of fundamental tasks, including clustering, visualization, and pseudo-temporal inference. Through topic pathway studies, we found that d-scIGM-learned topics are better enriched for biologically meaningful pathways compared to the baseline models. Furthermore, the analysis of drug response data shows that d-scIGM can capture drug response patterns in large-scale experiments, which provides a promising way to elucidate the underlying biological mechanisms. Lastly, in the melanoma dataset, d-scIGM accurately identified different cell types and revealed multiple melanin-related driver genes and key pathways, which are critical for understanding disease mechanisms and drug development.
Keyphrases
- single cell
- rna seq
- data analysis
- high throughput
- deep learning
- healthcare
- neural network
- machine learning
- electronic health record
- big data
- risk assessment
- gene expression
- white matter
- multiple sclerosis
- artificial intelligence
- mesenchymal stem cells
- dna methylation
- bone marrow
- bioinformatics analysis
- genome wide analysis