Methods for the analysis of time-series single-cell expression data (scRNA-Seq) either do not use information about transcription factors (TFs) and their targets or only study it as a post-processing step. Using such information can both improve the accuracy of the reconstructed model and cell assignments and reveal how and when the process is regulated. We developed the Continuous-State Hidden Markov Models TF (CSHMM-TF) method, which integrates probabilistic modeling of scRNA-Seq data with the ability to assign TFs to specific activation points in the model. TFs are assumed to influence the emission probabilities for cells assigned to later time points, which allows us to identify not only the TFs controlling each path but also their order of activation. We tested CSHMM-TF on several mouse and human datasets. As we show, the method identified known and novel TFs for all processes, the assigned activation times agree with both expression information and prior knowledge, and combinatorial predictions are supported by known interactions. We also show that CSHMM-TF improves upon prior methods that do not use TF-gene interactions.
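The abstract does not give the emission model, so the following is only an illustrative sketch of the stated idea that a TF's activation point shapes the emission probabilities of cells placed later on a path. It assumes Gaussian emissions, a smooth exponential-style interpolation between a path's start and end expression, and that a TF's target genes stay at their start value until the TF's activation point. All of these are assumptions, not the published CSHMM-TF formulation. `Path`, `gene_mean`, `log_emission`, `assign_position` and the `rate`, `sigma` and `grid` parameters are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Path:
    """Hypothetical path (edge) in the cell-state model."""
    start: dict[str, float]  # mean expression of each gene at the path's start
    end: dict[str, float]    # mean expression of each gene at the path's end
    # Assumed encoding: TF name -> (activation point in [0, 1], set of target genes)
    tf_activations: dict[str, tuple[float, set[str]]] = field(default_factory=dict)


def gene_mean(path: Path, gene: str, t: float, rate: float = 3.0) -> float:
    """Mean expression of `gene` for a cell at position t in [0, 1] on `path`."""
    # Assumption: a TF's targets only begin to change after its activation point.
    onset = 0.0
    for point, targets in path.tf_activations.values():
        if gene in targets:
            onset = max(onset, point)
    if t <= onset:
        return path.start[gene]
    # Rescale the remaining progress to [0, 1], then interpolate smoothly
    # (w = 0 at the onset, w = 1 at the end of the path).
    s = (t - onset) / (1.0 - onset)
    w = (1.0 - math.exp(-rate * s)) / (1.0 - math.exp(-rate))
    return path.start[gene] + w * (path.end[gene] - path.start[gene])


def log_emission(path: Path, cell: dict[str, float], t: float, sigma: float = 1.0) -> float:
    """Gaussian log-likelihood of a cell's expression values at position t."""
    ll = 0.0
    for gene, x in cell.items():
        mu = gene_mean(path, gene, t)
        ll += -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
    return ll


def assign_position(path: Path, cell: dict[str, float], grid: int = 101) -> float:
    """Place a cell at the grid position on `path` with the highest emission likelihood."""
    ts = [i / (grid - 1) for i in range(grid)]
    return max(ts, key=lambda t: log_emission(path, cell, t))


# Usage: gene "B" is a target of the hypothetical "TF1", activated halfway along the path.
path = Path(start={"A": 0.0, "B": 0.0}, end={"A": 4.0, "B": 4.0},
            tf_activations={"TF1": (0.5, {"B"})})
print(gene_mean(path, "B", 0.4))                        # 0.0: TF1 not yet active
print(assign_position(path, {"A": 4.0, "B": 4.0}))      # 1.0: end of the path
```

In this toy version, a TF's activation point changes the likelihood only for its target genes and only for cells placed beyond that point. This mirrors, in a simplified way, why activation order becomes identifiable from where cells are assigned along a path.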