Biologically informed NeuralODEs for genome-wide regulatory dynamics.
Intekhab HossainViola FanfaniJohn QuackenbushRebekka BurkholzPublished in: bioRxiv : the preprint server for biology (2023)
Models that are formulated as ordinary differential equations (ODEs) can accurately explain temporal gene expression patterns and promise to yield new insights into important cellular processes, disease progression, and intervention design. Learning such ODEs is challenging, since we want to predict the evolution of gene expression in a way that accurately encodes the causal gene-regulatory network (GRN) governing the dynamics and the nonlinear functional relationships between genes. Most widely used ODE estimation methods either impose too many parametric restrictions or are not guided by meaningful biological insights, both of which impedes scalability and/or explainability. To overcome these limitations, we developed PHOENIX, a modeling framework based on neural ordinary differential equations (NeuralODEs) and Hill-Langmuir kinetics, that can flexibly incorporate prior domain knowledge and biological constraints to promote sparse, biologically interpretable representations of ODEs. We test accuracy of PHOENIX in a series of in silico experiments benchmarking it against several currently used tools for ODE estimation. We also demonstrate PHOENIX’s flexibility by studying oscillating expression data from synchronized yeast cells and assess its scalability by modelling genome-scale breast cancer expression for samples ordered in pseudotime. Finally, we show how the combination of user-defined prior knowledge and functional forms from systems biology allows PHOENIX to encode key properties of the underlying GRN, and subsequently predict expression patterns in a biologically explainable way. PHOENIX will be available as open source at: https://github.com/QuackenbushLab/phoenix .
Keyphrases
- gene expression
- genome wide
- poor prognosis
- dna methylation
- healthcare
- randomized controlled trial
- binding protein
- induced apoptosis
- big data
- long non coding rna
- machine learning
- copy number
- molecular dynamics simulations
- neural network
- data analysis
- cell proliferation
- genome wide identification
- endoplasmic reticulum stress