The Distance Between: An Algorithmic Approach to Comparing Stochastic Models to Time-Series Data.
Brock D SherlockMarko A A BoonMaria VlasiouAdelle C F CosterPublished in: Bulletin of mathematical biology (2024)
While mean-field models of cellular operations have identified dominant processes at the macroscopic scale, stochastic models may provide further insight into mechanisms at the molecular scale. In order to identify plausible stochastic models, quantitative comparisons between the models and the experimental data are required. The data for these systems have small sample sizes and time-evolving distributions. The aim of this study is to identify appropriate distance metrics for the quantitative comparison of stochastic model outputs and time-evolving stochastic measurements of a system. We identify distance metrics with features suitable for driving parameter inference, model comparison, and model validation, constrained by data from multiple experimental protocols. In this study, stochastic model outputs are compared to synthetic data across three scales: that of the data at the points the system is sampled during the time course of each type of experiment; a combined distance across the time course of each experiment; and a combined distance across all the experiments. Two broad categories of comparators at each point were considered, based on the empirical cumulative distribution function (ECDF) of the data and of the model outputs: discrete based measures such as the Kolmogorov-Smirnov distance, and integrated measures such as the Wasserstein-1 distance between the ECDFs. It was found that the discrete based measures were highly sensitive to parameter changes near the synthetic data parameters, but were largely insensitive otherwise, whereas the integrated distances had smoother transitions as the parameters approached the true values. The integrated measures were also found to be robust to noise added to the synthetic data, replicating experimental error. The characteristics of the identified distances provides the basis for the design of an algorithm suitable for fitting stochastic models to real world stochastic data.