Understanding the impact of numerical solvers on inference for differential equation models

Richard Creswell; Katherine M Shepherd; Ben Lambert; Gary R Mirams; Chon Lok Lei; Simon Tavener; Martin Robinson; David J Gavaghan

doi:10.1098/rsif.2023.0369

. 2024 Mar 6;21(212):20230369. doi: 10.1098/rsif.2023.0369

Understanding the impact of numerical solvers on inference for differential equation models

Richard Creswell ^1,^✉, Katherine M Shepherd ¹, Ben Lambert ², Gary R Mirams ³, Chon Lok Lei ⁴, Simon Tavener ⁵, Martin Robinson ¹, David J Gavaghan ¹

PMCID: PMC10914510 PMID: 38442857

Abstract

Most ordinary differential equation (ODE) models used to describe biological or physical systems must be solved approximately using numerical methods. Perniciously, even those solvers that seem sufficiently accurate for the forward problem, i.e. for obtaining an accurate simulation, might not be sufficiently accurate for the inverse problem, i.e. for inferring the model parameters from data. We show that for both fixed step and adaptive step ODE solvers, solving the forward problem with insufficient accuracy can distort likelihood surfaces, which might become jagged, causing inference algorithms to get stuck in local ‘phantom’ optima. We demonstrate that biases in inference arising from numerical approximation of ODEs are potentially most severe in systems involving low noise and rapid nonlinear dynamics. We reanalyse an ODE change point model previously fit to the COVID-19 outbreak in Germany and show the effect of the step size on simulation and inference results. We then fit a more complicated rainfall run-off model to hydrological data and illustrate the importance of tuning solver tolerances to avoid distorted likelihood surfaces. Our results indicate that, when performing inference for ODE model parameters, adaptive step size solver tolerances must be set cautiously and likelihood surfaces should be inspected for characteristic signs of numerical issues.

Keywords: ordinary differential equations, inference, Bayesian statistics, truncation error, compartmental models, hydrological modelling

1. Introduction

Many scientific phenomena involve time-varying signals or outputs. These phenomena are often believed to obey some parametric model, whose parameter values are a priori unknown but can be inferred from observed data. In this paper, we present and analyse the key challenges that arise in parameter inference when models involve ordinary differential equations (ODEs). ODEs are used throughout the biological and physical sciences to express dynamic processes; a few examples among the myriad of their application areas include epidemiology [1], hydrology [2], cardiac electrophysiology [3] and population dynamics [4].

In general, the differential equations used in scientific applications cannot be solved analytically. However, a wide range of computational methods have been developed to obtain a numerical approximation of their solutions. (Solving the differential equation for a particular value of the parameters is known as the forward problem.) While numerical algorithms to solve the forward problem introduce error, the properties of this error are generally well understood and can be controlled. In solvers using a fixed time step (discussed in §3.2), the error can be reduced by decreasing the size of the time step [5]. In solvers in which the time step is set adaptively (discussed in §3.3), the error is typically controlled through user-specified relative and absolute tolerances on the local truncation error (the error in the solution introduced by a single time step of the solver) [6].

Our focus in this paper is on the interplay between the numerical approximations inherent in the forward problem and the inverse problem, which consists of learning values of the parameters that are compatible with an observed time series. Widely used approaches to the inverse problem include optimization of an objective function that measures the quality of fit between the model and the data (e.g. maximum likelihood), or Bayesian approaches that generate samples from the posterior distribution of the parameters (e.g. Markov chain Monte Carlo (MCMC)). These approaches to the inverse problem require the forward problem to be solved at multiple different parameter values. Previous work (e.g. [7]) has demonstrated that numerical error in the forward problem is liable to introduce errors in estimated parameters, and techniques have been developed to model the uncertainty in forward solutions and parameter estimates arising from the approximation errors made by the solver [8]. One approach for analysing the errors in the inverse problem that arise because of numerical approximation in the forward problem, is presented in [9]. In this approach, bounds are computed on the Hellinger distance between the true parameter posterior distribution and the posterior arising from a numerically approximate forward solution. For certain conditions on the forward solution and its numerical approximation (which our example models in this paper satisfy), it can be shown that the error in the forward solution transfers to a bound on the Hellinger distance between the true and numerically approximate posteriors [9, theorem 4.6, eqn 4.12]. This analysis, however, provides an asymptotic result, and achieving accurate posteriors in practical problems is not necessarily straightforward. In particular, as we illustrate in this paper, modest solver step sizes can produce accurate forward solutions but highly inaccurate log-likelihood surfaces which are intractable for inference algorithms such as MCMC sampling.

Our paper provides an empirical analysis, involving both synthetic and real data, demonstrating the various characteristics of errors in parameter likelihood surfaces that can appear due to solver inaccuracy. Our results demonstrate the importance of ensuring that ODE solvers used during an inference algorithm are appropriately tuned for inference, which often requires the ODE to be solved more accurately than for acceptable forward solutions.

The rest of this paper is organized as follows. In §2, we present the widely used independent and identically distributed Gaussian noise log-likelihood function for fitting ODE models and derive a bound on the error in this log-likelihood arising from the use of an approximate solution to the ODE. On the basis of this bound, and results presented subsequently, we argue that the biases in inference results arising from numerical solvers are likely to be most severe in systems that have low noise and rapid nonlinear dynamics. In §3, we study two broad classes of ODE solvers: those involving a fixed time step, and those involving a time step set adaptively to control the error on the solution. For both classes of solver, we illustrate the effects that solver inaccuracy can have on inference, and illustrate this using synthetic data. Additionally, in electronic supplementary material, §S3, we study how smoothing approximations can reduce the influence of numerical error on computation of the likelihood. Finally, in §4 and §5, we consider inference of ODE models for real data series. In §4, we reanalyse an ODE model of disease transmission fit to the COVID-19 outbreak in Germany and show that, when using a solver with a fixed time step, the choice of time step can alter inference and simulation results, and in §5, we fit a rainfall runoff model to hydrological data to illustrate the pitfalls of performing parameter inference using an adaptive step size solver with insufficient local tolerances.

2. Effects of numerical error on computation of the log-likelihood

2.1. Log-likelihood function for an ordinary differential equation model

We assume that time-series data ${y_{i}}_{i = 1}^{N}$ ; $y_{i} \in R^{n}$ are measured at time points ${t_{i}}_{i = 1}^{N}$ . These data are believed to obey some function $g : R^{l} \to R^{n}$ of $x (t; θ) \in R^{l}$ , where x is the solution to an ordinary differential equation

\frac{d x}{d t} = h (t, x, θ)

2.1

and

x (0) = x_{0},

2.2

for some function h that is informed by the relevant scientific theory and parametrized by the (potentially unknown) values $θ \in Θ \subset R^{m}$ . (In some inference problems, the initial value x₀ is also an unknown parameter, or a function of unknown parameters.) Equation (2.1) has been written as a first-order equation (i.e. only involving the first-order derivative of x); higher order differential equations can be rewritten as systems of first-order equations.

Deterministic forward models never fully explain the variation in real observations. To make the forward model a feasible explanation of the data, an additional stochastic component representing elements not modelled as part of the differential equation model (for example, processes involved in the measurement of the signal, or a model of measurement error occurring in the sensor devices used to collect the measurements) is included. This stochastic component is often included in additive form, yielding the proposed data generating process

y_{i} = g (x (t_{i}; θ)) + ϵ_{i},

2.3

where $ϵ_{i}$ is a mean-zero random variable specifying the noise process. Many choices for $ϵ_{i}$ are possible, and in applied data analysis the assumed distribution of this variable should be chosen carefully, as misspecification of $ϵ_{i}$ can cause inaccurate inference results [10]. In this paper, for simplicity, we choose to generate synthetic data using the independent and identically distributed (IID) Gaussian distribution. The IID Gaussian noise distribution is given by

ϵ_{i} \overset{IID}{\sim} N (0, σ^{2}),

2.4

with the parameter σ representing the standard deviation of the noise process, that is also inferred from the data together with the model parameters θ. Choosing a particular noise process enables the joint probability of the observations to be expressed as a function of the parameter values θ—this is known as a likelihood. For an IID Gaussian noise process (equation (2.4)) and one-dimensional data points y_i, the log-likelihood for time-series data ${y_{i}}_{i = 1}^{N}$ takes the form

\log p (y_{1}, \dots, y_{N} | θ, σ) = - \frac{N}{2} \log (2 π) - \frac{N}{2} \log (σ^{2}) - \frac{1}{2 σ^{2}} \sum_{i = 1}^{N} (y_{i} - g {(x (t_{i}; θ)))}^{2} .

2.5

The likelihood expresses the quality of the fit between the model output and the data, with higher values of the likelihood indicating a superior fit. Thus, the values of θ most compatible with data can be found by maximizing the likelihood with respect to θ (i.e. the method of maximum likelihood). Alternatively, the likelihood can be used together with a prior distribution on the parameters (p(θ)) to infer the posterior distribution according to Bayes’ theorem

p (θ | y_{1}, \dots, y_{N}) = \frac{p (y_{1}, \dots, y_{N} | θ) p (θ)}{p (y)} .

In this paper, we consider the typical case where the posterior cannot easily be expressed in closed form but can be approximated using MCMC sampling methods [11].

2.2. Error in the log-likelihood arising from approximation of the forward solution

The data are assumed to obey the IID Gaussian log-likelihood, equation (2.5). We assume that x(t_i; θ) is the true solution to the ODE at time point t_i, which is unavailable and approximated by ${\hat{x}}_{i}$ . The deviation between x(t_i; θ) and ${\hat{x}}_{i}$ at any time point is given by the global truncation error, e(t_i)

e (t_{i}) = x (t_{i}; θ) - {\hat{x}}_{i} .

In general, e(t_i) is unknown, although, for particular numerical solvers, its magnitude might be bounded by some function of the step size or some other quantity which can be used to tune the accuracy of the solver.

The log-likelihood available to the inference algorithm takes the same form as equation (2.5), but computed using the numerical approximation ${\hat{x}}_{i}$ instead of x(t_i; θ). For brevity, we denote the accurate log-likelihood by $L$ , and we denote the log-likelihood computed using a numerical approximation by $L^{'}$ , which is given by

L^{'} = - \frac{N}{2} \log (2 π) - \frac{N}{2} \log (σ^{2}) - \frac{1}{2 σ^{2}} \sum_{i = 1}^{N} {(y_{i} - g ({\hat{x}}_{i}))}^{2} .

2.6

Theorem 2.1. —

Assuming Lipschitz continuity of the observation function g with Lipschitz constant K, the difference between $L^{'}$ and $L$ (both computed at the parameter values θ, which are assumed to be those which generated the data) is bounded according to

$| L - L^{'} | \leq \sum_{i = 1}^{N} (\frac{K^{2}}{2 σ^{2}} {| e (t_{i}) |}^{2} + \frac{K}{σ^{2}} | e (t_{i}) | | y_{i} - g (x (t_{i}; θ)) |) .$ 2.7

Proof. —

The proof is presented in electronic supplementary material, §S1. ▪

We observe an inverse relationship between σ and the bound of $| L - L^{'} |$ when e(t_i) is held constant. Thus, when a solver is tuned to yield a particular global truncation error e(t_i), we expect the absolute bias in the log-likelihood to be more severe at smaller values of σ. Equation (2.7) also indicates that more severe biases can be expected when N is larger—i.e. there are more observations. Additionally, at a fixed level of σ, we expect the bias in the log-likelihood to decrease as the global truncation error is decreased. In electronic supplementary material, §S2, we additionally derive the distribution of the error in the likelihood, and show that $E [L - L^{'}] > 0$ : the numerical approximation of the likelihood will, on average, underestimate the true likelihood.

3. Effects of ordinary differential equation solvers on inference

To study the interplay between ODE solvers and inference, we introduce the following differential equation problem which describes an oscillatory system with damping and forcing:

m \frac{d^{2} x}{d t^{2}} + c \frac{d x}{d t} + k x = F (t) .

The model has three unknown parameters: (m, c, k). In classical mechanics, these represent the mass, damping coefficient and spring constant, respectively. F(t) represents the forcing function or stimulus, and in this paper takes a variety of forms throughout our results. This damped and forced oscillator is described by a second-order differential equation; to apply ODE solvers straightforwardly, we rewrite it as a first-order differential equation of two-state variables

\frac{d}{d t} (\begin{matrix} x \\ \dot{x} \end{matrix}) = (\begin{matrix} \dot{x} \\ \frac{F (t)}{m} - \frac{c}{m} \dot{x} - \frac{k}{m} x \end{matrix}),

3.1

where $\dot{x} = d x / d t$ .

3.1. Fixed step and adaptive step ordinary differential equation solvers

A wide range of numerical algorithms have been developed to obtain approximate solutions to initial value problems (IVPs) of the form given in equations (2.1) and (2.2). These algorithms typically work by computing an approximate solution on a grid of time points (in general, distinct from the time points where the data are located) and then using an interpolation algorithm to obtain the solution at intermediate time points.

Most simply, the grid of solver time points can be prespecified in advance (we refer to such methods as fixed time step solvers). However, in general, it is inefficient to use the same time step throughout the entire time range on which the ODE is being solved, particularly when solved repeatedly over a range of parameters. Solvers can employ large time steps in regions where the solution and its gradients change gradually without causing much error in the solution; however, in regions where the derivative changes rapidly, small time steps are required to maintain a low error. This motivated the development of ODE solvers that adjust the step size throughout the time domain over which the ODE is solved. While fixed step solvers are still commonly used, adaptive step solvers are standard in high-performance computing and are widely implemented in software libraries for ODE solving.

When using an adaptive step size solver, the user does not specify a step size, but rather a local error tolerance. The algorithm then selects a time-varying sequence of step sizes such that the local error in the solution falls below the specified tolerance. The total number of time steps used by the solver thus depends on the selected tolerance and the properties of the solution. Typically, an interpolation scheme is then used to obtain the solution at intermediate time values. Tolerances can be expressed either as an absolute value or relative to the magnitude of the solution. In many implementations, both are available to the user: for example, the SciPy library allows the user to specify both an absolute tolerance atol and a relative tolerance rtol, and chooses step sizes such that the magnitude of the local truncation error on the solution x does not exceed $a t o l + r t o l | x |$ [12]. For the results presented in this paper, we fix $a t o l$ to a value of 10⁻⁹ and tune $r t o l$ to control the accuracy of the solver. Adaptive step sizes have been implemented for a wide variety of ODE solver algorithms. Although the phenomena we analyse in this paper are liable to arise from a variety of adaptive step size solvers, (including, for example, the CVODE multistep method [13, fig. 3]), we focus on Runge–Kutta methods of the form RKp(q), which use the qth order method to estimate the error (and thus control the time step), while making the actual steps using the pth order method [6]. Runge–Kutta methods are not described in detail here for brevity—they are widely used and details can be found in many standard texts (for example, [5]). We rely on the SciPy adaptive time step Runge–Kutta implementation, which employs a quartic interpolation polynomial for RK5(4) and a cubic Hermite interpolation polynomial for RK3(2) [12].

3.1.1. Typical log-likelihood surface shapes

We now consider the influence of the two numerical solution methods for parameter inference. Because fixed step solvers use the same grid throughout parameter space, while adaptive step solvers might employ different grids at different parameter values, these two classes of solvers differ in the characteristics of the error that they tend to introduce into the likelihood function.

We illustrate this by computing the likelihood surface for the k parameter in the oscillator problem, equation (3.1). In total, 75 evenly spaced data points were generated from and including t = 0 to t = 50 from the model with an accurate solver (the RK5(4) solver with relative tolerance set to 10⁻⁸), using true parameter values k = 1, c = 0.2, m = 1, initial conditions of x(t = 0) = 0, $\dot{x} (t = 0) = 0$ , and

F (t) = {\begin{matrix} 1, & t < 25, \\ 0.9, & t \geq 25. \end{matrix}

Then IID Gaussian noise was added to the solution at each of the sampled locations with σ = 0.01. Holding all other parameters fixed at their true values, the log-likelihood was calculated for a range of values of k, using three different ODE solvers. First, the RK5(4) solver with relative tolerance set to 10⁻⁸ was used to compute the accurate (True) likelihood. Next, the Forward Euler solver with a fixed time step of Δt = 0.01 was used. Finally, we used the RK5(4) solver, but with its relative tolerance tuned so the observed magnitude in the error in the log-likelihood at the true parameter values was equal to that produced by the Forward Euler solver (for this problem, this resulted in relative tolerance tuned to 0.00944). These results are shown in figure 1.

Figure 1. — Comparison of log-likelihood surfaces calculated using fixed step and adaptive step solvers. Log-likelihood for the parameter k calculated from data generated from the oscillator model equation (3.1), with all other parameters held at their true values. The log-likelihood was calculated from equation (2.5) using an adaptive step RK5(4) solver with relative tolerance set to 10⁻⁸ (True), a Forward Euler solver with a fixed time step Δt = 0.01, and an adaptive step RK5(4) solver with tolerance tuned such that at the true parameter values (vertical line) it introduces the same magnitude of error in the log-likelihood as the fixed step Forward Euler solver (corresponding to a relative tolerance of 0.00944).

At the true parameter value, both solvers result in a slight underestimation of the log-likelihood. Across the parameter range considered, the fixed time step solver results in a log-likelihood that is shifted relative to the true one, but retains the smooth, unimodal shape. However, the adaptive step solver results in a log-likelihood surface that in addition to being shifted exhibits jagged, discontinuous fluctuations. In the remainder of §3, we examine these two phenomena in more detail.

3.2. Fixed time step solvers

3.2.1. Forward Euler solver

One of the simplest numerical solvers for ordinary differential equations is the Forward Euler method with a uniform step size Δt. This solver is easily implemented and thus has achieved wide usage despite its simplicity and typically mediocre performance.

Forward Euler has been used for inference in some recent high-profile epidemiological research where Δt was set to a value comparable to the time scale of the behaviour of the system (e.g. [14,15]). Whether these applications are representative of the use of Forward Euler more generally is unclear, but our results in §4 indicate that such choices of Δt can alter both forward model solutions and parameter inference results.

3.2.2. Inference for the damped, driven oscillator using Forward Euler

We now exemplify the impact of using Forward Euler with insufficiently small time steps on inference by using synthetic noisy data generated from the (accurate) solution of equation (3.1). In total, 25 evenly spaced data points were generated from and including t = 0 to t = 5 from the model with an accurate solver (the RK5(4) solver with relative tolerance set to 10⁻⁸), using true parameter values k = 1, c = 0.2, m = 1, an initial condition of x(t = 0) = 0, $\dot{x} (t = 0) = 0$ and F(t) = 1. Then, IID Gaussian noise was added to the solution at each of the sampled locations with σ = 0.1. Holding all other parameters fixed at their true values, the log-likelihood was calculated for a range of values of k, using the Forward Euler solver with various time steps.

Figure 2 shows the impact of using Forward Euler on the likelihood surface. The results show the typical effect of a fixed step solver with insufficiently small time steps: the likelihood surface maintains a smooth shape, but it is shifted relative to its true location. The longest time step considered in this study, Δt = 0.1, causes substantial inaccuracy in the likelihood even though Δt = 0.1 is small compared with the time scale of the dynamics of the system and the system with F(t) = 1 contains no discontinuities or other challenging features. Given that Forward Euler is a first-order method—as the step size is refined, the global truncation error introduced by the method is approximately proportional to Δt—its poor performance is unsurprising.

As the step size is refined, the log-likelihood curves converge. This suggests a diagnostic technique that could be incorporated into inference algorithms: once the optimal parameter values have been determined, the log-likelihood should be evaluated at those parameter values with the step size on the solver slightly adjusted; if the solver is sufficiently accurate, the value of the log-likelihood should not be a strong function of the step size.

3.3. Adaptive step size solvers

Adaptive step size solvers enable increased efficiency in obtaining accurate solutions to ODEs. However, when used in inference problems, they can convert a smooth likelihood surface into a rough one, characterized by rapid (and entirely phantom) changes in the likelihood which interfere with inference algorithms. These inaccuracies in the likelihood can be observed even at tolerances in the solution error where further refinements do not visibly influence the solution. For example, in cardiac electrophysiology, jagged parameter likelihoods have been observed with adaptive step size ODE solvers with tolerances as low 10⁻⁷ [16,17]. Next, we investigate the origin of the jagged likelihoods using synthetic data from the oscillator model described in equation (3.1).

3.3.1. Inference for the damped, driven oscillator using an adaptive step size solver

We first study the effects of adaptive time step solvers on inference using the model system that was introduced at the beginning of §3 (equation (3.1)). Here, we set the input stimulus according to

F (t) = {\begin{matrix} 1, & t < t_{change}, \\ f_{1}, & t \geq t_{change} . \end{matrix}

3.2

Thus, f₁ defines the strength of a pulse provided to the system at t = t_change.

Our subsequent analyses in this section employ various discontinuities in F(t) to control the difficulty of the ODE system, and we show that adaptive step solvers can introduce substantial, jagged errors into the likelihood surface. However, it is not only ODEs with discontinuous r.h.s. that are affected by similar phenomena. In electronic supplementary material, §S4, we additionally study the interplay between a continuous (but rapidly changing) F(t) and an adaptive step size solver, and show similar qualitative results.

First, we consider the problem where f₁ = −1 and t_change = 2.5 for different choices of the RK5(4) solver tolerance. In total, 25 evenly spaced data points were generated from and including t = 0 to t = 5 from the model with an accurate solver (the RK5(4) solver with relative tolerance set to 10⁻⁸), using true parameter values k = 1, c = 0.2, m = 1 and an initial condition of x(t = 0) = 0, $\dot{x} (t = 0) = 0$ . Then, IID Gaussian noise was added to the solution at each of the sampled locations with σ = 0.1. Holding all other parameters fixed at their true values, the log-likelihood was calculated for a range of values of k, using the RK5(4) solver with various tolerances. These results are shown in figure 3. At insufficient tolerances, the log-likelihood surface exhibits significant erroneous jaggedness. Notably, visual changes between the forward simulations are minor even at tolerances that cause drastic differences in the log-likelihood.

Figure 3. — Damped oscillator inference using adaptive time step Runge–Kutta. (a) Synthetic data for the damped driven oscillator. The curved line indicates the accurate solution to the ODE with these parameters, while the points indicate the noisy data. (b) Solution for oscillator computed using an RK5(4) solver with three different choices for the relative tolerance (indicated by tol in the legend). (c) Log-likelihood for the parameter k calculated from the noisy data, with all other parameters held at their true values. The log-likelihood was calculated from equation (2.5) using an RK5(4) solver with three different choices for the tolerance.

Next, we fix the adaptive solver tolerance and study how introducing more rapid changes in the system’s behaviour affects the log-likelihood surface. In figure 4, we fix t_change = 25 and consider four different values of f₁ and plot the likelihood surface for the model parameter k calculated according to an RK5(4) solver with $r t o l = 10^{- 3}$ . For each value of f₁, 75 evenly spaced data points were generated on the interval from and including t = 0 to t = 50, using parameter values k = 1, c = 0.2 and m = 1. IID Gaussian noise was added to the solution at each of the sampled locations with σ = 0.01. The likelihood was then calculated over a range of values of k, with all other parameters held at their correct values. For f₁ = 1, the stimulus F(t) is constant over time, and the likelihood surface appears smooth. However, as f₁ is adjusted so the stimulus is a stronger pulse, the likelihood becomes jagged with large deviations away from the true likelihood surface. (This is an example of a challenging r.h.s. that could be made more tractable for inference using smoothing approximations, which we analyse in electronic supplementary material, §S3.) Overall, these results indicate that the more rapid the changes in a system’s behaviour, generally the tighter solver tolerances are required to solve the inverse problem.

Figure 4. — Damped oscillator model: forward simulations and inference using an adaptive solver. Time-series data and parameter likelihood surfaces are shown for four values of f₁ in the oscillator problem: equations (3.1) and (3.2). For each value of f₁, the top plot shows the accurate ODE solution (line) and noisy synthetic data (points) generated from it. The bottom plot panels show the corresponding log-likelihood surface for k over an interval centred on the true value, k = 1, while all other parameters are held at their true values. For generating the likelihood surfaces, an RK5(4) solver was used with $r t o l = 10^{- 3}$ .

A fundamental point to note is that these inaccuracies arise because different values of the parameters represent different forward problems, and the solver selects a different sequence of step sizes for each. When the solution contains regions of rapid change, differences in the positions of the solver time steps, and, particularly, the inevitably discontinuous jumps in the total number of time steps used by the solver, cause errors in the likelihood. This phenomenon is investigated more closely in figure 5. For this study, the oscillator model equation (3.1) was again used. In total, 50 evenly spaced data points were generated on the time interval from and including t = 0 to t = 10, with t_change = 5 and f₁ = −5, using parameter values m = 1, c = 0.2 and k = 1. IID Gaussian noise was added to the solution at each of the sampled locations with σ = 0.01. The likelihood for k was calculated as before and is plotted in figure 5. In this case, the figure is restricted to a very narrow range of k values, and the total number of time points selected by the adaptive solver for the calculation of the likelihood at each value of k is overlaid on the plot. Here, the large jumps in the likelihood correspond to the addition or removal of a solver time point. Smaller spikes and jaggedness where the total number of solver time points is constant correspond to shifting of the solver time points.

Figure 5. — Damped oscillator model: likelihood discontinuities caused by variation in the number of adaptive steps. The log-likelihood surface for the parameter k in the oscillator problem (black solid line) and the number of time points used by the adaptive step size ODE solver in the calculation of each value of the likelihood (blue dashed line) are shown. An RK5(4) solver was used with $r t o l = 10^{- 3}$ .

3.3.2. Effect of jaggedness on inference algorithms

The jagged spikes appearing in the likelihood surface as a result of insufficiently accurate adaptive step size solvers plague computational inference algorithms. A common approach to Bayesian inference is to use the Metropolis MCMC algorithm, or variants of it [11]. This algorithm generates a sequence of parameter values via a Markov chain whose stationary distribution is the posterior distribution of the parameters. Given the most recent parameter values in the chain θ^old, the Metropolis algorithm proposes new parameter values θ^prop according to a proposal distribution and then accepts θ^prop with a probability of

r = min (1, \frac{p (θ^{prop})}{p (θ^{old})} \frac{p (y | θ^{prop})}{p (y | θ^{old})}),

where p(θ^prop) is the prior and p(y|θ^prop) is the likelihood. To illustrate the detrimental effects of jagged errors in the likelihood, we consider a situation where θ^old and θ^prop have identical values under the prior and the accurately computed likelihood (this is a plausible assumption when θ^old and θ^prop are nearby), but we assume that the log-likelihoods at these two parameter values computed using the numerical approximation differ by some factor c driven by numerical error in the adaptive step size solver (i.e. $\log p (y | θ^{prop}) = \log p (y | θ^{old}) - c$ , for c > 0). This assumption of a jump in computed likelihood values at nearby parameter values is analogous to the spikes appearing in the log-likelihood in our results in figures 4 and 5.

Under these assumptions, $\log r = - c$ or $r = \exp (- c)$ . For a value of c = 10 (smaller than many of the magnitudes of spikes observed in our results), the probability of accepting the proposal is less than 1 in 20 000. Even a relatively small jump of magnitude c = 3 will be traversed by the sampler with a probability of only about 5%. Although these computations are based on simplistic assumptions, they suggest that even minor warping of the log-likelihood can severely restrict the ability of a Metropolis–Hastings sampler (or similar inference algorithm) to traverse the parameter space efficiently.

3.4. The impact of observation error magnitude on inference and sampling performance

In this section, we empirically study the effects of different levels of observation noise on inference. We performed Bayesian inference using MCMC for the oscillator problem with varying levels of noise in the data. We considered two values of σ (0.001 and 0.1) to generate the data, fixed f₁ = −1, and otherwise generated data exactly as described for figure 4. We set a uniform prior on [0.1, 1.5] for the three model parameters m, c and k, and a uniform prior on [0, 1] for the σ. Three MCMC chains were run, initialized at random samples from the prior (with the same MCMC starting point being used for both choices of the true σ). In total, 1500 iterations of MCMC were performed using the Haario–Bardenet adaptive covariance algorithm as implemented in PINTS to sample from the posterior [18,19]. The MCMC chains for the m parameter are plotted in figure 6a,c using the RK5(4) solver with $r t o l = 10^{- 3}$ , while figure 6b,d shows the chains using the same solver but with more accurate tolerances of $r t o l = 10^{- 8}$ .

Figure 6. — Effect of noise on MCMC convergence. Data were generated according to the same specifications as for figure 4, with f₁ = −1, and the indicated values of σ. Inference was performed for the three parameters m, c and k, as well as σ, via adaptive covariance MCMC [19] with three independent chains initialized at random samples from the prior (uniform on [0.1, 1.5] for the model parameters, and uniform on [0, 1] for σ). In total, 1500 MCMC iterations were performed. The plots show the three chains for the m parameter. (a,c) Forward simulation was performed using the RK5(4) solver with $r t o l = 10^{- 3}$ . (b,d) Forward simulation was performed using the RK5(4) solver with $r t o l = 10^{- 8}$ .

At the lowest noise level considered (σ = 0.001), the three MCMC chains using the less accurate solver move towards the true value of the parameter but fail to mix with each other. Instead, each chain remains stuck in a narrow region of parameter space near the true parameter value, corresponding to the phantom local maxima in the likelihood surface observed in figure 4. Reducing the solver tolerance to 10⁻⁸ was, however, sufficient to ensure chain mixing, indicating that the lack of convergence was purely an artefact of using an inaccurate solver. At the highest level of noise considered here (σ = 0.1), the three MCMC chains mix well for either tolerance choice;¹ this can be explained by our bound given in equation (2.7): that larger σ values lead to gentler variation in the log-likelihood surface and so easier exploration by inference algorithms.

4. Fixed step solvers applied to a susceptible–infected–recovered change point model of the spread of COVID-19 in Germany

A widely used class of differential equation models in epidemiology are compartmental models. These models divide the population into a number of compartments representing different diseased or non-diseased states and specify the rates at which individuals move from one compartment to another [1]. A simple yet commonly used example is the susceptible–infected–recovered (SIR) model [20]. This model keeps track of the number of susceptible individuals S(t) (those who can be infected with the disease), infected individuals I(t) (those who are currently infectious with the disease) and recovered individuals R(t) (those who have recovered from the disease and are assumed immune). Neglecting births and deaths, the model is expressed by the following system of differential equations:

\frac{d S}{d t} = - λ \frac{S I}{N},

4.1

\frac{d I}{d t} = λ \frac{S I}{N} - μ I

4.2

and \frac{d R}{d t} = μ I,

4.3

where λ > 0 is the spreading rate of the disease, μ > 0 is the recovery rate and N > 0 is the total size of the population. The system additionally requires the specification of initial conditions for each compartment (S(0), I(0), R(0)), where I(0) must exceed zero for an infection to spread.

The qualitative behaviour of the SIR model can be determined by the basic reproduction number, R₀, where

R_{0} = \frac{λ}{μ} .

Assuming that S(0) ≈ N and I(0) > 0, when R₀ > 1, the number of infected individuals will grow, and the epidemic will eventually infect the entire population (barring a change in λ or μ); for R₀ < 1, the number of infected individuals will fall. Thus, fitting an SIR model to infection data, and estimating the spreading rate λ and reproduction number R₀, are important steps in understanding and predicting the progression of an epidemic.

An extension to the standard SIR model has λ vary over time, allowing the model to capture changes in the spread of a disease caused by behavioural changes or government policy. In the aftermath of the outbreak of COVID-19 in Europe in early 2020, an SIR model allowing changes in λ through time was used in a high-profile paper which attempted to capture the impact of major public health policy interventions on COVID-19 transmission in Germany [15]. The authors used the model equations (4.1)–(4.3), discretized with a 1-day time step, equivalent to a Forward Euler solver with Δt = 1

S_{t} = S_{t - 1} - λ (t) Δ t \frac{S_{t - 1} I_{t - 1}}{N},

4.4

I_{t} = I_{t - 1} + λ (t) Δ t \frac{S_{t - 1} I_{t - 1}}{N} - μ Δ t I_{t - 1}

4.5

and R_{t} = R_{t - 1} + μ Δ t I_{t - 1} .

4.6

The initial condition was given by an unknown parameter I₀ = I(0). The system was closed with R(0) = 0 and S(0) = N − I₀. The spreading rate λ was assumed to be a continuous function of time and was allowed to shift at three time points, whose locations were estimated from the data. Specifically, these three time points, t_i, i ∈ {1, 2, 3} denoted the times at which λ began to (linearly) change to a new, constant value, and the time taken for these shifts was dictated by durations d_i. The λ profile then has the following piecewise representation:

λ (t) = {\begin{matrix} λ_{0}, & t < t_{1}, \\ λ_{0} + \frac{λ_{1} - λ_{0}}{d_{1}} (t - t_{1}), & t_{1} \leq t < t_{1} + d_{1}, \\ λ_{1}, & t_{1} + d_{1} \leq t < t_{2}, \\ λ_{1} + \frac{λ_{2} - λ_{1}}{d_{2}} (t - t_{2}), & t_{2} \leq t < t_{2} + d_{2}, \\ λ_{2}, & t_{2} + d_{2} \leq t < t_{3}, \\ λ_{2}, + \frac{λ_{3} - λ_{2}}{d_{3}} (t - t_{3}) & t_{3} \leq t < t_{3} + d_{3}, \\ λ_{3}, & t_{3} + d_{3} \leq t . \end{matrix}

Additional features of the model included a reporting delay and a weekly modulation. The reporting delay was characterized by a single parameter D indicating the number of days between the time at which new infections occur and the time at which they are reported. The modulation accounts for the weekly periodicity evident in the data and is characterized by two parameters f_w and $Φ_{w}$ . This significant periodicity probably arises from processes involved in the reporting of COVID-19 cases and deaths [21]. Specifically, cases C_t are modelled by

C_{t} = (1 - f (t)) I_{t - D}^{new},

4.7

where

f (t) = (1 - f_{w}) (1 - | \sin (\frac{π}{7} t - \frac{1}{2} Φ_{w}) |),

4.8

where $I_{t}^{new} = S_{t - 1} - S_{t}$ . Dehning et al. [15] assumed a Student-t distribution with four degrees of freedom and multiplicative noise for the likelihood, such that the likelihood for observed cases $\hat{C_{t}}$ was given by

p (\hat{C_{t}} | θ, σ) = {Student-t}_{ν = 4} (mean = C_{t} (θ), scale = σ \sqrt{C_{t} (θ)}),

where $θ = (λ_{0}, λ_{1}, λ_{2}, λ_{3}, t_{1}, t_{2}, t_{3}, d_{1}, d_{2}, d_{3}, μ, D, I_{0}, f_{w}, Φ_{w}, σ)$ is the full vector of parameters for the differential equation model, and C_t(θ) is the deterministic solution which can be computed using a range of different time steps. The prior distributions for the parameters are given in electronic supplementary material, table S1.

4.1. Effect of time step on the forward solution

We first study the effect of assuming Δt = 1 day on forward simulations of the model. We set up the forward simulations using the same settings that Dehning et al. [15] used to generate their fig. 2. The parameters of an SIR model without change points or weekly modulation (i.e. a single value of λ, μ, D, I₀ and σ) were inferred from an early period of the German daily reported COVID-19 cases, from 2 to 15 March 2020. The posterior median values of these parameters (excepting λ) were then used to generate forward simulations according to the full model without weekly modulation (equations (4.4)–(4.7)), with one change point, and pre-specified values of λ₀ and λ₁.

As in [15], the first set of simulations considered how different levels of social restrictions could influence the course of disease transmission, as measured by cases. Three levels of social restrictions (assumed to be captured by different λ values) are considered, which each yield two sets of simulations: one corresponding to Forward Euler with Δt = 1 day (as in [15]) and another with Δt = 0.1 days. We choose 0.1 days as the more accurate comparator method, as further refinement of the step size yields little change in forward solutions or inference results but is increasingly costly to run. The results of this are shown in figure 7a. Our second set of simulations, shown in figure 7b, considered only our ‘strong’ social distancing scenario and explored three different times at which the change in λ might occur (e.g. if a public health intervention were implemented at different times). These simulations illustrate how using a large time step generally leads to a substantial underestimation of case counts for a given choice of λ(t), particularly during the (crucial) growth phase of the epidemic.

Figure 7. — COVID-19 model: forward simulations using Forward Euler. In (a–d), the top panels (a,b) shows three different pre-specified trajectories of λ(t), and the bottom panels (c,d) shows the number of daily cases resulting from these trajectories for each choice of the time step Δt.

4.2. Effect of time step on the posterior distributions

We also studied the effect of the time step on parameter inference for the full model (equations (4.4)–(4.8)) using the German daily cases data from 2 March to 21 April 2020 as was done in [15]. Inference was performed using the PyMC3 No-U-Turn MCMC sampler (NUTS) [11,22] using the model developed in [15], modified to allow the 0.1-day step size. To initialize the chains, automatic differentiation variational inference [23] as implemented in PyMC3 [22] was performed to generate an approximate posterior (which, however, does not capture correlations between the parameters). Four MCMC chains were then initialized by sampling from this approximation of the posterior. The chains were run for 500 iterations of NUTS, with the first 100 discarded as burn-in, and convergence assessed by requiring that $\hat{R} < 1.05$ [11]. These results are shown in figure 8.

Figure 8. — COVID-19 model: inference using Forward Euler. (a) Real data and model fits for the number of daily COVID-19 cases in Germany over the period 2 March 2020 to 21 April 2020. Note that the model fits for Δt = 1 and Δt = 0.1 overlap almost completely. (b) Inferred basic reproduction number over time for the Germany COVID-19 data, using the SIR model with change points in λ (equations (4.4)–(4.8)) and two different values for the ODE solver time step, Δt. In both panels, lines indicate the posterior median and shaded regions indicate the central 95% of the posterior.

Both models achieve a near identical visual fit to the data, using the median values of the recovered parameters. However, the parameter estimates of the two models differed. We focus on the posterior distribution for the basic reproduction number R₀, which is calculated using the MCMC samples of the joint posterior for (λ, μ). The 1-day time step results in overestimation of R₀ (by approx. 10% relative to the 0.1-day time step) during the early stages of the epidemic (i.e. before the first change point). This is because, during the growth phase of the epidemic, the larger time step results in slower growth for a given λ value (cf. figure 7), meaning a larger λ value is estimated to compensate. During the later stages of the epidemic, the values of R₀ are more similar between the two models. Additionally, the change point locations are not much affected by the choice of time step (though, this is expected as the change points have fairly informative priors).

Our results indicate that while the discrete version of the SIR change point model using Δt = 1 appears visually to obtain a good fit to German COVID-19 data, the growth parameters of the discrete model using this time step vary markedly from those recovered using Δt = 0.1, and thus care should be taken in the deployment of such discrete models and the reporting of their results.

5. Numerical errors arising in rainfall run-off models of river streamflow data

In this section, we use real data from the French Broad River at Asheville, North Carolina to investigate the impact of adaptive solvers in performing inference for rainfall run-off models used in hydrology [24,25].

Rainfall run-off models divide the flow of water through a river basin into several spatially grouped components representing different hydrological processes. The model we consider here is governed by a system of five ODEs:

\frac{d S_{i}}{d t} = Precip (t) - InterceptEvap (t) - EffectPrecip (t),

5.1

\frac{d S_{u}}{d t} = EffectPrecip (t) - UnsatEvap (t) - Percolation (t) - Runoff (t),

5.2

\frac{d S_{s}}{d t} = Percolation (t) - SlowStream (t),

5.3

\frac{d S_{f}}{d t} = Runoff (t) - FastStream (t)

5.4

and \frac{d z}{d t} = SlowStream (t) + FastStream (t),

5.5

Each term in this equation is defined in electronic supplementary material, table S2, and the seven unknown parameters of the model and their prior distributions are defined in electronic supplementary material, table S3. The data consist of daily streamflow measurements (dz/dt), and the authors assume an IID Gaussian likelihood with unknown standard deviation σ.

Previous work has shown that using large time steps with such hydrological models can bias inferences [2]. We show that using an adaptive step size method (as suggested in [24]) can also cause inaccurate inference results, unless the error is tightly controlled.

Using an accurate ODE solver (the CVODE multistep solver from the SUNDIALS library [26] with $r t o l = a t o l = 10^{- 7}$ ), we obtained the posterior distributions for the seven parameters of the model, using United States Geological Survey (USGS) data for the streamflow at Asheville, North Carolina (USGS station 03451500) over a 200-day period starting 1 January 1960. Sampling was performed using the Dream multi-chain MCMC algorithm as implemented in PINTS [18,27], using six chains with each initialized by a sample from the prior (electronic supplementary material, table S3). In total, 25 000 MCMC iterations were performed, and convergence of the chains was assessed by requiring that $\hat{R} < 1.05$ [11]. In figure 9, we plot the likelihood surfaces of the parameters for slices through parameter space near the estimated posterior medians. Likelihood surfaces are plotted for two adaptive step size solvers: the RK3(2) solver from SciPy with $r t o l = a t o l = 10^{- 3}$ , and the CVODE solver as described above. For all parameters, the 10⁻³ tolerance solver causes highly jagged likelihoods, of sufficient magnitude to interfere with inference via MCMC or maximum-likelihood estimation. This is in accordance with our earlier results using the oscillator model in §3, as rapid changes in the r.h.s. cause spurious jaggedness in the computed likelihood. The likelihoods calculated using the more accurate solver have similar broad-scale shapes but are smooth enough for accurate inference to be performed.

Figure 9. — Rainfall run-off model: inference using accurate and inaccurate adaptive solvers. Here, we plot the likelihood surface for each parameter for the rainfall run-off model defined by equations (5.1)–(5.5). For each parameter, the solid line indicates the likelihood calculated using an RK3(2) adaptive solver with $r t o l = a t o l = 10^{- 3}$ , while the dashed line indicates the likelihood calculated using the CVODE adaptive solver with $r t o l = a t o l = 10^{- 7}$ .

6. Discussion

Inaccurate solution of ODEs through either fixed time step or adaptive solvers can lead to biased inferences which are generally exacerbated when there is low observation noise. For adaptive solvers, these biases can manifest through the presence of phantom jaggedness in the likelihood surface, which can wreak havoc for inference algorithms attempting to navigate the surface. They might also lead researchers to falsely conclude that a model is unidentifiable, when, for example, the chains in an MCMC run fail to mix. Researchers might then choose to modify the model in arbitrary ways when, in fact, all that was required to render inference soluble was a reduction in solver tolerances. Tolerances that seem good enough for forward simulation are probably insufficient for solving inverse problems. For example, a default relative tolerance of 10⁻³ was insufficient for both the synthetic data and real data studied in §§3.3 and 5. When using an ODE solver library to perform inference, default settings may well not suffice and, ideally, the solver tolerance should be set by inspection of the likelihood surface. In this paper, we focused our attention on two simple solvers: Forward Euler with a uniform grid and adaptive step size RK5(4). In practice, neither of these solvers may be ideal for a particular problem. More powerful solvers, such as implicit solvers which are known to perform more accurately on stiff problems, or solvers which have been augmented to better handle discontinuities (see electronic supplementary material, §S5), could help to reduce the computational runtime required to achieve sufficient accuracy for forward simulation or inference. However, regardless of the solver employed, practitioners should be aware that tuning their solver to achieve accurate forward simulations may still be insufficiently accurate for inference. For example, jagged errors in an objective function for the best fit parameters have been observed that arise when insufficient tolerances are supplied to the CVODE solver—a variable order multistep method using backwards differentiation formulae for stiff problems, which is significantly more powerful than the Forward Euler and RK5(4) solvers [13, fig. 3].

Unless there is a bifurcation in system behaviour at points in parameter space, likelihood surfaces should not have abrupt discontinuities. So, the presence of such changes may well be an artefact of using an adaptive ODE solver with insufficient tolerances. MCMC and optimization algorithms could be augmented by monitoring for such jumps and warning the user should they occur.

ODEs involving discontinuous r.h.s. functions are known to be particularly challenging to solve accurately. Our results indicate that r.h.s. functions involving rapid changes over time, such as those involving discontinuities, also curse computational inference when adaptive ODE solvers are used. However, our results in electronic supplementary material, §S5 also indicate that errors in the likelihood arising from discontinuous r.h.s. functions can be ameliorated through the use of simple smoothing approximations—a potentially more computationally efficient alternative to increasing tolerances. We argue that in many scientific systems such smoothing approximations are additionally more realistic descriptions of the phenomena being modelled. Although the appropriate degree of smoothing can be difficult to determine in general, for certain systems, the level of smoothing can be tuned based on knowledge of the process being modelled.

Much of the work on error control for ODE solvers has focused merely on the accuracy of the forward problem. The accuracies of widely used ODE solvers are typically tuned via their step sizes or local truncation error tolerances, but these are not the most relevant quantities for inference—instead, it is the error in the log-likelihood that must be controlled. ODE solvers that control the error on the log-likelihood directly would avoid much of the problems highlighted in this paper, and we suggest this as a fruitful research direction.

Endnote

We note that, for this level of noise, the centres of the sampling distributions are shifted slightly away from the true parameter value because the noise limits our ability to estimate this parameter.

Data accessibility

The code to perform the computer experiments presented in this paper was written in Python 3.7 and is available in an open source Python library from the Zenodo repository: https://zenodo.org/records/10578920 [28]. To run the COVID-19 simulations, we adapted the software library developed by Dehning et al. [15]. The version of the code including our modifications is available from the Zenodo repository: https://zenodo.org/records/10578938 [29].

Supplementary material is available online [30].

Declaration of AI use

We have not used AI-assisted technologies in creating this article.

Authors' contributions

R.C.: conceptualization, formal analysis, investigation, methodology, software, validation, writing—original draft, writing—review and editing; K.M.S.: formal analysis, investigation, methodology, software, visualization, writing—original draft, writing—review and editing; B.L.: conceptualization, formal analysis, investigation, methodology, project administration, supervision, validation, writing—original draft, writing—review and editing; G.R.M.: conceptualization, investigation, methodology, writing—review and editing; C.L.L.: conceptualization, investigation, methodology, supervision, writing—original draft, writing—review and editing; S.T.: formal analysis, investigation, methodology, writing—original draft, writing—review and editing; M.R.: conceptualization, investigation, methodology, supervision, writing—original draft, writing—review and editing; D.J.G.: conceptualization, formal analysis, investigation, methodology, project administration, supervision, writing—original draft, writing—review and editing.

All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

Conflict of interest declaration

We declare we have no competing interests.

Funding

This work was supported by the Wellcome Trust (grant no. 212203/Z/18/Z). G.R.M. acknowledges support from the Wellcome Trust via a Wellcome Trust Senior Research Fellowship. K.M.S. and D.J.G. acknowledge funding from the EPSRC CDT in Sustainable Approaches to Biomedical Science: Responsible and Reproducible Research—SABS:R3 (EP/S024093/1). C.L.L. acknowledges support from the Science and Technology Development Fund, Macao SAR (FDCT) (reference no. 0048/2022/A) and support from the University of Macau via a UM Macao Fellowship. This research was funded in whole, or in part, by the Wellcome Trust [212203/Z/18/Z].

References

1.van der Vegt SA, et al. 2022. Learning transmission dynamics modelling of COVID-19 using comomodels. Math. Biosci. 349, 108824. ( 10.1016/j.mbs.2022.108824) [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Kavetski D, Kuczera G, Franks SW. 2003. Semidistributed hydrological modeling: a ‘saturation path’ perspective on TOPMODEL and VIC. Water Resour. Res. 39, 1246. ( 10.1029/2003WR002122) [DOI] [Google Scholar]
3.Whittaker DG, Clerx M, Lei CL, Christini DJ, Mirams GR. 2020. Calibration of ionic and cellular cardiac electrophysiology models. Wiley Interdiscip. Rev. Dev. Biol. 12, e1482. ( 10.1002/wsbm.1482) [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Shertzer KW, Ellner SP, Fussmann GF, Hairston NG Jr. 2002. Predator-prey cycles in an aquatic microcosm: testing hypotheses of mechanism. J. Anim. Ecol. 71, 802-815. ( 10.1046/j.1365-2656.2002.00645.x) [DOI] [Google Scholar]
5.Gautschi W. 1997. Numerical analysis. Boston, MA: Springer Science & Business Media. [Google Scholar]
6.Dormand JR, Prince PJ. 1980. A family of embedded Runge-Kutta formulae. J. Comput. Appl. Math. 6, 19-26. ( 10.1016/0771-050X(80)90013-3) [DOI] [Google Scholar]
7.Kaipio J, Somersalo E. 2006. Statistical and computational inverse problems, vol. 160. New York, NY: Springer Science & Business Media. [Google Scholar]
8.Conrad PR, Girolami M, Särkkä S, Stuart A, Zygalakis K. 2017. Statistical analysis of differential equations: introducing probability measures on numerical solutions. Stat. Comput. 27, 1065-1082. ( 10.1007/s11222-016-9671-0) [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Stuart AM. 2010. Inverse problems: a Bayesian perspective. Acta Numer. 19, 451-559. ( 10.1017/S0962492910000061) [DOI] [Google Scholar]
10.Lambert B, Lei CL, Robinson M, Clerx M, Creswell R, Ghosh S, Tavener S, Gavaghan D. 2023. Autocorrelated measurement processes and inference for ordinary differential equation models of biological systems. J. R. Soc. Interface 20, 20220725. ( 10.1098/rsif.2022.0725) [DOI] [Google Scholar]
11.Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. 2013. Bayesian data analysis. New York, NY: CRC Press. [Google Scholar]
12.Virtanen P, et al. SciPy 1.0 Contributors. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261-272. ( 10.1038/s41592-019-0686-2) [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Clerx M, Beattie KA, Gavaghan DJ, Mirams GR. 2019. Four ways to fit an ion channel model. Biophys. J. 117, 2420-2437. ( 10.1016/j.bpj.2019.08.001) [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Birrell P, Blake J, Van Leeuwen E, Gent N, De Angelis D. 2021. Real-time nowcasting and forecasting of COVID-19 dynamics in England: the first wave. Phil. Trans. R. Soc. B 376, 20200279. ( 10.1098/rstb.2020.0279) [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Dehning J, Zierenberg J, Spitzner FP, Wibral M, Neto JP, Wilczek M, Priesemann V. 2020. Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science 369, eabb9789. ( 10.1126/science.abb9789) [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Johnstone RH. 2018. Uncertainty characterisation in action potential modelling for cardiac drug safety. PhD thesis, University of Oxford, Oxford, UK.
17.Mirams G. 2018. Numerical errors from ODE solvers can mess up optimisation and inference very easily. Math. Matters Heart. See https://mirams.wordpress.com/2018/10/17/ode-errors-and-optimisation/. [Google Scholar]
18.Clerx M, Robinson M, Lambert B, Lei CL, Ghosh S, Mirams GR, Gavaghan DJ. 2019. Probabilistic inference on noisy time series (PINTS). J. Open Res. Softw. 7, 23. ( 10.5334/jors.252) [DOI] [Google Scholar]
19.Haario H, Saksman E, Tamminen J. 2001. An adaptive metropolis algorithm. Bernoulli 7, 223-242. ( 10.2307/3318737) [DOI] [Google Scholar]
20.Weiss HH. 2013. The SIR model and the foundations of public health. Mater. Math. 3, 1-17. [Google Scholar]
21.Gallagher K, Creswell R, Gavaghan D, Lambert B. 2023. Identification and attribution of weekly periodic biases in epidemiological time series data. medRxiv. ( 10.1101/2023.06.13.23290903) [DOI]
22.Salvatier J, Wiecki TV, Fonnesbeck C. 2016. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2, e55. ( 10.7717/peerj-cs.55) [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Kucukelbir A, Tran D, Ranganath R, Gelman A, Blei DM. 2017. Automatic differentiation variational inference. J. Mach. Learn. Res. 18, 1-45. [Google Scholar]
24.Schoups G, Vrugt J, Fenicia F, Van De Giesen N. 2010. Corruption of accuracy and efficiency of Markov chain Monte Carlo simulation by inaccurate numerical implementation of conceptual hydrologic models. Water Resour. Res. 46, W10530. ( 10.1029/2009WR008648) [DOI] [Google Scholar]
25.Schoups G, Vrugt JA. 2010. A formal likelihood function for parameter and predictive inference of hydrologic models with correlated, heteroscedastic, and non-Gaussian errors. Water Resour. Res. 46, W10531. ( 10.1029/2009WR008933) [DOI] [Google Scholar]
26.Hindmarsh AC, Brown PN, Grant KE, Lee SL, Serban R, Shumaker DE, Woodward CS. 2005. SUNDIALS: suite of nonlinear and differential/algebraic equation solvers. ACM Trans. Math. Softw. 31, 363-396. ( 10.1145/1089014.1089020) [DOI] [Google Scholar]
27.Vrugt JA, Ter Braak C, Diks C, Robinson BA, Hyman JM, Higdon D. 2009. Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling. Int. J. Nonlinear Sci. Numer. Simul. 10, 273-290. ( 10.1515/IJNSNS.2009.10.3.273) [DOI] [Google Scholar]
28.rccreswell. 2024. rccreswell/ode_inference: v1.0 (v1.0). Zenodo. ( 10.5281/zenodo.10578920) [DOI] [Google Scholar]
29.Onas Dehning Sebastian, Mohr Paul, Spitzner Joao Pinheiro, Neto Johannes, Vp Michael, Osthege Lode. 2024. rccreswell/covid19_inference_forecast: v0.1.2 (v0.1.2). Zenodo. ( 10.5281/zenodo.10578938) [DOI] [Google Scholar]
30.Creswell R, Shepherd KM, Lambert B, Mirams GR, Lei CL, Tavener S, Robinson M, Gavaghan DJ. 2024. Understanding the impact of numerical solvers on inference for differential equation models. Figshare. ( 10.6084/m9.figshare.c.7075553) [DOI] [PubMed]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

Creswell R, Shepherd KM, Lambert B, Mirams GR, Lei CL, Tavener S, Robinson M, Gavaghan DJ. 2024. Understanding the impact of numerical solvers on inference for differential equation models. Figshare. ( 10.6084/m9.figshare.c.7075553) [DOI] [PubMed]

Data Availability Statement

Supplementary material is available online [30].

[RSIF20230369C1] 1.van der Vegt SA, et al. 2022. Learning transmission dynamics modelling of COVID-19 using comomodels. Math. Biosci. 349, 108824. ( 10.1016/j.mbs.2022.108824) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSIF20230369C2] 2.Kavetski D, Kuczera G, Franks SW. 2003. Semidistributed hydrological modeling: a ‘saturation path’ perspective on TOPMODEL and VIC. Water Resour. Res. 39, 1246. ( 10.1029/2003WR002122) [DOI] [Google Scholar]

[RSIF20230369C3] 3.Whittaker DG, Clerx M, Lei CL, Christini DJ, Mirams GR. 2020. Calibration of ionic and cellular cardiac electrophysiology models. Wiley Interdiscip. Rev. Dev. Biol. 12, e1482. ( 10.1002/wsbm.1482) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSIF20230369C4] 4.Shertzer KW, Ellner SP, Fussmann GF, Hairston NG Jr. 2002. Predator-prey cycles in an aquatic microcosm: testing hypotheses of mechanism. J. Anim. Ecol. 71, 802-815. ( 10.1046/j.1365-2656.2002.00645.x) [DOI] [Google Scholar]

[RSIF20230369C5] 5.Gautschi W. 1997. Numerical analysis. Boston, MA: Springer Science & Business Media. [Google Scholar]

[RSIF20230369C6] 6.Dormand JR, Prince PJ. 1980. A family of embedded Runge-Kutta formulae. J. Comput. Appl. Math. 6, 19-26. ( 10.1016/0771-050X(80)90013-3) [DOI] [Google Scholar]

[RSIF20230369C7] 7.Kaipio J, Somersalo E. 2006. Statistical and computational inverse problems, vol. 160. New York, NY: Springer Science & Business Media. [Google Scholar]

[RSIF20230369C8] 8.Conrad PR, Girolami M, Särkkä S, Stuart A, Zygalakis K. 2017. Statistical analysis of differential equations: introducing probability measures on numerical solutions. Stat. Comput. 27, 1065-1082. ( 10.1007/s11222-016-9671-0) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSIF20230369C9] 9.Stuart AM. 2010. Inverse problems: a Bayesian perspective. Acta Numer. 19, 451-559. ( 10.1017/S0962492910000061) [DOI] [Google Scholar]

[RSIF20230369C10] 10.Lambert B, Lei CL, Robinson M, Clerx M, Creswell R, Ghosh S, Tavener S, Gavaghan D. 2023. Autocorrelated measurement processes and inference for ordinary differential equation models of biological systems. J. R. Soc. Interface 20, 20220725. ( 10.1098/rsif.2022.0725) [DOI] [Google Scholar]

[RSIF20230369C11] 11.Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. 2013. Bayesian data analysis. New York, NY: CRC Press. [Google Scholar]

[RSIF20230369C12] 12.Virtanen P, et al. SciPy 1.0 Contributors. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261-272. ( 10.1038/s41592-019-0686-2) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSIF20230369C13] 13.Clerx M, Beattie KA, Gavaghan DJ, Mirams GR. 2019. Four ways to fit an ion channel model. Biophys. J. 117, 2420-2437. ( 10.1016/j.bpj.2019.08.001) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSIF20230369C14] 14.Birrell P, Blake J, Van Leeuwen E, Gent N, De Angelis D. 2021. Real-time nowcasting and forecasting of COVID-19 dynamics in England: the first wave. Phil. Trans. R. Soc. B 376, 20200279. ( 10.1098/rstb.2020.0279) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSIF20230369C15] 15.Dehning J, Zierenberg J, Spitzner FP, Wibral M, Neto JP, Wilczek M, Priesemann V. 2020. Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science 369, eabb9789. ( 10.1126/science.abb9789) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSIF20230369C16] 16.Johnstone RH. 2018. Uncertainty characterisation in action potential modelling for cardiac drug safety. PhD thesis, University of Oxford, Oxford, UK.

[RSIF20230369C17] 17.Mirams G. 2018. Numerical errors from ODE solvers can mess up optimisation and inference very easily. Math. Matters Heart. See https://mirams.wordpress.com/2018/10/17/ode-errors-and-optimisation/. [Google Scholar]

[RSIF20230369C18] 18.Clerx M, Robinson M, Lambert B, Lei CL, Ghosh S, Mirams GR, Gavaghan DJ. 2019. Probabilistic inference on noisy time series (PINTS). J. Open Res. Softw. 7, 23. ( 10.5334/jors.252) [DOI] [Google Scholar]

[RSIF20230369C19] 19.Haario H, Saksman E, Tamminen J. 2001. An adaptive metropolis algorithm. Bernoulli 7, 223-242. ( 10.2307/3318737) [DOI] [Google Scholar]

[RSIF20230369C20] 20.Weiss HH. 2013. The SIR model and the foundations of public health. Mater. Math. 3, 1-17. [Google Scholar]

[RSIF20230369C21] 21.Gallagher K, Creswell R, Gavaghan D, Lambert B. 2023. Identification and attribution of weekly periodic biases in epidemiological time series data. medRxiv. ( 10.1101/2023.06.13.23290903) [DOI]

[RSIF20230369C22] 22.Salvatier J, Wiecki TV, Fonnesbeck C. 2016. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2, e55. ( 10.7717/peerj-cs.55) [DOI] [PMC free article] [PubMed] [Google Scholar]

[RSIF20230369C23] 23.Kucukelbir A, Tran D, Ranganath R, Gelman A, Blei DM. 2017. Automatic differentiation variational inference. J. Mach. Learn. Res. 18, 1-45. [Google Scholar]

[RSIF20230369C24] 24.Schoups G, Vrugt J, Fenicia F, Van De Giesen N. 2010. Corruption of accuracy and efficiency of Markov chain Monte Carlo simulation by inaccurate numerical implementation of conceptual hydrologic models. Water Resour. Res. 46, W10530. ( 10.1029/2009WR008648) [DOI] [Google Scholar]

[RSIF20230369C25] 25.Schoups G, Vrugt JA. 2010. A formal likelihood function for parameter and predictive inference of hydrologic models with correlated, heteroscedastic, and non-Gaussian errors. Water Resour. Res. 46, W10531. ( 10.1029/2009WR008933) [DOI] [Google Scholar]

[RSIF20230369C26] 26.Hindmarsh AC, Brown PN, Grant KE, Lee SL, Serban R, Shumaker DE, Woodward CS. 2005. SUNDIALS: suite of nonlinear and differential/algebraic equation solvers. ACM Trans. Math. Softw. 31, 363-396. ( 10.1145/1089014.1089020) [DOI] [Google Scholar]

[RSIF20230369C27] 27.Vrugt JA, Ter Braak C, Diks C, Robinson BA, Hyman JM, Higdon D. 2009. Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling. Int. J. Nonlinear Sci. Numer. Simul. 10, 273-290. ( 10.1515/IJNSNS.2009.10.3.273) [DOI] [Google Scholar]

[RSIF20230369C28] 28.rccreswell. 2024. rccreswell/ode_inference: v1.0 (v1.0). Zenodo. ( 10.5281/zenodo.10578920) [DOI] [Google Scholar]

[RSIF20230369C29] 29.Onas Dehning Sebastian, Mohr Paul, Spitzner Joao Pinheiro, Neto Johannes, Vp Michael, Osthege Lode. 2024. rccreswell/covid19_inference_forecast: v0.1.2 (v0.1.2). Zenodo. ( 10.5281/zenodo.10578938) [DOI] [Google Scholar]

[RSIF20230369C30] 30.Creswell R, Shepherd KM, Lambert B, Mirams GR, Lei CL, Tavener S, Robinson M, Gavaghan DJ. 2024. Understanding the impact of numerical solvers on inference for differential equation models. Figshare. ( 10.6084/m9.figshare.c.7075553) [DOI] [PubMed]

PERMALINK

Understanding the impact of numerical solvers on inference for differential equation models

Richard Creswell

Katherine M Shepherd

Ben Lambert

Gary R Mirams

Chon Lok Lei

Simon Tavener

Martin Robinson

David J Gavaghan

Roles

Abstract

1. Introduction

2. Effects of numerical error on computation of the log-likelihood

2.1. Log-likelihood function for an ordinary differential equation model

2.2. Error in the log-likelihood arising from approximation of the forward solution

Theorem 2.1. —

Proof. —

3. Effects of ordinary differential equation solvers on inference

3.1. Fixed step and adaptive step ordinary differential equation solvers

3.1.1. Typical log-likelihood surface shapes

Figure 1.

3.2. Fixed time step solvers

3.2.1. Forward Euler solver

3.2.2. Inference for the damped, driven oscillator using Forward Euler

Figure 2.

3.3. Adaptive step size solvers

3.3.1. Inference for the damped, driven oscillator using an adaptive step size solver

Figure 3.

Figure 4.

Figure 5.

3.3.2. Effect of jaggedness on inference algorithms

3.4. The impact of observation error magnitude on inference and sampling performance

Figure 6.

4. Fixed step solvers applied to a susceptible–infected–recovered change point model of the spread of COVID-19 in Germany

4.1. Effect of time step on the forward solution

Figure 7.

4.2. Effect of time step on the posterior distributions

Figure 8.

5. Numerical errors arising in rainfall run-off models of river streamflow data

Figure 9.

6. Discussion

Endnote

Data accessibility

Declaration of AI use

Authors' contributions

Conflict of interest declaration

Funding

References

Associated Data

Data Citations

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases