Infectious Disease Modelling. 2025;10(4):1507–1532. doi: 10.1016/j.idm.2025.05.005

State-space modelling for infectious disease surveillance data: Stochastic simulation techniques and structural change detection

Christopher D Prashad 1

Abstract

We present an exploration of advanced stochastic simulation techniques for state-space models, with a specific focus on their applications in infectious disease modelling. Utilizing COVID-19 surveillance data from the province of Ontario, Canada, we employ Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) methods to detect structural changes and predict future trends in case counts. Our approach begins with the application of a Kalman smoothing technique, integrated with MCMC for state sampling within local level and seasonal models, alongside Bayesian inference for non-linear dynamic regression models. We then assess the effectiveness of various priors, including normal, Student's t, Laplace, and horseshoe distributions, in capturing abrupt changes within the data using a Rao-Blackwellized particle filter. Our findings highlight the superior performance of the horseshoe prior in identifying change points and adapting to complex data structures, offering valuable insights for real-time monitoring and forecasting in public health. This study emphasizes the efficacy of state-space models, particularly when enhanced with sophisticated prior distributions, in providing a nuanced understanding of infectious disease transmission.

Keywords: State-space modelling, Nonlinear dynamic regression, Stochastic simulation, Structural change detection, Rao-Blackwellized particle filter, Infectious disease surveillance data


1. Introduction

State-space modelling has become indispensable in the analysis of dynamic systems across various fields, including economics, engineering, and public health (Barber et al., 2011; Commandeur & Koopman, 2007; Durbin & Koopman, 2012). These models provide a framework for understanding how unobservable states evolve over time based on observable data, making them particularly useful for real-time monitoring and forecasting (Harvey, 1989; Kalman, 1960a; Meinhold & Singpurwalla, 1983).

Fig. 1. Graphical depiction of the bootstrap filter.
Fig. 2. Shrinkage probability distributions.
Fig. 3. Local level and seasonal model for COVID-19 cases.
Fig. 4. MCMC traceplots of the state noise variance of the level (Wμ) and seasonal components (Wγ) and observation noise variance V. State sampling by MCMC.
Fig. 5. MCMC traceplots of the state noise variance of the level (Wμ) and seasonal components (Wγ) and observation noise variance V. State sampling by FFBS.
Fig. 6. Local level and seasonal model for COVID-19 cases results. State sampling by FFBS.
Fig. 7. Percent positivity by age over time.
Fig. 8. Alternative view of percent positivity by age over time after the large epidemic wave.
Fig. 9. OLS fitted curves at selected dates.
Fig. 10. OLS diagnostic plots.
Fig. 11. Comparison of MCMC and OLS smoothing estimates.
Fig. 12. Case counts and hospitalizations.
Fig. 13. MCMC diagnostics for the DFM.
Fig. 14. Original data with common stochastic trend.
Fig. 15. Posterior means of the log volatility.
Fig. 16. Original data series for the MSV model application.
Fig. 17. Simulated data set for change detection.
Fig. 18. Normal distribution filter results.
Fig. 19. T-distribution filter results.
Fig. 20. Laplace distribution filter results.
Fig. 21. Horseshoe distribution filter results.
Fig. 22. Parameter estimate for normal distribution model.
Fig. 23. Parameter estimate for t-distribution model.
Fig. 24. Parameter estimate for Laplace distribution model.
Fig. 25. Parameter estimate for horseshoe distribution model.
Fig. 26. ESS for normal distribution model.
Fig. 27. ESS for t-distribution model.
Fig. 28. ESS for Laplace distribution model.
Fig. 29. ESS for horseshoe distribution model.
Fig. 30. Parameter filtering and smoothing results.
Fig. 31. Change points identified via τt thresholding.
Fig. 32. The result of change detection using PELT algorithm/binary segmentation.
Fig. 33. Posterior means and probabilities of change points using Bayesian change point method.

In our previous work, we applied state-space models to COVID-19 surveillance data from the province of Ontario, Canada, implementing dynamic regression models and novel covariance estimation methodologies to investigate the relationships between COVID-19 case counts, hospitalizations, and environmental factors such as wastewater signal measurements (Ontario Agency for Health Protection and Promotion (Public Health Ontario), 2022). This integration of covariates into state-space models provided a nuanced understanding of the factors driving COVID-19 transmission and public health outcomes (Dukic et al., 2012; Osthus et al., 2017).

Building on that foundation, the current article extends our exploration by focusing on the application of stochastic simulation methods, including Markov Chain Monte Carlo (MCMC) and particle filters, within the state-space modelling framework. These methods address the challenges of real-time updating and sequential estimation, which are crucial for timely decision-making in public health (Brooks et al., 2011; Doucet et al., 2001; Liu & Chen, 1998). We apply a Kalman smoothing approach with MCMC for state sampling in a local level and seasonal model to further enhance the accuracy and robustness of our models.

This work also investigates Bayesian inference for nonlinear dynamic regression models, focusing on multivariate Bayesian dynamic models and their applications to public health data (Migon et al., 2005; Prado et al., 2021; Ontario Agency for Health Protection and Promotion (Public Health Ontario), 2022). By exploring these advanced techniques, we aim to provide deeper insights into the temporal dynamics of COVID-19 transmission, building on the foundations laid in our previous analysis.

Overall, this article demonstrates the efficacy of these advanced state-space modelling techniques, offering valuable insights into the complex relationship between different epidemiological factors and improving the robustness of predictions in public health modelling (Carvalho et al., 2010; Chopin & Papaspiliopoulos, 2020). In particular, the paper's contribution is to show, via real COVID-19 data from Ontario, how a horseshoe prior in a Rao-Blackwellized particle filter reveals structural changes more effectively than standard approaches.

2. Materials and methods

2.1. Markov chain Monte Carlo method

For the linear Gaussian state-space model, the Wiener and Kalman filters provide analytic solutions for the cases where the number of observations is fixed and growing, respectively. In the general state-space model, however, an analytic solution proves elusive for all but a handful of the simplest cases. Historically, analytic derivation was possible through the method of conjugate prior distributions, which belong to the same family as the resulting posterior distribution. More recently, computational methods have been employed to approximate numerical solutions of the targeted filtering, predictive, and marginal smoothing distributions. Some simple methods that work at low state dimension include the deterministic grid filter and stochastic importance sampling, which draws a sample from a surrogate proposal distribution and computes statistics by weighting each draw according to the likelihood of the target distribution. In essence, the proposal distribution is approximated by a linear Gaussian state-space model via the Laplace approximation, and statistics are calculated by sampling from the proposal distribution through Kalman filtering and smoothing. Such calculations become more tractable by assuming that the observations follow an exponential family distribution, provided that the states are known. Further, the random-number-based Markov chain Monte Carlo (MCMC) method (Brooks et al., 2011) has seen an explosion in applicability owing to advances in available computational resources. The main idea in MCMC is to explore high-density regions of the target distribution by exploiting the properties of a Markov chain; this process generates random numbers from the target distribution of the general state-space model. The MCMC approach works well for the backward-looking smoothing operation when the number of observations is fixed, but suffers from the need to restart for sequential data when obtaining the filtering and predictive distributions.

For a linear Gaussian state-space model with unknown parameters, the target distribution for estimation is the joint posterior distribution p(θ0:T, ψ|y1:T).

$$
\begin{aligned}
p(\theta_{0:T}, \psi \mid y_{1:T}) &= \frac{p(\theta_{0:T}, \psi, y_{1:T})}{p(y_{1:T})},\\
p(\theta_{0:T}, \psi, y_{1:T}) &= p(y_{1:T} \mid \theta_{0:T}, \psi)\, p(\theta_{0:T}, \psi)\\
&= p(\theta_0 \mid \psi)\, p(\psi) \prod_{t=1}^{T} p(y_t \mid \theta_t, \psi)\, p(\theta_t \mid \theta_{t-1}, \psi)\\
&= \prod_{t=1}^{T} p(y_t \mid \theta_t, \psi) \prod_{t=1}^{T} p(\theta_t \mid \theta_{t-1}, \psi)\; p(\theta_0)\, p(\psi). \qquad (2.1)
\end{aligned}
$$

MCMC can be used to generate samples from a distribution of interest, p(ψ). There are several approaches, however the most basic is the Metropolis method as follows.

Algorithm 1

Metropolis method

Input. Initialize ψ0. Let N be the number of iterations in the calculation of the MC.
For j = 1, …, N:
Set ψ = ψj−1: Draw proposal ψ˜ from q(⋅|ψ)
Calculate acceptance ratio α = p(ψ̃) / p(ψ)
If α ≥ 1 then accept proposal ψ˜ by setting ψj=ψ˜
If α < 1 then accept proposal ψ˜ with probability proportional to α:
by setting ψj=ψ˜ (acceptance), or,
by setting ψj = ψj−1 (rejection)
Output. {ψj}j=1N a sample from p(ψ)

The proposal distribution q(⋅|ψ) is used to obtain the new proposal value ψ̃ given the present value ψ. Commonly, a normal distribution centred at ψ is selected, so that sampling from q(⋅|ψ) generates a random-walk Markov chain. The method is a Monte Carlo method because acceptance is randomized through the acceptance ratio α.
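As a minimal illustration of Algorithm 1, the following Python sketch (not code from the original analysis) applies a random-walk Metropolis sampler to a toy target; the log-density `log_p`, step size `step`, and iteration count are placeholder choices.

```python
import numpy as np

def metropolis(log_p, psi0, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis: symmetric normal proposal centred at the current value."""
    rng = np.random.default_rng(seed)
    psi = psi0
    draws = np.empty(n_iter)
    for j in range(n_iter):
        prop = psi + step * rng.standard_normal()   # draw proposal from q(.|psi)
        log_alpha = log_p(prop) - log_p(psi)        # acceptance ratio on the log scale
        if np.log(rng.uniform()) < log_alpha:       # accept with probability min(1, alpha)
            psi = prop
        draws[j] = psi
    return draws

# toy target: standard normal density (log scale, up to a constant)
sample = metropolis(lambda x: -0.5 * x**2, psi0=0.0)
```

In practice the step size is tuned so that the acceptance rate and the mixing seen in trace plots are adequate.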

The Metropolis method assumes a symmetric proposal distribution, q(ψ̃|ψ) = q(ψ|ψ̃), as in a random walk. Its extension, known as the Metropolis-Hastings method, allows an asymmetric proposal distribution by including appropriate correction factors in the acceptance ratio.

Algorithm 2

Metropolis-Hastings method

Input. Initialize ψ0
For j = 1, …, N:
Set ψ = ψj−1: Draw proposal ψ˜ from q(⋅|ψ)
Calculate acceptance ratio α = [p(ψ̃) q(ψ|ψ̃)] / [p(ψ) q(ψ̃|ψ)]
If α ≥ 1 then accept proposal ψ˜ by setting ψj=ψ˜
If α < 1 then accept proposal ψ˜ with probability proportional to α:
by setting ψj=ψ˜ (acceptance), or,
by setting ψj = ψj−1 (rejection)
Output. {ψj}j=1N a sample from p(ψ)

A special type of Metropolis-Hastings method is the Gibbs method, whereby the proposal distribution is a full conditional distribution, in which every component is conditioned on except one: q(⋅|ψ) = p(⋅|ψ_{−k}). At each iteration, one element ψ_k is updated while all others are kept fixed. The Gibbs method has acceptance ratio

$$
\alpha = \frac{p(\tilde{\psi})\, q(\psi \mid \tilde{\psi})}{p(\psi)\, q(\tilde{\psi} \mid \psi)}
= \frac{p(\tilde{\psi})\, p(\psi_k \mid \psi_{-k})}{p(\psi)\, p(\tilde{\psi}_k \mid \psi_{-k})}
= \frac{p(\tilde{\psi}_k, \psi_{-k})\, p(\psi_k \mid \psi_{-k})}{p(\psi_k, \psi_{-k})\, p(\tilde{\psi}_k \mid \psi_{-k})}
= \frac{p(\tilde{\psi}_k \mid \psi_{-k})\, p(\psi_{-k})\, p(\psi_k \mid \psi_{-k})}{p(\psi_k \mid \psi_{-k})\, p(\psi_{-k})\, p(\tilde{\psi}_k \mid \psi_{-k})} = 1,
$$

which shows that every proposal is accepted with probability one.

Some checks when performing MCMC are in order. Since the outputs are random numbers, checking the convergence of the Markov chain is imperative. The most common way to do this is visually via a trace plot, in which sample realizations are plotted against the search step. This is done for a few Markov chains, typically three or four. Good convergence occurs when the trace plot fluctuates rapidly around a stable region, corresponding to adequate exploration of the sample space of the distribution; this is sometimes known as adequate mixing. Poor convergence occurs when the Markov chain does not converge and there is minimal fluctuation in the overall pattern of the plot, corresponding to inadequate searching of the target distribution and poor mixing.

When performing smoothing using MCMC, the distribution of interest is the joint posterior p(θ0:T, ψ|y1:T). The Gibbs method is easily adapted to draw samples from this distribution by repeatedly sampling from the full conditional distributions as follows.

Algorithm 3

Gibbs method

Input. Initialize ψ0
For j = 1, …, N:
Draw θ0:Tj from p(θ0:T|ψ = ψj−1, y1:T)
Draw ψj from p(ψ|θ0:T=θ0:Tj,y1:T)
Output. {θ0:Tj,ψj}j=1N a sample from p(θ0:T, ψ|y1:T)

Sampling can be improved by performing Kalman filtering forward in time followed by sampling via Kalman smoothing backward in time. This is known as forward filtering backward sampling (FFBS) (Gamerman & Lopes, 2006) or simulation smoothing, as it is a simulation-based variant of Kalman smoothing.

Algorithm 4

Forward Filtering Backward Sampling

Run Kalman filtering with parameter ψj−1
Draw θT from N(mT,CT)
For t = T − 1, …, 0:
Smoothing gain
A_t ← C_t G_{t+1}^T R_{t+1}^{−1}
Draw θ_t from N(h_t, H_t),
where h_t ← m_t + A_t[θ_{t+1} − a_{t+1}] and
H_t ← C_t − A_t R_{t+1} A_t^T
Output. {θ_t}_{t=T,…,0}, a sample θ_{0:T}^{j} from p(θ_{0:T} | ψ = ψ^{j−1}, y_{1:T})
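To make Algorithms 3 and 4 concrete, the sketch below (illustrative Python, not the code used for the analyses in this paper) alternates FFBS draws of the states with conjugate inverse-gamma draws of V and W for a univariate local-level model; the hyperparameters a0 and b0 and the starting values are hypothetical.

```python
import numpy as np

def ffbs_local_level(y, V, W, m0=0.0, C0=1e7, rng=None):
    """Forward filtering, backward sampling for the local-level model (Algorithm 4, G = F = 1)."""
    rng = rng or np.random.default_rng()
    T = len(y)
    m, C, a, R = np.zeros(T), np.zeros(T), np.zeros(T), np.zeros(T)
    m_prev, C_prev = m0, C0
    for t in range(T):                       # forward Kalman filter
        a[t] = m_prev
        R[t] = C_prev + W
        Q = R[t] + V
        K = R[t] / Q
        m[t] = a[t] + K * (y[t] - a[t])
        C[t] = (1.0 - K) * R[t]
        m_prev, C_prev = m[t], C[t]
    theta = np.zeros(T)
    theta[-1] = rng.normal(m[-1], np.sqrt(C[-1]))
    for t in range(T - 2, -1, -1):           # backward sampling
        A = C[t] / R[t + 1]                  # smoothing gain
        h = m[t] + A * (theta[t + 1] - a[t + 1])
        H = C[t] - A * A * R[t + 1]
        theta[t] = rng.normal(h, np.sqrt(H))
    return theta

def gibbs_local_level(y, n_iter=2000, a0=2.0, b0=0.01, seed=0):
    """Algorithm 3: alternate FFBS state draws with inverse-gamma draws for V and W."""
    rng = np.random.default_rng(seed)
    T = len(y)
    V, W = np.var(y), np.var(y) / 10.0       # crude starting values (assumption)
    draws = {"V": [], "W": []}
    for _ in range(n_iter):
        theta = ffbs_local_level(y, V, W, rng=rng)
        resid_obs = y - theta
        resid_state = np.diff(theta)
        V = 1.0 / rng.gamma(a0 + T / 2.0, 1.0 / (b0 + 0.5 * resid_obs @ resid_obs))
        W = 1.0 / rng.gamma(a0 + (T - 1) / 2.0, 1.0 / (b0 + 0.5 * resid_state @ resid_state))
        draws["V"].append(V)
        draws["W"].append(W)
    return draws
```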

To implement MCMC, a variety of software libraries are available. Some popular choices in the mathematical sciences include WinBUGS and JAGS which rely on the Gibbs method, and Stan, a probabilistic programming language which implements a hybrid Hamiltonian Monte Carlo method for enhanced sampling efficiency and flexibility.

As an illustration, consider the problem of smoothing in the linear Gaussian state-space model. Assuming a local-level model and, for the moment, known parameters, the smoothing operation can be performed using MCMC by drawing samples from the joint posterior distribution. The posterior distribution is specified in terms of the likelihood and prior distributions using Bayes theorem:

$$
\begin{aligned}
p(\theta_{0:T} \mid y_{1:T}) &= \frac{p(\theta_{0:T}, y_{1:T})}{p(y_{1:T})},\\
p(\theta_{0:T}, y_{1:T}) &= p(y_{1:T} \mid \theta_{0:T})\, p(\theta_{0:T})\\
&= p(\theta_0) \prod_{t=1}^{T} p(y_t \mid \theta_t)\, p(\theta_t \mid \theta_{t-1})\\
&= \prod_{t=1}^{T} p(y_t \mid \theta_t) \prod_{t=1}^{T} p(\theta_t \mid \theta_{t-1})\; p(\theta_0). \qquad (2.2)
\end{aligned}
$$

The likelihood p(y_{1:T} | θ_{0:T}) corresponds to the cumulative product of the observation densities, ∏_{t=1}^{T} p(y_t | θ_t), and the prior term p(θ_{0:T}) corresponds to the cumulative product of the state transition densities times the prior distribution of the initial state, ∏_{t=1}^{T} p(θ_t | θ_{t−1}) p(θ_0). This specification follows from the probability distribution formulation of the observation and state equations along with the prior distribution of the state.

Convergence of the Markov chain may be assessed using the effective sample size and the potential scale reduction factor, R̂. The effective sample size, or ESS, is the equivalent sample size assuming no correlation exists among samples. A small ESS relative to the actual sample size may indicate convergence issues. The potential scale reduction factor R̂ compares the between-chain to the within-chain sample variance, with a value close to one being desirable and indicative of adequate convergence of the Markov chain. Specifically, R̂ = 1 + (B/W − 1)/n_i, where W and B are the within- and between-chain sequence variances and n_i is the sample size of one Markov chain sequence.
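The following sketch shows one common way to compute these diagnostics in Python (illustrative only); note that this R̂ variant includes a square root, which differs slightly from the expression quoted above.

```python
import numpy as np

def effective_sample_size(chain):
    """ESS via the initial positive sequence of autocorrelations (a common approximation)."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * np.var(chain))
    rho_sum = 0.0
    for k in range(1, n):
        if acf[k] <= 0:                      # truncate at the first non-positive autocorrelation
            break
        rho_sum += acf[k]
    return n / (1.0 + 2.0 * rho_sum)

def potential_scale_reduction(chains):
    """Gelman-Rubin style R-hat from within- (W) and between-chain (B) variances."""
    chains = np.asarray(chains, dtype=float)   # shape: (n_chains, n_draws)
    m, n = chains.shape
    W = np.mean(np.var(chains, axis=1, ddof=1))
    B = n * np.var(np.mean(chains, axis=1), ddof=1)
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)
```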

Now suppose a slightly different problem is encountered, whereby a general state space model is regarded as linear Gaussian with unknown stochastic parameters. The model must now be modified to consider the state along with the unknown parameters as random variables. The target distribution for estimation is the joint posterior distribution p(θ0:T, ψ|y1:T). Further, suppose the variance of the state and observation noise are unknown and are to be jointly estimated. In specifying the model, the posterior is analogously manipulated using Bayes theorem:

$$
\begin{aligned}
p(\theta_{0:T}, \psi \mid y_{1:T}) &= \frac{p(\theta_{0:T}, \psi, y_{1:T})}{p(y_{1:T})},\\
p(\theta_{0:T}, \psi, y_{1:T}) &= p(y_{1:T} \mid \theta_{0:T}, \psi)\, p(\theta_{0:T}, \psi)\\
&= p(\theta_0 \mid \psi)\, p(\psi) \prod_{t=1}^{T} p(y_t \mid \theta_t, \psi)\, p(\theta_t \mid \theta_{t-1}, \psi)\\
&= \prod_{t=1}^{T} p(y_t \mid \theta_t, \psi) \prod_{t=1}^{T} p(\theta_t \mid \theta_{t-1}, \psi)\; p(\theta_0)\, p(\psi). \qquad (2.3)
\end{aligned}
$$

The likelihood term p(y_{1:T} | θ_{0:T}, ψ) corresponds to the cumulative product of the observation densities, ∏_{t=1}^{T} p(y_t | θ_t, ψ), whereas the prior term p(θ_{0:T}, ψ) corresponds to the cumulative product of the state transition densities, ∏_{t=1}^{T} p(θ_t | θ_{t−1}, ψ), times the priors p(θ_0) and p(ψ). For the prior distribution, it is reasonable to assume that the parameters are independent a priori, so that p(ψ) = p(W, V) = p(W) ⋅ p(V). Further, if no information is available a priori on the distribution of W and V, a non-informative prior distribution such as a sufficiently wide uniform distribution can be used.

To obtain the marginal smoothing distribution p(θt|y1:T), one may proceed in two phases: MCMC smoothing, which targets ∫ p(θt, ψ|y1:T) dψ, followed by Kalman smoothing of p(θt|y1:T, ψ = ψ∗), where ψ∗ is a fixed value, typically obtained by maximum likelihood estimation.

2.2. Sequential Monte Carlo method

Stochastic methods for the general state-space model include bootstrap filter (Gordon et al., 1993), auxiliary particle filter (Pitt & Shephard, 1999), Liu & West filter and Rao-Blackwellized Liu & West filter (Chen & Liu, 2000; Doucet et al., 2000; Robert & Roberts, 2021). These methods will be explained in detail.

MCMC operates as a batch method and can be inefficient if data is obtained sequentially in an online fashion due to the need for recalculation at each time point. Thus, a more intelligent sequential estimation in the general model, akin to the Kalman filter (Kalman, 1960b; Meinhold & Singpurwalla, 1983) for the linear model, is required. Several proposals have been made to deal with this problem, including unscented Kalman filter, extended Kalman filter, ensemble Kalman filter and Gaussian-sum filter to note a few, all of which sequentially estimate mean and covariance statistics of the posterior distribution. Particle filter approximates the posterior using a collection of particles possessing weight probabilities for each realization value.

To begin, importance sampling is accomplished as follows. Consider the adapted problem of sampling from proposal q instead of target distribution p:

$$
\begin{aligned}
E_{p(\theta)}[f(\theta)] &= \int f(\theta)\, p(\theta)\, d\theta \approx \frac{1}{N}\sum_{n=1}^{N} f(\theta^{(n)})\\
&= \int f(\theta)\, \frac{p(\theta)}{q(\theta)}\, q(\theta)\, d\theta = \int f(\theta)\, w(\theta)\, q(\theta)\, d\theta\\
&= E_{q(\theta)}[w(\theta) f(\theta)] \approx \frac{1}{N}\sum_{n=1}^{N} w(\theta^{(n)})\, f(\theta^{(n)}),
\end{aligned}
$$

where w = p/q is the importance function.
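A small, self-contained Python example of self-normalized importance sampling, with a standard normal target and a heavier-tailed Student-t proposal chosen purely for illustration:

```python
import numpy as np
from math import gamma, sqrt, pi, log

rng = np.random.default_rng(1)

# Target p: standard normal. Proposal q: Student-t with 3 degrees of freedom (heavier tails).
N = 10_000
theta = rng.standard_t(df=3, size=N)                   # draws from q
log_p = -0.5 * theta**2 - 0.5 * log(2 * pi)            # log p(theta)
log_c = log(gamma(2.0) / (sqrt(3 * pi) * gamma(1.5)))  # log normalising constant of t_3
log_q = log_c - 2.0 * np.log1p(theta**2 / 3.0)         # log q(theta)
w = np.exp(log_p - log_q)                              # importance weights w = p / q
w /= w.sum()                                           # self-normalise
estimate = np.sum(w * theta**2)                        # approximates E_p[theta^2] = 1
```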

Sequential importance sampling (SIS) considers a weighted Monte Carlo approximation to p(θ0:t, ψ|y1:t) as

$$
\hat{p}(\theta_{0:t}) = \sum_{n=1}^{N} \omega_t^{(n)}\, \delta\!\left(\theta_{0:t} - \theta_{0:t}^{(n)}\right),
$$

whereby in the sequential variation, weights are considered:

$$
\omega_t^{(n)} = \frac{p(\theta_{0:t}^{(n)} \mid y_{1:t})}{q(\theta_{0:t}^{(n)} \mid y_{1:t})}
\propto \frac{p(y_t \mid \theta_t^{(n)})\, p(\theta_t^{(n)} \mid \theta_{t-1}^{(n)})}{q(\theta_t^{(n)} \mid \theta_{0:t-1}^{(n)}, y_t)} \cdot
\frac{p(\theta_{0:t-1}^{(n)} \mid y_{1:t-1})}{q(\theta_{0:t-1}^{(n)} \mid y_{1:t-1})}
= \omega_{t-1}^{(n)}\, \frac{p(\theta_t^{(n)} \mid \theta_{t-1}^{(n)})\, p(y_t \mid y_{1:t-1}, \theta_t^{(n)})}{q(\theta_t^{(n)} \mid \theta_{0:t-1}^{(n)}, y_{1:t})}.
$$

The method is a sequential Monte Carlo (SMC) (Doucet et al., 2001). SIS with resampling gives rise to particle filtering.

Algorithm 5

Particle filtering

Input. filtering distribution at time t − 1: {θt1(n),ωt1(n)}n=1N
Time t update rule:
For n = 1 to N:
(realizations) Draw θt(n) from proposal q(θt|θ0:t1(n),y1:t)
(weights) ω_t^{(n)} ← ω_{t−1}^{(n)} p(θ_t^{(n)} | θ_{t−1}^{(n)}) p(y_t | y_{1:t−1}, θ_t^{(n)}) / q(θ_t^{(n)} | θ_{0:t−1}^{(n)}, y_{1:t})
End
(normalization) ω_t^{(n)} ← ω_t^{(n)} / Σ_{n′=1}^{N} ω_t^{(n′)}
(resampling) Sample from {1, …, N} with replacement, with probability proportional to ω_t^{(n)} (n = 1, …, N), and set the index sequence k = {k_1, …, k_N} for resampling
Relabel θ_t^{(k)} = {θ_t^{(k_1)}, …, θ_t^{(k_N)}} to obtain realizations θ_t^{(n)} and reset every weight ω_t^{(n)} to 1/N
Output. filtering distribution at time t:{θt(n),ωt(n)}n=1N

Some properties of the particle filter are provided. Setting the proposal function q equal to the state equation produces the bootstrap filter, or Monte Carlo filter. Particle degeneracy may occur when the weights concentrate on only a few particles. Resampling reuses a limited number of particles efficiently and is necessary because only finitely many particles are available: it eliminates particles with small weights and replicates those with large weights in the filtering distribution. The effective sample size (ESS) (Kong et al., 1994), a proxy for the variance of the importance weights, may be used to determine when to resample; it approximates the number of independent samples from the target distribution that would be required to produce an estimate of comparable variance. The particle filter is an approximate solution that accumulates errors as it processes the series, so the filtering process should be restarted periodically to avoid deterioration of estimation accuracy.
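The sketch below (illustrative Python, assuming a univariate local-level model with known V and W) implements such a bootstrap filter with resampling triggered by the ESS criterion just described.

```python
import numpy as np

def bootstrap_filter(y, V, W, N=1000, ess_threshold=0.5, seed=0):
    """Bootstrap filter for the local-level model: proposal = state equation,
    with resampling triggered when the ESS falls below a fraction of N."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 10.0, size=N)      # diffuse initial particles (assumption)
    weights = np.full(N, 1.0 / N)
    means = np.empty(len(y))
    for t, yt in enumerate(y):
        particles = particles + rng.normal(0.0, np.sqrt(W), size=N)   # propagate via state eq.
        loglik = -0.5 * (yt - particles)**2 / V                       # observation log-likelihood
        weights *= np.exp(loglik - loglik.max())                      # stabilised weight update
        weights /= weights.sum()
        means[t] = np.sum(weights * particles)                        # filtered mean
        ess = 1.0 / np.sum(weights**2)                                # effective sample size
        if ess < ess_threshold * N:                                   # resample only when needed
            idx = rng.choice(N, size=N, p=weights)
            particles, weights = particles[idx], np.full(N, 1.0 / N)
    return means
```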

The Monte Carlo (bootstrap) filter (Gordon et al., 1993) is depicted in Fig. 1.

The mathematical foundation of SMC is grounded in probability theory. Target distribution p is approximated by weighted Monte Carlo samples, or particles

$$
\{\theta_{0:t}^{(n)}, \psi^{(n)}, \omega_t^{(n)}\}_{n=1}^{N} \sim p(\theta_{0:t}, \psi \mid y_{1:t}),
$$

such that, for n = 1, …, N, ω_t^{(n)} > 0 and Σ_{n=1}^{N} ω_t^{(n)} = 1. Then, for a p-integrable function Φ_t : R^{t×p} → R, the empirical distribution converges asymptotically to p as N → ∞:

$$
\sum_{n=1}^{N} \omega_t^{(n)} \Phi_t\!\left(\theta_{0:t}^{(n)}\right) \xrightarrow{\ a.s.\ } E_p(\Phi_t) = \int \Phi_t(\theta_{0:t})\, p(\theta_{0:t} \mid y_{1:t})\, d\theta_{0:t}.
$$

Some convergence results are as follows; Chopin (2004) and Del Moral (2004) contain extensive further results. Let

$$
\bar{\Phi}_t = E_p(\Phi_t) = \int \Phi_t(\theta_{0:t})\, p(\theta_{0:t} \mid y_{1:t})\, d\theta_{0:t}, \qquad
\hat{\Phi}_t = \int \Phi_t(\theta_{0:t})\, \hat{p}(\theta_{0:t} \mid y_{1:t})\, d\theta_{0:t} = \sum_{n=1}^{N} \omega_t^{(n)} \Phi_t\!\left(\theta_{0:t}^{(n)}\right).
$$

Then there exists a constant C such that for any r > 0,

$$
E\!\left[\left|\hat{\Phi}_t - \bar{\Phi}_t\right|^r\right]^{1/r} \le \frac{C}{\sqrt{N}}, \qquad (2.4)
$$
$$
\sqrt{N}\left(\hat{\Phi}_t - \bar{\Phi}_t\right) \xrightarrow{\ d\ } N(0, \sigma_t^2) \quad \text{as } N \to \infty. \qquad (2.5)
$$

Particle prediction is outlined below.

Algorithm 6

Particle prediction

Input. (k − 1)-steps-ahead predictive distribution at time t + k − 1: {θ_{t+k−1}^{(n)}, ω_{t+k−1}^{(n)}}_{n=1}^{N}
Time t + k update rule:
For n = 1 to N:
(realizations) Draw θ_{t+k}^{(n)} from the state equation p(θ_{t+k} | θ_{t+k−1}^{(n)})
(weights) ω_{t+k}^{(n)} ← ω_{t+k−1}^{(n)}
End
Output. k-steps-ahead predictive distribution at time t + k: {θ_{t+k}^{(n)}, ω_{t+k}^{(n)}}_{n=1}^{N}

Particle smoothing is accomplished by a forward filtering pass followed by a backward simulation.

The auxiliary particle filter (APF) resamples past particles with weights proportional to the predictive likelihood p(y_t | θ̂_t^{(n)}), where θ̂_t^{(n)} = E[θ_t | θ_{t−1}^{(n)}] is an estimate of the current state. The resampling is done by creating an index sequence, or auxiliary variable k, and then drawing realizations from the state equation. This pre-screens the particles on the basis of the current observation: the filter selects at random which particles from the previous recursion to propagate over time while conditioning on the current observation, so that particles with high likelihood are more likely to be selected. The APF can sometimes experience performance degradation due to bias of the finite particles in the prior, which reduces the diversity of the particles at the outset.

Algorithm 7

Forward Filtering Backward Simulation

Input. Smoothing distribution at time t + 1: {st+1(n),ρt+1(n)}n=1N
Time t update rule:
For n = 1 to N:
(smoothing weights) ρ_t^{(n)} ∝ ω_t^{(n)} p(s_{t+1}^{(n)} | θ_t^{(n)})
End
(normalization) ρ_t^{(n)} ← ρ_t^{(n)} / Σ_{n′=1}^{N} ρ_t^{(n′)}
(resampling) Sample from {1, …, N} with replacement, with probability proportional to ρ_t^{(n)} (n = 1, …, N), and set the index sequence k = {k_1, …, k_N} for resampling
Relabel s_t^{(k)} = {s_t^{(k_1)}, …, s_t^{(k_N)}} to obtain realizations s_t^{(n)} and reset every weight ρ_t^{(n)} to 1/N
Output. Smoothing distribution at time t:{st(n),ωt(n)}n=1N

Algorithm 8

Auxiliary particle filtering

Input. filtering distribution at time t − 1: {θt1(n),ωt1(n)}n=1N
Time t update rule:
(resampling) Sample from {1, …, N} with replacement, with probability proportional to ωt1(n)p(yt|y1:t1,θˆt(n)),n=1,,N and set the index sequence k = {k1, …, kN} for resampling, where θˆt(n)=E[θt|θt1(n)]
For n = 1 to N:
(realizations) Draw θt(n) from state equation p(θt|θt1(kn))
(weights) ω_t^{(n)} ∝ p(y_t | y_{1:t−1}, θ_t^{(n)}) / p(y_t | y_{1:t−1}, θ̂_t^{(k_n)})
End
(normalization) ω_t^{(n)} ← ω_t^{(n)} / Σ_{n′=1}^{N} ω_t^{(n′)}
Output. filtering distribution at time t:{θt(n),ωt(n)}n=1N

For parameter estimation, a weighted kernel density method is used.

Algorithm 9

Kernel smoothing

Time t update rule:
moving average for parameters
(mean) μ^{(n)} ← a ψ_{t−1}^{(n)} + (1 − a) E_{ω_{t−1}}[ψ_{t−1}]
(variance reduction) γ ← (1 − a²) Var_{ω_{t−1}}[ψ_{t−1}]
For n = 1 to N:
(realizations) Draw ψ_t^{(n)} from a continuous proposal with mean μ^{(n)} and variance γ
End

Combining the APF with this kernel-based parameter estimation produces the widely used Liu & West filter (Liu & West, 2001).

Algorithm 10

Liu and West filter

Input. filtering distribution at time t1:{ψt1(n),θt1(n),ωt1(n)}n=1N
Time t update rule:
moving average for parameters
(mean) μ^{(n)} ← a ψ_{t−1}^{(n)} + (1 − a) E_{ω_{t−1}}[ψ_{t−1}]
(variance reduction) γ ← (1 − a²) Var_{ω_{t−1}}[ψ_{t−1}]
(resampling) Sample from {1, …, N} with replacement, with probability proportional to ω_{t−1}^{(n)} p(y_t | y_{1:t−1}, ψ̂_t^{(n)}, θ̂_t^{(n)}), n = 1, …, N, and set the index sequence k = {k_1, …, k_N} for resampling, where ψ̂_t^{(n)} = μ^{(n)} and θ̂_t^{(n)} = E[θ_t | θ_{t−1}^{(n)}, μ^{(n)}]
For n = 1 to N:
(realizations of parameters) Draw ψ_t^{(n)} from a continuous proposal with mean μ^{(k_n)} and variance γ
(realizations of state) Draw θ_t^{(n)} from the state equation p(θ_t | θ_{t−1}^{(k_n)}, ψ_t^{(n)})
(weights) ω_t^{(n)} ∝ p(y_t | y_{1:t−1}, ψ_t^{(n)}, θ_t^{(n)}) / p(y_t | y_{1:t−1}, ψ̂_t^{(k_n)}, θ̂_t^{(k_n)})
End
(normalization) ω_t^{(n)} ← ω_t^{(n)} / Σ_{n′=1}^{N} ω_t^{(n′)}
Output. filtering distribution at time t:{ψt(n),θt(n),ωt(n)}n=1N

For the static parameter ψ, the particles {θ_t^{(n)}, ψ^{(n)}} approximate p(θ_t, ψ | y_{1:t}) such that

$$
p(\psi \mid y_{1:t}) \approx \sum_{n=1}^{N} f_N\!\left(\psi;\ \mu^{(n)},\ (1 - a^2)\gamma\right),
$$

where a = (3Δ − 1)/(2Δ) is the shrinkage factor for discount factor Δ ∈ (0, 1).
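A minimal sketch of the kernel smoothing step (Algorithm 9) with this shrinkage factor; the discount value Δ = 0.95 is a placeholder choice.

```python
import numpy as np

def kernel_shrinkage(psi_particles, weights, delta=0.95, rng=None):
    """One kernel-smoothing step: shrink particles toward the weighted mean and add
    jitter so the mean and variance of the parameter cloud are preserved overall."""
    rng = rng or np.random.default_rng()
    a = (3.0 * delta - 1.0) / (2.0 * delta)      # shrinkage factor from discount delta
    mean = np.sum(weights * psi_particles)
    var = np.sum(weights * (psi_particles - mean)**2)
    mu = a * psi_particles + (1.0 - a) * mean    # kernel locations
    gamma = (1.0 - a**2) * var                   # reduced kernel variance
    return rng.normal(mu, np.sqrt(gamma))
```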

Rao-Blackwellization (Doucet et al., 2000; Liu & Chen, 1998) may be used to hybridize the Kalman and particle filters for the calculation of the joint filtering distribution p(θt, ψ|y1:t). By Bayes theorem, p(θt, ψ|y1:t) = p(θt|y1:t, ψ) ⋅ p(ψ|y1:t), and the Kalman filter and particle filter are applied to the first and second terms of the product, respectively. Rao-Blackwellization modifies the basic particle filter by replacing realizations of the state θt with the parameters of its conditional filtering distribution (the mean mt and covariance Ct), which the Kalman filter computes exactly at each time point. Further, the likelihood of the observation given the state θt is replaced by the one-step-ahead predictive likelihood given the parameters ψ, which the Kalman filter also supplies at each time point.

Rao-Blackwellized Liu & West particle filter is presented as follows.

Algorithm 11

Rao-Blackwellized Liu and West filter

Input. filtering distribution at time t1:{ψt1(n),mt1(n),Ct1(n),ωt1(n)}n=1N
Time t update rule:
moving average for parameters
(mean) μ^{(n)} ← a ψ_{t−1}^{(n)} + (1 − a) E_{ω_{t−1}}[ψ_{t−1}]
(variance reduction) γ ← (1 − a²) Var_{ω_{t−1}}[ψ_{t−1}]
(resampling) Sample from {1, …, N} with replacement, with probability proportional to ω_{t−1}^{(n)} p(y_t | y_{1:t−1}, ψ̂_t^{(n)}, θ̂_t^{(n)}), n = 1, …, N, and set the index sequence k = {k_1, …, k_N} for resampling, where ψ̂_t^{(n)} = μ^{(n)}; calculate the one-step-ahead predictive likelihood p(y_t | y_{1:t−1}, ψ̂_t^{(n)}) as N(y_t; f_t^{(n)}, Q_t^{(n)}), using f_t^{(n)} and Q_t^{(n)} obtained from a Kalman filter with observation y_t, state {m_{t−1}^{(n)}, C_{t−1}^{(n)}}, and parameters μ^{(n)}
For n = 1 to N:
(realizations of parameters) Draw ψ_t^{(n)} from a continuous proposal with mean μ^{(k_n)} and variance γ
(realizations of state) Obtain m_t^{(n)} and C_t^{(n)} from a Kalman filter with observation y_t, state {m_{t−1}^{(k_n)}, C_{t−1}^{(k_n)}}, and parameters ψ_t^{(n)}
(weights) ω_t^{(n)} ∝ p(y_t | y_{1:t−1}, ψ_t^{(n)}) / p(y_t | y_{1:t−1}, ψ̂_t^{(k_n)}), where the numerator is calculated as N(y_t; f_t^{(n)}, Q_t^{(n)}) from the realizations-of-state step and the denominator as N(y_t; f_t^{(k_n)}, Q_t^{(k_n)}) from the resampling step
End
(normalization) ω_t^{(n)} ← ω_t^{(n)} / Σ_{n′=1}^{N} ω_t^{(n′)}
Output. filtering distribution at time t:{ψt(n),mt(n),Ct(n),ωt(n)}n=1N
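The following Python sketch outlines Algorithm 11 for the simplest possible case, a local-level model in which only the state-noise variance W = exp(ψ) is unknown and V is treated as known; the prior particle range for ψ and all tuning values are assumptions for illustration, not the settings used in the paper.

```python
import numpy as np

def norm_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x (vectorised)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def rb_liu_west(y, V=1.0, N=2000, delta=0.95, seed=0):
    """Rao-Blackwellized Liu & West filter for a local-level model with unknown W = exp(psi)."""
    rng = np.random.default_rng(seed)
    a = (3.0 * delta - 1.0) / (2.0 * delta)        # kernel shrinkage factor
    psi = np.log(rng.uniform(0.01, 1.0, size=N))   # prior particles for psi = log W (assumption)
    m = np.zeros(N)                                # per-particle Kalman means
    C = np.full(N, 10.0)                           # per-particle Kalman variances (diffuse start)
    weights = np.full(N, 1.0 / N)
    W_hat = np.empty(len(y))
    for t, yt in enumerate(y):
        # kernel locations and reduced variance for the parameter cloud (Algorithm 9)
        mean_psi = np.sum(weights * psi)
        var_psi = np.sum(weights * (psi - mean_psi) ** 2)
        mu = a * psi + (1.0 - a) * mean_psi
        gamma = (1.0 - a ** 2) * var_psi
        # look-ahead resampling using the one-step predictive likelihood at mu
        log_pre = norm_logpdf(yt, m, C + np.exp(mu) + V)
        probs = weights * np.exp(log_pre - log_pre.max())
        probs /= probs.sum()
        k = rng.choice(N, size=N, p=probs)
        # refresh the parameters, then update each particle's (m, C) exactly by a Kalman step
        psi = rng.normal(mu[k], np.sqrt(max(gamma, 1e-12)))
        f = m[k]                                   # one-step-ahead predictive mean
        R = C[k] + np.exp(psi)
        Q = R + V                                  # predictive variance under the refreshed psi
        log_ratio = norm_logpdf(yt, f, Q) - log_pre[k]
        w = np.exp(log_ratio - log_ratio.max())    # second-stage weights
        weights = w / w.sum()
        K = R / Q
        m = f + K * (yt - f)
        C = (1.0 - K) * R
        W_hat[t] = np.sum(weights * np.exp(psi))   # filtered estimate of W
    return W_hat
```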

2.3. Online structural change detection

Structural breaks are permanent disruptions of the level, slope, or seasonal component of a time series and indicate notable changes in the observed process. Change point detection (Adams & MacKay, 2007) is a type of anomaly detection that is central to ascertaining structural change in time series. Treating the variance of the state noise wt as time-varying, via a heavy-tailed distribution, allows tracking of abrupt changes in the underlying state dynamics. Similarly, rare outliers may be handled by modelling the variance of the observation noise vt as time-varying.

As a model for structural breaks, consider the random walk plus noise model:

$$
\theta_t = \theta_{t-1} + w_t, \qquad w_t \sim N(0,\ W\tau_t^2), \qquad (2.6)
$$
$$
y_t = \theta_t + v_t, \qquad v_t \sim N(0,\ V). \qquad (2.7)
$$

The state noise wt contains a time-varying scale factor in its variance. The degree of 'rarity' of a structural break cannot be assumed in advance, so an assumption of normality may be inappropriate depending on the application.
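For intuition, the sketch below simulates data from model (2.6)–(2.7) with a time-varying scale factor τt that spikes at two hypothetical change points (chosen to echo the simulated example analyzed later, t = 60 and t = 140); all variance values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)
T, W, V = 200, 0.05, 1.0
tau = np.ones(T)
tau[60] = 20.0                               # abrupt structural break at t = 60
tau[140] = 20.0                              # and another at t = 140
w = rng.normal(0.0, np.sqrt(W * tau**2))     # state noise with time-varying scale factor tau_t
theta = np.cumsum(w)                         # random-walk state
y = theta + rng.normal(0.0, np.sqrt(V), T)   # observations
```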

Shrinkage priors aim to shrink small effects toward zero while preserving genuinely large effects; they increase the bias of the model in order to reduce variance and improve overall model performance (lower generalization error). Here, consider the following hierarchical priors on the wt, formulated as scale mixtures of normals (West, 1987): ridge, local Student's t, Bayesian lasso, and horseshoe.

The ridge prior (Hsiang, 1975) has a parameter λ that determines the degree of shrinkage, with larger values resulting in a smaller prior variance and hence more shrinkage of the state noise toward zero:

$$
w_t \mid \lambda, \sigma^2 \sim \mathrm{Normal}\!\left(0,\ \frac{\sigma^2}{\lambda}\right), \quad t = 1, \ldots, T, \qquad \lambda \sim \text{half-Cauchy}(0, 1).
$$

The local Student's t prior (Shephard, 1994) implies a Student's t_ν prior on w_t (Bishop, 2006):

$$
w_t \mid \tau_t^2, \sigma^2 \sim \mathrm{Normal}(0,\ \sigma^2\tau_t^2), \quad \tau_t^2 \mid \nu, \lambda \sim \text{Inv-Gamma}\!\left(\frac{\nu}{2},\ \frac{\nu\lambda}{2}\right), \quad t = 1, \ldots, T, \qquad \lambda \sim \text{half-Cauchy}(0, 1).
$$

Alternatively, setting τ_t² | ν ∼ Inv-Gamma(ν/2, ν/2) gives w_t ∼ Student-t(ν) (Bishop, 2006).

Bayesian lasso (Park & Casella, 2008) is implemented as follows:

$$
w_t \mid \tau_t^2, \sigma^2 \sim \mathrm{Normal}(0,\ \sigma^2\tau_t^2), \quad \tau_t^2 \mid \lambda^2 \sim \mathrm{Exponential}\!\left(\frac{\lambda^2}{2}\right), \quad t = 1, \ldots, T, \qquad \lambda \sim \text{half-Cauchy}(0, 1).
$$

Integrating out τt2 yields double-exponential (Laplace) priors on the state noise:

$$
w_t \mid \lambda, \sigma \sim \text{Double-Exponential}\!\left(0,\ \frac{\sigma}{\lambda}\right), \quad t = 1, \ldots, T.
$$

Specifying the horseshoe (Carvalho et al., 2010):

$$
w_t \mid \tau_t^2 \sim \mathrm{Normal}(0,\ \tau_t^2), \quad \tau_t \mid \lambda \sim \text{half-Cauchy}(0, \lambda), \quad t = 1, \ldots, T, \qquad \lambda \sim \text{half-Cauchy}(0, \sigma).
$$

For fixed λ = σ = 1, the induced prior on the shrinkage factor κ_t = 1/(1 + τ_t²) is the horseshoe-shaped Beta(0.5, 0.5) distribution. For large coefficients κ_t ≈ 0 (no shrinkage), whereas for small coefficients κ_t ≈ 1 (heavy shrinkage). The horseshoe is a global-local shrinkage prior, with a time-specific local shrinkage component τ_t as well as a global shrinkage component λ (Van Erp et al., 2019). These so-called sparsity-inducing priors are depicted in Fig. 2.
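A short numerical illustration (assuming λ = σ = 1 as above): drawing τt from a half-Cauchy and forming κt = 1/(1 + τt²) yields the horseshoe-shaped distribution of shrinkage factors referred to here.

```python
import numpy as np

rng = np.random.default_rng(0)
T, lam = 5000, 1.0
tau = np.abs(lam * rng.standard_cauchy(T))   # local scales: tau_t ~ half-Cauchy(0, lambda)
w = rng.normal(0.0, tau)                     # horseshoe draws: w_t | tau_t ~ N(0, tau_t^2)
kappa = 1.0 / (1.0 + tau**2)                 # shrinkage factors; their histogram is U-shaped,
                                             # matching the Beta(0.5, 0.5) "horseshoe" form
```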

3. Applications to understanding COVID-19 data

To illustrate the aforementioned concepts, first consider a Kalman smoothing approach with MCMC used for the state sampling (Table 1) in a local level and seasonal model fitted to COVID-19 case data from the province of Ontario, Canada (Ontario Agency for Health Protection and Promotion (Public Health Ontario), 2022).

Table 1.

Parameter estimate statistics for the inference approach with MCMC state sampling.

Parameter Mean SD 95 % credible interval neff Rˆ
Wμ 8.290e-03 8.114e-04 (6.817e-03, 1.001e-02) 1112.70 1.003537
Wγ 5.579e-04 9.392e-05 (3.977e-04, 7.693e-04) 244.82 1.014275
V 2.802e-03 5.289e-04 (1.831e-03, 3.897e-03) 440.24 1.007148

The results (Table 2) of the local level and seasonal model for COVID-19 cases without MCMC state sampling are shown below.

Table 2.

Parameter estimate statistics for the inference approach without MCMC state sampling.

Parameter Mean SD 95 % credible interval neff Rˆ
Wμ 8.425e-03 8.554e-04 (6.939e-03, 1.024e-02) 2234.12 1.001947
Wγ 2.208e-04 5.299e-05 (1.345e-04, 3.433e-04) 2482.81 1.003342
V 3.327e-03 5.392e-04 (2.275e-03, 4.398e-03) 2186.27 1.001372

3.1. Bayesian inference for nonlinear dynamic regression models

Multivariate Bayesian dynamic models and applications are presented. Bayesian inference can be readily used to model multiple time series such as disease percent positivity by age over time, case counts and hospitalization common trend identification, or case counts and hospitalization stochastic volatilities. Throughout, consider COVID-19 data obtained from the province of Ontario, Canada (Ontario Agency for Health Protection and Promotion (Public Health Ontario), 2022).

First, consider the data on percent positivity by age over time (Fig. 7).

An alternative view of positivity after the large epidemic wave is shown in Fig. 8.

Observed and OLS fitted curves at selected dates are plotted in Fig. 9.

Some diagnostics of the OLS fitting are shown in Fig. 10.

Next, turn to dynamic estimation of the positivity curve:

$$
y_t = F\theta_t + v_t, \quad v_t \sim N(0, V), \qquad \theta_t = G\theta_{t-1} + w_t, \quad w_t \sim N(0, W),
$$

where y_t = [y_{1,t}, …, y_{m,t}]^T, θ_t = [β_{1,t}, β_{2,t}, β_{3,t}]^T, and

$$
F = \begin{bmatrix} 1 & h_2(x_1) & h_3(x_1) \\ 1 & h_2(x_2) & h_3(x_2) \\ \vdots & \vdots & \vdots \\ 1 & h_2(x_m) & h_3(x_m) \end{bmatrix}, \quad
G = \operatorname{diag}(\psi_1, \psi_2, \psi_3), \quad
V = \operatorname{diag}(\phi_{y,1}^{-1}, \ldots, \phi_{y,m}^{-1}), \quad
W = \operatorname{diag}(\phi_{\theta,1}^{-1}, \phi_{\theta,2}^{-1}, \phi_{\theta,3}^{-1}).
$$
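For concreteness, a small sketch of how the design matrix F and a per-date OLS fit might be assembled; the basis functions h2, h3, the age-group abscissae xi, and all numeric values are hypothetical, since they are not specified in this excerpt.

```python
import numpy as np

# Hypothetical basis functions and age-group midpoints (placeholders only).
h2 = lambda x: x
h3 = lambda x: x ** 2

x = np.linspace(0.1, 0.9, 5)                            # m = 5 placeholder age midpoints
F = np.column_stack([np.ones_like(x), h2(x), h3(x)])    # m x 3 design matrix F
G = np.diag([0.97, 0.98, 0.99])                         # diag(psi_1, psi_2, psi_3), illustrative

y_t = np.array([0.05, 0.08, 0.12, 0.10, 0.07])          # one date's positivity curve (made up)
beta_ols, *_ = np.linalg.lstsq(F, y_t, rcond=None)      # per-date OLS fit used for comparison
```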

Computation of the MCMC smoothing and OLS estimates is conducted. The results are shown in Fig. 11 and Table 3.

Table 3.

MCMC posterior expectation approximations.

Parameter Mean SE
ψ1 0.9737 0.0002
ψ2 0.9841 0.0001
ψ3 0.9885 0.0001
ϕθ,1 0.0520 0.0001
ϕθ,2 0.0535 0.0001
ϕθ,3 0.0802 0.0001
ϕy,1 0.0096 0.0001
ϕy,2 0.0245 0.0001
ϕy,3 0.0090 0.0001
ϕy,4 0.0130 0.0001
ϕy,5 0.0088 0.0001

Next, consider the application of common trend identification between case counts and hospitalizations. This approach is useful for reducing the dimension of multiple correlated time series.

For analysis, consider a dynamic factor model (DFM) which can be used to extract a common stochastic trend from multiple integrated time series. The goal is to explain fluctuations of various series by common latent factors that affect a set of variables simultaneously.

DFM parameter estimates are computed and recorded in Table 4.

Table 4.

DFM parameter estimates.

Parameter α σμ σ1 σ2 σ12
Posterior mean 222046.20 2.88e-05 3.9832 3299 −10126.43
Std. Dev. 3.996223 9.26e-08 0.06 876.97 6.22
5 % quantile 222038.4 2.86e-05 3.87 1580.12 −10138.62
95 % quantile 222054.1 2.90e-05 4.10 5017.79 −10114.23

The posterior mean of the common stochastic trend is shown in Fig. 14. The result shows that, in this application, the trend closely follows the pattern of the hospitalizations.

Now, consider the multivariate stochastic volatility (MSV) model, which consists of correlated shocks in the count series and correlated log variances; the data series are shown in Fig. 16.

Diagnostics of leave-one-out information criterion show extreme values for Aug. 8, 2021 when there was a rapid fall in cases and Apr. 18, 2022 when there was a sharp drop in hospitalizations.

3.2. Application of online structural change detection method

Next, consider a simulated data set with two notable change points, at t = 60 and t = 140 (Fig. 17).

The normal distribution filter results show that the mean estimate from the particle filter closely follows the observations throughout the time series. The time-varying Kalman filter mean is almost identical to the particle filter mean, indicating a strong agreement between the two methods. The normal prior provides stable and accurate estimates, effectively capturing the overall trend without significant deviation.

The performance of the different priors is quantitatively evaluated by their log-likelihood values, which measure the fit of the models to the observed data (Table 5). Higher log-likelihood values indicate a better fit. Table 5 summarizes the log-likelihood for each prior.

Table 5.

Log-likelihood of the models considered.

Prior Log-likelihood
Normal 1579.879
Student's t 1910.972
Laplace 1675.949
Horseshoe 2346.218

The horseshoe prior has the highest log-likelihood value (2346.218), indicating that it provides the best fit to the data among the priors considered.

The plots of the time-varying factor for the state noise variance, τt, under different distributional assumptions allow us to determine the impact of these assumptions on the detection of structural changes in the data. The results show the mean and 50 % intervals of the time-varying factors under the various priors: normal, Student's t, Laplace, and horseshoe.

In summary, from the parameter estimates, it is clear that the normal distribution model struggles with identifying the change points while the heavy-tailed distributions are able to identify the change points. The Laplace model gives the correct change points if the parameter is thresholded at 1.5 while the t-distribution model at 10 and the horseshoe distribution model at 7, owing to the differences in the scales due to the distribution construction.

Finally, it is also interesting to observe that the data contain a less abrupt change at t = 96, for which notable but lower parameter signals are present in the parameter estimate plots as well.
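A minimal helper of the kind used to flag change points by thresholding the filtered τt path (Fig. 31); the threshold itself is model-specific, as noted above.

```python
import numpy as np

def change_points(tau_hat, threshold):
    """Flag change points where the filtered scale factor tau_t exceeds a threshold,
    collapsing runs of consecutive exceedances into a single change point."""
    idx = np.flatnonzero(np.asarray(tau_hat) > threshold)
    if idx.size == 0:
        return idx
    return idx[np.r_[True, np.diff(idx) > 1]]   # keep only the first index of each run
```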

The effective sample size (ESS) for state noise variance using different distributional assumptions is a crucial metric for evaluating the performance of sampling methods in capturing the underlying data distribution efficiently.

4. Discussion

Combining MCMC and the Kalman filter, one may draw a sample from p(θ0:T, ψ|y1:T) as follows. By Bayes theorem, p(θ0:T, ψ|y1:T) = p(θ0:T|ψ, y1:T) ⋅ p(ψ|y1:T). Then, proceed in the following order.

  • 1.

    Parameter estimation: Draw a sample from p(ψ|y1:T) by MCMC;

  • 2.

    State estimation: Draw a sample from p(θ0:T|ψ, y1:T) by FFBS.

In the above illustration, the marginal smoothing distribution is p(θt|y1:T). The MCMC method estimates ∫ p(θt, ψ|y1:T) dψ, whereas Kalman smoothing calculates p(θt|y1:T, ψ = ψ∗), where ψ∗ is the true value. In theory, the two approaches should produce similar results; however, there is no guarantee of this in practical application.

The estimation accuracy can be improved by various means. In theory, performance gains can be made by increasing the number of iterations of the Markov chain. More recently, the replica exchange algorithm has been shown to deal with the problem of the Markov chain becoming trapped in a localized region of the target distribution. In practice, the general state-space model can be estimated by combining the effectiveness of the Kalman filter in the linear Gaussian state-space model with the MCMC approach, which can improve estimation accuracy. Initially, the linear Gaussian state-space model is used with the unknown parameters regarded as random variables. The model can be treated as a linear Gaussian state-space model if the parameters are known or can be estimated by maximum likelihood estimation. If the parameters are unknown and are to be jointly estimated along with the underlying states, the target posterior distribution p(θ0:T, ψ|y1:T) requires the general state-space model.

A combined algorithm is obtained by using MCMC to estimate the parameters ψ and then using this estimation result in a Kalman filter. MCMC estimates the posterior distribution p(ψ|y1:T) ∝ p(y1:T|ψ) ⋅ p(ψ). The likelihood term of the linear Gaussian state-space model can be written according to:

$$
\ell(\psi) = \log p(y_1, \ldots, y_T; \psi) = \sum_{t=1}^{T} \log p(y_t \mid y_{1:t-1}; \psi)
= -\frac{1}{2}\sum_{t=1}^{T} \log |Q_t| - \frac{1}{2}\sum_{t=1}^{T} \frac{(y_t - f_t)^2}{Q_t},
$$

and the prior distribution can be specified to be non-informative. The resulting parameter estimates, which coincide with the maximum likelihood estimates under a non-informative prior, can then be fed into the Kalman filter for state estimation. Alternatively, FFBS can efficiently provide samples from the joint posterior distribution p(θ0:T, ψ|y1:T) = p(θ0:T|ψ, y1:T) ⋅ p(ψ|y1:T).
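A compact sketch of the prediction-error decomposition above for the local-level model (illustrative Python; in the combined algorithm this quantity would be evaluated inside the MCMC step or maximized to obtain ψ∗):

```python
import numpy as np

def local_level_loglik(y, V, W, m0=0.0, C0=1e7):
    """Prediction-error decomposition of the log-likelihood for the local-level model:
    accumulate log N(y_t; f_t, Q_t) from the Kalman filter recursions."""
    m, C, ll = m0, C0, 0.0
    for yt in y:
        R = C + W                      # prior (predictive) state variance
        f, Q = m, R + V                # one-step-ahead predictive mean and variance
        ll += -0.5 * (np.log(2 * np.pi * Q) + (yt - f)**2 / Q)
        K = R / Q
        m = f + K * (yt - f)
        C = (1.0 - K) * R
    return ll
```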

Specifically, the term for the state follows a linear Gaussian state-space model and can be efficiently sampled using FFBS, while, as in the first approach, the distribution of the parameters given the observations is sampled using the Kalman-filter-augmented MCMC method. The key when estimating random parameters in the linear Gaussian state-space model in tandem with the state is to utilize MCMC for the parameter estimation, which is what the two-phase approach above accomplishes. That is, the two-step approach uses MCMC to draw samples from p(ψ|y1:T) and then FFBS to draw samples from p(θ0:T|ψ, y1:T), providing the joint posterior distribution p(θ0:T, ψ|y1:T) = p(θ0:T|ψ, y1:T) ⋅ p(ψ|y1:T), as illustrated above. Such algorithm fusion proves to be helpful and is returned to for stochastic simulation with the particle filter.

The comparative analysis of ESS under different priors provides valuable insights into their respective sampling efficiencies and robustness. The normal prior maintains a consistently high ESS with minimal variability, making it suitable for stable data with minimal structural changes. The Student's t prior, with its sensitivity to outliers, shows more variability in ESS but maintains high efficiency overall. The Laplace prior, promoting sparsity, experiences more frequent dips in ESS but quickly recovers, indicating a balance between stability and adaptability. The horseshoe prior, with its significant ESS fluctuations, excels in detecting structural changes and sparsity, making it highly effective for complex data structures with abrupt shifts.

This further highlights the importance of selecting appropriate priors based on the data characteristics and specific analysis requirements. The normal prior is ideal for stable data, while the Student's t and Laplace priors offer advantages in handling outliers and promoting sparsity, respectively. The horseshoe prior stands out for its ability to adapt to complex data structures, despite the greater fluctuations in ESS.

The application of the Rao-Blackwellized Liu & West filtering methodology with a horseshoe prior to the Ontario COVID-19 case count data (Ontario Agency for Health Protection and Promotion (Public Health Ontario), 2022) provides insightful results. The algorithm identifies two notable clusters of large values for the time-varying factor τt. By thresholding these values, two distinct change points are detected. These change points are plotted along with the original COVID-19 case count data, highlighting periods where significant shifts in the data occurred.

In contrast, the Pruned Exact Linear Time (PELT) and binary segmentation algorithm (Killick & Eckley, 2014) identifies only one change point, located close to the largest τt signal. This suggests that while PELT is effective at detecting significant changes, it may overlook smaller yet important shifts in the data. The detected change point by PELT closely aligns with the most prominent spike in the τt values, corroborating the significance of this specific change.

The Bayesian change point detection method (Erdman & Emerson, 2008) experiences difficulty in clearly identifying change points within the original dataset. The posterior probability exceeds 0.5 in multiple regions, indicating uncertainty and potential over-detection of change points. This method's sensitivity leads to multiple identified regions, making it challenging to pinpoint distinct change points with high confidence.

Comparing the methods, the horseshoe prior in the Rao-Blackwellized Liu & West filter shows a balanced ability to detect significant change points while maintaining specificity. The PELT algorithm, although robust, may miss smaller changes, focusing on the most pronounced shifts. The Bayesian method's high sensitivity can lead to over-detection, identifying multiple regions of change with lower confidence.

In practical applications, such as analyzing COVID-19 case counts, selecting the appropriate change point detection method is crucial. The prior-based filter provides a clear advantage by identifying significant changes without over-detection, offering a reliable tool for understanding and responding to shifts in epidemic data. This comparison illustrates the importance of method selection based on the data characteristics and the specific needs of the analysis, ensuring accurate and actionable insights.

5. Conclusion

The methodology presented here is widely applicable and can be used as part of multivariate time series analysis. For example, the method is adaptable to time-varying coefficient regression prediction, for which piecewise linear regression methods have been implemented for change detection (Liu et al., 2023). In addition, previous works have provided an integrated quantitative approach for data assimilation, prediction, and anomaly detection in real-time public health surveillance (Bettencourt et al., 2007), emphasizing the development of dynamic probabilistic models to predict future disease cases and detect anomalies using Bayesian methods and advanced information technology. Another line of research has employed Bayesian model averaging with change-point models to estimate the timing and magnitude of vaccine-associated changes, while controlling for seasonality and other covariates (Kürüm et al., 2017).

We have demonstrated how dynamic regression models provide a framework for analyzing relationships between time series, and how state-space models are able to capture the signal from the data, detect unexpected movement, perform anomaly detection, and forecast with uncertainty quantification. Stochastic numerical methods for estimation, such as Markov chain Monte Carlo and particle filters, may be used for the general state-space model.

Our state-space modelling analysis on Ontario's COVID-19 surveillance data specifically identified two critical structural change points, corresponding to significant shifts in case count trajectories, using a Rao-Blackwellized particle filter augmented by a horseshoe prior. These change points, occurring during heightened transmission periods, align with key COVID-19 waves observed in Ontario. The model's sensitivity to abrupt shifts allowed it to accurately capture periods of rapid case increase, which are crucial for timely public health interventions. By integrating dynamic regression models with MCMC and SMC methods, we explored relationships between case counts, hospitalizations, and wastewater signals, yielding a detailed view of how these covariates interact and drive transmission. The horseshoe prior, selected for its ability to shrink small effects while highlighting meaningful changes, effectively isolated high-magnitude changes in case counts, providing reliable indicators of emerging transmission waves. Compared to other priors like normal and Laplace, the horseshoe prior demonstrated superior fit with higher log-likelihood values, reinforcing its utility in epidemic modelling where structural changes are frequent and impactful. These findings suggest that adaptive state-space modelling can enhance surveillance systems by identifying key inflection points in epidemic trends, thereby supporting proactive public health responses in Ontario.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: None.

Acknowledgements

The author wishes to express gratitude to Professor Jianhong Wu for his invaluable discussions and support throughout this work. This work was partially funded by the NSERC/Mitacs/Sanofi Alliance program.

Handling Editor: Dr Yijun Lou

References

  1. Adams R.P., MacKay D.J.C. Bayesian online changepoint detection. 2007. https://arxiv.org/abs/0710.3742
  2. Barber D., Cemgil A.T., Chiappa S. Bayesian time series models. Cambridge University Press; 2011.
  3. Bettencourt L.M.A., et al. Towards real time epidemiology: Data assimilation, modeling and anomaly detection of health surveillance data streams. In: Zeng D., et al., editors. Intelligence and security informatics: Biosurveillance. Lecture Notes in Computer Science, Vol. 4506. Springer; Berlin, Heidelberg: 2007. pp. 79–90.
  4. Bishop C.M. Pattern recognition and machine learning. Springer; 2006.
  5. Brooks S., et al. Handbook of Markov chain Monte Carlo. CRC Press; 2011.
  6. Carvalho C.M., Polson N.G., Scott J.G. The horseshoe estimator for sparse signals. Biometrika. 2010;97(2):465–480.
  7. Chen R., Liu J. Mixture Kalman filters. Journal of the Royal Statistical Society: Series B. 2000;62:493–508.
  8. Chopin N. Central limit theorem for sequential Monte Carlo and its application to Bayesian inference. Annals of Statistics. 2004;32(6):2385–2411.
  9. Chopin N., Papaspiliopoulos O. An introduction to sequential Monte Carlo. Springer; 2020.
  10. Commandeur J.J., Koopman S.J. An introduction to state space time series analysis. Oxford University Press; 2007.
  11. Del Moral P. Feynman-Kac formulae. Springer; 2004.
  12. Doucet A., de Freitas N., Gordon N. Sequential Monte Carlo methods in practice. Springer; 2001.
  13. Doucet A., et al. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. 2000. pp. 176–183.
  14. Dukic V., Lopes H.F., Polson N.G. Tracking epidemics with Google flu trends data and a state-space SEIR model. Journal of the American Statistical Association. 2012;107(500):1410–1426. doi: 10.1080/01621459.2012.713876.
  15. Durbin J., Koopman S.J. Time series analysis by state space methods. 2nd ed. Oxford University Press; 2012.
  16. Erdman C., Emerson J.W. bcp: An R package for performing a Bayesian analysis of change point problems. Journal of Statistical Software. 2008;23:1–13.
  17. Gamerman D., Lopes H.F. Markov chain Monte Carlo: Stochastic simulation for Bayesian inference. 2nd ed. CRC Press; 2006.
  18. Gordon N.J., Salmond D.J., Smith A.F.M. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F (Radar and Signal Processing). 1993;140(2):107–113.
  19. Harvey A.C. Forecasting, structural time series models and the Kalman filter. Cambridge University Press; 1989.
  20. Hsiang T.C. A Bayesian view on ridge regression. The Statistician. 1975;24(4):267.
  21. Kalman R.E. A new approach to linear filtering and prediction problems. Transactions of the ASME – Journal of Basic Engineering (Series D). 1960;82:35–45.
  22. Kalman R.E. A new approach to linear filtering and prediction problems. Transactions of the ASME – Journal of Basic Engineering (Series D). 1960;82:35–45.
  23. Killick R., Eckley I.A. changepoint: An R package for changepoint analysis. Journal of Statistical Software. 2014;58(3):1–19.
  24. Kong A., Liu J.S., Wong W.H. Sequential imputations and Bayesian missing data problems. Journal of the American Statistical Association. 1994;89(425):278–288.
  25. Kürüm E., et al. Bayesian model averaging with change points to assess the impact of vaccination and public health interventions. Epidemiology. 2017;28(6):889–897. doi: 10.1097/EDE.0000000000000719.
  26. Liu J.S., Chen R. Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association. 1998;93(443):1032–1044.
  27. Liu J., et al. A new time-varying coefficient regression approach for analyzing infectious disease data. Scientific Reports. 2023;13. doi: 10.1038/s41598-023-41551-1.
  28. Liu J., West M. Combined parameter and state estimation in simulation-based filtering. In: Doucet A., de Freitas N., Gordon N., editors. Sequential Monte Carlo methods in practice. Springer; 2001.
  29. Meinhold R.J., Singpurwalla N.D. Understanding the Kalman filter. The American Statistician. 1983;37(2).
  30. Migon H., et al. Bayesian dynamic models. In: Dey D., Rao C., editors. Handbook of Statistics, Vol. 19. Elsevier B.V.; 2005. pp. 553–588.
  31. Ontario Agency for Health Protection and Promotion (Public Health Ontario). Ontario COVID-19 data tool. 2022. https://www.publichealthontario.ca/en/data-and-analysis/infectious-disease/covid-19-data-surveillance/covid-19-data-tool?tab=trends
  32. Osthus D., et al. Forecasting seasonal influenza with a state-space SIR model. Annals of Applied Statistics. 2017;11(1):202–224. doi: 10.1214/16-AOAS1000.
  33. Park T., Casella G. The Bayesian lasso. Journal of the American Statistical Association. 2008;103(482):681–686.
  34. Pitt M.K., Shephard N. Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association. 1999;94(446):590–599.
  35. Prado R., Ferreira M.A.R., West M. Time series: Modeling, computation, and inference. 2nd ed. CRC Press; 2021.
  36. Robert C.P., Roberts G.O. Rao-Blackwellization in the MCMC era. International Statistical Review. 2021.
  37. Shephard N. Partial non-Gaussian state space. Biometrika. 1994;81(1):115–131.
  38. Van Erp S., Oberski D.L., Mulder J. Shrinkage priors for Bayesian penalized regression. Journal of Mathematical Psychology. 2019;89:31–50.
  39. West M. On scale mixtures of normal distributions. Biometrika. 1987;74(3):646–648.
