EpiLPS: A fast and flexible Bayesian tool for estimation of the time-varying reproduction number

Oswaldo Gressani; Jacco Wallinga; Christian L Althaus; Niel Hens; Christel Faes

PLoS Comput Biol. 2022 Oct 10;18(10):e1010618. doi: 10.1371/journal.pcbi.1010618

Editor: Claudio José Struchiner

PMCID: PMC9584461 PMID: 36215319

Abstract

In infectious disease epidemiology, the instantaneous reproduction number $R_{t}$ is a time-varying parameter defined as the average number of secondary infections generated by an infected individual at time t. It is therefore a crucial epidemiological statistic that assists public health decision makers in the management of an epidemic. We present a new Bayesian tool (EpiLPS) for robust estimation of the time-varying reproduction number. The proposed methodology smooths the epidemic curve and yields (approximate) point estimates and credible intervals of $R_{t}$ by employing the renewal equation, using Bayesian P-splines coupled with Laplace approximations of the conditional posterior of the spline vector. Two alternative approaches for inference are presented: (1) an approach based on a maximum a posteriori argument for the model hyperparameters, delivering estimates of $R_{t}$ in only a few seconds; and (2) an approach based on a Markov chain Monte Carlo (MCMC) scheme with underlying Langevin dynamics for efficient sampling of the posterior target distribution. Case counts per unit of time are assumed to follow a negative binomial distribution to account for potential overdispersion in the data that would not be captured by a classic Poisson model. Furthermore, after smoothing the epidemic curve, a “plug-in” estimate of the reproduction number can be obtained from the renewal equation, yielding a closed-form expression of $R_{t}$ as a function of the spline parameters. The approach is extremely fast and free of arbitrary smoothing assumptions. EpiLPS is applied to data on SARS-CoV-1 in Hong Kong (2003), influenza A H1N1 (2009) in the USA and the SARS-CoV-2 pandemic (2020-2021) in Belgium, Portugal, Denmark and France.

Author summary

The instantaneous reproduction number $R_{t}$ is a key statistic that provides important insights into an epidemic outbreak as it informs about the average number of secondary infections engendered by an infectious agent. We present a flexible Bayesian approach called EpiLPS (Epidemiological modeling with Laplacian-P-Splines) for efficient estimation of the epidemic curve and $R_{t}$ based on daily case count data and the serial interval distribution. Computational speed and the absence of arbitrary assumptions on smoothing make EpiLPS an interesting tool for estimation of the reproduction number. Our methodology is validated through different simulation scenarios by using the associated R software package (https://cran.r-project.org/package=EpiLPS). We also demonstrate the use of EpiLPS on real data from two historical outbreaks and on the SARS-CoV-2 pandemic.

This is a PLOS Computational Biology Methods paper.

Introduction

The instantaneous reproduction number $R_{t}$ is a time-varying parameter defined as the average number of secondary cases generated by an infectious individual at time t. During epidemic outbreaks, $R_{t}$ provides a snapshot (often on a daily basis) that quantifies the extent to which a given infectious disease transmits in a population and is therefore an important tool that assists governmental organizations in the management of a public health crisis. The reproduction number is also a good proxy for measuring the real-time growth phase of an epidemic and, as such, constitutes a key signal about the transmission potential of the outbreak and the required control effort. For this reason, having a robust, accurate and timely estimator of $R_{t}$ is a crucial matter that has attracted considerable interest in developing new statistical approaches during the last two decades, as summarized in [1]. The paper of [2] compares several methods for estimating $R_{t}$ and gives clear insights about the main challenges and obstacles that have to be faced. They recommend the method of [3] and its associated EpiEstim package [4] as an appropriate and accurate tool for near real-time estimation of the instantaneous reproduction number. Another recent approach is proposed in [5], where a recursive Bayesian smoother based on Kalman filtering is used to derive a robust estimate of $R_{t}$ in periods of low incidence. The EpiNow2 package [6] also provides interesting extensions and implementations of current best practices for precise estimation and forecast of the reproduction number using a Bayesian latent variable framework. Spline-based approaches have been shown to be a useful tool for flexible modeling of the reproduction number. [7] use penalized radial splines for estimating $R_{t}$ under a Bayesian setting with misreported data and [8] accelerated the computational implementation by replacing the Markov chain Monte Carlo (MCMC) scheme with Laplace approximations. From a frequentist perspective, [9] uses truncated polynomials and radial basis splines to model the series of new infections and a derivative thereof as a candidate estimator for the reproduction number.

In this article, we propose a new Bayesian approach termed “EpiLPS” for estimating $R_{t}$ based on case incidence data and the serial interval (SI) distribution (the time elapsed between the onset of symptoms in an infector and the onset of symptoms in the secondary cases generated by that infector). Our estimator of $R_{t}$ is based on epidemic renewal equations [10, 11] and Laplacian-P-splines smoothing of the mean number of incidence cases. Time series of new cases by day of reporting (or day of symptom onset) are assumed to follow a negative binomial distribution to account for potential excess variability as frequently encountered in epidemiological count data. Algorithms related to Laplace approximations and evaluations of B-spline bases are coded in C++ and embedded in the R language through the Rcpp package [12], making computational speed another key strength of EpiLPS as $R_{t}$ can be estimated in seconds. In addition, EpiLPS can also be used to obtain a smoothed estimate of the epidemic curve that can be of potential interest to further visualize an epidemic outbreak.

The proposed Bayesian methodology is based on a latent Gaussian model for the B-spline amplitudes and opens up two possible paths for inference. The first is called LPSMAP, a fully sampling-free approach based on Laplace approximations to the conditional posterior of B-spline coefficients. The hyperparameter vector is fixed at its maximum a posteriori value and credible intervals of $R_{t}$ are computed via the “delta” method. The second path is called LPSMALA and is an MCMC approach based on the Langevin diffusion for efficient exploration of the posterior distribution of latent variables. The latter approach is computationally heavier than LPSMAP but has the merit of taking into account the uncertainty surrounding the hyperparameters. The underlying Metropolis-within-Gibbs structure keeps the practical implementation fairly simple and the computational cost is reasonable even for long chains.

Compared to existing methods, EpiLPS resembles EpiEstim from a methodological point of view in the sense that $R_{t}$ is estimated from incidence time series and a serial interval distribution, yet the two approaches differ fundamentally in many aspects. First, the methodology of [3] assumes that incidence at time t is Poisson distributed, while EpiLPS assumes a negative binomial model. Second, as our approach uses penalized spline based approximations, prior specifications are imposed on the roughness penalty parameter and not directly on $R_{t}$ as in EpiEstim. Third and most importantly, EpiLPS is free of any sliding window specification, while EpiEstim relies on a user-defined time window. This subjective time window choice is the key driving force that determines how smooth the estimated $R_{t}$ trajectory will be. In EpiLPS, the optimal amount of smoothing is data-driven and objectively estimated (through the penalty parameter) within the Bayesian model. An R package for EpiLPS has been developed and is available at https://cran.r-project.org/package=EpiLPS. The software can also compute the Cori et al. (2013) [3] estimate of $R_{t}$ for the sake of comparison.

The manuscript is organized as follows. We first present the Laplacian-P-splines model for smoothing count data and show how the Laplace approximation applies to the conditional posterior of the B-spline amplitudes and also derive the (approximate) posterior of the hyperparameter vector to be optimized. This yields the maximum a posteriori (MAP) estimate of the spline vector via Laplacian-P-splines (LPSMAP). We then use LPSMAP to propose a “plug-in” estimate of $R_{t}$ based on renewal equations and proceed to the computation of credible intervals. An alternative path for estimation of $R_{t}$ based on MCMC is also presented. The latter approach uses Langevin dynamics for efficient sampling of the target posterior distribution and is termed LPSMALA for “Laplacian-P-splines with a Metropolis-adjusted Langevin algorithm”. Next, we assess the performance of EpiLPS in various simulation scenarios and make comparisons with EpiEstim. Finally, we apply EpiLPS to real world epidemic outbreaks before concluding with a discussion.

Methods

Negative binomial model for case incidence data

Let $D = \{y_{t}, t = 1, \dots, T\}$ be a time series of counts during an epidemic of T days with $y_{t} \in \mathbb{N}$ (set of non-negative integers) denoting the number of cases by reporting date or by date of symptom onset. We assume that the number of cases on day t follows a negative binomial distribution y_t ∼ NegBin(μ(t), ρ), with $μ (t), ρ \in \mathbb{R}_{+}^{*} ≔ \{x \in \mathbb{R} | x > 0\}$ and probability mass function (see e.g. [13, 14]):

\begin{matrix} p (y_{t} | μ (t), ρ) = \frac{Γ (y_{t} + ρ)}{Γ (y_{t} + 1) Γ (ρ)} {(\frac{μ (t)}{μ (t) + ρ})}^{y_{t}} {(\frac{ρ}{ρ + μ (t)})}^{ρ}, \end{matrix}

(1)

where Γ(⋅) is the gamma function. The above parameterization is frequently encountered in epidemiology [15] and yields a mean $E (y_{t}) = μ (t)$ and variance $V (y_{t}) = μ (t) + μ {(t)}^{2} / ρ$ , so that ρ is the parameter responsible for overdispersion (variance larger than the mean) that is absent in a Poisson setting. In the limiting case ${lim}_{ρ \to + \infty} V (y_{t}) = μ (t) = E (y_{t})$ and we recover the mean-variance equality of the Poisson model. The key argument in favor of a negative binomial distribution is thus its ability to capture the often encountered feature of overdispersion present in infectious disease count data [16]. We assume that μ(t) evolves smoothly over the time course of the epidemic and model it with cubic B-splines [17]:

\begin{matrix} log (μ (t)) = \sum_{k = 1}^{K} θ_{k} b_{k} (t) = θ^{⊤} b (t), \end{matrix}

(2)

where θ = (θ₁, …, θ_K)^⊤ is the vector of B-spline amplitudes to be estimated and b(⋅) = (b₁(⋅), …, b_K(⋅))^⊤ is a cubic B-spline basis defined on the domain $\mathcal{T} = [r_{l}, T]$ , where r_l is a lower bound on the time axis, typically the first day of the epidemic (i.e. r_l = 1). The philosophy behind P-splines consists in specifying a “large” number K of basis functions together with a discrete roughness penalty λθ^⊤Pθ as a counterforce to the induced flexibility of the fit. The parameter λ > 0 acts as a tuning parameter calibrating the “degree” of smoothness and $P = D_{r}^{⊤} D_{r} + ε I_{K}$ is a penalty matrix built from rth order difference matrices D_r of dimension (K − r) × K, perturbed by an ε-multiple (here ε = 10⁻⁶) of the K-dimensional identity matrix I_K to ensure that P has full rank. There are several attractive reasons to use P-splines for smoothing the epidemic curve and $R_{t}$ . First, as the P-splines setting specifies an abundant number of B-spline basis functions coupled with a penalty on the spline coefficients to control for overfitting, the resulting μ(t) fit is smooth and estimates can be obtained for any t on the continuous time domain. Second, although the number K of B-splines can be chosen freely, the shape of the fitted $R_{t}$ curve is actually regulated by the smoothing parameter λ and hence only negligibly affected by the arbitrary choice of K, provided it is large enough [18]. Third, the intrinsic sparseness of P and of the B-spline basis matrix is computationally appealing as it eases the algorithmic implementation and yields numerically stable routines [19, 20]. Another key advantage of P-spline smoothers is their natural formulation in a Bayesian framework by translating difference penalties on contiguous B-spline coefficients into Gaussian random walk smoothness priors [21].
Following the latter reference, we impose a Gaussian prior on the vector of spline coefficients $θ | λ \sim N_{dim (θ)} (0, Q_{λ}^{- 1})$ , with precision matrix Q_λ = λP. For full Bayesian inference, the following priors are imposed on the model hyperparameters. Following [22], a robust Gamma prior is specified for the roughness penalty parameter $λ | δ \sim G (ϕ / 2, (ϕ δ) / 2)$ , where $G (a, b)$ is a Gamma distribution with mean a/b and variance a/b², ϕ = 2 and δ is an additional dispersion parameter with hyperprior $δ \sim G (a_{δ} = 10, b_{δ} = 10)$ . This prior specification favors “small” λ values and translates the belief that a wiggly $R_{t}$ fit is more inclined to arise during the epidemic period as opposed to an oversmoothed fit. Finally, the following uninformative prior is imposed on the overdispersion parameter $ρ \sim G (a_{ρ} = 0.0001, b_{ρ} = 0.0001)$ . Let η ≔ (λ, ρ)^⊤ denote the vector of hyperparameters. The full Bayesian model is thus:

\begin{matrix} y_{t} | μ (t), ρ & \sim & NegBin (μ (t), ρ), \\ log (μ (t)) & = & θ^{⊤} b (t), \\ θ | λ & \sim & N_{dim (θ)} (0, Q_{λ}^{- 1}), \\ λ | δ & \sim & G (ϕ / 2, (ϕ δ) / 2), \\ δ & \sim & G (a_{δ}, b_{δ}), \\ ρ & \sim & G (a_{ρ}, b_{ρ}) . \end{matrix}
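To make the parameterization concrete, the following minimal Python sketch evaluates the log of the probability mass function in Eq (1), checks numerically that it has mean μ and variance μ + μ²/ρ, and builds the penalty matrix P. The function names, the difference order r = 2 and the numerical values are illustrative assumptions, not taken from the EpiLPS package.

```python
import math
import numpy as np

def negbin_logpmf(y, mu, rho):
    """Log of Eq (1): negative binomial with mean mu and overdispersion rho."""
    return (math.lgamma(y + rho) - math.lgamma(y + 1) - math.lgamma(rho)
            + y * math.log(mu / (mu + rho))
            + rho * math.log(rho / (rho + mu)))

def penalty_matrix(K, r=2, eps=1e-6):
    """P = D_r^T D_r + eps * I_K, with D_r the r-th order difference matrix."""
    D = np.diff(np.eye(K), n=r, axis=0)   # shape (K - r, K)
    return D.T @ D + eps * np.eye(K)

# Illustrative values (assumptions): mean 20 and overdispersion 5.
mu, rho = 20.0, 5.0
ys = range(0, 2000)                        # the tail beyond 2000 is negligible
pmf = [math.exp(negbin_logpmf(y, mu, rho)) for y in ys]
mean = sum(y * p for y, p in zip(ys, pmf))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, pmf))

P = penalty_matrix(K=10, r=2)              # r = 2 is an assumed choice
```

The ε-perturbation matters in practice: without it, the second-order difference penalty has rank K − 2 and the Gaussian prior on θ would be improper.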

Laplace approximation to the conditional posterior of θ

The Laplace approximation has two key roles in the proposed EpiLPS methodology. First, it determines the approximating distribution to the (conditional) posterior of the spline vector θ that will be used to estimate the average incidence of cases at time t, i.e. $E (y_{t})$ and hence $R_{t}$ via the renewal equation. Second, the variance-covariance matrix of the Laplace approximation is used to quantify the uncertainty of the instantaneous reproduction number through a “delta” method in LPSMAP and is also introduced in the proposal distribution of the LPSMALA algorithm to form the skeleton of the correlation structure for the spline components. The synergy between Laplace approximations and P-splines has already been shown to be very effective for modeling count data (see for instance [23], in the context of generalized additive models). The log-likelihood for the negative binomial model is given by:

\begin{matrix} ℓ (θ, ρ; D) \dot{=} \sum_{t = 1}^{T} \{ g (y_{t}, ρ) + y_{t} θ^{⊤} b (t) + ρ log (ρ) - (y_{t} + ρ) log (exp (θ^{⊤} b (t)) + ρ) \}, \end{matrix}

(3)

with g(y_t, ρ) = log Γ(y_t + ρ) − log Γ(ρ) and $\dot{=}$ denoting equality up to an additive constant. The gradient of the log-likelihood with respect to the spline coefficients is:

\begin{matrix} \nabla_{θ} ℓ (θ, ρ; D) = {(\frac{\partial ℓ (θ, ρ; D)}{\partial θ_{1}}, \dots, \frac{\partial ℓ (θ, ρ; D)}{\partial θ_{K}})}^{⊤}, \end{matrix}

where:

\begin{matrix} \frac{\partial ℓ (θ, ρ; D)}{\partial θ_{k}} = \sum_{t = 1}^{T} y_{t} b_{k} (t) - \sum_{t = 1}^{T} \frac{(y_{t} + ρ) exp (θ^{⊤} b (t))}{(exp (θ^{⊤} b (t)) + ρ)} b_{k} (t), k = 1, \dots, K . \end{matrix}

The Hessian of the log-likelihood with respect to the B-spline amplitudes is:

\begin{matrix} \nabla_{θ}^{2} ℓ (θ, ρ; D) = (\begin{matrix} \frac{\partial^{2} ℓ (θ, ρ; D)}{\partial θ_{1}^{2}} & \dots & \frac{\partial^{2} ℓ (θ, ρ; D)}{\partial θ_{1} \partial θ_{K}} \\ ⋮ & ⋱ & ⋮ \\ \frac{\partial^{2} ℓ (θ, ρ; D)}{\partial θ_{K} \partial θ_{1}} & \dots & \frac{\partial^{2} ℓ (θ, ρ; D)}{\partial θ_{K}^{2}} \end{matrix}), \end{matrix}

with entries:

\begin{matrix} \frac{\partial^{2} ℓ (θ, ρ; D)}{\partial θ_{k} \partial θ_{l}} = - \sum_{t = 1}^{T} ρ (y_{t} + ρ) \frac{exp (θ^{⊤} b (t))}{{(exp (θ^{⊤} b (t)) + ρ)}^{2}} b_{k} (t) b_{l} (t), k, l = 1, \dots, K . \end{matrix}

Using Bayes’ rule, the conditional posterior of θ for a given η is:

\begin{matrix} p (θ | η, D) & \propto & L (θ, ρ; D) p (θ | λ) \\ & \propto & exp (ℓ (θ, ρ; D) - \frac{λ}{2} θ^{⊤} P θ), \end{matrix}

(4)

where $L (θ, ρ; D)$ denotes the likelihood function. The gradient and Hessian of the log-likelihood (3) can be used to compute the gradient and Hessian of the (log-)conditional posterior (4), namely:

\begin{matrix} \nabla_{θ} log p (θ | η, D) = \nabla_{θ} ℓ (θ, ρ; D) - λ P θ, \\ \nabla_{θ}^{2} log p (θ | η, D) = \nabla_{θ}^{2} ℓ (θ, ρ; D) - λ P . \end{matrix}

The above two equations will be used iteratively in a Newton-Raphson algorithm to obtain the Laplace approximation to the conditional posterior of θ:

\begin{matrix} {\tilde{p}}_{G} (θ | η, D) = N_{dim (θ)} (θ^{*} (η), Σ^{*} (η)), \end{matrix}

(5)

where θ*(η) and Σ*(η) are, respectively, the mode and the variance-covariance matrix obtained after convergence of the Newton-Raphson algorithm. Both quantities are functions of the hyperparameter vector η. An intuitive choice for η is to fix it at its maximum a posteriori value. This is the option retained here, although it is also possible to work with a grid-based approach [23, 24].
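The Newton-Raphson iteration behind (5) can be sketched in a few lines of Python for a fixed pair (λ, ρ). This is a minimal illustration under stated assumptions rather than the package's C++ routine: the equidistant-knot basis construction, the flat starting value, the absence of any step-size safeguard, the second-order penalty, the synthetic counts and all function names are choices made here for the example.

```python
import numpy as np

def bspline_basis(x, lo, hi, K, degree=3):
    """Cubic B-spline basis on equidistant knots (Cox-de Boor recursion)."""
    nseg = K - degree                       # number of segments on [lo, hi]
    h = (hi - lo) / nseg
    knots = lo + h * np.arange(-degree, nseg + degree + 1)
    x = np.asarray(x, dtype=float)[:, None]
    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)   # degree 0
    for d in range(1, degree + 1):
        left = (x - knots[:-d - 1]) / (d * h)
        right = (knots[d + 1:] - x) / (d * h)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B                                # shape (len(x), K)

def grad_hess(theta, y, B, P, lam, rho):
    """Gradient and Hessian of the log conditional posterior of theta."""
    mu = np.exp(B @ theta)
    grad = B.T @ (y - (y + rho) * mu / (mu + rho)) - lam * P @ theta
    w = rho * (y + rho) * mu / (mu + rho) ** 2
    hess = -(B.T * w) @ B - lam * P
    return grad, hess

def laplace_approx(y, B, P, lam, rho, tol=1e-8, max_iter=100):
    """Mode and covariance of the Gaussian approximation in (5)."""
    theta = np.full(B.shape[1], np.log(np.mean(y) + 1.0))  # flat start (assumed)
    for _ in range(max_iter):
        grad, hess = grad_hess(theta, y, B, P, lam, rho)
        step = np.linalg.solve(hess, grad)
        theta = theta - step                # plain Newton-Raphson update
        if np.max(np.abs(step)) < tol:
            break
    _, hess = grad_hess(theta, y, B, P, lam, rho)
    return theta, np.linalg.inv(-hess)

# Synthetic single-wave counts, for illustration only.
T, K = 60, 20
t = np.arange(1, T + 1)
y = np.round(5 + 50 * np.exp(-0.5 * ((t - 30) / 10.0) ** 2))
B = bspline_basis(t, lo=1, hi=T, K=K)
D2 = np.diff(np.eye(K), n=2, axis=0)
P = D2.T @ D2 + 1e-6 * np.eye(K)
theta_star, Sigma_star = laplace_approx(y, B, P, lam=10.0, rho=10.0)
mu_hat = np.exp(B @ theta_star)             # smoothed epidemic curve
```

Because the Hessian of the log conditional posterior is negative definite for every θ, the target is log-concave and the mode found by the iteration is unique.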

Hyperparameter optimization

The hyperparameter vector η = (λ, ρ)^⊤ will be calibrated by posterior optimization. Following [25] and [24], the joint posterior of the hyperparameters can be approximated as follows:

\begin{matrix} \tilde{p} (η, δ | D) \propto \frac{L (θ, ρ; D) p (θ | λ) p (λ | δ) p (δ) p (ρ)}{{\tilde{p}}_{G} (θ | η, D)} |_{θ = θ^{*} (η)} . \end{matrix}

(6)

Approximation (6) can be written extensively as:

\begin{matrix} \tilde{p} (η, δ | D) & \propto & λ^{\frac{K + ϕ}{2} - 1} δ^{\frac{ϕ}{2} + a_{δ} - 1} exp (- δ (\frac{ϕ λ}{2} + b_{δ})) ρ^{a_{ρ} - 1} \\ & & \times {| Σ^{*} (η) |}^{\frac{1}{2}} exp (ℓ (θ^{*} (η), ρ; D) - \frac{λ}{2} θ^{* ⊤} (η) P θ^{*} (η) - b_{ρ} ρ), \end{matrix}

where the K/2 power of λ comes from the determinant $| Q_{λ}^{- 1} |^{- 1 / 2} = {| λ P |}^{1 / 2} \propto λ^{K / 2}$ . As $δ^{\frac{ϕ}{2} + a_{δ} - 1} exp (- δ (\frac{ϕ λ}{2} + b_{δ}))$ is the kernel of a Gamma distribution for the dispersion parameter δ, the following integral can be analytically solved:

\begin{matrix} \int_{0}^{+ \infty} \tilde{p} (η, δ | D) d δ & = & \tilde{p} (η | D) \\ & \propto & λ^{\frac{K + ϕ}{2} - 1} {(\frac{ϕ λ}{2} + b_{δ})}^{- (\frac{ϕ}{2} + a_{δ})} ρ^{a_{ρ} - 1} \\ & & \times {| Σ^{*} (η) |}^{\frac{1}{2}} exp (ℓ (θ^{*} (η), ρ; D) - \frac{λ}{2} θ^{* ⊤} (η) P θ^{*} (η) - b_{ρ} ρ) . \end{matrix}
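The analytic step relies on the standard Gamma integral. Written out as a short worked equation (the shorthand a and b is introduced here only for readability):

```latex
\begin{align*}
\int_{0}^{+\infty} \delta^{a - 1} e^{- b \delta} \, d\delta &= \Gamma(a) \, b^{-a},
\qquad a = \frac{\phi}{2} + a_{\delta}, \quad b = \frac{\phi \lambda}{2} + b_{\delta}, \\
\int_{0}^{+\infty} \delta^{\frac{\phi}{2} + a_{\delta} - 1}
\exp\left(- \delta \left(\frac{\phi \lambda}{2} + b_{\delta}\right)\right) d\delta
&= \Gamma\left(\frac{\phi}{2} + a_{\delta}\right)
\left(\frac{\phi \lambda}{2} + b_{\delta}\right)^{- \left(\frac{\phi}{2} + a_{\delta}\right)} .
\end{align*}
```

The Gamma function factor does not depend on η and is absorbed in the proportionality constant, which leaves only the power of (ϕλ/2 + b_δ) in the marginal.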

Using the transformation of variables (ensuring numerical stability during optimization) w = log(ρ), v = log(λ), one can show that $\tilde{p} (η | D)$ can be written as follows after using the multivariate transformation method:

\begin{matrix} \tilde{p} (\tilde{η} | D) & \propto & {exp (v)}^{\frac{K + ϕ}{2}} {(\frac{ϕ exp (v)}{2} + b_{δ})}^{- (\frac{ϕ}{2} + a_{δ})} {exp (w)}^{a_{ρ}} \\ & & \times {| Σ^{*} (\tilde{η}) |}^{\frac{1}{2}} exp (ℓ (θ^{*} (\tilde{η}), exp (w); D) - \frac{exp (v)}{2} θ^{* ⊤} (\tilde{η}) P θ^{*} (\tilde{η}) - b_{ρ} exp (w)), \end{matrix}

where $\tilde{η} = {(w, v)}^{⊤}$ . The approximated log-posterior becomes:

\begin{matrix} log \tilde{p} (\tilde{η} | D) & \dot{=} & 0.5 log | Σ^{*} (\tilde{η}) | + 0.5 (K + ϕ) v + a_{ρ} w - (0.5 ϕ + a_{δ}) log (0.5 ϕ exp (v) + b_{δ}) \\ & & + ℓ (θ^{*} (\tilde{η}), exp (w); D) - 0.5 exp (v) θ^{* ⊤} (\tilde{η}) P θ^{*} (\tilde{η}) - b_{ρ} exp (w) . \end{matrix}

(7)
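The coefficients 0.5(K + ϕ) of v and a_ρ of w in Eq (7) no longer carry the "− 1" that appears in the exponents of λ and ρ in the posterior of η. A brief worked step shows that this comes from the Jacobian of the log transformation:

```latex
\begin{align*}
\lambda = e^{v}, \quad \rho = e^{w}
\quad &\Longrightarrow \quad
\left| \det \frac{\partial (\lambda, \rho)}{\partial (v, w)} \right| = e^{v} e^{w}, \\
\lambda^{\frac{K + \phi}{2} - 1} \, e^{v} &= \exp\left( \frac{K + \phi}{2} \, v \right),
\qquad
\rho^{a_{\rho} - 1} \, e^{w} = \exp\left( a_{\rho} w \right) .
\end{align*}
```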

Eq (7) is numerically optimized and yields ${\tilde{η}}^{*} = {argmax}_{\tilde{η}} log \tilde{p} (\tilde{η} | D)$ . Plugging the latter vector into the Laplace approximation (5), we obtain the estimate $\hat{θ} = θ^{*} ({\tilde{η}}^{*})$ of the spline vector. The latter can be seen as a MAP estimate of θ. Thus, the approximated (conditional) posterior of the spline vector is:

\begin{matrix} {\tilde{p}}_{G} (θ | {\tilde{η}}^{*}, D) = N_{dim (θ)} (θ^{*} ({\tilde{η}}^{*}), Σ^{*} ({\tilde{η}}^{*})), \end{matrix}

(8)

and can be used to construct credible intervals for functions that depend on θ, such as $R_{t}$ as shown in the following section.

Estimation of $R_{t}$ with LPSMAP

The renewal equation “plug-in” estimate

In this section, we show how the negative binomial model for smoothing incidence counts can be used to estimate $R_{t}$ through the renewal equation. Let φ = {φ₁, …, φ_k} be a known k-dimensional vector representing the serial interval (SI) distribution, where φ_s is the probability that the SI is equal to s day(s), i.e. $φ_{s} = P (S I = s)$ . We also assume $\sum_{s = 1}^{k} φ_{s} = 1$ and $P (S I \leq 0) = P (S I > k) = 0$ . The renewal model [10, 11] gives a mathematical statement of equality between the mean incidence of cases at time step t and a product between the reproduction number $R_{t}$ and a convolution involving antecedent cases and the serial interval distribution:

\begin{matrix} E (y_{t}) = R_{t} Λ_{t}, \end{matrix}

(9)

where $Λ_{t} = \sum_{s = 1}^{t - 1} φ_{s} y_{t - s}$ denotes the number of circulating cases that contribute to active transmission, also known as total infectiousness at time t [5]. Rearranging Eq (9) and taking the length k of the serial interval into account, we obtain an equation with the instantaneous reproduction number on the left-hand side:

\begin{matrix} R_{t} = \left\{ \begin{matrix} E (y_{t}) & \text{for } t = 1, \\ E (y_{t}) {(\sum_{s = 1}^{t - 1} φ_{s} y_{t - s})}^{- 1} & \text{for } 2 \leq t \leq k, \\ E (y_{t}) {(\sum_{s = 1}^{k} φ_{s} y_{t - s})}^{- 1} & \text{for } k < t \leq T . \end{matrix} \right. \end{matrix}

(10)

Our Bayesian “plug-in” estimator of $R_{t}$ at time step t is obtained by replacing the average number of cases $E (y_{t}) = μ (t)$ by the estimated average $\hat{μ} (t) = exp ({\hat{θ}}^{⊤} b (t))$ and by replacing $y_{t - s}$ by $\hat{μ} (t - s) = exp ({\hat{θ}}^{⊤} b (t - s))$ :

\begin{matrix} {\hat{R}}_{t} = \left\{ \begin{matrix} exp ({\hat{θ}}^{⊤} b (t)) & \text{for } t = 1, \\ exp ({\hat{θ}}^{⊤} b (t)) {(\sum_{s = 1}^{t - 1} φ_{s} exp ({\hat{θ}}^{⊤} b (t - s)))}^{- 1} & \text{for } 2 \leq t \leq k, \\ exp ({\hat{θ}}^{⊤} b (t)) {(\sum_{s = 1}^{k} φ_{s} exp ({\hat{θ}}^{⊤} b (t - s)))}^{- 1} & \text{for } k < t \leq T . \end{matrix} \right. \end{matrix}

(11)

Note that the MAP estimate of the overdispersion parameter affects the estimate $\hat{μ} (t)$ via $\hat{θ}$ . Using the indicator function $I (\cdot)$ , i.e. $I (A) = 1$ if condition A is true and $I (A) = 0$ otherwise, the above estimator can be written in a single line:

\begin{matrix} {\hat{R}}_{t} & = & exp ({\hat{θ}}^{⊤} b (t)) \{ I (t = 1) + {(\sum_{s = 1}^{t - 1} φ_{s} exp ({\hat{θ}}^{⊤} b (t - s)))}^{- 1} I (2 \leq t \leq k) \\ & & + {(\sum_{s = 1}^{k} φ_{s} exp ({\hat{θ}}^{⊤} b (t - s)))}^{- 1} I (k < t \leq T) \} . \end{matrix}

(12)
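The plug-in estimator in Eq (11) only needs the smoothed mean incidence and the serial interval distribution. The Python sketch below implements it for a vector of smoothed means; the function name and the toy inputs (an exponentially growing curve and a three-day serial interval) are assumptions made for illustration.

```python
import numpy as np

def plug_in_Rt(mu, phi):
    """Eq (11): mu[t-1] is the smoothed mean on day t, phi[s-1] = P(SI = s)."""
    mu = np.asarray(mu, dtype=float)
    phi = np.asarray(phi, dtype=float)
    T, k = len(mu), len(phi)
    R = np.empty(T)
    R[0] = mu[0]                              # t = 1: no antecedent cases
    for t in range(2, T + 1):
        s = np.arange(1, min(t - 1, k) + 1)   # lags available on day t
        R[t - 1] = mu[t - 1] / np.sum(phi[s - 1] * mu[t - s - 1])
    return R

# Toy check: exponential growth at rate r gives a constant R_t for t > k.
r = 0.1
phi = np.array([0.2, 0.5, 0.3])               # hypothetical serial interval
mu = np.exp(r * np.arange(1, 31))
R = plug_in_Rt(mu, phi)
R_expected = 1.0 / np.sum(phi * np.exp(-r * np.arange(1, 4)))
```

For days 2 ≤ t ≤ k the sum runs over fewer than k lags, so the serial interval weights used there do not sum to one, exactly as in Eq (11).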

Credible intervals for $R_{t}$

Using the functional relationship between $R_{t}$ and θ as in Eq (12), the log of the instantaneous reproduction number can be written as:

\begin{matrix} log R_{t} & ≔ & h (θ | t) \\ & = & θ^{⊤} b (t) + log ζ (θ), \end{matrix}

with

\begin{matrix} ζ (θ) & = & I (t = 1) + {(\sum_{s = 1}^{t - 1} φ_{s} exp (θ^{⊤} b (t - s)))}^{- 1} I (2 \leq t \leq k) \\ & & + {(\sum_{s = 1}^{k} φ_{s} exp (θ^{⊤} b (t - s)))}^{- 1} I (k < t \leq T) . \end{matrix}

Note that h(θ|t) is seen here as a function of the spline vector θ for a given time point t. A (1 − α) × 100% approximate credible interval for $R_{t}$ is obtained via a “delta” method. Consider a first-order Taylor expansion of h(θ|t) around $θ^{*} ({\tilde{η}}^{*})$ (henceforth θ* for the sake of a light notation), the mean of the Laplace approximated posterior of the spline vector in (8):

\begin{matrix} h (θ | t) \approx h (θ^{*} | t) + {(θ - θ^{*})}^{⊤} \nabla h (θ | t) |_{θ = θ^{*}}, \end{matrix}

(13)

where the kth entry of the gradient vector ∇h(θ|t) = (∂h(θ|t)/∂θ₁, …, ∂h(θ|t)/∂θ_K)^⊤ is:

\begin{matrix} \frac{\partial h (θ | t)}{\partial θ_{k}} & = & b_{k} (t) + ζ^{- 1} (θ) \frac{\partial ζ (θ)}{\partial θ_{k}}, \\ \frac{\partial ζ (θ)}{\partial θ_{k}} & = & - {(\sum_{s = 1}^{t - 1} φ_{s} exp (θ^{⊤} b (t - s)))}^{- 2} \sum_{s = 1}^{t - 1} φ_{s} exp (θ^{⊤} b (t - s)) b_{k} (t - s) I (2 \leq t \leq k) \\ & & - {(\sum_{s = 1}^{k} φ_{s} exp (θ^{⊤} b (t - s)))}^{- 2} \sum_{s = 1}^{k} φ_{s} exp (θ^{⊤} b (t - s)) b_{k} (t - s) I (k < t \leq T) . \end{matrix}

It follows that for k = 1, …, K, we have:

\begin{matrix} \frac{\partial h (θ | t)}{\partial θ_{k}} & = & b_{k} (t) + \{ 0 I (t = 1) \\ & & - {(\sum_{s = 1}^{t - 1} φ_{s} exp (θ^{⊤} b (t - s)))}^{- 1} \sum_{s = 1}^{t - 1} φ_{s} exp (θ^{⊤} b (t - s)) b_{k} (t - s) I (2 \leq t \leq k) \\ & & - {(\sum_{s = 1}^{k} φ_{s} exp (θ^{⊤} b (t - s)))}^{- 1} \sum_{s = 1}^{k} φ_{s} exp (θ^{⊤} b (t - s)) b_{k} (t - s) I (k < t \leq T) \} . \end{matrix}

The Taylor expansion in (13) is a linear combination of the vector θ that is a posteriori (approximately) Gaussian due to the Laplace approximation. As the family of Gaussian distributions is closed under linear combinations, it follows that h(θ|t) (and hence $log R_{t}$ ) is a posteriori also (approximately) Gaussian with mean $E (h (θ | t)) \approx h (θ^{*} | t)$ and variance $V (h (θ | t)) \approx \nabla^{⊤} h (θ | t) |_{θ = θ^{*}} Σ^{*} \nabla h (θ | t) |_{θ = θ^{*}}$ , where $Σ^{*} ≔ Σ^{*} ({\tilde{η}}^{*})$ is the covariance matrix of the Laplace approximation (8). This suggests writing:

\begin{matrix} (log R_{t} | D) \approx N_{1} (h (θ^{*} | t), \nabla^{⊤} h (θ | t) |_{θ = θ^{*}} Σ^{*} \nabla h (θ | t) |_{θ = θ^{*}}) . \end{matrix}

(14)

The accuracy of the variance approximation in (14) can be improved through a scaling of the covariance matrix Σ* by multiplying it with the scaling factor $κ_{t}^{\hat{ρ}} ≔ {(1 + {\hat{ρ}}^{- 1} \hat{μ} (t))}^{- 1}$ , corresponding to the estimated mean-to-variance ratio $E (y_{t}) / V (y_{t})$ at time step t (see S2 Appendix). The (approximate) posterior distribution for $R_{t}$ is thus given by $(R_{t} | D) \sim LogNorm (μ_{R_{t}}^{*}, σ_{R_{t}}^{2 *})$ , i.e. a lognormal distribution with parameters $μ_{R_{t}}^{*} = h (θ^{*} | t)$ and $σ_{R_{t}}^{2 *} = \nabla^{⊤} h (θ | t) |_{θ = θ^{*}} κ_{t}^{\hat{ρ}} Σ^{*} \nabla h (θ | t) |_{θ = θ^{*}}$ . A quantile-based (1 − α) × 100% approximate credible interval for $R_{t}$ is thus $CI_{R_{t}}^{1 - α} = exp (μ_{R_{t}}^{*} \pm z_{α / 2} σ_{R_{t}}^{*})$ , where $z_{α / 2}$ is the α/2-upper quantile of a standard normal variate.
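The delta-method interval can be assembled from three ingredients: the function h(θ|t), its gradient, and the scaled covariance. The Python sketch below is one possible arrangement under stated assumptions: the basis matrix, spline vector, covariance and overdispersion value are random or arbitrary stand-ins (not fitted quantities), and all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def lagged_terms(theta, B, phi, t):
    """Lags s and weights phi_s * mu(t - s) for day t >= 2 (days are 1-based)."""
    phi = np.asarray(phi, dtype=float)
    s = np.arange(1, min(t - 1, len(phi)) + 1)
    w = phi[s - 1] * np.exp(B[t - s - 1] @ theta)
    return s, w

def h(theta, B, phi, t):
    """log R_t as a function of the spline vector theta."""
    if t == 1:
        return B[0] @ theta
    _, w = lagged_terms(theta, B, phi, t)
    return B[t - 1] @ theta - np.log(w.sum())

def grad_h(theta, B, phi, t):
    """Gradient of h: b(t) minus a weighted average of the lagged b(t - s)."""
    if t == 1:
        return B[0].copy()
    s, w = lagged_terms(theta, B, phi, t)
    return B[t - 1] - (w @ B[t - s - 1]) / w.sum()

def credible_interval(theta, Sigma, B, phi, rho, t, alpha=0.05):
    """Quantile-based lognormal interval for R_t with the kappa scaling."""
    g = grad_h(theta, B, phi, t)
    kappa = 1.0 / (1.0 + np.exp(B[t - 1] @ theta) / rho)
    sd = np.sqrt(kappa * g @ Sigma @ g)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    m = h(theta, B, phi, t)
    return np.exp(m - z * sd), np.exp(m + z * sd)

# Stand-in inputs (assumptions): in practice B is the B-spline basis matrix,
# theta and Sigma come from the Laplace approximation, rho from the MAP step.
rng = np.random.default_rng(0)
T, K = 12, 6
B = rng.uniform(0.0, 1.0, size=(T, K))
theta = rng.normal(size=K)
Sigma = 0.01 * np.eye(K)
phi = np.array([0.2, 0.5, 0.3])
lower, upper = credible_interval(theta, Sigma, B, phi, rho=5.0, t=8)
```

Working on the log scale guarantees that both interval bounds are positive, which a symmetric interval built directly on R_t would not.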

Estimation of $R_{t}$ with LPSMALA

In Bayesian statistics, posterior distributions obtained with Bayes’ theorem often entail a high degree of complexity and are typically not analytically tractable. To circumvent this problem, MCMC methods have been developed for generating samples from (possibly unnormalized) target distributions [26]. One of the most popular MCMC methods together with the Gibbs sampler [27] is the Metropolis-Hastings (MH) algorithm originally proposed by [28] and later generalized by [29]. In this section, we propose to implement a modified version of the Metropolis-adjusted Langevin algorithm (MALA) [30] within the EpiLPS framework. The major advantage of MALA as compared to MH algorithms is that the proposal distribution is based upon a discretized approximation of the Langevin diffusion that uses the gradient of the target posterior distribution. These “smarter” proposals make use of additional information about the target density so that algorithms based on Langevin dynamics can converge at sub-geometric rates and tend to be more efficient than naive random-walk Metropolis algorithms [31, 32].

This motivates our choice for embedding a MALA algorithm in EpiLPS as an efficient way of obtaining MCMC samples for inference on the instantaneous reproduction number $R_{t}$ via the renewal equation. The end-user will thus have a fully flexible choice regarding the underlying approach for estimating $R_{t}$ either via Laplacian-P-splines, where the uncertainty surrounding the parameter λ responsible for smoothing is ignored and λ is fixed at its maximum a posteriori value (LPSMAP); or via a modified MALA algorithm, where the uncertainty surrounding the penalty (and overdispersion) parameter is fully taken into account (LPSMALA). The approach yields samples from the joint posterior of the spline vector and the penalty and overdispersion parameters. The latter can then be injected in functionals of the spline vector to obtain smooth estimates of the epidemic curve as well as the instantaneous reproduction number. Another advantage is that highest posterior density intervals can be easily calculated with LPSMALA.
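For intuition, the Langevin-Hastings mechanism can be sketched on a toy target. The Python code below is a generic preconditioned MALA sampler applied to a two-dimensional standard Gaussian; it is not the modified algorithm used in EpiLPS, and the function name, step size, preconditioning matrix and target are assumptions chosen for illustration.

```python
import numpy as np

def mala(log_p, grad_log_p, x0, Sigma, step, n_iter, rng):
    """Generic MALA with proposal N(x + 0.5*step*Sigma*grad, step*Sigma)."""
    x = np.asarray(x0, dtype=float)
    chol = np.linalg.cholesky(step * Sigma)
    prec = np.linalg.inv(step * Sigma)

    def drift(z):
        # Proposal mean: one discretized Langevin step starting from z.
        return z + 0.5 * step * Sigma @ grad_log_p(z)

    def log_q(to, frm):
        # Log proposal density (up to a constant) of moving from frm to to.
        d = to - drift(frm)
        return -0.5 * d @ prec @ d

    chain = np.empty((n_iter, x.size))
    accepted = 0
    for m in range(n_iter):
        prop = drift(x) + chol @ rng.standard_normal(x.size)
        log_ratio = (log_p(prop) - log_p(x)
                     + log_q(x, prop) - log_q(prop, x))
        if np.log(rng.uniform()) < log_ratio:   # Metropolis-Hastings correction
            x = prop
            accepted += 1
        chain[m] = x
    return chain, accepted / n_iter

# Toy target: standard bivariate Gaussian (an assumption for the example).
rng = np.random.default_rng(0)
chain, acc_rate = mala(log_p=lambda z: -0.5 * z @ z,
                       grad_log_p=lambda z: -z,
                       x0=np.zeros(2), Sigma=np.eye(2),
                       step=1.0, n_iter=5000, rng=rng)
```

The gradient term pulls each proposal toward regions of higher posterior density, while the accept-reject step corrects the discretization error of the Langevin diffusion so that the chain still targets the exact distribution.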

Conditional posteriors for a “Metropolis-within-Gibbs”

Joint posterior of (ζ, λ)

Let ζ = (θ^⊤, ρ)^⊤ be the (K + 1)-dimensional vector gathering the B-spline coefficients θ and the overdispersion parameter ρ. Using Bayes’ theorem, the joint posterior distribution for ζ, λ and δ is:

\begin{matrix} p (ζ, λ, δ | D) & = & \frac{p (D | ζ, λ, δ) p (ζ, λ, δ)}{p (D)} \\ & \propto & L (ζ; D) p (θ | λ) p (λ | δ) p (δ) p (ρ) \\ & \propto & exp (ℓ (ζ; D)) p (θ | λ) p (λ | δ) p (δ) p (ρ) . \end{matrix}

(15)

The analytical formulas of the chosen priors are:

\begin{matrix} p (θ | λ) & \propto & λ^{\frac{K}{2}} exp (- 0.5 λ θ^{⊤} P θ), \\ p (λ | δ) & \propto & δ^{\frac{ϕ}{2}} λ^{\frac{ϕ}{2} - 1} exp (- 0.5 ϕ δ λ), \\ p (δ) & \propto & δ^{a_{δ} - 1} exp (- b_{δ} δ), \\ p (ρ) & \propto & ρ^{a_{ρ} - 1} exp (- b_{ρ} ρ) . \end{matrix}

Injecting the above priors into (15) yields:

\begin{matrix} p (ζ, λ, δ | D) & \propto & exp (ℓ (ζ; D) - b_{ρ} ρ - 0.5 λ θ^{⊤} P θ) ρ^{a_{ρ} - 1} λ^{\frac{K + ϕ}{2} - 1} \\ & & \times δ^{(\frac{ϕ}{2} + a_{δ}) - 1} exp (- δ (0.5 ϕ λ + b_{δ})) . \end{matrix}

(16)

Conditional posteriors of ζ, λ and δ

The following conditional posterior distributions can be directly obtained from (16):

\begin{matrix} p (ζ | λ, δ, D) & \propto & ρ^{a_{ρ} - 1} exp (ℓ (ζ; D) - b_{ρ} ρ - 0.5 λ θ^{⊤} P θ), \end{matrix}

(17)

\begin{matrix} (λ | ζ, δ, D) & \sim & G (0.5 (K + ϕ), 0.5 (θ^{⊤} P θ + δ ϕ)), \end{matrix}

(18)

\begin{matrix} (δ | ζ, λ, D) & \sim & G (0.5 ϕ + a_{δ}, 0.5 ϕ λ + b_{δ}) . \end{matrix}

(19)
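Since (18) and (19) are standard Gamma distributions, the Gibbs step for λ and δ reduces to two random draws. A minimal Python sketch follows; the function name, the stand-in spline vector and the second-order penalty are assumptions made for the example, while ϕ = 2 and a_δ = b_δ = 10 follow the prior specification given earlier.

```python
import numpy as np

def gibbs_step(theta, delta, P, rng, phi=2.0, a_delta=10.0, b_delta=10.0):
    """One Gibbs update of (lambda, delta) from the conditionals (18) and (19)."""
    K = len(theta)
    # G(a, b) is parameterized by shape a and rate b; NumPy's gamma sampler
    # takes a scale argument, hence scale = 1 / rate.
    lam = rng.gamma(shape=0.5 * (K + phi),
                    scale=1.0 / (0.5 * (theta @ P @ theta + delta * phi)))
    delta = rng.gamma(shape=0.5 * phi + a_delta,
                      scale=1.0 / (0.5 * phi * lam + b_delta))
    return lam, delta

# Stand-in spline vector and penalty matrix (assumptions for the example).
rng = np.random.default_rng(1)
K = 8
D2 = np.diff(np.eye(K), n=2, axis=0)
P = D2.T @ D2 + 1e-6 * np.eye(K)
theta = np.sin(np.linspace(0.0, 3.0, K))

lam, delta = 1.0, 1.0
draws = np.empty((2000, 2))
for m in range(2000):
    lam, delta = gibbs_step(theta, delta, P, rng)
    draws[m] = lam, delta
```

In the full sampler, θ would not be held fixed as it is here: each Gibbs update would alternate with a Langevin-Hastings update of the spline vector and overdispersion parameter.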

Sampling from the joint posterior $p (ζ, λ, δ | D)$

As the full conditionals $p (ζ | λ, δ, D)$ , $p (λ | ζ, δ, D)$ and $p (δ | ζ, λ, D)$ are available, we follow a “Metropolis-within-Gibbs” strategy to sample the joint posterior $p (ζ, λ, δ | D)$ . In particular, the hyperparameters λ and δ will be sampled in a Gibbs step, while ζ will be sampled using a modified Langevin-Hastings algorithm. This approach is presented in [33] in the context of Bayesian density estimation (see also [34] for the use of MALA in a proportional hazards model and [35] for a recent implementation in mixture cure models). We adapt the algorithm of the latter reference to our EpiLPS methodology. In particular, the variance-covariance matrix in the Langevin diffusion will be replaced by the variance-covariance matrix obtained with LPSMAP. The correlation structure borrowed from LPSMAP improves convergence and chain mixing.

The modified Metropolis-adjusted Langevin algorithm

In what follows, we prefer to work under the log(⋅) parameterization for ρ, i.e. w = log(ρ) and denote by $\tilde{ζ} = {(θ^{⊤}, w)}^{⊤}$ , the (K + 1)-dimensional vector of B-spline amplitudes and (log) overdispersion w. Under this parameterization, the conditional posterior of $\tilde{ζ}$ given λ and δ can be obtained from (17) by using the transformation method of random variables:

\begin{matrix} p (\tilde{ζ} | λ, δ, D) \propto exp {(w)}^{a_{ρ}} exp (ℓ (\tilde{ζ}; D) - b_{ρ} exp (w) - 0.5 λ θ^{⊤} P θ), \end{matrix}

(20)

with the following log-likelihood under the reparameterization:

\begin{matrix} ℓ (\tilde{ζ}; D) & \dot{=} & \sum_{t = 1}^{T} \{ log Γ (y_{t} + exp (w)) - log Γ (exp (w)) + y_{t} θ^{⊤} b (t) + exp (w) w \\ & & - (y_{t} + exp (w)) log (exp (θ^{⊤} b (t)) + exp (w)) \} . \end{matrix}

(21)
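As a concrete reading of (21), the following Python sketch evaluates the reparameterized log-likelihood for a given spline vector θ and log-overdispersion w. It assumes that the B-spline basis has already been evaluated into a T × K matrix `B` whose t-th row is b(t)ᵀ; the function name and argument layout are illustrative choices, not the EpiLPS implementation.

```python
import math
import numpy as np

def loglik(theta, w, y, B):
    """Log-likelihood (21), up to the additive constant -sum_t log(y_t!).

    theta : (K,) B-spline amplitudes
    w     : log overdispersion, rho = exp(w)
    y     : (T,) observed counts
    B     : (T, K) matrix whose t-th row is b(t)'
    """
    rho = math.exp(w)
    eta = B @ theta                                  # theta' b(t) = log mu(t)
    lg = np.array([math.lgamma(v + rho) for v in y]) - math.lgamma(rho)
    log_mu_plus_rho = np.logaddexp(eta, w)           # log(exp(eta) + exp(w)), computed stably
    return float(np.sum(lg + y * eta + rho * w - (y + rho) * log_mu_plus_rho))
```

Adding back the omitted term −log(y_t!) recovers the log of a negative binomial probability mass function with mean exp(θᵀb(t)) and overdispersion ρ, which gives a simple way to check the expression numerically.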

Let us denote by ${\tilde{ζ}}^{(m - 1)} \in \mathbb{R}^{K + 1}$ the state of the chain at iteration (m − 1). In the Langevin-Hastings algorithm, the proposal for the vector $\tilde{ζ}$ at iteration m is a draw from the following multivariate Gaussian distribution:

\begin{matrix} {\tilde{ζ}}^{(prop)} \sim N_{(K + 1)} ({\tilde{ζ}}^{(m - 1)} + 0.5 ϱ Σ_{L H} \nabla_{\tilde{ζ}} log p (\tilde{ζ} | λ, δ, D) |_{\tilde{ζ} = {\tilde{ζ}}^{(m - 1)}}, ϱ Σ_{L H}), \end{matrix}

(22)

where ϱ > 0 is a tuning parameter that has to be carefully chosen in order to reach a desired acceptance rate and Σ_LH is the following block-diagonal variance-covariance matrix:

\begin{matrix} Σ_{L H} = (\begin{matrix} Σ^{*} & 0 \\ 0 & 1 \end{matrix}), \end{matrix}

(23)

where Σ* is the K-dimensional covariance matrix obtained with LPSMAP. The gradient of $log p (\tilde{ζ} | λ, δ, D) = ℓ (\tilde{ζ}; D) - 0.5 λ θ^{⊤} P θ - b_{ρ} exp (w) + a_{ρ} w$ can be decomposed as follows:

\begin{matrix} \nabla_{\tilde{ζ}} log p (\tilde{ζ} | λ, δ, D) = {(\nabla_{θ}^{⊤} log p (\tilde{ζ} | λ, δ, D), \frac{\partial log p (\tilde{ζ} | λ, δ, D)}{\partial w})}^{⊤}, \end{matrix}

(24)

and is analytically available (see S1 Appendix for more details). All the quantities related to the Langevin-Hastings proposal have been analytically derived, so that the draw in (22) can be obtained (for a given value of λ and δ). As in a classic MH algorithm, the next step consists in computing the acceptance probability:

\begin{matrix} π ({\tilde{ζ}}^{(m - 1)}, {\tilde{ζ}}^{(prop)}) = min \{ 1, \frac{p ({\tilde{ζ}}^{(prop)} | λ, δ, D)}{p ({\tilde{ζ}}^{(m - 1)} | λ, δ, D)} \frac{q ({\tilde{ζ}}^{(prop)}, {\tilde{ζ}}^{(m - 1)})}{q ({\tilde{ζ}}^{(m - 1)}, {\tilde{ζ}}^{(prop)})} \}, \end{matrix}

(25)

where q(⋅, ⋅) denotes the (Gaussian) proposal distribution and $p (\cdot | λ, δ, D)$ the target (conditional) posterior distribution. Finally, we generate a uniform random variable $u \sim U (0, 1)$ and accept the proposed vector ${\tilde{ζ}}^{(prop)}$ if $u \leq π ({\tilde{ζ}}^{(m - 1)}, {\tilde{ζ}}^{(prop)})$ and reject it otherwise. While iterating through the Metropolis-within-Gibbs algorithm, the tuning parameter $ϱ$ is automatically adapted to reach the optimal acceptance rate of 0.57 [31, 36, 37]. The pseudo-code below summarizes the LPSMALA algorithm.

LPSMALA algorithm to sample the posterior $p (θ, ρ, λ, δ | D)$ .

1: Fix initial values m = 0, λ⁽⁰⁾, δ⁽⁰⁾, ϱ⁽⁰⁾ and ${\tilde{ζ}}^{(0)} = {(θ^{(0) ⊤}, w^{(0)})}^{⊤}$ .

2: for m = 1, …, M do

3: (Langevin-Hastings)

4: Compute Langevin diffusion: $E ({\tilde{ζ}}^{(m - 1)}) = {\tilde{ζ}}^{(m - 1)} + 0.5 ϱ^{(m - 1)} Σ_{L H} \nabla_{\tilde{ζ}} log p (\tilde{ζ} | λ^{(m - 1)}, δ^{(m - 1)}, D) |_{\tilde{ζ} = {\tilde{ζ}}^{(m - 1)}}$ .

5: Generate a proposal: ${\tilde{ζ}}^{(prop)} \sim N_{(K + 1)} (E ({\tilde{ζ}}^{(m - 1)}), ϱ^{(m - 1)} Σ_{L H})$ .

6: Compute acceptance probability: $π ({\tilde{ζ}}^{(m - 1)}, {\tilde{ζ}}^{(prop)}) = min \{ 1, \frac{p ({\tilde{ζ}}^{(prop)} | λ^{(m - 1)}, δ^{(m - 1)}, D)}{p ({\tilde{ζ}}^{(m - 1)} | λ^{(m - 1)}, δ^{(m - 1)}, D)} \frac{q ({\tilde{ζ}}^{(prop)}, {\tilde{ζ}}^{(m - 1)})}{q ({\tilde{ζ}}^{(m - 1)}, {\tilde{ζ}}^{(prop)})} \}$ .

7: Draw $u \sim U (0, 1)$ .

8: if u ≤ π set ${\tilde{ζ}}^{(m)} = {\tilde{ζ}}^{(prop)}$ (accept), else ${\tilde{ζ}}^{(m)} = {\tilde{ζ}}^{(m - 1)}$ (reject).

9: (Gibbs sampler)

10: Draw $δ^{(m)} \sim G (0.5 ϕ + a_{δ}, 0.5 ϕ λ^{(m - 1)} + b_{δ})$ .

11: Draw $λ^{(m)} \sim G (0.5 (K + ϕ), 0.5 (θ^{(m) ⊤} P θ^{(m)} + δ^{(m)} ϕ))$ .

12: (Adaptive tuning)

13: Update $ϱ^{(m)} = H^{2} (\sqrt{ϱ^{(m - 1)}} + m^{- 1} (π ({\tilde{ζ}}^{(m - 1)}, {\tilde{ζ}}^{(prop)}) - 0.57))$ .

14: end for

The adaptive tuning part (line 13) involves the step function $H (z) = ϵ I (z < ϵ) + z I (ϵ \leq z \leq A) + A I (z > A)$ , with ϵ = 10⁻⁴ and $A = 10^{4}$ , see [33] for details. Finally, the ratio $q ({\tilde{ζ}}^{(prop)}, {\tilde{ζ}}^{(m - 1)}) q^{- 1} ({\tilde{ζ}}^{(m - 1)}, {\tilde{ζ}}^{(prop)})$ entering the computation of the acceptance probability (line 6) is derived in S1 Appendix.
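The Langevin-Hastings update (lines 4 to 8) and the adaptive tuning rule (line 13) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the target is a toy bivariate Gaussian standing in for the conditional posterior of $\tilde{ζ}$, the Gibbs steps for λ and δ are omitted, and `Sigma` plays the role of Σ_LH. All function and variable names are hypothetical and do not come from the EpiLPS package.

```python
import numpy as np

def H(z, eps=1e-4, A=1e4):
    """Clamp z to [eps, A], as the step function H does."""
    return min(max(z, eps), A)

def update_rho(rho_prev, m, acc_prob, target=0.57):
    """Adaptive tuning of the step size (line 13)."""
    return H(np.sqrt(rho_prev) + (acc_prob - target) / m) ** 2

def log_q(x_to, x_from, grad_from, rho, Sigma, Sigma_inv):
    """Log proposal density (22) of moving x_from -> x_to, up to a constant."""
    mean = x_from + 0.5 * rho * Sigma @ grad_from
    d = x_to - mean
    return -0.5 * d @ Sigma_inv @ d / rho

def mala_step(x, log_p, grad_log_p, rho, Sigma, Sigma_inv, L, rng):
    """One Langevin-Hastings update; returns (new state, acceptance probability)."""
    g_x = grad_log_p(x)
    prop = x + 0.5 * rho * Sigma @ g_x + np.sqrt(rho) * L @ rng.standard_normal(x.size)
    g_prop = grad_log_p(prop)
    log_ratio = (log_p(prop) - log_p(x)
                 + log_q(x, prop, g_prop, rho, Sigma, Sigma_inv)
                 - log_q(prop, x, g_x, rho, Sigma, Sigma_inv))
    acc = float(np.exp(min(0.0, log_ratio)))     # min{1, ratio} on the log scale
    if rng.uniform() <= acc:
        return prop, acc
    return x, acc

# Toy target (assumption): zero-mean Gaussian with covariance Sigma.
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
Sigma_inv = np.linalg.inv(Sigma)
L = np.linalg.cholesky(Sigma)                    # L @ L.T = Sigma
log_p = lambda x: -0.5 * x @ Sigma_inv @ x
grad_log_p = lambda x: -Sigma_inv @ x

rng = np.random.default_rng(2022)
M, rho = 4000, 1.0
x = np.zeros(2)
chain = np.empty((M, 2))
acc_probs = np.empty(M)
for m in range(1, M + 1):
    x, acc = mala_step(x, log_p, grad_log_p, rho, Sigma, Sigma_inv, L, rng)
    chain[m - 1], acc_probs[m - 1] = x, acc
    rho = update_rho(rho, m, acc)                # drives acceptance towards 0.57
```

Because both proposal densities in the acceptance ratio share the covariance ϱΣ, their normalizing constants cancel, which is why `log_q` only needs the quadratic form.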

Posterior inference with LPSMALA

Provided the LPSMALA algorithm is run long enough for the chain to converge, say after a burn-in of $\tilde{M}$ iterations, MCMC theory guarantees that $S = {(θ^{(m) ⊤}, ρ^{(m)}, λ^{(m)}, δ^{(m)})}_{m = \tilde{M} + 1}^{M}$ can be viewed as random draws from the target posterior distribution $p (θ, ρ, λ, δ | D)$ . A convenient choice for the initial values of the parameters is their LPSMAP estimate. Given the sample $S$ , inference on quantities that are functions of θ becomes straightforward, in the sense that point estimates and credible intervals can be easily obtained. A point estimate for the mean number of incident cases at time t is taken to be the posterior mean (after discarding the burn-in phase):

\begin{matrix} \hat{μ} (t) = \frac{1}{M - \tilde{M}} \sum_{m = \tilde{M} + 1}^{M} exp (θ^{(m) ⊤} b (t)) . \end{matrix}

(26)

Note also that $S$ can be used to compute highest posterior density intervals of μ(t) at any point t. Using the renewal equation and the MCMC sample, one can apply the “plug-in” method and recover the following estimate of the instantaneous reproduction number at time point t:

\begin{matrix} {\hat{R}}_{t} & = & \frac{1}{M - \tilde{M}} \sum_{m = \tilde{M} + 1}^{M} exp (θ^{(m) ⊤} b (t)) \{ I (t = 1) + {(\sum_{s = 1}^{t - 1} φ_{s} exp (θ^{(m) ⊤} b (t - s)))}^{- 1} I (2 \leq t \leq k) \\ & & + {(\sum_{s = 1}^{k} φ_{s} exp (θ^{(m) ⊤} b (t - s)))}^{- 1} I (k < t \leq T) \} . \end{matrix}

(27)
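The plug-in estimate (27) amounts to applying the renewal equation to each posterior draw of the mean incidence curve and averaging the results. A minimal Python sketch is given below, assuming the draws exp(θ^(m)ᵀ b(t)) have already been collected (after burn-in) into an array `mu` with one row per draw. The function names are illustrative and not part of the EpiLPS package.

```python
import numpy as np

def plug_in_R(mu, phi):
    """R_t for each posterior draw via the renewal equation, as in (27).

    mu  : (n_draws, T) array, mu[m, t-1] = exp(theta^(m)' b(t))
    phi : (k,) discretized serial interval, phi[s-1] = phi_s
    """
    mu = np.atleast_2d(np.asarray(mu, dtype=float))
    phi = np.asarray(phi, dtype=float)
    n_draws, T = mu.shape
    k = phi.size
    R = np.empty_like(mu)
    R[:, 0] = mu[:, 0]                           # t = 1: the bracket in (27) equals 1
    for t in range(2, T + 1):
        s_max = min(t - 1, k)                    # t - 1 terms if t <= k, k terms otherwise
        s = np.arange(1, s_max + 1)
        denom = mu[:, t - 1 - s] @ phi[:s_max]   # sum_s phi_s * mu(t - s)
        R[:, t - 1] = mu[:, t - 1] / denom
    return R

def posterior_mean_R(mu, phi):
    """Point estimate of R_t: average of the per-draw values."""
    return plug_in_R(mu, phi).mean(axis=0)
```

For instance, a single draw with μ = (10, 20, 40) and φ = (0.5, 0.5) gives R = (10, 4, 40/15): the value at t = 2 uses only φ₁μ(1) in the denominator, whereas t = 3 uses the full serial interval.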

Also, using $S$ , one can compute a highest posterior density interval of $R_{t}$ at time step t.
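One common sample-based way to approximate a highest posterior density interval is to take the shortest interval that contains the desired fraction of the sorted draws. The Python sketch below illustrates this idea for a single time point; it is appropriate for unimodal posteriors and is offered as an assumption about how such an interval can be computed, not as a description of the routine used in EpiLPS.

```python
import math
import numpy as np

def hpd_interval(samples, level=0.95):
    """Shortest interval containing a fraction `level` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    n_in = max(1, math.ceil(level * n))          # number of draws inside the interval
    # Width of every candidate interval spanning n_in consecutive order statistics.
    widths = x[n_in - 1:] - x[:n - n_in + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + n_in - 1]
```

Applied column by column to the per-draw values of $R_{t}$, this yields a pointwise interval at each time step t.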

Results

Setting of the simulation study

In this section, a numerical study with nine epidemic scenarios is implemented to assess the accuracy with which EpiLPS tracks the target reproduction number over time. EpiLPS results are compared with the instantaneous reproduction number estimate from the EpiEstim package [3] using three sliding window options (the default weekly windows, three-day windows and daily windows). In addition, we distinguish two comparisons of EpiLPS against EpiEstim: one with $R_{t}$ estimates reported on the last day of a window, following the convention of [3], and one with $R_{t}$ estimates reported at the midpoint of a smoothing window, following the best practice recommendation of [2]. For EpiLPS, K = 40 (cubic) B-splines are specified with a second-order penalty and a chain length of 3 000 for LPSMALA (including a burn-in of size 1 000). In each scenario, S = 100 incidence time series of T days are simulated (initiated with 10 index cases). The epidemic data generating process computes the mean incidence μ(t) at a given day t according to the renewal equation, and the incidence of cases at time point t is sampled from the negative binomial distribution y_t ∼ NegBin(μ(t), ρ). The simulation study also accounts for varying degrees of overdispersion by using different values of ρ in the considered scenarios. Furthermore, the incidence data are generated according to three different serial interval (SI) distributions, namely φ_FLU, φ_SARS and φ_MERS, corresponding to an influenza-like, a SARS-CoV-1-like and a MERS-CoV-like serial interval, respectively. The discretized versions of the SI distributions are computed by using the Cori et al. (2013) [3] discretization formula assuming a (shifted) Gamma distribution.

In Scenario 1, a constant instantaneous reproduction number $R_{t} = 1.3$ is considered. Scenario 2 imitates an intervention strategy, so that $R_{t} = 2$ until a sudden drop to $R_{t} = 0.9$ occurs at day t = 20. This scenario makes it possible to check whether EpiLPS is able to react quickly to a sudden change in $R_{t}$ . Scenario 3 is characterized by a more wiggly structure for $R_{t}$ and Scenario 4 considers the case of a vanishing epidemic with a monotonically decreasing reproduction number. Scenarios 5–8 assume the same functional form for $R_{t}$ as in Scenarios 1–4 but with a different serial interval distribution. In Scenario 9, the $R_{t}$ function is chosen in such a way that there is a single large wave in the early phase of the epidemic and a more stable pattern (with smaller waves) in the late phase. Table 1 summarizes the time domain of the epidemic curve, the target $R_{t}$ function, the serial interval distribution and its associated source(s) in the literature.

Table 1. Time domain of the epidemic curve, assumed functional form of the reproduction number, serial interval distribution and its associated source(s) in the literature for the nine scenarios considered in the simulation study.

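Scenario	Time domain of epidemic curve	$R_{t}$ target function	Serial interval: mean (SD), days	Reference for serial interval
1	$\mathcal{T} = [1, 40]$	$R_{t} = 1.3$	φ_FLU: 2.6 (1.5)	Ferguson et al. (2005) [38]; Cori et al. (2013) [3]
2	$\mathcal{T} = [1, 40]$	$R_{t} = 2 I (t < 20) + 0.9 I (t \geq 20)$	φ_FLU: 2.6 (1.5)	Ferguson et al. (2005) [38]; Cori et al. (2013) [3]
3	$\mathcal{T} = [1, 40]$	$R_{t} = 0.25 + exp (cos (t / 7))$	φ_FLU: 2.6 (1.5)	Ferguson et al. (2005) [38]; Cori et al. (2013) [3]
4	$\mathcal{T} = [1, 40]$	$R_{t} = exp (cos (t / 15))$	φ_FLU: 2.6 (1.5)	Ferguson et al. (2005) [38]; Cori et al. (2013) [3]
5	$\mathcal{T} = [1, 40]$	$R_{t} = 1.3$	φ_SARS: 8.4 (3.8)	Lipsitch et al. (2003) [39]; Cori et al. (2013) [3]
6	$\mathcal{T} = [1, 40]$	$R_{t} = 2 I (t < 20) + 0.9 I (t \geq 20)$	φ_SARS: 8.4 (3.8)	Lipsitch et al. (2003) [39]; Cori et al. (2013) [3]
7	$\mathcal{T} = [1, 40]$	$R_{t} = 0.25 + exp (cos (t / 7))$	φ_SARS: 8.4 (3.8)	Lipsitch et al. (2003) [39]; Cori et al. (2013) [3]
8	$\mathcal{T} = [1, 40]$	$R_{t} = exp (cos (t / 15))$	φ_SARS: 8.4 (3.8)	Lipsitch et al. (2003) [39]; Cori et al. (2013) [3]
9	$\mathcal{T} = [1, 60]$	$R_{t} = 0.5 (exp (sin (π t / 9)) + 1.5 exp (cos (4 / t)))$	φ_MERS: 6.8 (4.1)	Cauchemez et al. (2016) [40]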
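The data generating mechanism described for the simulation study can be sketched in Python as follows. The exact procedure is given in S2 Appendix and is not reproduced here, so this sketch rests on assumptions: the mean incidence is taken to be μ(t) = R_t Σ_s φ_s y_{t−s} computed from previously simulated counts, and NegBin(μ, ρ) is taken to have mean μ and variance μ + μ²/ρ. The serial interval `phi_toy` and the value ρ = 5 are hypothetical placeholders, not φ_FLU or the overdispersion values used in the paper.

```python
import numpy as np

def simulate_incidence(R, phi, rho, T, n_index=10, seed=None):
    """Simulate T days of case counts from a renewal process with NegBin noise.

    R   : function mapping day t (1-based) to the target reproduction number
    phi : (k,) discretized serial interval, phi[s-1] = phi_s
    rho : overdispersion parameter of the negative binomial
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    k = phi.size
    y = np.zeros(T, dtype=np.int64)
    y[0] = n_index                               # epidemic initiated with index cases
    for t in range(2, T + 1):                    # calendar days 2, ..., T
        s = np.arange(1, min(t - 1, k) + 1)
        mu = R(t) * np.sum(phi[s - 1] * y[t - 1 - s])   # renewal equation
        if mu > 0:
            # NumPy's (n, p) form with n = rho and p = rho / (rho + mu)
            # has mean mu and variance mu + mu**2 / rho.
            y[t - 1] = rng.negative_binomial(rho, rho / (rho + mu))
    return y

def R_scenario2(t):
    """Target function of Scenario 2: drop from 2 to 0.9 at day 20."""
    return 2.0 if t < 20 else 0.9

phi_toy = np.array([0.2, 0.5, 0.2, 0.1])         # hypothetical serial interval
y = simulate_incidence(R_scenario2, phi_toy, rho=5.0, T=40, seed=1)
```

Repeating the call with S = 100 different seeds would give one set of simulated epidemics for a scenario.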


The simulation study is organized as follows. First, we compare EpiLPS with EpiEstim using the convention of Cori and colleagues, namely reporting the $R_{t}$ estimate at the end of the smoothing window, which is well suited for real-time estimation. The latter approach reports the $R_{t}$ estimate computed in the window [t − ω; t], where ω denotes the window width. Next, the Gostic et al. (2020) [2] recommendation is used, where the $R_{t}$ estimate is reported at the center of the window, i.e. [t − ω/2; t + ω/2]. Concentrating on the window midpoint avoids lagged $R_{t}$ estimates at the cost of ruling out estimates at the last ω/2 time points as, in that case, the upper bound of the window reaches future calendar days for which data is not yet available. Fig 1 summarizes the two window structures used in the simulation study.

Fig 1 — Illustration of smoothing windows of width ω to estimate $R_{t}$ with EpiEstim. (a) Cori et al. (2013) [3] convention with sliding windows [t − ω; t], where $R_{t}$ is reported at the end of the window. (b) Gostic et al. (2020) [2] recommendation with centered sliding windows [t − ω/2; t + ω/2], where $R_{t}$ is reported at the midpoint of the window. Under the midpoint rule, $R_{t}$ estimates for the last ω/2 time units are unavailable ∅.
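The effect of the two reporting conventions on which days receive an estimate can be made explicit with a short Python sketch. It only encodes the window geometry described above (a window must lie entirely inside [1, T]); it is not EpiEstim code, and the function name is hypothetical.

```python
def reporting_times(T, omega, midpoint=False):
    """Days at which a sliding window of width omega yields an R_t estimate.

    End-of-window convention: window [t - omega, t], reported at t.
    Midpoint convention: window [t - omega/2, t + omega/2], reported at t.
    """
    if not midpoint:
        return list(range(1 + omega, T + 1))
    if omega % 2 != 0:
        raise ValueError("omega must be even for the midpoint to be an integer day")
    half = omega // 2
    return list(range(1 + half, T - half + 1))   # last omega/2 days are unavailable
```

With T = 40, weekly windows (ω = 6) reported at the midpoint stop at day 37 and three-day windows (ω = 2) stop at day 39, whereas end-of-window reporting always reaches day T.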

Comparing EpiLPS with EpiEstim at window boundary

The performance indicators computed for each scenario include the average bias, mean square error (MSE), coverage probability (CP) and width (CI^Δ) of 90% and 95% credible intervals for the $R_{t}$ estimator (see detailed formulas in S2 Appendix), for both EpiLPS and EpiEstim. Estimates obtained during the first week of the epidemic are ignored, as they may be subject to serious bias due to the limited information carried by the (few) incident cases in such an early phase. Therefore, the performance measures are computed as an average over days t = 8, …, T, where T is the upper bound of the time domain $\mathcal{T}$ . For a chosen time window, the performance measures for EpiEstim are computed by comparing the true value of the reproduction number at time step t with the estimated reproduction number (and credible interval) obtained at the end of the chosen time window (cf. Fig 1). A detailed description of the data generating process and the figures of the estimated $R_{t}$ trajectories for all scenarios are provided in S2 Appendix.
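Since the exact formulas are in S2 Appendix, the Python sketch below uses conventional definitions of the four indicators as an assumption: errors, coverage indicators and interval widths are averaged jointly over the S simulated epidemics and over days t = 8, …, T. The function name and the dictionary layout are illustrative.

```python
import numpy as np

def performance(R_true, R_hat, lower, upper, first_day=8):
    """Average bias, MSE, coverage (%) and interval width over days >= first_day.

    R_true              : (T,) target reproduction number
    R_hat, lower, upper : (S, T) point estimates and credible interval bounds
    """
    j = first_day - 1                            # 0-based index of day `first_day`
    truth = np.asarray(R_true, dtype=float)[j:]
    est, lo, up = (np.asarray(a, dtype=float)[:, j:] for a in (R_hat, lower, upper))
    err = est - truth
    covered = (lo <= truth) & (truth <= up)      # does the interval contain the target?
    return {
        "bias": float(err.mean()),
        "mse": float((err ** 2).mean()),
        "coverage": float(100.0 * covered.mean()),
        "width": float((up - lo).mean()),
    }
```

Calling the function once with 90% bounds and once with 95% bounds gives the two coverage and width columns reported per method.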

The performance measures given in Tables 2 and 3 provide interesting insights into the behavior of EpiLPS and EpiEstim across the considered scenarios. In terms of bias, EpiLPS compares favorably with EpiEstim, as both LPSMAP and LPSMALA outperform EpiEstim (whatever the time window size) in Scenarios 4–8. For the remaining scenarios, the bias of the two competing methods is similar. Regarding the MSE, EpiLPS exhibits smaller values than EpiEstim with three-day and daily windows across all scenarios. Moreover, specifying smaller time windows in EpiEstim (generally) leads to an increase in both MSE and bias. A close inspection of the coverage probability of credible intervals reveals that EpiLPS has close to nominal coverage in almost all scenarios. This is however not the case for EpiEstim, especially for weekly and three-day windows, where severe to mild undercoverage is observed. Also, EpiEstim tends to show more severe undercoverage in scenarios where the data are more overdispersed (see e.g. Scenario 4). More importantly, even when EpiEstim approaches the nominal coverage probability (with daily windows), its credible intervals are much wider (and so less precise) than those of EpiLPS in almost all scenarios.

Table 2. Results for EpiLPS and EpiEstim in Scenarios 1–8 for S = 100 simulated epidemics.

The Bias, MSE, coverage probability (CP) and width (CI^Δ) of 90% and 95% credible intervals for $R_{t}$ are averaged over days t = 8, …, 40. For EpiEstim, $R_{t}$ is reported at the end of the window.

Scenario	Method	Bias	MSE	CP$_{90\%}$	CP$_{95\%}$	$CI_{90\%}^{Δ}$	$CI_{95\%}^{Δ}$
1 φ_FLU	LPSMAP	-0.012	0.016	90.970	95.333	0.399	0.477
	LPSMALA	-0.012	0.017	93.000	96.727	0.454	0.544
	EpiEstim (7d windows)	-0.007	0.012	88.970	94.242	0.330	0.394
	EpiEstim (3d windows)	0.004	0.025	88.636	93.515	0.466	0.555
	EpiEstim (1d windows)	0.036	0.075	88.515	93.970	0.776	0.927
2 φ_FLU	LPSMAP	-0.004	0.013	77.030	82.424	0.175	0.209
	LPSMALA	-0.003	0.010	92.697	93.909	0.313	0.375
	EpiEstim (7d windows)	0.071	0.043	67.970	73.303	0.142	0.169
	EpiEstim (3d windows)	0.027	0.021	77.818	84.424	0.182	0.217
	EpiEstim (1d windows)	0.003	0.016	83.424	89.242	0.287	0.342
3 φ_FLU	LPSMAP	-0.009	0.015	91.970	96.242	0.396	0.474
	LPSMALA	-0.008	0.017	92.545	96.152	0.451	0.541
	EpiEstim (7d windows)	-0.023	0.093	23.394	29.333	0.279	0.332
	EpiEstim (3d windows)	-0.006	0.033	76.939	84.939	0.434	0.518
	EpiEstim (1d windows)	0.044	0.073	89.576	94.394	0.761	0.909
4 φ_FLU	LPSMAP	0.000	0.002	89.879	94.091	0.189	0.226
	LPSMALA	-0.001	0.003	92.152	96.455	0.175	0.208
	EpiEstim (7d windows)	0.152	0.027	9.485	11.273	0.105	0.125
	EpiEstim (3d windows)	0.056	0.007	40.970	47.667	0.133	0.158
	EpiEstim (1d windows)	0.003	0.008	80.848	88.455	0.209	0.250
5 φ_SARS	LPSMAP	0.013	0.232	93.030	96.667	1.903	2.356
	LPSMALA	0.005	0.246	91.061	96.273	1.897	2.376
	EpiEstim (7d windows)	0.061	0.162	83.344	89.781	1.089	1.301
	EpiEstim (3d windows)	0.146	0.395	82.844	89.438	1.619	1.939
	EpiEstim (1d windows)	0.388	1.091	86.938	92.281	2.783	3.362
6 φ_SARS	LPSMAP	0.001	0.201	90.939	95.394	1.660	2.044
	LPSMALA	-0.012	0.218	91.485	96.242	1.731	2.157
	EpiEstim (7d windows)	0.115	0.202	72.562	79.812	0.909	1.085
	EpiEstim (3d windows)	0.114	0.330	77.719	85.219	1.295	1.548
	EpiEstim (1d windows)	0.238	0.843	81.312	88.219	2.162	2.602
7 φ_SARS	LPSMAP	-0.008	0.288	96.121	98.606	2.272	3.023
	LPSMALA	-0.005	0.384	92.182	96.545	1.929	2.485
	EpiEstim (7d windows)	0.040	0.278	75.062	82.781	1.090	1.304
	EpiEstim (3d windows)	0.160	0.410	85.125	91.656	1.700	2.044
	EpiEstim (1d windows)	0.493	1.199	89.562	94.031	3.051	3.707
8 φ_SARS	LPSMAP	0.021	0.187	91.667	95.303	1.416	1.750
	LPSMALA	0.005	0.191	91.061	96.121	1.522	1.900
	EpiEstim (7d windows)	0.217	0.206	69.312	77.281	0.838	1.000
	EpiEstim (3d windows)	0.157	0.311	79.375	86.781	1.163	1.391
	EpiEstim (1d windows)	0.250	0.725	82.594	89.281	1.950	2.350


Table 3. Results for EpiLPS and EpiEstim in Scenario 9 for S = 100 simulated epidemics.

The Bias, MSE, coverage probability (CP) and width (CI^Δ) of 90% and 95% credible intervals for $R_{t}$ are averaged over days t = 8, …, 60. For EpiEstim, $R_{t}$ is reported at the end of the window.

Scenario	Method	Bias	MSE	CP$_{90\%}$	CP$_{95\%}$	$CI_{90\%}^{Δ}$	$CI_{95\%}^{Δ}$
9 φ_MERS	LPSMAP	-0.026	0.078	90.566	94.528	0.849	1.021
	LPSMALA	-0.024	0.082	92.981	96.925	0.935	1.128
	EpiEstim (7d windows)	-0.002	0.347	41.981	49.698	0.705	0.841
	EpiEstim (3d windows)	0.048	0.184	79.925	86.830	1.035	1.236
	EpiEstim (1d windows)	0.160	0.423	85.585	91.358	1.766	2.121


Figs 2 to 4 summarize the epidemic curves and the trajectories obtained for the estimated $R_{t}$ with LPSMAP (blue curves) and EpiEstim under weekly sliding windows (green curves) for selected scenarios. These figures highlight the flexibility and the precision with which Laplacian-P-splines are able to capture the reproduction number over the time course of the epidemic. The dashed (dotted) curves represent the pointwise median (computed over the S = 100 estimates) of $R_{t}$ with LPSMAP (EpiEstim). For LPSMAP, it closely follows the true pattern of $R_{t}$ even under strong nonlinearities as in Fig 4. The EpiEstim trajectories appear shifted to the right of the target $R_{t}$ curve. This lag is due to the fact that for weekly sliding windows, the $R_{t}$ estimate provided by EpiEstim at the end of the window is entirely based on data from past days and is therefore lagged compared to the target (instantaneous) $R_{t}$ . This shift effect can be corrected by decreasing the time window (e.g., using daily windows) at the cost of more “noisy” trajectories. Even then, the median $R_{t}$ estimates of EpiEstim appear to capture less precisely the target $R_{t}$ function as compared to LPSMAP/LPSMALA (see S2 Appendix) across most of the considered scenarios.

Fig 2 — (Left) Simulated incidence data for Scenario 2. (Center) Estimated trajectories of $R_{t}$ for each simulated dataset with LPSMAP. (Right) Estimated trajectories of $R_{t}$ with EpiEstim using weekly sliding windows and $R_{t}$ reported at the end of the window. The pointwise median estimate of $R_{t}$ for EpiLPS (dashed) and EpiEstim (dotted) is also shown.

Fig 3 — (Left) Simulated incidence data for Scenario 3. (Center) Estimated trajectories of $R_{t}$ for each simulated dataset with LPSMAP. (Right) Estimated trajectories of $R_{t}$ with EpiEstim using weekly sliding windows and $R_{t}$ reported at the end of the window. The pointwise median estimate of $R_{t}$ for EpiLPS (dashed) and EpiEstim (dotted) is also shown.

Fig 4 — (Left) Simulated incidence data for Scenario 9. (Center) Estimated trajectories of $R_{t}$ for each simulated dataset with LPSMAP. (Right) Estimated trajectories of $R_{t}$ with EpiEstim using weekly sliding windows and $R_{t}$ reported at the end of the window. The pointwise median estimate of $R_{t}$ for EpiLPS (dashed) and EpiEstim (dotted) is also shown.

To summarize, this simulation study sheds light on the trade-off faced by the Cori method when estimating the instantaneous reproduction number. Choosing a weekly sliding window as a default option in EpiEstim can lead to a forward shifted (and so inaccurate) estimate of $R_{t}$ . Smaller time windows in EpiEstim alleviate the lag effect, but the price to pay is that the fitted $R_{t}$ trajectory is wiggly (undersmoothing) as it captures more variation than necessary [2].

EpiLPS does not suffer from such a trade-off, as it is naturally resolved by P-splines. In fact, one could say that the time window size in EpiEstim is analogous to the smoothing parameter λ in EpiLPS, as both quantities determine the smoothness of the fit. The major advantage of EpiLPS is that λ is estimated within the Bayesian model (either via maximum a posteriori estimation or MCMC), while the time window in EpiEstim is chosen freely outside the model.

Comparing EpiLPS with EpiEstim at window midpoint

To correct for the lag effect in EpiEstim resulting from reporting the reproduction number estimate at the end of the window, Gostic and colleagues recommend reporting it at the center of the window to obtain an estimate that is more accurately oriented in time. It is therefore important to compare the performance of EpiLPS against this “corrected” EpiEstim output, as it is considered best practice for retrospective use and, as such, is a legitimate competitor to EpiLPS, which is by nature only partially real-time (see next section). We therefore run the simulations for all scenarios once more, using the corrected EpiEstim output under weekly windows (ω = 6) and three-day windows (ω = 2), where the estimated $R_{t}$ is reported at the window midpoint. Results for a daily window (ω = 0) are identical to those reported in Tables 2 and 3, as sliding windows become degenerate intervals [t, t] at each time step. The performance measures are reported in Table 4, and Figs 5 to 7 summarize the estimated trajectories for the same scenarios as in the previous section for the sake of comparison. As expected, the resulting $R_{t}$ trajectories for EpiEstim are now closer to the target and the lag effect has disappeared. Despite this improvement, the performance indicators clearly highlight that EpiLPS outperforms EpiEstim in all scenarios except Scenario 1, where the numbers are of a similar order of magnitude. In general, the EpiLPS approach is less biased and provides credible intervals with close to nominal coverage. Even when $R_{t}$ is reported at the middle of the window, EpiEstim results are less accurate, especially regarding credible intervals with weekly windows, which can strongly undercover. This has important implications for the use of EpiLPS in practice, and detailed recommendations are provided below.

Table 4. Simulation results for EpiLPS and EpiEstim in Scenarios 1–9 for S = 100 simulated epidemics.

The performance indicators in Scenarios 1–8 for $R_{t}$ are averaged over days t = 8,…,37 for LPSMAP, LPSMALA and weekly windows (EpiEstim) and over days t = 8,…,39 for 3 days windows under EpiEstim. In Scenario 9, the performance indicators for $R_{t}$ are averaged over days t = 8,…,57 for LPSMAP, LPSMALA and weekly windows (EpiEstim) and over days t = 8,…,59 for 3 days windows under EpiEstim. For EpiEstim, $R_{t}$ is reported at the window midpoint.

Scenario	Method	Bias	MSE	CP$_{90\%}$	CP$_{95\%}$	$CI_{90\%}^{Δ}$	$CI_{95\%}^{Δ}$
1 φ_FLU	LPSMAP	-0.013	0.017	91.433	95.567	0.415	0.497
	LPSMALA	-0.013	0.018	93.033	96.567	0.467	0.560
	EpiEstim (7d windows)	-0.010	0.011	88.533	93.900	0.303	0.361
	EpiEstim (3d windows)	0.001	0.023	88.531	93.406	0.452	0.539
2 φ_FLU	LPSMAP	-0.004	0.015	75.900	81.267	0.179	0.214
	LPSMALA	-0.003	0.011	92.300	93.467	0.322	0.386
	EpiEstim (7d windows)	-0.032	0.036	65.733	71.167	0.106	0.126
	EpiEstim (3d windows)	-0.007	0.015	77.500	84.156	0.166	0.198
3 φ_FLU	LPSMAP	-0.009	0.013	92.033	96.267	0.368	0.440
	LPSMALA	-0.010	0.014	92.233	95.967	0.407	0.487
	EpiEstim (7d windows)	0.020	0.016	79.233	86.500	0.268	0.319
	EpiEstim (3d windows)	0.014	0.023	89.031	94.094	0.431	0.514
4 φ_FLU	LPSMAP	0.000	0.002	89.200	93.667	0.192	0.230
	LPSMALA	0.000	0.003	92.300	96.367	0.179	0.214
	EpiEstim (7d windows)	-0.029	0.005	34.800	43.200	0.074	0.089
	EpiEstim (3d windows)	-0.005	0.003	80.781	87.281	0.119	0.142
5 φ_SARS	LPSMAP	0.016	0.227	93.567	96.933	1.897	2.345
	LPSMALA	0.001	0.234	91.400	96.433	1.887	2.359
	EpiEstim (7d windows)	0.057	0.160	82.967	89.333	1.065	1.272
	EpiEstim (3d windows)	0.146	0.395	82.844	89.438	1.619	1.939
6 φ_SARS	LPSMAP	0.003	0.203	91.233	95.433	1.682	2.067
	LPSMALA	-0.016	0.217	91.967	96.467	1.764	2.193
	EpiEstim (7d windows)	0.010	0.154	74.967	82.400	0.856	1.022
	EpiEstim (3d windows)	0.080	0.316	78.469	86.000	1.295	1.548
7 φ_SARS	LPSMAP	-0.003	0.187	96.267	98.733	1.895	2.401
	LPSMALA	-0.024	0.193	92.467	96.667	1.649	2.096
	EpiEstim (7d windows)	0.082	0.155	84.433	91.267	1.049	1.256
	EpiEstim (3d windows)	0.185	0.410	86.219	92.125	1.700	2.044
8 φ_SARS	LPSMAP	0.023	0.198	91.700	95.267	1.447	1.773
	LPSMALA	0.004	0.201	92.067	96.733	1.595	1.987
	EpiEstim (7d windows)	0.026	0.125	78.067	86.733	0.770	0.919
	EpiEstim (3d windows)	0.095	0.294	80.906	87.531	1.163	1.391
9 φ_MERS	LPSMAP	-0.026	0.081	91.340	95.240	0.870	1.047
	LPSMALA	-0.025	0.084	93.200	97.140	0.948	1.142
	EpiEstim (7d windows)	-0.004	0.087	77.180	84.920	0.675	0.805
	EpiEstim (3d windows)	0.047	0.141	85.788	91.500	1.024	1.223


Fig 5 — (Left) Simulated incidence data for Scenario 2. (Center) Estimated trajectories of $R_{t}$ for each simulated dataset with LPSMAP. (Right) Estimated trajectories of $R_{t}$ with EpiEstim using weekly sliding windows and $R_{t}$ reported at the window midpoint. The pointwise median estimate of $R_{t}$ for EpiLPS (dashed) and EpiEstim (dotted) is also shown.

Fig 6 — (Left) Simulated incidence data for Scenario 3. (Center) Estimated trajectories of $R_{t}$ for each simulated dataset with LPSMAP. (Right) Estimated trajectories of $R_{t}$ with EpiEstim using weekly sliding windows and $R_{t}$ reported at the window midpoint. The pointwise median estimate of $R_{t}$ for EpiLPS (dashed) and EpiEstim (dotted) is also shown.

Fig 7 — (Left) Simulated incidence data for Scenario 9. (Center) Estimated trajectories of $R_{t}$ for each simulated dataset with LPSMAP. (Right) Estimated trajectories of $R_{t}$ with EpiEstim using weekly sliding windows and $R_{t}$ reported at the window midpoint. The pointwise median estimate of $R_{t}$ for EpiLPS (dashed) and EpiEstim (dotted) is also shown.

Real-time considerations

EpiEstim is a powerful tool to estimate $R_{t}$ in real-time and is probably the best tool currently available to deliver timely estimates of the reproduction number [41]. EpiLPS can be considered a real-time approach only to a certain degree: the real-time concept is partially present but fundamentally different from the one proposed in EpiEstim. By real-time method, we mean a method for which an estimate of the reproduction number at time t uses data up to (and including) time t. Let us assume that EpiLPS is applied to epidemic data over a specific period, say, $\mathcal{T} = [1, T]$ . For time points t = 1, …, T − 1, EpiLPS is clearly not real-time, as the global smoothing of $R_{t}$ on the “bandwidth” $\mathcal{T}$ is computed based on past, current and future data values. However, at the domain boundary (time point T), the EpiLPS estimate of $R_{t}$ exclusively makes use of data up to time T and is therefore real-time (in the same sense as EpiEstim).

The real-time characteristic of EpiLPS at this last time point is however only retained temporarily: if EpiLPS is applied (the next day) over the period $\mathcal{T}^{*} = [1, T + 1]$ , the estimate of the reproduction number at time T is no longer real-time, since it is computed based on data up to time T and the “future” data value available at time point T + 1. For EpiEstim, the real-time characteristic of the $R_{t}$ estimate is retained for any time point t, which makes it more suitable for timely estimation. The real-time properties of EpiLPS and EpiEstim are compared and illustrated in Fig 8.

Fig 8 — EpiLPS provides real-time estimates of $R_{t}$ only at the boundary of the considered domain and estimates at preceding time points are retrospective. On the contrary, estimates of $R_{t}$ with EpiEstim are always real-time and therefore preferred for a timely usage.

The extensive simulation results provided here suggest that EpiLPS is a robust retrospective estimation method. In particular, it addresses a challenge faced by many existing methods, namely that $R_{t}$ estimates typically lead or lag the true value [2]. EpiLPS is therefore a powerful retrospective tool to estimate the reproduction number during and/or after epidemic outbreaks. It is however less suitable than EpiEstim for real-time estimation and should therefore be used with care for timely purposes.

Computing time and sensitivity analyses

The computational time of the EpiLPS algorithm is mainly affected by the number K of B-splines specified in the basis and the total number of days T of the epidemic. Table 5 gives an overview of the real elapsed time in seconds required to run the EpiLPS routines for different (T, K) couples. Obviously, LPSMAP requires far less computational resources as it is a completely sampling-free approach relying on the MAP estimate of the hyperparameter vector. Even with an epidemic of roughly two months and K = 60, LPSMAP is extremely fast and delivers results in a fraction of a second. LPSMALA needs a larger computational budget as the algorithm relies on an iterative sampling scheme (MCMC). However, even for (T = 60, K = 60), LPSMALA requires less than 10 seconds, which is a relatively reasonable time given the number of parameters involved in the model.

Table 5. Computational time (real elapsed time in seconds) of LPSMAP and LPSMALA (with a chain length of 3 000) for different combinations of T (total number of days of the epidemic) and K (total number of B-splines in the basis).

EpiLPS algorithm running on an Intel Xeon E-2186M CPU @2.90GHz with 16 GB RAM.

Method		K = 20	K = 30	K = 40	K = 50	K = 60
LPSMAP	T = 20	0.122	0.105	0.140	0.188	0.253
	T = 30	0.091	0.124	0.185	0.242	0.326
	T = 40	0.096	0.135	0.201	0.255	0.337
	T = 50	0.083	0.109	0.156	0.200	0.276
	T = 60	0.074	0.110	0.148	0.181	0.287
LPSMALA	T = 20	3.098	3.499	4.151	4.832	5.886
	T = 30	3.653	4.043	4.776	5.505	6.548
	T = 40	4.167	4.663	5.425	6.126	7.238
	T = 50	5.061	5.545	6.253	7.151	8.108
	T = 60	5.913	6.362	7.151	8.062	9.074


We assessed the sensitivity of the EpiLPS estimated reproduction number with respect to model inputs that the user is free to choose, in order to check whether EpiLPS is robust to different parameter choices. In particular, we focus on the sensitivity of the $R_{t}$ fit (with LPSMAP) to the number K of B-splines and to the parameters a_δ and b_δ of the Gamma hyperprior on δ. The sensitivity analyses are implemented in S2 Appendix and reveal a negligible sensitivity of the estimated $R_{t}$ curve with respect to the above-mentioned parameters. We also discuss the sensitivity of the reproduction number estimates when computed over time domains of increasing width, for instance on [1, T₁] and [1, T₂] with T₂ > T₁. This gives an idea of the magnitude of variation in the estimated $R_{t}$ in the domain [1, T₁] when EpiLPS is actually fitted on the wider domain [1, T₂]. Results show that, although past values of $R_{t}$ vary when EpiLPS is applied to larger time domains (due to its inherent global smoothing approach), the estimated values of the reproduction number remain reasonably close to the target. S3 Appendix provides ancillary results on the estimation performance of the overdispersion parameter ρ and sensitivity analyses of the computed credible intervals for $R_{t}$ with respect to different couples (a_δ, b_δ).

Application to observed case counts in infectious disease epidemics

Epidemics of SARS-CoV-1 and influenza A H1N1

In this section, the LPSMALA algorithm is applied to two historical outbreak datasets presented in [3]. In particular, we consider the 2003 SARS outbreak in Hong Kong and the 2009 pandemic influenza in a school in Pennsylvania (USA). We use K = 40 B-splines with a second-order penalty and the serial interval distributions provided in the EpiEstim package [4]. The LPSMALA algorithm is implemented with a chain of length 25 000. Acceptance rates for the generated chains are close to the optimal value of 57% and the posterior samples have converged according to the Geweke (1992) [42] diagnostic test (at the 1% level of significance). Fig 9 shows the smoothed epidemic curves and the estimated $R_{t}$ for the two outbreaks. Results for the SARS data show that the reproduction number reaches a first peak during the third week, where ${\hat{R}}_{t} = 9.67$ (95% CI: 5.19–16.47), and a second, more moderate peak around week 6 with ${\hat{R}}_{t} = 2.78$ (95% CI: 1.82–3.82). After day t = 43, the epidemic is under control and $R_{t}$ smoothly decays below 1. For the pandemic influenza in Pennsylvania, $R_{t}$ is around 2.05 (95% CI: 1.21–3.06) at the end of the second week. During the middle of the third week, the situation is less severe and $R_{t}$ falls below 1. As noted in [3], a few cases appeared in the last days of the epidemic, generating an upward trend in $R_{t}$ estimates.

Fig 9 — (Left column) EpiLPS fit for the epidemic curve (top) and the instantaneous reproduction number $R_{t}$ (bottom) of the SARS outbreak in Hong Kong, 2003. (Right column) EpiLPS fit for the epidemic curve (top) and the instantaneous reproduction number $R_{t}$ (bottom) of the pandemic influenza in Pennsylvania, 2009. The shaded area corresponds to the 95% credible interval at each day.

Application on the SARS-CoV-2 pandemic

The EpiLPS methodology is illustrated on the SARS-CoV-2 pandemic using publicly available data from the Covid-19 Data Hub [43] and its associated COVID19 package on CRAN (https://cran.r-project.org/package=COVID19). Country-level data on hospitalizations for Belgium, Denmark, Portugal and France from April 5th, 2020 to October 31st, 2021 are used, and a serial interval distribution with a mean of 3 days (and standard deviation of 2.48 days) is assumed [44], discretized as φ = {0.344, 0.316, 0.168, 0.104, 0.068}. Fig 10 shows the estimated reproduction number obtained with EpiLPS and with EpiEstim for the four countries. Results are obtained with the LPSMAP algorithm using K = 30 B-splines and a second-order penalty. The gray shaded surface corresponds to 95% (pointwise) credible intervals for $R_{t}$ with LPSMAP and the dashed curves are for EpiEstim. From a computational perspective, it takes less than 3 seconds to fit the EpiLPS model for the four countries. The fitted reproduction numbers reflect the different waves of the COVID-19 pandemic and the rise in infections in the beginning of September 2021. We also see that EpiLPS tends to follow the same trend as the estimates provided by EpiEstim; the only difference is that the LPSMAP estimates appear globally smoother, with credible intervals that are narrower for Belgium, Denmark and Portugal.
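As an illustration of how a discretized serial interval such as φ enters the renewal equation, the sketch below computes a plug-in ratio $R_{t} = \mu_{t} / \sum_{s} \varphi_{s}\mu_{t-s}$ from a smoothed incidence curve. It is a minimal sketch and not the EpiLPS implementation: the function name `plug_in_rt`, the handling of the first days (only lags that fall inside the observed domain are used) and the exponentially growing curve `mu` are assumptions made for this example. Only the values of φ are taken from the text.

```python
import math


def plug_in_rt(mu, phi):
    """Renewal-equation ratio R_t = mu_t / sum_s phi_s * mu_{t-s} (0-based t)."""
    k = len(phi)
    rt = []
    for t in range(len(mu)):
        # Use only the lags s = 1, ..., min(k, t) that stay inside the domain.
        denom = sum(phi[s - 1] * mu[t - s] for s in range(1, min(k, t) + 1))
        # No past incidence is available on the first day, so R_t is undefined.
        rt.append(mu[t] / denom if denom > 0 else math.nan)
    return rt


# Discretized serial interval reported in the text.
phi = [0.344, 0.316, 0.168, 0.104, 0.068]

# Hypothetical smoothed incidence growing by 5% per day.
mu = [100.0 * 1.05 ** t for t in range(30)]
rt_growth = plug_in_rt(mu, phi)

# Hypothetical flat incidence: each case is replaced by exactly one new case.
rt_flat = plug_in_rt([50.0] * 30, phi)
```

Because φ sums to one, a flat curve yields a ratio of 1 once all five lags are available, and a curve growing at a constant rate yields a constant ratio above 1.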

Fig 10 — The shaded area corresponds to the 95% credible interval at each day. Dashed curves are results obtained with EpiEstim (with weekly sliding windows and estimated $R_{t}$ reported at the end of the window).

Discussion

EpiLPS (an acronym for Epidemiological modeling with Laplacian-P-Splines) is a fast and flexible tool for Bayesian estimation of the instantaneous reproduction number $R_{t}$ during epidemic outbreaks. The tool is flexible in the sense that (penalized) spline based approximations provide smoothed estimates of $R_{t}$ with little computational effort and without the constraint of imposing any sliding window assumption that could potentially affect the timing and accuracy of the estimator. Moreover, the end user has the choice between a fully sampling-free approach (LPSMAP) or an efficient MCMC gradient-based approach with Langevin diffusions (LPSMALA) for inference. The available EpiLPS package (https://cran.r-project.org/package=EpiLPS) allows public health policy makers to analyze incoming data faster than existing methods relying on classic MCMC samplers, thus permitting them to be better informed when taking decisions on control measures for infectious disease outbreaks. Simulation studies in this paper provide encouraging results and support EpiLPS as being a robust tool capable of a precise tracking of $R_{t}$ over time. The EpiLPS software package and the early website version (https://epilps.com) provide additional guiding material about the proposed methodology.

EpiLPS cannot be termed a real-time method in the same sense as the Cori method and is therefore less suited than EpiEstim to real-time analysis. Conceptually, EpiLPS and EpiEstim both use data from the past (EpiLPS also uses data from the future) to estimate the instantaneous reproduction number, but the mechanisms underlying the use of past observations differ. The method of Cori looks back in time only as far as the width of the chosen time window in terms of infected individuals. EpiLPS, on the contrary, has a longer reach, as the P-splines smoother approximates the reproduction number globally (or blockwise) over the entire domain of the epidemic curve, i.e. retrospectively and also including future values (except for the estimate of $R_{t}$ at the last day of the domain of the epidemic curve, which makes use of the current-day value and past values). This difference has important consequences and implies advantages as well as disadvantages. The advantage of working with a time window option as in EpiEstim is that one can control how far back in time to look in order to compute the desired $R_{t}$ estimate. This is not an option in EpiLPS, as the penalty parameter, the key driver of the degree of smoothness of the fitted $R_{t}$ curve, is estimated within the model and is not fixed by the user. There is, however, no free lunch: the downside of having a time window choice in EpiEstim is that one faces a trade-off between potential oversmoothing (with a wide time window) and undersmoothing (with a narrow time window). This trade-off is virtually absent in the EpiLPS setting, as P-splines internally deal with the smoothing problem.

It is evident that when applying EpiLPS sequentially over time on epidemic curves with wider and wider domain length such as [1, T₁], [1, T₂], [1, T₃] with 1 < T₁ < T₂ < T₃, the $R_{t}$ estimate over past days (for instance t ∈ [1, T₁]) will inevitably change as EpiLPS is by nature a global smoother. This past variability should not be seen as a drawback as it is essentially an “update” taking into account the fact that the method works with an epidemic curve with a longer domain. The real question is whether the past variability of the $R_{t}$ estimate remains in a close neighborhood of the “true” value of the reproduction number for past days. On that side, the complete simulation study is rather convincing as it shows that EpiLPS is an accurate method that is successful in capturing the evolution of $R_{t}$ over time.
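The "update" of past estimates described above can be illustrated with a toy global smoother. The sketch below uses a Whittaker smoother with a second-order difference penalty, a discrete relative of P-splines, as a stand-in. It is not the EpiLPS model (there is no negative binomial likelihood, no B-spline basis and no renewal equation), and the function name `whittaker`, the fixed penalty `lam` and the artificial curve `y` are assumptions made purely for illustration.

```python
import math


def whittaker(y, lam=50.0):
    """Solve (I + lam * D'D) z = y, with D the second-order difference matrix."""
    n = len(y)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for r in range(n - 2):
        coeffs = {r: 1.0, r + 1: -2.0, r + 2: 1.0}  # nonzero entries of row r of D
        for i, ci in coeffs.items():
            for j, cj in coeffs.items():
                A[i][j] += lam * ci * cj
    # Gaussian elimination; A is symmetric positive definite, so no pivoting.
    z = list(y)
    for col in range(n):
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            if f != 0.0:
                for j in range(col, n):
                    A[row][j] -= f * A[col][j]
                z[row] -= f * z[col]
    # Back substitution on the resulting upper triangular system.
    for row in range(n - 1, -1, -1):
        s = sum(A[row][j] * z[j] for j in range(row + 1, n))
        z[row] = (z[row] - s) / A[row][row]
    return z


T1, T2 = 30, 40
# Artificial "epidemic curve": a wave plus a deterministic wiggle.
y = [100.0 + 30.0 * math.sin(t / 5.0) + 5.0 * ((7 * t) % 3 - 1) for t in range(T2)]

fit_short = whittaker(y[:T1])   # smoother fitted on [1, T1]
fit_long = whittaker(y)[:T1]    # smoother fitted on [1, T2], restricted to [1, T1]
change = [abs(u - v) for u, v in zip(fit_short, fit_long)]
```

In this toy setting the fitted values over [1, T₁] do change once the extra days are added, but the change is concentrated near T₁ and fades quickly for earlier days.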

There are also other aspects with respect to which EpiEstim and EpiLPS differ. For instance, prior specification in EpiEstim assumes a Gamma distributed prior on the reproduction number which is conjugate to the Poisson likelihood (EpiEstim assumes that incidence at time step t is Poisson distributed), so that the posterior of $R_{t}$ also has a Gamma distribution. In EpiLPS, the prior(s) are not directly imposed on the reproduction number, but on the spline parameters (and hyperparameters), and the resulting posterior distribution of $R_{t}$ with LPSMAP is approximated by a lognormal distribution. Regarding computational complexity, EpiEstim and LPSMAP deliver estimates almost instantly, while LPSMALA requires a larger computing budget as it is an MCMC algorithm. We therefore recommend using LPSMALA over shorter epidemic durations and LPSMAP on longer outbreaks over several months. Our analysis suggests that EpiLPS might be more accurate than EpiEstim in the presence of overdispersed epidemiological data, especially when it comes to quantifying the uncertainty of $R_{t}$, as EpiLPS is shown to have narrower credible intervals with good coverage performance. A main limitation is that EpiLPS is more prone to numerical instability (e.g. during hyperparameter optimization or in the Newton-Raphson algorithm for the Laplace approximation) than EpiEstim, although such problems were not encountered here. Finally, it is also worth mentioning that $R_{t}$ estimates delivered by EpiLPS (and EpiEstim) are prone to potential biasing effects [2, 45] since the serial interval is used as a surrogate for the generation interval (time elapsed between infection events of an infector and an infectee), as the latter is less easily observed.
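The Gamma-Poisson conjugacy mentioned above for EpiEstim can be written out in a few lines. The sketch below follows the update described in [3]: with a Gamma(a, b) prior on $R_{t}$ (shape a, scale b) and incidence counts that are Poisson distributed with mean $R_{t}\Lambda_{s}$ over a window, the posterior is Gamma with shape a + Σ I_s and scale 1 / (1/b + Σ Λ_s). The function name `gamma_posterior`, the prior values and the toy numbers are assumptions made for illustration, and the code is not taken from the EpiEstim package.

```python
import math


def gamma_posterior(incidence, infectiousness, a=1.0, b=5.0):
    """Posterior (shape, scale) of R_t over a window under a Gamma(a, b) prior.

    incidence:      observed counts I_s in the window
    infectiousness: total infectiousness Lambda_s = sum_u phi_u * I_{s-u}
    """
    shape = a + sum(incidence)
    scale = 1.0 / (1.0 / b + sum(infectiousness))
    return shape, scale


# Hypothetical three-day window.
window_incidence = [12, 15, 18]
window_infectiousness = [10.0, 12.5, 14.0]

shape, scale = gamma_posterior(window_incidence, window_infectiousness)
post_mean = shape * scale            # mean of a Gamma(shape, scale) distribution
post_sd = math.sqrt(shape) * scale   # standard deviation of the same distribution
```

With these toy numbers the posterior shape is 46 and the posterior mean is 46 / 36.7, or about 1.25. In EpiLPS no such closed-form update is available for $R_{t}$, since the priors act on the spline coefficients.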

The EpiLPS project opens up several future research directions. A possible extension would be to formulate the EpiLPS model within a zero-inflated (Poisson) framework to cope with incidence time series characterized by an excess of zero counts. Another interesting extension would be to adapt the model to allow for regional variation and imported cases. Moreover, akin to EpiEstim, the EpiLPS methodology could be further developed to explicitly account for uncertainty in the serial interval distribution. Finally, in the face of long-lasting epidemic scenarios involving several variants characterized by different levels of virulence, it would be useful to extend the EpiLPS methodology to allow for smooth transitions of the estimated reproduction number accompanying the evolution of variants.

Supporting information

S1 Appendix. Details for the LPSMALA algorithm.

Analytical gradient for the Langevin-Hastings proposal and analytical version of the ratio of proposal distributions in the LPSMALA algorithm.

(PDF)


S2 Appendix. Simulation results and computational time.

Complete simulation results (for EpiLPS and EpiEstim) when EpiEstim reports $R_{t}$ at the window boundary, sensitivity analyses and computational time of EpiLPS.

(PDF)


S3 Appendix. Further simulation and sensitivity results.

Complete simulation results (for EpiLPS and EpiEstim) when EpiEstim reports $R_{t}$ at the window midpoint and additional sensitivity analyses.

(PDF)


Data Availability

Simulation results and real data applications in this paper can be fully reproduced with the code available on the GitHub repository https://github.com/oswaldogressani/EpiLPS-ArticleCode based on the EpiLPS package version 1.0.6 available on CRAN (https://cran.r-project.org/package=EpiLPS).

Funding Statement

This project is funded by the European Union’s Research and Innovation Action (https://cordis.europa.eu/project/id/101003688) under the H2020 work programme, EpiPose grant number 101003688. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. White LF, Moser CB, Thompson RN, Pagano M. Statistical estimation of the reproductive number from case notification data. American Journal of Epidemiology. 2021;190(4):611–620. doi: 10.1093/aje/kwaa211
2. Gostic KM, McGough L, Baskerville EB, Abbott S, Joshi K, Tedijanto C, et al. Practical considerations for measuring the effective reproductive number, Rt. PLoS Computational Biology. 2020;16(12):1–21. doi: 10.1371/journal.pcbi.1008409
3. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology. 2013;178(9):1505–1512. doi: 10.1093/aje/kwt133
4. Cori A. EpiEstim: estimate time varying reproduction numbers from epidemic curves (CRAN); 2021. Available from: https://cran.r-project.org/web/packages/EpiEstim/index.html.
5. Parag KV. Improved estimation of time-varying reproduction numbers at low case incidence and between epidemic waves. PLoS Computational Biology. 2021;17(9):1–23. doi: 10.1371/journal.pcbi.1009347
6. Abbott S, Hellewell J, Sherratt K, Gostic K, Hickson J, Badr HS, et al. EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters; 2020. Available from: https://zenodo.org/record/3957490#.YzmFeUxBxf8.
7. Azmon A, Faes C, Hens N. On the estimation of the reproduction number based on misreported epidemic data. Statistics in Medicine. 2014;33(7):1176–1192. doi: 10.1002/sim.6015
8. Gressani O, Faes C, Hens N. An approximate Bayesian approach for estimation of the reproduction number under misreported epidemic data, MedRxiv [Preprint]; 2021. Available from: 10.1101/2021.05.19.21257438.
9. Pircalabelu E. A spline-based time-varying reproduction number for modelling epidemiological outbreaks. LIDAM Discussion Paper ISBA; 2021. Available from: http://hdl.handle.net/2078.1/244926.
10. Fraser C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS ONE. 2007;2(8):e758. doi: 10.1371/journal.pone.0000758
11. Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences. 2007;274(1609):599–604. doi: 10.1098/rspb.2006.3754
12. Eddelbuettel D, François R, Allaire J, Ushey K, Kou Q, Russel N, et al. Rcpp: Seamless R and C++ integration. Journal of Statistical Software. 2011;40(8):1–18. doi: 10.18637/jss.v040.i08
13. Anscombe FJ. Sampling theory of the Negative Binomial and logarithmic series distributions. Biometrika. 1950;37(3/4):358–382. doi: 10.2307/2332388
14. Piegorsch WW. Maximum likelihood estimation for the Negative Binomial dispersion parameter. Biometrics. 1990;46(3):863–867. doi: 10.2307/2532104
15. Lloyd-Smith JO. Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases. PLoS ONE. 2007;2(2):e180. doi: 10.1371/journal.pone.0000180
16. Imai C, Armstrong B, Chalabi Z, Mangtani P, Hashizume M. Time series regression model for infectious disease and weather. Environmental Research. 2015;142:319–327. doi: 10.1016/j.envres.2015.06.040
17. Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996;11(2):89–121. doi: 10.1214/ss/1038425655
18. Frasso G, Lambert P. Bayesian inference in an extended SEIR model with nonparametric disease transmission rate: an application to the Ebola epidemic in Sierra Leone. Biostatistics. 2016;17(4):779–792. doi: 10.1093/biostatistics/kxw027
19. Perperoglou A, Sauerbrei W, Abrahamowicz M, Schmid M. A review of spline function procedures in R. BMC Medical Research Methodology. 2019;19(46):1–16. doi: 10.1186/s12874-019-0666-3
20. Eilers PHC, Marx BD. Practical Smoothing: The Joys of P-splines. Cambridge University Press; 2021.
21. Lang S, Brezger A. Bayesian P-splines. Journal of Computational and Graphical Statistics. 2004;13(1):183–212. doi: 10.1198/1061860043010
22. Jullion A, Lambert P. Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models. Computational Statistics & Data Analysis. 2007;51(5):2542–2558. doi: 10.1016/j.csda.2006.09.027
23. Gressani O, Lambert P. Laplace approximations for fast Bayesian inference in generalized additive models based on P-splines. Computational Statistics & Data Analysis. 2021;154:107088. doi: 10.1016/j.csda.2020.107088
24. Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using Integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2009;71(2):319–392. doi: 10.1111/j.1467-9868.2008.00700.x
25. Tierney L, Kadane JB. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association. 1986;81(393):82–86. doi: 10.1080/01621459.1986.10478240
26. Chib S, Greenberg E. Markov Chain Monte Carlo Simulation Methods in Econometrics. Econometric Theory. 1996;12(3):409–431. doi: 10.1017/S0266466600006794
27. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1984;PAMI-6(6):721–741. doi: 10.1109/TPAMI.1984.4767596
28. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics. 1953;21(6):1087–1092. doi: 10.1063/1.1699114
29. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57(1):97–109. doi: 10.1093/biomet/57.1.97
30. Roberts GO, Tweedie RL. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli. 1996;2(4):341–363. doi: 10.2307/3318418
31. Roberts GO, Rosenthal JS. Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 1998;60(1):255–268. doi: 10.1111/1467-9868.00123
32. Roberts GO, Rosenthal JS. Optimal Scaling for Various Metropolis-Hastings Algorithms. Statistical Science. 2001;16(4):351–367. doi: 10.1214/ss/1015346320
33. Lambert P, Eilers PHC. Bayesian density estimation from grouped continuous data. Computational Statistics & Data Analysis. 2009;53(4):1388–1399. doi: 10.1016/j.csda.2008.11.022
34. Lambert P, Eilers PHC. Bayesian proportional hazards model with time-varying regression coefficients: A penalized Poisson regression approach. Statistics in Medicine. 2005;24(24):3977–3989. doi: 10.1002/sim.2396
35. Gressani O, Faes C, Hens N. Laplacian-P-splines for Bayesian inference in the mixture cure model. Statistics in Medicine. 2022;41(14):2602–2626. doi: 10.1002/sim.9373
36. Haario H, Saksman E, Tamminen J. An Adaptive Metropolis Algorithm. Bernoulli. 2001;7(2):223–242. doi: 10.2307/3318737
37. Atchadé YF, Rosenthal JS. On adaptive Markov chain Monte Carlo algorithms. Bernoulli. 2005;11(5):815–828.
38. Ferguson NM, Cummings DA, Cauchemez S, Fraser C, Riley S, Meeyai A, et al. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature. 2005;437(7056):209–214. doi: 10.1038/nature04017
39. Lipsitch M, Cohen T, Cooper B, Robins JM, Ma S, James L, et al. Transmission dynamics and control of severe acute respiratory syndrome. Science. 2003;300(5627):1966–1970. doi: 10.1126/science.1086616
40. Cauchemez S, Nouvellet P, Cori A, Jombart T, Garske T, Clapham H, et al. Unraveling the drivers of MERS-CoV transmission. Proceedings of the National Academy of Sciences. 2016;113(32):9081–9086. doi: 10.1073/pnas.1519235113
41. Nash RK, Nouvellet P, Cori A. Real-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges. PLoS Digital Health. 2022;1(6):e0000052. doi: 10.1371/journal.pdig.0000052
42. Geweke J. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Bayesian Statistics. 1992;4:169–193.
43. Guidotti E, Ardia D. COVID-19 Data Hub. Journal of Open Source Software. 2020;5(51):2376. doi: 10.21105/joss.02376
44. Kremer C, Braeye T, Proesmans K, André E, Torneri A, Hens N. Observed serial intervals of SARS-CoV-2 for the Omicron and Delta variants in Belgium based on contact tracing data, 19 November to 31 December 2021, MedRxiv [Preprint]; 2022.
45. Britton T, Scalia Tomba G. Estimation in emerging epidemics: biases and remedies. Journal of the Royal Society Interface. 2019;16(150). doi: 10.1098/rsif.2018.0670

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010618.r001

Decision Letter 0

Tom Britton, Claudio José Struchiner

10 Mar 2022

Dear Dr Gressani,

Thank you very much for submitting your manuscript "EpiLPS: a fast and flexible Bayesian tool for near real-time estimation of the time-varying reproduction number" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Claudio José Struchiner, M.D., Sc.D.

Associate Editor

PLOS Computational Biology

Tom Britton

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: In this study, the authors propose a new method and R package to estimate the time-varying epidemic reproduction number. They use a simulation study to evaluate the performance of their method and compare it to an existing approach (EpiEstim), and present estimated reproduction numbers for historical SARS-CoV-1 and influenza epidemics as well as for SARS-CoV-2 in four European countries. I think the approach is interesting, particularly because it seems to allow for superspreading, which few currently available methods do, but I have a number of queries that I would like to see addressed. A caveat: I haven’t had time to look into all the mathematical formulas in detail, but those I have looked into in depth were all correct.

------------------------

Major comments

------------------------

Added value of the work

------------------------------------------------

I think in the introduction and throughout the manuscript you could make it clearer from early on 1) why splines are useful (what are they trying to achieve in plain language compared to non spline based approaches) and 2) what is the unique aspect that your approach tackles (I think the use of a negative binomial likelihood, which I think you should emphasize more and again explain in plain English why it’s a useful feature, e.g. to model superspreading). I have to say I was very excited by the prospect of this paper precisely because I was hoping it would address gaps in the literature on estimating the reproduction number in the presence of super-spreading; but this doesn’t seem to be specifically addressed in the end, particularly in the simulation study which used a Poisson offspring distribution (see further comments below). So in summary, could the authors more clearly explain what makes their method different / better than others and could they show in their simulation study examples illustrating this more clearly?

Simulation study

------------------------

I thought the simulation study could be hugely improved to better demonstrate the contexts in which the method performs well / better than others.

- First, why did you simulate from a Poisson and not a Negative binomial (see my comment above)? This seems very bizarre since I think this may be the main strength of your approach. I would suggest you expand your simulation study to demonstrate how well the method performs at various levels of superspreading / overdispersion. I would anticipate that benefits over for example EpiEstim may appear more clearly in the presence of overdispersion, which is not accounted for in most other approaches for estimation of R.

- Second, how did you choose the serial interval for each scenario? It seems weird to have different (rather ad hoc) ones for each scenario as it’s then hard to tell whether differences between scenarios come from the Rt profile or the change in SI. I would suggest using a couple of SI (maybe matching diseases e.g. flu and ebola for a short and longer SI?) and considering both SIs for each scenario.

- Third, how did you choose the values of K = 20 and second order penalty for the simulation study? Please consider alternative values (as you do with the window length for EpiEstim); I anticipate results may be quite sensitive to this choice.

- Finally, for EpiEstim results, how did you interpret these? i.e. for the 7 day time window, what “true” value of Rt (used for simulation) did you compare the 7 day average to? Rt at the window midpoint? The start? The end? It is very suspicious that in all scenarios (Figures 1-4, but maybe clearest in Figure 3) the EpiEstim estimates seem to be “delayed”, as if you are not comparing like to like. And what priors did you use for EpiEstim? Also, what results would you obtain if you used 1 day windows for EpiEstim?

Potential limitations of the work

------------------------------------------------

- The authors claim that “alleviating the window size assumption (as in EpiLPS) can be of interest” but they introduce a number of other parameters, including K, the number of spline knots, the choice of which is I think as arbitrary as that of the time window in EpiEstim albeit less easy to interpret. The authors should present sensitivity of their results to these parameters and add some discussion.

- Ability to detect sudden changes in Rt over time: “We assume that μ(t) evolves smoothly over the time course of the epidemic”. Does this mean the method would not be very appropriate to estimate rapid changes in R(t), for example the reduction in R associated with a lockdown or a similar sudden control measure? This is what your simulation study suggests, and should be added to the discussion, along with more discussion on the contexts in which the authors expect their method to be particularly useful.

- In figure 1 your approach seems able to detect a drop in Rt before it actually happens; how is that possible? This suggests to me that estimates of Rt will be informed by data from later (after t), which would potentially preclude real time use. Could the authors clarify and discuss this.

- What is the effect of the different priors and hyperpriors?

- A lot of the methodology is based on the Laplace approximation and I would have liked to see some evidence and/or discussion about the extent to which this approximation is going to be valid in the context of the renewal equation.

- Generally more discussion should be added on the caveats / subjective choices, and which types of epidemic contexts do we think this method will perform well / better than other methods.

Methodological clarifications

------------------------------------------------

In section 2.4 I don’t think rho (the overdispersion parameter) appears at all. Could you please clarify the dependency on rho in this section?

Section 2.4.1, can you state more clearly what phi1, etc are; I assume phi1 = P(SI = 1)? Please define k more clearly. You should also highlight an assumption which is that p(SI <= 0) = 0 = p(SI > k). Please also clarify in equation (9) whether y_t is the incidence of infections or symptomatic infections. In equation 9, please clearly state the values of s over which the sum is taken.

Does the method lead to estimating the overdispersion parameter rho? If so, what estimates did you get for SARS-CoV-1 and 2 and flu and how do they compare with estimates from the literature obtained from analysing contact tracing data for example?

Related to my point about sensitivity to the number of spline knots K, I wonder how robust rho estimation would be to different choices for K, as I suspect it is hard to disentangle variations of Rt in time (the magnitude of which will be driven by K) from overdispersion (rho). I am not entirely sure these two are identifiable and would have hoped the authors show more clearly if this is true or not.

Comparison between your two methods and with EpiEstim

------------------------------------------------------------------------

It is not clear how or when one should use LPSMALA versus LPSMAP; I would recommend that this is discussed further and that results of the two approaches are compared more systematically in figures. For example why did you use LPSMALA for SARS and Flu but LPSMAP for covid?

In Figure 5: how do these compare to the EpiEstim estimates? Cori et al. used 5 datasets but you only looked at 2 here, was there a specific reason? Could you include them all?

Implementation

------------------------

Could the authors give some indication of computing time please.

Also, it is great that the R package is available on CRAN, but a shame that a vignette is not available. I would suggest that the authors use the SARS and flu examples for example as a basis for a vignette which would allow users to get started (I did notice some code is available to reproduce the results but the readme is not very informative so I think a vignette would be much better; I was using the link from the data availability section). The example in the epilps function runs well and fast (even with the LPSMALA method which is encouraging but I haven’t checked convergence was achieved).

------------------------

Minor comments

------------------------

Intro: I would avoid talking about “severity” of the outbreak in the context of the reproduction number as this may be confusing since R only measures transmissibility, not severity.

Equation 4: please clarify the notations, is L(theta, rho, D) the likelihood?

Do you really need to have both equations 12 and 13?

Reviewer #2: This paper presents a variant of the by now well known Cori et al (2013, 2021) approach to R_t estimation in an infectious disease outbreak based on smoothing the incidence count time series with splines.

The approach has the same basic problems as similar approaches to R_t, namely not accounting for cases not included in the time series of reported cases, delays in reporting, mixtures of differently infective strains, different incidences in different population groups.

Furthermore, a fixed serial time distribution is assumed (note that not all notified cases are due to symptom appearance and that recent research points to generation/serial times changing during an epidemic due to interventions and public behaviour...).

All this notwithstanding, R_t has turned out to be an important summary of population disease progress, used by decision makers, media and the public.

The paper is essentially concentrated on the computational part of R_t estimation as noted by the authors in the paragraph spanning pages 4-5, not on innovation of the concept of R_t itself. In fact, 14 pages out of 25 are essentially devoted to mathematical/statistical details of the used algorithms. Further 4 pages show the results of simulations of the proposed method compared to the EpiEstim method and finally there are 2 pages showing the results of the proposed new method applied to some real data.

Thus, the paper advocates use of the EpiLPS method as being slightly better than EpiEstim based on some simulations and with the theoretical advantage of not having to choose widths of moving windows in data. Perhaps, for a journal like PLoS Comp Biol, a much shorter paper with all the maths moved to an electronic appendix, would have been sufficient and quite adequate.

Some minor comments

On several occasions, R_t is called a "metric" in conflict with the usual mathematical meaning of the word.

In section 2.1, 3rd line, "infections" are called "contaminations", which may not be the most suitable synonym...

The four simulation scenarios are reasonable, but why are the serial interval distributions different?

Section 3.1 shows the proposed method seemingly having better adaptability to the "true" R_t but the comparison is with EpiEstim used with two fixed choices of window. Which choices of window widths have actually been advocated by EpiEstim users?

Reviewer #3: This is a very technical paper describing a method for estimating the reproduction ratio from epidemic data. The method uses well-known approaches (MCMC, Laplace approximation) that are combined and provide an improvement in computation time.

1 - The method cannot be presented as "real-time". "Real time" should be reserved for methods that really attempt to estimate "in real time", that is, methods that estimate R(t) from data up to time t. Here, as is common in such presentations, the authors look at a global estimate obtained from the whole data. The R(t) estimate obtained from the renewal equation is actually informed by the global spline smoothing.

2 - In the end, it is not very clear whether the method should be preferred to the EpiEstim method. It is mentioned that the coverage is better, but from what I read on EpiEstim they are using a window and estimating an average R(t) over the window, with a corresponding confidence interval. So in the end, I'm not convinced that the comparison as presented here is fair, because the confidence interval is for R(t) in your approach and for an average R(t) in the other case.

3 - There is one strange feature in the estimate, seen in fig 2 for example: the R(t) starts to decrease before the date of intervention. Do I have to tell you that this feature is always unwelcome? Indeed, this is exactly what will be used to say that "evidently" the intervention was not responsible for the decrease in R(t) because "it started before the intervention". How could the prior on the smoothing parameter be changed automatically to avoid this?

4 - I suppose that running the method "in real time" during the course of an epidemic would lead to changes in the estimates over the whole period, because there would be changes in the smoothing parameter. I would find it rather difficult to recommend the method if the "past" estimates changed as time passes.

5 - Are the confidence intervals "point-wise" or "trajectory-wise"?

6 - The method boasts of using the NB formulation, but it is not shown how this makes the method more relevant. For example, was there evidence in the examples that this was required? What was the difference between assuming NB or Poisson?

7 - I'm surprised that the daily pattern is not considered at all. In most countries, there was a strong such pattern. How come the spline does not try to adapt to this pattern, since the prior seems to be favoring wiggliness?

8 - The model using the NB does not really account for individual variation in transmission, where one would assume that the offspring of each case is Xi = NB(R(t), rho). In this case, a more sensible description is i(t) = Sum Xi = NB(R(t) i(t-1), rho i(t-1)), assuming here for simplicity a generation time of one time unit. That is, the dispersion parameter increases with incidence. It would be of interest to see whether you can accommodate this description here, since it is much more relevant.
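The closure property invoked in point 8 (a sum of independent NB offspring counts is again NB, with mean and dispersion both scaled by the number of cases) can be checked numerically. The following is a minimal sketch, assuming the mean/dispersion parameterisation in which NB(mean, dispersion) has variance mean + mean²/dispersion; the numerical values and the helper names `nb_pmf` and `convolve` are illustrative assumptions, not taken from the manuscript or the EpiLPS package.

```python
import math


def nb_pmf(k, mean, dispersion):
    """P(X = k) for a negative binomial with the given mean and dispersion
    (variance = mean + mean**2 / dispersion)."""
    log_p = (math.lgamma(k + dispersion) - math.lgamma(dispersion)
             - math.lgamma(k + 1)
             + dispersion * math.log(dispersion / (dispersion + mean))
             + k * math.log(mean / (dispersion + mean)))
    return math.exp(log_p)


def convolve(p, q):
    """PMF of the sum of two independent counts, kept on 0..len(p)-1."""
    out = [0.0] * len(p)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            if i + j < len(out):
                out[i + j] += p_i * q_j
    return out


# Illustrative values (assumptions for this sketch only).
R, rho, n_cases, support = 1.5, 0.8, 4, 60

# Offspring distribution of a single case: NB(R, rho).
offspring = [nb_pmf(k, R, rho) for k in range(support)]

# Distribution of the total offspring of n_cases independent cases.
total = offspring
for _ in range(n_cases - 1):
    total = convolve(total, offspring)

# Closed form suggested in point 8: NB(R * n, rho * n).
closed_form = [nb_pmf(k, R * n_cases, rho * n_cases) for k in range(support)]

max_abs_diff = max(abs(a - b) for a, b in zip(total, closed_form))
print(f"largest pmf discrepancy: {max_abs_diff:.2e}")
```

Under this description the variance of the total is n R + n R²/rho, so the variance-to-mean ratio stays at 1 + R/rho regardless of incidence, whereas a fixed dispersion rho applied to the mean n R gives a ratio of 1 + n R/rho that grows with incidence.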

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No: although this is public data I am not sure it is available with the paper or in the package

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. 2022 Oct 10;18(10):e1010618. doi: 10.1371/journal.pcbi.1010618.r002

Author response to Decision Letter 0

28 May 2022

Attachment

Submitted filename: Answers_to_reviewers.pdf

(PDF, 321 KB)

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010618.r003

Decision Letter 1

Tom Britton, Claudio José Struchiner

29 Jun 2022

Dear Dr Gressani,

Thank you very much for submitting your manuscript "EpiLPS: a fast and flexible Bayesian tool for estimation of the time-varying reproduction number" for consideration at PLOS Computational Biology.

When you are ready to resubmit, please upload the following:

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Sincerely,

Claudio José Struchiner, M.D., Sc.D.

Associate Editor

PLOS Computational Biology

Tom Britton

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I appreciate the changes made by the authors, which I think have improved the manuscript, but I still have critical concerns about the work presented.

My three main concerns are as follows.

First, much of the paper is focused on a comparison with EpiEstim, but I think the comparison is not made in a fair way. Indeed, Rt is compared to the Rt EpiEstim would estimate on the time window ]t-w;t], where w is the width of the estimation window. The EpiEstim authors do explain that this is the only way to have a fair evaluation of the real-time ability to recover Rt, but it does mean the estimates are “lagged”. If you are not tied by the real-time assumption, then a much better estimate of Rt is that obtained from EpiEstim on the time interval centered around t: [t - w/2; t + w/2]. This does require data after t but will no longer result in a lag. Since the method presented here is not applicable in real-time, and uses data after t, I think a much fairer baseline comparator for this new method would be this non-lagged, non-real-time version of EpiEstim. I don’t think the article is publishable without showing this, as I believe the performance of that version of EpiEstim will be much better. Hopefully the new methods developed by the authors will be even better (and looking at the figure results, I think they may well be), but this needs to be demonstrated. A related point regarding EpiEstim is that one of the apparent added values of the new method is that it can deal with overdispersion, unlike EpiEstim; however, I could not see an obvious improvement of the method’s performance over EpiEstim’s in scenarios with more overdispersion. I suggest the authors comment on this. I would also like to have more details on how the performance indicators in table 2 are calculated: are they averaged across the whole epidemic? I ask because it seems from the figures as if EpiEstim has, for example, wider CrI in the early epidemic periods but quite narrow CrI late in the epidemic.
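The lag described in this comment can be illustrated with a small sketch. It is not EpiEstim's code: it assumes the conjugate Gamma posterior of a Poisson renewal model with R held constant over a window, in the style of Cori et al. (2013), and the prior values (shape 1, scale 5), the serial interval weights, the toy incidence series and all function names are assumptions made for illustration. The incidence is generated without noise so that the comparison is deterministic.

```python
def total_infectiousness(incidence, serial_interval):
    """Lambda_t = sum_k I_{t-k} * w_k, where serial_interval[k-1] = w_k."""
    lam = []
    for t in range(len(incidence)):
        lam.append(sum(incidence[t - k] * w_k
                       for k, w_k in enumerate(serial_interval, start=1)
                       if t - k >= 0))
    return lam


def window_posterior_mean(incidence, lam, start, end, shape=1.0, scale=5.0):
    """Posterior mean of R, assumed constant on days start..end (inclusive),
    under a Gamma(shape, scale) prior and a Poisson renewal likelihood."""
    post_shape = shape + sum(incidence[start:end + 1])
    post_rate = 1.0 / scale + sum(lam[start:end + 1])
    return post_shape / post_rate


def estimate(incidence, serial_interval, width, centred):
    """Window estimates reported at t, using either the trailing window
    t-width+1..t or the centred window t-width//2..t+width//2."""
    lam = total_infectiousness(incidence, serial_interval)
    half = width // 2
    estimates = {}
    for t in range(len(incidence)):
        if centred:
            start, end = t - half, t + half
        else:
            start, end = t - width + 1, t
        # Skip day 0 (no infectiousness history) and windows past the data.
        if start >= 1 and end < len(incidence):
            estimates[t] = window_posterior_mean(incidence, lam, start, end)
    return estimates


def first_below_one(estimates):
    """First day on which the estimated R drops below 1."""
    return min(t for t, r in estimates.items() if r < 1.0)


# Toy epidemic: true R drops from 2 to 0.5 on day 20 (noise-free incidence).
serial_interval = [0.2, 0.5, 0.3]
true_r = [2.0] * 20 + [0.5] * 20
incidence = [10.0]
for t in range(1, len(true_r)):
    lam_t = sum(incidence[t - k] * w_k
                for k, w_k in enumerate(serial_interval, start=1)
                if t - k >= 0)
    incidence.append(true_r[t] * lam_t)

width = 7
trailing_est = estimate(incidence, serial_interval, width, centred=False)
centred_est = estimate(incidence, serial_interval, width, centred=True)
lag = first_below_one(trailing_est) - first_below_one(centred_est)
print(f"trailing window crosses R = 1 {lag} days after the centred window")
```

In this sketch the trailing estimate reported at day t is computed from exactly the same days as the centred estimate reported at day t - 3, so the two curves are identical up to a shift of half a window; the only difference is whether data after t are allowed.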

My second concern is that this is not applicable in real-time. Although the authors acknowledge it, they don’t really show the behaviour the method would have if applied in real-time. Most of the recent applications of EpiEstim have been to track the transmissibility of SARS-CoV-2 in real-time, so the authors should 1) show how their method performs in real-time in an extended simulation study or, if not, state more clearly that it SHOULD not be applied in real time, and 2) if it is not aimed at real-time use, more clearly highlight the contexts in which this would be a good method to use (e.g. for retrospective analyses of an epidemic).

At the moment without such practical application guidelines, this feels like a very theoretical exercise, and I don’t know to what extent it is useful.

My third worry is that although the theoretical foundations of this novel approach seem valid, it may feel a bit like a black box to the user. For example, the scaling of the covariance matrix is briefly mentioned in the main text but it’s unclear to me whether it’s up to the user to do this, and if so how. Without a vignette with more instructions on how to use this tool I wonder whether one could really use it in practice. Similarly, although sensitivity analyses are now presented to demonstrate the impact of changing the number of splines, I feel like the user is not given an easy rule of thumb that they could follow re what value they should choose.

A final comment is on the evaluation of the overdispersion parameter. Even though it is considered a nuisance parameter, and I understand it is not the primary objective of the method to estimate it, I would have liked to see how well it is re-estimated in the simulation study.

Overall, as currently written, I think it is unclear in what context this method may be useful, and whether it would truly outperform EpiEstim if a fair comparison was undertaken. I therefore encourage the authors to further revise their manuscript to address these issues.

I also add some more minor comments below

P21 “calibrating different values of rho” should be “using different values of rho”?

End of p22: I struggle to understand how coverage can be so similar between two methods whilst one has much wider credible intervals; can you explain? Is this because one is really biased?

End of P24: you should also talk about the “price to pay” with the new method, namely non-applicability in real time.

Figure 3 actually suggests more precise estimates of EpiEstim in that scenario?

Last sentence of section 3.1 --> I think this is really the main advantage of the new method and should be highlighted a bit more in the intro and conclusion; but the non-real-time nature of the approach should also be more emphasised

Figure 6: it would be useful to see the underlying data (hospitalisations) and to add some discussion of the implications of using a proxy for infections (with alpha being more severe than previous variants, R is going to be overestimated at the transition between the two when using hospitalisations).

First sentence of the discussion, you say “during epidemic outbreaks” but I think that is a push since the method won’t work in real-time. You should clarify what contexts and analyses this method will be useful for.

P31 “EpiLPS and EpiEstim both use data from the past to estimate the instantaneous reproduction number” --> I think that is incorrect, i.e. EpiLPS also uses data from the future. If you want to demonstrate otherwise you need to present results of a real-time analysis.

P31 “The method of Cori looks back in time only as far as the width of the chosen time window” --> That is also incorrect, it uses data from before that time window as well (it considers individuals infected in that time window but potential infectors from before as well as during that window).

P31 “over the entire domain of the epidemic curve”, i.e. including the future, I think this is important to highlight.

P31 “there is however no free lunch” --> you should also highlight limitations of EpiLPS here, namely the non-real-time nature, which presumably greatly limits its applicability.

P32, sentence starting “on that side” --> again you haven’t shown that EpiLPS will work if you are trying to estimate Rt in real time which is a major limitation so please clarify here that it is an accurate method to capture the PAST evolution of Rt.

P32 when recommending to use LPSMALA for shorter epidemics, perhaps highlight that it performs better than LPSMAP?

P32 “EpiLPS might be more accurate than EpiEstim in presence of overdispersed epidemiological data” --> what about in the absence of overdispersion? Again a detailed guide as to when one is best to use compared to the other would be really useful

Figure 11 supp2 and next ones: could you show how the CrI are affected by the changes in the a_delta and b_delta parameters? This is important as it’s also one of the criteria you use to measure performance. Also, in some of these figures the bias is actually quite a lot larger with low values of a_delta and b_delta, which you should comment on in the main text.

Table 4 supp2 please explicitly say which assumptions (e.g. time window of EpiEstim) you are using.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. 2022 Oct 10;18(10):e1010618. doi: 10.1371/journal.pcbi.1010618.r004

Author response to Decision Letter 1

29 Jul 2022

Attachment

Submitted filename: Answers_to_reviewers_v2.pdf

(PDF, 267.6 KB)

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010618.r005

Decision Letter 2

Tom Britton, Claudio José Struchiner

14 Sep 2022

Dear Dr Gressani,

Thank you very much for submitting your manuscript "EpiLPS: a fast and flexible Bayesian tool for estimation of the time-varying reproduction number" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Claudio José Struchiner, M.D., Sc.D.

Academic Editor

PLOS Computational Biology

Tom Britton

Section Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I thank the authors for taking the time to take on board my main comment from the last round of review. Although I am happy with these new results, I am afraid I still have some comments, as detailed below.

Why present the new results in supplementary material? I think these are the main results and should be shown in the main text, whilst the results at the end of the interval currently in section 3.2 should be transferred to supplementary material.

The sensitivity analyses should also be altered to use this new window definition.

The main text results show that the best performance of EpiEstim in terms of nominal coverage is often obtained for a 1d window, therefore the authors should add this to their new results for completeness (both tables and figures in the current supplement 3), and so the reader can fully assess the comparison between EpiEstim with the 3 time windows and EpiLPS.

Adding figures for the new sensitivity analyses to the GitHub repository, with no explanation and no legend, is in my opinion not sufficient. These should be included as supplementary material and appropriately documented there.

Minor:

Page 11 what do you mean by “(in)directly”?

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. 2022 Oct 10;18(10):e1010618. doi: 10.1371/journal.pcbi.1010618.r006

Author response to Decision Letter 2

21 Sep 2022

Attachment

Submitted filename: Answers_to_reviewers_Minrev.pdf

(PDF, 221.1 KB)

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010618.r007

Decision Letter 3

Tom Britton, Claudio José Struchiner

30 Sep 2022

Dear Dr Gressani,

We are pleased to inform you that your manuscript 'EpiLPS: a fast and flexible Bayesian tool for estimation of the time-varying reproduction number' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Claudio José Struchiner, M.D., Sc.D.

Academic Editor

PLOS Computational Biology

Tom Britton

Section Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010618.r008

Acceptance letter

Tom Britton, Claudio José Struchiner

4 Oct 2022

PCOMPBIOL-D-22-00102R3

EpiLPS: a fast and flexible Bayesian tool for estimation of the time-varying reproduction number

Dear Dr Gressani,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

Associated Data

Supplementary Materials

S1 Appendix. Details for the LPSMALA algorithm.

Analytical gradient for the Langevin-Hastings proposal and analytical version of the ratio of proposal distributions in the LPSMALA algorithm.

(PDF, 193 KB)

S2 Appendix. Simulation results and computational time.

Complete simulation results (for EpiLPS and EpiEstim) when EpiEstim reports $R_{t}$ at the window boundary, sensitivity analyses and computational time of EpiLPS.

(PDF, 1.7 MB)

S3 Appendix. Further simulation and sensitivity results.

Complete simulation results (for EpiLPS and EpiEstim) when EpiEstim reports $R_{t}$ at the window midpoint and additional sensitivity analyses.

(PDF, 1.2 MB)

Data Availability Statement

1. White LF, Moser CB, Thompson RN, Pagano M. Statistical estimation of the reproductive number from case notification data. American Journal of Epidemiology. 2021;190(4):611–620. doi: 10.1093/aje/kwaa211

2. Gostic KM, McGough L, Baskerville EB, Abbott S, Joshi K, Tedijanto C, et al. Practical considerations for measuring the effective reproductive number, Rt. PLoS Computational Biology. 2020;16(12):1–21. doi: 10.1371/journal.pcbi.1008409

3. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology. 2013;178(9):1505–1512. doi: 10.1093/aje/kwt133

4. Cori A. EpiEstim: estimate time varying reproduction numbers from epidemic curves (CRAN); 2021. Available from: https://cran.r-project.org/web/packages/EpiEstim/index.html.

5. Parag KV. Improved estimation of time-varying reproduction numbers at low case incidence and between epidemic waves. PLoS Computational Biology. 2021;17(9):1–23. doi: 10.1371/journal.pcbi.1009347

6. Abbott S, Hellewell J, Sherratt K, Gostic K, Hickson J, Badr HS, et al. EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters; 2020. Available from: https://zenodo.org/record/3957490#.YzmFeUxBxf8.

7. Azmon A, Faes C, Hens N. On the estimation of the reproduction number based on misreported epidemic data. Statistics in Medicine. 2014;33(7):1176–1192. doi: 10.1002/sim.6015

8. Gressani O, Faes C, Hens N. An approximate Bayesian approach for estimation of the reproduction number under misreported epidemic data, MedRxiv [Preprint]; 2021. Available from: 10.1101/2021.05.19.21257438.

9. Pircalabelu E. A spline-based time-varying reproduction number for modelling epidemiological outbreaks. LIDAM Discussion Paper ISBA; 2021. Available from: http://hdl.handle.net/2078.1/244926.

10. Fraser C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS ONE. 2007;2(8):e758. doi: 10.1371/journal.pone.0000758

11. Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences. 2007;274(1609):599–604. doi: 10.1098/rspb.2006.3754

12. Eddelbuettel D, François R, Allaire J, Ushey K, Kou Q, Russel N, et al. Rcpp: Seamless R and C++ integration. Journal of Statistical Software. 2011;40(8):1–18. doi: 10.18637/jss.v040.i08

13. Anscombe FJ. Sampling theory of the Negative Binomial and logarithmic series distributions. Biometrika. 1950;37(3/4):358–382. doi: 10.2307/2332388

14. Piegorsch WW. Maximum likelihood estimation for the Negative Binomial dispersion parameter. Biometrics. 1990;46(3):863–867. doi: 10.2307/2532104

15. Lloyd-Smith JO. Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases. PLoS ONE. 2007;2(2):e180. doi: 10.1371/journal.pone.0000180

16. Imai C, Armstrong B, Chalabi Z, Mangtani P, Hashizume M. Time series regression model for infectious disease and weather. Environmental Research. 2015;142:319–327. doi: 10.1016/j.envres.2015.06.040

17. Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996;11(2):89–121. doi: 10.1214/ss/1038425655

18. Frasso G, Lambert P. Bayesian inference in an extended SEIR model with nonparametric disease transmission rate: an application to the Ebola epidemic in Sierra Leone. Biostatistics. 2016;17(4):779–792. doi: 10.1093/biostatistics/kxw027

19. Perperoglou A, Sauerbrei W, Abrahamowicz M, Schmid M. A review of spline function procedures in R. BMC Medical Research Methodology. 2019;19(46):1–16. doi: 10.1186/s12874-019-0666-3

20. Eilers PHC, Marx BD. Practical Smoothing: The Joys of P-splines. Cambridge University Press; 2021.

21. Lang S, Brezger A. Bayesian P-splines. Journal of Computational and Graphical Statistics. 2004;13(1):183–212. doi: 10.1198/1061860043010

22. Jullion A, Lambert P. Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models. Computational Statistics & Data Analysis. 2007;51(5):2542–2558. doi: 10.1016/j.csda.2006.09.027

23. Gressani O, Lambert P. Laplace approximations for fast Bayesian inference in generalized additive models based on P-splines. Computational Statistics & Data Analysis. 2021;154:107088. doi: 10.1016/j.csda.2020.107088

24. Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using Integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2009;71(2):319–392. doi: 10.1111/j.1467-9868.2008.00700.x

25. Tierney L, Kadane JB. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association. 1986;81(393):82–86. doi: 10.1080/01621459.1986.10478240

26. Chib S, Greenberg E. Markov Chain Monte Carlo Simulation Methods in Econometrics. Econometric Theory. 1996;12(3):409–431. doi: 10.1017/S0266466600006794

27. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1984;PAMI-6(6):721–741. doi: 10.1109/TPAMI.1984.4767596

28. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics. 1953;21(6):1087–1092. doi: 10.1063/1.1699114

29. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57(1):97–109. doi: 10.1093/biomet/57.1.97

30. Roberts GO, Tweedie RL. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli. 1996;2(4):341–363. doi: 10.2307/3318418

31. Roberts GO, Rosenthal JS. Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 1998;60(1):255–268. doi: 10.1111/1467-9868.00123

32. Roberts GO, Rosenthal JS. Optimal Scaling for Various Metropolis-Hastings Algorithms. Statistical Science. 2001;16(4):351–367. doi: 10.1214/ss/1015346320

33. Lambert P, Eilers PHC. Bayesian density estimation from grouped continuous data. Computational Statistics & Data Analysis. 2009;53(4):1388–1399. doi: 10.1016/j.csda.2008.11.022

34. Lambert P, Eilers PHC. Bayesian proportional hazards model with time-varying regression coefficients: A penalized Poisson regression approach. Statistics in Medicine. 2005;24(24):3977–3989. doi: 10.1002/sim.2396

[pcbi.1010618.ref035] 35. Gressani O, Faes C, Hens N. Laplacian-P-splines for Bayesian inference in the mixture cure model. Statistics in Medicine. 2022;41(14):2602–2626. doi: 10.1002/sim.9373 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010618.ref036] 36. Haario H, Saksman E, Tamminen J. An Adaptive Metropolis Algorithm. Bernoulli. 2001;7(2):223–242. doi: 10.2307/3318737 [DOI] [Google Scholar]

[pcbi.1010618.ref037] 37. Atchadé YF, Rosenthal JS. On adaptive Markov chain Monte Carlo algorithms. Bernoulli. 2005;11(5):815–828. [Google Scholar]

[pcbi.1010618.ref038] 38. Ferguson NM, Cummings DA, Cauchemez S, Fraser C, Riley S, Meeyai A, et al. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature. 2005;437(7056):209–214. doi: 10.1038/nature04017 [DOI] [PubMed] [Google Scholar]

[pcbi.1010618.ref039] 39. Lipsitch M, Cohen T, Cooper B, Robins JM, Ma S, James L, et al. Transmission dynamics and control of severe acute respiratory syndrome. Science. 2003;300(5627):1966–1970. doi: 10.1126/science.1086616 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010618.ref040] 40. Cauchemez S, Nouvellet P, Cori A, Jombart T, Garske T, Clapham H, et al. Unraveling the drivers of MERS-CoV transmission. Proceedings of the national academy of sciences. 2016;113(32):9081–9086. doi: 10.1073/pnas.1519235113 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010618.ref041] 41. Nash RK, Nouvellet P, Cori A. Real-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges. PloS Digital Health. 2022;1(6):e0000052. doi: 10.1371/journal.pdig.0000052 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010618.ref042] 42. Geweke J. Evaluating the accurating of sampling-based approaches to the calculation of posterior moments. Bayesian Statistics. 1992;4:169–193. [Google Scholar]

[pcbi.1010618.ref043] 43. Guidotti E, Ardia D. COVID-19 Data Hub. Journal of Open Source Software. 2020;5(51):2376. doi: 10.21105/joss.02376 [DOI] [Google Scholar]

[pcbi.1010618.ref044] 44. Kremer C, Braeye T, Proesmans K, André E, Torneri A, Hens N. Observed serial intervals of SARS-CoV-2 for the Omicron and Delta variants in Belgium based on contact tracing data, 19 November to 31 December 2021, MedRxiv [Preprint]; 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1010618.ref045] 45. Britton T, Scalia Tomba G. Estimation in emerging epidemics: biases and remedies. Journal of the Royal Society Interface. 2019;16(150). doi: 10.1098/rsif.2018.0670 [DOI] [PMC free article] [PubMed] [Google Scholar]
