Estimating the basic reproduction number at the beginning of an outbreak

Sawitree Boonpatcharanon; Jane M Heffernan; Hanna Jankowski

doi:10.1371/journal.pone.0269306

2022 Jun 17;17(6):e0269306. doi: 10.1371/journal.pone.0269306


Editor: Inés P Mariño

PMCID: PMC9205483 PMID: 35714080

Abstract

We compare several popular methods of estimating the basic reproduction number, R₀, focusing on the early stages of an epidemic, and assuming weekly reports of new infecteds. We study the situation when data is generated by one of three standard epidemiological compartmental models: SIR, SEIR, and SEAIR; and examine the sensitivity of the estimators to the model structure. As some methods are developed assuming specific epidemiological models, our work adds a study of their performance in both well-specified (data generating model and method model are the same) and misspecified (data generating model and method model differ) settings. We also study R₀ estimation using Canadian COVID-19 case report data. In this study we focus on examples of influenza and COVID-19, though the general approach is easily extendable to other scenarios. Our simulation study reveals that some estimation methods tend to work better than others; however, no single best method was clearly detected. In the discussion, we provide recommendations for practitioners based on our results.

Introduction

The basic reproduction number, R₀, (also called the basic reproductive ratio) is defined as the expected number of new infections produced by a single (typical/average) infectious individual, when introduced into a totally susceptible population. R₀ is used in epidemiological studies of infectious diseases to gauge how contagious/transmissible an infectious disease is: if R₀ < 1, the disease will die out, and if R₀ > 1 infection can increase in the population. It is also used to determine how effective vaccination or other disease mitigation strategies need to be in order to protect populations from infection.

At the outset of an infectious disease outbreak, an immediate goal is to determine R₀, so that public health and healthcare decision makers can be informed. For example, at the debut of the COVID-19 pandemic, reports of R₀ estimates were plentiful (see e.g. [1–6]). In the recent MERS-CoV, 2009 H1N1, and 2003 SARS epidemics, there were also numerous studies of R₀ globally (see [7–19] for a small snapshot).

There are many statistical and mathematical methods that can be used to estimate R₀ [20–29]. A main difficulty in R₀ estimation is that the methods often depend on data that is not available, or the methods suffer from collection, reporting, or other bias. Different estimators utilize different approaches to deal with these difficulties. Broadly speaking, estimators can be classified as real-time (requiring little computation time) and non-real-time (requiring more extensive computation). Real-time estimators typically rely on simple epidemic models (such as the Susceptible-Infectious-Recovered, a.k.a. SIR, compartmental modelling framework) and/or simplifications of models in an attempt to remove dependence on unobservables. Non-real-time methods generally handle unobservables via Bayesian or Monte Carlo approaches, at the cost of computing time. Often, real-time methods also assume some prior knowledge of other parameters, such as the serial interval (SI). It is therefore important to study the effects of misspecification of either the modelling framework or the input parameters on these estimators. For example, suppose an R₀ estimator has been constructed to work within an SIR disease modelling framework. Infectious diseases, however, can include periods of infection that are not infectious. The infectious period can also be split into various stages of asymptomatic and symptomatic infection, which ultimately affect the case reporting rate to public health. Therefore, methods that are based on the SIR modelling framework can produce erroneous estimates of R₀, and differences in R₀ estimates may simply reflect poor estimator structure or application to data for which the assumed model is misspecified.

A recent study by [27] has discussed several nuances of different estimator methods that can affect R₀ estimates. The effect of misspecification is only touched on briefly. In this work, we compare six different estimators of R₀: four real-time estimators and two estimators which require longer computation times. The four real-time estimators are based on an SIR or similar framework, while the two other estimators can be tuned to extensions of the SIR model. We then simulate data generated from one of three compartmental epidemiological models, the SIR, SEIR, and SEAIR models that track susceptible (S), exposed (E), asymptomatically infectious (A), symptomatically infectious (I), and recovered (R) individuals in their modelling frameworks. We note that three of the real-time estimators assume that the serial interval is known, and therefore we also consider the situation when this serial interval is guessed incorrectly in these estimators. Our work thus studies the effect of compartmental model and/or serial interval misspecification on the real-time estimators. Moreover, non-real-time methods require specification of the epidemiological model by the investigator, and our work studies the effect of compartmental model misspecification on these.

The report of our findings is organized as follows. We first provide an introduction to three compartmental infectious disease models that we use to generate case data. Six R₀ estimators are then introduced, including a discussion of their underlying compartmental model structure assumptions. We then apply each estimator to data generated from the three compartmental models, and Canadian COVID-19 data for the provinces of British Columbia, Ontario, Quebec, and also for the country as a whole. Early epidemic dynamics are discussed using the inflection point (or turning point) in the epidemic growth curve, the point at which the curvature in the epidemic growth curve changes; early timepoints are those before this point. We employ parameter values representative of respiratory virus epidemics, and in particular, influenza and COVID-19 [30–35]. We note that while daily data may be sometimes available during an infectious disease outbreak, it may not be complete and can include a reporting delay. We thus have chosen to use weekly case reports. Weekly case report data is also typical of outbreaks of influenza, a respiratory virus and one of our chosen pathogens of study.

Methods

Epidemiological models

We focus on three compartmental epidemiological models that form the basis of all infectious diseases models [36–38], the

SIR: Susceptible–Infectious–Recovered
SEIR: Susceptible–Exposed–Infectious–Recovered
SEAIR: Susceptible–Exposed–(Asymptomatic Infectious)–(Symptomatic Infectious)–Recovered

models. The models are each composed of three to five compartments (with labels matching the model name). Individuals transition from one compartment to the next after random waiting times with pre-specified distributions. Here, we assume that these distributions are exponential, which corresponds to systems of ordinary differential equations (ODEs). We use the notation θ = (β, σ, ρ, γ) to denote the vector of parameters for the models, see Table 1 for details. The ODE systems for all three are provided in the S1 Appendix, as well as their corresponding flow diagrams. All models are considered without inclusion of demography, i.e. birth and death. The total population is fixed throughout the simulation and denoted by N with initial values of S₀ and I₀ for S and I populations, respectively, and all others zero. Therefore, for all three models N is equal to S₀ + I₀, and this is approximately equal to S₀ since S₀ ≫ I₀. For the SIR model, for all t ≥ 0 it also holds that S(t) + I(t) + R(t) = N. Similarly, S(t) + E(t) + I(t) + R(t) = N for the SEIR model and for the SEAIR model, S(t) + E(t) + A(t) + I(t) + R(t) = N.

Table 1. SIR, SEIR, SEAIR model parameters and values, R₀, serial interval.

(a) Model contact rate notation
model	β	σ	ρ	γ
SIR	S → I: βI(t)/N			I → R: γ
SEIR	S → E: βI(t)/N	E → I: σ		I → R: γ
SEAIR	S → E: βI(t)/N	E → A: σ	A → I: ρ	I → R: γ
(b) Model parameters, R₀, and serial interval
model	θ	R₀ = R₀(θ)	serial interval
SIR	(β, γ)	β/γ	1/γ
SEIR	(β, σ, γ)	β/γ	1/γ + 1/σ
SEAIR	(β, σ, ρ, γ)	β/γ + β/ρ	1/γ + 1/σ
(c) Parameter values θ for simulations, in the order given in (b)
model	influenza 1	influenza 2	COVID-19
SIR	(1/3, 1/5)	(1/3, 1/5)	(1/2, 5/26)
SEIR	(1/3, 1/3, 1/5)	(5/9, 1/2, 1/3)	(13/11, 1/3, 5/11)
SEAIR	(1/3, 1/3, 1/2, 1/5)	(5/12, 1/2, 1, 1/3)	(26/57, 1/3, 2/7, 5/11)


Data is generated using the SIR, SEIR, and SEAIR compartmental model structures using a stochastic agent-based modelling framework implemented in C++. The simulations progress at the level of individual hosts in the applicable model disease status compartments. The simulation moves forward using “event times” that are assigned to each infected individual in the population and are determined by the compartment characteristics of which an individual is currently a member. Such event times correspond to infection events, when an infected individual transmits the infection to a susceptible, and times at which infected individuals progress to the next stage of infection or recover. The C++ model is based on previous work [39, 40]. Again, we note that all event times are assumed to be exponentially distributed with mean 1/ξ where ξ refers to the model parameter associated with the same transition in the system of ordinary differential equations. See Table 1.

1000 agent-based model simulations are conducted for each of the SIR, SEIR, and SEAIR frameworks with parameters as given in Table 1. Model parameters were taken from the literature, and are representative of pandemic influenza (R₀ ∈ [1.2, 7], serial interval ∈[1.5, 9.5]) and COVID-19 (R₀ ∈ [1.6, 3.4], serial interval ∈[4.2, 7.5]) [3, 30–32].

The first influenza (influenza 1 in Table 1) example parameters are such that R₀ = 5/3 for SIR and SEIR and R₀ = 7/3 for SEAIR. For this example, the serial interval is 5 days for the SIR model and 8 days for the SEIR and SEAIR models.
The second influenza (influenza 2) example parameters are such that R₀ = 5/3 and the serial interval is 5 days for each of the SIR, SEIR, and SEAIR models.
The COVID-19 parameters are such that R₀ = 2.6 and the serial interval is 5.2 days, again, for all models. The incubation period in the SEAIR COVID-19 model has a mean of 6.5 days [32].
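
The values quoted for the COVID-19 example can be checked directly against the formulas in Table 1(b). The sketch below is illustrative only: the function names `r0` and `serial_interval` are hypothetical, and the parameter tuples in Table 1(c) are read in the order (β, σ, ρ, γ), which is an assumption that the arithmetic itself supports.

```python
from fractions import Fraction as F

def r0(beta, gamma, rho=None):
    """R0 from Table 1(b): beta/gamma, plus beta/rho when an asymptomatic stage exists."""
    value = beta / gamma
    if rho is not None:
        value += beta / rho
    return value

def serial_interval(gamma, sigma=None):
    """Mean serial interval from Table 1(b): 1/gamma, plus 1/sigma when a latent stage exists."""
    value = 1 / gamma
    if sigma is not None:
        value += 1 / sigma
    return value

# COVID-19 column of Table 1(c), assumed to be listed as (beta, sigma, rho, gamma)
covid = {
    "SIR": dict(beta=F(1, 2), gamma=F(5, 26)),
    "SEIR": dict(beta=F(13, 11), sigma=F(1, 3), gamma=F(5, 11)),
    "SEAIR": dict(beta=F(26, 57), sigma=F(1, 3), rho=F(2, 7), gamma=F(5, 11)),
}

for name, p in covid.items():
    print(name,
          float(r0(p["beta"], p["gamma"], p.get("rho"))),
          float(serial_interval(p["gamma"], p.get("sigma"))))
```

Exact fractions are used so that the comparison with R₀ = 2.6 and a serial interval of 5.2 days is not blurred by floating-point rounding.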

For each epidemic, the population size N is set to 10,001 where S(0) = 10,000 and I(0) = 1.
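
To give a feel for the kind of data the estimators receive, the following is a minimal sketch of a stochastic SIR epidemic with exponential event times, aggregated into weekly counts of new infections. It is not the C++ agent-based code of [39, 40]: it is a population-level Gillespie simulation, and the function name `simulate_sir_weekly`, the seed, and the 15-week horizon are all hypothetical choices made for illustration.

```python
import random

def simulate_sir_weekly(beta, gamma, s0=10_000, i0=1, weeks=15, seed=1):
    """Gillespie simulation of a stochastic SIR epidemic (time in days).

    Returns the number of new infections in each 7-day bin.
    """
    rng = random.Random(seed)
    n = s0 + i0
    s, i, t = s0, i0, 0.0
    weekly = [0] * weeks
    while i > 0:
        rate_inf = beta * s * i / n   # S -> I events
        rate_rec = gamma * i          # I -> R events
        total = rate_inf + rate_rec
        t += rng.expovariate(total)   # exponential waiting time to the next event
        if t >= 7 * weeks:
            break
        if rng.random() < rate_inf / total:
            s -= 1
            i += 1
            weekly[int(t // 7)] += 1
        else:
            i -= 1
    return weekly

# COVID-19 SIR parameters from Table 1(c)
counts = simulate_sir_weekly(beta=1 / 2, gamma=5 / 26)
print(counts)
```

Because the outbreak starts from a single infectious individual, some seeds will produce an epidemic that dies out within the first week or two, which is one reason many replicates are needed.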

R₀ and the serial distribution

The serial distribution is the distribution of the time from when an infected individual (the infector) becomes symptomatic to when a person infected by the infector (the infectee) becomes symptomatic. For the SIR model, this is the same as the time spent in the I compartment, and in particular, the serial distribution is exponential with mean 1/γ when exponential distributions are assumed throughout the model [41]. We summarize the serial intervals for our models in Table 1 [41]. In the literature, the serial distribution may also be referred to as the serial interval, although this most often refers to the mean of the serial distribution, or alternatively, a range indicating highly likely values from the serial distribution. Here, we will use the convention that the serial interval refers to the mean of the serial distribution. For diseases such as influenza, it may be reasonable to assume that the serial distribution is known a priori. For other situations, such as new emerging diseases, such assumptions are less valid.

Methods for estimating R₀

Many methods exist to estimate R₀. We refer to [29] for a recent review. If the transition rates in the compartmental models are known, then R₀ can be easily calculated using the formulas listed in Table 1. However, full transition rates are generally not known in practice, and hence statistical estimation methods are required. The main difficulty in estimation is that complete data is unavailable for the full epidemiological model. Here, we consider six different methods of estimating R₀. For simplicity, we name the methods WP, seqB, ID, IDEA, plug-n-play, and fullBayes in this work. A summary of the methods and their key properties is given in Table 2 for reference.

Table 2. Summary of estimation methods for R₀.

method	summary
WP	White & Pagano Method, due to [42]. Serial distribution can be assumed known or can be estimated using MLE; method developed under branching process model; simple method which yields real-time estimates (when serial interval is unknown the method takes longer to compute).
seqB	Sequential Bayes Method, due to [43]. Serial distribution assumed known (only the mean is used); method developed assuming SIR model and uses sequential Bayes methods; simple method which yields real-time estimates.
ID	Incidence Decay Method (see [44]). Serial distribution assumed known (only the mean is used); method developed assuming an SIR model structure and uses least squares estimation. It is a simple method which yields real-time estimates.
IDEA	The Incidence Decay and Exponential Adjustment Method is presented in [44]. Serial distribution assumed known (only the mean is used); method developed assuming SIR model and uses least squares estimation; simple method which yields real-time estimates. IDEA uses a slightly more complex model for fitting than ID.
plug-n-play	Plug-and-Play Method. See [45]. Serial distribution assumed unknown; method selects one of SIR/SEIR/SEAIR model; implementations available though not real-time (depending on input selection). Generally, this approach fits the complete model using maximum likelihood, relying on Monte Carlo to fill in missing observations. The R package, called POMP, is quite technical and can be difficult to implement [45].
fullBayes	Full Bayes Method. See [46]. Serial distribution assumed unknown; method selects one of SIR/SEIR/SEAIR model; not real-time. This approach fits the complete model using Bayesian inference, relying on MCMC to fill in missing observations. Can be quite technical in implementation.


The first four (WP, seqB, ID, and IDEA) are real-time methods based on simplifications of the full ODE epidemiological models. This simplification is necessitated by the fact that the full data is unobservable. In these methods, estimation of R₀ is coupled with either estimation or prior knowledge of the serial distribution.

The two latter methods (plug-n-play and fullBayes) do not simplify the full epidemic models, but handle the issue of unobservable data by Monte Carlo simulation (plug-n-play method) or Bayesian priors with MCMC used to handle estimation due to model complexity (fullBayes method). As such, these methods are more computationally intensive. These two methods estimate the unknown transition rate parameter vector θ in the epidemic model. They do not require prior knowledge of the serial distribution. Indeed, since the methods result in estimates of θ, these can then in turn be used to derive an estimate of the serial distribution. The methods do, however, require the user to specify the epidemic model, that is, to decide whether the SIR, SEIR, or SEAIR model is most appropriate for the particular disease. In contrast, the WP, seqB, ID, and IDEA methods all rely on simplifications, and do not allow for such tailoring.

Although the plug-n-play and fullBayes methods are more computationally intensive and not considered “real-time”, we note that modern day access to computational power is blurring this line of distinction. Our implementations of fullBayes and plug-n-play were done on a non-specialized desktop computer and without special consideration to computing time in the implementations. The time required to obtain the estimates was less than two minutes in both cases, and we do not consider this to be prohibitive. Furthermore, more careful programming could yield even faster estimates. A more detailed discussion is available in the section Computational Time.

WP: Maximum likelihood estimation of a branching model

[42] developed a straightforward estimation method whereby either the serial distribution is known, or the serial distribution is estimated along with R₀. The method assumes that only the number of infectious individuals at discrete time points (e.g. daily or weekly) is observable and both approaches (serial known and unknown) use maximum likelihood. Recall that I(t) denotes the number of infecteds (i.e. the individuals in compartment I) at time t. Using our notation, and assuming that the times t₀ = 0, t₁, t₂, …, t_κ are integers which count, for example, the number of days or weeks since the beginning of the pandemic (time zero), [42] obtain the log-likelihood

\begin{matrix} ℓ (R_{0}, p) = - \sum_{i = 1}^{κ} μ (t_{i}) + \sum_{i = 1}^{κ} I (t_{i}) log μ (t_{i}), \end{matrix}

where $μ (t) = R_{0} \sum_{j = 1}^{min (κ, t)} I (t - t_{j}) p (t_{j})$ and p is a vector denoting the (discrete and finite) serial distribution on t₁, …, t_κ. That is, if Y is the random variable representing the serial distribution then p(t_j) = P(t_j ≤ Y < t_{j+1})/P(Y ≤ t_κ). If p is known (notably, this includes knowing the value of t_κ which describes the support of p) then the maximum likelihood estimate of R₀ is straightforward to compute. In the SIR model with exponential transitions, p(t_j) is a truncated geometric distribution. If p is unknown, then [42] recommend discretizing a gamma distribution to simplify estimation. Other models (SEIR and SEAIR) do not have simple closed form expressions for p(t_j) (see [41]). We found that for coarse data (e.g. weekly) the discretization and the mean dominate the values of p more so than the actual distribution chosen.
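
When p is known, setting the derivative of the log-likelihood with respect to R₀ to zero gives a closed form: the sum of the observed counts divided by the sum of the weighted past counts. The sketch below illustrates this under stated assumptions. The function names, the weekly counts, and the two-point serial distribution are hypothetical, and the length of p is kept separate from the number of observations even though the text uses κ for both.

```python
import math

def wp_expected_factor(incidence, p):
    """c_i = sum_j I(t_i - t_j) p(t_j), so that mu(t_i) = R0 * c_i.

    incidence[0] is I(t_0); p[j - 1] is p(t_j) for j = 1..len(p).
    """
    c = []
    for i in range(1, len(incidence)):
        upper = min(len(p), i)
        c.append(sum(incidence[i - j] * p[j - 1] for j in range(1, upper + 1)))
    return c

def wp_loglik(r0, incidence, p):
    """Log-likelihood l(R0, p) for a known serial distribution p."""
    c = wp_expected_factor(incidence, p)
    return sum(-r0 * ci + ni * math.log(r0 * ci)
               for ci, ni in zip(c, incidence[1:]) if ci > 0)

def wp_mle(incidence, p):
    """Closed-form maximiser of wp_loglik in R0."""
    c = wp_expected_factor(incidence, p)
    observed = sum(ni for ci, ni in zip(c, incidence[1:]) if ci > 0)
    return observed / sum(c)

# Hypothetical weekly counts and serial distribution, for illustration only
weekly = [1, 2, 5, 11, 24]
p_known = [0.7, 0.3]
print(wp_mle(weekly, p_known))
```

When p is unknown, no such closed form is available and the likelihood has to be maximised numerically over both R₀ and the parameters of the discretized serial distribution.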

The WP method assumes an underlying branching process, which is not any of the SIR/SEIR/SEAIR models from which our data sets are generated. In particular, the branching process assumes that the population size “available” to be infected remains constant throughout, which does not hold for our simulated models. As such, the estimates should only be considered early on in the epidemic. In our simulations presented below, we highlight the inflection point of each epidemic, and the WP method should only be considered valid before this time.

The method has been implemented in [47]; see also [48] for details on the R package called R0. In our simulations, we found this implementation to have some numerical instability issues, which are most likely caused by the particular parameters of our simulated data sets. This instability was particularly pronounced when p was assumed unknown, and most often the algorithm would not yield a solution. For this reason, we programmed our own implementation, for which we used a simple grid search. The built-in alternative optimization function in R uses the bisection method, and was very sensitive to the starting value (a small change in the starting value could change the R₀ estimate by orders of a thousand). In comparison, the grid search approach performed better, although it was still not ideal. The likelihood surface is very flat, which resulted in a non-unique MLE (we report only a default value). This property of the likelihood surface is most likely what also causes the issues we observed for our data in the implementation of the R0 package [48].

Furthermore, note that the log-likelihood assumes that the serial distribution is discrete, and that this discretization matches the observed data. That is, if data is observed weekly, the serial distribution is only known on a weekly timescale. This discretization can affect the serial distribution considerably, particularly if the timescale is quite coarse.

seqB: Sequential Bayes estimation using an SIR approximation

[43] developed a Bayesian approach used to estimate R₀. As above, it is assumed that infectious counts are observed at periodic times such as days or weeks. The basic idea is to start with a mildly informative prior on R₀ and then update sequentially. The approach is based on the SIR model, and assumes that the mean of the serial distribution is known (under the SIR model, this is equivalent to knowing the parameter γ which is the inverse of the mean of the serial distribution). [43] note that under the SIR model, and considering the time interval t_{j+1} − t_j,

\begin{matrix} I (t_{j + 1}) & = I (t_{j}) exp [γ \int_{t_{j}}^{t_{j + 1}} (R_{0} \frac{S (s)}{N} - 1) d s] \\ & \approx I (t_{j}) exp [(t_{j + 1} - t_{j}) γ (R_{t} - 1)], \end{matrix}

where R_t = R₀ S(t)/N ≈ R₀ at the beginning of an outbreak. Using this result, seqB assumes that the conditional distribution of I(t_{j+1}), given I(t_j) and R₀, is Poisson with mean λ = I(t_j) exp{(t_{j+1} − t_j)γ(R₀ − 1)}. In the approach, γ is known, and a prior is placed on R₀. With N₀ also assumed known, posterior estimates are found using a hierarchical or sequential Bayes approach. Note that the method cannot handle data sets where there are no new infections observed in some time interval t_{j+1} − t_j (as this results in a Poisson mean of zero). Therefore, the times at which infectious counts are observed must be sufficiently coarse so that all counts are non-zero (e.g. weeks instead of days). The method would also be inappropriate for situations where long intervals between cases are observed in the initial stages of the epidemic. This was observed, for example, in Canada for the first cases of COVID-19.

Although the above development is based on the SIR model, the resulting approximation behaves similarly to a branching process, much like the WP method. We therefore again consider this estimator valid only in the early stages, which for our simulations translates to times prior to the inflection points of the epidemic.

The posterior distribution of R₀ will have the same support as the prior, and placing a discretized prior on R₀ makes computations relatively straightforward, since the normalizing constant of the posterior is easy to compute. In the R implementation in [48], the package called R0, the initial prior on R₀ is assumed to be uninformative. Their package focuses on the posterior mode, and much like their implementation of the WP method, uses a discretized version of the serial distribution (which could affect the input value of γ). We again chose to use our own implementation, and report the posterior mean, which minimizes the Bayes risk under squared error loss.
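
The sequential update on a discretized prior can be sketched in a few lines. This is an illustration under stated assumptions and not the implementation used in the study or in the R0 package: the function names, the flat prior, the grid limits, and the example counts are all hypothetical. Each step multiplies the current posterior by the Poisson likelihood of the next count, which on the log scale is a running sum.

```python
import math

def seqb_posterior(counts, gamma, dt=7.0, grid_max=6.0, grid_step=0.01):
    """Sequential Bayes update for R0 on a discrete grid.

    counts: infectious counts I(t_0), I(t_1), ... at equally spaced times (all > 0).
    gamma: inverse of the mean serial interval (per day); dt: spacing in days.
    """
    grid = [grid_step * (k + 1) for k in range(int(round(grid_max / grid_step)))]
    log_post = [0.0] * len(grid)  # flat prior on the grid (an assumption)
    for prev, new in zip(counts[:-1], counts[1:]):
        for k, r0 in enumerate(grid):
            lam = prev * math.exp(dt * gamma * (r0 - 1.0))
            # Poisson log-density of `new` with mean lam
            log_post[k] += new * math.log(lam) - lam - math.lgamma(new + 1)
    shift = max(log_post)  # subtract the maximum before exponentiating
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]

def posterior_mean(grid, post):
    """Posterior mean of R0 on the grid."""
    return sum(r * w for r, w in zip(grid, post))

# Hypothetical weekly counts, roughly doubling each week
grid, post = seqb_posterior([10, 20, 40, 81, 163], gamma=1 / 5)
print(posterior_mean(grid, post))
```

A zero count would make the next Poisson mean zero and the logarithm undefined, which is the restriction on the data noted above.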

ID and IDEA: Least squares estimation using incidence decay approximations

[44] introduced two simplified models describing the relationship between R₀ and other epidemic parameters in the SIR model. The first of these is the incidence decay (ID) model where

\begin{matrix} \tilde{I} (s) = R_{0}^{s} . \end{matrix}

(1)

In the model, time s is measured in units re-scaled based on the serial distribution. Recall that under the SIR model the serial distribution is exponential with mean 1/γ. We then have the relationship in (1) that $\tilde{I} (s) = I (γ s)$ . As (1) is only valid for a short (and unknown) period of time, [44] proposed a second alternative formulation, where a decay factor d was introduced in order to reflect the often observed outbreak decline. In the incidence decay and exponential adjustment (IDEA) model, the relationship becomes instead

\begin{matrix} \tilde{I} (s) = {(\frac{R_{0}}{{(1 + d)}^{s}})}^{s} . \end{matrix}

(2)

Under the ID model, we can solve (1) to obtain

\begin{matrix} R_{0} = \tilde{I} {(s)}^{1 / s} . \end{matrix}

Of course, this relationship is not valid for real data across all values of s as $\tilde{I} (s)$ is stochastic. To obtain an estimate of R₀ least squares is a natural option, and hence the ID estimator is the minimizer of

\begin{matrix} \sum_{j = 1}^{k} {(log R_{0} - \frac{1}{s_{j}} log \tilde{I} (s_{j}))}^{2}, \end{matrix}

which yields

\begin{matrix} exp {\frac{1}{k} \sum_{j = 1}^{k} \frac{1}{s_{j}} log \tilde{I} (s_{j})} . \end{matrix}

(3)

As noted above, the number of infectious people increases rapidly at the beginning of an outbreak, so a method based on (1) is expected to underestimate R₀. The IDEA model was introduced to overcome this issue. As in the ID model, we solve (2) for R₀ to obtain

\begin{matrix} R_{0} = \tilde{I} {(s)}^{1 / s} {(1 + d)}^{s}, \end{matrix}

and use least squares estimation to obtain its estimate. The IDEA estimator is defined then as the minimizer of

\begin{matrix} \sum_{j = 1}^{k} {(log R_{0} - \frac{1}{s_{j}} log \tilde{I} (s_{j}) - s_{j} log (1 + d))}^{2} . \end{matrix}

Unlike in the ID model, we also need to obtain a minimizer of d to solve the optimization problem, and hence we require k ≥ 2. Minimizing, we obtain

\begin{matrix} exp (\frac{(\sum_{j = 1}^{k} s_{j}^{2}) (\sum_{j = 1}^{k} \frac{1}{s_{j}} log \tilde{I} (s_{j})) - (\sum_{j = 1}^{k} s_{j}) (\sum_{j = 1}^{k} log \tilde{I} (s_{j}))}{k \sum_{j = 1}^{k} s_{j}^{2} - {(\sum_{j = 1}^{k} s_{j})}^{2}}) . \end{matrix}

(4)

Details of these calculations are given in the S1 Appendix. Note that the formula is not valid for k = 1.

Both the ID and IDEA methods are straightforward and estimate R₀ directly, as long as the mean of the serial distribution is known. Both models were built under the SIR assumption. In our simulations we examine the effect of misspecification of the underlying epidemic model.
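
Formulas (3) and (4) translate directly into code. The sketch below is a plain transcription with hypothetical function names; the inputs are the incidence values Ĩ(s_j) and the corresponding times s_j, already rescaled to serial-interval units.

```python
import math

def id_estimate(incidence, s):
    """ID estimator, formula (3): exp of the mean of log(I~(s_j)) / s_j."""
    k = len(s)
    return math.exp(sum(math.log(i) / sj for i, sj in zip(incidence, s)) / k)

def idea_estimate(incidence, s):
    """IDEA estimator, formula (4); requires at least two time points."""
    k = len(s)
    if k < 2:
        raise ValueError("IDEA needs k >= 2")
    sum_s = sum(s)
    sum_s2 = sum(sj ** 2 for sj in s)
    sum_y = sum(math.log(i) / sj for i, sj in zip(incidence, s))
    sum_log = sum(math.log(i) for i in incidence)
    num = sum_s2 * sum_y - sum_s * sum_log
    den = k * sum_s2 - sum_s ** 2
    return math.exp(num / den)

# Noise-free data from model (2) with R0 = 2 and d = 0.05 (illustrative values)
s = [1, 2, 3, 4]
decayed = [(2.0 / 1.05 ** sj) ** sj for sj in s]
print(id_estimate(decayed, s), idea_estimate(decayed, s))
```

On noise-free data generated from (2), IDEA recovers R₀ exactly while ID falls below it, since ID has no term to absorb the decay factor d.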

plug-n-play: Maximum likelihood using sequential Monte Carlo for partially observed epidemics

Maximum likelihood is one of the more popular approaches used to estimate unknown parameters in a statistical model. The general idea is to find the parameter set θ which maximizes the likelihood (probability model) evaluated at the observed data. The difficulty for our setting is that our compartmental models (see the discussion of the epidemiological models) rely on data which is unobservable. In particular, the models require that the exact times of infections are known while we observe only daily or weekly counts of infectious individuals. The WP method [42], which also uses maximum likelihood, gets around this issue by creating a simplified model with a likelihood which relies only on observable data. Another alternative, discussed in [49], is to maximize the full likelihood and fill in the unobservables using many Monte Carlo simulations in a way which matches the fixed observable data points. Such an approach is often referred to as “plug-n-play”.

The plug-n-play inferential method of [49] is based on likelihood inference using sequential Monte Carlo of partially observed Markov processes (POMP), also known as hidden Markov models or state-space models. The plug-and-play terminology comes from the fact that inference is based on Monte Carlo simulations from the model and does not require explicit expressions of the transition probabilities, which can be quite complicated. The algorithm for this method has been implemented in the R package POMP [45]. This software package can be accessed from the comprehensive R archive network (CRAN), see also [50]. As mentioned previously, the basic idea is to generate complete epidemic data in a way which matches the observed weekly infectious observations. To simplify the implementation, complete continuous-time data is not generated but rather an approximation is generated with observations of all components at a discretized time-scale Δt (single value selected by the user). These discretized epidemics are generated using sequential Monte Carlo methods. An estimate of θ is then obtained via maximum likelihood using iterated filtering. The implementation in [50] allows for the selection of the model SIR, SEIR, or SEAIR. We refer to [49, 50] for additional details. The algorithm returns estimates of θ, as well as an estimate of R₀ derived via the formula

\begin{matrix} R_{0} = β \frac{Δ t}{1 - e^{- Δ t γ}}, \end{matrix}

regardless of the epidemiological model. We refer to the estimate thus obtained as the plug-n-play estimator. R code detailing our simulations and choices of input values is provided as S1 File.
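
A short worked limit helps relate this formula to Table 1. It is offered as an illustrative check rather than a statement taken from [49, 50]. Expanding the exponential for small Δt gives

\begin{matrix} \lim_{Δ t \to 0} β \frac{Δ t}{1 - e^{- Δ t γ}} = β \lim_{Δ t \to 0} \frac{Δ t}{Δ t γ - \frac{1}{2} {(Δ t γ)}^{2} + \dots} = \frac{β}{γ} . \end{matrix}

For small Δt the formula therefore approaches the SIR and SEIR expression β/γ of Table 1. For finite Δt it is slightly larger, because 1 − e^{−Δtγ} < Δtγ. Note also that it contains no counterpart of the β/ρ term in the SEAIR expression of Table 1.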

fullBayes: Bayesian inference for partially observed epidemics

Similar to the plug-n-play approach of the previous section, this is a simulation approach in which the incomplete observed data is replaced with complete data via simulations. The main difference is that the complete data is generated by placing a prior on its distribution in a Bayesian inferential approach. Some examples of epidemiological inference under the Bayesian paradigm are described in [46].

In order to describe the method we need first to introduce some additional notation. We do this for the SEAIR model, as all other models are simplifications of this case. Recall that we have observed infection counts I(t₁), …, I(t_k) at times t₁, …, t_k. Let m denote the vector with jth element given by the cumulative sums $m_{j} = \sum_{i = 1}^{j} I (t_{i})$ . As such, m describes the entirety of the observed data. For a time interval [0, T] the complete epidemic includes much more information. Let $τ_{i}^{E}, i \geq 2$ denote the individual times of exposure. Similarly, $τ_{i}^{A}, i \geq 2; τ_{i}^{I}, i \geq 2, τ_{i}^{R}, i \geq 1$ denote the individual times of transitions into the asymptomatic, infectious, and recovered states, respectively. We assume that m₀ = 1. We also assume that all people who are infected in week j will recover in week j + 1. Furthermore, we assume that the number of exposed and asymptomatic people in week j is also equal to m_j − m_{j−1}. We let

\begin{matrix} τ = \{τ_{i}^{A}, i \geq 2; τ_{i}^{I}, i \geq 2; τ_{i}^{R}, i \geq 1\} \end{matrix}

denote the epidemic path which contains all of this information.

As in [46], the first infection $τ_{1}^{I}$ is treated separately as a parameter of the model. Hence a prior $π_{I} (τ_{1}^{I})$ is placed on this variable. Recall that θ denotes the vector of compartmental model parameters; see Table 1(b). An independent prior π(θ) is also placed on θ, and samples from the posterior distribution $π (θ, τ_{1}^{I}, τ | m) \propto L (θ, τ_{1}^{I} | τ, m) π (θ) π_{I} (τ_{1}^{I})$ are obtained. The marginal distribution of θ under $π (θ, τ_{1}^{I}, τ | m)$ is π(θ|m), which is the posterior distribution of θ given the observable data, and the distribution we are interested in.

We now calculate the likelihood $L (θ, τ_{1}^{I} | τ, m)$ for the SEAIR model.

\begin{matrix} L (θ, τ_{1}^{I} | τ, m) \\ = \{\prod_{i = 2}^{m_{k}} \frac{β S (τ_{i}^{E})}{N} (I (τ_{i}^{E}) + A (τ_{i}^{E}))\} \{\prod_{i = 2}^{m_{k}} σ E (τ_{i}^{A})\} \{\prod_{i = 2}^{m_{k}} ρ A (τ_{i}^{I})\} \{\prod_{i = 1}^{m_{k - 1}} γ I (τ_{i}^{R})\} \\ \times exp \{- \int_{τ_{1}^{I}}^{t_{k}} [β S (t) (I (t) + A (t)) / N + σ E (t) + ρ A (t) + γ I (t)] d t\} . \end{matrix}

The joint prior distribution of the unknown rate parameters θ is made up of independent gamma distributions given by Γ(α, k) with mean k/α. We assume that α is the same for the parameters β, σ, ρ, γ, while k varies and if appropriate will be denoted by k_β, k_σ, k_ρ, k_γ. In the simulations we take α = 1 and k_β = k_σ = 3, k_ρ = 2, k_γ = 5. The prior distribution on $- τ_{1}^{I}$ is exponential with rate one, and this is independent from the θ vector. Calculations given in the S1 Appendix give the conditional posterior distributions $π (τ_{1}^{I} | θ, τ, m)$ and $π (θ | τ, m, τ_{1}^{I})$ , all of which are gamma distributions with closed form expressions for the parameters. Some sensitivity analysis to the prior distributions was conducted (see S1 Appendix), and changing the prior did not visibly affect the results.
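
To make the conjugacy concrete, consider the recovery rate γ. The following sketch is derived only from the likelihood and the Γ(α, k_γ) prior as displayed here, and should be read as illustrative; the authoritative expressions are those in the S1 Appendix. Collecting the factors that involve γ gives

\begin{matrix} π (γ | τ, m, τ_{1}^{I}, β, σ, ρ) & \propto γ^{k_{γ} - 1} e^{- α γ} \times γ^{m_{k - 1}} exp \{- γ \int_{τ_{1}^{I}}^{t_{k}} I (t) d t\} \\ & = γ^{k_{γ} + m_{k - 1} - 1} exp \{- γ (α + \int_{τ_{1}^{I}}^{t_{k}} I (t) d t)\}, \end{matrix}

which is again a gamma distribution. In the Γ(α, k) notation above it has first parameter α plus the integral of I(t) over the epidemic path, second parameter k_γ + m_{k−1}, and mean equal to the second parameter divided by the first. With α = 1 and k_γ = 5, a path containing many recoveries quickly outweighs the prior.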

The general approach we take is now described using the following steps.

1. Use Markov chain Monte Carlo (MCMC) to simulate from $π (θ, τ, τ_{1}^{I} | m) .$
2. From Step 1, we obtain a sequence of samples $(τ_{l}, θ_{l}, τ_{1, l}^{I})$ for l = 1, …, b + B from the posterior distribution $π (θ, τ_{1}^{I}, τ | m)$ . Here, b denotes the burn-in period for the MCMC results, and B denotes the number of MCMC samples collected. To obtain an estimate of θ from the samples l = b + 1, …, b + B, one option is to simply average the values θ_l. Instead, we treat each $(τ_{l}, θ_{l}, τ_{1, l}^{I})$ as a sample from the full posterior model, and calculate the corresponding posterior mean ${\bar{θ}}_{l}$ using the formulas given in the S1 Appendix.
3. Average the posterior means ${\bar{θ}}_{l}, l = b + 1, \dots, b + B$ to obtain an estimate of θ.

The final reported estimate is obtained from the estimate of θ in Step 3 using the appropriate formula in Table 1. In our simulations, we take b = 100 and B = 1000, and refer to the estimator as fullBayes.
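The averaging step can be sketched as follows. This is a minimal illustration, not the authors' R implementation: the per-iteration posterior means are assumed to have been computed already, the helper names are hypothetical, and the standard SIR expression R₀ = β/γ is used as a stand-in for the appropriate formula in Table 1.

```python
def fullbayes_point_estimate(posterior_means, burn_in):
    """Average per-iteration posterior means after discarding the burn-in.

    posterior_means: list of dicts, one per MCMC iteration l = 1, ..., b + B,
                     each mapping a parameter name to its posterior mean.
    burn_in:         number of initial iterations b to discard.
    """
    kept = posterior_means[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    names = kept[0].keys()
    return {name: sum(draw[name] for draw in kept) / len(kept) for name in names}


def sir_r0(theta):
    """R0 for the standard SIR model, beta / gamma (assumed stand-in for Table 1)."""
    return theta["beta"] / theta["gamma"]


# Toy chain: two burn-in iterations followed by two retained iterations.
chain = [
    {"beta": 9.0, "gamma": 9.0},
    {"beta": 9.0, "gamma": 9.0},
    {"beta": 0.5, "gamma": 0.25},
    {"beta": 0.7, "gamma": 0.35},
]
theta_hat = fullbayes_point_estimate(chain, burn_in=2)
r0_hat = sir_r0(theta_hat)
```

In the setting described above, `burn_in` would be 100 and the chain would hold 1100 entries.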

The MCMC algorithm we use is Metropolis-within-Gibbs. Namely, there are three main components to the posterior distribution: θ, τ, and $τ_{1}^{I}$ . In the S1 Appendix, the posterior distributions for $π (θ | τ, τ_{1}^{I}, m)$ and $π (τ_{1}^{I} | θ, τ) = π (τ_{1}^{I} | θ)$ are obtained in closed form. Given one observation of $(θ_{l}, τ_{l}, τ_{1, l}^{I}),$ the algorithm generates the next observation as follows.

1. Sample $τ_{1, l + 1}^{I}$ from the posterior $π (τ_{1}^{I} | θ_{l}) .$
2. Sample $θ_{l + 1}$ from the posterior $π (θ | τ_{l}, τ_{1, l + 1}^{I}, m) .$
3. Sample $τ_{l + 1}$ using a Metropolis step:
- (a)
  Propose a new τ: For each i = 1, …, k
  - (i)
    $τ_{j}^{E}$ is IID uniformly distributed on $[t_{i - 1}, t_{i}]$ for $j = m_{i - 1}, \dots, m_{i}$
  - (ii)
    $τ_{j}^{A}$ is IID uniformly distributed on $[t_{i - 1}, t_{i}]$ for $j = m_{i - 1}, \dots, m_{i}$
  - (iii)
    $τ_{j}^{I}$ is IID uniformly distributed on $[t_{i - 1}, t_{i}]$ for $j = m_{i - 1}, \dots, m_{i}$
  - (iv)
    $τ_{j}^{R}$ is IID uniformly distributed on $[t_{i - 1}, t_{i}]$ for $j = m_{i - 2}, \dots, m_{i - 1}$
- (b)
  Accept the proposal with probability min{1, α} where
  $\begin{matrix} α = \frac{π (τ | θ_{l + 1}, τ_{1, l + 1}^{I}, m) g (τ | τ_{l})}{π (τ_{l} | θ_{l + 1}, τ_{1, l + 1}^{I}, m) g (τ_{l} | τ)} = \frac{L (τ | θ_{l + 1}, τ_{1, l + 1}^{I}, m)}{L (τ_{l} | θ_{l + 1}, τ_{1, l + 1}^{I}, m)}, \end{matrix}$
  noting that with the proposal distribution in (a), we have that g(τ|τ_l)/g(τ_l|τ) = 1. Details are provided in the S1 Appendix.

The chain is initialized by sampling θ from its prior distribution.
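The Metropolis update of the event times can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: it handles a single event type rather than the four types (E, A, I, R), the likelihood is passed in as a user-supplied function `log_lik`, and all function names are hypothetical. Because the proposal is uniform within each reporting interval and does not depend on the current state, the acceptance ratio reduces to a likelihood ratio, as noted in the text.

```python
import math
import random


def propose_event_times(report_times, weekly_counts, rng):
    """Independence proposal: place each event uniformly within its reporting week.

    report_times:  [t_0, t_1, ..., t_k], the reporting times.
    weekly_counts: [n_1, ..., n_k], number of events assigned to [t_{i-1}, t_i].
    Returns a sorted list of proposed event times.
    """
    times = []
    for i, n in enumerate(weekly_counts, start=1):
        lo, hi = report_times[i - 1], report_times[i]
        times.extend(rng.uniform(lo, hi) for _ in range(n))
    return sorted(times)


def metropolis_accept(log_lik_new, log_lik_old, rng):
    """Accept with probability min{1, L_new / L_old}, computed on the log scale."""
    log_ratio = log_lik_new - log_lik_old
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


def metropolis_update(tau_current, log_lik, report_times, weekly_counts, rng):
    """One Metropolis update of the event times for fixed theta and tau_1^I."""
    tau_new = propose_event_times(report_times, weekly_counts, rng)
    if metropolis_accept(log_lik(tau_new), log_lik(tau_current), rng):
        return tau_new, True
    return tau_current, False


rng = random.Random(0)
report_times = [0.0, 7.0, 14.0, 21.0]
weekly_counts = [2, 3, 1]
tau0 = propose_event_times(report_times, weekly_counts, rng)
# Toy log-likelihood used only to exercise the update; it is not the SEAIR likelihood.
tau1, accepted = metropolis_update(
    tau0, lambda tau: -sum(tau), report_times, weekly_counts, rng
)
```

Working on the log scale avoids underflow when the likelihood is a product of many small terms, which is typical for event-time likelihoods of this kind.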

Real world COVID-19 data

We consider an example from the COVID-19 pandemic in Canada. The first case of COVID-19 was recorded on January 25th, 2020 in Toronto, Ontario [51]. For the first few weeks, isolated cases arrived, but strict contact tracing kept the epidemic from taking hold. We therefore do not consider the first four weeks of the pandemic timeline (there were very few cases, and most weeks had zero cases at this stage). In late February, the pandemic took hold and cases began to grow exponentially with community transmission [51]. Approximately one month later, non-pharmaceutical measures were imposed and most provinces went into some form of lockdown. We therefore do not consider data from much beyond lockdown initiation, as these measures would decrease the transmission rate.

We estimate R₀ for all of Canada, and for the three most populous provinces, British Columbia (BC), Ontario, and Quebec. In Ontario, strict restrictions were imposed following March break (a one-week school break during the winter), which fell around March 20th, 2020. In Quebec, lockdown was imposed around March 24th, and strict public measures were implemented around March 17th in BC. Epidemic data are taken from [52]. Public health mitigation data and dates are provided by [51].

Workflow

The goal of our study is to quantify R₀ estimation in well-specified and misspecified settings, including misspecification of the model and serial distribution. For all methods we therefore consider data coming from SIR, SEIR, and SEAIR epidemiological models, and from the real-world COVID-19 pandemic in Canada. We study the R₀ estimation methods as follows:

Using synthetic data provided by the SIR, SEIR, and SEAIR models, we apply the following methods for well-specified and misspecified settings

WP method assuming
- serial distribution (SD) is known and set to exponential with correct mean (5 days for influenza 1 and 2, and 5.2 days for COVID-19)
- SD is known and set to exponential with incorrect mean (3 days for influenza 1; 2 and 7 days for influenza 2; and 4.2 and 7.5 days for COVID-19)
- SD is unknown and estimated from a gamma distribution with unknown mean and variance (using a grid search algorithm)
seqB method assuming
- SD has the correct mean (5 days for influenza 1 and 2, and 5.2 days for COVID-19)
- SD has an incorrect mean (3 days for influenza 1; 2 and 7 days for influenza 2; and 4.2 and 7.5 days for COVID-19)
ID and IDEA methods assuming
- SD has the correct mean (5 days for influenza 1 and 2, and 5.2 days for COVID-19)
- SD has an incorrect mean (3 days for influenza 1; 2 and 7 days for influenza 2; and 4.2 and 7.5 days for COVID-19)
plug-n-play and fullBayes methods developed assuming
- SIR
- SEIR (SEIR and SEAIR data only)
- SEAIR (SEAIR data only)

In these examples, the outbreaks are followed for 15 weeks, and this is the timeline given in our results. This timeline is presented only as a comparison to what is happening at the earliest stages. It also, however, improves the comparison between methods. Our comments below focus only on the time period before the inflection point (denoted as a vertical blue line for all methods).

Using real world data, we apply the WP, seqB, ID, and IDEA methods with known SI, using incorrect and true values for COVID-19. We then apply WP, fullBayes and plug-n-play. Estimates are generated using weeks 5 to 10 for Canada, BC, Ontario, and Quebec. The date that lockdown was implemented is indicated by a vertical line for all three provinces. No such line is given for all of Canada, as the measures were handled provincially and not nationally.

When considering the results, recall that seqB and IDEA methods require at least two weeks of observations.

Results

Epidemic simulations

Fig 1 plots the number of individuals in compartment I for each model structure, and each parameter set. The grey lines plot the simulation outcomes while the black lines plot the mean of the simulation data. Although the complete epidemic path is simulated, we assume that only the weekly number of infectious people is actually available. The epidemics are followed for 15 weeks, which covers the first 100 days of an outbreak. Simulation data is recorded at every event time. Weekly data is extracted from each simulation and saved in a data file for use by all of the R₀ estimators employed here. The blue vertical line indicates the point of inflection, where the concavity/curvature of the black line changes. The inflection points occur at weeks 7, 12, and 9 for the influenza 1 parameter values, weeks 6, 7, and 7 for influenza 2, and weeks 3, 5, and 6 for COVID-19, for the SIR, SEIR, and SEAIR models, respectively. These points are used to determine appropriate time intervals for R₀ estimation for each model, since R₀ estimates are associated with early exponential growth and can be affected by decreases in the growth rate as the epidemic continues towards and past the point of inflection. Thus, “early in the epidemic” is taken to mean prior to the point of inflection. In real data, this time point would be unknown. Code and files containing all results have been provided in the S1 File.
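The text does not state how the point of inflection was located. One simple possibility, sketched here as an assumption, is to look for the first sign change in the discrete second difference of the mean curve. The function name is hypothetical, and the returned index is 0-based, so any mapping to week numbers depends on how the weeks are labelled.

```python
def first_inflection_index(mean_curve):
    """Return the index where the curve first switches from convex to concave.

    mean_curve: weekly values of the mean epidemic curve (the black line).
    Uses the discrete second difference d2[i] = y[i+1] - 2*y[i] + y[i-1] and
    returns the first index i with d2[i] <= 0 after at least one d2 > 0,
    or None if no such change is found.
    """
    seen_convex = False
    for i in range(1, len(mean_curve) - 1):
        d2 = mean_curve[i + 1] - 2 * mean_curve[i] + mean_curve[i - 1]
        if d2 > 0:
            seen_convex = True
        elif seen_convex:
            return i
    return None


# Toy curve: near-exponential growth that slows down and then declines.
toy_curve = [1, 2, 4, 8, 16, 28, 36, 40, 38, 30]
inflection = first_inflection_index(toy_curve)
```

On noisy single realizations the second difference changes sign often, which is one reason to apply such a rule to the mean curve rather than to individual simulations.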

R₀ estimates

Using synthetic data from the SIR, SEIR and SEAIR epidemiological models

We summarize our numerical results in plots comparing the average mean squared error (MSE), side-by-side boxplots, as well as tables reporting the median R₀ estimates and their standard deviations. Again, these are all provided in a separate file as S1 File. In the main manuscript, we show only plots comparing the MSE of the various methods for the SIR data for the influenza 1 and 2 examples (Figs 2, 3 and 5), and SEAIR data for the COVID-19 example (Figs 4 and 5). The MSE plots do not include the WP method where the serial distribution is estimated, as here the MSE was much too large to report. This can be ascertained from the tables and the side-by-side boxplots provided in the Supplementary Material (in particular, see Tables 7, 12 and 17 in S1 File).

Fig 2 — The inflection point indicated by the blue dashed vertical line.

Fig 3 — The inflection point indicated by the blue dashed vertical line.

Fig 5 — For both influenza examples the data is SIR while for the COVID-19 example the data is SEAIR. The inflection point indicated by the blue dashed vertical line.

Figs 2 and 3 plot the MSE of the estimated R₀ values relative to the true R₀ value, for the WP, seqB, ID, and IDEA methods for the influenza 1 and 2 examples, using SIR data, and assuming a known serial interval. These plots provide examples of the well-specified and misspecified cases, using the true and misspecified values of the known serial interval. Of the methods presented in these plots, seqB performs best, followed by ID. When SEIR and SEAIR data are considered, all estimators have larger MSE. However, our conclusion does not change (see Sections 1–3 of the additional file included as S1 File) and seqB and ID still perform best. Finally, considering both bias and variance, as shown in the totality of boxplots and tables in the S1 File, our conclusion remains the same.
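Since the comparison rests on MSE together with bias and variance, it may help to recall how the three are related. The sketch below is a generic illustration with a hypothetical function name and made-up numbers, not output from the study. It uses the population variance (dividing by n), so that MSE equals squared bias plus variance up to rounding.

```python
def mse_bias_variance(estimates, true_r0):
    """Return (mse, bias, variance) of R0 estimates across simulated data sets.

    The variance divides by n rather than n - 1, so that
    mse == bias**2 + variance holds up to floating-point rounding.
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_r0
    variance = sum((x - mean_est) ** 2 for x in estimates) / n
    mse = sum((x - true_r0) ** 2 for x in estimates) / n
    return mse, bias, variance


# Made-up estimates for illustration only.
mse, bias, variance = mse_bias_variance([1.0, 2.0, 3.0], true_r0=1.5)
```

Two estimators with similar MSE can therefore differ in character: one may be nearly unbiased but variable, the other stable but systematically off.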

Fig 4 plots the MSE of the estimated R₀ values relative to the true R₀ value for the WP, seqB, ID and IDEA methods for the COVID-19 example, using SEAIR data. These plots provide examples of misspecification due to an incorrect serial interval (serial intervals of 4.2 and 7.5 days are incorrect, and 5.2 days is the true value), and due to a mismatch between data and method, since SEAIR data is used with methods that relate best to the SIR model framework. Here, again, seqB performs best, followed by ID. This is also true when SIR and SEIR data are considered, and when considering bias and variance as presented in the totality of boxplots and tables in the S1 File.

We plot the MSE of R₀ estimates calculated using the fullBayes and plug-n-play methods in Fig 5 for the influenza 1 and 2 examples using SIR data and SIR model structure, and for the COVID-19 example using SEAIR data, but with SIR, SEIR and SEAIR model structures. In all cases presented in this figure, we find that plug-n-play outperforms fullBayes. fullBayes performs well in the long term, but this is not our goal, since R₀ estimates are needed early on in the epidemic. A review of all of the cases presented in the S1 File confirms our conclusion.

Computational time is a crucial factor as real-time estimates are desirable. Table 3 shows computational time for the SEIR model for a single data set and using a 1.60GHz/8GB RAM 64-bit operating system, x64-based processor. The results in this work are based on fullBayes with 1000 iterations and plug-n-play with 1000 particles and 10 IF iterations, where IF stands for the iterated filtering algorithm. The fullBayes method was implemented in R, and it is possible that faster implementations can be achieved using a different programming language. In comparison, the real-time methods (WP, seqB, ID, and IDEA) take less than one second each to compute.

Table 3. Computational time for the SEIR model for one data set (IF: Iterated filtering algorithm).

method	iterations	time
fullBayes	1000 iterations	8 minutes
fullBayes	3000 iterations	19.76 minutes
plug-n-play (1000 particles)	5 IF iterations	3.10 minutes
plug-n-play (1000 particles)	10 IF iterations	5.82 minutes
plug-n-play (1000 particles)	100 IF iterations	58.44 minutes
plug-n-play (1000 particles)	1000 IF iterations	9.77 hours

Based on the estimator outcomes, our recommendations are as follows. When the serial interval is known, we recommend seqB and ID. We also recommend plug-n-play when the serial interval is known. When the serial interval is unknown, plug-n-play performs the best. Overall, we recommend that a suite of these estimators be used—employ plug-n-play, seqB, and ID. When the serial interval is unknown, a range of serial intervals can be provided to the seqB and ID methods to compare to the plug-n-play results. Practitioners, however, should consider their own preferences as to bias and variability of the estimators. We note here that as this study is focused on data observed weekly, our results may not be applicable to data observed, for example, daily, as the effect of the serial distribution on the results may be different. We also assumed that our data did not suffer from collection bias, under-reporting, and reporting delay. These issues are important, but beyond the scope of this work. However, it is our belief that weekly data, as considered here, is less sensitive to some of these issues than more fine-grained data.

Using real world COVID-19 data

Fig 6 shows plots of estimates of R₀ for all six estimators as applied to real world COVID-19 epidemic data from Canada. The provinces of BC (second column), Ontario (third column), and Quebec (last column) are studied, as well as the entire nation (first column). The WP, seqB, ID and IDEA methods are applied using assumed known serial intervals of 2, 5, and 8 days. We compare our estimates to the R₀ estimates previously reported in [3] for the Greater Toronto Area (which represents approximately 1/6 of the Canadian population), shown as black horizontal lines. In summary, seqB, ID and plug-n-play estimates perform best. seqB produces estimates within the range denoted by the black horizontal lines for all serial interval values considered. The same is true for early estimates for plug-n-play. The ID method achieves the lower estimate for all geographic jurisdictions. It is sensitive to the choice of serial interval value, however, and higher serial interval values may drive the estimate above the upper bound. See, for example, the subplots for Canada and Ontario. Given the findings here, we again recommend a combination of seqB, ID, and plug-n-play methods for estimation of R₀.

Conclusion

The basic reproduction number, R₀, is an important parameter to estimate early in an epidemic so that public health interventions can be informed. As many estimators exist, and the assumptions of the estimators as well as their dependency on particular biological estimates (e.g., the serial interval) vary between methods, it is expected that R₀ estimates will differ. It is thus important to understand which estimators provide better outcomes under both true and misspecified conditions. Since respiratory viruses (especially influenza and coronaviruses, e.g., COVID-19 of late) affect the global population every year, we have chosen to study the estimators of R₀ for these types of infections, which are typically modelled using SIR, SEIR and SEAIR compartmental models. We have also chosen to consider weekly case data, as this is characteristic of reported outbreak data for pandemic influenza and other pandemic respiratory infections globally (with the exception of COVID-19, which was reported almost daily in most regions until early 2022).

We have considered six estimators that are commonly used when determining R₀ for any infectious disease outbreak. We discussed the advantages and disadvantages of each method, including dependencies on proper estimates of the serial distribution, and the computational resources needed to run each estimator. Our simulations consider a variety of well-specified and misspecified settings. Briefly, we find that the WP method can provide estimates close to the true R₀ value if the SD is known, but when the SD is unknown, the method suffers greatly (see Tables 7, 12 and 17 in S1 File). The seqB method performs well given SIR data but underperforms if there is any misspecification. The ID and IDEA methods are useful due to their simplicity. ID outperforms the IDEA model, but ID estimates have slightly higher MSE compared to seqB. fullBayes estimates can have large variability, and are sensitive to the underlying model structure, but the plug-n-play method provides consistent estimates even with only one week of data.

Considering both bias and variability, as well as misspecification, we find that the performance of the seqB, ID, and plug-n-play estimators is best, providing estimates of R₀ that are closest to the true value under both correctly specified and misspecified cases. Notably, plug-n-play does not require prior knowledge of the serial distributions. However, if the serial interval is known, seqB and ID outperform plug-n-play. Furthermore, seqB and ID require less computational time, and are easier to implement.

The choice of R₀ estimator is ultimately up to the practitioner. In our analysis we have shown that some R₀ estimators can be greatly affected by even a small level of misspecification. Given that biological certainty may be lacking at the beginning of an infectious disease outbreak, the number of disease stages needed in a model and a proper distribution of the serial interval may not be known. This means that a range of R₀ results will ensue, and the accuracy of the estimates will be unclear. We therefore recommend that a suite of estimators be used when estimating R₀. Given the current study results, we recommend that seqB, ID, and plug-n-play methods be included in any suite. plug-n-play does not require knowledge of the serial distribution and provides close to true estimates under different model structures quickly. seqB and ID should be implemented using a range of known serial intervals, to provide sensitivity analysis and confidence in R₀ estimation. We do however note that plug-n-play may be difficult to implement for some, since the R package is quite technical [45].

Daily case reporting data has been available for the most recent COVID-19 pandemic. Daily data was not provided during the 2009 H1N1 pandemic, however. Furthermore, there may be issues with daily reporting (such as periodicity and reporting delay), so that public health may choose to use weekly reporting data over daily data, as the weekly data would be more reliable. We have thus only considered weekly case reporting data in this study, as weekly case reporting data can be expected in many future epidemics and pandemics. It is important to note that First Few Hundred (FF100) studies, whereby the first few hundred cases of a new virus are followed in detail at the beginning of an infectious disease outbreak, have been implemented during the 2009 H1N1 and COVID-19 pandemics [53–60]. In these cases the serial distribution, and the need to consider exposed and/or asymptomatic periods of infection, can be quickly determined, enabling earlier and more certain estimates of R₀. Given that First Few Hundred protocols are not implemented in much of the globe, however, weekly case report data may still be considered the norm for future pandemics.

In our current study we have assumed perfect data with no unobserved infections, no reporting delay, and no data collection bias. These issues are intuitively expected to affect R₀ estimates. We intend to continue our study of R₀ estimation by considering these aspects in our epidemiological data sets.

In summary, our work has various strengths, and some limitations. A unique strength of our work is the study of model misspecification. We are unaware of previous work in this direction. We did not consider all possible estimators of R₀, but focused on those most commonly used in the field of Infectious Disease Modelling. We selected a variety of influenza and COVID-19 scenarios for our simulations, which provide considerable information on the behaviour of these estimators. We did not investigate other infectious diseases, such as Ebola, which could potentially have quite different parameters. Our overall recommendations are, however, general, and are therefore widely applicable. Lastly, we considered only the scenario of perfect data. Alternative settings are beyond the scope of this work. These, along with other infectious diseases and potentially more estimators, will be considered in future work.

Supporting information

S1 File. A supplementary file containing additional simulation results (both tables and boxplots) as well as some further technical details.

(PDF)

S1 Appendix

(PDF)

Data Availability

All data are fully available without restriction. The code and data can be found at https://github.com/hannajankowski/R0_estimators_data.

Funding Statement

This work was supported by the Natural Sciences and Engineering Research Council of Canada. The funders had no role in the study design, data collection, analysis, decision to publish, or preparation of the manuscript. The authors received no salary from funders.

References

1. Zhao S, Lin Q, Ran J, Musa SS, Yang G, Wang W, et al. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. International Journal of Infectious Diseases. 2020;92:214–217. doi: 10.1016/j.ijid.2020.01.050
2. Tuite AR, Fisman DN. Reporting, epidemic growth, and reproduction numbers for the 2019 novel coronavirus (2019-nCoV) epidemic. Annals of Internal Medicine. 2020;172(8):567–568. doi: 10.7326/M20-0358
3. Knight J, Mishra S. Estimating effective reproduction number using generation time versus serial interval, with application to COVID-19 in the Greater Toronto Area, Canada. Infectious Disease Modelling. 2020;5:889–896. doi: 10.1016/j.idm.2020.10.009
4. Mellan TA, Hoeltgebaum HH, Mishra S, Whittaker C, Schnekenberg RP, Gandy A, et al. Report 21: Estimating COVID-19 cases and reproduction number in Brazil. medRxiv. 2020.
5. Hilton J, Keeling MJ. Estimation of country-level basic reproductive ratios for novel Coronavirus (SARS-CoV-2/COVID-19) using synthetic contact matrices. PLoS Computational Biology. 2020;16(7):e1008031. doi: 10.1371/journal.pcbi.1008031
6. Price DJ, Shearer FM, Meehan MT, McBryde E, Moss R, Golding N, et al. Early analysis of the Australian COVID-19 epidemic. ELife. 2020;9:e58785. doi: 10.7554/eLife.58785
7. Nishiura H, Chowell G, Safan M, Castillo-Chavez C. Pros and cons of estimating the reproduction number from early epidemic growth rate of influenza A (H1N1) 2009. Theoretical Biology and Medical Modelling. 2010;7(1):1–13. doi: 10.1186/1742-4682-7-1
8. Chowell G, Echevarría-Zuno S, Viboud C, Simonsen L, Tamerius J, Miller MA, et al. Characterizing the epidemiology of the 2009 influenza A/H1N1 pandemic in Mexico. PLoS Med. 2011;8(5):e1000436. doi: 10.1371/journal.pmed.1000436
9. Tuite AR, Greer AL, Whelan M, Winter AL, Lee B, Yan P, et al. Estimated epidemiologic parameters and morbidity associated with pandemic H1N1 influenza. CMAJ. 2010;182(2):131–136. doi: 10.1503/cmaj.091807
10. Paine S, Mercer G, Kelly P, Bandaranayake D, Baker M, Huang Q, et al. Transmissibility of 2009 pandemic influenza A (H1N1) in New Zealand: effective reproduction number and influence of age, ethnicity and importations. Eurosurveillance. 2010;15(24):19591. doi: 10.2807/ese.15.24.19591-en
11. Fraser C, Donnelly CA, Cauchemez S, Hanage WP, Van Kerkhove MD, Hollingsworth TD, et al. Pandemic potential of a strain of influenza A (H1N1): early findings. Science. 2009;324(5934):1557–1561. doi: 10.1126/science.1176062
12. Pourbohloul B, Ahued A, Davoudi B, Meza R, Meyers LA, Skowronski DM, et al. Initial human transmission dynamics of the pandemic (H1N1) 2009 virus in North America. Influenza and Other Respiratory Viruses. 2009;3(5):215–222. doi: 10.1111/j.1750-2659.2009.00100.x
13. Chowell G, Blumberg S, Simonsen L, Miller MA, Viboud C. Synthesizing data and models for the spread of MERS-CoV, 2013: key role of index cases and hospital transmission. Epidemics. 2014;9:40–51. doi: 10.1016/j.epidem.2014.09.011
14. Hsieh YH. 2015 Middle East respiratory syndrome coronavirus (MERS-CoV) nosocomial outbreak in South Korea: insights from modeling. PeerJ. 2015;3:e1505. doi: 10.7717/peerj.1505
15. Cauchemez S, Fraser C, Van Kerkhove MD, Donnelly CA, Riley S, Rambaut A, et al. Middle East respiratory syndrome coronavirus: quantification of the extent of the epidemic, surveillance biases, and transmissibility. The Lancet Infectious Diseases. 2014;14(1):50–56. doi: 10.1016/S1473-3099(13)70304-9
16. Riley S, Fraser C, Donnelly CA, Ghani AC, Abu-Raddad LJ, Hedley AJ, et al. Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions. Science. 2003;300(5627):1961–1966. doi: 10.1126/science.1086478
17. Anderson RM, Fraser C, Ghani AC, Donnelly CA, Riley S, Ferguson NM, et al. Epidemiology, transmission dynamics and control of SARS: The 2002–2003 epidemic. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences. 2004;359(1447):1091–1105. doi: 10.1098/rstb.2004.1490
18. Wang W, Ruan S. Simulating the SARS outbreak in Beijing with limited data. Journal of Theoretical Biology. 2004;227(3):369–379. doi: 10.1016/j.jtbi.2003.11.014
19. Dye C, Gay N. Modeling the SARS epidemic. Science. 2003;300(5627):1884–1885. doi: 10.1126/science.1086925
20. Heffernan JM, Smith RJ, Wahl LM. Perspectives on the basic reproductive ratio. Journal of the Royal Society Interface. 2005;2(4):281–293. doi: 10.1098/rsif.2005.0042
21. Diekmann O, Heesterbeek JAP, Metz JA. On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations. Journal of Mathematical Biology. 1990;28(4):365–382. doi: 10.1007/BF00178324
22. Diekmann O, Heesterbeek J, Roberts MG. The construction of next-generation matrices for compartmental epidemic models. Journal of the Royal Society Interface. 2010;7(47):873–885. doi: 10.1098/rsif.2009.0386
23. van den Driessche P, Watmough J. Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission. Mathematical Biosciences. 2002;180(1-2):29–48. doi: 10.1016/S0025-5564(02)00108-6
24. Vegvari C, Abbot S, Ball F, et al. Commentary on the use of the reproduction number R during the COVID-19 pandemic. Statistical Methods in Medical Research. 2021. doi: 10.1177/09622802211037079
25. Heesterbeek J, Dietz K. The concept of R0 in epidemic theory. Statistica Neerlandica. 1996;50(1):89–110. doi: 10.1111/j.1467-9574.1996.tb01482.x
26. Blumberg S, Lloyd-Smith JO. Comparing methods for estimating R0 from the size distribution of subcritical transmission chains. Epidemics. 2013;5(3):131–145. doi: 10.1016/j.epidem.2013.05.002
27. Gallagher S, Chang A, Eddy WF. Exploring the nuances of R0: Eight estimates and application to 2009 pandemic influenza. arXiv preprint arXiv:2003.10442. 2020.
28. Farrington CP, Kanaan MN, Gay NJ. Estimation of the basic reproduction number for infectious diseases from age-stratified serological survey data. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2001;50(3):251–292.
29. White LF, Moser CB, Thompson RM, Pagano M. Statistical Estimation of the Reproductive Number from Case Notification Data. American Journal of Epidemiology. 2021. doi: 10.1093/aje/kwaa211
30. Chowell G, Nishiura H, Bettencourt LM. Comparative estimation of the reproduction number for pandemic influenza from daily case notification data. Journal of the Royal Society Interface. 2007;4(12):155–166. doi: 10.1098/rsif.2006.0161
31. Biggerstaff M, Cauchemez S, Reed C, Gambhir M, Finelli L. Estimates of the reproduction number for seasonal, pandemic, and zoonotic influenza: a systematic review of the literature. BMC Infectious Diseases. 2014;14(1):1–20. doi: 10.1186/1471-2334-14-480
32. Alene M, Yismaw L, Assemie MA, Ketema DB, Gietaneh W, Birhan TY. Serial interval and incubation period of COVID-19: a systematic review and meta-analysis. BMC Infectious Diseases. 2021;21(1):1–9. doi: 10.1186/s12879-021-05950-x
33. Cowling BJ, Fang VJ, Riley S, Peiris JM, Leung GM. Estimation of the serial interval of influenza. Epidemiology (Cambridge, Mass). 2009;20(3):344. doi: 10.1097/EDE.0b013e31819d1092
34. Vink MA, Bootsma MCJ, Wallinga J. Serial intervals of respiratory infectious diseases: a systematic review and analysis. American Journal of Epidemiology. 2014;180(9):865–875. doi: 10.1093/aje/kwu209
35. Park JE, Ryu Y. Transmissibility and severity of influenza virus by subtype. Infection, Genetics and Evolution. 2018;65:288–292. doi: 10.1016/j.meegid.2018.08.007
36. Anderson RM, May RM. Infectious diseases of humans: dynamics and control. Oxford university press; 1992.
37. Allen LJ, Brauer F, Van den Driessche P, Wu J. Mathematical epidemiology. vol. 1945. Springer; 2008.
38. Keeling MJ, Rohani P. Modeling infectious diseases in humans and animals. Princeton university press; 2011.
39. Heffernan JM, Wahl LM. Monte Carlo estimates of natural variation in HIV infection. Journal of Theoretical Biology. 2005;236(2):137–153. doi: 10.1016/j.jtbi.2005.03.002
40. Heffernan JM, Wahl LM. Natural variation in HIV infection: Monte Carlo estimates that include CD8 effector cells. Journal of Theoretical Biology. 2006;243(2):191–204. doi: 10.1016/j.jtbi.2006.05.032
41. Ma J. Estimating epidemic exponential growth rate and basic reproduction number. Infectious Disease Modelling. 2020;5:129–141. doi: 10.1016/j.idm.2019.12.009
42. White LF, Pagano M. A likelihood-based method for real-time estimation of the serial interval and reproductive number of an epidemic. Statistics in Medicine. 2008;27:2999–3016. doi: 10.1002/sim.3136
43. Bettencourt LMA, Ribeiro RM. Real time Bayesian estimation of the epidemic potential of emerging infectious diseases. PLOS ONE. 2008;3. doi: 10.1371/journal.pone.0002185
44. Fisman DN, Hauck TS, Tuite AR, Greer AL. An IDEA for Short Term Outbreak Projection: Nearcasting Using the Basic Reproduction Number. PLOS ONE. 2013;8. doi: 10.1371/journal.pone.0083622
45. King AA, Ionides EL, Breto C. Statistical Inference for Partially Observed Markov Processes; 2017. R package version 1.12.
46. O’Neill PD, Roberts GO. Bayesian inference for partially observed stochastic epidemics. Journal of the Royal Statistical Society Series A. 1999;162:121–129. doi: 10.1111/1467-985X.00125
47. Obadia T, Boëlle P. R0: Estimation of R0 and Real-Time Reproduction Number from Epidemics; 2015. R package version 1.2-6.
48. Obadia T, Haneef R, Boëlle P. The R0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks. BMC Med Inform Decis Mak. 2017;12.
49. He D, Ionides EL, King AA. Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. JR Soc Interface. 2010;7. doi: 10.1098/rsif.2009.0151
50. Nguyen D, Ionides EL, King AA. Statistical Inference for Partially Observed Markov Processes via the R Package pomp. Journal of Statistical Software. 2016;69.
51. Akanteva A, Dick D, Heffernan JM. A database of public healthcare mitigation and relaxation during the COVID-19 pandemic, for all Canadian provinces; 2022. Database access available on request.
52. Berry I, Soucy JPR, Tuite A, Fisman D. Open access epidemiologic data and an interactive dashboard to monitor the COVID-19 outbreak in Canada. CMAJ. 2020;192(15):E420–E420. doi: 10.1503/cmaj.75262
53. Black AJ, Geard N, McCaw JM, McVernon J, Ross JV. Characterising pandemic severity and transmissibility from data collected during first few hundred studies. Epidemics. 2017;19:61–73. doi: 10.1016/j.epidem.2017.01.004
54. World Health Organization and others. The First Few X cases and contacts (FFX) investigation protocol for coronavirus disease 2019 (COVID-19). World Health Organization; 2020.
55. McLean E, Pebody R, Campbell C, Chamberland M, Hawkins C, Nguyen-Van-Tam J, et al. Pandemic (H1N1) 2009 influenza in the UK: clinical and epidemiological findings from the first few hundred (FF100) cases. Epidemiology & Infection. 2010;138(11):1531–1541. doi: 10.1017/S0950268810001366 [DOI] [PubMed] [Google Scholar]
56. Boddington NL, Charlett A, Elgohari S, Walker JL, McDonald HI, Byers C, et al. COVID-19 in Great Britain: epidemiological and clinical characteristics of the first few hundred (FF100) cases: a descriptive case series and case control analysis. MedRxiv. 2020. [Google Scholar]
57. van Gageldonk-Lafeber AB, van der Sande MA, Meijer A, Friesema IH, Donker GA, Reimerink J, et al. Utility of the first few 100 approach during the 2009 influenza A (H1N1) pandemic in the Netherlands. Antimicrobial Resistance and Infection Control. 2012;1(1):1–7. doi: 10.1186/2047-2994-1-30 [DOI] [PMC free article] [PubMed] [Google Scholar]
58.England HPA. “First Few Hundred” Project Epidemiological Protocols for Comprehensive Assessment of Early Swine Influenza Cases in the United Kingdom; 2009.
59. Ghani A, Baguelin M, Griffin J, Flasche S, van Hoek AJ, Cauchemez S, et al. The early transmission dynamics of H1N1pdm influenza in the United Kingdom. PLoS Currents. 2009;1. doi: 10.1371/currents.RRN1130 [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Pandemic Influenza. Australian Health Management Plan for Pandemic Influenza; 2014. https://www1.health.gov.au/internet/main/publishing.nsf/Content/ohp-ahmppi.htm.

PLoS One. doi: 10.1371/journal.pone.0269306.r001

Decision Letter 0

Inés P Mariño

8 Oct 2021

PONE-D-21-22343

Estimating the basic reproduction number at the beginning of an outbreak under incomplete data

PLOS ONE

Dear Dr. Heffernan,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 22 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Inés P. Mariño, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In “Estimating the basic reproduction number at the beginning of an outbreak under incomplete data” by Boonpatcharanon and colleagues, different methods to estimate R0, the basic reproduction number, are compared considering the first 100 days of an epidemic. The authors apply frequentist and Bayesian approaches to estimate R0 under three different infection models: SIR, SEIR and SEAIR. These models differ in the allowed transitions between states of individuals (susceptible, exposed, (asymptomatic/symptomatic) infected, recovered). The authors conclude with a recommendation but also highlight that the choice of approach always depends on the data; they recommend sensitivity analyses.

The aim of the study is described and motivated. The manuscript is well written. However, there are some issues the authors should consider to facilitate the readability of the manuscript.

Major issues:

1) “Incomplete data” sounds like missing data related to counts of, e.g., infected individuals. Should “incomplete” also comprise incomplete knowledge/information on transmission and course of infection? Please clarify (in the manuscript and probably in the title). Additionally, please add information on the required (observed) data underlying the R0 estimation/calculation.

2) Please provide real data applications to support the assumptions in the simulation study and to illustrate the investigated methods on real data. For influenza, weekly case reports are published for several seasons, for example by the ECDC (European Centre for Disease Prevention and Control), the Government of Canada or the CDC (Centers for Disease Control and Prevention).

3) Please provide the (documented) source code for the investigation to redo the analysis (including data simulation and figure/table preparation).

4) Introduction:

a. Could the authors elaborate more on data misspecification? Maybe through a separate paragraph including examples of misspecifications and their possible influence on the R0 estimates? This issue is related to the reliability of an R0 estimation in the epidemic situation itself. The benefit/value of the R0 estimation depends heavily on the population under investigation, i.e. whether this population is a random sample of the total population or a subpopulation that is not representative of the total population (i.e. comprising, e.g., more or fewer infected individuals or different transmission probabilities than in the total population). Issues to be considered are for example the test strategy (which individuals are tested or must provide a test result; related to the number of unreported cases) and the test quality (reliable test results).

b. As the study is about the early stage of an epidemic (first 15 weeks), could the authors additionally include this time frame into the considerations about misspecification? In case of a “new” disease, the knowledge on which, e.g., R0 estimation is based is limited in the early days. Could the authors please highlight the important issues unique to the beginning of an epidemic/pandemic – compared to the subsequent time? Besides in the beginning of an epidemic, is it also possible to consider a time point within an epidemic with a very low number of infected individuals, e.g. between two waves or two seasons (in case of seasonality as for influenza)? Please clarify “early stage”. Please add a motivation for considering only the first 15 weeks.

c. Please add a motivation for the decision to consider SIR, SEIR and SEAIR only.

5) Materials and Methods:

a. Please provide the underlying assumptions related to the data for the investigation (i.e. no unobserved infections, no reporting delay, …).

b. Please include a section about the simulation study. The approach description should not be part of the result section and the parameter choice should not be part of the method description. Please aggregate.

c. In some parts, methods are provided in the results section and vice versa. Please check and separate.

d. Lines 64-77: Please provide a supporting figure for illustration, if possible. Furthermore, please consider the inclusion of Table 1 in this figure and, if possible, remove Table 1.

e. Line 97: Please introduce the methods briefly (including the reference to the respective subsection) and provide the abbreviations used throughout the manuscript. Then, refer to Table 3. Otherwise, the subsequent sections cannot be followed easily.

f. Line 103: Please consider to describe “serial distribution” earlier in the manuscript because it was already used earlier. Suggestion: Provide a section with definitions needed for the models (SIR, SEIR, SEAIR). Furthermore, please consider a summarisation of all parameters that are set to some selected values in the investigation. A table (or subheadings after re-ordering) might help.

g. Part 0.2.1

i. Please check notations and definitions. For example:

1. Line 133: “or” instead of “, or,“.

2. Please unify kappa and k.

3. Line 135: “both” does not fit to “the method”, which is one method. Please check.

4. Lines 137/138: Please add the origin for “number of days or weeks”.

5. Line 139: Please clarify min(kappa, t). What is t?

6. Line 139: Please clarify the relation between I(t – t_j) and I(t), if there is one, otherwise please define I(time difference / interval).

7. Line 159: Please clarify “built-in alternative optimisation”. Where is it “built-in”?

ii. Please provide p(t_j) for all models.

iii. Lines 149/150: Please explain the limitation.

iv. Lines 150: Please provide the section reference for the simulations.

v. Instability issues (lines 156-165):

1. Might the instability be an indicator for non-adequateness of the applied method?

2. Please consider to include the observed instability issues in the result section to clearly separate methods and results (introduction of new subsection headings might help). Is it possible to quantify these issues?

3. Was the implementation of the grid search approach in comparison to the original implementation validated? If so, how?

h. Part 0.2.2:

i. As long intervals without new infections are problematic for this approach, this approach might be better suited for situations after the start of a new “wave” with rapidly increasing numbers of newly detected infections. Did the authors investigate scenarios in which the numbers only increased slowly, or were the scenarios adapted to this method? In the latter case, a comparison in a non-adequate scenario would be of interest to guide future method applications. Especially in the beginning of a pandemic, such situations might occur.

ii. Lines 214/215: Please state the adaptations in more detail. Was the implementation in comparison to the original implementation validated? If so, how?

i. Part 0.2.3:

i. Lines 233/234: Please clarify “beginning of an outbreak”. The authors state that the number of infectious individuals rapidly decreases in the beginning, but in the beginning of a new disease few individuals are infected/infectious and the number of infected/infectious people increases. Otherwise, I would expect that R0 is overestimated as the estimate does not decrease fast enough. Please clarify.

ii. Lines 244/245: Please provide a reference to the specifications of the misspecification.

j. Part 0.2.4:

i. Please provide (throughout the manuscript) names of R packages besides the reference.

ii. Line 275: Please explain “particle”.

iii. Equation after line 279: R0 is probably not a single value as delta_t is probably a sequence. Please check and adapt, if necessary.

iv. Line 280: Please clarify where “regardless of the epidemiological model” relates to (and what is model-dependent).

v. Line 282: Please check the reference to the appendix. Appendix 1.3 is “Least square estimation for the IDEA method”. Please provide more comments in the source code (Appendix 1.4) and please check line breaks to facilitate reading.

k. Part 0.2.5:

i. Line 292: Please provide the respective simplifications in the subsequent derivations.

ii. Lines 294/295: Please describe m more clearly. Please explain additionally (besides the equation) m_j in words. Definition of m0 should be provided with the definition of m_j.

iii. Line 295: Please clarify “epidemic” and “much more information”.

iv. Lines 296/297: Please check the conditions for i.

v. Lines 299/300: What is the impact, if an individual needs more than one week to recover? What is the motivation for one week? Please add.

vi. Line 334: “obtained” instead of “obtain”

6) Results:

a. Lines 351-353: Please additionally consider the case that the population studied is not a random sample of the target population. Alternatively, please clearly state (when defining the study design) the assumption that the populations studied is a random sample and discuss this assumption as limitation.

b. Lines 377 to 380: Do the results change if the other methods are also only applied to the subset of samples? Please comment.

c. Lines 380/381: Please define bias and variability. Did the authors also consider a joint measure of bias and variability?

d. Line 382: A figure cannot study. Please rephrase throughout the manuscript.

e. Please consider to add further subsections to provide more guidance to the reader.

f. Line 405: Computation time is provided but the related section follows later-on. Please reorder.

g. Part 1.1: Could the authors please provide computational aspects for all models?

7) Discussion:

a. Please provide a paragraph about strength and limitations.

b. Please compare the results (at least in parts) with other studies.

8) Abbreviations, parameters, model names, method names and other short forms:

a. Please introduce all in the main part of the manuscript. E.g. ODE, MCMC, IID, S0, I0, S, I, S(t), SD, … are missing.

b. Please check the usage for consistency, e.g. S versus S(t).

c. Please state which parameters are 0 at t=0.

9) Figures:

a. Please provide axis titles at the respective axis and not in the description.

b. In case the legend only comprises one symbol/colour differing between figure panels, please consider providing this information as a panel title above the respective plot panel. This also allows for a shorter description.

c. Please introduce all abbreviations, parameters and model names in the figure description.

d. In case of boxplots, please provide complete boxplots. In case of a needed zoomed-in boxplot, the complete one should be provided in the supplement.

e. Please provide information in the description of the boxplots so that the reader is able to identify scenarios with misspecifications.

10) Tables:

a. Please introduce all abbreviations, parameters, model names and method names in the table description.

b. Please provide a description that allows the table to be understood without the part of the main manuscript where the table is cited for the first time.

Minor issues:

1) Section numbering in the main part: Please remove the leading “0.”. Please check the complete numbering and doubling of section headings, e.g. “Results” and “1. Results” and supporting information starts with 1.2.

2) Please consider to avoid “flu” and to use “influenza” throughout the manuscript.

3) Materials and methods:

a. Line 58: Please clarify “approximately”.

b. Line 77: It should probably be I(0) = 1 (first round bracket is misplaced).

c. Line 85: Please provide information on the meaning of “inflection” in lay terms (i.e. related to the course of infection/pandemic).

d. Line 126: Please provide some additional information on the computer.

e. Part 0.2.2:

i. Equation after line 192: To stick to the notation throughout the manuscript, please consider replacing s by t, i.e. S(t) and dt.

ii. Line 194: Please consider to replace | by “given”, i.e. “conditional distribution of I(t_j+1) given I(t_j) and R_0”. This would facilitate reading.

iii. Line 196: Please introduce N0.

f. Part 0.2.3:

i. Please introduce s and d.

ii. Line 230: Please delete “obvious”.

iii. Equation (4): Please consider to use additional brackets so that it is clear to which the sum sign belongs.

iv. Lines 243/244: “However, …” instead of “…, however.”.

4) Figure 1:

a. Lines 84/85: Please consider to remove parts of figure descriptions from the main text that should be part of the description accompanying the respective figure itself, i.e. below the figure panel(s).

b. Please introduce the meaning of “inflection”.

5) Table 2: Please clarify the meaning of Y_i (exponentially distributed with a mean of 1). Later-on, it is a mean of 1/gamma (provided as an example). Or other natural numbers. Please consider a consistent notation.

6) Supporting information:

a. Part 1.2:

i. Please provide references for the models and their chosen parametrisation.

ii. Please introduce all parameters in more detail, even if they are introduced in the main text. Providing all definitions facilitates reading. The authors could consider introducing a separate section within 1.2 for definitions. An alternative might be to provide the definitions in the main text, e.g. in a table.

b. Part 1.3:

i. Please provide the partial derivatives and few more steps of the solving process.

Reviewer #2: This manuscript describes an interesting simulation study comparing 6 different methods of estimating the R0 coefficient (WP, secB, ID, IDEA, plug-n-play and fullBayes). The data are simulated via three different compartmental models, SIR, SEIR and SEAIR. Methods are intended to be tested both under the well-specified model and parameters and under the miss-specified ones. The quality of this work is the large range of methods tested, from the more classical and simplified models to the fully Bayesian ones. However, while the idea of comparing the performance of the methods is good and promising and the spectrum of methods compared is broad, the study and manuscript suffer from several weaknesses.

The biggest problem is a misunderstanding of two random duration variables involved in the epidemiological analysis of a pandemic: the infectious period and the serial interval. The first is the random length of time a subject remains infectious, the second is the random time between when the infector develops symptoms and when the infected develops symptoms in a chain of transmission (see for example: Zhou X-H, You C, et al, 2020, the Lancet). These two intervals are in general quite different in mean; for instance for COVID-19 infection the mean infectious period is around 8-10 days (He X, Lau EHY, et al 2020, Nature; Zhou X-H, You C, et al, 2020, the Lancet) while the mean serial interval is around 4-5 days (Nishiura et al 2020, IJID; Du et al, 2020, CDC; Zhou X-H, You C, et al, 2020, the Lancet). The mix-up between these two intervals (and distributions) is evident on page 4 when it says: “The serial distribution is the distribution of the random amount of time that an individual is infected..”.

This inaccuracy has consequences for the simulation study. In fact, data generated according to an SIR model with parameters beta and gamma have, by construction, a mean infectious period of 1/gamma (fixed at 5 days for the simulations). The problem arises when the methods adopted for R0 estimation depend on the serial interval distribution instead of the infectious period distribution, which is the case for WP (White and Pagano 2007), ID and IDEA (Fisman 2013). In these cases the models will not be well specified even when the authors present them as being so. This can explain why in Fig 5, for example, the WP, ID and IDEA methods (lines 1, 3 and 4) seem to perform better when the gamma parameter is incorrect (right panel) than when it is correct (left panel). And comparing Fig 5 and 6 for the same methods, performance is improved when the model is miss-specified (R0 estimated assuming SIR with SEIR data). The authors need to address this point first.
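The statement that an SIR recovery rate gamma implies a mean infectious period of 1/gamma, and the daily-to-weekly conversion (1/5 per day, i.e. 7/5 per week), can be checked with a minimal Python sketch. It assumes exponentially distributed infectious periods, as in a Markovian SIR model; the function name, sample size and seed are illustrative assumptions and do not come from the manuscript.

```python
import random

def mean_infectious_period(gamma, n_draws=100_000, seed=1):
    """Monte Carlo mean of Exponential(gamma) infectious periods."""
    rng = random.Random(seed)
    return sum(rng.expovariate(gamma) for _ in range(n_draws)) / n_draws

# Value quoted in the review text: recovery rate of 1/5 per day.
gamma_per_day = 1 / 5
# The same rate expressed per week (7 days), as needed for weekly reports.
gamma_per_week = 7 * gamma_per_day

mean_days = mean_infectious_period(gamma_per_day)
mean_weeks = mean_infectious_period(gamma_per_week)
print(f"mean infectious period: {mean_days:.2f} days = {mean_weeks:.3f} weeks")
```

The sketch says nothing about the serial interval, which is a separate quantity and would have to be specified or estimated on its own.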

A second point is inherent in the design of the simulation and the presentation of the results. The data are indeed simulated under a single choice of parameters, which may not be sufficient to draw general conclusions. Here, the parameters are chosen with respect to a given infection (influenza). It seems to me that adding other parameter choices would add value to the study. In addition, attention should again be paid to the fact that the gamma parameter refers to the distribution of the infectious period and not to the distribution of the serial interval.

The results are presented by boxplots, which is a good idea. However, on the one hand, some graphs are repeated several times (e.g. the WP case (SD = exp mean 5/7) with the SIR data is repeated 3 times in fig 2, 3 and 5), and I believe that a way could be found to avoid this. On the other hand, the results should also be presented numerically in tables, with for each setting the specification of the bias and variability of the simulated results at the inflection point, or with a summary of both (mean square error).
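The numerical summary suggested here (bias, variability, and mean square error as a joint measure) could be computed along the following lines. This is a sketch only: `summarize_estimates` and the example values are hypothetical and are not taken from the manuscript. It relies on the identity MSE = bias² + variance, which holds when the variance is the population variance of the simulated estimates.

```python
from statistics import fmean, pvariance

def summarize_estimates(estimates, true_r0):
    """Return (bias, variance, mse) of simulated R0 estimates."""
    mean_est = fmean(estimates)
    bias = mean_est - true_r0
    # Population variance of the estimates around their own mean.
    variance = pvariance(estimates, mu=mean_est)
    # Mean squared deviation from the true value.
    mse = fmean((e - true_r0) ** 2 for e in estimates)
    return bias, variance, mse

# Hypothetical estimates from three simulated epidemics with true R0 = 2.
example_estimates = [1.8, 2.1, 2.4]
bias, variance, mse = summarize_estimates(example_estimates, true_r0=2.0)
print(f"bias={bias:.3f}, variance={variance:.3f}, mse={mse:.3f}")
```

One row of such values per method and scenario would give the tabular companion to the boxplots.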

An application to real data would also be interesting, in order to see how different R0 estimations the considered methods can produce on observed incidence data. I would personally be interested in seeing these results for COVID-19 outbreak.

Finally, a thorough review of the English language is necessary.

Specific points:

Page 2, line 26. ….”serial interval, infectious period… “. Please define all quantities when they are introduced

Page 3, line 74. Here gamma is set to 1/3, while in the Result section it is set to 1/5 (or 7/5 with weekly data).

Page 3, line 100. “ODE epidemiological model “. Please define

Page 12, line 391: “Note that here the mean of the serial distribution was incorrect by only two days….”. Here the authors do not comment on the fact that performance is better with the wrong serial distribution (see my comment above). In addition, the amount of miss-specification (2 days) is chosen by the authors, and they can modify it if it does not seem large enough to show an effect. I recommend testing a range of parameter choices.

Page 15, line 490-91. “Asymptomatic infected (infected, no symptoms, not infection)”. Replace with : (infected, no symptoms, infection)

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Miriam Kesselmeier

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Jun 17;17(6):e0269306. doi: 10.1371/journal.pone.0269306.r002

Author response to Decision Letter 0

25 Jan 2022

We thank the reviewers for their comments. We have added new examples to our study. We have also revised the manuscript for enhanced clarity and understanding. We have provided a detailed response to reviewers as an attachment.

Attachment

Submitted filename: ProjectInc_revisions_replytoreviewers.pdf


PLoS One. doi: 10.1371/journal.pone.0269306.r003

Decision Letter 1

Inés P Mariño

4 Mar 2022

PONE-D-21-22343R1

Estimating the basic reproduction number at the beginning of an outbreak

PLOS ONE

Dear Dr. Heffernan,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we consider that the manuscript is much improved but still does not fully meet PLOS ONE’s publication criteria. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised by one of the reviewers.

Please submit your revised manuscript by Apr 18 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Inés P. Mariño, Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: No

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: The authors have greatly improved their manuscript. The additional explanations facilitate reading very much.

However, some issues remain to be solved or should at least be considered:

1) Source code: For reasons of reproducibility, please make the complete source code including the C++ code available.

2) Abbreviations:

a. Please introduce all abbreviations (e.g., SI in the text and IF in table 3).

b. Please consider avoiding some abbreviations (e.g., SI or SD) for a better readability of the text. SI and SD are only used in some places and, hence, might be avoided. This also applies to IF in line 498.

3) Citations:

a. Please introduce space between authors and “(Year)”, e.g. “Anderson and May (1992)” or “Allen et al. (2008)”.

b. Please remove “[” and “]” if the citation is a real part of the text and not only a reference to the reference section. Examples: Lines 23/24 or line 54. The brackets should remain, e.g., in lines 80/81 or line 134.

4) Reference to supplement: Could the authors please provide a more detailed reference to the supplement in the main text, e.g. “see supplemental section 1.1” or “Fig. 7”?

5) Supplement, lines 608-611: Please provide more calculation steps, such that the reader can follow the derivations more easily. This is not for understanding the method but for enabling an easy tracking of the derivations.

6) Please stick to one notation – either influenza 1 and influenza 2 or influenza one and influenza two (main part and supplement).

7) Abstract: Could the authors please add some results and a conclusion to the abstract?

8) Introduction: The authors clarified “early stage”. However, I would like to come back to a point I made on the previous manuscript version. The authors state that they only consider a single wave. Does this imply that (i) the complete pandemic only runs for one wave, (ii) each wave is considered as a new pandemic or (iii) that only the first wave is considered in this manuscript? This question arises, as the authors seem to model influenza seasons as different pandemics (or as different waves). At least for me, a detail is missing. Is there a difference between “wave” and “season” in the application of methods? Please clarify.

9) Methods:

a. Language:

i. Line 105: “equal” instead of “equation”

ii. Line 143: “for our models” or “for the models” instead of “for the our models”

iii. Lines 156/157: Please rephrase “The main difficulty in estimation is that complete data is unavailable for the full epidemiological model is unavailable.” Delete “is unavailable” at the end of the sentence or replace “for” by “if”?

iv. Line 190: Please delete “is a”.

v. Line 196: “)” missing after t_(j+1).

vi. Line 239: I would like to emphasize, that the word “given” instead of “|” would facilitate reading as it is provided in the text and not in a formula. Furthermore, there might be readers that are not so familiar with mathematical notations, but want to read the method section.

vii. Line 353: “…” between “b+1” and “b+B” probably missing.

viii. Line 357: “obtain” instead of “obtained”

ix. Line 363: “obtained” instead of “obtain”

x. Lines 370-373: Is “…” necessary?

b. Format:

i. Lines 106-108: Please consider providing each equation in a separate line instead of in line with the remaining text.

ii. Lines 153, 296, 348: Section number is missing. Please check throughout the manuscript.

c. Definitions / settings:

i. Line 105: Please consider “for all time points t >= 0” instead of “for all t >= 0”.

ii. Line 192: Please consider “t_0 = 0 (beginning of the pandemic), t_1, …” (or something similar) instead of “t_0 = 0, t_1, …” to provide the time origin.

iii. Equation between lines 237 and 238 as well as line 238: Above, t >= 0 is time in the process. Maybe I am missing something, but I would expect a subscript at t as it is related to I(t_(j+1)).

iv. Equation between lines 276 and 277: Probably, s_j is similarly defined as t_j. Please state this briefly.

v. Line 335: Please introduce theta (again).

vi. Line 351: Please introduce B.

d. Please consider providing the link to the github somewhere in the main manuscript.

10) Results:

a. A figure or lines cannot plot (e.g. lines 441-443). Information is, e.g., provided or indicated in a figure. Please rephrase and check throughout the manuscript.

b. Line 449: The comma after the 9 is probably misplaced. Please check.

c. Line 450, suggestion: Replace “5, 6” by “5 and 6” to be consistent with the previously provided numbers.

d. Lines 502-506: Probably “if” instead of “when”.

11) Discussion:

a. Please provide a paragraph about strength and limitations (and approaches to mitigate them) of the investigation (not of the studied estimators) presented in this manuscript – maybe just by reordering of the paragraphs or by highlighting these issues.

b. Line 549: In the discussion, results provided in the result section should be summarised and a reference to the supplementary material should not be necessary.

12) Figures (main part and supplement):

a. In general:

i. Please assure readability of all parts of the plot, including axes and legend.

ii. Please provide axes names (including unit) on the respective axes and not only in the figure description below the plot.

iii. Please avoid overlapping plot symbols in the legend.

iv. Please introduce for each figure all used abbreviations, e.g. SI.

b. Additionally in Figure 1: The legends could be omitted, if the model is provided in the top of each column – similarly to the disease in front of the rows. This would facilitate reading.

c. Additionally in Figure 6:

i. Suggestion: Providing the column information (Canada, provinces) above the columns might facilitate reading, especially in the presence of the axes names (see 12.a.ii).

ii. Lines 517-519 should be part of the figure 6 description itself.

d. Additionally in Figure 8:

i. The legend could be omitted, if the data is provided in front of each row.

ii. Please provide the prior distribution above each column of the plot.

13) Tables (main part and supplement):

a. In general: Please introduce all abbreviations, e.g. SI.

b. Table 1 (c): Please refer to (b) for tuple definition.

c. Table 2: No reference for ID method available?

d. Supplement: Please correct “… denotes a standard deviation great than…” to “denotes a standard deviation greater than”.

Reviewer #2: Thank you for having addressed all of my comments regarding the description of the method and presentation of the results. The manuscript is now easier to understand, thanks also to a revision of the English language.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Miriam Kesselmeier

Reviewer #2: No

PLoS One. 2022 Jun 17;17(6):e0269306. doi: 10.1371/journal.pone.0269306.r004

Author response to Decision Letter 1

18 Apr 2022

We have supplied a rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This is uploaded as a separate file labeled 'Response to Reviewers'.

We have supplied a marked-up copy of our manuscript that highlights changes made to the original version, labeled 'Revised Manuscript with Track Changes'.

We have uploaded an unmarked version of our revised paper without tracked changes, labeled as 'Manuscript'.

Attachment

Submitted filename: main_revrev.pdf


PLoS One. doi: 10.1371/journal.pone.0269306.r005

Decision Letter 2

Inés P Mariño

4 May 2022

PONE-D-21-22343R2

Estimating the basic reproduction number at the beginning of an outbreak

PLOS ONE

Dear Dr. Heffernan,

Please submit your revised manuscript by Jun 18 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Inés P. Mariño, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

**********

6. Review Comments to the Author

Reviewer #1: The authors have addressed almost all of my comments.

An issue that was not addressed, yet, can be found in lines 410-425. There are still "influenza one" and "influenza two", although the authors stated that they have changed all to "influenza 1" and "influenza 2", respectively. Please adapt.

Coming back to strength and limitations in the discussion, my wording was not clear. I am sorry for the inconvenience. My suggestion was to additionally provide a paragraph on strengths and limitations (and approaches to mitigate them) of the conducted study with respect to, e.g., selected scenarios and the selection of the investigated estimators. Additionally to the strengths and limitations of the investigated estimators. Maybe a simple reordering might be a solution for highlighting.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Miriam Kesselmeier

PLoS One. 2022 Jun 17;17(6):e0269306. doi: 10.1371/journal.pone.0269306.r006

Author response to Decision Letter 2

12 May 2022

Dear Editor and Reviewers,

We thank the Editor and both reviewers for their consideration and careful reading of our manuscript. Reviewer #2 is now completely satisfied with the manuscript and is not requesting a single change. Reviewer #1 has requested two minor changes and we have implemented both.

Comment 1: The authors have addressed almost all of my comments.

(a) An issue that was not addressed, yet, can be found in lines 410-425. There are still "influenza one" and "influenza two", although the authors stated that they have changed all to "influenza 1" and "influenza 2", respectively. Please adapt.

We missed these entries in the previous revision. This has been done.

(b) Coming back to strength and limitations in the discussion, my wording was not clear. I am sorry for the inconvenience. My suggestion was to additionally provide a paragraph on strengths and limitations (and approaches to mitigate them) of the conducted study with respect to, e.g., selected scenarios and the selection of the investigated estimators. Additionally to the strengths and limitations of the investigated estimators. Maybe a simple reordering might be a solution for highlighting.

We have summarized some strengths and limitations in a final paragraph to the Conclusion.

Attachment

Submitted filename: reply_to_reviewer.pdf


PLoS One. doi: 10.1371/journal.pone.0269306.r007

Decision Letter 3

Inés P Mariño

19 May 2022

Estimating the basic reproduction number at the beginning of an outbreak

PONE-D-21-22343R3

Dear Dr. Heffernan,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Inés P. Mariño, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0269306.r008

Acceptance letter

Inés P Mariño

7 Jun 2022

PONE-D-21-22343R3

Estimating the basic reproduction number at the beginning of an outbreak

Dear Dr. Heffernan:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Inés P. Mariño

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 File. Supplementary file containing additional simulation results (both tables and boxplots) as well as some further technical details.

(PDF)


S1 Appendix

(PDF)


Attachment

Submitted filename: ProjectInc_revisions_replytoreviewers.pdf



Data Availability Statement

All data are fully available without restriction. The code and data can be found at https://github.com/hannajankowski/R0_estimators_data.

1. Zhao S, Lin Q, Ran J, Musa SS, Yang G, Wang W, et al. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. International Journal of Infectious Diseases. 2020;92:214–217. doi: 10.1016/j.ijid.2020.01.050

2. Tuite AR, Fisman DN. Reporting, epidemic growth, and reproduction numbers for the 2019 novel coronavirus (2019-nCoV) epidemic. Annals of Internal Medicine. 2020;172(8):567–568. doi: 10.7326/M20-0358

3. Knight J, Mishra S. Estimating effective reproduction number using generation time versus serial interval, with application to COVID-19 in the Greater Toronto Area, Canada. Infectious Disease Modelling. 2020;5:889–896. doi: 10.1016/j.idm.2020.10.009

4. Mellan TA, Hoeltgebaum HH, Mishra S, Whittaker C, Schnekenberg RP, Gandy A, et al. Report 21: Estimating COVID-19 cases and reproduction number in Brazil. medRxiv. 2020.

5. Hilton J, Keeling MJ. Estimation of country-level basic reproductive ratios for novel Coronavirus (SARS-CoV-2/COVID-19) using synthetic contact matrices. PLoS Computational Biology. 2020;16(7):e1008031. doi: 10.1371/journal.pcbi.1008031

6. Price DJ, Shearer FM, Meehan MT, McBryde E, Moss R, Golding N, et al. Early analysis of the Australian COVID-19 epidemic. eLife. 2020;9:e58785. doi: 10.7554/eLife.58785

7. Nishiura H, Chowell G, Safan M, Castillo-Chavez C. Pros and cons of estimating the reproduction number from early epidemic growth rate of influenza A (H1N1) 2009. Theoretical Biology and Medical Modelling. 2010;7(1):1–13. doi: 10.1186/1742-4682-7-1

8. Chowell G, Echevarría-Zuno S, Viboud C, Simonsen L, Tamerius J, Miller MA, et al. Characterizing the epidemiology of the 2009 influenza A/H1N1 pandemic in Mexico. PLoS Med. 2011;8(5):e1000436. doi: 10.1371/journal.pmed.1000436

9. Tuite AR, Greer AL, Whelan M, Winter AL, Lee B, Yan P, et al. Estimated epidemiologic parameters and morbidity associated with pandemic H1N1 influenza. CMAJ. 2010;182(2):131–136. doi: 10.1503/cmaj.091807

10. Paine S, Mercer G, Kelly P, Bandaranayake D, Baker M, Huang Q, et al. Transmissibility of 2009 pandemic influenza A (H1N1) in New Zealand: effective reproduction number and influence of age, ethnicity and importations. Eurosurveillance. 2010;15(24):19591. doi: 10.2807/ese.15.24.19591-en

11. Fraser C, Donnelly CA, Cauchemez S, Hanage WP, Van Kerkhove MD, Hollingsworth TD, et al. Pandemic potential of a strain of influenza A (H1N1): early findings. Science. 2009;324(5934):1557–1561. doi: 10.1126/science.1176062

12. Pourbohloul B, Ahued A, Davoudi B, Meza R, Meyers LA, Skowronski DM, et al. Initial human transmission dynamics of the pandemic (H1N1) 2009 virus in North America. Influenza and Other Respiratory Viruses. 2009;3(5):215–222. doi: 10.1111/j.1750-2659.2009.00100.x

13. Chowell G, Blumberg S, Simonsen L, Miller MA, Viboud C. Synthesizing data and models for the spread of MERS-CoV, 2013: key role of index cases and hospital transmission. Epidemics. 2014;9:40–51. doi: 10.1016/j.epidem.2014.09.011

14. Hsieh YH. 2015 Middle East respiratory syndrome coronavirus (MERS-CoV) nosocomial outbreak in South Korea: insights from modeling. PeerJ. 2015;3:e1505. doi: 10.7717/peerj.1505

15. Cauchemez S, Fraser C, Van Kerkhove MD, Donnelly CA, Riley S, Rambaut A, et al. Middle East respiratory syndrome coronavirus: quantification of the extent of the epidemic, surveillance biases, and transmissibility. The Lancet Infectious Diseases. 2014;14(1):50–56. doi: 10.1016/S1473-3099(13)70304-9

16. Riley S, Fraser C, Donnelly CA, Ghani AC, Abu-Raddad LJ, Hedley AJ, et al. Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions. Science. 2003;300(5627):1961–1966. doi: 10.1126/science.1086478

17. Anderson RM, Fraser C, Ghani AC, Donnelly CA, Riley S, Ferguson NM, et al. Epidemiology, transmission dynamics and control of SARS: The 2002–2003 epidemic. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences. 2004;359(1447):1091–1105. doi: 10.1098/rstb.2004.1490

18. Wang W, Ruan S. Simulating the SARS outbreak in Beijing with limited data. Journal of Theoretical Biology. 2004;227(3):369–379. doi: 10.1016/j.jtbi.2003.11.014

19. Dye C, Gay N. Modeling the SARS epidemic. Science. 2003;300(5627):1884–1885. doi: 10.1126/science.1086925

20. Heffernan JM, Smith RJ, Wahl LM. Perspectives on the basic reproductive ratio. Journal of the Royal Society Interface. 2005;2(4):281–293. doi: 10.1098/rsif.2005.0042

21. Diekmann O, Heesterbeek JAP, Metz JA. On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations. Journal of Mathematical Biology. 1990;28(4):365–382. doi: 10.1007/BF00178324

22. Diekmann O, Heesterbeek J, Roberts MG. The construction of next-generation matrices for compartmental epidemic models. Journal of the Royal Society Interface. 2010;7(47):873–885. doi: 10.1098/rsif.2009.0386

23. van den Driessche P, Watmough J. Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission. Mathematical Biosciences. 2002;180(1-2):29–48. doi: 10.1016/S0025-5564(02)00108-6

24. Vegvari C, Abbott S, Ball F, et al. Commentary on the use of the reproduction number R during the COVID-19 pandemic. Statistical Methods in Medical Research. 2021. doi: 10.1177/09622802211037079

25. Heesterbeek J, Dietz K. The concept of R0 in epidemic theory. Statistica Neerlandica. 1996;50(1):89–110. doi: 10.1111/j.1467-9574.1996.tb01482.x

26. Blumberg S, Lloyd-Smith JO. Comparing methods for estimating R0 from the size distribution of subcritical transmission chains. Epidemics. 2013;5(3):131–145. doi: 10.1016/j.epidem.2013.05.002

27. Gallagher S, Chang A, Eddy WF. Exploring the nuances of R0: Eight estimates and application to 2009 pandemic influenza. arXiv preprint arXiv:2003.10442. 2020.

28. Farrington CP, Kanaan MN, Gay NJ. Estimation of the basic reproduction number for infectious diseases from age-stratified serological survey data. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2001;50(3):251–292.

29. White LF, Moser CB, Thompson RM, Pagano M. Statistical Estimation of the Reproductive Number from Case Notification Data. American Journal of Epidemiology. 2021. doi: 10.1093/aje/kwaa211

30. Chowell G, Nishiura H, Bettencourt LM. Comparative estimation of the reproduction number for pandemic influenza from daily case notification data. Journal of the Royal Society Interface. 2007;4(12):155–166. doi: 10.1098/rsif.2006.0161

31. Biggerstaff M, Cauchemez S, Reed C, Gambhir M, Finelli L. Estimates of the reproduction number for seasonal, pandemic, and zoonotic influenza: a systematic review of the literature. BMC Infectious Diseases. 2014;14(1):1–20. doi: 10.1186/1471-2334-14-480

32. Alene M, Yismaw L, Assemie MA, Ketema DB, Gietaneh W, Birhan TY. Serial interval and incubation period of COVID-19: a systematic review and meta-analysis. BMC Infectious Diseases. 2021;21(1):1–9. doi: 10.1186/s12879-021-05950-x

33. Cowling BJ, Fang VJ, Riley S, Peiris JM, Leung GM. Estimation of the serial interval of influenza. Epidemiology (Cambridge, Mass). 2009;20(3):344. doi: 10.1097/EDE.0b013e31819d1092

34. Vink MA, Bootsma MCJ, Wallinga J. Serial intervals of respiratory infectious diseases: a systematic review and analysis. American Journal of Epidemiology. 2014;180(9):865–875. doi: 10.1093/aje/kwu209

35. Park JE, Ryu Y. Transmissibility and severity of influenza virus by subtype. Infection, Genetics and Evolution. 2018;65:288–292. doi: 10.1016/j.meegid.2018.08.007

36. Anderson RM, May RM. Infectious diseases of humans: dynamics and control. Oxford University Press; 1992.

37. Allen LJ, Brauer F, Van den Driessche P, Wu J. Mathematical epidemiology. vol. 1945. Springer; 2008.

38. Keeling MJ, Rohani P. Modeling infectious diseases in humans and animals. Princeton University Press; 2011.

39. Heffernan JM, Wahl LM. Monte Carlo estimates of natural variation in HIV infection. Journal of Theoretical Biology. 2005;236(2):137–153. doi: 10.1016/j.jtbi.2005.03.002

40. Heffernan JM, Wahl LM. Natural variation in HIV infection: Monte Carlo estimates that include CD8 effector cells. Journal of Theoretical Biology. 2006;243(2):191–204. doi: 10.1016/j.jtbi.2006.05.032

[pone.0269306.ref041] 41. Ma J. Estimating epidemic exponential growth rate and basic reproduction number. Infectious Disease Modelling. 2020;5:129–141. doi: 10.1016/j.idm.2019.12.009 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0269306.ref042] 42. White LF, Pagano M. A likelihood-based method for real-time estimation of the serial interval and reproductive number of an epidemic. Statistics in Medicine. 2008;27:2999–3016. doi: 10.1002/sim.3136 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0269306.ref043] 43. Bettencourt LMA, Riberio RM. Real time Bayesian estimation of the epidemic potential of emerging infectious diseases. PLOS ONE. 2008;3. doi: 10.1371/journal.pone.0002185 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0269306.ref044] 44. Fisman DN, Hauck TS, Tuite AR, Greer AL. An IDEA for Short Term Outbreak Projection: Nearcasting Using the Basic Reproduction Number. PLOS ONE. 2013;8. doi: 10.1371/journal.pone.0083622 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0269306.ref045] 45. King AA, Ionides EL, Breto C. Statistical Inference for Partially Observed Markov Processes; 2017. R package version 1.12. [Google Scholar]

[pone.0269306.ref046] 46. O’Neill PD, Roberts GO. Bayesian inference for partially observed stochastic epidemics. Journal of the Royal Statistical Society Series A. 1999;162:121–129. doi: 10.1111/1467-985X.00125 [DOI] [Google Scholar]

[pone.0269306.ref047] 47. Obadia T, Boëlle P. R0: Estimation of R0 and Real-Time Reproduction Number from Epidemics; 2015. R package version 1.2-6. [Google Scholar]

[pone.0269306.ref048] 48. Obadia T, Haneef R, Boëlle P. The R0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks. BMC Med Inform Decis Mak. 2017;12. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0269306.ref049] 49. He D, Ionides EL, King AA. Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. JR Soc Interface. 2010;7. doi: 10.1098/rsif.2009.0151 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0269306.ref050] 50. Nguyen D, Ionides EL, King AA. Statistical Inference for Partially Observed Markov Processes via the R Package pomp. Journal of Statistical Software. 2016;69. [Google Scholar]

[pone.0269306.ref051] 51.Akanteva A, Dick D, Heffernan JM. A database of public healthcare mitigation and relaxation during the COVID-19 pandemic, for all Canadian provinces; 2022. Database access—available on request.

[pone.0269306.ref052] 52. Berry I, Soucy JPR, Tuite A, Fisman D. Open access epidemiologic data and an interactive dashboard to monitor the COVID-19 outbreak in Canada. Cmaj. 2020;192(15):E420–E420. doi: 10.1503/cmaj.75262 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0269306.ref053] 53. Black AJ, Geard N, McCaw JM, McVernon J, Ross JV. Characterising pandemic severity and transmissibility from data collected during first few hundred studies. Epidemics. 2017;19:61–73. doi: 10.1016/j.epidem.2017.01.004 [DOI] [PubMed] [Google Scholar]

[pone.0269306.ref054] 54.World Health Organization and others. The First Few X cases and contacts (FFX) investigation protocol for coronavirus disease 2019 (COVID-19). World Health Organization; 2020.

[pone.0269306.ref055] 55. McLean E, Pebody R, Campbell C, Chamberland M, Hawkins C, Nguyen-Van-Tam J, et al. Pandemic (H1N1) 2009 influenza in the UK: clinical and epidemiological findings from the first few hundred (FF100) cases. Epidemiology & Infection. 2010;138(11):1531–1541. doi: 10.1017/S0950268810001366 [DOI] [PubMed] [Google Scholar]

[pone.0269306.ref056] 56. Boddington NL, Charlett A, Elgohari S, Walker JL, McDonald HI, Byers C, et al. COVID-19 in Great Britain: epidemiological and clinical characteristics of the first few hundred (FF100) cases: a descriptive case series and case control analysis. MedRxiv. 2020. [Google Scholar]

[pone.0269306.ref057] 57. van Gageldonk-Lafeber AB, van der Sande MA, Meijer A, Friesema IH, Donker GA, Reimerink J, et al. Utility of the first few 100 approach during the 2009 influenza A (H1N1) pandemic in the Netherlands. Antimicrobial Resistance and Infection Control. 2012;1(1):1–7. doi: 10.1186/2047-2994-1-30 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0269306.ref058] 58.England HPA. “First Few Hundred” Project Epidemiological Protocols for Comprehensive Assessment of Early Swine Influenza Cases in the United Kingdom; 2009.

59. Ghani A, Baguelin M, Griffin J, Flasche S, van Hoek AJ, Cauchemez S, et al. The early transmission dynamics of H1N1pdm influenza in the United Kingdom. PLoS Currents. 2009;1. doi: 10.1371/currents.RRN1130

60. Pandemic Influenza. Australian Health Management Plan for Pandemic Influenza; 2014. https://www1.health.gov.au/internet/main/publishing.nsf/Content/ohp-ahmppi.htm.
