PLOS ONE. 2024 Jun 21;19(6):e0305550. doi: 10.1371/journal.pone.0305550

ern: An R package to estimate the effective reproduction number using clinical and wastewater surveillance data

David Champredon 1,*, Irena Papst 1, Warsame Yusuf 1
Editor: Salim Heddam 2
PMCID: PMC11192340  PMID: 38905266

Abstract

The effective reproduction number, Rt, is an important epidemiological metric used to assess the state of an epidemic, as well as the effectiveness of public health interventions undertaken in response. An Rt above one indicates that new infections are increasing, and thus the epidemic is growing, while an Rt below one indicates that new infections are decreasing, and so the epidemic is under control. Several established software packages are readily available to statistically estimate Rt from clinical surveillance data. However, there are comparatively few accessible tools for estimating Rt from pathogen concentrations in wastewater, a surveillance data stream that cemented its utility during the COVID-19 pandemic. We present the R package ern, which aims to estimate the effective reproduction number from real-world wastewater or aggregated clinical surveillance data in a user-friendly way.

Introduction

The effective reproduction number, commonly denoted as Rt, is a key metric in epidemiology. It is defined as the average number of new infections generated by an infected individual at time t during an epidemic. It differs from the basic reproduction number, R0, in that it additionally accounts for changes in population susceptibility and transmission at a given point in time. The parameter Rt effectively measures the strength of transmission of an infectious pathogen within a population [1]. The value of Rt has a simple interpretation depending on whether it is greater than, equal to, or less than one: it implies that the number of new infections is either increasing, constant, or decreasing over time, respectively. Usually, Rt is estimated using the daily number of new cases reported via clinical surveillance. The importance of Rt was reinforced during the SARS-CoV-2 pandemic when its estimates supported public health decisions in many jurisdictions worldwide [2].

Wastewater-based epidemiological surveillance emerged as a critical component of the public health arsenal to monitor the COVID-19 pandemic (e.g., [3, 4]), although it has been used since at least the 1940s to monitor poliovirus [5]. While individuals infected with SARS-CoV-2 shed viral particles through various routes (such as urine, saliva, and sputum), stool is the dominant source of the viral material measured by community-level wastewater surveillance [6]. Once shed, viral particles enter the sewer network and reside in wastewater. Wastewater samples are typically collected at treatment plants, and viral RNA is extracted from these samples using various laboratory methods. The concentration of viral RNA in these samples can be quantified using real-time quantitative polymerase chain reaction (RT-qPCR) or droplet digital PCR. This concentration is assumed to be proportional to the infection prevalence in the community living in the catchment area (up to a conversion factor). Fecal shedding occurs passively and irrespective of the symptomatic status of the infected individual [7], although shedding is likely to peak during the symptomatic period [8]. Hence, wastewater surveillance data do not have the same biases as clinical surveillance data, which tend to focus on symptomatic or severe infections.

In light of the utility of using wastewater-based surveillance during the COVID-19 pandemic, this methodology has been applied successfully to several other pathogens: human influenza, respiratory syncytial virus, and mpox are now routinely monitored in wastewater samples in many jurisdictions [9, 10]. Therefore, it is important for the public health community to be able to easily estimate Rt of an infectious disease from wastewater data. Moreover, as wastewater-based epidemiological surveillance expands, public health organizations will likely leverage both clinical and wastewater-based surveillance data to monitor the spread of pathogens. As such, it would be useful to have a tool that estimates Rt concordantly across both of these data sources.

The literature on methods to estimate Rt from clinical data is vast because of the importance of Rt in infectious disease epidemiology (for example [1, 11–15]). By contrast, few studies have attempted to estimate Rt from wastewater data. Huisman et al. [16] proposed a method based on deconvoluting the wastewater signal with the fecal shedding distribution. Previous work has developed epidemic compartmental models that can integrate wastewater-based surveillance [17–19], but Rt cannot be derived explicitly from them (except for [19]). Jiang et al. [20] derived Rt from an artificial neural network, and Amman et al. [21] approximated Rt of SARS-CoV-2 variants from their relative abundance in wastewater samples. While these methods are useful, there have been relatively few efforts to port these theoretical frameworks into user-friendly software that can be applied to real-world wastewater data. One recently released R package, EpiSewer, aims to address this gap [22].

Clinical data are often reported as aggregated cases over a period of time, typically weekly. However, a key parameter in estimating Rt is the distribution of the intrinsic generation interval (defined as the interval between the time when an individual is infected by an infector and the time when this infector was infected). For many infectious pathogens, this interval is on the order of days. Many existing implementations of Rt estimation in R libraries require that the input data (clinical case reports) and the specification of the intrinsic generation interval [23] be on the same timescale (e.g., days). For example, H1N1 influenza has a mean intrinsic generation interval of about 3 days and a maximum value of about 7 days [24, 25]. If the data are reported weekly, the generation interval distribution cannot be meaningfully defined in units of weeks. This is because existing methods require a discrete generation interval distribution, so one cannot simply rescale a continuous distribution to weeks. Hence, before Rt can be estimated with existing methods, the input data must first be disaggregated to a daily scale, which is not a straightforward process.

Several R packages exist to estimate Rt from clinical data. One popular package is EpiEstim, which initially implemented a Poisson-based model of the renewal equation [26]. This package has recently been extended to handle aggregated input data [27]. Briefly, the approach to estimating Rt from aggregated clinical reports (typically reported weekly) relies on an expectation-maximization algorithm to disaggregate the counts into daily case reports, assuming locally exponential growth in transmission. As a result of this assumption, the inferred daily case reports have a piecewise exponential form, which may be problematic for downstream applications. Moreover, EpiEstim does not explicitly handle the time delays typically encountered in practice with epidemiological reports, such as the incubation period and reporting delays (the time between symptom onset and the reporting of a case).

EpiNow2 is another recent R package that aims to improve the estimation of Rt by accounting for, for example, reporting delays and periodicity, as well as by propagating parameter uncertainty [28]. The package also provides tools for short-term forecasting of case reports, but it cannot explicitly handle non-daily (e.g., weekly) reporting. Another R package, epidemia, provides a regression-based framework to estimate Rt from daily clinical data [29]. We note that, while theoretically possible, estimating Rt from wastewater data with EpiNow2 or epidemia is not straightforward, especially for users without a modelling background. Moreover, because both rely on the Bayesian inference software Stan [30], computing times may be long. The R package estimateR is another tool to estimate Rt from clinical data, but it does not explicitly handle wastewater data or aggregated clinical data [31].

Here, we present the R library ern to address the gaps identified above, specifically:

  • to disaggregate the clinical reports into a shorter time unit to enable estimation of Rt using an intrinsic generation interval on a useful timescale;

  • to provide a framework to estimate Rt from wastewater data, consistent with an estimation based on clinical data;

  • to provide a user-friendly interface geared toward public health practitioners who may have limited proficiency in the R programming language;

  • to perform an efficient and rapid Rt estimation.

Table 1 summarises key features of the R packages discussed above, along with the ern package.

Table 1. A comparison of ern with other R packages built to estimate Rt from epidemiological data.

Checkmarks (✓) indicate the presence of a feature and crosses (×) indicate its absence. A cross with an asterisk (×*) denotes a feature that is not built into the package but is technically possible, though not straightforward for the average user (e.g., it may require additional modelling knowledge and/or the use of advanced or less documented features).

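R package | Wastewater concentration (accepted input) | Daily clinical case data (accepted input) | Reporting delays (available feature) | Disaggregate case data (available feature)
ern | ✓ | ✓ | ✓ | ✓
EpiSewer | ✓ | × | ✓ | ×
EpiEstim | × | ✓ | × | ✓
EpiNow2 | ×* | ✓ | ✓ | ×*
epidemia | ×* | ✓ | ✓ | ×*
estimateR | ×* | ✓ | ✓ | ×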

The ern package ultimately relies on EpiEstim for the core of the Rt computation, because EpiEstim already provides a robust implementation of well-tested Rt estimation algorithms that is also one of the fastest available. However, ern wraps complex and critical features for estimating Rt from real-world clinical and wastewater data that have not all been implemented in any single existing R package for Rt estimation.

Materials and methods

The R code for the ern package is available on the Comprehensive R Archive Network at https://cran.r-project.org/web/packages/ern/index.html.

Fig 1 gives a high-level overview of how the ern package computes Rt for both wastewater and clinical input data. The pipeline for each data stream has three components:

  1. Estimating daily incidence from the raw data (wastewater or clinical)

  2. Estimating Rt from the estimated daily incidence

  3. Repeating Rt estimates (previous two components) to generate an ensemble reflecting various sources of uncertainty

Throughout this work, we use the term incidence to denote the “true” underlying incidence of infections, as opposed to reported incidence (from clinical data), which we instead refer to as reports or reported cases.

Fig 1. Overview of the ern data pipeline to estimate Rt.

Dashed elements represent optional components. Layered boxes represent replicates from resampling that inform uncertainty in the final Rt ensemble. Resampled elements include the distributions used in deconvolutions and EpiEstim (sampled from the specified family of distributions for each quantity), the set of inferred daily reports (when these are estimated), and the underreporting proportion.

Estimating daily incidence with wastewater data

Our approach to estimating the daily incidence time series from wastewater data is similar to the one taken in [16], where the concentration of pathogen shed in wastewater, wt, is assumed to be the convolution of the incidence of infections, i, and the fecal shedding distribution f (the relative proportion of pathogen shed in feces as a function of time since infection) of an average infected individual:

$$ w_t = \omega \sum_{k=1}^{t-1} i(t-k)\, f(k) \qquad (1) $$

The function f can be defined such that $\sum_{k>0} f(k) = 1$. The parameter ω denotes how much a single average infection contributes in total to the wastewater concentration over the course of infection, as measured in the sewer system. This parameter captures baseline average shedding, but also reflects the loss of viral particles between the shedding and downstream sampling locations (dependent on the sewer system, environmental factors, and the processing pipeline of the laboratory).
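To make Eq 1 concrete, the base-R sketch below computes the forward model for a hypothetical, exponentially growing incidence curve. The discretization of f (Gamma density at days 1 to 33, renormalized to sum to one), the value of ω, and the helper names make_shedding() and wastewater_signal() are illustrative assumptions, not part of ern's API.

```r
# Minimal sketch of Eq 1: w_t = omega * sum_{k=1}^{t-1} i(t - k) f(k).
# Hypothetical helpers and toy inputs; not part of the ern package.

# Discretized fecal shedding distribution f(k), k = 1..K, summing to 1
make_shedding <- function(K, mean, shape) {
  f <- dgamma(seq_len(K), shape = shape, scale = mean / shape)
  f / sum(f)
}

# Forward convolution of daily incidence with the shedding kernel
wastewater_signal <- function(inc, f, omega) {
  n <- length(inc)
  w <- numeric(n)
  for (t in seq_len(n)) {
    kmax <- min(t - 1, length(f))   # only infections before day t contribute
    if (kmax >= 1) {
      k <- seq_len(kmax)
      w[t] <- omega * sum(inc[t - k] * f[k])
    }
  }
  w
}

f   <- make_shedding(K = 33, mean = 12.9, shape = 1.7)  # values from the SARS-CoV-2 example
inc <- round(100 * exp(0.05 * (0:59)))                  # toy daily incidence
w   <- wastewater_signal(inc, f, omega = 0.5)           # omega chosen arbitrarily
```

Once every lag of f is covered, a constant incidence c yields a constant signal ω·c, which is why ω acts as a conversion factor between infections and measured concentration.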

Since we model the wastewater signal as a convolution of incidence with the fecal shedding distribution, we must perform a deconvolution of the wastewater signal with the fecal shedding distribution to recover incidence for Rt estimation. However, sampled pathogen concentration in wastewater tends to be a noisy signal, so we smooth the time series of concentrations wt. Wastewater samples are often taken a few days a week, so smoothing additionally interpolates the signal on a daily scale, which is a requirement for working with an intrinsic generation interval measured in days to estimate Rt. Hence, we obtain Wt, a daily interpolated and smoothed time series of pathogen concentrations in wastewater:

$$ W_t = \mathrm{smooth\_interpolation}(w_t, \theta) \qquad (2) $$

where θ are the smoothing parameters.

The smoothing algorithms implemented in ern are moving average and LOESS (LOcally Estimated Scatterplot Smoothing), with default values set to provide light smoothing of the time series. The moving average smooths a time series by taking the unweighted mean of all points in a window around each time point. ern users can specify the width of the window and its alignment with respect to the focal time point. The LOESS method is a generalisation of the moving average [32]: it still operates across subsets of the time series, but instead of computing the unweighted mean in each of these windows, it performs a weighted linear regression at each point and returns the predicted value at the focal time point. Points are weighted by their distance from the focal time point, with closer points carrying more weight. The window size is controlled by a span parameter, which ern users can specify, along with a minimum concentration that prevents zero or negative values in the smoothed time series when the input contains low-concentration measurements.
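The two smoothers can be sketched in base R as follows. This is not ern's implementation: the helper names, the edge handling of the moving average (windows truncated at the ends of the series), and the toy data are assumptions. Note that stats::loess() fits local quadratics by default, so degree = 1 is set explicitly to match the local linear regression described above.

```r
# Sketch of the two smoothers described above (not ern's implementation).

# Centered moving average: unweighted mean of the points within
# +/- half_width positions of each focal point (windows truncated at the edges)
moving_average <- function(x, half_width) {
  n <- length(x)
  vapply(seq_len(n), function(t) {
    idx <- max(1, t - half_width):min(n, t + half_width)
    mean(x[idx])
  }, numeric(1))
}

# LOESS fit on irregularly sampled days, predicted on a daily grid
# (which also interpolates), with a floor on the smoothed concentration
loess_daily <- function(day, conc, span = 0.3, floor_value = 5) {
  fit  <- stats::loess(conc ~ day, data = data.frame(day = day, conc = conc),
                       span = span, degree = 1)   # degree = 1: local linear fit
  grid <- seq(min(day), max(day))                 # one point per day
  pred <- stats::predict(fit, newdata = data.frame(day = grid))
  data.frame(day = grid, conc = pmax(pred, floor_value))
}

# Toy data: samples every 3 days, signal dropping towards zero in the second half
set.seed(1)
day  <- seq(1, 88, by = 3)
conc <- pmax(0, 40 * sin(day / 15) + rnorm(length(day), sd = 3))
smoothed <- loess_daily(day, conc, span = 0.3, floor_value = 5)
```

Predicting the LOESS fit on every day between the first and last sample is what turns a few samples per week into the daily series Wt required downstream.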

Finally, to extract the daily incidence i(t), we substitute Wt for wt in Eq 1 and use the Richardson-Lucy algorithm [33–35] to deconvolute Wt, using the fecal shedding distribution f as the kernel:

$$ i(t) = \mathrm{RL}(W, f) \qquad (3) $$

where RL represents the deconvolution algorithm.
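For readers unfamiliar with Richardson-Lucy deconvolution, the following base-R sketch implements its standard multiplicative update for the discrete model of Eq 1 (with ω absorbed into the output). It builds the convolution matrix explicitly, which is only practical for short series, and uses a flat starting value and a fixed number of iterations; these choices, and the name richardson_lucy(), are assumptions rather than ern's implementation.

```r
# Sketch of Richardson-Lucy deconvolution for obs(t) = sum_k kernel(k) u(t - k),
# with delays k = 1..K. Returns NA for days that no observation depends on.
richardson_lucy <- function(obs, kernel, n_iter = 100) {
  n <- length(obs)
  K <- length(kernel)
  # Convolution matrix: P[i, j] = kernel[i - j] when 1 <= i - j <= K
  P <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d <- i - j
      if (d >= 1 && d <= K) P[i, j] <- kernel[d]
    }
  }
  rows <- rowSums(P) > 0      # observations explained by at least one earlier day
  cols <- colSums(P) > 0      # days that contribute to at least one observation
  P <- P[rows, cols, drop = FALSE]
  y <- obs[rows]
  q <- colSums(P)             # total kernel weight received by each day
  u <- rep(mean(y), ncol(P))  # flat, strictly positive starting guess
  for (m in seq_len(n_iter)) {
    pred <- as.vector(P %*% u)                      # re-convolved estimate
    u <- u / q * as.vector(crossprod(P, y / pred))  # multiplicative RL update
  }
  est <- rep(NA_real_, n)
  est[cols] <- u
  est
}

# Usage with toy data: convolve a known incidence curve, then deconvolve it
kernel   <- c(0.2, 0.5, 0.3)
inc_true <- round(50 * exp(0.04 * (0:39)))
obs <- vapply(seq_along(inc_true), function(t) {
  k <- seq_len(min(t - 1, length(kernel)))
  sum(inc_true[t - k] * kernel[k])
}, numeric(1))
est <- richardson_lucy(obs, kernel, n_iter = 200)
```

Each update keeps the total of the re-convolved signal equal to the total of the observations, and the last day cannot be recovered at all because no observation depends on it; more generally, days near the end of the series are the least constrained by the data.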

Estimating daily incidence with clinical data

To estimate a daily incidence time series from daily clinical reports, the reports are first optionally smoothed to remove some noise from the signal. As with wastewater input data, the smoothing algorithms available in ern are LOESS and moving average. The reports are then scaled to account for underreporting, which brings the signal to the scale of actual infections. Next, the smoothed and scaled time series is deconvoluted (as in the wastewater method) using i) a reporting delay distribution kernel and ii) an incubation period distribution kernel. These two deconvolutions estimate the daily “true” incidence (i.e., tallied by date of infection, not by report date).
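The arithmetic behind this step can be sketched in base R. The sketch scales toy daily reports by a reporting fraction drawn from the Uniform(0.1, 0.3) used in the clinical data example, and shows that the total delay from infection to report (incubation period plus reporting delay) follows the convolution of the two discretized delay distributions. ern deconvolutes with the two kernels one after the other; the helper disc_gamma(), its density-based discretization, and the toy counts are hypothetical.

```r
# Hypothetical discretization of a Gamma(mean, shape) on days 1..max, renormalized
disc_gamma <- function(mean, shape, max) {
  p <- dgamma(seq_len(max), shape = shape, scale = mean / shape)
  p / sum(p)
}
incub    <- disc_gamma(mean = 3.49, shape = 8.5, max = 8)  # incubation period
repdelay <- disc_gamma(mean = 5, shape = 25, max = 10)     # reporting delay: sd = 1 gives shape = (5 / 1)^2

# Total delay (incubation + reporting) by direct convolution:
# total[d] = sum over a + b = d of incub[a] * repdelay[b]
total <- numeric(length(incub) + length(repdelay))
for (a in seq_along(incub)) {
  for (b in seq_along(repdelay)) {
    total[a + b] <- total[a + b] + incub[a] * repdelay[b]
  }
}

# Scaling reports up to the infection scale for one sampled reporting fraction
set.seed(7)
reports <- c(120, 135, 150, 160, 158, 149, 140)  # toy daily reports
repfrac <- runif(1, min = 0.1, max = 0.3)
scaled  <- reports / repfrac
```

Because the two delays add, the mean of the total delay equals the sum of the two mean delays, which is roughly how far the two successive deconvolutions shift reports back towards infection dates.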

In some cases, the clinical reporting frequency may not be compatible with the relevant timescale of the intrinsic generation interval distribution. For example, seasonal influenza cases are typically reported on a weekly basis, but its generation interval should be defined in units of days because it is shorter than a week for most cases [24, 25]. (For a detailed discussion on why the reporting frequency and timescale of the intrinsic generation interval must match, see section “Daily incidence to Rt”.) The package ern implements two methods to interpolate aggregate reports and produce inferred daily reports used to compute Rt.

The first method is called the “renewal” method as it involves a statistical model that infers the latent daily reports from aggregate counts using a standard “Susceptible-Infectious-Recovered” (SIR) epidemic model via the renewal equation [36, 37].

This approach ensures the inferred daily reports follow a realistic epidemic curve, as opposed to, e.g., an ad-hoc estimate such as naively dividing weekly reports by 7. A poor approximation of the exponential transmission process of the disease, as reflected in the inferred daily reports, could significantly impact the quality of the Rt estimates. See S1 File for an example.

With the renewal interpolation method, SIR model parameters are fitted to the aggregated (e.g., weekly) clinical reports using a Markov Chain Monte Carlo (MCMC) algorithm and then daily reports are inferred from the fitted model. We use the R package rjags to perform this inference. More details about this statistical model are given in S2 File.
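The sketch below is a generic, deterministic discrete-time renewal model with susceptible depletion, a common way of writing an SIR-type epidemic through the renewal equation. It is not the statistical model of S2 File (which is fitted by MCMC through rjags); it only illustrates how a daily epidemic curve and its weekly aggregates are linked. All parameter values and the function name simulate_renewal() are assumptions.

```r
# Generic renewal model with susceptible depletion (illustrative, not ern's model):
# i(t) = R0 * S(t - 1) / N * sum_k g(k) i(t - k)
simulate_renewal <- function(R0, g, N, i0, n_days) {
  inc <- numeric(n_days)
  inc[1] <- i0
  S <- N - i0                                # susceptibles after seeding
  for (t in 2:n_days) {
    k <- seq_len(min(t - 1, length(g)))
    force <- sum(g[k] * inc[t - k])          # renewal sum over past incidence
    inc[t] <- min(S, R0 * (S / N) * force)   # transmission scaled by S / N
    S <- S - inc[t]
  }
  inc
}

# Generation interval: Gamma(shape 2.4, mean 6 days), truncated at 10 days, renormalized
g <- dgamma(1:10, shape = 2.4, scale = 6 / 2.4)
g <- g / sum(g)
daily  <- simulate_renewal(R0 = 1.5, g = g, N = 1e5, i0 = 10, n_days = 140)
weekly <- as.vector(tapply(daily, (seq_along(daily) - 1) %/% 7, sum))  # 20 weekly totals
```

Fitting a model of this kind to weekly totals yields daily curves with a renewal structure, instead of the flat within-week profile obtained by dividing weekly counts by 7.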

While the renewal method better represents the process that generates observed aggregate case reports, it can be computationally intensive. Thus, we also provide a faster, alternative method using simple linear interpolation, described fully in S3 File.

Daily incidence to Rt

Once daily incidence has been estimated from either data stream, we feed this time series into the function estimate_R() of the package EpiEstim, along with a specified intrinsic generation interval distribution. We use the mean value, as well as the 2.5% and 97.5% quantiles, reported by EpiEstim::estimate_R() as a single estimate of Rt. (Resampling to produce an ensemble Rt estimate is discussed in the next section.)

Underpinning the EpiEstim::estimate_R() estimation of Rt is the following equation, which models incidence at the current time, i(t), in terms of Rt, the generation interval distribution g, and past incidence:

$$ i(t) = R_t \sum_{k \geq 1} g(k)\, i(t-k) \qquad (4) $$

Here, k = 1, 2, … is a discrete-time index: incidence is observed (inferred from reports) at discrete times, i(t − k), weighted by a discrete generation interval distribution g(k), and scaled by Rt to calculate current incidence i(t). In other words, current incidence is a function of past incidence (and the generation interval distribution).
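To illustrate how Eq 4 is turned into an estimate, the following base-R sketch implements the sliding-window Poisson/Gamma estimator of Cori et al. [26] on which EpiEstim is built: with a Gamma prior on Rt and Poisson-distributed incidence, the posterior for a window of days s is Gamma with shape a + Σ i(s) and rate 1/b + Σ Λ(s), where Λ(s) is the sum on the right-hand side of Eq 4. The prior (shape 1, scale 5, which we understand to match EpiEstim's defaults), the 7-day window and the function name are assumptions; this is a teaching sketch, not a replacement for EpiEstim::estimate_R().

```r
# Sketch of the sliding-window renewal-equation estimator of Cori et al.
estimate_rt_window <- function(inc, g, window = 7, prior_shape = 1, prior_scale = 5) {
  n <- length(inc)
  # Total infectiousness Lambda(t) = sum_k g(k) i(t - k), the sum in Eq 4
  lambda <- vapply(seq_len(n), function(t) {
    k <- seq_len(min(t - 1, length(g)))
    sum(g[k] * inc[t - k])
  }, numeric(1))
  t_end <- (window + 1):n                              # windows start on day 2 or later
  res <- t(vapply(t_end, function(te) {
    s <- (te - window + 1):te                          # days in the window ending at te
    shape <- prior_shape + sum(inc[s])                 # Gamma posterior shape
    scale <- 1 / (1 / prior_scale + sum(lambda[s]))    # Gamma posterior scale
    c(mean = shape * scale,
      lwr  = qgamma(0.025, shape = shape, scale = scale),
      upr  = qgamma(0.975, shape = shape, scale = scale))
  }, numeric(3)))
  data.frame(t_end = t_end, res)
}

# Usage: incidence simulated from Eq 4 with a constant R_t = 1.3
g <- dgamma(1:10, shape = 2.4, scale = 2.5)   # Gamma with mean 6 days, truncated at 10
g <- g / sum(g)                               # and renormalized
inc <- numeric(60)
inc[1] <- 100
for (t in 2:60) {
  k <- seq_len(min(t - 1, length(g)))
  inc[t] <- 1.3 * sum(g[k] * inc[t - k])
}
est <- estimate_rt_window(inc, g, window = 7)
```

On noise-free incidence generated with a constant Rt, the posterior mean recovers that value up to the small pull of the prior, which fades as the window accumulates more incidence.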

The discrete timescale used here is not prescribed (i.e., it does not have to be daily, weekly, etc.), but Eq 4 shows that the timescales of the generation interval and of the observed incidence must match. Many infectious diseases, such as influenza and COVID-19, have generation intervals that are mostly shorter than a week, so representing their generation interval distributions on a weekly timescale (e.g., to match weekly reported incidence data input into EpiEstim::estimate_R()) would not yield useful results.

To understand precisely why a coarse generation interval may not yield useful results, consider the example of influenza A/H1N1, whose generation interval is shorter than 7 days in most settings [25], and assume we work with data reported weekly (so the unit of the index k is one week). In this case, we would need to define the generation interval distribution on a weekly scale as g(1) = 1 and g(k) = 0 for all k > 1 (the distribution places no mass beyond one week), and so

$$ i(t) = R_t\, i(t-1) \quad \Longrightarrow \quad R_t = \frac{i(t)}{i(t-1)} \qquad (5) $$

The parameter Rt is often used in public health surveillance to determine whether a disease is spreading or receding in a population by comparing it to the threshold Rt = 1. Under the crude approximation in Eq 5, Rt would exceed 1, indicating that the disease is spreading, exactly when i(t) > i(t − 1), and would fall below 1, indicating that it is receding, exactly when i(t) < i(t − 1). Because the incidence time series (inferred from observed reports) always contains some noise in real data, the approximation in Eq 5 would be unable to distinguish a true increase (or decrease) from noise.
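The following base-R sketch, with made-up numbers, illustrates this argument: a daily generation interval confined to the first week collapses to g(1) = 1 on a weekly grid, and the resulting ratio of consecutive weekly counts crosses 1 back and forth for a series that merely fluctuates around a constant level. The Gamma generation interval (mean 3 days, truncated at 7 days) and the weekly counts are hypothetical.

```r
# Hypothetical daily generation interval: Gamma with mean 3 days, truncated at 7 days
gi_days <- dgamma(1:7, shape = 3, scale = 1)
gi_days <- gi_days / sum(gi_days)

# Re-binning onto weeks puts all of the mass in week 1, i.e. g(1) = 1
week_index <- ceiling((1:7) / 7)
gi_weeks   <- as.vector(tapply(gi_days, week_index, sum))

# With g(1) = 1, Eq 4 collapses to Eq 5: R_t = i(t) / i(t - 1)
weekly    <- c(120, 135, 128, 140, 133)     # toy weekly incidence, roughly flat
rt_crude  <- weekly[-1] / weekly[-length(weekly)]
above_one <- rt_crude > 1                   # flips with every up/down wiggle
```

Here the crude estimate alternates between about 1.13 and 0.95 from one week to the next, although the underlying level barely changes.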

For Rt to be a useful surveillance metric for infectious diseases, the generation interval must be represented on a timescale fine enough to describe the temporal variation of disease transmission. Data for many infectious diseases (especially respiratory ones) are reported on a coarser timescale (e.g., weeks), which is why we have built methods into ern to disaggregate input clinical data (as discussed in section “Estimating daily incidence with clinical data”).

Generating an Rt ensemble reflecting uncertainty

The package ern accounts for various sources of uncertainty in estimating Rt. There is uncertainty in some of the inputs used to estimate daily incidence for each data stream, as well as statistical uncertainty in the step from daily incidence to Rt. The latter is handled by EpiEstim through its Poisson-based model of the renewal equation [26]. The former is handled by ern, which performs the Rt calculation repeatedly and then summarizes the results in an ensemble. Each realization of the ensemble involves (re)sampling each uncertain input.

For the wastewater data, the uncertain inputs can be:

  • the fecal shedding distribution,

  • the intrinsic generation interval distribution.

For the clinical data, the uncertain inputs can be:

  • the inferred daily reports,

  • the underreporting fraction,

  • the incubation period distribution,

  • the reporting delay distribution,

  • the intrinsic generation interval distribution.

Uncertain distributions are specified for ern as a family of distributions, where each distribution parameter has an associated standard deviation. Supported families of distributions include Gamma, Normal, and Log-Normal. A Uniform distribution can also be specified (e.g., for the underreporting proportion). Distribution parameters are assumed to be Gamma-distributed to ensure that sampled values (which specify a sampled distribution) are strictly positive. Inferred daily reports are drawn from the posterior samples produced by the MCMC fit (when these reports are estimated). We sample 300 posterior replicates (using EpiEstim::sample_posterior_R()) from every single estimate of Rt (i.e., each realization of the final Rt ensemble) and calculate, by date, the mean of those posteriors along with the 2.5% and 97.5% quantiles for Rt to produce a single ensemble time series.
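A base-R sketch of the two resampling ingredients described above, under stated assumptions: a distribution parameter with mean m and standard deviation s is drawn from a Gamma distribution by moment matching (shape = m²/s², rate = m/s²), and the ensemble is summarized by pooling posterior draws across realizations and taking, by date, the mean and the 2.5% and 97.5% quantiles. The helper rgamma_ms() and the toy posterior draws are hypothetical and are not ern or EpiEstim code.

```r
# Moment-matched Gamma draws: mean m and sd s give shape = (m / s)^2, rate = m / s^2
rgamma_ms <- function(n, mean, sd) {
  rgamma(n, shape = (mean / sd)^2, rate = mean / sd^2)
}

set.seed(42)
n_real <- 1000
# One sampled fecal shedding distribution per realization (values from dist.fec)
fec_mean  <- rgamma_ms(n_real, mean = 12.9, sd = 1.1)
fec_shape <- rgamma_ms(n_real, mean = 1.7,  sd = 0.27)

# Toy ensemble: 20 realizations x 300 posterior draws of R_t for 5 dates
n_dates <- 5
draws <- lapply(seq_len(20), function(r) {
  centre <- 1.1 + rnorm(1, sd = 0.05)          # realization-level variation
  matrix(rgamma(300 * n_dates, shape = 200, rate = 200 / centre), nrow = 300)
})
pooled <- do.call(rbind, draws)                # 6000 rows, one column per date
ensemble <- data.frame(
  date = seq_len(n_dates),
  mean = colMeans(pooled),
  lwr  = apply(pooled, 2, quantile, probs = 0.025),
  upr  = apply(pooled, 2, quantile, probs = 0.975)
)
```

Because the Gamma draws are strictly positive, every sampled parameter set defines a valid distribution, which is the property the text relies on.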

Results

The package ern has two functions with which to estimate the daily effective reproduction number, Rt, for each supported data stream:

  • estimate_R_ww, which uses the concentration of a pathogen in wastewater over time as the input signal;

  • estimate_R_cl, which uses the count of clinically reported cases over time as the input signal.

We give an illustration of each method below.

Example with wastewater data

The function estimate_R_ww estimates Rt from the pathogen concentration measured in wastewater. Its first input, ww.conc, is a dataframe with columns date (measurement date) and value (concentration value) that specifies the pathogen concentration in wastewater over time. The other inputs dist.fec and dist.gi specify parameters for two families of distributions: one for the fecal shedding rate distribution and the other for the intrinsic generation interval distribution, respectively.

We start by loading a subset of wastewater data included in the ern package. This dataset contains daily average concentration data of SARS-CoV-2 (N2 gene), measured in gene copies per milliliter of wastewater, from the Iona Island wastewater treatment plant in Vancouver, British Columbia, collected between 7 July 2023 and 5 November 2023 [38]. Note that the choice of normalization for the wastewater data (e.g., viral concentration normalized by flow, by other biomarkers, by suspended solids mass, etc.) is left to the user, as this choice depends on each sampling site and on laboratory methods.

R> ww.conc = ern::ww.data

This data is plotted in the top panel of Fig 3.

As this example uses the SARS-CoV-2 virus, we can define fecal shedding and generation interval as the following:

R> dist.fec = ern::def_dist(
+   dist     = "gamma",
+   mean     = 12.9,
+   mean_sd  = 1.1,
+   shape    = 1.7,
+   shape_sd = 0.27,
+   max      = 33
+ )

R> dist.gi  = ern::def_dist(
+   dist     = "gamma",
+   mean     = 6.8,
+   mean_sd  = 0.74,
+   shape    = 2.4,
+   shape_sd = 0.36,
+   max      = 15
+ )

Each distribution family is defined by a structured list:

R> print(dist.fec)
$dist
[1] "gamma"

$mean
[1] 12.9

$mean_sd
[1] 1.1

$shape
[1] 1.7

$shape_sd
[1] 0.27

$max
[1] 33

The first element of each distribution family list, dist, gives the type of the distribution family. Distribution names follow R's nomenclature (e.g., gamma, as in the R functions d/r/q/pgamma). The next four elements give the parameters of this family of distributions (here, the mean and the shape; a Gamma family can also be specified by its mean and standard deviation, sd, as done for the reporting delay in the clinical data example), each with an associated standard deviation (suffix _sd). The final element of this list, max, gives the maximum value to be drawn from this distribution; this is where the density is truncated (and then re-normalized to ensure it still sums to 1). This structure for the distribution list applies to the Gamma, Normal, and Log-Normal families. For Uniform, ern currently supports only the specification of a single distribution (as opposed to a family). In this case, the distribution list specifying a Uniform has three entries: dist, which is equal to “unif”, and then min and max, which specify the minimum and maximum values with non-zero density (i.e., the support of the Uniform distribution).
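To make the role of max concrete, the sketch below turns a list with the same structure as dist.fec into the probability mass of its mean distribution: each day k receives the Gamma probability of the interval (k − 1, k], and the values are truncated at max and renormalized to sum to 1. This interval-based discretization is one common choice and is an assumption here (ern's internal discretization may differ), and discretize_mean_dist() is a hypothetical helper.

```r
# Hypothetical helper: probability mass of the mean distribution of a Gamma family
discretize_mean_dist <- function(d) {
  stopifnot(d$dist == "gamma")
  # Gamma given by mean and shape, or by mean and sd (shape = (mean / sd)^2)
  shape <- if (!is.null(d$shape)) d$shape else (d$mean / d$sd)^2
  scale <- d$mean / shape
  k <- seq_len(d$max)
  p <- pgamma(k, shape = shape, scale = scale) -
       pgamma(k - 1, shape = shape, scale = scale)  # mass on (k - 1, k]
  p / sum(p)                                         # truncate at max, renormalize
}

# Plain lists mirroring the structure printed above (values from the examples)
dist.fec      <- list(dist = "gamma", mean = 12.9, mean_sd = 1.1,
                      shape = 1.7, shape_sd = 0.27, max = 33)
dist.repdelay <- list(dist = "gamma", mean = 5, mean_sd = 1,
                      sd = 1, sd_sd = 0.1, max = 10)
pmf.fec      <- discretize_mean_dist(dist.fec)
pmf.repdelay <- discretize_mean_dist(dist.repdelay)
```

With interval-based bins, the discrete mean sits about half a day above the continuous mean (about 5.5 days for the reporting delay above); this is worth keeping in mind when comparing discretization schemes.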

We can visualize distributions by calling the function plot_dist. This convenience function will plot the mean distribution of the given family, that is, the distribution corresponding to the mean of each distribution parameter in the family. For example, plot_dist(dist.fec) was used to produce Fig 2 from the parameters for SARS-CoV-2 specified above.

Fig 2. Fecal shedding distribution example.


A possible choice for the mean fecal shedding distribution used for SARS-CoV-2 wastewater data.

The function estimate_R_ww also takes a number of parameters that give the user control over various components of the Rt estimation:

  • scaling.factor is the average number of infections attributable to a unit of pathogen concentration per day. This quantity is typically estimated from i) clinical cases, ii) wastewater concentrations and iii) an “ascertainment rate” that estimates the number of infections missed by clinical surveillance (for example, using serological data).

  • prm.smooth defines the smoothing settings for the input wastewater data;

  • prm.R defines the settings for the Rt estimates.

# Initializing scaling factor
R> scaling.factor = 1

# Initializing smoothing parameters
R> prm.smooth = list(
+   method = 'loess',  # smoothing method
+   align  = 'center', # smoothing alignment
+   span   = 0.30,     # smoothing span (used for loess smoothing only)
+   floor  = 5         # minimum smoothed concentration value
+                      # (optional, LOESS smoothing only)
+ )

# Initializing Rt settings
R> prm.R = list(
+   iter   = 20,           # number of resampling iterations 
+                          # to evaluate Rt ensemble
+   CI     = 0.95,         # confidence interval
+   window = 10,           # backward time window for Rt calculations
+   config.EpiEstim = NULL # optional EpiEstim configuration
+                          # for Rt calculations
+ )

Once we have specified all of these settings, we can feed them in, along with the input wastewater concentration data and the relevant distributions, to estimate Rt:

R> r.estim = estimate_R_ww(
+   ww.conc        = ww.conc,
+   dist.fec       = dist.fec,
+   dist.gi        = dist.gi,
+   scaling.factor = scaling.factor,
+   prm.smooth     = prm.smooth,
+   prm.R          = prm.R
+ )

estimate_R_ww returns a list with four elements:

  • ww.conc: the original input of pathogen concentration in wastewater over time

  • ww.smooth: the smoothed wastewater concentration over time; includes columns:
    • t: internal time index
    • obs: smoothed value of the observation
    • date
  • inc: the daily incidence inferred over time; includes columns:
    • date
    • mean: mean of the inferred daily incidence
    • lwr, upr: lower and upper bounds of the 95% confidence interval for the inferred daily incidence
  • R: the estimated daily reproduction number over time; includes columns:
    • date
    • mean: mean Rt value
    • lwr, upr: lower and upper bounds of the confidence interval (width as specified in prm.R) for Rt

The function plot_diagnostic_ww conveniently displays all of the output data to help assess the quality of the Rt estimates (Fig 3).

Fig 3. Output of the function plot_diagnostic_ww.


The top panel shows the wastewater concentration data used as input (step line) along with the smoothed version of this time series (curve). The middle panel represents the daily incidence inferred from the smoothed wastewater concentration data (using the Richardson-Lucy deconvolution algorithm). The grey band gives a confidence band reflecting the uncertainty associated with the fecal shedding distribution. (The confidence width is set with prm.R$CI.) The estimated incidence is proportional to the parameter scaling.factor, here assumed equal to 1. The bottom panel shows the mean Rt estimates (solid line), along with a 95% confidence interval (grey band) reflecting various sources of uncertainty. The horizontal dashed line represents the Rt threshold value of 1, which is epidemiologically important.

Example with clinical data

As shown in Fig 1, a key feature implemented in ern is the ability to handle clinical data that is reported on a time scale that is coarser than the typical generation interval timescale when estimating Rt.

The function estimate_R_cl requires a data frame, cl.data, with one column for the report date (date) and another for the count of clinical reports (value). In addition, the user must specify a reporting fraction distribution (dist.repfrac) and three distribution families:

  • dist.repdelay: reporting delay;

  • dist.incub: incubation period;

  • dist.gi: intrinsic generation interval.

If input clinical reports are not reported daily, an additional parameter must be provided: popsize, representing the size of the population being considered, in order for daily reports to be inferred using the “renewal” method (see S1 File).

A sample of Canadian COVID-19 clinical reports is included in ern. This data set includes weekly reports from the provinces of British Columbia, Alberta, Saskatchewan, Manitoba, Ontario, and Quebec, between 1 Feb 2020 and 1 Apr 2023 [39]. As an example, we start by loading a subset of the weekly clinical report data for Quebec:

R> # --- data
+ dat <- (ern::cl.data
+   |> dplyr::filter(
+        pt == "qc", 
+        dplyr::between(date, 
+                       as.Date("2021-06-01"),
+                       as.Date("2021-09-01"))
+   )
+ )

We define distributions for the reporting fraction, reporting delay, incubation period, and intrinsic generation interval:

R> # --- distributions
+ # reporting fraction
+ dist.repfrac = ern::def_dist(
+     dist = "unif",
+     min  = 0.1,
+     max  = 0.3
+ )
+ # reporting delay
+ dist.repdelay = ern::def_dist(
+     dist    = 'gamma',
+     mean    = 5, 
+     mean_sd = 1,
+     sd      = 1,
+     sd_sd   = 0.1,
+     max     = 10
+ )
+ # incubation period
+ dist.incub = ern::def_dist(
+     dist     = "gamma",
+     mean     = 3.49,
+     mean_sd  = 0.1477,
+     shape    = 8.5,
+     shape_sd = 1.8945,
+     max      = 8
+ )
+ # generation interval
+ dist.gi = ern::def_dist(
+     dist     = "gamma",
+     mean     = 6,
+     mean_sd  = 0.75,
+     shape    = 2.4,
+     shape_sd = 0.3,
+     max      = 10
+ )

The data set we are working with reports COVID-19 cases on a weekly basis, which is substantially longer than the typical generation interval of about 5 days for SARS-CoV-2 [40]. ern will therefore estimate daily incidence from these non-daily data. We specify the settings for this inference via prm.daily:

R> # --- settings
+ # daily report inference
+ prm.daily <- list(
+     method  = "renewal",
+     popsize = 8.5e6,     # Q3 (July 1) 2022 estimate for Quebec
+     burn    = 500,       # "burn-in" for MCMC
+     iter    = 500,       # MCMC iterations after burn-in
+     chains  = 2,         # number of chains
+     # priors for the R0 distribution (Gamma)
+     prior_R0_shape = 1.1, prior_R0_rate = 0.6,
+     # priors for the alpha distribution 
+     prior_alpha_shape = 1, prior_alpha_rate = 1
+ )

The setting method = “renewal” selects the renewal-equation-based epidemic model fitted with an MCMC algorithm, described fully in S1 File. This algorithm requires the total population size, which we source from Statistics Canada for this example [41]. The remaining arguments in prm.daily are settings for the MCMC algorithm. The output of estimate_R_cl() has an element called diagnostic.mcmc, which contains objects that help assess the convergence of the MCMC algorithm. In particular, a warning message is displayed if the Gelman-Rubin statistic [42] of the latent daily incidence variable exceeds 1.025, prompting the user to increase the number of MCMC iterations.
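For readers unfamiliar with this convergence criterion, the sketch below computes the basic potential scale reduction factor of Gelman and Rubin [42] for one scalar parameter from several chains stored as matrix columns. It is the classic textbook form without a degrees-of-freedom correction. Tools such as coda's gelman.diag(), commonly used with rjags output, may apply further corrections, so their values can differ slightly. The helper name gelman_rubin is hypothetical. The 1.025 threshold is the one quoted above.

```r
# Hypothetical helper: potential scale reduction factor (R-hat) for one
# parameter; `draws` is an n x m matrix (n post-burn-in iterations, m chains).
gelman_rubin <- function(draws) {
  n <- nrow(draws)
  B <- n * var(colMeans(draws))           # between-chain variance
  W <- mean(apply(draws, 2, var))         # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n     # pooled posterior variance estimate
  sqrt(var_plus / W)
}

set.seed(42)
# Two well-mixed chains sampling the same distribution: R-hat close to 1
good <- cbind(rnorm(500), rnorm(500))
# Two chains stuck around different values: R-hat well above 1.025
bad  <- cbind(rnorm(500, mean = 0), rnorm(500, mean = 3))
c(good = gelman_rubin(good), bad = gelman_rubin(bad))
```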

After the daily reports are inferred, ern re-aggregates the posterior daily reports on the input reporting schedule and checks that they do not differ too much from the observed aggregated reports. The parameter agg.reldiff.tol is the maximum tolerance (as a percentage) accepted for the relative difference between the observed and posterior aggregates:

R> # daily report inference check
+ prm.daily.check <- list(
+     agg.reldiff.tol = 10
+ )

Because the Bayesian model tends to be most error-prone at the start of the input time series, ern uses this check to drop any inferred values before the differences first fall below the specified tolerance. It does not filter out any observations after that point, so that the inferred time series remains daily. ern also issues a warning stating how many observations were dropped, along with advice on how to improve the accuracy of the MCMC fit and so reduce the number of dropped observations.
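The check-and-trim logic described above can be sketched in a few lines of base R. This sketch makes two assumptions. First, the relative difference for each reporting period is 100 × |posterior − observed| / observed. Second, every period before the first one within tolerance is dropped. ern's exact definition may differ. The helper first_kept_period and the numbers are hypothetical.

```r
# Hypothetical helper: index of the first aggregation period whose
# re-aggregated posterior mean is within `tol_pct` percent of the observed
# aggregate; inferred daily values before that period are dropped.
first_kept_period <- function(obs, post_mean, tol_pct = 10) {
  reldiff <- 100 * abs(post_mean - obs) / obs   # relative difference (%)
  ok <- which(reldiff <= tol_pct)
  if (length(ok) == 0) return(NA_integer_)      # nothing within tolerance
  ok[1]                                         # later periods are all kept
}

obs       <- c(120, 150, 200, 260, 300, 280)   # observed weekly reports
post_mean <- c(170, 185, 210, 250, 325, 281)   # re-aggregated posterior means
k <- first_kept_period(obs, post_mean, tol_pct = 10)
keep <- seq_along(obs) >= k   # periods whose inferred daily values are kept
k
```

Here the first two weeks differ by about 42% and 23%, so they are dropped. From the third week onwards everything is kept, which keeps the retained daily series contiguous.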

Choosing a relatively small number of MCMC iterations (for example, to keep computation time short) may yield posterior daily reports that are not very smooth, which can in turn degrade the quality of the Rt estimates. Hence, ern can smooth the posterior daily reports before Rt is estimated. The smoothing parameters are defined as follows:

R> # smoothing
+ prm.smooth <- list(
+     method = "rollmean",
+     align  = "center",
+     window = 7
+ )

In the example above, the smoothing performs a centered moving average with a sliding window of 7 days. The same smoothing options are available across the wastewater and clinical methods.
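To illustrate what align = "center" and window = 7 mean, the sketch below writes a centered moving average with base R's stats::filter(), assuming an odd window length. ern's own "rollmean" method may treat the ends of the series differently. Here, days without a complete window are returned as NA.

```r
# Centered moving average over an odd-length window: with window = 7, the value
# on day t is the mean of days t-3, ..., t+3. Days too close to either end of
# the series (incomplete window) are returned as NA.
centered_rollmean <- function(x, window = 7) {
  stopifnot(window %% 2 == 1)
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 2))
}

x <- c(10, 12, 9, 14, 30, 11, 13, 12, 10, 15)
y <- centered_rollmean(x, window = 7)
round(y, 2)
```

The spike on day 5 (value 30) is spread over the surrounding week. This is why smoothing noisy posterior daily reports tends to stabilize the subsequent Rt estimates.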

We specify the parameters for the Rt ensemble, just as we did in the wastewater example:

R> # Rt computation
+  prm.R <- list(
+    iter            = 20,
+    CI              = 0.95, 
+    window          = 7,
+    config.EpiEstim = NULL
+  )
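The window and config.EpiEstim entries point to the sliding-window estimator of Cori et al. [12] implemented in EpiEstim. As a rough illustration of what window = 7 controls, the sketch below computes the posterior mean of Rt from daily incidence and a discrete generation-interval distribution. It uses a Gamma prior with shape 1 and scale 5 (mean 5, standard deviation 5), which we believe matches EpiEstim's default. This is a simplified re-implementation for exposition, not the code that ern or EpiEstim runs. It also ignores the ensemble over uncertain distributions that ern builds (controlled by iter). The helper cori_rt_mean is hypothetical.

```r
# Posterior mean of R_t for the window of `window` days ending on day t:
#   (a + sum of incidence) / (1/b + sum of infection potential Lambda),
# where Lambda_s = sum_k I_{s-k} w_k and w is the discrete generation-interval
# distribution on days 1, 2, ... (Gamma(shape a, scale b) prior on R_t).
cori_rt_mean <- function(inc, w, window = 7, a = 1, b = 5) {
  n <- length(inc)
  lambda <- numeric(n)
  for (s in seq_len(n)[-1]) {
    k <- seq_len(min(s - 1, length(w)))
    lambda[s] <- sum(inc[s - k] * w[k])   # infection potential on day s
  }
  rt <- rep(NA_real_, n)
  if (n > window) {
    for (t in (window + 1):n) {           # windows lie within days 2..n
      idx <- (t - window + 1):t
      rt[t] <- (a + sum(inc[idx])) / (1 / b + sum(lambda[idx]))
    }
  }
  rt
}

# Crude discrete generation interval from a Gamma(shape 2.4, mean 6) density,
# truncated at 10 days and renormalized
w <- dgamma(1:10, shape = 2.4, rate = 2.4 / 6)
w <- w / sum(w)
flat    <- rep(100, 60)                       # constant incidence
growing <- round(100 * exp(0.05 * (1:60)))    # exponential growth
c(flat = cori_rt_mean(flat, w)[60], growing = cori_rt_mean(growing, w)[60])
```

Constant incidence gives Rt close to 1 and exponential growth gives Rt above 1. A longer window averages over more days, which yields smoother but less time-resolved estimates.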

Finally, we can call the main ern function to estimate Rt from clinical data:

R> r.estim <- ern::estimate_R_cl(
+   cl.data         = dat,   # the Quebec subset selected above
+   dist.repdelay   = dist.repdelay,
+   dist.repfrac    = dist.repfrac,
+   dist.incub      = dist.incub,
+   dist.gi         = dist.gi,
+   prm.daily       = prm.daily,
+   prm.daily.check = prm.daily.check,
+   prm.smooth      = prm.smooth,
+   prm.R           = prm.R
+ )

estimate_R_cl returns a list with five elements:

  • cl.data: the original input of clinical disease reports over time, with an added column t for an internal time index

  • cl.daily: reports as input for Rt calculation (inferred daily counts if original inputs were aggregates, smoothed if specified); includes columns:
    • id: identifier for each realization (resampling iteration) of the daily report inference
    • date: daily date
    • value: inferred daily report count
    • t: internal time index
  • inferred.agg: inferred daily reports re-aggregated on the reporting schedule as input in cl.data; includes columns:
    • date: report date
    • obs: original (aggregated) observations
    • mean.agg: mean of the aggregated posterior daily reports
    • lwr.agg, upr.agg: lower and upper bounds of a 95% confidence interval of the aggregated inferred daily reports
  • R: the estimated daily reproduction number over time; includes columns:
    • date
    • mean: mean Rt value
    • lwr, upr: lower and upper bounds of a confidence interval for each Rt estimate
    • use: logical flag, FALSE denotes estimated Rt values that may be particularly unreliable as they fall within the maximum time range of one (truncated) generation interval from the start of the clinical report time series
  • diagnostic.mcmc: a list with various MCMC diagnostics, including
    • plot.traces: trace plots for fitted parameters
    • plot.gelmanrubin: plot of the Gelman-Rubin statistic for each fitted parameter
    • jags.obj: the JAGS output mcmc.list object, as produced by rjags [43]

The function plot_diagnostic_cl summarises this output (Fig 4).

Fig 4. Output of the function plot_diagnostic_cl.


The top panel shows the observed case report data used as input. The second panel from the top shows the daily reports, smoothed and, in this case, inferred from the input aggregate (weekly) reports. When such inference is made, this panel also summarises the ensemble of daily report time series with a grey band whose limits are the 2.5% and 97.5% percentiles for each day. The second panel from the bottom appears only when the input data are coarser than daily. It compares the observed (aggregate) reports (black points) with aggregates of the inferred daily reports (red points with 95% confidence bars), so that the user can check whether the inferred daily reports are plausible given the input data. The bottom panel shows the mean Rt estimates (solid line), along with a 95% confidence interval (grey band) reflecting various sources of uncertainty. The horizontal dashed line marks Rt = 1, the epidemiologically important threshold between a growing and a declining epidemic.
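The grey band in the second panel is a day-by-day percentile summary of an ensemble of daily report series. The sketch below assumes an ensemble in the long format described for cl.daily (columns id, date, value) and computes the 2.5% and 97.5% percentiles for each date with base R. The simulated Poisson data are purely illustrative and stand in for real ern output.

```r
# Simulated stand-in for an ensemble of daily report series in long format
# (one row per realization `id` and `date`), mimicking the cl.daily layout.
set.seed(7)
n_real <- 200
dates <- seq(as.Date("2021-06-01"), by = "day", length.out = 5)
ens <- data.frame(
  id    = rep(seq_len(n_real), each = length(dates)),
  date  = rep(dates, times = n_real),
  value = rpois(n_real * length(dates), lambda = 50)
)

# 2.5% and 97.5% percentiles of `value` for each date (one column per date)
q <- sapply(split(ens$value, ens$date), quantile, probs = c(0.025, 0.975))
band <- data.frame(date = as.Date(colnames(q)),
                   lwr  = unname(q[1, ]),
                   upr  = unname(q[2, ]))
band
```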

Sensitivity analysis for wastewater Rt

Because methods for estimating Rt from wastewater data are still relatively new, we perform a sensitivity analysis of wastewater-based Rt estimates to various input choices. The code to replicate these results is provided in S5 File.

The package ern currently includes two smoothing methods: rolling mean and LOESS. With comparable smoothness settings (a centered rolling mean over a 5-day window, and a LOESS span of 0.3), the resulting Rt estimates are similar (Fig S4–1 in S4 File).
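The two smoothers can be compared in spirit with base R. The sketch below smooths a noisy synthetic concentration series with a centered 5-day rolling mean and with stats::loess() using span = 0.3, the settings quoted above. The synthetic wave, the multiplicative noise model, and loess's default local quadratic fit are our assumptions. This is not the code in S5 File, and ern's own smoothing code may differ in detail.

```r
set.seed(3)
day    <- 1:120
signal <- 100 * exp(-((day - 60) / 20)^2)           # one synthetic wave
conc   <- signal * exp(rnorm(length(day), 0, 0.3))  # multiplicative noise

# Centered 5-day rolling mean (NA for the first and last two days)
roll <- as.numeric(stats::filter(conc, rep(1 / 5, 5), sides = 2))
# LOESS with span = 0.3 (stats::loess default: local quadratic, degree = 2)
lo <- stats::predict(stats::loess(conc ~ day, span = 0.3))

# Root-mean-square distance from the noise-free wave (NAs ignored)
rmse <- function(x) sqrt(mean((x - signal)^2, na.rm = TRUE))
c(raw = rmse(conc), rollmean = rmse(roll), loess = rmse(lo))
```

Both smoothers track the underlying wave much more closely than the raw series. In practice, the span and window settings trade noise reduction against responsiveness to genuine changes.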

Because clinical studies are scarce, the temporal profile of fecal shedding for respiratory infections is quite uncertain. Hence, Fig S4–2 in S4 File shows how substantially the Rt estimates can be affected by assuming different shapes for the fecal shedding distribution (Gamma, normal, uniform, and exponential).

When the prevalence of infections in the population of interest is low, the epidemic “signal” (a low count of clinical reports and/or a low viral concentration in wastewater) is dominated by noise, and estimating Rt may be challenging. In Fig S4–3 in S4 File, we illustrate this with wastewater data by estimating Rt from sample data multiplied by a factor of 0.01, 0.1, 1, and 10. The Rt estimates are similar for multipliers 1 and 10, but very different (and unreliable) for multipliers 0.1 and 0.01, confirming the difficulty of estimating Rt when prevalence is (very) low.
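A back-of-the-envelope argument shows why low signals are hard to work with. Suppose observed counts are roughly Poisson around their expected value λ. Their relative noise (coefficient of variation) is then 1/√λ, so multiplying the signal by 0.01 multiplies the relative noise by 10. The sketch below checks this by simulation. The Poisson assumption and the base level of 200 are ours. The argument applies most directly to clinical counts; wastewater measurements have additional noise sources.

```r
# Coefficient of variation of Poisson counts at different expected levels;
# theory says sd / mean = 1 / sqrt(lambda).
set.seed(11)
base_level  <- 200                       # hypothetical expected daily count
multipliers <- c(0.01, 0.1, 1, 10)
cv <- sapply(multipliers, function(m) {
  x <- rpois(1e5, lambda = base_level * m)
  sd(x) / mean(x)
})
data.frame(multiplier     = multipliers,
           simulated_cv   = round(cv, 3),
           theoretical_cv = round(1 / sqrt(base_level * multipliers), 3))
```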

Computing time benchmarks

Rapid Rt estimation can be important in some cases, such as during an epidemic being monitored daily in order to follow its evolution closely and assess the success of ongoing interventions meant to reduce transmission. Here, a computation time of less than a day is key. Rapid Rt calculation is also important in cases where there are many input datasets. For instance, if one is calculating Rt with wastewater data across an entire country, they may wish to do so by computing one Rt per wastewater sampling location (it can be difficult to meaningfully combine wastewater data sampled from different sites into a single signal). Here, it is important for the Rt calculation to be quick so that one can produce Rt estimates for a large number of wastewater sampling locations in a reasonable amount of time.

As an example, Table 2 shows the times taken to calculate Rt with different R packages from either weekly clinical case reports or viral concentration in wastewater. These times are only meant to illustrate orders of magnitude and do not constitute a thorough benchmark. In this example, wastewater-based Rt estimation takes about one second with ern compared with about four minutes with EpiSewer. EpiSewer uses Hamiltonian Monte Carlo (via Stan) to estimate latent variables, which is much more computationally intensive than the simple deconvolution performed in ern. For Rt estimation based on weekly clinical reports, the computing time is on the order of one second for both ern (using the linear method) and EpiEstim, and about 20 seconds for ern using the renewal method. The code to reproduce this example is given in S6 File.

Table 2. Sample computing times (in seconds) for Rt estimates using different R packages.

The wastewater data is taken from the data set shipped with the package, which consists of four months of daily SARS-CoV-2 concentration measurements for the city of Zurich. The clinical data are simulated weekly reports. See S6 File for more details.

R package   Data type    Daily report inference method   Compute time (s)
ern         wastewater   -                                 0.85
EpiSewer    wastewater   -                               251.23
ern         clinical     linear                            1.89
ern         clinical     renewal                          20.72
EpiEstim    clinical     expectation maximization          0.81
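Timings like those in Table 2 can be collected with base R's system.time(). The sketch below wraps an expression in a small helper that reports the median elapsed time over a few repetitions, which is less sensitive to one-off delays than a single run. The helper time_median and the placeholder workload are hypothetical. The calls actually benchmarked are given in S6 File.

```r
# Hypothetical helper: median elapsed wall-clock time (in seconds) of `expr`
# over `reps` repetitions; `expr` is captured unevaluated and re-run each time.
time_median <- function(expr, reps = 3) {
  e   <- substitute(expr)
  env <- parent.frame()
  elapsed <- replicate(reps, system.time(eval(e, env))[["elapsed"]])
  median(elapsed)
}

# Placeholder workload standing in for an Rt estimation call
time_median(sort(runif(1e6)), reps = 3)
```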

Discussion and conclusions

The R package ern was designed with public health practitioners in mind, specifically to provide them with a user-friendly tool to estimate the effective reproduction number Rt from typical clinical reports and/or data on pathogen concentration in wastewater. The inference of Rt relies on various distributions (e.g., fecal shedding, incubation period, generation interval) that are rarely known precisely. To reflect this uncertainty, these distributions are defined as families of distributions, and the estimation process samples from these families to propagate this source of uncertainty into the final Rt estimates. Clinical cases of infectious diseases are rarely reported daily, even though the day is the most natural time unit in Rt models (at least for respiratory diseases). The package ern accepts non-daily clinical reports and can infer daily incidence using a mechanistic transmission model.

The methods implemented in ern to estimate Rt from clinical or wastewater data are similar to other existing methods. For example, the deconvolution of the incubation period and reporting delays in ern uses the same Richardson-Lucy algorithm as in [15, 16]. Smoothing wastewater data with LOESS or a rolling mean to reduce noise before analysis is also widely used. In short, ern builds on previous work, and its scientific contribution lies in bringing these different methodological approaches together in a single, consistent, user-friendly package.

There are several limitations of the ern package. For clinical inputs, the renewal method depends on JAGS, which may not be straightforward to install for the average user. The computing time when using aggregated clinical reports and the renewal method may be too long for some applications. Moreover, the renewal method does not have a time-dependent transmission parameter in its current implementation, so estimating Rt using this method is appropriate for a single epidemic wave without any significant change in transmission (for example, a typical seasonal influenza wave in a non-tropical region). The linear method can handle temporal changes in transmission, though it may not always infer a realistic epidemic curve for inferred daily reports.

Another limitation is that ern does not treat latent incidence as a random variable when estimating Rt from wastewater data (unlike, for example, the R package EpiSewer), so this source of uncertainty is not accounted for. Although uncertainty in the fecal shedding distribution is propagated, this does not capture the full extent of the uncertainty. This can be problematic for real-time surveillance, because the uncertainty of Rt estimates may be underestimated for dates close to the time of estimation.

For wastewater inputs, the scaling factor used to convert between prevalence and viral concentration in wastewater is difficult to estimate in practice. Ideally, one would need, over multiple days, i) an accurate estimate of the actual prevalence in the catchment area from (extensive) clinical surveillance and ii) viral concentration measurements over the same period. The scaling factor would then be proportional to the ratio of prevalence over concentration (and depending on the laboratory method used to measure the viral concentration, additional normalization, for instance by flow rate or suspended solid mass, may be required).
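As a worked illustration of this ratio, suppose paired daily estimates of prevalence P_t and normalized viral concentration C_t are available over several days. One simple estimator is the least-squares slope through the origin, k = Σ P_t C_t / Σ C_t², so that P_t ≈ k C_t. A median of the daily ratios P_t / C_t is a more outlier-robust alternative. Both estimators are our assumptions for illustration and are not prescribed by ern. All numbers below are made up, and the concentration units are arbitrary.

```r
# Made-up paired observations over six days
prevalence    <- c(1200, 1500, 1800, 2100, 2500, 2700)  # infected individuals
concentration <- c(40, 52, 58, 71, 80, 92)              # normalized, arbitrary units

# Least-squares fit of prevalence = k * concentration (no intercept)
k_ls  <- sum(prevalence * concentration) / sum(concentration^2)
# Outlier-robust alternative: median of the daily ratios
k_med <- median(prevalence / concentration)

c(k_ls = k_ls, k_med = k_med)
```

The closed-form slope is identical to coef(lm(prevalence ~ 0 + concentration)). Either estimate is only as good as the prevalence estimates behind it, which is exactly the practical difficulty noted above.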

ern currently allows users to define a single distribution for fecal shedding kinetics. However, studies examining SARS-CoV-2 shedding have shown that fecal shedding kinetics can vary among infected individuals [44, 45]. Moreover, the scaling factor in ern is held constant over time. This may not be realistic as new viral lineages emerge and the immune profile of the population evolves, since both factors can affect pathogen shedding in wastewater. As a result, the “inferred incidence” estimated by ern (the output estimate_R_ww(…)$inc) must be interpreted with care.

Wastewater sample concentration can also be affected by environmental and structural factors of sewer systems. Flow from rainfall and snowmelt can dilute sample concentration readings [46] and sewer transit time can impact the rate at which viral particles degrade prior to sample collection [47].

Future versions of ern will attempt to address the above limitations.

In conclusion, the R package ern aims to provide a relatively user-friendly environment to empower public health professionals with a tool to estimate the effective reproduction number Rt from clinical and wastewater-based data.

Computational details

The results in this paper were obtained using R version 4.3.1 with packages EpiEstim version 2.4, rjags version 4-14, and the software JAGS version 4.3.1. R itself and all packages used are available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/.

Supporting information

S1 File. Methodological differences when inferring daily incidence.

(PDF)

S2 File. Bayesian model to infer daily clinical report count.

(PDF)

S3 File. Linear interpolation to infer daily clinical report count.

(PDF)

S4 File. Sensitivity analysis to selected parameters.

(PDF)

S5 File. R code to perform the sensitivity analyses presented in S4 File.

(R)

S6 File. R code to evaluate the computing time of selected R packages that estimate the effective reproduction number.

(R)

S7 File. R code associated with the methodological differences presented in S1 File.

(R)

Acknowledgments

We thank Shokoofeh Nourbakhsh for testing early versions of the package. We also thank the reviewers for their insightful comments that sparked improvements in this article and in the ern package presented here.

Data Availability

The R code for the package is available from the official R repository CRAN (https://CRAN.R-project.org/package=ern) and from GitHub (https://github.com/phac-nml-phrsd/ern). The data used in this manuscript are publicly available and included in the package.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Anderson RM, May RM. Infectious diseases of humans: dynamics and control. Oxford University Press; 1991.
  • 2. Inglesby TV. Public health measures and the reproduction number of SARS-CoV-2. JAMA. 2020;323(21):2186–2187. doi: 10.1001/jama.2020.7878
  • 3. Kirby AE, Walters MS, Jennings WC, Fugitt R, LaCross N, Mattioli M, et al. Using wastewater surveillance data to support the COVID-19 response—United States, 2020–2021. Morbidity and Mortality Weekly Report. 2021;70(36):1242. doi: 10.15585/mmwr.mm7036a2
  • 4. Castiglioni S, Schiarea S, Pellegrinelli L, Primache V, Galli C, Bubba L, et al. SARS-CoV-2 RNA in urban wastewater samples to monitor the COVID-19 pandemic in Lombardy, Italy (March–June 2020). Science of The Total Environment. 2022;806:150816. doi: 10.1016/j.scitotenv.2021.150816
  • 5. Melnick JL, et al. Poliomyelitis Virus in Urban Sewage in Epidemic and in Nonepidemic Times. American journal of hygiene. 1947;45(2):240–53.
  • 6. Crank K, Chen W, Bivins A, Lowry S, Bibby K. Contribution of SARS-CoV-2 RNA shedding routes to RNA loads in wastewater. Science of The Total Environment. 2022;806. doi: 10.1016/j.scitotenv.2021.150376
  • 7. Wölfel R, Corman VM, Guggemos W, Seilmaier M, Zange S, Müller MA, et al. Virological assessment of hospitalized patients with COVID-2019. Nature. 2020;581(7809):465–469. doi: 10.1038/s41586-020-2196-x
  • 8. Jones DL, Baluja MQ, Graham DW, Corbishley A, McDonald JE, Malham SK, et al. Shedding of SARS-CoV-2 in feces and urine and its potential role in person-to-person transmission and the environment-based spread of COVID-19. Science of The Total Environment. 2020;749:141364. doi: 10.1016/j.scitotenv.2020.141364
  • 9. de Jonge EF, Peterse CM, Koelewijn JM, van der Drift AMR, van der Beek RF, Nagelkerke E, et al. The detection of monkeypox virus DNA in wastewater samples in the Netherlands. Science of The Total Environment. 2022;852:158265. doi: 10.1016/j.scitotenv.2022.158265
  • 10. Mercier E, D’Aoust PM, Thakali O, Hegazy N, Jia JJ, Zhang Z, et al. Municipal and neighbourhood level wastewater surveillance and subtyping of an influenza virus outbreak. Scientific Reports. 2022;12(1):15777. doi: 10.1038/s41598-022-20076-z
  • 11. Van den Driessche P, Watmough J. Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission. Mathematical biosciences. 2002;180(1-2):29–48. doi: 10.1016/S0025-5564(02)00108-6
  • 12. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. American journal of epidemiology. 2013;178(9):1505–1512. doi: 10.1093/aje/kwt133
  • 13. Gostic KM, McGough L, Baskerville EB, Abbott S, Joshi K, Tedijanto C, et al. Practical considerations for measuring the effective reproductive number, Rt. PLoS computational biology. 2020;16(12):e1008409. doi: 10.1371/journal.pcbi.1008409
  • 14. Alvarez L, Colom M, Morel JD, Morel JM. Computing the daily reproduction number of COVID-19 by inverting the renewal equation using a variational technique. Proceedings of the National Academy of Sciences. 2021;118(50):e2105112118. doi: 10.1073/pnas.2105112118
  • 15. Huisman JS, Scire J, Angst DC, Li J, Neher RA, Maathuis MH, et al. Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. eLife. 2022;11:e71345. doi: 10.7554/eLife.71345
  • 16. Huisman JS, Scire J, Caduff L, Fernandez-Cassi X, Ganesanandamoorthy P, Kull A, et al. Wastewater-based estimation of the effective reproductive number of SARS-CoV-2. Environmental health perspectives. 2022;130(5):057011. doi: 10.1289/EHP10050
  • 17. McMahan CS, Self S, Rennert L, Kalbaugh C, Kriebel D, Graves D, et al. COVID-19 wastewater epidemiology: a model to estimate infected populations. The Lancet Planetary Health. 2021;5(12):e874–e881. doi: 10.1016/S2542-5196(21)00230-8
  • 18. Fazli M, Sklar S, Porter MD, French BA, Shakeri H. Wastewater-Based Epidemiological Modeling for Continuous Surveillance of COVID-19 Outbreak. medRxiv. 2021. doi: 10.1101/2021.10.19.21265221
  • 19. Nourbakhsh S, Fazil A, Li M, Mangat CS, Peterson SW, Daigle J, et al. A wastewater-based epidemic model for SARS-CoV-2 with application to three Canadian cities. Epidemics. 2022;39:100560. doi: 10.1016/j.epidem.2022.100560
  • 20. Jiang G, Wu J, Weidhaas J, Li X, Chen Y, Mueller J, et al. Artificial neural network-based estimation of COVID-19 case numbers and effective reproduction rate using wastewater-based epidemiology. Water research. 2022;218:118451. doi: 10.1016/j.watres.2022.118451
  • 21. Amman F, Markt R, Endler L, Hupfauf S, Agerer B, Schedl A, et al. Viral variant-resolved wastewater surveillance of SARS-CoV-2 at national scale. Nature Biotechnology. 2022;40(12):1814–1822. doi: 10.1038/s41587-022-01387-y
  • 22. Lison A. adrian-lison/EpiSewer: EpiSewer 0.0.1; 2024. doi: 10.5281/zenodo.10569102
  • 23. Champredon D, Dushoff J. Intrinsic and realized generation intervals in infectious-disease transmission. Proceedings of the Royal Society B: Biological Sciences. 2015;282(1821):20152026. doi: 10.1098/rspb.2015.2026
  • 24. Boëlle PY, Ansart S, Cori A, Valleron AJ. Transmission parameters of the A/H1N1 (2009) influenza virus pandemic: a review. Influenza and other respiratory viruses. 2011;5(5):306–316. doi: 10.1111/j.1750-2659.2011.00234.x
  • 25. te Beest DE, Wallinga J, Donker T, van Boven M. Estimating the generation interval of influenza A (H1N1) in a range of social settings. Epidemiology. 2013; p. 244–250. doi: 10.1097/EDE.0b013e31827f50e8
  • 26. Cori A. EpiEstim: Estimate Time Varying Reproduction Numbers from Epidemic Curves; 2023. Available from: https://github.com/mrc-ide/EpiEstim.
  • 27. Nash RK, Bhatt S, Cori A, Nouvellet P. Estimating the epidemic reproduction number from temporally aggregated incidence data: A statistical modelling approach and software tool. PLOS Computational Biology. 2023;19(8):e1011439. doi: 10.1371/journal.pcbi.1011439
  • 28. Abbott S, Hellewell J, Sherratt K, Gostic K, Hickson J, Badr HS, et al. EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters; 2020.
  • 29. Scott JA, Gandy A, Mishra S, Unwin J, Flaxman S, Bhatt S. epidemia: Modeling of Epidemics using Hierarchical Bayesian Models; 2020. Available from: https://imperialcollegelondon.github.io/epidemia/.
  • 30. Stan Development Team. Stan Modeling Language Users Guide and Reference Manual; 2024. Available from: https://mc-stan.org/.
  • 31. Scire J, Huisman JS, Grosu A, Angst DC, Lison A, Li J, et al. estimateR: an R package to estimate and monitor the effective reproductive number. BMC bioinformatics. 2023;24(1):310. doi: 10.1186/s12859-023-05428-4
  • 32. Cleveland WS. Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association. 1979;74(368):829–836. doi: 10.1080/01621459.1979.10481038
  • 33. Richardson WH. Bayesian-based iterative method of image restoration. JoSA. 1972;62(1):55–59. doi: 10.1364/JOSA.62.000055
  • 34. Lucy LB. An iterative technique for the rectification of observed distributions. The astronomical journal. 1974;79:745. doi: 10.1086/111605
  • 35. Goldstein E, Dushoff J, Ma J, Plotkin JB, Earn DJ, Lipsitch M. Reconstructing influenza incidence by deconvolution of daily mortality time series. Proceedings of the National Academy of Sciences. 2009;106(51):21825–21829. doi: 10.1073/pnas.0902958106
  • 36. Kermack WO, McKendrick AG. A contribution to the mathematical theory of epidemics. Proceedings of the royal society of london Series A, Containing papers of a mathematical and physical character. 1927;115(772):700–721.
  • 37. Champredon D, Dushoff J, Earn DJ. Equivalence of the Erlang-distributed SEIR epidemic model and the renewal equation. SIAM Journal on Applied Mathematics. 2018;78(6):3258–3278. doi: 10.1137/18M1186411
  • 38. Government of Canada. COVID-19 wastewater monitoring dashboard: Viral signal trend; 2023. Available from: https://health-infobase.canada.ca/src/data/covidLive/wastewater/covid19-wastewater.csv.
  • 39. Government of Canada. COVID-19 epidemiology update: cases and deaths data; 2023. Available from: https://health-infobase.canada.ca/src/data/covidLive/covid19-download.csv.
  • 40. Challen R, Brooks-Pollock E, Tsaneva-Atanasova K, Danon L. Meta-analysis of the severe acute respiratory syndrome coronavirus 2 serial intervals and the impact of parameter uncertainty on the coronavirus disease 2019 reproduction number. Statistical Methods in Medical Research. 2022;31(9):1686–1703. doi: 10.1177/09622802211065159
  • 41. Statistics Canada. Table 17-10-0009-01 Population estimates, quarterly; 2023. doi: 10.25318/1710000901-eng
  • 42. Gelman A, Rubin DB. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science. 1992;7(4):457–472. doi: 10.1214/ss/1177011136
  • 43. Plummer M. rjags: Bayesian Graphical Models using MCMC; 2023. Available from: https://CRAN.R-project.org/package=rjags.
  • 44. Arts PJ, Kelly JD, Midgley CM, Anglin K, Lu S, Abedi GR, et al. Longitudinal and quantitative fecal shedding dynamics of SARS-CoV-2, pepper mild mottle virus, and crAssphage. mSphere. 2023;8(4):e00132–23. doi: 10.1128/msphere.00132-23
  • 45. Uncertainties in estimating SARS-CoV-2 prevalence by wastewater-based epidemiology. Chemical Engineering Journal. 2021;415:129039. doi: 10.1016/j.cej.2021.129039
  • 46. Wade MJ, Lo Jacomo A, Armenise E, Brown MR, Bunce JT, Cameron GJ, et al. Understanding and managing uncertainty and variability for wastewater monitoring beyond the pandemic: Lessons learned from the United Kingdom national COVID-19 surveillance programmes. Journal of Hazardous Materials. 2022;424:127456. doi: 10.1016/j.jhazmat.2021.127456
  • 47. McCall C, Fang ZN, Li D, Czubai AJ, Juan A, LaTurner ZW, et al. Modeling SARS-CoV-2 RNA degradation in small and large sewersheds. Environ Sci: Water Res Technol. 2022;8:290–300. doi: 10.1039/D1EW00717C

Decision Letter 0

Salim Heddam

31 Jan 2024

PONE-D-24-01907
ern: an R package to estimate the effective reproduction number using clinical and wastewater surveillance data
PLOS ONE

Dear Dr. Champredon,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 16 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Salim Heddam

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We are unable to open your Supporting Information file [suppl_S1_interpolation_impact.R]. Please kindly revise as necessary and re-upload.

Additional Editor Comments:

Reviewer 1#:

please abstract must be more clear

material and methods not Models and software 93

discussion must be written more advanced reference with more knowledge about your article

conclusion must be written with detail

Reviewer 2#:

The authors present the R package ern for the estimation of the effective reproduction number from wastewater or aggregated clinical surveillance data.

The package provides a framework for an efficient and quicker estimation of effective reproduction number using a user-friendly interface.

The manuscript is well-written and makes a relevant contribution to the field.

I thoroughly enjoyed reviewing this manuscript and only have some minor requests for revision, as follows:

Lines 39 to 42: Do not start a sentence using a citation number. In Line 39, you may write, "Huisman et al. [14] proposed a method....". Do the same for lines 40, 41 and 42.

Line 261: "...for the for the ...". Delete the repetition.

Reviewer #3:

1. It would have been better to describe Rt, which is discussed in the paper, in simple numerical terms in the abstract.

2. Some paragraphs in the introduction could have been replaced by a figure or table.

3. The authors did not provide a review of the literature that addresses the same topic, not even a brief one.

4. The workflow could be clarified as a list of clear steps or in algorithmic form, and the steps of the example could then follow that structure.

5. The discussion was narrative and did not present any future ideas or plan of action for researchers working in this field, nor the difficulties they may face.

Reviewer #4:

1. Describe the dataset features in more detail, including its total size and the size of the train/test sets, in a table.

2. Pseudocode/flowchart and algorithm steps need to be inserted.

3. Time spent needs to be measured in the experimental results.

4. Limitations and Discussion sections need to be inserted.

5. The parameters used for the analysis must be provided in a table.

6. The architecture of the proposed model must be provided.

7. Address the accuracy/improvement percentages in the abstract and in the conclusion sections, as well as the significance of these results.

8. The authors need to proofread the manuscript carefully to avoid grammatical mistakes and typos.

9. Add future work to the last section (conclusion), if any.

10. To improve the Related Work and Introduction sections, the authors are recommended to review these highly related research papers:

a) Optimizing epileptic seizure recognition performance with feature scaling and dropout layers

b) Optimizing classification of diseases through language model analysis of symptoms

c) Predicting female pelvic tilt and lumbar angle using machine learning in case of urinary incontinence and sexual dysfunction

d) Utilizing convolutional neural networks to classify monkeypox skin lesions

e) Hepatitis C Virus prediction based on machine learning framework: a real-world case study in Egypt

Reviewer #5:

I would like to thank the authors for putting together the R package ern and for submitting this accompanying manuscript. I agree with the assessment that there is a need for user-friendly statistical software to estimate reproduction numbers from wastewater concentration measurements. This manuscript explains the motivation for developing the package ern, i.e. to provide a dedicated interface to estimate Rt from wastewater and clinical case data. It then describes the statistical approach used and presents a vignette-style illustration of the main functionalities of the package. The authors also propose a new approach to disaggregate non-daily case counts for subsequent Rt estimation.

I have read the manuscript in detail and I have tested the package both using the example data from Canada as provided by the authors, and using wastewater data from Switzerland. I tried to structure my review into manuscript-related and package/method-related major comments, plus a collection of minor comments.

For transparency, I am a (co)author of two packages mentioned in my review, i.e. the package "estimateR" and the package "EpiSewer".

Kind regards

Adrian Lison

## Major points manuscript

First let me say that I found the manuscript clean and well-written.

### Related work

I think the manuscript can provide more details on what is methodologically novel and what not. You mention that the method implemented is similar to the method by Huisman et al., but aside from the approach to disaggregate non-daily case data, it seems at first glance to be EXACTLY the method by Huisman et al., i.e. as detailed in https://doi.org/10.7554/eLife.71345 for case data and in https://doi.org/10.1289/EHP10050 for wastewater data (LOESS smoothing, deconvolution using Richardson-Lucy algorithm, scaling, R estimation using EpiEstim, uncertainty quantification via resampling). I am raising this not only because of attribution but because it is important to clearly describe the differences and similarities between related methods such that readers can compare them properly.
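For readers less familiar with the deconvolution step named above, a minimal Python sketch of Richardson-Lucy deconvolution of a daily series follows. It is not the code used in ern, estimateR or by Huisman et al.; the function names, the fixed iteration count, the synthetic data, and the requirement that the delay distribution puts positive mass on day 0 are assumptions made for illustration. Each iteration reconvolves the current infection estimate with the delay distribution, compares it with the observations, and back-projects the ratio.

```python
import math


def convolve_delay(infections, delay_pmf):
    """Expected daily observations from daily infections.

    obs[j] = sum_k delay_pmf[k] * infections[j - k], truncated to the
    length of the input series (right truncation at the present).
    """
    n = len(infections)
    expected = [0.0] * n
    for i, x in enumerate(infections):
        for k, p in enumerate(delay_pmf):
            if i + k < n:
                expected[i + k] += p * x
    return expected


def richardson_lucy(observed, delay_pmf, n_iter=100):
    """Minimal Richardson-Lucy (EM) deconvolution of a daily series (sketch).

    Assumes delay_pmf[0] > 0 and strictly positive observations, and uses a
    fixed number of iterations instead of a convergence criterion.
    """
    n = len(observed)
    # q[i]: probability that an infection on day i is observed before the series ends
    q = [sum(p for k, p in enumerate(delay_pmf) if i + k < n) for i in range(n)]
    # Start from a flat series carrying the same total as the observations
    estimate = [sum(observed) / n] * n
    for _ in range(n_iter):
        expected = convolve_delay(estimate, delay_pmf)
        ratio = [d / c for d, c in zip(observed, expected)]
        # Multiplicative update: back-project the observed/expected ratio
        estimate = [
            estimate[i]
            * sum(p * ratio[i + k] for k, p in enumerate(delay_pmf) if i + k < n)
            / q[i]
            for i in range(n)
        ]
    return estimate


def poisson_loglik(observed, expected):
    """Poisson log-likelihood up to a constant, used to check EM progress."""
    return sum(d * math.log(c) - c for d, c in zip(observed, expected))


# Synthetic data: a triangular epidemic curve observed through a 5-day delay
delay_pmf = [0.1, 0.3, 0.3, 0.2, 0.1]
true_infections = [1.0 + min(t, 40 - t) for t in range(40)]
observed = convolve_delay(true_infections, delay_pmf)
estimate = richardson_lucy(observed, delay_pmf, n_iter=200)
```

Each update keeps the total of the reconvolved series equal to the observed total and does not decrease the Poisson likelihood. With noisy data, iterating to convergence amplifies noise, so in practice the loop is usually stopped early, for example once a goodness-of-fit statistic falls below a threshold.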

Furthermore, there are several related works that are worth mentioning in my opinion.

First, the method by Huisman et al. has also been implemented as an open source R package (https://doi.org/10.1186/s12859-023-05428-4) "estimateR", and can similarly be used to estimate Rt from wastewater and case data (see e.g. https://ibz-shiny.ethz.ch/wastewaterRe/). Compared to the package ern, the interface and plotting functionality of estimateR are not explicitly tailored to wastewater data, therefore I think that ern is more user-friendly for this domain. Also, estimateR offers no option to disaggregate non-daily case counts. Aside from that, I think that estimateR and ern are highly similar since they are based on almost the same method.

Aside from ern, there are also other R packages for modeling wastewater data, including the package EpiSewer (https://zenodo.org/doi/10.5281/zenodo.10569101), your own package wem (https://github.com/phac-nml-phrsd/wem/tree/main), and the Covid19 Wastewater Analysis Package (https://github.com/UW-Madison-DSI/Covid19Wastewater/tree/main), although the latter does not produce Rt estimates. I do not think a detailed comparison or benchmarking of these packages is necessary, but a short discussion of their differences would be useful. I believe it is valuable to give potential users an overview of the available options.

### Sensitivity analyses

I liked the illustrations of the package, but what I missed are some sensitivity analyses that give users an idea of what behavior to expect in different situations and what limitations the method might have. Interesting analyses could, for example, address the smoothing (what role do the hyperparameters play, how do moving average and LOESS compare, how does the interpolation deal with larger gaps in the data), what happens if the fecal shedding distribution is misspecified, or how the method performs when concentrations are low. I know that such analyses require some work, but I think having some sensitivity analyses would add much value to the paper. You can then also point to these analyses from the package documentation.

### Limitations

Lines 22f: I agree that wastewater has several advantages over clinical data and does not have the same biases, but it would be good to also briefly mention potential biases of wastewater. In particular, I am worried that the statement "Fecal shedding occurs passively and irrespective of the symptomatic status of the infected individual" could be misunderstood by readers. While it is true that asymptomatic patients also seem to shed into wastewater, there is also large variation in shedding loads and distributions between patients, and it is not yet clear what the main factors are (see e.g. https://doi.org/10.1016/j.scitotenv.2020.141364 and https://doi.org/10.1128/msphere.00132-23). Wastewater concentrations could also, to a large part, be driven by "supershedders", and we don't know how representative this subgroup is of the overall population. Other sources of bias worth mentioning are changing populations in the catchments and environmental factors like rainfall.

Scaling factor: Can you give some more details on how you would choose this in practice, and what the implications of a potential misspecification are (for example that a constant factor of misspecification will strongly bias incidence estimates but at least not bias Rt except in certain edge cases).
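The claim that a constant misspecification biases incidence but not Rt can be checked with the renewal equation, where Rt is roughly the ratio of current incidence to the generation-interval-weighted sum of past incidence. The sketch below is a crude Python illustration under stated assumptions: the signal values and both scaling factors are made up, the function name is hypothetical, and the estimator has none of EpiEstim's smoothing or uncertainty quantification.

```python
def renewal_ratio_rt(incidence, gen_interval):
    """Crude Rt_t = I_t / sum_{s>=1} w_s * I_{t-s} (hypothetical helper, no uncertainty).

    gen_interval[s - 1] holds w_s for lags s = 1..S; returns estimates for t >= S.
    """
    S = len(gen_interval)
    rt = []
    for t in range(S, len(incidence)):
        infectiousness = sum(w * incidence[t - s] for s, w in enumerate(gen_interval, start=1))
        rt.append(incidence[t] / infectiousness)
    return rt


# Made-up deconvolved wastewater signal (arbitrary units) and generation interval
signal = [100, 120, 150, 190, 240, 300, 360, 420, 470, 500]
gen_interval = [0.2, 0.5, 0.3]

true_factor = 0.5   # hypothetical "true" infections per unit of signal
wrong_factor = 1.0  # misspecified by a constant factor of 2

incidence_true = [true_factor * x for x in signal]
incidence_wrong = [wrong_factor * x for x in signal]

rt_true = renewal_ratio_rt(incidence_true, gen_interval)
rt_wrong = renewal_ratio_rt(incidence_wrong, gen_interval)
```

Because the constant factor appears in both the numerator and the denominator, it cancels in Rt while the incidence is off by a factor of 2. A factor that varies over time (for example with dilution) would not cancel.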

For the above reason, I would stress in the manuscript (and also in the package documentation) that the inferred incidence directly depends on the hard-to-estimate scaling factor and must therefore be carefully interpreted. Otherwise there is a risk that people will use this to estimate prevalence etc.

I suggest adding unit information to the concentration and scaling factor. For example, what are the typical units of the example data in ww.input?

### Disaggregation

The proposed method for disaggregation of non-daily cases and the comparison with the method by Nash et al. is quite interesting. Based on your illustration, it seems like the method by Nash et al. could have important limitations not found in the original study by Nash et al. Since this would be a strong result, can you provide more details, e.g. which sliding window was used in the example? Also in Figure 5, can you also plot Rt estimated from the daily incidence time series?

In your model for inferring daily case counts, you do not seem to account for potential changes in transmission other than due to susceptible depletion. What happens if you fit this over longer time periods with multiple waves or time series with strong non-pharmaceutical interventions?

Lines 379f: I think this is a rather problematic approach - drawing not enough posterior samples and then applying smoothing to improve the irregular posterior. I think this can easily lead to an unrepresentative posterior and also distort the uncertainty estimates.

In the introduction, you mention long runtimes of epidemia and EpiNow2 as disadvantages to overcome, but the disaggregation of ern also requires MCMC sampling. How do the runtimes compare on non-daily data? Is ern still considerably faster?

## Major points package / method

The package ern currently only seems to offer a fixed scaling factor. Explicitly supporting a time-varying scaling factor to account for flow would be great. Scaling concentrations by daily flow volumes at the treatment plant can be quite important in my experience because there can be a strong effect of dilution of the viral particles by rainfall etc. on the measured concentration.
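To make the flow adjustment concrete: multiplying concentration (copies per litre) by daily plant inflow (litres per day) gives a load in copies per day, which removes the dilution effect described above. The Python sketch below uses invented numbers and a hypothetical function name; it is not an existing ern feature.

```python
def daily_load(concentration, flow):
    """Daily viral load (copies/day) = concentration (copies/L) x flow (L/day)."""
    if len(concentration) != len(flow):
        raise ValueError("concentration and flow must have the same length")
    return [c * q for c, q in zip(concentration, flow)]


# Invented example: constant shedding of 1e12 copies/day into the sewer,
# with rainfall doubling the plant inflow on the third day.
true_load = [1e12] * 5
flow = [1e8, 1e8, 2e8, 1e8, 1e8]                          # L/day
concentration = [l / q for l, q in zip(true_load, flow)]  # copies/L, diluted on day 3

load = daily_load(concentration, flow)
```

The measured concentration halves on the rainy day although shedding is unchanged; the flow-normalized load stays constant, so a time-varying scaling built on it would not misread dilution as a drop in incidence.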

Line 173: What output of EpiEstim is used for Rt? Do you use the estimated mean of Rt? Or do you draw samples from the posterior Gamma distribution estimated by EpiEstim?
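For context on the two options raised in this question, the Cori et al. model behind EpiEstim gives a Gamma posterior for R over a sliding window, with shape equal to the prior shape plus the cases in the window and rate equal to the prior rate plus the summed infectiousness. The Python sketch below computes that posterior and contrasts keeping only its mean with drawing samples from it. The function name, the 7-day window, the Gamma prior with mean 5 and standard deviation 5, the synthetic constant incidence, and the indexing of the generation interval from lag 1 are assumptions of this sketch, not EpiEstim's interface.

```python
import random


def cori_posterior(incidence, gen_interval, t, window=7, prior_mean=5.0, prior_sd=5.0):
    """Gamma posterior (shape, scale) of R over the window ending on day t (sketch).

    gen_interval[s - 1] = w_s for lags s >= 1. Requires t - window + 1 >= 0.
    """
    shape_prior = (prior_mean / prior_sd) ** 2
    scale_prior = prior_sd ** 2 / prior_mean
    days = range(t - window + 1, t + 1)
    total_cases = sum(incidence[s] for s in days)
    # Summed infectiousness: sum over window days of sum_k w_k * I_{s-k}
    total_infectiousness = sum(
        w * incidence[s - k]
        for s in days
        for k, w in enumerate(gen_interval, start=1)
        if s - k >= 0
    )
    shape = shape_prior + total_cases
    scale = 1.0 / (1.0 / scale_prior + total_infectiousness)
    return shape, scale


incidence = [10] * 30
shape, scale = cori_posterior(incidence, [0.5, 0.5], t=29)

posterior_mean = shape * scale  # option 1: keep only the posterior mean
rng = random.Random(1)
draws = [rng.gammavariate(shape, scale) for _ in range(20000)]  # option 2: sample
```

Passing on only the posterior mean discards the within-window uncertainty, whereas sampling propagates it into downstream summaries.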

I noticed that the Rt estimates provided by ern do not have higher uncertainty towards the present, although this should definitely be the case (Rt of today cannot be estimated with the same accuracy as Rt of last week because of delays in fecal shedding / reporting). Do you account for uncertainty in the deconvolution step?
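A rough way to see why uncertainty should grow towards the present: infections from the last few days have only partly shown up in the data, in proportion to the cumulative delay distribution. The Python sketch below tabulates that fraction for a hypothetical delay distribution and constant infections, together with a Poisson-style 1/sqrt(count) heuristic for relative noise; the numbers, the helper name, and the heuristic are illustrative assumptions.

```python
def fraction_observed(delay_pmf, days_before_present):
    """Share of infections from `days_before_present` days ago already observed today."""
    return sum(delay_pmf[: days_before_present + 1])


delay_pmf = [0.1, 0.3, 0.3, 0.2, 0.1]  # hypothetical infection-to-observation delay
infections_per_day = 100.0             # assume constant true infections

summary = []
for days_ago in range(len(delay_pmf)):
    seen = infections_per_day * fraction_observed(delay_pmf, days_ago)
    # Heuristic: Poisson relative noise of what has been observed so far
    summary.append((days_ago, seen, seen ** -0.5))

for days_ago, seen, noise in summary:
    print(f"{days_ago} day(s) ago: ~{seen:.0f} observed, relative noise ~{noise:.2f}")
```

Estimates for the most recent days rest on the fewest observed events and must be scaled up by the smallest fractions, so their intervals would be expected to widen.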

I think it is a great feature that ern also supports uncertain distributions. One question I had is if you could also support correlated parameters. At the moment, the mean and sd of a distribution get drawn independently, but they may be correlated in reality. Another scenario that may be quite realistic is that users have several different distributions / parameters from the literature. In this case you could allow users to provide a list of distributions from which to draw with replacement. These are just suggestions, not requests for this paper.
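One way the correlated-parameter and resampling suggestions could look is sketched below in Python. The bivariate-normal model for (mean, sd), the rejection of non-positive draws, the parameter values, and the placeholder "literature" pairs are all hypothetical, and none of the function names are part of ern.

```python
import math
import random


def draw_correlated_params(rng, mean_mu, mean_sd, sd_mu, sd_sd, rho):
    """Draw (mean, sd) of a delay distribution from a bivariate normal with correlation rho.

    Non-positive draws are rejected so both parameters stay valid.
    """
    while True:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        mean = mean_mu + mean_sd * z1
        sd = sd_mu + sd_sd * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        if mean > 0 and sd > 0:
            return mean, sd


def draw_from_literature(rng, candidates):
    """Resample one published (mean, sd) pair with replacement."""
    return rng.choice(candidates)


def correlation(xs, ys):
    """Pearson correlation, to check the induced dependence."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(42)
draws = [draw_correlated_params(rng, 5.0, 0.5, 3.0, 0.3, rho=0.8) for _ in range(20000)]
means, sds = zip(*draws)

# Placeholder values, not taken from any publication
literature = [(5.2, 2.8), (6.1, 3.5), (4.8, 2.5)]
picked = [draw_from_literature(rng, literature) for _ in range(10)]
```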

I am still a bit unsure about the default distributions provided in the package. On the one hand, this is a practical feature, but on the other hand: how are you planning to maintain/update this epidemiological data? They seem to be hard-coded in the package, so if newer distributions become available or old distributions are invalidated by new research, users will only get the updates if they install a newer version of the package. It would be good to comment on this.

I am particularly skeptical of providing default reporting delay distributions or reporting proportions as they will differ a lot between surveillance systems / countries etc.

I currently find it hard to define custom distributions with my own parameters. Functions like def_dist_incubation_period only accept the name of a pathogen as input. I know I can also construct a list with custom parameters myself, but a constructor function with the relevant parameters would be helpful.
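As an illustration of the kind of constructor described here, the Python sketch below turns a user-supplied mean and standard deviation into a truncated, discretized daily probability mass function. The function name, the returned fields, and the choice of a lognormal family are assumptions for illustration only, not ern's interface or its default distribution family.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution with log-scale parameters mu and sigma."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def make_delay_dist(mean, sd, max_days):
    """Hypothetical constructor: user-specified mean/sd -> discretized daily pmf."""
    if mean <= 0 or sd <= 0 or max_days < 1:
        raise ValueError("mean and sd must be positive and max_days >= 1")
    # Moment matching: convert the natural-scale mean/sd to log-scale parameters
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    sigma = math.sqrt(sigma2)
    # Probability mass in [k, k + 1) for k = 0..max_days - 1, renormalized after truncation
    mass = [lognormal_cdf(k + 1, mu, sigma) - lognormal_cdf(k, mu, sigma) for k in range(max_days)]
    total = sum(mass)
    return {"family": "lognormal", "mean": mean, "sd": sd, "max": max_days,
            "pmf": [m / total for m in mass]}


dist = make_delay_dist(mean=5.0, sd=2.0, max_days=20)
```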

There is an argument for subtypes/variants in def_dist_fecal_shedding, but it does not seem to do anything: I always get back the same distribution.

Also, def_dist_reporting_fraction does not accept a custom value and assumes a uniform distribution between 0.1 and 0.3 per default. This seems quite arbitrary to me.

I like the diagnostic plots, but having the option to produce individual plots for concentration, incidence, Rt etc. would be valuable. They are currently all merged into one plot via the patchwork package, making it hard to customize individual plots.

## Minor points

Line 5: "It differs from the basic reproduction number, R0, in that it takes into account the level of susceptibility in the population at a given point in time." Maybe say more generally that it accounts for "changes in transmission" - this may be due to changing susceptibility or other factors like different contact patterns, infection control measures etc.

Lines 21f: I would also mention droplet digital PCR as an alternative quantification method. Also, maybe briefly mention that viral RNA is first extracted from the wastewater sample using various laboratory methods.

Line 78: EpiNow2 and epidemia do not use the rstan package; they use the cmdstanr package. I would just write "stan".

Line 113: This is an important point and well explained.

Lines 166f: This paragraph felt a bit informal and was difficult to understand. Is your main point that it is important to estimate Rt using the number of cases by time of infection, not by time of report / time of sample?

I personally don't find the name of the function ern::ww.input very intuitive; I would not have expected it to return example data! Also, can you provide more details in the function documentation, e.g. what the column "pt" means and what the unit of the concentration is in this example?

Lines 137f, reporting delay: I would say more clearly that this is the delay between symptom onset and case report.

As a suggestion, you could register the current version of the package in an online archive like zenodo. This will give you a DOI for the package, which can then be referenced in the manuscript.

## References

1. Huisman, J. S. _et al._ Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. _eLife_ **11**, e71345 (2022).

2. Huisman, J. S. _et al._ Wastewater-Based Estimation of the Effective Reproductive Number of SARS-CoV-2. _Environmental Health Perspectives_ **130**, 057011 (2022).

3. Scire, J. _et al._ estimateR: an R package to estimate and monitor the effective reproductive number. _BMC Bioinformatics_ **24**, 310 (2023).

4. Lison, A. EpiSewer: Estimate Reproduction Numbers from Wastewater Measurements. Zenodo (2024).

5. Jones, D. L. _et al._ Shedding of SARS-CoV-2 in feces and urine and its potential role in person-to-person transmission and the environment-based spread of COVID-19. _Science of The Total Environment_ **749**, 141364 (2020).

6. Arts, P. J. _et al._ Longitudinal and quantitative fecal shedding dynamics of SARS-CoV-2, pepper mild mottle virus, and crAssphage. _mSphere_ **8**, e00132-23 (2023).

7. Nash, R. K., Bhatt, S., Cori, A. & Nouvellet, P. Estimating the epidemic reproduction number from temporally aggregated incidence data: A statistical modelling approach and software tool. _PLOS Computational Biology_ **19**, e1011439 (2023).

Reviewer #6:

(See attached PDF for a full evaluation.)

The authors presented an R software package that implements statistical methods to estimate the actual number of new infections from the number of reported cases or from wastewater data. It is important to note that the package allows the input data to be sampled at intervals longer than one day (e.g., aggregated weekly data is also acceptable). Still, the output is a daily time series, which allows the effective reproduction number to be estimated with the existing tool EpiEstim. To estimate the hidden time series (i.e., the unknown input) from the measured output, the Authors applied deconvolution using an existing Richardson-Lucy implementation. As far as I know, this technique is equivalent to a dynamic inversion, which has already been used to infer the effective reproduction number.

Although I cannot detect any scientific contribution in this manuscript, the "attached" R package may be useful for a certain community (e.g., public health practitioners), and the software description in the main body is clear and didactic.

The manuscript has therefore a raison d'être, possibly not in such a high impact journal (but the Editor is the final judge on that).

Anyway, the Authors need to better justify why their software tool is preferable to, or more convenient than, other existing packages.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Partly

Reviewer #5: Yes

Reviewer #6: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: please abstract must be more clear

material and methods not Models and software 93

discussion must be wrirten more advanced refernce with more knowedge about your artcle

conculsion must be written with detail

Reviewer #2: The authors present the R package ern for the estimation of the effective reproduction number from wastewater or aggregated clinical surveillance data.

The package provides a framework for an efficient and quicker estimation of effective reproduction number using a user-friendly interface.

The manuscript is well-written and makes a relevant contribution to the field.

I thoroughly enjoyed reviewing this manuscript and only have some minor requests for revision, as follows:

Lines 39 to 42: Do not start a sentence using a citation number. In Line 39, you may write, "Huisman et al. [14] proposed a method....". Do the same for lines 40, 41 and 42.

Line 261: "...for the for the ...". Delete the repetition.

Reviewer #3: 1- It would have been better to talk about the Rt factor, which was mentioned in the research, in numerical terms, with something simple in the abstract

2-It was possible to dispense with some paragraphs in figure or table in the introduction

3-The researcher did not provide a research review of references that address the same topic, even in a simple way

4- It is possible to clarify the work algorithm in the form of clear points or in an algorithmic form, on the basis of which the steps of the example are clarified

5- The discussion was narrative and did not clarify any future idea or plan of action for researchers working in the future in the same field and what difficulties they may face.

Reviewer #4: 1. Describe dataset features in more details and its total size and size of (train/test) as a table.

2. Pseudocode / Flowchart and algorithm steps need to be inserted.

3. Time spent need to be measured in the experimental results.

4. Limitation and Discussion Sections need to be inserted.

5. The parameters used for the analysis must be provided in table

6. The architecture of the proposed model must be provided

7. Address the accuracy/improvement percentages in the abstract and in the conclusion sections, as well as the significance of these results.

8. The authors need to make a clear proofread to avoid grammatical mistakes and typo errors.

9. Add future work in last section (conclusion) (if any)

10. To improve the Related Work and Introduction sections authors are recommended to review this highly related research work paper:

a) Optimizing epileptic seizure recognition performance with feature scaling and dropout layers

b) Optimizing classification of diseases through language model analysis of symptoms

c) Predicting female pelvic tilt and lumbar angle using machine learning in case of urinary incontinence and sexual dysfunction

d) Utilizing convolutional neural networks to classify monkeypox skin lesions

e) Hepatitis C Virus prediction based on machine learning framework: a real-world case study in Egypt

Reviewer #5: I would like to thank the authors for putting together the R package ern and for submitting this accompanying manuscript. I agree with the assessment that there is a need for user-friendly statistical software to estimate reproduction numbers from wastewater concentration measurements. This manuscript explains the motivation for developing the package ern, i.e. to provide a dedicated interface to estimate Rt from wastewater and clinical case data. It then describes the statistical approach used and presents a vignette-style illustration of the main functionalities of the package. The authors also propose a new approach to disaggregate non-daily case counts for subsequent Rt estimation.

I have read the manuscript in detail and I have tested the package both using the example data from Canada as provided by the authors, and using wastewater data from Switzerland. I tried to structure my review into manuscript-related and package/method-related major comments, plus a collection of minor comments.

For transparency, I am a (co)author of two packages mentioned in my review, i.e. the package "estimateR" and the package "EpiSewer".

Kind regards

Adrian Lison

## Major points manuscript

First let me say that I found the manuscript clean and well-written.

### Related work

I think the manuscript can provide more details on what is methodologically novel and what not. You mention that the method implemented is similar to the method by Huisman et al., but aside from the approach to disaggregate non-daily case data, it seems at first glance to be EXACTLY the method by Huisman et al., i.e. as detailed in https://doi.org/10.7554/eLife.71345 for case data and in https://doi.org/10.1289/EHP10050 for wastewater data (LOESS smoothing, deconvolution using Richardson-Lucy algorithm, scaling, R estimation using EpiEstim, uncertainty quantification via resampling). I am raising this not only because of attribution but because it is important to clearly describe the differences and similarities between related methods such that readers can compare them properly.

Furthermore, there are several related works that are worth mentioning in my opinion.

First, the method by Huisman et al. has also been implemented as an open source R package (https://doi.org/10.1186/s12859-023-05428-4) "estimateR", and can similarly be used to estimate Rt from wastewater and case data (see e.g. https://ibz-shiny.ethz.ch/wastewaterRe/). Compared to the package ern, the interface and plotting functionality of estimateR are not explicitly tailored to wastewater data, therefore I think that ern is more user-friendly for this domain. Also, estimateR offers no option to disaggregate non-daily case counts. Aside from that, I think that estimateR and ern are highly similar since they are based on almost the same method.

Aside from ern, there are also other R packages for modeling wastewater data, including the package EpiSewer (https://zenodo.org/doi/10.5281/zenodo.10569101), your own package wem (https://github.com/phac-nml-phrsd/wem/tree/main), and the Covid19 Wastewater Analysis Package (https://github.com/UW-Madison-DSI/Covid19Wastewater/tree/main), although the latter does not produce Rt estimates. I do not think a detailed comparison or benchmarking of these packages is necessary, but a short discussion of their differences would be useful. I believe it is valuable to give potential users an overview over available options.

### Sensitivity analyses

I liked the illustrations of the package, but what I missed are some sensitivity analyses that give users an idea of what behavior to expect in different situations and what limitations the method might have. Some interesting analyses that I can think of would for example be about the smoothing (what role do the hyperparameters play, how do moving average and LOESS compare, how does the interpolation deal with larger gaps of data), or what happens if the fecal shedding distribution is misspecified, or how the method performs when concentrations are low. I know that such kinds of analyses require some work, but I think having some sensitivity analyses would add much value to the paper. You can then also point to these analyses from the package documentation.

### Limitations

Lines 22f: I agree that wastewater has several advantages over clinical data and does not have the same biases, but it would be good to shortly mention also potential biases of wastewater. In particular I am worried that the statement "Fecal shedding occurs passively and irrespective of the symptomatic status of the infected individual" could be misunderstood by readers. While it is true that asymptomatic patients also seem to shed into wastewater, there is also large variation in shedding loads and distributions between patients and it is not yet clear what the main factors are (see e.g. https://doi.org/10.1016/j.scitotenv.2020.141364 and https://doi.org/10.1128/msphere.00132-23). Wastewater concentrations could also be to a large part be driven by "supershedders" and we don't know how representative this subgroup is of the overall population. Other sources of bias worth mentioning are changing populations in the catchments and environmental factors like rainfall.

Scaling factor: Can you give some more details on how you would choose this in practice, and what the implications of a potential misspecification are (for example that a constant factor of misspecification will strongly bias incidence estimates but at least not bias Rt except in certain edge cases).

For the above reason, I would stress in the manuscript (and also in the package documentation) that the inferred incidence directly depends on the hard-to-estimate scaling factor and must therefore be carefully interpreted. Otherwise there is a risk that people will use this to estimate prevalence etc.

I suggest to add unit information to the concentration and scaling factor. For example, what are the typical units of the exemplary data in ww.input?

### Disaggregation

The proposed method for disaggregation of non-daily cases and the comparison with the method by Nash et al. is quite interesting. Based on your illustration, it seems like the method by Nash et al. could have important limitations not found in the original study by Nash et al.. Since this would be a strong result, can you provide

more details, e.g. which sliding window was used in the example? Also in Figure 5, can you also plot Rt estimated from the daily incidence time series?

In your model for inferring daily case counts, you do not seem to account for potential changes in transmission other than due to susceptible depletion. What happens if you fit this over longer time periods with multiple waves or time series with strong non-pharmaceutical interventions?

Lines 379f: I think this is a rather problematic approach - drawing not enough posterior samples and then applying smoothing to improve the irregular posterior. I think this can easily lead to an unrepresentative posterior and also distort the uncertainty estimates.

In the introduction, you mention long runtimes of epidemia and EpiNow2 as disadvantages to overcome, but the disaggregation of ern also requires MCMC sampling. How do the runtimes compare on non-daily data, is ern still considerably faster?

## Major points package / method

The package ern currently only seems to offer a fixed scaling factor. Explicitly supporting a time-varying scaling factor to account for flow would be great. Scaling concentrations by daily flow volumes at the treatment plant can be quite important in my experience because there can be a strong effect of dilution of the viral particles by rainfall etc. on the measured concentration.

Lines 173: What output of EpiEstim is used for Rt? Do you use the estimated mean of Rt? Or do you draw samples from the posterior Gamma distribution estimated by EpiEstim?

I noticed that the Rt estimates provided by ern do not have higher uncertainty towards the present, although this should definitely be the case (Rt of today cannot be estimated with the same accuracy as Rt of last week because of delays in fecal shedding / reporting). Do you account for uncertainty in the deconvolution step?

I think it is a great feature that ern also supports uncertain distributions. One question I had is if you could also support correlated parameters. At the moment, the mean and sd of a distribution get drawn independently, but they may be correlated in reality. Another scenario that may be quite realistic is that users have several different distributions / parameters from the literature. In this case you could allow users to provide a list of distributions from which to draw with replacement. These are just suggestions, not requests for this paper.

I am still a bit unsure about the default distributions provided in the package. On the one hand, this is a practical feature, but on the other hand: how are you planning to maintain/update this epidemiological data? They seem to be hard-coded in the package, so if newer distributions become available or old distributions are invalidated by new research, users will only get the updates if they install a newer version of the package. It would be good to comment on this.

I am particularly skeptical of providing default reporting delay distributions or reporting proportions as they will differ a lot between surveillance systems / countries etc.

I currently find it hard to define custom distributions with my own parameters. Functions like def_dist_incubation_period only accept the name of a pathogen as input. I know I can also construct a list with custom parameters myself, but a constructor function with the relevant parameters would be helpful.

There is an argument for subtypes/variants in def_dist_fecal_shedding, but it does not seem to do anything, I always get back the same distribution.

Also, def_dist_reporting_fraction does not accept a custom value and assumes a uniform distribution between 0.1 and 0.3 per default. This seems quite arbitrary to me.

I like the diagnostic plots, but having the option to produce individual plots for concentration, incidence, Rt etc. would be valuable. They are currently all merged into one plot via the patchwork package, making it hard to customize individual plots.

## Minor points

Line 5: "It differs from the basic reproduction number, R0, in that it takes into account the level of susceptibility in the population at a given point in time." Maybe say more generally that it accounts for "changes in transmission" - this may be due to changing susceptibility or other factors like different contact patterns, infection control measures etc.

Lines 21f: I would also mention droplet digital PCR (ddPCR) as an alternative quantification method. Also, maybe briefly mention that viral RNA is first extracted from the wastewater sample using various laboratory methods.

Line 78: EpiNow2 and epidemia do not use the rstan package; they use the cmdstanr package. I would just write "stan".

Line 113: This is an important point and well explained.

Lines 166f: This paragraph felt a bit informal and was difficult to understand. Is your main point that it is important to estimate Rt using the number of cases by time of infection, not by time of report / time of sample?

I personally don't find the name of the function ern::ww.input very intuitive, I would not have expected that this returns example data! Also, can you provide more details in the function documentation, e.g. what the column "pt" means and what the unit of the concentration is in this example?

Lines 137f, reporting delay: I would say more clearly that this is the delay between symptom onset and case report.

As a suggestion, you could register the current version of the package in an online archive like zenodo. This will give you a DOI for the package, which can then be referenced in the manuscript.

## References

1. Huisman, J. S. _et al._ Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. _eLife_ **11**, e71345 (2022).

2. Huisman, J. S. _et al._ Wastewater-Based Estimation of the Effective Reproductive Number of SARS-CoV-2. _Environmental Health Perspectives_ **130**, 057011 (2022).

3. Scire, J. _et al._ estimateR: an R package to estimate and monitor the effective reproductive number. _BMC Bioinformatics_ **24**, 310 (2023).

4. Lison, A. EpiSewer: Estimate Reproduction Numbers from Wastewater Measurements. Zenodo (2024).

5. Jones, D. L. _et al._ Shedding of SARS-CoV-2 in feces and urine and its potential role in person-to-person transmission and the environment-based spread of COVID-19. _Science of The Total Environment_ **749**, 141364 (2020).

6. Arts, P. J. _et al._ Longitudinal and quantitative fecal shedding dynamics of SARS-CoV-2, pepper mild mottle virus, and crAssphage. _mSphere_ **8**, e00132-23 (2023).

7. Nash, R. K., Bhatt, S., Cori, A. & Nouvellet, P. Estimating the epidemic reproduction number from temporally aggregated incidence data: A statistical modelling approach and software tool. _PLOS Computational Biology_ **19**, e1011439 (2023).

Reviewer #6: (See attached PDF for a full evaluation.)

The authors presented an R software package that implements statistical methods to estimate the actual number of new infections from the number of reported cases or from wastewater data. Importantly, the package accepts input data sampled with a period longer than one day (e.g., aggregated weekly data). Nevertheless, the output is a daily time series, which allows the effective reproduction number to be estimated with an existing tool, EpiEstim. To estimate the hidden time series (i.e., the unknown input) from the measured output, the Authors applied a deconvolution using an existing Richardson-Lucy implementation. As far as I know, this technique is equivalent to a dynamic inversion, which has already been used to infer the effective reproduction number.
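The Richardson-Lucy deconvolution step mentioned here can be illustrated with a minimal, generic sketch in Python. It assumes the observations `y` are a discrete convolution of latent daily infections `x` with a known delay probability mass function `p`, truncated at the start of the series. The function names, the example epidemic curve and the delay values are hypothetical, and ern's own implementation (in R) may differ in initialization, stopping rule and edge handling.

```python
import math


def convolve_delay(x, p):
    """Expected observations y[t] = sum_d p[d] * x[t - d], truncated at the series start."""
    n = len(x)
    return [sum(p[d] * x[t - d] for d in range(len(p)) if t - d >= 0) for t in range(n)]


def richardson_lucy(y, p, n_iter=200):
    """Recover a latent daily series x from observations y, given a delay pmf p."""
    n = len(y)
    # q[s]: probability that an event on day s is observed within the data window
    q = [sum(p[d] for d in range(len(p)) if s + d < n) for s in range(n)]
    # Flat, strictly positive starting guess (a zero entry would never move)
    x = [max(sum(y) / n, 1e-9)] * n
    for _ in range(n_iter):
        expected = convolve_delay(x, p)
        ratio = [yt / et if et > 0 else 0.0 for yt, et in zip(y, expected)]
        new_x = []
        for s in range(n):
            back = sum(p[d] * ratio[s + d] for d in range(len(p)) if s + d < n)
            # Multiplicative update: estimates stay non-negative
            new_x.append(x[s] * back / q[s] if q[s] > 0 else x[s])
        x = new_x
    return x


# Hypothetical epidemic curve and delay pmf, for illustration only
true_x = [100 * math.exp(-(((s - 30) / 8) ** 2)) + 1 for s in range(60)]
delay = [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]
y = convolve_delay(true_x, delay)
x_hat = richardson_lucy(y, delay)
```

The multiplicative update keeps estimates non-negative, and after each iteration the q-weighted sum of the estimates equals the observed total. Because q[s] falls below one for the most recent days (only part of their delay distribution has been observed), those estimates are less constrained by the data.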

Although I cannot detect any scientific contribution in this manuscript, the “attached” R package may be useful for a certain community (e.g., public health practitioners), and the software description in the main body is clear and didactic.

The manuscript has therefore a raison d'être, possibly not in such a high impact journal (but the Editor is the final judge on that).

Anyway, the Authors need to better justify why their software tool is preferable to, or more convenient than, other existing packages.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: rewan abdelaziz

Reviewer #2: No

Reviewer #3: No

Reviewer #4: Yes: Tarek Abd El-Hafeez

Reviewer #5: Yes: Adrian Lison

Reviewer #6: Yes: Péter Polcz (Pázmány Péter Catholic University Faculty of Information Technology and Bionics)

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Review.pdf

pone.0305550.s008.pdf (197KB, pdf)

Decision Letter 1

Salim Heddam

10 May 2024

PONE-D-24-01907R1

ern: an R package to estimate the effective reproduction number using clinical and wastewater surveillance data

PLOS ONE

Dear Dr. Champredon,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 24 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Salim Heddam

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Reviewer 5#:Thank you for the thorough revision and for addressing the comments. I have two remaining remarks, which however do not necessitate another review round in my opinion.

First, thank you for answering my question with regard to accounting for uncertainty of the Rt estimates towards the present. I understand the current deconvolution method is not able to account for this uncertainty. But I think it is very important to highlight this limitation more strongly in the manuscript, i.e. in the discussion, but also to make users aware of this in the package documentation etc. Otherwise, the use of the package could lead to wrong conclusions when using it for real-time surveillance. The uncertainty towards the present because of partial information is an inherent characteristic of the data and means that we can only get accurate Rt estimates with a certain delay. If this is not reflected in the uncertainty information provided by the method, users should be made aware of this limitation.

Second, you describe that to pool the uncertainty of the Rt estimates, you get the mean and quantiles for each realization and then compute means or quantiles of these across realizations. This seems rather approximate to me, and I wonder why you don't just draw e.g. 100 Rt samples from the posterior Gamma distribution as described by Cori et al. (you can use the estimated mean and cv from EpiEstim) for each realization, then combine all the draws across your realizations and compute the mean, empirical quantiles etc. on this pooled posterior sample. This should also be very fast and more accurate.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

Reviewer #4: All comments have been addressed

Reviewer #5: (No Response)

Reviewer #6: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Partly

Reviewer #5: Yes

Reviewer #6: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: (No Response)

Reviewer #5: Yes

Reviewer #6: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: (No Response)

Reviewer #5: Yes

Reviewer #6: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thanks for your reply

Reviewer #2: (No Response)

Reviewer #3: (No Response)

Reviewer #4: An updated manuscript addressing previous comments and suggestions was evaluated positively. The updated submission demonstrates significant improvement and provides valuable insights relevant to the research community.

Reviewer #5: Thank you for the thorough revision and for addressing the comments. I have two remaining remarks, which however do not necessitate another review round in my opinion.

First, thank you for answering my question with regard to accounting for uncertainty of the Rt estimates towards the present. I understand the current deconvolution method is not able to account for this uncertainty. But I think it is very important to highlight this limitation more strongly in the manuscript, i.e. in the discussion, but also to make users aware of this in the package documentation etc. Otherwise, the use of the package could lead to wrong conclusions when using it for real-time surveillance. The uncertainty towards the present because of partial information is an inherent characteristic of the data and means that we can only get accurate Rt estimates with a certain delay. If this is not reflected in the uncertainty information provided by the method, users should be made aware of this limitation.

Second, you describe that to pool the uncertainty of the Rt estimates, you get the mean and quantiles for each realization and then compute means or quantiles of these across realizations. This seems rather approximate to me, and I wonder why you don't just draw e.g. 100 Rt samples from the posterior Gamma distribution as described by Cori et al. (you can use the estimated mean and cv from EpiEstim) for each realization, then combine all the draws across your realizations and compute the mean, empirical quantiles etc. on this pooled posterior sample. This should also be very fast and more accurate.

Reviewer #6: Upon perusing the Authors' response letter, I have become convinced of the manuscript's significant contribution to the scientific community. In the revised manuscript, the authors have carefully outlined the discipline to which they wish to contribute and how. Table 1 is very useful and informative.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: Yes: Tarek Abd El-Hafeez

Reviewer #5: Yes: Adrian Lison

Reviewer #6: No

**********


PLoS One. 2024 Jun 21;19(6):e0305550. doi: 10.1371/journal.pone.0305550.r004

Author response to Decision Letter 1


28 May 2024

Response to Reviewers – Round 2

We would like to thank the reviewers for taking the time to read and comment on our revised manuscript.

In response to Reviewer #5’s comments:

Thank you for the thorough revision and for addressing the comments. I have two remaining remarks, which however do not necessitate another review round in my opinion.

First, thank you for answering my question with regard to accounting for uncertainty of the Rt estimates towards the present. I understand the current deconvolution method is not able to account for this uncertainty. But I think it is very important to highlight this limitation more strongly in the manuscript, i.e. in the discussion, but also to make users aware of this in the package documentation etc. Otherwise, the use of the package could lead to wrong conclusions when using it for real-time surveillance. The uncertainty towards the present because of partial information is an inherent characteristic of the data and means that we can only get accurate Rt estimates with a certain delay. If this is not reflected in the uncertainty information provided by the method, users should be made aware of this limitation.

Response: Thank you for this suggestion. We have added a paragraph in the Discussion section to reflect this limitation.

Second, you describe that to pool the uncertainty of the Rt estimates, you get the mean and quantiles for each realization and then compute means or quantiles of these across realizations. This seems rather approximate to me, and I wonder why you don't just draw e.g. 100 Rt samples from the posterior Gamma distribution as described by Cori et al. (you can use the estimated mean and cv from EpiEstim) for each realization, then combine all the draws across your realizations and compute the mean, empirical quantiles etc. on this pooled posterior sample. This should also be very fast and more accurate.

Response: Thank you very much for this suggestion. Indeed, the way this was implemented was not statistically correct (we were not aware of the EpiEstim function “sample_posterior_R”). It is now implemented as suggested (link). A few tests showed that the numerical difference between the two implementations was very small. The main text has been edited to reflect this change.
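The pooling scheme discussed in this exchange (drawing Rt samples from each realization's Gamma posterior, as in Cori et al., then summarizing the combined sample) can be sketched in Python as below. This is an illustrative translation, not ern's code and not EpiEstim's sample_posterior_R. It assumes each realization is summarized, for a given day, by its posterior mean and coefficient of variation (cv = sd/mean), which map to a Gamma shape of 1/cv² and a scale of mean·cv². Function names and example values are hypothetical.

```python
import random


def gamma_shape_scale(mean, cv):
    """Convert a posterior mean and coefficient of variation to Gamma shape and scale."""
    shape = 1.0 / cv**2
    scale = mean * cv**2
    return shape, scale


def empirical_quantile(sorted_vals, prob):
    """Linear-interpolation quantile of an already sorted list."""
    pos = prob * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def pool_rt_posteriors(realizations, n_samples=100, probs=(0.025, 0.5, 0.975), seed=None):
    """Pool Gamma posterior draws of Rt across realizations for a single day.

    realizations: list of (mean, cv) pairs, one per realization.
    Returns the pooled mean and the requested empirical quantiles.
    """
    rng = random.Random(seed)
    pooled = []
    for mean, cv in realizations:
        shape, scale = gamma_shape_scale(mean, cv)
        # random.gammavariate(alpha, beta) uses beta as the scale parameter
        pooled.extend(rng.gammavariate(shape, scale) for _ in range(n_samples))
    pooled.sort()
    summary = {"mean": sum(pooled) / len(pooled)}
    for p in probs:
        summary[p] = empirical_quantile(pooled, p)
    return summary


# Hypothetical posterior summaries for one day from three realizations
example = pool_rt_posteriors([(1.2, 0.1), (1.0, 0.1), (0.8, 0.1)], n_samples=2000, seed=1)
```

Pooling the draws treats every realization as an equally weighted mixture component, so the resulting interval reflects both the within-realization posterior uncertainty and the spread between realizations.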

Attachment

Submitted filename: Response to Reviewers - round 2.docx

pone.0305550.s010.docx (13.6KB, docx)

Decision Letter 2

Salim Heddam

2 Jun 2024

ern: an R package to estimate the effective reproduction number using clinical and wastewater surveillance data

PONE-D-24-01907R2

Dear Dr. Champredon,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Salim Heddam

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #5: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #5: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #5: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #5: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #5: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #5: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #5: Yes: Adrian Lison

**********

Acceptance letter

Salim Heddam

7 Jun 2024

PONE-D-24-01907R2

PLOS ONE

Dear Dr. Champredon,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Salim Heddam

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Methodological differences when inferring daily incidence.

    (PDF)

    pone.0305550.s001.pdf (130.4KB, pdf)
    S2 File. Bayesian model to infer daily clinical report count.

    (PDF)

    pone.0305550.s002.pdf (152.8KB, pdf)
    S3 File. Linear interpolation to infer daily clinical report count.

    (PDF)

    pone.0305550.s003.pdf (170.8KB, pdf)
    S4 File. Sensitivity analysis to selected parameters.

    (PDF)

    pone.0305550.s004.pdf (170.2KB, pdf)
    S5 File. R code to perform sensitivity analyses presented in S4 File.

    (R)

    pone.0305550.s005.R (7.8KB, R)
    S6 File. R code to evaluate the computing time of selected R packages that estimate the effective reproduction number.

    (R)

    pone.0305550.s006.R (7.6KB, R)
    S7 File. R code associated with the methodological differences presented in S1 File.

    (R)

    Attachment

    Submitted filename: Review.pdf

    pone.0305550.s008.pdf (197KB, pdf)
    Attachment

    Submitted filename: PLOS-responses-to-reviewers.pdf

    pone.0305550.s009.pdf (386.1KB, pdf)
    Attachment

    Submitted filename: Response to Reviewers - round 2.docx

    pone.0305550.s010.docx (13.6KB, docx)

    Data Availability Statement

    The R code for the package is currently available on the official R repository CRAN (https://CRAN.R-project.org/package=ern) and on GitHub (https://github.com/phac-nml-phrsd/ern). The data used in this manuscript are publicly available and attached to the package.

