ern: An R package to estimate the effective reproduction number using clinical and wastewater surveillance data

David Champredon; Irena Papst; Warsame Yusuf

PLoS One. 2024 Jun 21;19(6):e0305550. doi: 10.1371/journal.pone.0305550

Editor: Salim Heddam²

PMCID: PMC11192340 PMID: 38905266

Abstract

The effective reproduction number, $R_{t}$, is an important epidemiological metric used to assess the state of an epidemic, as well as the effectiveness of public health interventions undertaken in response. An $R_{t}$ above one indicates that new infections are increasing, and thus the epidemic is growing, while an $R_{t}$ below one indicates that new infections are decreasing, and so the epidemic is under control. There are several established software packages that are readily available to statistically estimate $R_{t}$ using clinical surveillance data. However, there are comparatively few accessible tools for estimating $R_{t}$ from pathogen wastewater concentration, a surveillance data stream that cemented its utility during the COVID-19 pandemic. We present the R package ern, which aims to estimate the effective reproduction number from real-world wastewater or aggregated clinical surveillance data in a user-friendly way.

Introduction

The effective reproduction number, commonly denoted as $R_{t}$ , is a key metric in epidemiology. It is defined as the average number of new infections generated by an infected individual at time t during an epidemic. It differs from the basic reproduction number, $R_{0}$ , in that it additionally accounts for changes in population susceptibility and transmission at a given point in time. The parameter $R_{t}$ effectively measures the strength of transmission of an infectious pathogen within a population [1]. The value of $R_{t}$ has a simple interpretation depending on whether it is greater than, equal to, or less than one: it implies that the number of new infections is either increasing, constant, or decreasing over time, respectively. Usually, $R_{t}$ is estimated using the daily number of new cases reported via clinical surveillance. The importance of $R_{t}$ was reinforced during the SARS-CoV-2 pandemic when its estimates supported public health decisions in many jurisdictions worldwide [2].

Wastewater-based epidemiological surveillance emerged as a critical component of the public health arsenal to monitor the COVID-19 pandemic (e.g., [3, 4]), despite having been used since at least the 1940s to monitor the poliovirus [5]. While individuals infected with SARS-CoV-2 shed viral particles through various routes (such as urine, saliva, and sputum), stool shedding is the dominant source of viral shedding when examining community-level wastewater surveillance [6]. Once shed, viral particles enter the sewer network and reside in wastewater. Wastewater samples are typically collected at treatment plants and viral RNA is extracted from these samples using various laboratory methods. The concentration of viral RNA in these samples can be quantified using real-time quantitative polymerase chain reaction (RT-qPCR) as well as digital droplet PCR. The concentration is assumed to be proportional to the infection prevalence in the community living in the catchment area (up to a conversion factor). Fecal shedding occurs passively and irrespective of the symptomatic status of the infected individual [7], although shedding is likely to be at its peak during the symptomatic period [8]. Hence, wastewater surveillance data do not have the same biases as clinical surveillance data, which tend to focus on symptomatic or severe infections.

In light of the utility of using wastewater-based surveillance during the COVID-19 pandemic, this methodology has been applied successfully to several other pathogens: human influenza, respiratory syncytial virus, and mpox are now routinely monitored in wastewater samples in many jurisdictions [9, 10]. Therefore, it is important for the public health community to be able to easily estimate $R_{t}$ of an infectious disease from wastewater data. Moreover, as wastewater-based epidemiological surveillance expands, public health organizations will likely leverage both clinical and wastewater-based surveillance data to monitor the spread of pathogens. As such, it would be useful to have a tool that estimates $R_{t}$ concordantly across both of these data sources.

The literature on methods to estimate $R_{t}$ from clinical data is vast due to the importance of $R_{t}$ in infectious disease epidemiology (for example [1, 11–15]). In contrast, few studies have attempted to estimate $R_{t}$ from wastewater data. Huisman et al. [16] proposed a method based on deconvoluting the fecal shedding distribution. Previous work has developed epidemic compartmental models that can integrate wastewater-based surveillance [17–19], but $R_{t}$ cannot be derived explicitly from them (except for [19]). Jiang et al. [20] derived $R_{t}$ from an artificial neural network, and Amman et al. [21] approximated $R_{t}$ of SARS-CoV-2 variants from their relative abundance in wastewater samples. While these methods are useful, there have been relatively few efforts to port these theoretical frameworks into user-friendly software that applies them to real-world wastewater data. One recently released R package, EpiSewer, aims to address this gap [22].

Clinical data are often reported as aggregated cases over a period of time, typically weekly. However, a key parameter in estimating $R_{t}$ is the distribution of the intrinsic generation interval (defined as the interval between the time when an individual is infected by an infector and the time when this infector was infected). For many infectious pathogens, this interval is on the order of days. Many existing implementations of $R_{t}$ estimation in R libraries require that the input data (clinical case reports) and the specification of the intrinsic generation interval [23] be on the same timescale (e.g., days). For example, H1N1 influenza has a mean intrinsic generation interval of about 3 days and a maximum value of about 7 days [24, 25]. If the data are reported weekly, it is not possible to define the generation interval distribution meaningfully in units of weeks. This is because existing methods require a discrete generation interval distribution, so it is not as easy as defining a continuous distribution rescaled to weeks. Hence, before estimating $R_{t}$ with existing methods, the input data must first be disaggregated onto the scale of days, which is not a straightforward process.

Several R packages exist to estimate $R_{t}$ from clinical data. One popular package is EpiEstim, which initially implemented a Poisson-based model of the renewal equation [26]. This package has recently been improved to handle aggregated input data [27]. Briefly, the approach to estimating $R_{t}$ from aggregated clinical reports (typically reported weekly) relies on an expectation-maximization algorithm to disaggregate the counts into daily case reports, assuming local exponential growth for transmission. As a result of this assumption, the inferred daily case reports have a piecewise exponential form, which may be problematic for downstream applications. Moreover, EpiEstim does not explicitly handle the various time delays typically encountered in practice with epidemiological reports, such as the incubation period and reporting delays (the time between symptom onset and the reporting of a case).

EpiNow2 is another recent R package that aims to improve the estimation of $R_{t}$ by accounting for, for example, reporting delays and periodicity, as well as by propagating parameter uncertainty [28]. The package also provides tools for short-term forecasting of case reports but cannot explicitly handle non-daily (e.g., weekly) reporting. Another R package, epidemia, provides a regression-based framework to estimate $R_{t}$ from daily clinical data [29]. We note that, while theoretically possible, estimating $R_{t}$ from wastewater data with EpiNow2 or epidemia is not straightforward, especially for users who do not have a modelling background. Moreover, because of their reliance on the Bayesian inference software Stan [30], computing time may be long. The R package estimateR is another tool to estimate $R_{t}$ from clinical data but does not explicitly handle wastewater data or aggregated clinical data [31].

Here, we present the R library ern to address the gaps identified above, specifically:

to disaggregate the clinical reports into a shorter time unit to enable estimation of $R_{t}$ using an intrinsic generation interval on a useful timescale;
to provide a framework to estimate $R_{t}$ from wastewater data, consistent with an estimation based on clinical data;
to provide a user-friendly interface geared toward public health practitioners who may have limited proficiency in the R programming language;
to perform an efficient and rapid $R_{t}$ estimation.

Table 1 summarises key features of the R packages discussed above, along with the ern package.

Table 1. A comparison of `ern` with other R packages built to estimate $R_{t}$ from epidemiological data.

Checkmarks (✓) indicate the presence of a feature and crosses (×) indicate its absence. A cross with an asterisk (×*) denotes a feature that is not built into the package but is technically possible, though not straightforward for the average user (e.g., it may require additional modelling knowledge and/or the use of advanced or less documented features).

R package	Accepted input: wastewater concentration	Accepted input: daily clinical case data	Available feature: reporting delays	Available feature: disaggregate case data
`ern`	✓	✓	✓	✓
`EpiSewer`	✓	×	✓	×
`EpiEstim`	×	✓	×	✓
`EpiNow2`	×*	✓	✓	×*
`epidemia`	×*	✓	✓	×*
`estimateR`	×*	✓	✓	×


The ern package ultimately uses the EpiEstim package for the core of the $R_{t}$ computation, as EpiEstim already provides a robust implementation of well-tested $R_{t}$ estimation algorithms that is also one of the fastest. However, ern wraps complex and critical features for estimating $R_{t}$ from real-world clinical and wastewater data that have not all been implemented in any single existing R package for $R_{t}$ estimation.

Materials and methods

The R code for the ern package is available on the Comprehensive R Archive Network at https://cran.r-project.org/web/packages/ern/index.html.

Fig 1 gives a high-level overview of how the ern package computes $R_{t}$ for both wastewater and clinical input data. The pipeline for each data stream has three components:

Estimating daily incidence from the raw data (wastewater or clinical)
Estimating $R_{t}$ from the estimated daily incidence
Repeating $R_{t}$ estimates (previous two components) to generate an ensemble reflecting various sources of uncertainty

Throughout this work, we use the term incidence to denote the “true” underlying incidence of infections, as opposed to reported incidence (from clinical data), which we instead refer to as reports or reported cases.

In Fig 1, dashed elements represent optional components. Layered boxes represent replicates from resampling that inform uncertainty in the final $R_{t}$ ensemble. Resampled elements include the distributions used in deconvolutions and EpiEstim (sampled from the specified family of distributions for each quantity), the set of inferred daily reports (when these are estimated), and the underreporting proportion.

Estimating daily incidence with wastewater data

Our approach to estimating the daily incidence time series from wastewater data is similar to the one taken in [16], where the concentration of pathogen shed in wastewater, w_t, is assumed to be the convolution of the incidence of infections, i, and the fecal shedding distribution f (the relative proportion of pathogen shed in feces as a function of time since infection) of an average infected individual:

$$w_{t} = \omega \sum_{k=1}^{t-1} i(t-k)\, f(k) \tag{1}$$

The function f can be defined such that $\sum_{k>0} f(k) = 1$. The parameter ω denotes how much a single average infection contributes to wastewater concentration in total over the course of infection, as measured in the sewer system. This parameter captures baseline average shedding, but also reflects the loss of viral particles between the shedding and downstream sampling locations (dependent on the sewer system, environmental factors, and the processing pipeline of the laboratory).
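The forward model in Eq 1 can be illustrated with a minimal base R sketch. The function name `convolve_shedding` and all numerical values are hypothetical and chosen for illustration only; they are not part of ern.

```r
# Forward model of Eq 1: wastewater concentration as the convolution of
# incidence with a fecal shedding kernel f, scaled by omega.
# Hypothetical helper, not an ern function.
convolve_shedding <- function(inc, f, omega = 1) {
  n <- length(inc)
  w <- numeric(n)
  for (t in seq_len(n)) {
    # lags k = 1, ..., t - 1, capped at the length of the kernel
    k <- seq_len(min(t - 1, length(f)))
    w[t] <- omega * sum(inc[t - k] * f[k])
  }
  w
}

f   <- c(0.2, 0.5, 0.3)      # toy shedding distribution, sums to 1
inc <- c(10, 0, 0, 0, 0)     # 10 infections on day 1, none afterwards
w   <- convolve_shedding(inc, f)
print(w)
```

Because the toy kernel sums to 1 and all shedding falls inside the time series, the total signal equals ω times the total number of infections.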

Since we model the wastewater signal as a convolution of incidence with the fecal shedding distribution, we must perform a deconvolution of the wastewater signal with the fecal shedding distribution to recover incidence for $R_{t}$ estimation. However, sampled pathogen concentration in wastewater tends to be a noisy signal, so we smooth the time series of concentrations w_t. Wastewater samples are often taken a few days a week, so smoothing additionally interpolates the signal on a daily scale, which is a requirement for working with an intrinsic generation interval measured in days to estimate $R_{t}$ . Hence, we obtain W_t, a daily interpolated and smoothed time series of pathogen concentrations in wastewater:

$$W_{t} = \mathrm{smooth\_interpolation}(w_{t}, \theta) \tag{2}$$

where θ are the smoothing parameters.

The smoothing algorithms implemented in ern are moving average and LOESS (LOcally Estimated Scatterplot Smoothing), with default values set to provide a light smoothing of the time series. The moving average smooths a time series by taking an unweighted mean of all points in a window of each time point. ern users can specify the width and centering of the window with respect to the focal time point. The LOESS method is a generalisation of the moving average [32]; it still operates across subsets of the time series, but instead of computing the unweighted mean in each of these windows, it performs a weighted linear regression at each point and returns the predicted value of the focal time point. Weighting is done by distance from the focal time point, with closer points carrying more weight. The window size is controlled by a span parameter, which an ern user can specify, along with a minimum concentration to prevent zero or negative values in the smoothed time series when inputting low-concentration measurements.
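The two smoothing approaches can be sketched with base R alone. The sample values, the window of three observations, the span, and the floor of 5 are all hypothetical, and the calls below (`stats::filter`, `stats::loess`) are stand-ins chosen for illustration rather than the internals of ern.

```r
# Synthetic wastewater samples taken three times a week (hypothetical values)
obs <- data.frame(
  t     = c(1, 3, 5, 8, 10, 12, 15, 17, 19, 22, 24, 26),
  value = c(20, 26, 23, 31, 38, 34, 45, 52, 47, 58, 66, 61)
)

# Moving average: centered, unweighted mean over 3 consecutive observations
# (the ends of the series are NA because the window is incomplete there)
ma <- stats::filter(obs$value, rep(1 / 3, 3), sides = 2)

# LOESS: weighted local linear regression, then prediction on a daily grid,
# which both smooths and interpolates the signal
fit   <- stats::loess(value ~ t, data = obs, span = 0.75, degree = 1)
daily <- data.frame(t = seq(min(obs$t), max(obs$t)))
daily$smooth <- pmax(stats::predict(fit, newdata = daily), 5)  # floor at 5

head(daily)
```

Note that this moving average operates over consecutive observations rather than calendar days, so a time-based window would need an extra interpolation step first.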

Finally, to extract the daily incidence i(t), we substitute w_t by W_t in Eq 1 and we use the Richardson-Lucy algorithm [33–35] to deconvolute W_t using the fecal shedding distribution f as the kernel:

$$i(t) = \mathrm{RL}(W, f) \tag{3}$$

where RL represents the deconvolution algorithm.
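To make the deconvolution step concrete, the following is a minimal sketch of a textbook one-dimensional Richardson-Lucy iteration in base R. The helper names (`shedding_matrix`, `richardson_lucy`), the flat starting guess, the number of iterations, and the toy data are assumptions made for illustration; the implementation used by ern may differ.

```r
# P[t, j] = f(t - j): contribution of infections on day j to the signal on day t
shedding_matrix <- function(n, f) {
  lag <- outer(seq_len(n), seq_len(n), "-")   # lag[t, j] = t - j
  P <- matrix(0, n, n)
  ok <- lag >= 1 & lag <= length(f)
  P[ok] <- f[lag[ok]]
  P
}

# Multiplicative Richardson-Lucy updates; estimates remain non-negative
richardson_lucy <- function(obs, P, n_iter = 200) {
  colsum <- colSums(P)
  est <- rep(mean(obs), length(obs))          # flat, strictly positive start
  for (it in seq_len(n_iter)) {
    pred   <- as.vector(P %*% est)            # forward convolution of the estimate
    ratio  <- ifelse(pred > 0, obs / pred, 0) # observed over predicted signal
    update <- as.vector(t(P) %*% ratio)       # back-projection of the ratio
    est    <- ifelse(colsum > 0, est * update / colsum, est)
  }
  est
}

f        <- c(0.2, 0.5, 0.3)                  # toy shedding kernel
inc_true <- c(5, 8, 13, 20, 28, 35, 38, 36, 30, 22, 15, 10)
P        <- shedding_matrix(length(inc_true), f)
w        <- as.vector(P %*% inc_true)         # noise-free synthetic signal
inc_est  <- richardson_lucy(w, P)
round(cbind(inc_true, inc_est), 1)
```

In this sketch the last day of the series is never informed by the signal (no later observation depends on it), so its estimate stays at the starting value; estimates near the end of a series should be read with that limitation in mind.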

Estimating daily incidence with clinical data

To estimate a daily incidence time series from daily clinical reports, the reports are optionally smoothed to eliminate some noise from the signal. As with wastewater input data, the smoothing algorithms available in ern are LOESS and moving average. Then, reports are scaled to account for underreporting and bring the signal to the scale of actual infections. Next, the smoothed and scaled time series is deconvoluted (as in the wastewater method) using i) a reporting delay distribution kernel and ii) an incubation period distribution kernel. These two deconvolutions estimate daily “true” incidence (i.e., tallied by date of infection, not by report date).

In some cases, the clinical reporting frequency may not be compatible with the relevant timescale of the intrinsic generation interval distribution. For example, seasonal influenza cases are typically reported on a weekly basis, but its generation interval should be defined in units of days because it is shorter than a week for most cases [24, 25]. (For a detailed discussion on why the reporting frequency and timescale of the intrinsic generation interval must match, see section “Daily incidence to $R_{t}$ ”.) The package ern implements two methods to interpolate aggregate reports and produce inferred daily reports used to compute $R_{t}$ .

The first method is called the “renewal” method as it involves a statistical model that infers the latent daily reports from aggregate counts using a standard “Susceptible-Infectious-Recovered” (SIR) epidemic model via the renewal equation [36, 37].

This approach ensures the inferred daily reports follow a realistic epidemic curve, as opposed to, e.g., an ad-hoc estimate such as naively dividing weekly reports by 7. A poor approximation of the exponential transmission process of the disease, as reflected in the inferred daily reports, could significantly impact the quality of the $R_{t}$ estimates. See S1 File for an example.

With the renewal interpolation method, SIR model parameters are fitted to the aggregated (e.g., weekly) clinical reports using a Markov Chain Monte Carlo (MCMC) algorithm and then daily reports are inferred from the fitted model. We use the R package rjags to perform this inference. More details about this statistical model are given in S2 File.

While the renewal method better represents the process that generates observed aggregate case reports, it can be computationally intensive. Thus, we also provide a faster, alternative method using simple linear interpolation, described fully in S3 File.

Daily incidence to $R_{t}$

Once daily incidence has been estimated from either data stream, we feed this time series into the function estimate_R() of the package EpiEstim, along with a specific intrinsic generation interval distribution. We use the mean value, as well as the 2.5% and 97.5% quantiles, as reported by EpiEstim::estimate_R() as a single estimate of $R_{t}$. (Resampling to produce an ensemble $R_{t}$ estimate is discussed in the next section.)

Underpinning the EpiEstim::estimate_R() estimation of $R_{t}$ is the following equation governing how incidence at the current time, i(t), is modelled by $R_{t}$, the generation interval distribution g, and past incidence:

$$i(t) = R_{t} \sum_{k \geq 1} g(k)\, i(t-k) \tag{4}$$

Here, k = 1, … is a discrete-time index: incidence is being observed (inferred from reports) at discrete times, i(t − k), and it is being weighted by a discrete generation interval distribution g(k) and scaled by $R_{t}$ to calculate current incidence i(t). In other words, current incidence is a function of past incidence (and the generation interval distribution).
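As a rough illustration of Eq 4, the sketch below inverts the renewal equation directly to obtain a naive point estimate of $R_{t}$ as the ratio of current incidence to generation-interval-weighted past incidence. This deterministic ratio is not what EpiEstim computes (its estimate comes from a statistical model of the renewal equation); the function name `renewal_rt` and the toy inputs are hypothetical.

```r
# Naive inversion of Eq 4: R_t = i(t) / sum_k g(k) i(t - k)
renewal_rt <- function(inc, g) {
  n  <- length(inc)
  rt <- rep(NA_real_, n)                  # R_t is undefined on the first day
  for (t in 2:n) {
    k     <- seq_len(min(t - 1, length(g)))
    denom <- sum(g[k] * inc[t - k])       # weighted past incidence
    if (denom > 0) rt[t] <- inc[t] / denom
  }
  rt
}

g <- c(0.2, 0.5, 0.3)                     # toy daily generation interval distribution

# Simulate incidence from Eq 4 with a constant reproduction number of 1.5
inc <- numeric(15)
inc[1] <- 10
for (t in 2:15) {
  k <- seq_len(min(t - 1, length(g)))
  inc[t] <- 1.5 * sum(g[k] * inc[t - k])
}

rt <- renewal_rt(inc, g)
print(round(rt, 3))
```

On noise-free simulated incidence the constant value is recovered exactly; with real, noisy data a direct ratio like this is unstable, which is one reason a statistical estimator is used instead.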

The discrete timescale used here is not prescribed (i.e., it does not necessarily have to be daily, weekly, etc.), but Eq 4 shows that the timescales of the generation interval and the observed incidence must match. Many infectious diseases, like influenza and COVID-19, have generation intervals that are mostly shorter than a week, and so representing their generation interval distributions on the timescale of weeks (e.g., to match weekly reported incidence data input into EpiEstim::estimate_R()) would not yield useful results.

To understand precisely why a coarse generation interval may not yield useful results, consider the example of influenza A/H1N1, whose generation interval is shorter than 7 days in most settings [25], and assume we work with data reported weekly (so the unit of the index k is one week). In this case, we would need to define the generation interval distribution on a weekly scale as g(1) = 1 and g(k) = 0 for all k > 1 (the distribution places no probability mass beyond one week), and so

$$i(t) = R_{t}\, i(t-1) \;\Rightarrow\; R_{t} = i(t) / i(t-1) \tag{5}$$

The parameter $R_{t}$ is often used in public health surveillance to determine whether a disease is spreading or receding in a population by comparing it to the $R_{t} = 1$ threshold. The crude approximation in Eq 5 would be greater than 1, indicating that the disease is spreading, exactly when i(t) > i(t − 1), and less than 1, indicating that it is receding, exactly when i(t) < i(t − 1). If there is any noise in the incidence time series (inferred from observed reports), as there always is in real data, the approximation in Eq 5 cannot distinguish a true increase (or decrease) from noise.
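A small numerical example may help to see the problem with Eq 5. The weekly counts below are hypothetical: they fluctuate slightly around a flat level of about 100 cases per week, so there is no real trend.

```r
# Hypothetical weekly reports: flat around 100 with small fluctuations
weekly <- c(100, 104, 98, 103, 97, 102, 99, 101)

# Crude weekly approximation of Eq 5: ratio of consecutive weekly counts
crude_rt <- weekly[-1] / weekly[-length(weekly)]

# Naive reading against the R_t = 1 threshold
signal <- ifelse(crude_rt > 1, "growing", "receding")

data.frame(week = 2:8, crude_rt = round(crude_rt, 3), signal = signal)
```

The naive reading flips between growth and decline every single week, even though the underlying level is constant.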

For $R_{t}$ to be a useful surveillance metric for infectious diseases, the generation interval must be represented on a timescale fine enough to describe the temporal variation of disease transmission. Data for many infectious diseases (especially respiratory ones) are reported on a coarser timescale (e.g., weeks), which is why we have built methods into ern to disaggregate input clinical data (as discussed in section “Estimating daily incidence with clinical data”).

Generating an $R_{t}$ ensemble reflecting uncertainty

The package ern accounts for various sources of uncertainty in estimating $R_{t}$ . There is uncertainty in some inputs used to estimate daily incidence for each data stream, as well as statistical uncertainty incorporated in the daily incidence to $R_{t}$ estimate. The latter case is handled by EpiEstim through its Poisson-based model of the renewal equation [26]. The former case is handled by ern. Indeed, ern performs the $R_{t}$ calculation repeatedly and then summarizes the results in an ensemble. Each realization of the ensemble involves (re)sampling each uncertain input.

For the wastewater data, the uncertain inputs can be:

the fecal shedding distribution,
the intrinsic generation interval distribution.

For the clinical data, the uncertain inputs can be:

the inferred daily reports,
the underreporting fraction,
the incubation period distribution,
the reporting delay distribution,
the intrinsic generation interval distribution.

Uncertain distributions are specified for ern as a family of distributions, where each distribution parameter has an associated standard deviation. Supported families of distributions include Gamma, Normal, and Log-Normal. One can also specify a Uniform distribution (e.g., for the underreporting proportion). Distribution parameters are assumed to be Gamma-distributed to ensure sampled values (which specify a sampled distribution) are strictly positive. Inferred daily reports are drawn from posterior samples produced by the MCMC fit (if estimated). We sample 300 posterior replicates (using EpiEstim::sample_posterior_R()) from every single estimate of $R_{t}$ (i.e., each realization of the final $R_{t}$ ensemble) and calculate by date the mean of those posteriors along with 2.5% and 97.5% quantiles for $R_{t}$ to produce a single ensemble time series.
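The resampling logic described above can be sketched in base R under two stated assumptions: that a Gamma-distributed parameter is obtained by matching the specified mean and standard deviation, and that replicate $R_{t}$ draws are pooled by date before summarizing. The function `sample_param` is hypothetical, and the normal draws are only a placeholder for posterior samples that would in practice come from EpiEstim.

```r
# Draw a strictly positive distribution parameter from a Gamma distribution
# with a given mean and standard deviation (moment matching):
# shape = (mean / sd)^2, rate = mean / sd^2
sample_param <- function(n, mean, sd) {
  stats::rgamma(n, shape = (mean / sd)^2, rate = mean / sd^2)
}

set.seed(1)
gi_mean  <- sample_param(1000, mean = 6.8, sd = 0.74)  # sampled GI means
gi_shape <- sample_param(1000, mean = 2.4, sd = 0.36)  # sampled GI shapes

# Pool replicate R_t draws by date (rows are dates, columns are draws)
# Placeholder draws only; real draws would be posterior samples of R_t
draws <- matrix(stats::rnorm(5 * 300, mean = 1.1, sd = 0.1), nrow = 5)
summary_rt <- data.frame(
  mean = rowMeans(draws),
  lwr  = apply(draws, 1, stats::quantile, probs = 0.025),
  upr  = apply(draws, 1, stats::quantile, probs = 0.975)
)
print(summary_rt)
```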

Results

The package ern has two functions with which to estimate the daily effective reproduction number, $R_{t}$ , for each supported data stream:

estimate_R_ww, which uses the concentration of a pathogen in wastewater over time as the input signal;
estimate_R_cl, which uses the count of clinically reported cases over time as the input signal.

We give an illustration of each method below.

Example with wastewater data

The function estimate_R_ww estimates $R_{t}$ from the pathogen concentration measured in wastewater. Its first input, ww.conc, is a dataframe with columns date (measurement date) and value (concentration value) that specifies the pathogen concentration in wastewater over time. The other inputs dist.fec and dist.gi specify parameters for two families of distributions: one for the fecal shedding rate distribution and the other for the intrinsic generation interval distribution, respectively.

We start by loading a subset of wastewater data that is included in the ern package. This dataset contains daily average concentration data of SARS-CoV-2 (N2 gene), measured in gene copies per milliliter of wastewater, from the Iona Island wastewater treatment plant in Vancouver, British Columbia, collected between 7 July 2023 and 5 November 2023 [38]. Note that the type of normalization of the wastewater data (e.g., viral concentration normalized by flow, other biomarkers, suspended solids mass, etc.) is left to the user, as this choice depends on each sampling site and on laboratory methods.

R> ww.conc = ern::ww.data

This data is plotted in the top panel of Fig 3.

As this example uses the SARS-CoV-2 virus, we can define fecal shedding and generation interval as the following:

R> dist.fec = ern::def_dist(
+   dist     = "gamma",
+   mean     = 12.9,
+   mean_sd  = 1.1,
+   shape    = 1.7,
+   shape_sd = 0.27,
+   max      = 33
+ )

R> dist.gi  = ern::def_dist(
+   dist     = "gamma",
+   mean     = 6.8,
+   mean_sd  = 0.74,
+   shape    = 2.4,
+   shape_sd = 0.36,
+   max      = 15
+ )

Each distribution family is defined by a structured list:

> print(dist.fec)
$dist
[1] "gamma"

$mean
[1] 12.9

$mean_sd
[1] 1.1

$shape
[1] 1.7

$shape_sd
[1] 0.27

$max
[1] 33

The first element of each distribution family list, dist, names the distribution family. The nomenclature of distribution names follows the one used in R (e.g., gamma from the R functions d/r/q/pgamma). The next four elements give parameters for this family of distributions, stated here in terms of the mean and shape (the mean and standard deviation can be used instead, as in the reporting delay distribution of the clinical example), along with an associated standard deviation (_sd) for each distribution parameter. The final element of this list, max, gives the maximum value to be drawn from this distribution; this is where the density is truncated (and then re-normalized to ensure it still sums to 1). This structure for the distribution list applies to Gamma, Normal, and Log-Normal families. For Uniform, ern currently supports only the specification of a single distribution (as opposed to a family). In this case, the distribution list specifying a Uniform would have three entries: dist, which would be equal to "unif", and then min and max, to specify the minimum and maximum values with non-zero density (i.e., the support of the Uniform distribution).
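The truncation and re-normalization described above can be sketched as follows. The function `discretize_gamma` is hypothetical, and evaluating the Gamma density at integer days is an assumption made for illustration; ern may discretize differently.

```r
# Build a discrete, truncated Gamma distribution from its mean, shape, and max
# (hypothetical helper; the discretization scheme is an assumption)
discretize_gamma <- function(mean, shape, max) {
  x    <- seq_len(max)                   # support: days 1, ..., max
  dens <- stats::dgamma(x, shape = shape, rate = shape / mean)
  dens / sum(dens)                       # re-normalize after truncation at max
}

# Mean distribution of the fecal shedding family defined above
pmf <- discretize_gamma(mean = 12.9, shape = 1.7, max = 33)
round(head(pmf), 4)
```

The rate is derived from the mean and shape because a Gamma distribution with shape a and rate b has mean a / b.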

We can visualize distributions by calling the function plot_dist. This convenience function will plot the mean distribution of the given family, that is, the distribution corresponding to the mean of each distribution parameter in the family. For example, plot_dist(dist.fec) was used to produce Fig 2 from the parameters for SARS-CoV-2 specified above.

Fig 2 — A possible choice for the mean fecal shedding distribution used for SARS-CoV-2 wastewater data.

The function estimate_R_ww also takes a number of parameters that give the user control over various components of the $R_{t}$ estimation:

scaling.factor is the average number of infections attributable to a unit of pathogen concentration per day. This quantity is typically estimated from i) clinical cases, ii) wastewater concentrations and iii) an “ascertainment rate” that estimates the number of infections missed by clinical surveillance (for example, using serological data).
prm.smooth defines the smoothing settings for the input wastewater data;
prm.R defines the settings for the $R_{t}$ estimates.

# Initializing scaling factor
R> scaling.factor = 1

# Initializing smoothing parameters
R> prm.smooth = list(
+   method = 'loess',  # smoothing method
+   align  = 'center', # smoothing alignment
+   span   = 0.30,     # smoothing span (used for loess smoothing only)
+   floor  = 5         # minimum smoothed concentration value
+                      # (optional, LOESS smoothing only)
+ )

# Initializing Rt settings
R> prm.R = list(
+   iter   = 20,           # number of resampling iterations 
+                          # to evaluate Rt ensemble
+   CI     = 0.95,         # confidence interval
+   window = 10,           # backward time window for Rt calculations
+   config.EpiEstim = NULL # optional EpiEstim configuration
+                          # for Rt calculations
+ )

Once we have specified all of these settings, we can feed them in, along with the input wastewater concentration data and the relevant distributions, to estimate $R_{t}$ :

R> r.estim = estimate_R_ww(
+   ww.conc        = ww.conc,
+   dist.fec       = dist.fec,
+   dist.gi        = dist.gi,
+   scaling.factor = scaling.factor,
+   prm.smooth     = prm.smooth,
+   prm.R          = prm.R
+ )

estimate_R_ww returns a list with four elements:

ww.conc: the original input of pathogen concentration in wastewater over time
ww.smooth: the smoothed wastewater concentration over time; includes columns:
- t: internal time index
- obs: smoothed value of the observation
- date
inc: the daily incidence inferred over time; includes columns:
- date
- mean: mean of the inferred daily incidence
- lwr, upr: lower and upper bounds of the 95% confidence interval for the inferred daily incidence
R: the estimated daily reproduction number over time; includes columns:
- date
- mean: mean $R_{t}$ value
- lwr, upr: lower and upper bounds of the confidence interval (width as specified in prm.R) for $R_{t}$

The function plot_diagnostic_ww conveniently displays all of the output data to help assess the quality of the $R_{t}$ estimates (Fig 3).

Example with clinical data

As shown in Fig 1, a key feature implemented in ern is the ability to handle clinical data that is reported on a time scale that is coarser than the typical generation interval timescale when estimating $R_{t}$ .

The function estimate_R_cl requires a data frame, cl.data, with one column for the report date (date) and another for the count of clinical reports (value). In addition, the user must specify a reporting fraction distribution (dist.repfrac) and three distribution families:

dist.repdelay: reporting delay;
dist.incub: incubation period;
dist.gi: intrinsic generation interval.

If input clinical reports are not reported daily, an additional parameter must be provided: popsize, representing the size of the population being considered, in order for daily reports to be inferred using the “renewal” method (see S1 File).

A sample of Canadian COVID-19 clinical reports is included in ern. This data set includes weekly reports from the provinces of British Columbia, Alberta, Saskatchewan, Manitoba, Ontario, and Quebec, between 1 Feb 2020 and 1 Apr 2023 [39]. As an example, we start by loading a subset of the weekly clinical report data for Quebec:

R> # --- data
+ dat <- (ern::cl.data
+   |> dplyr::filter(
+        pt == "qc", 
+        dplyr::between(date, 
+                       as.Date("2021-06-01"),
+                       as.Date("2021-09-01"))
+   )
+ )

We define distributions for the reporting fraction, reporting delay, incubation period, and intrinsic generation interval:

R> # --- distributions
+ # reporting fraction
+ dist.repfrac = ern::def_dist(
+     dist = "unif",
+     min  = 0.1,
+     max  = 0.3
+ )
+ # reporting delay
+ dist.repdelay = ern::def_dist(
+     dist    = 'gamma',
+     mean    = 5, 
+     mean_sd = 1,
+     sd      = 1,
+     sd_sd   = 0.1,
+     max     = 10
+ )
+ # incubation period
+ dist.incub = ern::def_dist(
+     dist     = "gamma",
+     mean     = 3.49,
+     mean_sd  = 0.1477,
+     shape    = 8.5,
+     shape_sd = 1.8945,
+     max      = 8
+ )
+ # generation interval
+ dist.gi = ern::def_dist(
+     dist     = "gamma",
+     mean     = 6,
+     mean_sd  = 0.75,
+     shape    = 2.4,
+     shape_sd = 0.3,
+     max      = 10
+ )

The data set we are working with reports COVID-19 cases on a weekly basis, an interval that is substantially longer than the typical generation interval of about 5 days for SARS-CoV-2 [40]. ern will estimate daily incidence from non-daily data. We specify the settings for this inference via prm.daily:

R> # --- settings
+ # daily report inference
+ prm.daily <- list(
+     method  = "renewal",
+     popsize = 8.5e6,     # Q3 (July 1) 2022 estimate for Quebec
+     burn    = 500,       # "burn-in" for MCMC
+     iter    = 500,       # MCMC iterations after burn-in
+     chains  = 2,         # number of chains
+     # priors for the R0 distribution (Gamma)
+     prior_R0_shape = 1.1, prior_R0_rate = 0.6,
+     # priors for the alpha distribution 
+     prior_alpha_shape = 1, prior_alpha_rate = 1
+ )

The method = "renewal" setting specifies the use of the renewal-equation-based epidemic model fitted with an MCMC algorithm, described fully in S2 File. This algorithm requires the specification of a total population size, which we source from Statistics Canada for this example [41]. The rest of the arguments in prm.daily give settings for the MCMC algorithm. The output of estimate_R_cl() has an element called diagnostic.mcmc, which contains objects that help assess the convergence of the MCMC algorithm. In particular, a warning message is displayed if the Gelman-Rubin statistic [42] of the latent daily incidence variable is above 1.025, prompting the user to increase the number of MCMC iterations.
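
As a rough illustration of the convergence diagnostic, the base R sketch below computes a basic Gelman-Rubin potential scale reduction factor for a single parameter sampled by several chains. The function gelman_rubin and the simulated chains are illustrative assumptions; established implementations (for example in the coda package) apply additional corrections, and the diagnostics computed by ern may use a different variant.

```r
# Basic potential scale reduction factor for one parameter.
# `draws` is an (iterations x chains) matrix of post-burn-in samples.
gelman_rubin <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(draws))        # between-chain variance
  var_hat <- (n - 1) / n * W + B / n   # pooled estimate of the target variance
  sqrt(var_hat / W)
}

set.seed(42)
mixed   <- matrix(rnorm(1000), ncol = 2)                       # chains agree
unmixed <- cbind(rnorm(500, mean = 0), rnorm(500, mean = 3))   # chains disagree

c(mixed = gelman_rubin(mixed), unmixed = gelman_rubin(unmixed))
```

Values close to one suggest that the chains sample the same distribution; the second example, where the chains are centered on different values, lies well above the 1.025 threshold mentioned above.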

After the inference of the daily reports is performed, a check is run to ensure that the posterior aggregated daily reports are not too different from the observed aggregated reports (given as input). The parameter agg.reldiff.tol is the maximum tolerance (as a percentage) accepted for the relative difference between the observed and posterior aggregates:

R> # daily report inference check
+ prm.daily.check <- list(
+     agg.reldiff.tol = 10
+ )

The Bayesian model tends to be most error-prone at the start of the input time series, so after performing this check, ern will drop any inferred values before the differences first fall below the specified tolerance. It will not filter out observations after that point to ensure the inferred time series remains daily. It will also produce a warning to ensure the user is aware of how many observations were dropped, along with some advice on how to increase the accuracy of the MCMC fit to decrease the number of dropped observations.
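
The logic of this check can be sketched in a few lines of base R. The numbers below are invented, and defining the relative difference with respect to the observed aggregate is an assumption; the exact computation in ern may differ.

```r
# Illustrative numbers only (not taken from ern::cl.data)
obs      <- c(100, 140, 180, 230, 260)  # observed weekly reports
mean_agg <- c(135, 160, 186, 228, 263)  # posterior daily reports, re-aggregated by week
tol      <- 10                          # tolerance in percent, as in agg.reldiff.tol

# Relative difference (percent) between posterior and observed aggregates
rel_diff <- 100 * abs(mean_agg - obs) / obs

# First week where the difference falls below the tolerance
first_ok <- which(rel_diff < tol)[1]

# Drop only the leading weeks; later weeks are kept even if they exceed `tol`
keep <- seq_along(obs) >= first_ok

data.frame(obs, mean_agg, rel_diff = round(rel_diff, 1), keep)
```

In this toy example the first two weeks exceed the 10% tolerance and are dropped, while all subsequent weeks are retained so that the series stays contiguous.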

Choosing a number of MCMC iterations that is not very large (to avoid long computation times, for example) may lead to daily report posteriors that are not very smooth. This, in turn, can affect the quality of $R_{t}$ estimates. Hence, ern provides a smoothing of the posterior daily reports in order to improve the quality of $R_{t}$ inference. The smoothing parameters are defined as follows:

R> # smoothing
+ prm.smooth <- list(
+     method = "rollmean",
+     align  = "center",
+     window = 7
+ )

In the example above, the smoothing performs a centered moving average with a sliding window of 7 days. The same smoothing options are available across the wastewater and clinical methods.
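
For readers unfamiliar with this type of smoothing, a centered 7-day moving average can be reproduced with base R as sketched below. The simulated daily counts are made up, and ern may rely on a different function and treat the edges of the series differently.

```r
# Centered 7-day moving average with base R.
# Values are NA where the window is incomplete (first and last 3 days).
set.seed(7)
daily  <- 50 + 10 * sin(seq_len(28) / 4) + rnorm(28, sd = 5)  # made-up daily counts
window <- 7

smoothed <- as.numeric(
  stats::filter(daily, rep(1 / window, window), sides = 2)
)

head(data.frame(day      = seq_along(daily),
                daily    = round(daily, 1),
                smoothed = round(smoothed, 1)), 8)
```

Each smoothed value is the mean of the observation on that day together with the three days before and the three days after it.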

We specify the parameters for the $R_{t}$ ensemble, just as we did in the wastewater example:

R> # Rt computation
+  prm.R <- list(
+    iter            = 20,
+    CI              = 0.95, 
+    window          = 7,
+    config.EpiEstim = NULL
+  )
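
To give some intuition for the role of a sliding window in the $R_{t}$ calculation, the sketch below computes a simple renewal-equation ratio in the spirit of Cori et al. [12]: new infections over the window divided by the total infectiousness over the same window. This is a simplified point estimate written for illustration only; the function rt_window is hypothetical, and the calculation performed by EpiEstim is Bayesian and includes a prior and credible intervals.

```r
# Sliding-window reproduction number in the spirit of Cori et al. [12]:
# ratio of new infections to total infectiousness over the last `window` days.
rt_window <- function(inc, gi_pmf, window = 7) {
  n <- length(inc)
  lambda <- rep(NA_real_, n)               # total infectiousness on each day
  for (day in 2:n) {
    s <- seq_len(min(day - 1, length(gi_pmf)))
    lambda[day] <- sum(inc[day - s] * gi_pmf[s])
  }
  rt <- rep(NA_real_, n)
  for (day in (window + 1):n) {
    idx <- (day - window + 1):day
    rt[day] <- sum(inc[idx]) / sum(lambda[idx])
  }
  rt
}

# Discretized generation interval (fixed mean and shape, no uncertainty)
gi_pmf <- dgamma(1:10, shape = 2.4, rate = 2.4 / 6)
gi_pmf <- gi_pmf / sum(gi_pmf)

inc <- round(10 * 1.05^(0:59))             # synthetic incidence growing 5% per day
round(tail(rt_window(inc, gi_pmf), 3), 2)
```

A longer window averages over more days, which stabilizes the estimate at the cost of reacting more slowly to changes in transmission.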

Finally, we can call the main ern function to estimate $R_{t}$ from clinical data:

R> r.estim <- estimate_R_cl(
+   cl.data         = dat,
+   dist.repdelay   = dist.repdelay,
+   dist.repfrac    = dist.repfrac,
+   dist.incub      = dist.incub,
+   dist.gi         = dist.gi,
+   prm.daily       = prm.daily,
+   prm.daily.check = prm.daily.check,
+   prm.smooth      = prm.smooth,
+   prm.R           = prm.R
+ )

estimate_R_cl returns a list with five elements:

cl.data: the original input of clinical disease reports over time, with an added column t for an internal time index
cl.daily: reports as input for $R_{t}$ calculation (inferred daily counts if original inputs were aggregates, smoothed if specified); includes columns:
- id: identifier for each realization (resampling iteration) of the daily report inference
- date: daily date
- value: inferred daily report count
- t: internal time index
inferred.agg: inferred daily reports re-aggregated on the reporting schedule as input in cl.data; includes columns:
- date: report date
- obs: original (aggregated) observations
- mean.agg: mean of the aggregated posterior daily reports
- lwr.agg, upr.agg: lower and upper bounds of a 95% confidence interval of the aggregated inferred daily reports
R: the estimated daily reproduction number over time; includes columns:
- date
- mean: mean $R_{t}$ value
- lwr, upr: lower and upper bounds of a confidence interval for each $R_{t}$ estimate
- use: logical flag, FALSE denotes estimated $R_{t}$ values that may be particularly unreliable as they fall within the maximum time range of one (truncated) generation interval from the start of the clinical report time series
diagnostic.mcmc: a list with various MCMC diagnostics, including
- plot.traces: trace plots for fitted parameters
- plot.gelmanrubin: plot of the Gelman-Rubin statistics for fitted parameters
- jags.obj: the JAGS output mcmc.list object, as produced by rjags [43]

The function plot_diagnostic_cl summarises this output (Fig 4).
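
When working with the R element directly, the use flag can serve to set aside the early, less reliable estimates. The sketch below applies this to a mock table that only mimics the columns described above; all values are invented for illustration.

```r
# Mock table with the same columns as the `R` element described above
# (values are invented for illustration)
R_tbl <- data.frame(
  date = as.Date("2021-06-01") + 0:5,
  mean = c(1.40, 1.32, 1.21, 1.15, 1.08, 1.02),
  lwr  = c(0.90, 0.95, 1.00, 1.01, 0.97, 0.93),
  upr  = c(2.00, 1.80, 1.50, 1.31, 1.20, 1.12),
  use  = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

reliable <- subset(R_tbl, use)           # drop the flagged early estimates
reliable$above_one <- reliable$lwr > 1   # is the whole interval above one?
reliable
```

The same subsetting could be applied to the R element of the list returned by estimate_R_cl, assuming it has the columns listed above.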

Sensitivity analysis for wastewater $R_{t}$

We perform a sensitivity analysis of $R_{t}$ estimations with wastewater input data to investigate various input choices since the methods used for this data stream are still relatively new. The code to replicate these results is provided in S5 File.

The package ern currently includes two smoothing methods: rolling mean and LOESS. Using similar smoothness parameters, i.e., a centered rolling mean on a 5-day window and a span parameter of 0.3 for LOESS, Fig S4–1 in S4 File shows that the $R_{t}$ estimates are comparable.

Because of the paucity of clinical studies, there is a fair amount of uncertainty regarding the temporal profile of fecal shedding for respiratory infections. Hence, in Fig S4–2 in S4 File, we show how the $R_{t}$ estimates can be significantly impacted by assuming differing profiles based on Gamma, normal, uniform, and exponential distribution-like shapes for the fecal shedding distribution.

When the prevalence of infections is low in the population of interest, the epidemic “signal”, represented by a low count of clinical reports and/or low viral concentration in wastewater, is dominated by noise. In this case, the estimation of $R_{t}$ may be challenging. In Fig S4–3 in S4 File, we illustrate this using wastewater data by estimating $R_{t}$ on sample data multiplied by a factor of 0.01, 0.1, 1 and 10. The $R_{t}$ estimates are similar for multipliers 1 and 10, but very different (and unreliable) when the multiplier is 0.1 or 0.01, confirming the difficulty of estimating $R_{t}$ when prevalence is (very) low.

Computing time benchmarks

Rapid $R_{t}$ estimation can be important in some cases, such as during an epidemic being monitored daily in order to follow its evolution closely and assess the success of ongoing interventions meant to reduce transmission. Here, a computation time of less than a day is key. Rapid $R_{t}$ calculation is also important in cases where there are many input datasets. For instance, if one is calculating $R_{t}$ with wastewater data across an entire country, they may wish to do so by computing one $R_{t}$ per wastewater sampling location (it can be difficult to meaningfully combine wastewater data sampled from different sites into a single signal). Here, it is important for the $R_{t}$ calculation to be quick so that one can produce $R_{t}$ estimates for a large number of wastewater sampling locations in a reasonable amount of time.

As an example, Table 2 shows computing times to calculate $R_{t}$ with different $R$ packages using either weekly clinical case reports or viral concentration in wastewater. These times are simply meant to illustrate the orders of magnitude of the calculation times, and do not represent a thorough benchmarking exercise. In this example, estimates for wastewater-based $R_{t}$ take about one second with ern compared to about four minutes with EpiSewer. The latter uses Hamiltonian Monte Carlo (via Stan) to estimate latent variables, which is much more computationally intensive than the simple deconvolution performed in ern. For $R_{t}$ estimation based on weekly clinical reports, the computing time is on the order of one second for both ern (using the linear method) and EpiEstim. The code to reproduce this example is given in S6 File.

Table 2. Sample computing times (in seconds) for $R_{t}$ estimates using different $R$ packages.

The wastewater data is taken from the data set shipped with the package, which consists of four months of daily SARS-CoV-2 concentration measurements for the city of Zurich. The clinical data are simulated weekly reports. See S6 File for more details.

	`R` package	data type	method (daily report inference)	compute time (s)
1	`ern`	wastewater	-	0.85
2	`EpiSewer`	wastewater	-	251.23
3	`ern`	clinical	linear	1.89
4	`ern`	clinical	renewal	20.72
5	`EpiEstim`	clinical	expectation maximization	0.81
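
Timings of this kind can be collected with base R's system.time, as in the minimal sketch below. The function slow_task is a stand-in for an $R_{t}$ estimation call and is not part of any package; the approach used in S6 File may differ.

```r
# Time an arbitrary function call; `slow_task` is a placeholder for an
# Rt-estimation call and is not part of any package.
slow_task <- function(n = 2e5) sum(sqrt(seq_len(n)))

timing <- system.time(res <- slow_task())
unname(timing["elapsed"])   # wall-clock time in seconds
```

Elapsed times depend on the hardware and on the current system load, so repeated runs are advisable when comparing methods.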


Discussion and conclusions

The $R$ package ern was designed with public health practitioners in mind, specifically to provide them with a tool to estimate, in a user-friendly way, the effective reproduction number $R_{t}$ from typical clinical reports and/or data reporting pathogen concentration in wastewater. The inferences for $R_{t}$ rely on various distributions (e.g., fecal shedding, incubation period, generation interval) that are rarely perfectly known. To reflect this uncertainty, these distributions are defined as families of distributions, and the estimation process samples from those families to propagate this source of uncertainty into the final $R_{t}$ estimates. Clinical cases of infectious diseases are rarely reported on a daily basis, even though the day is the most natural time unit (at least for respiratory diseases) in $R_{t}$ models. The package ern accepts non-daily clinical reports and can infer daily incidence using a genuine transmission model.

The methods implemented in ern to estimate $R_{t}$ from clinical or wastewater data are similar to other existing methods. For example, the deconvolution of the incubation period and reporting delays in ern uses the same Richardson-Lucy algorithm as in [15, 16]. The LOESS or rolling mean smoothing of the wastewater data as a way to preprocess the data to reduce the noise is also used broadly. Indeed, the $R$ package ern leverages previous works and focuses its scientific contribution on bringing these different methodological approaches into a single, consistent, user-friendly package.

There are several limitations of the ern package. For clinical inputs, the renewal method depends on JAGS, which may not be straightforward to install for the average user. The computing time when using aggregated clinical reports and the renewal method may be too long for some applications. Moreover, the renewal method does not have a time-dependent transmission parameter in its current implementation, so estimating $R_{t}$ using this method is appropriate for a single epidemic wave without any significant change in transmission (for example, a typical seasonal influenza wave in a non-tropical region). The linear method can handle temporal changes in transmission, though it may not always infer a realistic epidemic curve for inferred daily reports.

Another limitation is that the model in ern does not have the latent incidence as a random variable when estimating $R_{t}$ from wastewater data (unlike, for example, the R package EpiSewer), so this uncertainty is not accounted for. Even if the uncertainty of the fecal shedding distribution is propagated, it does not capture the full scale of uncertainty. This can be problematic for real-time surveillance because the uncertainty for $R_{t}$ estimates may be underestimated for dates close to the estimation time.

For wastewater inputs, the scaling factor used to convert between prevalence and viral concentration in wastewater is difficult to estimate in practice. Ideally, one would need, over multiple days, i) an accurate estimate of the actual prevalence in the catchment area from (extensive) clinical surveillance and ii) viral concentration measurements over the same period. The scaling factor would then be proportional to the ratio of prevalence over concentration (and depending on the laboratory method used to measure the viral concentration, additional normalization, for instance by flow rate or suspended solid mass, may be required).

ern currently allows users to define a particular distribution for fecal shedding kinetics. Studies examining SARS-CoV-2 shedding have shown that fecal shedding kinetics can vary among infected individuals [44, 45]. Moreover, the scaling factor in ern is held constant over time, which may not be realistic as new viral lineages emerge and the immune profile of the population evolves over time; both of these factors can affect pathogen shedding in wastewater. As a result, the “inferred incidence” estimated by ern (using the output estimate_R_ww(…)$inc) must be interpreted carefully.

Wastewater sample concentration can also be affected by environmental and structural factors of sewer systems. Flow from rainfall and snowmelt can dilute sample concentration readings [46] and sewer transit time can impact the rate at which viral particles degrade prior to sample collection [47].

Future versions of ern will attempt to address the above limitations.

In conclusion, the $R$ package ern aims to provide a relatively user-friendly environment to empower public health professionals with a tool to estimate the effective reproduction number $R_{t}$ from clinical and wastewater-based data.

Computational details

The results in this paper were obtained using $R$ version 4.3.1 with packages EpiEstim version 2.4, rjags version 4-14, and the software JAGS version 4.3.1. $R$ itself and all packages used are available from the Comprehensive $R$ Archive Network (CRAN) at https://CRAN.R-project.org/.

Supporting information

S1 File. Methodological differences when inferring daily incidence.

(PDF)

pone.0305550.s001.pdf (130.4KB, pdf)

S2 File. Bayesian model to infer daily clinical report count.

(PDF)

pone.0305550.s002.pdf (152.8KB, pdf)

S3 File. Linear interpolation to infer daily clinical report count.

(PDF)

pone.0305550.s003.pdf (170.8KB, pdf)

S4 File. Sensitivity analysis to selected parameters.

(PDF)

pone.0305550.s004.pdf (170.2KB, pdf)

S5 File. R code to perform sensitivity analyses presented in S4 File.

(R)

pone.0305550.s005.R (7.8KB, R)

S6 File. R code to evaluate the computing time of selected R packages that estimate the effective reproduction number.

(R)

pone.0305550.s006.R (7.6KB, R)

S7 File. R code associated with the methodological differences presented in S1 File.

(R)

pone.0305550.s007.R (7KB, R)

Acknowledgments

We thank Shokoofeh Nourbakhsh for testing early versions of the package. We also thank the reviewers for their insightful comments that sparked improvements in this article and in the ern package presented here.

Data Availability

The R code for the package is currently available on the official R repository CRAN (https://CRAN.R-project.org/package=ern) and on GitHub (https://github.com/phac-nml-phrsd/ern). The data used in this manuscript are publicly available and attached to the package.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Anderson RM, May RM. Infectious diseases of humans: dynamics and control. Oxford University Press; 1991.
2. Inglesby TV. Public health measures and the reproduction number of SARS-CoV-2. JAMA. 2020;323(21):2186–2187. doi: 10.1001/jama.2020.7878
3. Kirby AE, Walters MS, Jennings WC, Fugitt R, LaCross N, Mattioli M, et al. Using wastewater surveillance data to support the COVID-19 response—United States, 2020–2021. Morbidity and Mortality Weekly Report. 2021;70(36):1242. doi: 10.15585/mmwr.mm7036a2
4. Castiglioni S, Schiarea S, Pellegrinelli L, Primache V, Galli C, Bubba L, et al. SARS-CoV-2 RNA in urban wastewater samples to monitor the COVID-19 pandemic in Lombardy, Italy (March–June 2020). Science of The Total Environment. 2022;806:150816. doi: 10.1016/j.scitotenv.2021.150816
5. Melnick JL, et al. Poliomyelitis Virus in Urban Sewage in Epidemic and in Nonepidemic Times. American journal of hygiene. 1947;45(2):240–53.
6. Crank K, Chen W, Bivins A, Lowry S, Bibby K. Contribution of SARS-CoV-2 RNA shedding routes to RNA loads in wastewater. Science of The Total Environment. 2022;806. doi: 10.1016/j.scitotenv.2021.150376
7. Wölfel R, Corman VM, Guggemos W, Seilmaier M, Zange S, Müller MA, et al. Virological assessment of hospitalized patients with COVID-2019. Nature. 2020;581(7809):465–469. doi: 10.1038/s41586-020-2196-x
8. Jones DL, Baluja MQ, Graham DW, Corbishley A, McDonald JE, Malham SK, et al. Shedding of SARS-CoV-2 in feces and urine and its potential role in person-to-person transmission and the environment-based spread of COVID-19. Science of The Total Environment. 2020;749:141364. doi: 10.1016/j.scitotenv.2020.141364
9. de Jonge EF, Peterse CM, Koelewijn JM, van der Drift AMR, van der Beek RF, Nagelkerke E, et al. The detection of monkeypox virus DNA in wastewater samples in the Netherlands. Science of The Total Environment. 2022;852:158265. doi: 10.1016/j.scitotenv.2022.158265
10. Mercier E, D’Aoust PM, Thakali O, Hegazy N, Jia JJ, Zhang Z, et al. Municipal and neighbourhood level wastewater surveillance and subtyping of an influenza virus outbreak. Scientific Reports. 2022;12(1):15777. doi: 10.1038/s41598-022-20076-z
11. Van den Driessche P, Watmough J. Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission. Mathematical biosciences. 2002;180(1-2):29–48. doi: 10.1016/S0025-5564(02)00108-6
12. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. American journal of epidemiology. 2013;178(9):1505–1512. doi: 10.1093/aje/kwt133
13. Gostic KM, McGough L, Baskerville EB, Abbott S, Joshi K, Tedijanto C, et al. Practical considerations for measuring the effective reproductive number, $R_{t}$. PLoS computational biology. 2020;16(12):e1008409. doi: 10.1371/journal.pcbi.1008409
14. Alvarez L, Colom M, Morel JD, Morel JM. Computing the daily reproduction number of COVID-19 by inverting the renewal equation using a variational technique. Proceedings of the National Academy of Sciences. 2021;118(50):e2105112118. doi: 10.1073/pnas.2105112118
15. Huisman JS, Scire J, Angst DC, Li J, Neher RA, Maathuis MH, et al. Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. eLife. 2022;11:e71345. doi: 10.7554/eLife.71345
16. Huisman JS, Scire J, Caduff L, Fernandez-Cassi X, Ganesanandamoorthy P, Kull A, et al. Wastewater-based estimation of the effective reproductive number of SARS-CoV-2. Environmental health perspectives. 2022;130(5):057011. doi: 10.1289/EHP10050
17. McMahan CS, Self S, Rennert L, Kalbaugh C, Kriebel D, Graves D, et al. COVID-19 wastewater epidemiology: a model to estimate infected populations. The Lancet Planetary Health. 2021;5(12):e874–e881. doi: 10.1016/S2542-5196(21)00230-8
18. Fazli M, Sklar S, Porter MD, French BA, Shakeri H. Wastewater-Based Epidemiological Modeling for Continuous Surveillance of COVID-19 Outbreak. medRxiv. 2021. doi: 10.1101/2021.10.19.21265221
19. Nourbakhsh S, Fazil A, Li M, Mangat CS, Peterson SW, Daigle J, et al. A wastewater-based epidemic model for SARS-CoV-2 with application to three Canadian cities. Epidemics. 2022;39:100560. doi: 10.1016/j.epidem.2022.100560
20. Jiang G, Wu J, Weidhaas J, Li X, Chen Y, Mueller J, et al. Artificial neural network-based estimation of COVID-19 case numbers and effective reproduction rate using wastewater-based epidemiology. Water research. 2022;218:118451. doi: 10.1016/j.watres.2022.118451
21. Amman F, Markt R, Endler L, Hupfauf S, Agerer B, Schedl A, et al. Viral variant-resolved wastewater surveillance of SARS-CoV-2 at national scale. Nature Biotechnology. 2022;40(12):1814–1822. doi: 10.1038/s41587-022-01387-y
22. Lison A. adrian-lison/EpiSewer: EpiSewer 0.0.1; 2024. doi: 10.5281/zenodo.10569102
23. Champredon D, Dushoff J. Intrinsic and realized generation intervals in infectious-disease transmission. Proceedings of the Royal Society B: Biological Sciences. 2015;282(1821):20152026. doi: 10.1098/rspb.2015.2026
24. Boëlle PY, Ansart S, Cori A, Valleron AJ. Transmission parameters of the A/H1N1 (2009) influenza virus pandemic: a review. Influenza and other respiratory viruses. 2011;5(5):306–316. doi: 10.1111/j.1750-2659.2011.00234.x
25. te Beest DE, Wallinga J, Donker T, van Boven M. Estimating the generation interval of influenza A (H1N1) in a range of social settings. Epidemiology. 2013; p. 244–250. doi: 10.1097/EDE.0b013e31827f50e8
26. Cori A. EpiEstim: Estimate Time Varying Reproduction Numbers from Epidemic Curves; 2023. Available from: https://github.com/mrc-ide/EpiEstim.
27. Nash RK, Bhatt S, Cori A, Nouvellet P. Estimating the epidemic reproduction number from temporally aggregated incidence data: A statistical modelling approach and software tool. PLOS Computational Biology. 2023;19(8):e1011439. doi: 10.1371/journal.pcbi.1011439
28. Abbott S, Hellewell J, Sherratt K, Gostic K, Hickson J, Badr HS, et al. EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters; 2020.
29. Scott JA, Gandy A, Mishra S, Unwin J, Flaxman S, Bhatt S. epidemia: Modeling of Epidemics using Hierarchical Bayesian Models; 2020. Available from: https://imperialcollegelondon.github.io/epidemia/.
30. Stan Development Team. Stan Modeling Language Users Guide and Reference Manual; 2024. Available from: https://mc-stan.org/.
31. Scire J, Huisman JS, Grosu A, Angst DC, Lison A, Li J, et al. estimateR: an R package to estimate and monitor the effective reproductive number. BMC bioinformatics. 2023;24(1):310. doi: 10.1186/s12859-023-05428-4
32. Cleveland WS. Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association. 1979;74(368):829–836. doi: 10.1080/01621459.1979.10481038
33. Richardson WH. Bayesian-based iterative method of image restoration. JoSA. 1972;62(1):55–59. doi: 10.1364/JOSA.62.000055
34. Lucy LB. An iterative technique for the rectification of observed distributions. The astronomical journal. 1974;79:745. doi: 10.1086/111605
35. Goldstein E, Dushoff J, Ma J, Plotkin JB, Earn DJ, Lipsitch M. Reconstructing influenza incidence by deconvolution of daily mortality time series. Proceedings of the National Academy of Sciences. 2009;106(51):21825–21829. doi: 10.1073/pnas.0902958106
36. Kermack WO, McKendrick AG. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, Containing papers of a mathematical and physical character. 1927;115(772):700–721.
37. Champredon D, Dushoff J, Earn DJ. Equivalence of the Erlang-distributed SEIR epidemic model and the renewal equation. SIAM Journal on Applied Mathematics. 2018;78(6):3258–3278. doi: 10.1137/18M1186411
38. Government of Canada. COVID-19 wastewater monitoring dashboard: Viral signal trend; 2023. https://health-infobase.canada.ca/src/data/covidLive/wastewater/covid19-wastewater.csv.
39. Government of Canada. COVID-19 epidemiology update: cases and deaths data; 2023. https://health-infobase.canada.ca/src/data/covidLive/covid19-download.csv.
40. Challen R, Brooks-Pollock E, Tsaneva-Atanasova K, Danon L. Meta-analysis of the severe acute respiratory syndrome coronavirus 2 serial intervals and the impact of parameter uncertainty on the coronavirus disease 2019 reproduction number. Statistical Methods in Medical Research. 2022;31(9):1686–1703. doi: 10.1177/09622802211065159
41. Statistics Canada. Table 17-10-0009-01 Population estimates, quarterly; 2023. doi: 10.25318/1710000901-eng
42. Gelman A, Rubin DB. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science. 1992;7(4):457–472. doi: 10.1214/ss/1177011136
43. Plummer M. rjags: Bayesian Graphical Models using MCMC; 2023. Available from: https://CRAN.R-project.org/package=rjags.
44. Arts PJ, Kelly JD, Midgley CM, Anglin K, Lu S, Abedi GR, et al. Longitudinal and quantitative fecal shedding dynamics of SARS-CoV-2, pepper mild mottle virus, and crAssphage. mSphere. 2023;8(4):e00132–23. doi: 10.1128/msphere.00132-23
45. Uncertainties in estimating SARS-CoV-2 prevalence by wastewater-based epidemiology. Chemical Engineering Journal. 2021;415:129039. doi: 10.1016/j.cej.2021.129039
46. Wade MJ, Lo Jacomo A, Armenise E, Brown MR, Bunce JT, Cameron GJ, et al. Understanding and managing uncertainty and variability for wastewater monitoring beyond the pandemic: Lessons learned from the United Kingdom national COVID-19 surveillance programmes. Journal of Hazardous Materials. 2022;424:127456. doi: 10.1016/j.jhazmat.2021.127456
47. McCall C, Fang ZN, Li D, Czubai AJ, Juan A, LaTurner ZW, et al. Modeling SARS-CoV-2 RNA degradation in small and large sewersheds. Environ Sci: Water Res Technol. 2022;8:290–300. doi: 10.1039/D1EW00717C

PLoS One. doi: 10.1371/journal.pone.0305550.r001

Decision Letter 0

Salim Heddam

31 Jan 2024

PONE-D-24-01907

ern: an R package to estimate the effective reproduction number using clinical and wastewater surveillance data

PLOS ONE

Dear Dr. Champredon,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 16 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Salim Heddam

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We are unable to open your Supporting Information file [suppl_S1_interpolation_impact.R]. Please kindly revise as necessary and re-upload.

Additional Editor Comments:

Reviewer 1#:

please abstract must be more clear

material and methods not Models and software 93

discussion must be written more advanced reference with more knowledge about your article

conclusion must be written with detail

Reviewer 2#:

The authors present the R package ern for the estimation of the effective reproduction number from wastewater or aggregated clinical surveillance data.

The package provides a framework for an efficient and quicker estimation of effective reproduction number using a user-friendly interface.

The manuscript is well-written and makes a relevant contribution to the field.

I thoroughly enjoyed reviewing this manuscript and only have some minor requests for revision, as follows:

Lines 39 to 42: Do not start a sentence using a citation number. In Line 39, you may write, "Huisman et al. [14] proposed a method....". Do the same for lines 40, 41 and 42.

Line 261: "...for the for the ...". Delete the repetition.

Reviewer 3#:

1- It would have been better to talk about the Rt factor, which was mentioned in the research, in numerical terms, with something simple in the abstract

2-It was possible to dispense with some paragraphs in figure or table in the introduction

3-The researcher did not provide a research review of references that address the same topic, even in a simple way

4- It is possible to clarify the work algorithm in the form of clear points or in an algorithmic form, on the basis of which the steps of the example are clarified

5- The discussion was narrative and did not clarify any future idea or plan of action for researchers working in the future in the same field and what difficulties they may face.

Reviewer 4#:

1. Describe dataset features in more details and its total size and size of (train/test) as a table.

2. Pseudocode / Flowchart and algorithm steps need to be inserted.

3. Time spent need to be measured in the experimental results.

4. Limitation and Discussion Sections need to be inserted.

5. The parameters used for the analysis must be provided in table

6. The architecture of the proposed model must be provided

7. Address the accuracy/improvement percentages in the abstract and in the conclusion sections, as well as the significance of these results.

8. The authors need to make a clear proofread to avoid grammatical mistakes and typo errors.

9. Add future work in last section (conclusion) (if any)

10. To improve the Related Work and Introduction sections authors are recommended to review this highly related research work paper:

a) Optimizing epileptic seizure recognition performance with feature scaling and dropout layers

b) Optimizing classification of diseases through language model analysis of symptoms

c) Predicting female pelvic tilt and lumbar angle using machine learning in case of urinary incontinence and sexual dysfunction

d) Utilizing convolutional neural networks to classify monkeypox skin lesions

e) Hepatitis C Virus prediction based on machine learning framework: a real-world case study in Egypt

Reviewer 5#:

I would like to thank the authors for putting together the R package ern and for submitting this accompanying manuscript. I agree with the assessment that there is a need for user-friendly statistical software to estimate reproduction numbers from wastewater concentration measurements. This manuscript explains the motivation for developing the package ern, i.e. to provide a dedicated interface to estimate Rt from wastewater and clinical case data. It then describes the statistical approach used and presents a vignette-style illustration of the main functionalities of the package. The authors also propose a new approach to disaggregate non-daily case counts for subsequent Rt estimation.

I have read the manuscript in detail and I have tested the package both using the example data from Canada as provided by the authors, and using wastewater data from Switzerland. I tried to structure my review into manuscript-related and package/method-related major comments, plus a collection of minor comments.

For transparency, I am a (co)author of two packages mentioned in my review, i.e. the package "estimateR" and the package "EpiSewer".

Kind regards

Adrian Lison

## Major points manuscript

First let me say that I found the manuscript clean and well-written.

### Related work

I think the manuscript can provide more details on what is methodologically novel and what not. You mention that the method implemented is similar to the method by Huisman et al., but aside from the approach to disaggregate non-daily case data, it seems at first glance to be EXACTLY the method by Huisman et al., i.e. as detailed in https://doi.org/10.7554/eLife.71345 for case data and in https://doi.org/10.1289/EHP10050 for wastewater data (LOESS smoothing, deconvolution using Richardson-Lucy algorithm, scaling, R estimation using EpiEstim, uncertainty quantification via resampling). I am raising this not only because of attribution but because it is important to clearly describe the differences and similarities between related methods such that readers can compare them properly.

Furthermore, there are several related works that are worth mentioning in my opinion.

First, the method by Huisman et al. has also been implemented as an open source R package (https://doi.org/10.1186/s12859-023-05428-4) "estimateR", and can similarly be used to estimate Rt from wastewater and case data (see e.g. https://ibz-shiny.ethz.ch/wastewaterRe/). Compared to the package ern, the interface and plotting functionality of estimateR are not explicitly tailored to wastewater data, therefore I think that ern is more user-friendly for this domain. Also, estimateR offers no option to disaggregate non-daily case counts. Aside from that, I think that estimateR and ern are highly similar since they are based on almost the same method.

Aside from ern, there are also other R packages for modeling wastewater data, including the package EpiSewer (https://zenodo.org/doi/10.5281/zenodo.10569101), your own package wem (https://github.com/phac-nml-phrsd/wem/tree/main), and the Covid19 Wastewater Analysis Package (https://github.com/UW-Madison-DSI/Covid19Wastewater/tree/main), although the latter does not produce Rt estimates. I do not think a detailed comparison or benchmarking of these packages is necessary, but a short discussion of their differences would be useful. I believe it is valuable to give potential users an overview over available options.
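For readers unfamiliar with the deconvolution step named in the comments above, the following is a minimal sketch of a one-dimensional Richardson-Lucy iteration. It is not the implementation used in ern, estimateR, or by Huisman et al.; the function name `richardson_lucy_1d`, the delay kernel, and the synthetic infection curve are all assumptions made for illustration, and the sketch omits smoothing, stopping rules, and uncertainty quantification.

```python
import numpy as np

def richardson_lucy_1d(observed, kernel, n_iter=200):
    """Recover a non-negative signal x from observed = K @ x (illustrative only).

    Assumes kernel[0] > 0 so that every column of K has a positive sum.
    """
    d = np.asarray(observed, dtype=float)
    k = np.asarray(kernel, dtype=float)
    k = k / k.sum()
    n = len(d)
    # K[i, j] = k[i - j]: an event on day j is observed on day i with weight k[i - j]
    K = np.zeros((n, n))
    for lag, weight in enumerate(k):
        K += weight * np.eye(n, k=-lag)
    col_sums = K.sum(axis=0)
    x = np.full(n, d.mean())  # flat, strictly positive starting guess
    for _ in range(n_iter):
        predicted = K @ x
        # ratio of data to current prediction, guarding against division by zero
        ratio = np.divide(d, predicted, out=np.zeros(n), where=predicted > 0)
        # multiplicative update keeps x non-negative
        x = x * (K.T @ ratio) / col_sums
    return x, K

# Synthetic example: a single epidemic wave observed through a hypothetical delay
t = np.arange(40)
truth = 1.0 + 50.0 * np.exp(-0.5 * ((t - 20) / 5.0) ** 2)
delay = [0.1, 0.3, 0.4, 0.2]
observed = np.convolve(truth, delay)[:40]
estimate, K = richardson_lucy_1d(observed, delay)
```

The multiplicative update is what distinguishes this scheme from a direct matrix inversion: the estimate stays non-negative at every iteration, which matters when the target is an incidence curve.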

### Sensitivity analyses

I liked the illustrations of the package, but what I missed are some sensitivity analyses that give users an idea of what behavior to expect in different situations and what limitations the method might have. Some interesting analyses that I can think of would for example be about the smoothing (what role do the hyperparameters play, how do moving average and LOESS compare, how does the interpolation deal with larger gaps of data), or what happens if the fecal shedding distribution is misspecified, or how the method performs when concentrations are low. I know that such kinds of analyses require some work, but I think having some sensitivity analyses would add much value to the paper. You can then also point to these analyses from the package documentation.

### Limitations

Lines 22f: I agree that wastewater has several advantages over clinical data and does not have the same biases, but it would be good to shortly mention also potential biases of wastewater. In particular I am worried that the statement "Fecal shedding occurs passively and irrespective of the symptomatic status of the infected individual" could be misunderstood by readers. While it is true that asymptomatic patients also seem to shed into wastewater, there is also large variation in shedding loads and distributions between patients and it is not yet clear what the main factors are (see e.g. https://doi.org/10.1016/j.scitotenv.2020.141364 and https://doi.org/10.1128/msphere.00132-23). Wastewater concentrations could also to a large part be driven by "supershedders" and we don't know how representative this subgroup is of the overall population. Other sources of bias worth mentioning are changing populations in the catchments and environmental factors like rainfall.

Scaling factor: Can you give some more details on how you would choose this in practice, and what the implications of a potential misspecification are (for example that a constant factor of misspecification will strongly bias incidence estimates but at least not bias Rt except in certain edge cases).

For the above reason, I would stress in the manuscript (and also in the package documentation) that the inferred incidence directly depends on the hard-to-estimate scaling factor and must therefore be carefully interpreted. Otherwise there is a risk that people will use this to estimate prevalence etc.
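The point about a constant misspecification of the scaling factor can be illustrated with a small sketch. It uses a plain ratio form of the renewal equation, R_t = I_t / sum_s w_s I_{t-s}, which is a simplification and not the Bayesian estimator in EpiEstim; the function name `naive_rt`, the generation-interval weights, and the factor `k` are hypothetical. With a prior on R_t, as in EpiEstim, the cancellation is only approximate, which is one way to read the "edge cases" mentioned above.

```python
import numpy as np

def naive_rt(incidence, gen_interval):
    """Ratio-form renewal estimate R_t = I_t / sum_s w_s * I_{t-s} (illustrative)."""
    incidence = np.asarray(incidence, dtype=float)
    w = np.asarray(gen_interval, dtype=float)
    w = w / w.sum()
    rt = np.full(incidence.shape, np.nan)
    for t in range(len(w), len(incidence)):
        past = incidence[t - len(w):t][::-1]  # I_{t-1}, I_{t-2}, ...
        denom = np.dot(w, past)
        if denom > 0:
            rt[t] = incidence[t] / denom
    return rt

true_incidence = 100.0 * np.exp(0.05 * np.arange(30))  # synthetic growing epidemic
w = [0.2, 0.5, 0.3]                                     # hypothetical generation interval
k = 4.0                                                 # hypothetical constant misspecification

rt_true = naive_rt(true_incidence, w)
rt_scaled = naive_rt(k * true_incidence, w)
```

Here `k * true_incidence` is wrong by a factor of four everywhere, yet `rt_scaled` matches `rt_true` because `k` appears in both numerator and denominator.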

I suggest to add unit information to the concentration and scaling factor. For example, what are the typical units of the exemplary data in ww.input?

### Disaggregation

The proposed method for disaggregation of non-daily cases and the comparison with the method by Nash et al. is quite interesting. Based on your illustration, it seems like the method by Nash et al. could have important limitations not found in the original study by Nash et al. Since this would be a strong result, can you provide more details, e.g. which sliding window was used in the example? Also in Figure 5, can you also plot Rt estimated from the daily incidence time series?

In your model for inferring daily case counts, you do not seem to account for potential changes in transmission other than due to susceptible depletion. What happens if you fit this over longer time periods with multiple waves or time series with strong non-pharmaceutical interventions?

Lines 379f: I think this is a rather problematic approach - drawing not enough posterior samples and then applying smoothing to improve the irregular posterior. I think this can easily lead to an unrepresentative posterior and also distort the uncertainty estimates.

In the introduction, you mention long runtimes of epidemia and EpiNow2 as disadvantages to overcome, but the disaggregation of ern also requires MCMC sampling. How do the runtimes compare on non-daily data, is ern still considerably faster?

## Major points package / method

The package ern currently only seems to offer a fixed scaling factor. Explicitly supporting a time-varying scaling factor to account for flow would be great. Scaling concentrations by daily flow volumes at the treatment plant can be quite important in my experience because there can be a strong effect of dilution of the viral particles by rainfall etc. on the measured concentration.
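As a sketch of what flow scaling means in practice: the daily viral load is the measured concentration multiplied by the daily flow volume. The function name `flow_normalised_load`, the units, and the numbers below are assumptions for illustration and do not describe an ern feature.

```python
def flow_normalised_load(concentrations, flows):
    """Daily viral load = concentration x flow (copies/L x L/day = copies/day)."""
    if len(concentrations) != len(flows):
        raise ValueError("concentrations and flows must have the same length")
    return [c * q for c, q in zip(concentrations, flows)]

# A dry day followed by a rainy day with twice the flow (hypothetical values)
conc = [1000.0, 500.0]   # gene copies per litre
flow = [2.0e6, 4.0e6]    # litres per day
loads = flow_normalised_load(conc, flow)
```

In this example the concentration halves on the rainy day, but the load is unchanged, so a method working on raw concentrations would see a drop that reflects dilution rather than transmission.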

Lines 173: What output of EpiEstim is used for Rt? Do you use the estimated mean of Rt? Or do you draw samples from the posterior Gamma distribution estimated by EpiEstim?

I noticed that the Rt estimates provided by ern do not have higher uncertainty towards the present, although this should definitely be the case (Rt of today cannot be estimated with the same accuracy as Rt of last week because of delays in fecal shedding / reporting). Do you account for uncertainty in the deconvolution step?

I think it is a great feature that ern also supports uncertain distributions. One question I had is if you could also support correlated parameters. At the moment, the mean and sd of a distribution get drawn independently, but they may be correlated in reality. Another scenario that may be quite realistic is that users have several different distributions / parameters from the literature. In this case you could allow users to provide a list of distributions from which to draw with replacement. These are just suggestions, not requests for this paper.

I am still a bit unsure about the default distributions provided in the package. On the one hand, this is a practical feature, but on the other hand: how are you planning to maintain/update this epidemiological data? They seem to be hard-coded in the package, so if newer distributions become available or old distributions are invalidated by new research, users will only get the updates if they install a newer version of the package. It would be good to comment on this.

I am particularly skeptical of providing default reporting delay distributions or reporting proportions as they will differ a lot between surveillance systems / countries etc.

I currently find it hard to define custom distributions with my own parameters. Functions like def_dist_incubation_period only accept the name of a pathogen as input. I know I can also construct a list with custom parameters myself, but a constructor function with the relevant parameters would be helpful.

There is an argument for subtypes/variants in def_dist_fecal_shedding, but it does not seem to do anything, I always get back the same distribution.

Also, def_dist_reporting_fraction does not accept a custom value and assumes a uniform distribution between 0.1 and 0.3 per default. This seems quite arbitrary to me.

I like the diagnostic plots, but having the option to produce individual plots for concentration, incidence, Rt etc. would be valuable. They are currently all merged into one plot via the patchwork package, making it hard to customize individual plots.

## Minor points

Line 5: "It differs from the basic reproduction number, R0, in that it takes into account the level of susceptibility in the population at a given point in time." Maybe say more generally that it accounts for "changes in transmission" - this may be due to changing susceptibility or other factors like different contact patterns, infection control measures etc.

Lines 21f: I would also mention digital droplet PCR as an alternative quantification method. Also, maybe shortly mention that viral RNA is first extracted from the wastewater sample using various laboratory methods.

Line 78: EpiNow2 and epidemia do not use the rstan package, they use cmdstanr package. I would just write "stan".

Lines 113: This is an important point and well explained.

Lines 166f: This paragraph felt a bit informal and was difficult to understand. Is your main point that it is important to estimate Rt using the number of cases by time of infection, not by time of report / time of sample?

I personally don't find the name of the function ern::ww.input very intuitive, I would not have expected that this returns example data! Also, can you provide more details in the function documentation, e.g. what the column "pt" means and what the unit of the concentration is in this example?

Lines 137f, reporting delay: I would say more clearly that this is the delay between symptom onset and case report.

As a suggestion, you could register the current version of the package in an online archive like zenodo. This will give you a DOI for the package, which can then be referenced in the manuscript.

## References

1. Huisman, J. S. _et al._ Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. _eLife_ **11**, e71345 (2022).

2. Huisman, J. S. _et al._ Wastewater-Based Estimation of the Effective Reproductive Number of SARS-CoV-2. _Environmental Health Perspectives_ **130**, 057011 (2022).

3. Scire, J. _et al._ estimateR: an R package to estimate and monitor the effective reproductive number. _BMC Bioinformatics_ **24**, 310 (2023).

4. Lison, A. EpiSewer: Estimate Reproduction Numbers from Wastewater Measurements. Zenodo (2024).

5. Jones, D. L. _et al._ Shedding of SARS-CoV-2 in feces and urine and its potential role in person-to-person transmission and the environment-based spread of COVID-19. _Science of The Total Environment_ **749**, 141364 (2020).

6. Arts, P. J. _et al._ Longitudinal and quantitative fecal shedding dynamics of SARS-CoV-2, pepper mild mottle virus, and crAssphage. _mSphere_ **8**, e00132-23 (2023).

7. Nash, R. K., Bhatt, S., Cori, A. & Nouvellet, P. Estimating the epidemic reproduction number from temporally aggregated incidence data: A statistical modelling approach and software tool. _PLOS Computational Biology_ **19**, e1011439 (2023).

Reviewer 6#:

(See attached PDF for a full evaluation.)

The authors presented an R software package, which implements statistical methods to estimate the actual number of new infections using the number of reported cases or the wastewater data. It is important to note that the package allows the input data to be sampled by a period higher than one day (e.g., aggregated weekly data is also acceptable). Still, the output is a daily time series, which allows estimating the effective reproduction number using the already existing tool, EpiEstim. To estimate the hidden time-series (i.e., the unknown input) from the measured output, the Authors applied a deconvolution using an existing Richardson-Lucy implementation. As far as I know, this technique is equivalent to a dynamic inversion, which was already used to infer the effective reproduction number.

Although I cannot detect any scientific contribution in this manuscript, the ``attached'' R package may be useful for a certain community (e.g., public health practitioners), and the software description in the main body is clear and didactic.

The manuscript has therefore a raison d'être, possibly not in such a high impact journal (but the Editor is the final judge on that).

Anyway, the Authors need to better justify why their software tool is preferable or more convenient compared to other existing packages.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Partly

Reviewer #5: Yes

Reviewer #6: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The abstract must be clearer.

The heading should be "Materials and methods", not "Models and software" (93).

The discussion must be written in more depth, with more references and more knowledge about your article.

The conclusion must be written in more detail.

Reviewer #2: The authors present the R package ern for the estimation of the effective reproduction number from wastewater or aggregated clinical surveillance data.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: rewan abdelaziz

Reviewer #2: No

Reviewer #3: No

Reviewer #4: Yes: Tarek Abd El-Hafeez

Reviewer #5: Yes: Adrian Lison

Reviewer #6: Yes: Péter Polcz (Pázmány Péter Catholic University Faculty of Information Technology and Bionics)

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Review.pdf

pone.0305550.s008.pdf (197KB, pdf)

PLoS One. 2024 Jun 21;19(6):e0305550. doi: 10.1371/journal.pone.0305550.r002

Author response to Decision Letter 0

22 Apr 2024

please see file attached.

Attachment

Submitted filename: PLOS-responses-to-reviewers.pdf

pone.0305550.s009.pdf (386.1KB, pdf)

PLoS One. doi: 10.1371/journal.pone.0305550.r003

Decision Letter 1

Salim Heddam

10 May 2024

PONE-D-24-01907R1

ern: an R package to estimate the effective reproduction number using clinical and wastewater surveillance data

PLOS ONE

Dear Dr. Champredon,

Please submit your revised manuscript by Jun 24 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Salim Heddam

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Reviewer 5#:Thank you for the thorough revision and for addressing the comments. I have two remaining remarks, which however do not necessitate another review round in my opinion.

First, thank you for answering my question with regard to accounting for uncertainty of the Rt estimates towards the present. I understand the current deconvolution method is not able to account for this uncertainty. But I think it is very important to highlight this limitation more strongly in the manuscript, i.e. in the discussion, but also to make users aware of this in the package documentation etc. Otherwise, the use of the package could lead to wrong conclusions when using it for real-time surveillance. The uncertainty towards the present because of partial information is an inherent characteristic of the data and means that we can only get accurate Rt estimates with a certain delay. If this is not reflected in the uncertainty information provided by the method, users should be made aware of this limitation.

Second, you describe that to pool the uncertainty of the Rt estimates, you get the mean and quantiles for each realization and then compute means or quantiles of these across realizations. This seems rather approximate to me, and I wonder why you don't just draw e.g. 100 Rt samples from the posterior Gamma distribution as described by Cori et al. (you can use the estimated mean and cv from EpiEstim) for each realization, then combine all the draws across your realizations and compute the mean, empirical quantiles etc. on this pooled posterior sample. This should also be very fast and more accurate.
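The pooling approach suggested in this remark can be sketched as follows for a single time point. A Gamma distribution with mean m and coefficient of variation c has shape 1/c² and scale m·c². The function name `pooled_rt_summary` and the input values are hypothetical; in practice the per-realization mean and cv would come from the EpiEstim output, and this is not the code used in ern.

```python
import numpy as np

def pooled_rt_summary(means, cvs, n_draws=100, probs=(0.025, 0.975), seed=0):
    """Pool Gamma posterior draws of R_t across realizations (illustrative)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    cvs = np.asarray(cvs, dtype=float)
    shape = 1.0 / cvs**2      # Gamma shape from the coefficient of variation
    scale = means * cvs**2    # Gamma scale so that shape * scale equals the mean
    # One row of n_draws samples per realization
    draws = rng.gamma(shape[:, None], scale[:, None], size=(len(means), n_draws))
    pooled = draws.ravel()
    return {
        "mean": float(pooled.mean()),
        "quantiles": np.quantile(pooled, probs),
        "n": pooled.size,
    }

# Three hypothetical realizations of the R_t posterior at one time point
summary = pooled_rt_summary(means=[1.1, 1.2, 1.0], cvs=[0.1, 0.1, 0.1], n_draws=2000)
```

Because the quantiles are computed on the pooled draws, they reflect both the spread within each realization and the spread between realizations, which averaging per-realization quantiles does not.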

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

Reviewer #4: All comments have been addressed

Reviewer #5: (No Response)

Reviewer #6: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Partly

Reviewer #5: Yes

Reviewer #6: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: (No Response)

Reviewer #5: Yes

Reviewer #6: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: (No Response)

Reviewer #5: Yes

Reviewer #6: Yes

**********

6. Review Comments to the Author

Reviewer #1: Thanks for your reply

Reviewer #2: (No Response)

Reviewer #3: (No Response)

Reviewer #4: An updated manuscript addressing previous comments and suggestions was evaluated positively. The updated submission demonstrates significant improvement and provides valuable insights relevant to the research community.

Reviewer #5: Thank you for the thorough revision and for addressing the comments. I have two remaining remarks, which however do not necessitate another review round in my opinion.

Reviewer #6: Upon perusing the Authors' response letter, I have become convinced of the manuscript's significant contribution to the scientific community. In the revised manuscript, the authors have carefully outlined the discipline to which they wish to contribute and how. Table 1 is very useful and informative.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: Yes: Tarek Abd El-Hafeez

Reviewer #5: Yes: Adrian Lison

Reviewer #6: No

**********

PLoS One. 2024 Jun 21;19(6):e0305550. doi: 10.1371/journal.pone.0305550.r004

Author response to Decision Letter 1

28 May 2024

Response to Reviewers – Round 2

We would like to thank the reviewers for taking the time to read and comment on our revised manuscript.

In response to Reviewer #5’s comments:

Thank you for the thorough revision and for addressing the comments. I have two remaining remarks, which however do not necessitate another review round in my opinion.

Response: Thank you for this suggestion. We have added a paragraph in the Discussion section to reflect this limitation.

Response: Thank you very much for this suggestion. Indeed, the previous implementation was not statistically correct (we were not aware of the EpiEstim function `sample_posterior_R`). It is now implemented as suggested. A few tests showed that the numerical difference between the two implementations was very small. The main text has been edited to reflect this change.

Attachment

Submitted filename: Response to Reviewers - round 2.docx

pone.0305550.s010.docx (13.6KB, docx)

PLoS One. doi: 10.1371/journal.pone.0305550.r005

Decision Letter 2

Salim Heddam

2 Jun 2024

ern: an R package to estimate the effective reproduction number using clinical and wastewater surveillance data

PONE-D-24-01907R2

Dear Dr. Champredon,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Salim Heddam

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #5: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #5: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #5: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #5: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #5: Yes

**********

6. Review Comments to the Author

Reviewer #5: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #5: Yes: Adrian Lison

**********

PLoS One. doi: 10.1371/journal.pone.0305550.r006

Acceptance letter

Salim Heddam

7 Jun 2024

PONE-D-24-01907R2

PLOS ONE

Dear Dr. Champredon,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Salim Heddam

Academic Editor

PLOS ONE

Associated Data


Supplementary Materials

S1 File. Methodological differences when inferring daily incidence.

(PDF)

pone.0305550.s001.pdf (130.4KB, pdf)

S2 File. Bayesian model to infer daily clinical report count.

(PDF)

pone.0305550.s002.pdf (152.8KB, pdf)

S3 File. Linear interpolation to infer daily clinical report count.

(PDF)

pone.0305550.s003.pdf (170.8KB, pdf)

S4 File. Sensitivity analysis to selected parameters.

(PDF)

pone.0305550.s004.pdf (170.2KB, pdf)

S5 File. R code to perform the sensitivity analyses presented in S4 File.

(R)

pone.0305550.s005.R (7.8KB, R)

S6 File. R code to evaluate the computing time of selected R packages that estimate the effective reproduction number.

(R)

pone.0305550.s006.R (7.6KB, R)

S7 File. R code associated with the methodological differences presented in S1 File.

(R)

pone.0305550.s007.R (7KB, R)

Attachment

Submitted filename: Review.pdf

pone.0305550.s008.pdf (197KB, pdf)

Attachment

Submitted filename: PLOS-responses-to-reviewers.pdf

pone.0305550.s009.pdf (386.1KB, pdf)

Attachment

Submitted filename: Response to Reviewers - round 2.docx

pone.0305550.s010.docx (13.6KB, docx)

Data Availability Statement

1. Anderson RM, May RM. Infectious diseases of humans: dynamics and control. Oxford University Press; 1991.

2. Inglesby TV. Public health measures and the reproduction number of SARS-CoV-2. JAMA. 2020;323(21):2186–2187. doi: 10.1001/jama.2020.7878

3. Kirby AE, Walters MS, Jennings WC, Fugitt R, LaCross N, Mattioli M, et al. Using wastewater surveillance data to support the COVID-19 response—United States, 2020–2021. Morbidity and Mortality Weekly Report. 2021;70(36):1242. doi: 10.15585/mmwr.mm7036a2

4. Castiglioni S, Schiarea S, Pellegrinelli L, Primache V, Galli C, Bubba L, et al. SARS-CoV-2 RNA in urban wastewater samples to monitor the COVID-19 pandemic in Lombardy, Italy (March–June 2020). Science of The Total Environment. 2022;806:150816. doi: 10.1016/j.scitotenv.2021.150816

5. Melnick JL, et al. Poliomyelitis Virus in Urban Sewage in Epidemic and in Nonepidemic Times. American journal of hygiene. 1947;45(2):240–53.

6. Crank K, Chen W, Bivins A, Lowry S, Bibby K. Contribution of SARS-CoV-2 RNA shedding routes to RNA loads in wastewater. Science of The Total Environment. 2022;806. doi: 10.1016/j.scitotenv.2021.150376

7. Wölfel R, Corman VM, Guggemos W, Seilmaier M, Zange S, Müller MA, et al. Virological assessment of hospitalized patients with COVID-2019. Nature. 2020;581(7809):465–469. doi: 10.1038/s41586-020-2196-x

8. Jones DL, Baluja MQ, Graham DW, Corbishley A, McDonald JE, Malham SK, et al. Shedding of SARS-CoV-2 in feces and urine and its potential role in person-to-person transmission and the environment-based spread of COVID-19. Science of The Total Environment. 2020;749:141364. doi: 10.1016/j.scitotenv.2020.141364

9. de Jonge EF, Peterse CM, Koelewijn JM, van der Drift AMR, van der Beek RF, Nagelkerke E, et al. The detection of monkeypox virus DNA in wastewater samples in the Netherlands. Science of The Total Environment. 2022;852:158265. doi: 10.1016/j.scitotenv.2022.158265

10. Mercier E, D’Aoust PM, Thakali O, Hegazy N, Jia JJ, Zhang Z, et al. Municipal and neighbourhood level wastewater surveillance and subtyping of an influenza virus outbreak. Scientific Reports. 2022;12(1):15777. doi: 10.1038/s41598-022-20076-z

11. Van den Driessche P, Watmough J. Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission. Mathematical biosciences. 2002;180(1-2):29–48. doi: 10.1016/S0025-5564(02)00108-6

12. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. American journal of epidemiology. 2013;178(9):1505–1512. doi: 10.1093/aje/kwt133

13. Gostic KM, McGough L, Baskerville EB, Abbott S, Joshi K, Tedijanto C, et al. Practical considerations for measuring the effective reproductive number, Rt. PLoS computational biology. 2020;16(12):e1008409. doi: 10.1371/journal.pcbi.1008409

14. Alvarez L, Colom M, Morel JD, Morel JM. Computing the daily reproduction number of COVID-19 by inverting the renewal equation using a variational technique. Proceedings of the National Academy of Sciences. 2021;118(50):e2105112118. doi: 10.1073/pnas.2105112118

15. Huisman JS, Scire J, Angst DC, Li J, Neher RA, Maathuis MH, et al. Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. eLife. 2022;11:e71345. doi: 10.7554/eLife.71345

16. Huisman JS, Scire J, Caduff L, Fernandez-Cassi X, Ganesanandamoorthy P, Kull A, et al. Wastewater-based estimation of the effective reproductive number of SARS-CoV-2. Environmental health perspectives. 2022;130(5):057011. doi: 10.1289/EHP10050

17. McMahan CS, Self S, Rennert L, Kalbaugh C, Kriebel D, Graves D, et al. COVID-19 wastewater epidemiology: a model to estimate infected populations. The Lancet Planetary Health. 2021;5(12):e874–e881. doi: 10.1016/S2542-5196(21)00230-8

18. Fazli M, Sklar S, Porter MD, French BA, Shakeri H. Wastewater-Based Epidemiological Modeling for Continuous Surveillance of COVID-19 Outbreak. medRxiv. 2021. doi: 10.1101/2021.10.19.21265221

19. Nourbakhsh S, Fazil A, Li M, Mangat CS, Peterson SW, Daigle J, et al. A wastewater-based epidemic model for SARS-CoV-2 with application to three Canadian cities. Epidemics. 2022;39:100560. doi: 10.1016/j.epidem.2022.100560

20. Jiang G, Wu J, Weidhaas J, Li X, Chen Y, Mueller J, et al. Artificial neural network-based estimation of COVID-19 case numbers and effective reproduction rate using wastewater-based epidemiology. Water research. 2022;218:118451. doi: 10.1016/j.watres.2022.118451

21. Amman F, Markt R, Endler L, Hupfauf S, Agerer B, Schedl A, et al. Viral variant-resolved wastewater surveillance of SARS-CoV-2 at national scale. Nature Biotechnology. 2022;40(12):1814–1822. doi: 10.1038/s41587-022-01387-y

22. Lison A. adrian-lison/EpiSewer: EpiSewer 0.0.1; 2024. doi: 10.5281/zenodo.10569102

23. Champredon D, Dushoff J. Intrinsic and realized generation intervals in infectious-disease transmission. Proceedings of the Royal Society B: Biological Sciences. 2015;282(1821):20152026. doi: 10.1098/rspb.2015.2026

24. Boëlle PY, Ansart S, Cori A, Valleron AJ. Transmission parameters of the A/H1N1 (2009) influenza virus pandemic: a review. Influenza and other respiratory viruses. 2011;5(5):306–316. doi: 10.1111/j.1750-2659.2011.00234.x

25. te Beest DE, Wallinga J, Donker T, van Boven M. Estimating the generation interval of influenza A (H1N1) in a range of social settings. Epidemiology. 2013; p. 244–250. doi: 10.1097/EDE.0b013e31827f50e8

26. Cori A. EpiEstim: Estimate Time Varying Reproduction Numbers from Epidemic Curves; 2023. Available from: https://github.com/mrc-ide/EpiEstim.

27. Nash RK, Bhatt S, Cori A, Nouvellet P. Estimating the epidemic reproduction number from temporally aggregated incidence data: A statistical modelling approach and software tool. PLOS Computational Biology. 2023;19(8):e1011439. doi: 10.1371/journal.pcbi.1011439

28. Abbott S, Hellewell J, Sherratt K, Gostic K, Hickson J, Badr HS, et al. EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters; 2020.

29. Scott JA, Gandy A, Mishra S, Unwin J, Flaxman S, Bhatt S. epidemia: Modeling of Epidemics using Hierarchical Bayesian Models; 2020. Available from: https://imperialcollegelondon.github.io/epidemia/.

30. Stan Development Team. Stan Modeling Language Users Guide and Reference Manual; 2024. Available from: https://mc-stan.org/.

31. Scire J, Huisman JS, Grosu A, Angst DC, Lison A, Li J, et al. estimateR: an R package to estimate and monitor the effective reproductive number. BMC bioinformatics. 2023;24(1):310. doi: 10.1186/s12859-023-05428-4

32. Cleveland WS. Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association. 1979;74(368):829–836. doi: 10.1080/01621459.1979.10481038

33. Richardson WH. Bayesian-based iterative method of image restoration. JOSA. 1972;62(1):55–59. doi: 10.1364/JOSA.62.000055

34. Lucy LB. An iterative technique for the rectification of observed distributions. The astronomical journal. 1974;79:745. doi: 10.1086/111605

35. Goldstein E, Dushoff J, Ma J, Plotkin JB, Earn DJ, Lipsitch M. Reconstructing influenza incidence by deconvolution of daily mortality time series. Proceedings of the National Academy of Sciences. 2009;106(51):21825–21829. doi: 10.1073/pnas.0902958106

36. Kermack WO, McKendrick AG. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London, Series A, Containing papers of a mathematical and physical character. 1927;115(772):700–721.

37. Champredon D, Dushoff J, Earn DJ. Equivalence of the Erlang-distributed SEIR epidemic model and the renewal equation. SIAM Journal on Applied Mathematics. 2018;78(6):3258–3278. doi: 10.1137/18M1186411

38. Government of Canada. COVID-19 wastewater monitoring dashboard: Viral signal trend; 2023. https://health-infobase.canada.ca/src/data/covidLive/wastewater/covid19-wastewater.csv.

39. Government of Canada. COVID-19 epidemiology update: cases and deaths data; 2023. https://health-infobase.canada.ca/src/data/covidLive/covid19-download.csv.

40. Challen R, Brooks-Pollock E, Tsaneva-Atanasova K, Danon L. Meta-analysis of the severe acute respiratory syndrome coronavirus 2 serial intervals and the impact of parameter uncertainty on the coronavirus disease 2019 reproduction number. Statistical Methods in Medical Research. 2022;31(9):1686–1703. doi: 10.1177/09622802211065159

41. Statistics Canada. Table 17-10-0009-01 Population estimates, quarterly; 2023. doi: 10.25318/1710000901-eng

42. Gelman A, Rubin DB. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science. 1992;7(4):457–472. doi: 10.1214/ss/1177011136

43. Plummer M. rjags: Bayesian Graphical Models using MCMC; 2023. Available from: https://CRAN.R-project.org/package=rjags.

44. Arts PJ, Kelly JD, Midgley CM, Anglin K, Lu S, Abedi GR, et al. Longitudinal and quantitative fecal shedding dynamics of SARS-CoV-2, pepper mild mottle virus, and crAssphage. mSphere. 2023;8(4):e00132–23. doi: 10.1128/msphere.00132-23

45. Uncertainties in estimating SARS-CoV-2 prevalence by wastewater-based epidemiology. Chemical Engineering Journal. 2021;415:129039. doi: 10.1016/j.cej.2021.129039

46. Wade MJ, Lo Jacomo A, Armenise E, Brown MR, Bunce JT, Cameron GJ, et al. Understanding and managing uncertainty and variability for wastewater monitoring beyond the pandemic: Lessons learned from the United Kingdom national COVID-19 surveillance programmes. Journal of Hazardous Materials. 2022;424:127456. doi: 10.1016/j.jhazmat.2021.127456

47. McCall C, Fang ZN, Li D, Czubai AJ, Juan A, LaTurner ZW, et al. Modeling SARS-CoV-2 RNA degradation in small and large sewersheds. Environ Sci: Water Res Technol. 2022;8:290–300. doi: 10.1039/D1EW00717C
