Sci Total Environ. 2022 Mar 1;827:154235. doi: 10.1016/j.scitotenv.2022.154235

Model-based assessment of COVID-19 epidemic dynamics by wastewater analysis

Daniele Proverbio a, Françoise Kemp a, Stefano Magni a, Leslie Ogorzaly b, Henry-Michel Cauchie b, Jorge Gonçalves a,c, Alexander Skupin a,d,e, Atte Aalto a
PMCID: PMC8886713  PMID: 35245552

Abstract

Continuous surveillance of COVID-19 remains crucial to control its spread and to anticipate infection waves. Detecting viral RNA load in wastewater samples has been suggested as an effective approach for epidemic monitoring and the development of an effective warning system. However, its quantitative link to the epidemic status and the stages of an outbreak is still elusive. Modelling is thus crucial to address these challenges. In this study, we present a novel mechanistic model-based approach to reconstruct the complete epidemic dynamics from SARS-CoV-2 viral load in wastewater. Our approach integrates noisy wastewater data and daily case numbers into a dynamical epidemiological model. As demonstrated for various regions and sampling protocols, it quantifies the case numbers, provides epidemic indicators and accurately infers future epidemic trends. Following its quantitative analysis, we also provide recommendations for wastewater data standards and for their use as warning indicators against new infection waves. In situations of reduced testing capacity, our modelling approach can enhance the surveillance of wastewater for early epidemic prediction and robust and cost-effective real-time monitoring of local COVID-19 dynamics.

Keywords: COVID-19, Wastewater-based epidemiology, Surveillance of wastewater for early epidemic prediction (SWEEP), Epidemiological modelling, Kalman filter, Early warning system

1. Introduction

Effective mitigation of the COVID-19 pandemic relies on accurate estimates of the epidemic dynamics. Over the past months, vaccination programs and non-pharmaceutical interventions have managed to contain the diffusion of COVID-19 and many countries are aiming to return to normality, but the virus and its emerging variants might resurge once such measures are withdrawn. Hence, it is imperative to continue monitoring even after active RT-PCR or antigen testing is reduced. Analysing SARS-CoV-2 abundance in wastewater offers a cost-effective alternative to population-based large scale testing (Farkas et al., 2020; Larsen and Wigginton, 2020) and is largely independent of healthcare-seeking behaviors, access to clinical testing and asymptomatic cases (Peccia et al., 2020). It thus bears the potential for faster and more reliable warning indications for long-term epidemic surveillance (Weidhaas et al., 2021; Wurtzer et al., 2020; Randazzo et al., 2020). To date, more than 50 countries and 260 universities have wastewater surveillance systems in place (Naughton et al., 2021), and wastewater sampling to search for COVID-19 traces has been effectively used on multiple occasions (Ahmed et al., 2021; Quilliam et al., 2020; Reeves et al., 2021).

Despite its high potential, wastewater-based epidemiology still has several hurdles to overcome, related to experimental, data processing and modelling procedures (Daughton, 2020; Bandala et al., 2021; Lahrich et al., 2021). In particular, the usefulness of its results would greatly improve with models that can infer the size of the shedding population, account for uncertainties and infer future outcomes from current trends (Tiwari et al., 2021). In a recent call for models, Zhu et al. (2021) stress that modelling tools to reduce uncertainties and cope with noisy data are in high demand, along with other mathematical models of different structure that could adequately capture the correlation between viral load in wastewater and the shedding/infected population. Recently, a number of studies have proposed methods to infer the size of the shedding population, based on the viral abundance sampled in wastewater. Some are restricted to qualitative and semi-quantitative retrospective studies of lagged correlations (Nemudryi et al., 2020; Kumar et al., 2020), others employ parametric and non-parametric regression models (Cao and Francis, 2021; Vallejo et al., 2021; Li et al., 2021; Hasan et al., 2021; Huisman et al., 2021); an alternative is to use an epidemiological SEIR (Susceptible-Exposed-Infectious-Recovered) model informed by estimates of individual viral trajectories (McMahan et al., 2021). These models, each tested on one specific country at a time, clearly showed the difficulty of estimating the true number of positive cases from wastewater samples, because of noisy data (including those of detected case numbers), uncertain ratios of detected and true positive cases, and the variety of individual infection periods.

In this article, we propose an alternative, automated and causal-based approach to address the aforementioned challenges. We develop a new method that couples a Susceptible-Exposed-Infectious-Recovered (SEIR) epidemiological model (Anderson and May, 1979) to the extended Kalman filter – EKF (Kalman, 1960), a natural approach for combining noisy measurement data and modelling. The EKF deals effectively with the problem of producing model simulations that are representative of real system observations, which are in turn prone to uncertainties. It has been a standard tool in systems theory, with many applications ranging from automated control (Nagy Stovner et al., 2018) to finance (Davis and Leo, 2013), bio-mechanics (Marchesseau et al., 2013), time series analysis (Harvey, 1990) etc. Such an approach allows employing a validated epidemiological model (the SEIR, like in McMahan et al. (2021)) with a selected number of free parameters, fitted during the Kalman filtering steps using Bayesian methods. This way, the model estimates population-level COVID-19 diffusion without the need to rely on individual viral trajectories. It is also very flexible with respect to sampling routines and considered regional areas. Hence, including empirical measurements and tuning the model can be done in a straightforward way. In addition, the underlying SEIR model allows interpretation of the inferred infection dynamics in terms of transmitting interactions and overcomes the extrapolation limitations of correlation-based statistical approaches (Cao and Francis, 2021; Li et al., 2021). A Kalman filter was used for wastewater viral abundance data by Cluzel et al. (2022) and Courbariaux et al. (2022), but they employed a linear first-order autoregressive model. This makes it a signal processing tool primarily aimed at reducing noise in the wastewater measurements, setting our approach apart.

After calibration, our CoWWAn (COVID-19 Wastewater Analyser) method causally and quantitatively links the results of wastewater analysis with those of population-wide testing, also accounting for the “dark number” (ratio of real and detected case numbers) through meta-parameters. This way, we are able to quantify the goodness-of-fit to observed cases (what is empirically measured) and to justify the reconstruction of the shedding population. Then, CoWWAn causally infers the shedding population, estimates the effective reproduction number R_eff, and provides projections of future epidemic trends. Quantifying these variables enables assessment of the epidemic status within a region and comparison between regions, and supports effective mitigation policy making. We also use this model to quantitatively assess the warning potential of wastewater monitoring, combined with population testing or on its own. Originally, we applied CoWWAn in support of the Research Luxembourg COVID-19 taskforce of our local government. Moreover, to demonstrate its general applicability, we applied CoWWAn to public datasets from 12 regional areas from Europe and North America, associated with different population sizes and based on different wastewater data processing protocols.

2. Materials and methods

2.1. Data

Our pipeline requires at least three types of data to be calibrated: data about the COVID-19 RNA load in wastewater, detected cases associated with the area covered by the sewage system, and possibly estimates of the ratio of detected versus true case numbers. In addition to its routine application to Luxembourg data, CoWWAn was tested on various datasets with different normalisation protocols for wastewater data, to show its general applicability after proper calibration. The dataset was constructed according to the following criteria. First, we employed the COVID19 Poops Dashboard (Naughton et al., 2021) to list all worldwide resources about wastewater sampling projects; among those, we focused on those having readily accessible databases. To allow proper calibration of the model, we selected time series data starting no later than the beginning of 2021, covering a time range of at least six months, having at least one sample per week on average and having the corresponding detected case numbers available. We rejected wastewater data with smoothing among data points to avoid introducing bias and breaking the causality of projections. Finally, if time series from multiple treatment plants were available from a single regional database, we selected two representative ones, usually those with the largest population. The detected case numbers were obtained from the same publicly available databases and corresponded to officially reported numbers, confirmed with positive RT-PCR tests. We made sure that, for the majority of the time period considered, the selected countries did not have a share of positive tests exceeding 5%, which would indicate severe undertesting (according to WHO guidelines, https://bit.ly/3dARcy1) and would thus bias the results. When available, we also traced seroprevalence studies to better tune our pipeline to specific regional areas (e.g. Snoeck et al. (2020); Pollán et al. (2020)).

As a result, the selected datasets are: Barcelona Prat de Llobregat (Spain), Kitchener (Canada), Kranj (Slovenia), Lausanne (Switzerland), Ljubljana (Slovenia), Luxembourg, Milwaukee (USA), Netherlands, Oshkosh (USA), Raleigh (USA), Riera de la Bisbal (Spain), Zurich (Switzerland). Data from Luxembourg sewage sampling were made available by the Research Luxembourg COVID-19 initiative CORONASTEP (researchluxembourg.lu/coronastep), while detected case numbers and R_eff were obtained from the Luxembourg Ministry of Health website (COVID19.public.lu/fr/graph). The other datasets (including both wastewater data and case numbers) were downloaded from publicly available official sources, listed in Supplementary Table 1. All datasets are updated up to August 2021. We refer to each source (see Supplementary Table 1) for details about the experimental protocols, for the units of measure of wastewater data and for their associated equivalent population, as well as for the detected case numbers.

2.2. Preliminary analysis

Among the data collected, we observed some peculiarities. First, Raleigh county reported case numbers normalised to 10,000 inhabitants and rounded to an integer value; their subsequent up-scaling induces a further uncertainty. Second, the countrywide wastewater data for Netherlands are reported as averages over a week. To improve the temporal resolution of the data, we used instead the data from all communal treatment plants, averaging over samples from the same day. Third, the wastewater data from Kitchener have a sudden jump on May 17, 2021 during a time when case numbers remain stable (Supplementary Fig. 1). Interestingly, the performance of our method increased considerably after scaling data after that date by a factor of 0.4, suggesting possible sudden changes in testing or sampling strategies. This extra analysis shows the impact of including corrections for different testing policies. Results in the main text are shown without this scaling, but we report results with and without scaling in Supplementary Table 2.

We also investigated other prominent features of case numbers and wastewater data, to inform the development of the model. Considered time series of tested positive case numbers and of RT-qPCR wastewater data, as well as their mutual relationship, are shown in Supplementary Figs. 1 and 2. The figures highlight the close but not perfect correlation between case numbers and wastewater data, stressing both the usefulness of wastewater data for epidemic monitoring, and the importance of models based on complex epidemiological dynamics. The fact that the mutual relationship between case numbers and wastewater data is not perfectly linear justifies the inclusion of a scaling parameter in the cost function used for parameter fitting (Eq. (5) in Section 2.5).

2.3. The SEIR stochastic model

As a basis for the Extended Kalman filter to model the epidemic dynamics, we use a modified SEIR model, which has been shown to accurately describe COVID-19 epidemic dynamics (Proverbio et al., 2021; He et al., 2020). As we aim at estimating community incidence from noisy data, we choose a simple and descriptive model rather than a complex one, which is difficult to calibrate and could suffer from identifiability issues (Roda et al., 2020; Kemp et al., 2021).

The classic, deterministic SEIR model considers Susceptible S(t), Exposed E(t), Infectious I(t) and Removed R(t) compartments, and population flows governed by rate parameters. We follow the standard interpretation of the E compartment as the set of individuals who have been exposed and infected, but who are not yet infectious due to incubation lag (Lai et al., 2020). The mean incubation period α⁻¹ models the progression to becoming contagious (Kollepara et al., 2021). The total community population is conserved, i.e. S(t) + E(t) + I(t) + R(t) = N (with constant N) and we assume no possibility of re-infection during each period of infection transmission (no R → S transition). The latter assumption is supported by waning immunity being estimated in a matter of months (Goldberg et al., 2021). To model intrinsic stochasticity in transmission processes and viral shedding, we employ a stochastic version of this SEIR model, associating each transition between compartments with a random process:

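$$
\begin{cases}
\dfrac{d}{dt}S(t) = -\beta(t)\dfrac{S(t)I(t)}{N} - \sqrt{\beta(t)\dfrac{S(t)I(t)}{N}}\,w_1(t) \\[6pt]
\dfrac{d}{dt}E(t) = \beta(t)\dfrac{S(t)I(t)}{N} - \alpha E(t) + \sqrt{\beta(t)\dfrac{S(t)I(t)}{N}}\,w_1(t) - \sqrt{\alpha E(t)}\,w_2(t) \\[6pt]
\dfrac{d}{dt}I(t) = \alpha E(t) - \tau I(t) + \sqrt{\alpha E(t)}\,w_2(t) - \sqrt{\tau I(t)}\,w_3(t) \\[6pt]
\dfrac{d}{dt}R(t) = \tau I(t) + \sqrt{\tau I(t)}\,w_3(t)
\end{cases}
\tag{1}
$$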

where w_j are mutually independent white noise processes. See Appendix A for details. The β-parameter is assumed to be time-varying, reflecting changes in social interaction, other mitigation measures (masks, vaccines, etc.), and varying infectivity of emerging viral variants. β(t) is also estimated by the Kalman filter.
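As an illustration of how Eq. (1) can be stepped forward in time, the following is a minimal Python sketch (not the authors' Matlab implementation). It assumes that each noise term enters as the square root of the corresponding transition rate times white noise, and it uses an Euler-Maruyama discretisation with M sub-steps per day. The function name, the clipping of flows to keep compartments non-negative and the fixed random seed are assumptions made for this example only.

```python
import math
import random


def simulate_stochastic_seir(beta, alpha, tau, N, E0, I0, days, M=10, seed=0):
    """Euler-Maruyama integration of the stochastic SEIR model of Eq. (1).

    beta: callable t -> transmission rate (1/d); alpha, tau: rates (1/d).
    Uses M sub-steps per day (dt = 1/M). Returns daily (S, E, I, R) tuples.
    """
    rng = random.Random(seed)
    dt = 1.0 / M
    S, E, I, R = float(N - E0 - I0), float(E0), float(I0), 0.0
    daily = [(S, E, I, R)]
    for step in range(days * M):
        t = step * dt
        # Mean flows over one sub-step for S->E, E->I and I->R.
        flows = [beta(t) * S * I / N * dt, alpha * E * dt, tau * I * dt]
        # sqrt(rate) * w_j discretises to sqrt(rate * dt) * N(0, 1).
        noisy = [f + math.sqrt(max(f, 0.0)) * rng.gauss(0.0, 1.0) for f in flows]
        # Assumption of this sketch: clip flows so no compartment goes negative.
        se = min(max(noisy[0], 0.0), S)
        ei = min(max(noisy[1], 0.0), E)
        ir = min(max(noisy[2], 0.0), I)
        # The same random flow leaves one compartment and enters the next,
        # so S + E + I + R stays equal to N.
        S, E, I, R = S - se, E + se - ei, I + ei - ir, R + ir
        if (step + 1) % M == 0:
            daily.append((S, E, I, R))
    return daily


if __name__ == "__main__":
    traj = simulate_stochastic_seir(lambda t: 0.44, 0.44, 0.32, 100_000, 50, 50, 30)
    print(traj[-1])
```

Because each transition shares one noise term between the compartment it leaves and the one it enters, the total population is conserved along every simulated trajectory.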

In order to model viral flows into wastewater, we introduce another variable A(t) to model the effective number of active shedding cases producing virions to wastewater. Similarly to above, we incorporate stochastic processes. The dynamics of A are given by:

$$
\frac{d}{dt}A(t) = \beta(t)\frac{S(t)I(t)}{N} - \gamma A(t) + \sqrt{\beta(t)\frac{S(t)I(t)}{N}}\,w_1(t) - \sqrt{\gamma A(t)}\,w_4(t). \tag{2}
$$

The A compartment is parallel to E, I, and R, that is, it still holds that S(t) + E(t) + I(t) + R(t) = N. The influx to the A compartment is the same as that to the E compartment, while the outflux lumps together the dynamics of viral production (which is known to follow some kinetic trajectory in the host's body (Néant et al., 2021)), the decay rate of SARS-CoV-2 RNA in water (Gundy et al., 2009; Sala-Comorera et al., 2021), and inertia in abundance dynamics due to mixing in wastewater collecting pools. Since the parameter γ lumps together properties of the virus and details on wastewater sampling, it is separately fitted for each region. We do not take into account delays associated with in-sewer travel time, as it was estimated to be significantly lower than the transmission time scales (median of 3.3 h versus 1 day (Kapo et al., 2017)). The A compartment thus makes it possible to better follow the time evolution, including potential decaying inertia, and to consider explicitly the uncertainties associated with the shedding mechanism instead of the disease progression. Together, Eqs. (1), (2) form the combined SEIR-WW system.

The outputs from the model that are compared to the real-world measurements are the number of daily detected cases and the virion abundance in wastewater. The number of detected cases on day t is assumed to be a share of people passing the incubation period on that day, that is,

$$
y_c(t) = c_t \int_{t-1}^{t} \alpha E(s)\,ds, \tag{3}
$$

where c_t ∈ [0,1] is the share of detected cases out of all cases, to account for under-testing and asymptomatic cases (see Section 2.5 and Eq. (C.4) for further discussion). c_t might depend on the day of the week, since there often are some weekday-dependent fluctuations in testing. The virion abundance in wastewater is assumed to be linearly dependent on A,

$$
y_w(t) = \nu A(t), \tag{4}
$$

where ν is a tuning parameter to reflect the incubation, production and shedding of viral load from infected people (Néant et al., 2021; Wölfel et al., 2020; Miura et al., 2021) and normalisation of the wastewater data. We do not consider explicit corrections linked to precipitations or other environmental factors, as previous studies evaluated them to be poorly correlated with RT-qPCR observations (Vallejo et al., 2021; Li et al., 2021). An implicit tuning is nonetheless included in the fitting, cf. Eq. (5).
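The two model outputs of Eqs. (3) and (4) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the integral in Eq. (3) is approximated here by a Riemann sum over the M sub-steps of the last day, which is an assumption about the discretisation rather than a description of the authors' code.

```python
def daily_detected_cases(E_substeps, alpha, c_t, M=10):
    """Eq. (3): c_t times the integral of alpha * E(s) over the last day.

    E_substeps holds the M values of E on the sub-steps of that day;
    the integral is approximated by a Riemann sum with dt = 1/M.
    """
    if len(E_substeps) != M:
        raise ValueError("expected one E value per sub-step")
    return c_t * alpha * sum(E_substeps) / M


def wastewater_signal(A, nu):
    """Eq. (4): virion abundance is proportional to the shedding compartment A."""
    return nu * A


if __name__ == "__main__":
    # Illustrative numbers only: E constant at 100 over one day, half of cases detected.
    print(daily_detected_cases([100.0] * 10, alpha=0.44, c_t=0.5))
    print(wastewater_signal(250.0, nu=4.0))
```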

2.4. The complete SEIR-WW-EKF model

An Extended Kalman filter requires an underlying dynamical model (such as a SEIR-like one), its output and associated noise covariance matrices, and measurement data. The extended Kalman filter algorithm to estimate the state of the SEIR-WW system, based on different types of data, is presented in Algorithm 1. The inputs for the algorithm are the update function f(x) (implementing Eq. (1)), the observation matrices C(t) (for case numbers and wastewater data), the state noise Q (uncertainty on estimated variables), and the measurement error covariance U(t) (uncertainties on empirical data). Then, the method evaluates the set of variables of interest (x_{1…6}(t) = [S, E, I, A, D, β](t)) and their associated uncertainty matrix P.

The algorithm is used to calculate three different estimates using only case number data, only wastewater data or using both case and wastewater data. These were then used to estimate the data that were not employed for the state estimation, initially for calibration and reconstruction of daily cases, and then to perform the desired predictions.

For details about the numerical implementation, the characterization of inputs and outputs and the discussion of each matrix introduced in Algorithm 1, we refer to Appendix B. Our current implementation is done with custom Matlab 2019b code (see Code Availability section).

Algorithm 1

The Extended Kalman filter for the SEIR-WW model with time step Δt = 1/M (we use M = 10 d⁻¹). J_f is the Jacobian of the function f(x), obtained from the Jacobian of the reaction function by J_f = B J_r. The algorithm is standard, but the prediction step consists in solving a time-discretised ODE. The observation matrix C(t) is chosen from the three possibilities described in (B.2). Note the resetting of D(t) = x̃_5 before the prediction loop.
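The predict-then-update structure described in this caption can be illustrated on a deliberately reduced example. The Python sketch below is not Algorithm 1: it is a one-dimensional extended Kalman filter for a toy logistic growth model with a linear measurement, chosen only to show a prediction step that solves a time-discretised ODE over M sub-steps (propagating the variance with the Jacobian of the discretised update) followed by a measurement update. The model, the function name and all parameter values are assumptions made for this example.

```python
def ekf_step(x, P, y, r, K, h, q, U, M=10):
    """One daily EKF cycle for the toy model dx/dt = r*x*(1 - x/K), y = h*x + noise.

    x, P: state estimate and its variance; y: measurement (None if no sample);
    q: process noise variance per day; U: measurement error variance.
    """
    dt = 1.0 / M
    # Prediction: explicit Euler over M sub-steps of length dt.
    for _ in range(M):
        F = 1.0 + dt * r * (1.0 - 2.0 * x / K)  # Jacobian of x + dt * f(x)
        x = x + dt * r * x * (1.0 - x / K)
        P = F * P * F + q * dt
    if y is None:  # no measurement on this day: prediction only
        return x, P
    # Update: correct the prediction with the measurement discrepancy.
    S = h * P * h + U          # innovation variance
    gain = P * h / S           # Kalman gain
    x = x + gain * (y - h * x)
    P = (1.0 - gain * h) * P
    return x, P


if __name__ == "__main__":
    x_true, x_est, P = 100.0, 50.0, 2500.0
    for day in range(20):
        # Noise-free synthetic truth, generated with the same Euler scheme.
        x_true, _unused = ekf_step(x_true, 0.0, None, r=0.2, K=1e5, h=2.0, q=1.0, U=1.0)
        x_est, P = ekf_step(x_est, P, 2.0 * x_true, r=0.2, K=1e5, h=2.0, q=1.0, U=1.0)
    print(x_true, x_est, P)
```

Passing `y=None` skips the update, which mirrors how a filter of this kind can run on days without a wastewater sample.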

2.5. Model parameters

Our model comes with a number of free parameters to be fixed from the data or with educated assumptions. As most time series data begin after the pandemic had already spread within a region, the initial sizes for the E and I compartments are automatically computed (cf. Appendix C for details).

Another parameter to be estimated is the average ratio of total and detected cases at day t, η_t. This is necessary to link the measurements of population testing with those of wastewater analysis (ideally objective and insensitive to testing capacities). Implementing the results of early prevalence studies (Snoeck et al., 2020), we use η_t = 3 for the first wave in Luxembourg (until June 1, 2020). Later, we use η_t = 1.8. This choice was cross-validated with an independent SEIR model fitted to Luxembourg data (Kemp et al., 2021). The reduction is partially due to the launch of a large scale testing campaign in Luxembourg (Wilmes et al., 2021), and partially to overall increased testing activity. Further details on parameter values are discussed in Appendix C. For other regions, the available prevalence studies usually consider the early stages of the epidemic, but wastewater data for the corresponding period are often not available. Large changes between first and subsequent waves are expected for all regions, and therefore estimates from these prevalence studies are not usable for later stages. In the absence of additional reliable values, we maintain η_t = 1.8 for all other regions. It is possible to further calibrate such values with further tailored prevalence studies. In principle, it is sufficient to have one estimate for η_t to match virion abundance in wastewater with total case numbers. Once the calibration is done, CoWWAn estimates the total number of infections, including both detected and undetected cases. In order to obtain an estimate of detected cases, a potentially time-varying estimate of η_t is then needed. Daily ratios c_t of detected and total cases are obtained from η_t modulated by a weekly testing rhythm that is automatically estimated (cf. Appendix C). The variance of the wastewater measurements U_w is estimated from data as discussed in Appendix C.

A final detail to consider when optimising the model to reproduce the observations: due to dilution, non-mixing environment and other factors, the dependency of the wastewater measurement on the number of detected cases is not perfectly linear (Vallejo et al., 2021) (see also Supplementary Figs. 1 and 2). Hence, we apply a simple power transformation to the wastewater samples, for which the exponent ε is regarded as a tuning parameter of slight nonlinearity. ε and the other proportional parameters γ and ν are fitted by calculating the Kalman filter state estimate using the wastewater data, and then minimising the cost function

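$$
\min_{\gamma,\nu,\varepsilon} \sum_{t=1}^{M} \left( y_c(t) - C_c(t)\,\hat{x}_w(t;\gamma,\nu,\varepsilon) \right)^2 \quad \text{such that } \gamma \in [0.2, 4],\ \varepsilon \in [0.4, 1]. \tag{5}
$$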

This way, we minimise the error in estimating the case numbers by the EKF state estimate using only wastewater data. Model parameters, either fixed by literature or fitted from Eq. (5), are reported in Table 1. Note that, due to different wastewater data normalisations, ν parameters are not comparable between regions. Similarly, ε parameters might depend on the used techniques. Data from different laboratories may contain significant differences (Cluzel et al., 2022).
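The structure of the fit in Eq. (5) can be sketched as a constrained search over candidate parameter values. The Python sketch below is only one possible way to organise such a fit: the optimiser actually used is not stated in this section, so a plain grid search is swapped in for illustration. The callable `reconstruct` is a hypothetical stand-in for "run the EKF on wastewater data with these parameters and return the reconstructed daily cases", and the toy mapping in the usage example is not the SEIR-WW-EKF model.

```python
import itertools


def eq5_cost(cases, reconstructed):
    """Sum of squared differences between detected and reconstructed cases."""
    return sum((y - yhat) ** 2 for y, yhat in zip(cases, reconstructed))


def grid_search(cases, reconstruct, gammas, nus, epsilons):
    """Return (cost, gamma, nu, eps) minimising the cost over a candidate grid.

    Candidates outside the bounds of Eq. (5), gamma in [0.2, 4] and
    eps in [0.4, 1], are skipped.
    """
    best = None
    for g, n, e in itertools.product(gammas, nus, epsilons):
        if not (0.2 <= g <= 4.0 and 0.4 <= e <= 1.0):
            continue
        cost = eq5_cost(cases, reconstruct(g, n, e))
        if best is None or cost < best[0]:
            best = (cost, g, n, e)
    return best


if __name__ == "__main__":
    ww = [1.0, 4.0, 9.0, 16.0]  # made-up wastewater samples

    def toy_reconstruct(g, n, e):
        # Hypothetical stand-in for the EKF reconstruction, for illustration only.
        return [g * (w ** e) / n for w in ww]

    cases = toy_reconstruct(1.0, 2.0, 0.5)
    print(grid_search(cases, toy_reconstruct, [0.2, 1.0, 4.0, 5.0], [2.0], [0.4, 0.5, 1.0]))
```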

Table 1.

Model parameters: Parameter symbols, descriptions, values, and their sources. The parameter q_β, controlling the allowed change of β(t) in one day, is changed after 30 days. This is done to allow rapid changes in the beginning of the pandemic, when a strict lockdown quickly suppressed its propagation, and to account for errors in the initial β(0). d stands for “days”. When the source is not indicated, the parameter value is first set as an educated guess and then tested with sensitivity analysis (see Supplementary Fig. 18).

| Symbol | Explanation | Value | Source |
|---|---|---|---|
| α | Rate E → I | 0.44 d⁻¹ | (Kemp et al., 2021) |
| τ | Rate I → R | 0.32 d⁻¹ | (Kemp et al., 2021) |
| β(0) | Initial infectivity | 0.44 d⁻¹ | |
| Δt | Time step length | 0.1 d | |
| q_β,1 | Variance of β(t + 1) − β(t) when t ≤ 30 | 0.05² d⁻² | |
| q_β,2 | Variance of β(t + 1) − β(t) when t > 30 | 0.005² d⁻² | |
| κ | EKF sensitivity parameter | 4 | |
| N | Population size | Regional | See Supplementary Table 1 |
| γ | Rate A → ∅ | Regional | Fitted by Eq. (5) |
| ν | Ratio of y_w/A | Regional | Fitted by Eq. (5) |
| ε | Exponent in nonlinear mapping of WW data | Regional | Fitted by Eq. (5) |
| U_w | Measurement error variance of wastewater data | Regional | Estimated by Eq. (C.5) |
| E(0) | Initial size of E-compartment | Regional | Estimated by Eq. (C.3) |
| I(0) | Initial size of I-compartment | Regional | Estimated by Eq. (C.3) |
| var(E(0)) | Uncertainty of E(0) | Regional | (E(0)/2)² |
| var(I(0)) | Uncertainty of I(0) | Regional | (I(0)/2)² |

The sensitivity of the model performance on assumed parameter values is assessed in Supplementary Fig. 18, which demonstrates the robustness of the model and justifies the current parameter choices. The sensitivity analysis was performed by varying the reference parameters up to ±50% of their original value, using Luxembourg as a reference. For most parameters, the projections are consistent and vary only slightly for values very far from the reference ones. The model is most sensitive to the parameter c_t, which is usually estimated with independent methods. The minimal error corresponds to the reference value, while deviations induce larger errors. In our pipeline, changes in c_t are normally compensated by a change in ν by the same amount. This observation justifies the differing fitted values reported in Table 2 for each region and is a reminder that the more accurate the seroprevalence studies are, the smaller the error associated with projections will be. The projection error grows slower for overestimated c_t than for underestimated c_t. Therefore, in case a good estimate of the share of detected cases out of all cases is lacking, it is advised to use a possibly overestimated rather than underestimated value for short-term projections. Note, however, that this may lead to higher overshoots in long-term projections due to overestimation of the susceptible population size.

Table 2.

Model parameters: fitted values. Region-dependent fitted parameter values. Initial values for SEIR compartments are in units of equivalent inhabitants.

| Parameter | Barcelona | Kitchener | Kranj | Lausanne | Ljubljana | Luxembourg |
|---|---|---|---|---|---|---|
| N | 2,000,000 | 242,000 | 40,000 | 240,000 | 280,000 | 634,730 |
| γ | 0.20 d⁻¹ | 4.00 d⁻¹ | 1.43 d⁻¹ | 3.05 d⁻¹ | 3.21 d⁻¹ | 1.62 d⁻¹ |
| ν | 4.86 ⋅ 10⁻² | 2.73 | 1.38 | 6.67 ⋅ 10¹⁰ | 2.77 | 6.40 ⋅ 10⁴ |
| ε | 0.40 | 0.40 | 0.40 | 1.00 | 0.526 | 0.613 |
| U_w | 656 | 2.68 | 58.5 | 3.43 ⋅ 10²³ | 511 | 1.75 ⋅ 10¹² |
| E(0) | 1824 | 379 | 17 | 156 | 76 | 8 |
| I(0) | 2527 | 525 | 24 | 203 | 105 | 11 |

| Parameter | Milwaukee | Netherlands | Oshkosh | Raleigh | Riera | Zurich |
|---|---|---|---|---|---|---|
| N | 615,934 | 17,178,109 | 68,000 | 460,000 | 100,000 | 450,000 |
| γ | 4.00 d⁻¹ | 0.368 d⁻¹ | 4.00 d⁻¹ | 4.00 d⁻¹ | 1.20 d⁻¹ | 0.547 d⁻¹ |
| ν | 0.134 | 185 | 3.73 | 2.58 ⋅ 10³ | 12.7 | 1.87 ⋅ 10⁷ |
| ε | 1.00 | 0.500 | 0.866 | 0.434 | 0.400 | 0.789 |
| U_w | 1.03 | 2.50 ⋅ 10¹¹ | 50.7 | 6.00 ⋅ 10⁸ | 1.69 ⋅ 10³ | 3.44 ⋅ 10¹⁷ |
| E(0) | 298 | 4412 | 166 | 1815 | 4 | 238 |
| I(0) | 413 | 6112 | 231 | 2514 | 6 | 330 |

2.6. Analysis of model outputs

To obtain variables of epidemiological interest, we further analysed the state estimates produced by the SEIR-EKF model. Two estimates using only wastewater data are computed: one without interpolating data between sampling days (WW) and one with linear interpolation (ipWW).
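The linear interpolation behind the ipWW variant can be sketched as follows in Python. The function name and the sample values are illustrative assumptions, and the sketch presumes integer sampling days in strictly increasing order. Note that a value between two samples can only be computed once the later sample is available.

```python
def interpolate_daily(sample_days, values):
    """Linearly interpolate samples to every integer day from first to last sample.

    sample_days: strictly increasing integer day indices; values: measurements.
    """
    daily = {}
    points = list(zip(sample_days, values))
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        for d in range(d0, d1 + 1):
            daily[d] = v0 + (v1 - v0) * (d - d0) / (d1 - d0)
    return [daily[d] for d in range(sample_days[0], sample_days[-1] + 1)]


if __name__ == "__main__":
    # Samples on days 0, 2 and 5 (made-up values).
    print(interpolate_daily([0, 2, 5], [10.0, 20.0, 5.0]))
```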

The effective reproduction number R_eff, the time-dependent average number of secondary infections from a single contagious case in a susceptible population (Althaus, 2014), is directly extrapolated as (Kemp et al., 2021)

$$
R_\mathrm{eff} = \frac{\beta(t)}{\tau}\,\frac{S(t)}{N}, \tag{6}
$$

where β(t) and S(t) are state estimates. For N and τ, see Table 1.
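As a worked example of Eq. (6), the short Python sketch below evaluates R_eff from a state estimate. The function name is hypothetical; the numerical values are the initial infectivity β(0) = 0.44 d⁻¹ and τ = 0.32 d⁻¹ from Table 1, with a fully susceptible population assumed for illustration (S = N), which gives R_eff = 0.44/0.32 = 1.375.

```python
def effective_reproduction_number(beta_t, tau, S_t, N):
    """Eq. (6): R_eff = (beta(t) / tau) * (S(t) / N)."""
    return beta_t / tau * S_t / N


if __name__ == "__main__":
    # Fully susceptible population, Table 1 values for beta(0) and tau.
    print(effective_reproduction_number(0.44, 0.32, 634_730, 634_730))
```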

Short and mid-term projections are possible at any time t_0 by stopping the Kalman filtering and simulating the model forward in time, starting from the latest state estimate and keeping the infectivity parameter constant (β(t) = β(t_0)). The effect of uncertainty in the parameter estimate β(t_0) can be quantified by simulating envelopes using β(t_0) ± 2√(P_{t_0}(6,6)) in the simulation (for every t, P(6, 6) represents the variance associated with β in the Kalman filter update, as discussed in Algorithm 1). Note that other uncertainties are omitted in these simulations; therefore, the short-term uncertainty in particular is under-estimated by the envelope.
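The projection procedure can be sketched in Python as a deterministic forward simulation of the mean SEIR dynamics with β held constant, repeated for the central value and for the two envelope values two standard deviations of β away from it. This is a simplified illustration: the function names, the explicit Euler scheme, the constant detection share c and the truncation of the lower β at zero are assumptions of this sketch.

```python
import math


def project_cases(S, E, I, beta, alpha, tau, N, c, horizon_days, M=10):
    """Simulate mean SEIR dynamics forward with constant beta.

    Returns projected detected cases per day (share c of the daily E -> I flow).
    """
    dt = 1.0 / M
    cases = []
    for _ in range(horizon_days):
        new_today = 0.0
        for _ in range(M):
            se = beta * S * I / N * dt
            ei = alpha * E * dt
            ir = tau * I * dt
            S, E, I = S - se, E + se - ei, I + ei - ir
            new_today += ei
        cases.append(c * new_today)
    return cases


def projection_envelope(state, beta0, P66, **model):
    """Project with beta0 and with beta0 +/- 2 * sqrt(P66), P66 being the variance of beta."""
    half_width = 2.0 * math.sqrt(P66)
    betas = {
        "low": max(beta0 - half_width, 0.0),  # assumption: beta cannot be negative
        "mean": beta0,
        "high": beta0 + half_width,
    }
    S, E, I = state
    return {name: project_cases(S, E, I, b, **model) for name, b in betas.items()}


if __name__ == "__main__":
    env = projection_envelope((99_000.0, 500.0, 500.0), 0.44, 0.0025,
                              alpha=0.44, tau=0.32, N=100_000, c=0.5, horizon_days=7)
    print({name: round(sum(v), 1) for name, v in env.items()})
```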

Quantifying the quality of short-term projections using either case data only, wastewater data only, or both provides more reliable estimates of the epidemic unfolding over short time horizons. At each time step when wastewater data is available, the Kalman filter state estimation is stopped, and the SEIR-WW model is simulated T days forward without taking into account any new data. The total number of observed cases from the projection is calculated and compared with the actual number of observed cases during the same time horizon. Their absolute difference constitutes the prediction error. The prediction errors are standardised by the square root of the true number of cases, which represents the standard deviation estimate (assuming case numbers on a given time are binomially distributed). The standardised scores so obtained are then averaged over all time points on which the prediction is made, obtaining an overall average normalised error. To enable comparison between countries, the average standardised error is scaled per 100,000 equivalent inhabitants. Overall, the scaled average standardised prediction error ξ is:

$$
\xi = \frac{1}{M}\sum_{i=1}^{M} \frac{\left| y_i - \hat{y}_i^{\,j} \right|}{\sqrt{y_i}} \cdot \frac{100{,}000}{N}, \tag{7}
$$

where i is the index of each point in any time horizon [t_0, T] with M points in total; j is an index that considers the original type of data used for projections, i.e. j = {c, w, b} for case data only, wastewater data only, or both combined (note that, in the state estimate using combined data, the wastewater data are not interpolated); hatted variables are the Kalman projections while non-hatted variables correspond to measured data; N is the equivalent population of interest (cf. Table 2).
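The error measure of Eq. (7) can be computed with a few lines of Python. The sketch below is illustrative: the function name is hypothetical, and skipping time points with zero observed cases (where the square-root standardisation is undefined) is an assumption of this example.

```python
import math


def scaled_standardised_error(observed, projected, N):
    """Eq. (7): mean of |y - yhat| / sqrt(y), scaled per 100,000 equivalent inhabitants."""
    terms = [abs(y - yhat) / math.sqrt(y)
             for y, yhat in zip(observed, projected) if y > 0]
    if not terms:
        raise ValueError("no time point with positive observed cases")
    return sum(terms) / len(terms) * 100_000 / N


if __name__ == "__main__":
    # Two time points, each off by exactly one standard deviation sqrt(y).
    print(scaled_standardised_error([100, 400], [90, 420], N=200_000))
```

In the usage example each projection deviates from the observation by one standard deviation, so the average standardised error is 1 and the scaled value for N = 200,000 is 0.5.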

3. Results

3.1. Data integration into a SEIR model

The workflow of our CoWWAn approach, integration of empirical data into a SEIR model through the Extended Kalman filter, is illustrated in Fig. 1. The implemented SEIR model contains an additional compartment for active cases producing virions to wastewater. A detailed description of the workflow can be found in Section 2.4 (see in particular Algorithm 1). In a broad sense, our proposed Kalman filter combines a model of a dynamical system with measurements (case numbers y_c, wastewater measurement y_w or both) obtained from the real system that is being modelled. At each time step it first predicts the next state (the set of all variables) by propagating the old state estimate using the underlying model. From the predicted state estimate, the predicted measurement is calculated using the measurement model. Finally, the state estimate is updated based on the discrepancy between the true measurement and the model-predicted measurement. The model's state estimate then reflects the state of the real system, and it can be used to predict the system's dynamics in the future.

Fig. 1.

Model workflow. The Kalman filter combines measurements from the real system with predictions from the dynamical model, which extends a SEIR model. Empirical data are daily positive cases, shown in blue as the smoothed moving average, and wastewater sampled data, shown in orange with unit of measure of RNA copies/day/100,000 equivalent inhabitants (example for Luxembourg). Details of the SEIR blocks are described in Section 2.4.

3.2. Reconstruction of case numbers

After appropriate calibration to test cases with parameter fitting, CoWWAn quantitatively reconstructs the time evolution of observed cases from wastewater data (Fig. 2a) by inferring the internal variables and parameters of the SEIR model. These include the susceptible, exposed and infectious population fractions, daily detected cases and time-dependent infection rate (see Section 2.4). In our case studies, full time series data were used for calibration for each region. When clear regime shifts in testing/sampling protocols are observed, the model can be re-calibrated appropriately to improve the performance, like for Kitchener (cf. Section 2.2 and Supplementary Table 2). To infer the global shedding population, the model needs additional information on the ratio of total and detected cases, typically obtained from seroprevalence studies (see Section 2.5). Thanks to the model structure, we could thus compare our results with the true number of detected cases (Fig. 2a, red and black lines), before inferring the global magnitude of the shedding population (Fig. 2a, blue line, from mean estimates). The latter is an extrapolation from CoWWAn estimates and information about the “dark number” of undetected cases (provided as a model parameter); further independent studies to estimate this quantity help fine-tune the results.

Fig. 2.

Reconstruction of case numbers and inference of epidemic indicators. a: Reconstruction example for Luxembourg. Top: Comparison of case numbers: official detected data (black line), case numbers reconstructed by CoWWAn from wastewater data (red) including the 2 standard deviations (≃95%) confidence interval (shaded region), and total positive cases inferred by CoWWAn (blue). Bottom: R_eff, estimated by CoWWAn (red, with its associated 2 SD shaded region) or officially reported by the Luxembourg Ministry of Health. b: Pearson's correlation coefficients ρ from linear regression between detected cases and measured wastewater data (blue), ρ between detected cases and CoWWAn-reconstructed case numbers from wastewater data (red, corresponding to correlation values from panel c), and ρ between CoWWAn-reconstructed case numbers from wastewater data (after interpolating wastewater data) and detected cases (yellow). c: Reconstruction results for all considered regional areas, compared with detected case numbers. The dashed line represents equal values.

We compared our results with a linear regression model (after data curation to reduce the noise, in a similar spirit to Vallejo et al. (2021)): CoWWAn's inferences achieve consistently higher correlation (Fig. 2b, blue and red sets), demonstrating the power of our mechanistic approach. These observations hold for all considered regions (Fig. 2c and Supplementary Figs. 3–14): the correlation coefficient ρ between inferred case numbers and true detected case numbers is typically between 0.7 and 0.9, even for rather noisy data like those of the Netherlands. Frequent sampling improves the model calibration and the subsequent reconstruction performance, as for Luxembourg with ρ=0.91 for two probes/week and Milwaukee with ρ=0.95 for two (sometimes more) probes/week, compared e.g. to Barcelona with ρ=0.70 for one probe/week (Fig. 2c). The main discrepancies originate either from unnoticed changes in the share of detected cases or from changes in testing/sampling strategies (Supplementary Figs. 3–14). In addition, we notice (Fig. 2a and c) that the largest uncertainties coincide with the highest case numbers, which are often associated with an increased positivity rate (Ritchie et al., 2020). Detecting such discrepancies can provide additional evidence about potential undertesting and could guide targeted scaling of population tests. Interpolating wastewater data points before the EKF estimation can improve the reconstruction (Fig. 2b, red and yellow sets), in particular for regions with low sampling frequency such as Barcelona Prat de Llobregat (PdL) and Kranj. In general, the Extended Kalman filter improves its predictions as new data points become available, so an adequate sampling rate is recommended to improve its performance.
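For readers who wish to reproduce this kind of comparison, Pearson's correlation coefficient ρ between detected and reconstructed case numbers can be computed as in the following minimal sketch. The function name `pearson` and the two short series are made up for illustration and are not data from this study.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up example series (not study data).
detected = [10, 20, 40, 80, 60, 30]
reconstructed = [12, 18, 45, 70, 65, 28]
rho = pearson(detected, reconstructed)
```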

3.3. Estimation of epidemic indicators

CoWWAn allows estimation of the effective reproduction number R_eff, an essential indicator for the trends of epidemic diffusion in a community (Huisman et al., 2021), which depends on containment measures, infectivity of viral variants, population behavior and other factors. As exemplified for Luxembourg (Fig. 2a), the R_eff values inferred by CoWWAn from wastewater data (according to Eq. (6)) are consistent with the indicator reported by the Ministry of Health on its website (see Section 2.1) and exhibit the same noteworthy trends: the three waves in 2020 (March, June and late October), a small rebound in March 2021 attributed to the emergence of the alpha variant and one wave in late June 2021 attributed to the emergence of the gamma and delta variants, all characterised by R_eff > 1. For all other considered regions as well, wastewater-based R_eff values (estimated with the same method and reported in Supplementary Figs. 3–14) are consistent with those estimated from case numbers using a SEIR model, and are usually smoother due to the sampling frequency and their independence from testing schemes. The R_eff estimates reflect the trends in the development of the pandemic. On average, the R_eff estimates are lower in 2021 than in 2020 due to vaccine rollout, but new viral variants have still caused significant waves despite increasing vaccination coverage.

3.4. Short-term predictions of epidemic trends

CoWWAn's underlying SEIR model permits mechanistic-based predictions of the infection dynamics, for effective monitoring of the epidemic. To predict future trends, it is possible to simulate the model forward at any desired time, starting from the latest state estimate and keeping the transmission parameter constant (cf. Section 2.6). For the epidemic dynamics in Luxembourg, Fig. 3a shows an example of such 7-days predictions for each day of wastewater sampling, where the number of detected cases (blue) is compared with the predicted numbers derived from wastewater data or from case number data. Wastewater-based short-term predictions are well correlated both with case-based projections (ρ=0.95) and with true case numbers (ρ=0.94).
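The forward simulation described above can be sketched as follows, under simplifying assumptions: the model is run deterministically (no noise, no wastewater compartment), the transmission rate `beta` is frozen at its latest estimate, and the rates `alpha` and `tau` as well as the initial state are placeholder values rather than calibrated parameters from this study. The function name `simulate_forward` is hypothetical.

```python
def simulate_forward(state, beta, days, N, alpha=0.3, tau=0.15, substeps=10):
    """Deterministic SEIR forward simulation with constant transmission rate.

    state: (S, E, I) taken from the latest state estimate
    Returns the daily new cases (flow E -> I) for each simulated day.
    alpha, tau: placeholder transition rates, not calibrated values.
    """
    S, E, I = state
    dt = 1.0 / substeps  # explicit Euler sub-steps within one day
    daily = []
    for _ in range(days):
        new_cases = 0.0
        for _ in range(substeps):
            infections = beta * S * I / N * dt
            onset = alpha * E * dt
            removal = tau * I * dt
            S -= infections
            E += infections - onset
            I += onset - removal
            new_cases += onset
        daily.append(new_cases)
    return daily


# 7-day prediction from a made-up state estimate.
prediction = simulate_forward((600_000.0, 500.0, 400.0), beta=0.25, days=7, N=630_000.0)
```

To compare with reported numbers, the simulated daily cases would still need to be scaled by the share of detected cases on each day.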

Fig. 3.

Predictions of future epidemic trends using CoWWAn. a: Prediction examples for Luxembourg, comparing predictions over the 7-days ahead of each point (either estimated from case numbers or wastewater data) with the true detected cases in the same time period. b: Comparison of wastewater-based and cases-based predictions. The performance is evaluated in terms of average standardised error, normalised to equivalent population. The dashed line represents equal values. Error bars correspond to one standard deviation. c: Predictions performance for different time horizons (mean and 80th percentiles over the considered regions; outputs for single countries in Supplementary Fig. 17) for three inputs: case numbers, wastewater data, or both data combined. For all panels, “inh.” stands for inhabitants.

Overall, for the different epidemic phases and all considered regions, the short-term predictions compare well with the real case data and with the case-based predictions (see also Supplementary Figs. 3–14). To quantify their performance, we determined the average standardised prediction error as the average discrepancy between predicted and actual case numbers in the corresponding time frame, normalised to case numbers and equivalent population (Eq. (7)). The performance of our wastewater-based pipeline is usually slightly lower, as it reconstructs the case numbers themselves before making the predictions, but remains similar to that of case-based predictions: all regional estimates lie within one standard deviation of the 1:1 (equal performance) line (Fig. 3b). The only exceptions are the estimates for Oshkosh, probably due to under-testing during late 2020 (refer to Supplementary Fig. 2), which induced discrepancies in the detected cases fraction, and for Kranj, whose low case numbers are subject to larger uncertainties (Supplementary Fig. 5). In general, the largest discrepancies are observed when case numbers plateau or decline after a rapid increase, yielding a potential overshoot of the predictions (Fig. 3a and Supplementary Figs. 3–14). This effect is associated with large changes in social activities during epidemic waves and rapid implementations of stricter restrictions, which are not explicitly included in the model but are implicitly learned from the epidemic curve by the EKF with some delay.

The standardised error grows approximately linearly with increasingly long prediction horizons (Fig. 3c). There, wastewater predictions are more stable (their uncertainty grows more slowly for longer prediction horizons) than those based on case numbers, as they are usually less susceptible to daily fluctuations (Supplementary Table 2). This aspect allows quantifying and comparing the precision for different horizons.

In addition to using one type of data at a time, CoWWAn's EKF-based approach enables integrating different types of data to further improve the quality of predictions. Including both wastewater and case data slightly but systematically improves the prediction accuracy compared to case data alone (Fig. 3c and Supplementary Table 2), further suggesting that wastewater data contains independent information about the state of the epidemics, as previously put forward by Fernandez-Cassi et al. (2021).

3.5. Long-term projections of epidemic scenarios

Due to heterogeneous and evolving adaptations of population behavior and institutional measures, epidemic forecasts are typically only meaningful for relatively short time horizons. In fact, it is known that small uncertainties for short-term predictions are amplified over longer periods and the precision drops, similarly to what happens in weather forecasts (Petropoulos and Makridakis, 2020). Nevertheless, long-term projections that assume no changes in infection dynamics can be useful for counterfactual analysis about the potential effects of current social or pharmaceutical measures and/or changed viral infectivity (Fig. 4a, b). They can also be used to investigate plausible scenarios, by artificially modifying the model parameters.

Fig. 4.

Long term projections using CoWWAn. a: Long-term projected curves of daily cases compared with daily detected case numbers. b: Long-term projected curves of cumulative cases compared with cumulative detected case numbers. Blue and red ribbons represent ±2σ error bounds (σ corresponds to a standard deviation); note that the ribbons might overlap. Both panels a and b report examples for Luxembourg data, with projections starting at the date marked by the green triangle.

As for other models applied to complex systems, our projection uncertainties increase with longer time horizons (Fig. 4b), reflecting the set of potential changes of conditions. Nonetheless, projections based on case numbers or on wastewater data are consistent with each other within error bounds, therefore supporting the possibility of using wastewater data for consistent what-if analysis. In addition, our mechanistic-based model allows assessing the changes in desired precision. Models applied in quickly changing conditions are known to be uncertain (Santosh, 2020). Similarly, our projections' precision varies depending on whether they are conducted during a rapid increase of case numbers or during stable trends, calling for caution in interpreting these results as plausible projections rather than forecasts. Other examples are reported in Supplementary Fig. 15.

3.6. Modelling assesses wastewater warning performance

Among the purposes of this article is to investigate the utility of wastewater sampling to alert against new waves of infections and to inform its interpretation. It has been suggested by Cao and Francis (2021), D'Aoust et al. (2021) and Kumar et al. (2021) that wastewater analysis could provide early warnings for COVID-19 resurgence in a community. Using our approach, we investigate this idea beyond retrospective analysis. We recall that, for the real-time detection of impending epidemic resurgence, distinguishing between fluctuations and robust increases is crucial to optimise the true positive signals and minimise the false negatives. To evaluate the alerting power of on-line (real-time) systems, it is thus not sufficient to compare two fully developed time series with a retrospective analysis. CoWWAn addresses this challenge through its EKF-based predictions, which capture robust trends in the epidemic dynamics. It thus allows comparing early warnings of COVID-19 resurgence obtained from case-based and wastewater-based predictions. In Fig. 5 and Supplementary Fig. 16, we plot the predictions about the pandemic trends, obtained from wastewater data (red) and from detected case numbers (yellow), and compare them with the true observed evolution (blue). We can then assess if and when the red and yellow curves correctly track the increasing trend of the blue curve. We observe that, overall, the prediction curves correctly increase when a new COVID-19 wave is observed in a region, but the timing might differ slightly depending, e.g., on the testing frequency. This analysis demonstrates the potential of wastewater data to detect incoming increasing trends and quantitatively supports the recent calls by Bibby et al. (2021) for cautious interpretation: alerts based on wastewater analysis might be just-on-time or even lag slightly behind the true infection waves. Nonetheless, they are often earlier than reliable alerts based on case numbers alone, e.g. for Kitchener or Raleigh, although counterexamples exist (e.g., Kranj). As a result, we suggest that wastewater-based monitoring could be an effective method to detect new waves of infection, but that the lead time should be carefully assessed case by case, according to the sampling frequency and other characteristics of the wastewater-analysis pipeline. In short, reliable warnings can be triggered, but how “early” they are remains to be properly verified.

Fig. 5.

Zoom into the epidemic resurgences visually recognised in the considered regions. Short-term projections used to identify robust trends in epidemic resurgence, for different examples (one per region; other examples in Supplementary Fig. 16). We compare 7-days projections from case numbers and from wastewater data with the true detected case numbers.

4. Discussion

CoWWAn combines two powerful approaches to process wastewater data in an automated and mechanistic manner: an epidemiological SEIR model and an extended Kalman filter, to fit the model parameters adequately and to provide predictions of epidemic trends. This opens new avenues for wastewater-based epidemic monitoring. In situations of reduced population testing, our approach can enhance the performance and robustness of real-time surveillance in a cost-effective manner. Our model can support the reconstruction of the infection curves from wastewater data and allows projections of future trends, in particular close to epidemic resurgence. Since hospital admission is downstream of the susceptible-exposed-infectious flow (Kemp et al., 2021), healthcare management (D'Aoust et al., 2021; Saguti et al., 2021) can also obtain crucial information from an early detection of increasing case numbers supported by quantitative models that account for noise. We recall that our approach provides information on a community level but does not single out the infected individuals; hence, it does not enable contact tracing, nor does it reveal detailed information like the age distribution of cases or infection clusters. Nonetheless, as already shown in our applications for the Luxembourg government, it is useful for tracking the evolution of the pandemic, as a complement or even supplement to widespread testing. As a consequence, our results can enhance the SWEEP (Surveillance of Wastewater for Early Epidemic Prediction) framework recently proposed by Tiwari et al. (2021) for implementation of wastewater-based epidemiology.

CoWWAn can be easily applied and extended to different areas, and it overcomes some of the limitations of previous studies. In addition to employing a mechanistic underlying model and the Kalman filter, it allows improving R_eff estimation. In fact, Huisman et al. (2021) estimated R_eff from wastewater samples in Zurich, deconvoluting smoothed wastewater data using a kernel based on the shedding load distribution. For some reason, this estimate drops quite early in October 2020 while the case numbers are still steeply increasing. Our estimates do not exhibit such a shortcoming and are overall comparable over the remaining periods, see Supplementary Fig. 14c. However, we acknowledge the limitations of our approach, to be further improved in future studies. To begin with, the reconstruction of case numbers depends on the mean-field SEIR approximation: although meaningful when concentrating on average epidemic trends (Kollepara et al., 2021), it might yield uncertainties in case of heterogeneous behaviors like clusters. In addition, we observe that tailoring region-specific model parameters is recommended to fine-tune the performance and reduce the uncertainties over the estimates. The parameters can usually be estimated with independent methods or educated prior information, in particular concerning seroprevalence, so we acknowledge that the current set of proposed parameters might not be complete for all countries. As observed in the sensitivity analysis, the projections are somewhat sensitive to the ratio of total and detected cases. This is a shortcoming of every model-based projection. Long-term projections are more influenced by the choice of this parameter, due to potential errors in the estimated level of natural immunity in the population. Short-term projections are less sensitive, since any error in the estimated size of the susceptible population is compensated by the infectivity parameter estimate. In particular, we recommend using reliable estimates for the ratio of true versus detected cases during the model calibration. The best estimates originate from seroprevalence studies that are able to distinguish between antibodies from previous infection and vaccination, that is, studies that detect antibodies against other parts of the virus than just the spike protein (Suhandynata et al., 2021). Future discrepancies between wastewater-based estimates and detected cases might be used as indications of changes in the share of detected cases and could be used to trigger a warning against potential undertesting. As for predictions, we observe a close relationship between data and prediction quality: increasing the sampling precision and rates is essential to improve the model estimations. Finally, although we have paved the way for the assessment of on-line warnings of epidemic waves using model-based predictions, we suggest that future studies further quantify the lead time and the precision/recall. Future analysis may also concentrate on optimising the desired performance and the associated costs, or might focus on expanding the current methodology to other epidemic contexts.

5. Conclusion

Sampling and analysing SARS-CoV-2 fluxes in wastewater has been suggested as an efficient, non-invasive and cost-effective complement or alternative to testing routines. Our study leverages the potential of wastewater analysis to provide quantitative information for monitoring, alerting and decision-making. By introducing an effective coupling of causal-based models and wastewater sampling, our approach goes beyond statistical methods. In fact, it allows immediate interpretation of its outputs and enables counterfactual analysis, to estimate plausible epidemic scenarios. Overall, the flexibility of our freely available approach, its ease of implementation and its performance make it an important tool for long-term monitoring and support of epidemic mitigation.

Data availability

The wastewater and case numbers data that support the findings of this study are available from the websites listed in Section 2.1 and Supplementary Table 1. Luxembourg data for this study are available at gitlab.lcsb.uni.lu/SCG/cowwan.

Code availability

CoWWAn's implementation for Matlab 2019b is available at gitlab.lcsb.uni.lu/SCG/cowwan.

CRediT authorship contribution statement

Daniele Proverbio: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing - original draft, Writing - review & editing. Françoise Kemp: Methodology, Writing - review & editing. Stefano Magni: Methodology, Writing - review & editing. Leslie Ogorzaly: Data curation, Funding acquisition, Writing - review & editing. Henry-Michel Cauchie: Data curation, Funding acquisition. Jorge Gonçalves: Conceptualization, Funding acquisition, Supervision, Writing - review & editing. Alexander Skupin: Conceptualization, Funding acquisition, Supervision, Writing - review & editing. Atte Aalto: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Writing - original draft, Writing - review & editing.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

Atte Aalto reports financial support was provided by Luxembourg National Research Fund (FNR). Daniele Proverbio reports financial support was provided by Luxembourg National Research Fund (FNR). Françoise Kemp reports financial support was provided by Luxembourg National Research Fund (FNR). Stefano Magni reports financial support was provided by Luxembourg National Research Fund (FNR). Leslie Ogorzaly reports financial support was provided by Luxembourg National Research Fund (FNR). Henry-Michel Cauchie reports financial support was provided by Luxembourg National Research Fund (FNR). Jorge Gonçalves reports financial support was provided by 111 Project on Computational Intelligence and Intelligent Control.

Acknowledgments

D.P. and S.M. are supported by the Luxembourg National Research Fund (FNR) through PRIDE15/10907093/CriTiCS and F.K. by the FNR project PRIDE17/12244779/PARK-QC. A.A. is supported by the FNR through CORE19/13684479/DynCell. L.O. and H.M.C. are supported by the FNR through the COVID-19-FT2/14806023/Coronastep+. J.G. is partly supported by the 111 Project on Computational Intelligence and Intelligent Control, ref. B18024. The authors want to thank the Research Luxembourg COVID-19 Task Force for general support and collaborative spirit.

Editor: Damia Barcelo

Footnotes

Appendix D

Supplementary data to this article can be found online at https://doi.org/10.1016/j.scitotenv.2022.154235.

Appendix A. Derivation of the SEIR stochastic model

The SEIR model is based on the assumption that each susceptible person has probability $\beta(t)I(t)/N\,dt$ of becoming infected on an infinitesimal time interval $[t, t + dt)$, and that infection events are independent. The number of new infections on $[t, t + dt)$ is then a random variable from the binomial distribution $(n, p)$ with $n = S(t)$ and $p = \beta(t)I(t)/N\,dt$. Assuming a high enough number of cases and stationary rate parameters over a time interval $\Delta t = 1$ day (Gillespie, 2000), the binomial distribution can be well approximated by the normal distribution with mean $\beta(t)S(t)I(t)/N\,dt$ and variance $\frac{\beta(t)I(t)}{N}dt\left(1 - \frac{\beta(t)I(t)}{N}dt\right)S(t) = \frac{\beta(t)S(t)I(t)}{N}dt + O(dt^2)$. The same reasoning can be repeated for all other transitions between compartments. The stochastic SEIR model is then given in Eq. (1).
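The size of the neglected $O(dt^2)$ term can be checked numerically with a small sketch. The function name `infection_moments` and all numbers below are illustrative assumptions, not values used in this study.

```python
def infection_moments(S, I, N, beta, dt):
    """Mean and variance of new infections on [t, t + dt) under the binomial model."""
    p = beta * I / N * dt            # per-person infection probability
    mean = S * p                     # binomial mean n * p
    var_exact = S * p * (1.0 - p)    # binomial variance n * p * (1 - p)
    var_approx = mean                # leading-order term kept in the normal approximation
    return mean, var_exact, var_approx


# Illustrative values only.
mean, var_exact, var_approx = infection_moments(S=10_000, I=100, N=100_000, beta=0.3, dt=1.0)
gap = var_approx - var_exact         # equals S * p**2, i.e. it scales with dt**2
```

Halving `dt` reduces the gap by a factor of four, which is the quadratic scaling stated above.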

Appendix B. Developing the complete SEIR-WW-EKF model

To embed the SEIR dynamical system in the Extended Kalman filter, we formulate a time-discretised state-space version of the dynamical system Eq. (1) by explicit Euler method:

$$x(t+\Delta t) = x(t) + \Delta t\, f(x(t)) + w(t).$$ (B.1)

To obtain the number of daily new infections from the model on a given day, an additional auxiliary state variable D(t) is defined, whose dynamics are given by

$$\begin{cases} D(t) = 0, & \text{for } t \in \mathbb{N}, \\ \dfrac{d}{dt}D(t) = \alpha E(t), \end{cases}$$

that is, $D(t)$ is the differential counterpart of $y_c(t)$ and is reset every day to keep track of new infections on the current day.

Including the auxiliary variable, the state space is 6-dimensional with variables $x_{1 \ldots 6}(t) = [S, E, I, A, D, \beta](t)$. Due to conservation of $N$, $R(t)$ is redundant and is therefore omitted. Eq. (B.1) is complemented with the resetting of $x_5(t)$ to zero once per day. The function $f(x)$ can be represented by a reaction function $r(x)$ which is multiplied by the stoichiometric matrix $B$:

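$$f(x) = \begin{bmatrix} -x_1 x_3 x_6/N \\ x_1 x_3 x_6/N - \alpha x_2 \\ \alpha x_2 - \tau x_3 \\ x_1 x_3 x_6/N - \gamma x_4 \\ \alpha x_2 \\ 0 \end{bmatrix} = \underbrace{\begin{bmatrix} -1 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}}_{B} \underbrace{\begin{bmatrix} x_1 x_3 x_6/N \\ \alpha x_2 \\ \tau x_3 \\ \gamma x_4 \end{bmatrix}}_{r(x)}.$$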
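As a minimal sketch (not the authors' Matlab code), the factorisation of the drift into a stoichiometric matrix and a reaction vector, together with the explicit Euler step of Eq. (B.1) without the noise term, could be written as follows. The sketch assumes that the transmission rate β is stored as the sixth state component; the function names and parameter values are placeholders.

```python
# Stoichiometric matrix B (6 states x 4 reactions).
# Rows: S, E, I, A, D, beta. Columns: infection, E -> I, I -> R, removal from A.
B = [
    [-1,  0,  0,  0],
    [ 1, -1,  0,  0],
    [ 0,  1, -1,  0],
    [ 1,  0,  0, -1],
    [ 0,  1,  0,  0],
    [ 0,  0,  0,  0],
]


def reactions(x, N, alpha, tau, gamma):
    """Reaction rates r(x) for the state x = [S, E, I, A, D, beta]."""
    S, E, I, A, D, beta = x
    return [beta * S * I / N, alpha * E, tau * I, gamma * A]


def f(x, N, alpha, tau, gamma):
    """Drift f(x) = B r(x)."""
    r = reactions(x, N, alpha, tau, gamma)
    return [sum(b * rj for b, rj in zip(row, r)) for row in B]


def euler_step(x, dt, **params):
    """Deterministic part of Eq. (B.1): x(t + dt) = x(t) + dt * f(x(t))."""
    return [xi + dt * fi for xi, fi in zip(x, f(x, **params))]


# Placeholder state and parameters, for illustration only.
x0 = [990.0, 5.0, 5.0, 5.0, 0.0, 0.3]
params = dict(N=1000.0, alpha=0.5, tau=0.2, gamma=0.1)
x1 = euler_step(x0, 1.0, **params)
```

The last row of `B` is zero, so the transmission rate is left unchanged by the model dynamics and is only adjusted by the filter update.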

As argued in the previous section, the state noise w(t) can be well approximated as normally distributed with mean zero and covariance

$$Q(x) = \kappa^2 \Delta t\, B\, \mathrm{diag}(r(x))\, B^\top + \Delta t\, Q_\beta$$

arising from the stochastic model Eq. (1); note that each white noise process $w_j$ for $j = 1, \ldots, 4$ in (1) corresponds to its respective reaction $r_j(x)$. The coefficient $\kappa$ is used to account for modelling errors. In particular, the SEIR model implicitly assumes a homogeneous and perfectly mixed population. This assumption leads to a rather small uncertainty. The coefficient $\kappa$ can also be interpreted as a sensitivity tuning parameter. Lower $\kappa$ leads to higher sensitivity but noisy estimates. Higher $\kappa$ decreases sensitivity but increases robustness against noise. The parameter $\beta$ has no dynamics through $f(x)$, but it is updated by the Kalman filter. The matrix $Q_\beta$ is zero except for the element (6,6), which is $q_\beta$ and acts as a tuning parameter controlling the magnitude of change of $\beta(t)$ in one day.
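The structure of this covariance can be made concrete with a short sketch that evaluates κ²Δt B diag(r) Bᵀ and adds Δt q_β to the (6,6) element. The function name and the numerical values of the reaction rates, κ and q_β are arbitrary illustrations, not calibrated parameters.

```python
# Stoichiometric matrix (rows: S, E, I, A, D, beta; columns: the four reactions).
B = [
    [-1,  0,  0,  0],
    [ 1, -1,  0,  0],
    [ 0,  1, -1,  0],
    [ 1,  0,  0, -1],
    [ 0,  1,  0,  0],
    [ 0,  0,  0,  0],
]


def process_covariance(r, kappa, dt, q_beta):
    """Q = kappa^2 * dt * B diag(r) B^T + dt * Q_beta, with Q_beta zero except (6,6)."""
    n_states, n_reac = len(B), len(r)
    Q = [[kappa ** 2 * dt * sum(B[i][k] * r[k] * B[j][k] for k in range(n_reac))
          for j in range(n_states)]
         for i in range(n_states)]
    Q[-1][-1] += dt * q_beta  # only the transmission rate receives q_beta
    return Q


# Arbitrary reaction rates and tuning parameters, for illustration only.
Q = process_covariance([1.0, 2.0, 3.0, 4.0], kappa=2.0, dt=1.0, q_beta=0.01)
```

The off-diagonal entries are negative wherever one reaction moves individuals from one compartment to another, for example between S and E.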

The measurements from the model are either detected cases on a given day and/or wastewater sampling. To this end, we define possible observation matrices:

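$$C_c(t) = \begin{bmatrix} 0 & 0 & 0 & 0 & c_t & 0 \end{bmatrix}, \quad C_w = \begin{bmatrix} 0 & 0 & 0 & \nu & 0 & 0 \end{bmatrix}, \quad \text{and} \quad C_b(t) = \begin{bmatrix} C_c(t) \\ C_w \end{bmatrix},$$ (B.2)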

where the sub-indices refer to cases (c), wastewater (w), and both (b). We recall that $c_t$ is the share of detected cases on day $t$. It is a coefficient that reflects the testing strategy, which often depends on the day (reduced testing on weekends and on public holidays). The empirical measurements are assumed to be noisy, with additive, normally distributed noise with mean zero and covariance $U(t) = \mathrm{diag}(U_c(t), U_w)$ (or just $U(t) = U_c(t)$ or $U(t) = U_w$ if only one of the measurements is available). The variance of observed cases, $U_c(t)$, is obtained by assuming that cases are detected independently with probability $c_t$. This leads again to a binomial distribution for detected cases with mean $c_t D(t)$, where $D(t)$ is the number of new infections on day $t$. This is unknown to us, and we use a smoothed estimate $D(t) = \bar{y}_c(t)/\bar{c}_t$ (barred variables stand for 7-day moving averages). The variance of the binomial distribution is given by $U_c(t) = D(t)\,c_t(1 - c_t)$. For Raleigh, $23^2$ is added to the variance $U_c(t)$ to account for the (independent) uncertainty due to the aforementioned rounding of the case numbers, where 23 is the largest possible rounding error (N/20,000).
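A minimal sketch of the case-number variance described above is given below. It assumes a trailing 7-day window for the moving averages (the text does not specify whether the window is trailing or centred), and the function name `case_variance` and the optional `rounding_error` argument are hypothetical.

```python
def case_variance(y_c, c, t, window=7, rounding_error=0.0):
    """U_c(t) = D(t) * c_t * (1 - c_t), with D(t) = mean(y_c) / mean(c).

    y_c: daily detected cases; c: daily shares of detected cases (same length).
    The averages use a trailing window (an assumption of this sketch).
    rounding_error: optional independent rounding uncertainty, added in quadrature.
    """
    lo = max(0, t - window + 1)
    n = t + 1 - lo
    y_bar = sum(y_c[lo:t + 1]) / n
    c_bar = sum(c[lo:t + 1]) / n
    D = y_bar / c_bar  # smoothed estimate of new infections on day t
    return D * c[t] * (1.0 - c[t]) + rounding_error ** 2


# Made-up constant series: 50 detected cases per day with a detection share of 0.5.
u_c = case_variance([50.0] * 10, [0.5] * 10, t=9)
```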

Algorithm 1 is used to calculate three different state estimates: $\hat{x}_c(t)$ using only case number data ($C(t) = C_c(t)$); $\hat{x}_w(t)$ using only wastewater data ($C(t) = C_w$ on days when wastewater sampling is done, otherwise the Kalman update is skipped); $\hat{x}_b(t)$ using both case and wastewater data ($C(t) = C_b(t)$ on days when wastewater sampling is done, $C(t) = C_c(t)$ otherwise). These were then used to estimate the data that were not employed for the state estimation, that is, we calculated $\hat{y}_w(t) = C_w\,\hat{x}_c(t)$ and $\hat{y}_c(t) = C_c(t)\,\hat{x}_w(t)$ ($C_i$ are the Kalman filter observation matrices, Eq. (B.2)).

The Kalman filter is complemented with a simple outlier saturation for the wastewater data. The model-predicted value for a wastewater measurement is given by $C_w\tilde{x}$, with prediction error variance $C_w\tilde{P}C_w^\top + U_w$. If the measurement differs from the model prediction by more than four standard deviations, the measurement is replaced by the saturated value $C_w\tilde{x} \pm 4\left(C_w\tilde{P}C_w^\top + U_w\right)^{1/2}$.
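The saturation rule amounts to clamping each wastewater measurement to a band of four standard deviations around the model prediction. A minimal sketch, with a hypothetical function name and made-up numbers, could read:

```python
import math


def saturate(y, y_pred, pred_var, u_w, n_sd=4.0):
    """Clamp a wastewater measurement to y_pred +/- n_sd standard deviations.

    y_pred: model-predicted measurement (C_w applied to the predicted state)
    pred_var: state-prediction contribution to the variance (scalar)
    u_w: measurement noise variance
    """
    sd = math.sqrt(pred_var + u_w)
    lower, upper = y_pred - n_sd * sd, y_pred + n_sd * sd
    return min(max(y, lower), upper)


# Made-up numbers: prediction 10 with total variance 4, so the band is [2, 18].
clipped = saturate(100.0, 10.0, 3.0, 1.0)
```

Measurements inside the band are returned unchanged, so the rule only affects isolated outliers.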

Appendix C. Estimate of the model parameters

As discussed in the main text, we estimate the free parameters of our model from available data, to calibrate them appropriately.

The initial sizes of the E and I compartments are estimated automatically from the data by

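$$E(0) = \frac{\eta_0}{\alpha}\,\frac{1 + \sum_{t=1}^{5} y_c(t)}{5} \quad \text{and} \quad I(0) = \frac{\eta_0}{\tau}\,\frac{1 + \sum_{t=1}^{5} y_c(t)}{5},$$ (C.3)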

where $\alpha$ and $\tau$ are the transition rates $E \to I$ and $I \to R$, respectively, whose inverses are the average durations an infected person remains in the E and I compartments. $\eta_t$ is the average ratio of total and detected cases at day $t$. Considering 5 data points is a trade-off between approximating values on the first day and sensitivity to noise. The model is only weakly sensitive to this choice, cf. Supplementary Fig. 18. For Luxembourg, the data starts from the very beginning of the epidemic, when testing was not performed as actively as in the later stages, hence the values discussed in Section 2.5.

The daily ratios $c_t$ of detected and total cases are obtained as follows. Initially, a weekly rhythm for case numbers is identified by averaging first over five weeks, and then by a moving average over three weeks:

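$$\tilde{c}_t = \begin{cases} \dfrac{35}{5}\,\dfrac{\sum_{j=0}^{4} y_c\big(\mathrm{mod}(t-1,7)+1+7j\big)}{\sum_{s=1}^{35} y_c(s)}, & \text{for } t \le 35, \\[2ex] \dfrac{21}{3}\,\dfrac{y_c(t-7)+y_c(t-14)+y_c(t-21)}{\sum_{s=t-20}^{t} y_c(s)}, & \text{for } t > 35. \end{cases}$$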

Then, these values are normalised by the weekly moving average:

$$c_t = \frac{7\,\tilde{c}_t}{\eta_t \sum_{s=t-6}^{t} \tilde{c}_s}.$$ (C.4)

Note that the procedure for the first five weeks is not causal, but some data is needed for model calibration anyway. The later values $c_t$ are causally determined from data. To obtain final values on public non-weekend holidays, $c_t$ is reduced by a factor of 4 from the value given by Eq. (C.4) to account for reduced testing. If the weekly rhythm is not regular, manual tuning could help improve performance (or $c_t$ could be estimated from the number of performed tests, for example).

The variance of the wastewater measurements U w is estimated from data by

$$U_w = K \cdot \underset{j}{\mathrm{median}} \left( y_w(t_j) - \frac{1}{5}\sum_{i=j-2}^{j+2} y_w(t_i) \right)^2,$$ (C.5)

where each $t_i$ is a time point at which wastewater sampling is done. The scaling factor $K$ is 1/10 when wastewater data is used alone, and $K = 1$ when both case and wastewater data are used, as well as for the outlier detection. In the plots of wastewater data reconstruction, $K = 1$ is used for plotting the uncertainty envelope.
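As a sketch of this estimator, the median of the squared deviations of each sample from its 5-point moving average can be computed as follows. Restricting the computation to interior samples (those with two neighbours on each side) is an assumption of this sketch, since the boundary treatment is not specified here; the function name and data are made up.

```python
import statistics


def wastewater_variance(y_w, K=1.0):
    """K times the median squared deviation from a centred 5-point moving average.

    y_w: wastewater measurements in sampling order.
    Only interior samples are used (an assumption of this sketch).
    """
    squared_dev = []
    for j in range(2, len(y_w) - 2):
        local_mean = sum(y_w[j - 2:j + 3]) / 5.0
        squared_dev.append((y_w[j] - local_mean) ** 2)
    return K * statistics.median(squared_dev)


# Made-up series with a single spike; the local mean at the spike is 1, so the deviation is 4.
u_w = wastewater_variance([0.0, 0.0, 5.0, 0.0, 0.0])
```

Using the median rather than the mean keeps isolated outliers from inflating the estimated measurement noise.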

Appendix D. Supplementary data

Supplementary material: mmc1.pdf (PDF, 6.3 MB).

References

  1. Ahmed W., Bivins A., Simpson S.L., Bertsch P.M., Ehret J., Hosegood I., Metcalfe S., Smith W.J., Thomas K.V., Tynan J., et al. Wastewater surveillance demonstrates high predictive value for COVID-19 infection on board repatriation flights to Australia. Environ. Int. 2021;158 doi: 10.1016/j.envint.2021.106938. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Althaus C.L. Estimating the reproduction number of Ebola virus (EBOV) during the 2014 outbreak in West Africa. PLoS Curr. 2014;6 doi: 10.1371/currents.outbreaks.91afb5e0f279e7f29e7056095255b288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Anderson R.M., May R.M. Population biology of infectious diseases: part I. Nature. 1979;280(5721):361–367. doi: 10.1038/280361a0. [DOI] [PubMed] [Google Scholar]
  4. Bandala E.R., Kruger B.R., Cesarino I., Leao A.L., Wijesiri B., Goonetilleke A. Impacts of COVID-19 pandemic on the wastewater pathway into surface water: a review. Sci. Total Environ. 2021;774 doi: 10.1016/j.scitotenv.2021.145586. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bibby K., Bivins A., Wu Z., North D. Making waves: plausible lead time for wastewater based epidemiology as an early warning system for COVID-19. Water Res. 2021;202 doi: 10.1016/j.watres.2021.117438. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Cao Y., Francis R. On forecasting the community-level COVID-19 cases from the concentration of SARS-CoV-2 in wastewater. Sci. Total Environ. 2021;786 doi: 10.1016/j.scitotenv.2021.147451. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Cluzel N., Courbariaux M., Wang S., Moulin L., Wurtzer S., Bertrand I., Laurent K., Monfort P., Gantzer C., Le Guyader S., et al. A nationwide indicator to smooth and normalize heterogeneous SARS-CoV-2 RNA data in wastewater. Environ. Int. 2022;158 doi: 10.1016/j.envint.2021.106998. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Courbariaux M., Cluzel N., Wang S., Maréchal V., Moulin L., Wurtzer S., Mouchel J.-M., Maday Y., Nuel G., Bertrand, Obépine Consortium A flexible smoother adapted to censored data with outliers and its application to SARS-CoV-2 monitoring in wastewater. Front. Appl. Math. Stat. 2022;8 doi: 10.3389/fams.2022.836349. [DOI] [Google Scholar]
  9. D'Aoust P.M., Graber T.E., Mercier E., Montpetit D., Alexandrov I., Neault N., Baig A.T., Mayne J., Zhang X., Alain T., et al. Catching a resurgence: increase in SARS-CoV-2 viral RNA identified in wastewater 48 h before COVID-19 clinical tests and 96 h before hospitalizations. Sci. Total Environ. 2021;770 doi: 10.1016/j.scitotenv.2021.145319.
  10. Daughton C.G. Wastewater surveillance for population-wide COVID-19: the present and future. Sci. Total Environ. 2020;736 doi: 10.1016/j.scitotenv.2020.139631.
  11. Davis M., Leo S. Black-Litterman in continuous time: the case for filtering. Quant. Financ. Lett. 2013;1(1):30–35. doi: 10.1080/21649502.2013.803794.
  12. Farkas K., Hillary L.S., Malham S.K., McDonald J.E., Jones D.L. Wastewater and public health: the potential of wastewater surveillance for monitoring COVID-19. Curr. Opin. Environ. Sci. Health. 2020;17:14–20. doi: 10.1016/j.coesh.2020.06.001.
  13. Fernandez-Cassi X., Scheidegger A., Bänziger C., Cariti F., Corzon A.T., Ganesanandamoorthy P., Lemaitre J.C., Ort C., Julian T.R., Kohn T. Wastewater monitoring outperforms case numbers as a tool to track COVID-19 incidence dynamics when test positivity rates are high. Water Res. 2021;200 doi: 10.1016/j.watres.2021.117252.
  14. Gillespie D. The chemical Langevin equation. J. Chem. Phys. 2000;113(1):297–306. doi: 10.1063/1.481811.
  15. Goldberg Y., Mandel M., Bar-On Y.M., Bodenheimer O., Freedman L., Haas E.J., Milo R., Alroy-Preis S., Ash N., Huppert A. Waning immunity after the BNT162b2 vaccine in Israel. N. Engl. J. Med. 2021;385(24) doi: 10.1056/NEJMoa2114228.
  16. Gundy P.M., Gerba C.P., Pepper I.L. Survival of coronaviruses in water and wastewater. Food Environ. Virol. 2009;1(1):10–14. doi: 10.1007/s12560-008-9001-6.
  17. Harvey A.C. Cambridge University Press; Cambridge: 1990. Forecasting, Structural Time Series Models and the Kalman Filter.
  18. Hasan S.W., Ibrahim Y., Daou M., Kannout H., Jan N., Lopes A., et al. Detection and quantification of SARS-CoV-2 RNA in wastewater and treated effluents: surveillance of COVID-19 epidemic in the United Arab Emirates. Sci. Total Environ. 2021;764 doi: 10.1016/j.scitotenv.2020.142929.
  19. He S., Peng Y., Sun K. SEIR modeling of the COVID-19 and its dynamics. Nonlinear Dynam. 2020;101(3):1667–1680. doi: 10.1007/s11071-020-05743-y.
  20. Huisman J.S., Scire J., Caduff L., Fernandez-Cassi X., Ganesanandamoorthy P., Kull A., Scheidegger A., Stachler E., Boehm A.B., Hughes B. 2021. Wastewater-based estimation of the effective reproductive number of SARS-CoV-2. medRxiv, 2021.04.29.21255961.
  21. Kalman R. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960;82:35–45. doi: 10.1115/1.3662552.
  22. Kapo K.E., Paschka M., Vamshi R., Sebasky M., McDonough K. Estimation of US sewer residence time distributions for national-scale risk assessment of down-the-drain chemicals. Sci. Total Environ. 2017;603:445–452. doi: 10.1016/j.scitotenv.2017.06.075.
  23. Kemp F., Proverbio D., Aalto A., Mombaerts L., d’Herouel A.F., Husch A., Ley C., Goncalves J., Skupin A., Magni S. Modelling COVID-19 dynamics and potential for herd immunity by vaccination in Austria, Luxembourg and Sweden. J. Theor. Biol. 2021;530 doi: 10.1016/j.jtbi.2021.110874.
  24. Kollepara P.K., Siegenfeld A.F., Bar-Yam Y. arXiv; 2021. Modeling complex systems: A case study of compartmental models in epidemiology. arXiv:2110.02947.
  25. Kumar M., Patel A.K., Shah A.V., Raval J., Rajpara N., Joshi M., Joshi C.G. First proof of the capability of wastewater surveillance for COVID-19 in India through detection of genetic material of SARS-CoV-2. Sci. Total Environ. 2020;746 doi: 10.1016/j.scitotenv.2020.141326.
  26. Kumar M., Joshi M., Patel A.K., Joshi C.G. Unravelling the early warning capability of wastewater surveillance for COVID-19: a temporal study on SARS-CoV-2 RNA detection and need for the escalation. Environ. Res. 2021;196 doi: 10.1016/j.envres.2021.110946.
  27. Lahrich S., Laghrib F., Farahi A., Bakasse M., Saqrane S., El Mhammedi M. Review on the contamination of wastewater by COVID-19 virus: impact and treatment. Sci. Total Environ. 2021;751 doi: 10.1016/j.scitotenv.2020.142325.
  28. Lai S., Ruktanonchai N.W., Zhou L., Prosper O., Luo W., Floyd J.R., Wesolowski A., Santillana M., Zhang C., Du X., et al. Effect of non-pharmaceutical interventions to contain COVID-19 in China. Nature. 2020;585(2):410–413. doi: 10.1038/s41586-020-2293-x.
  29. Larsen D.A., Wigginton K.R. Tracking COVID-19 with wastewater. Nat. Biotechnol. 2020;38(10):1151–1153. doi: 10.1038/s41587-020-0690-1.
  30. Li X., Kulandaivelu J., Zhang S., Shi J., Sivakumar M., Mueller J., Luby S., Ahmed W., Coin L., Jiang G. Data-driven estimation of COVID-19 community prevalence through wastewater-based epidemiology. Sci. Total Environ. 2021;789 doi: 10.1016/j.scitotenv.2021.147947.
  31. Marchesseau S., Delingette H., Sermesant M., Cabrera-Lozoya R., Tobon-Gomez C., Moireau P., i Ventura R.F., Lekadir K., Hernandez A., Garreau M., et al. Personalization of a cardiac electromechanical model using reduced order unscented Kalman filtering from regional volumes. Med. Im. An. 2013;17(7):816–829. doi: 10.1016/j.media.2013.04.012.
  32. McMahan C.S., Self S., Rennert L., Kalbaugh C., Kriebel D., Graves D., Colby C., Deaver J.A., Popat S.C., Karanfil T., et al. Covid-19 wastewater epidemiology: a model to estimate infected populations. Lancet Planet. Health. 2021;5(12):e874–e881. doi: 10.1016/S2542-5196(21)00230-8.
  33. Miura F., Kitajima M., Omori R. Duration of SARS-CoV-2 viral shedding in faeces as a parameter for wastewater-based epidemiology: re-analysis of patient data using a shedding dynamics model. Sci. Total Environ. 2021;769 doi: 10.1016/j.scitotenv.2020.144549.
  34. Nagy Stovner B., Johansen T., Fossen T., Schjølberg I. Attitude estimation by multiplicative exogenous Kalman filter. Automatica. 2018;95:347–355. doi: 10.1016/j.automatica.2018.05.038.
  35. Naughton C.C., Roman F.A., Alvarado A.G.F., Tariqi A.Q., Deeming M.A., Bibby K., Bivins A., Rose J.B., Medema G., Ahmed W. 2021. Show us the data: Global COVID-19 wastewater monitoring efforts, equity, and gaps. medRxiv, 2021.03.14.21253564.
  36. Néant N., Lingas G., Le Hingrat Q., Ghosn J., Engelmann I., Lepiller Q., Gaymard A., Plantier J.-C., Cédric Hartard V.F. Modeling SARS-CoV-2 viral kinetics and association with mortality in hospitalized patients from the French COVID cohort. Proc. Natl. Acad. Sci. USA. 2021;118(8) doi: 10.1073/pnas.2017962118. e2017962118.
  37. Nemudryi A., Nemudraia A., Wiegand T., Surya K., Buyukyoruk M., Cicha C., Vanderwood K.K., Wilkinson R., Wiedenheft B. Temporal detection and phylogenetic assessment of SARS-CoV-2 in municipal wastewater. Cell Rep. Med. 2020;1(6) doi: 10.1016/j.xcrm.2020.100098.
  38. Peccia J., Zulli A., Brackney D.E., Grubaugh N.D., Kaplan E.H., Casanovas-Massana A., Ko A.I., Malik A.A., Wang D., Wang M., et al. Measurement of SARS-CoV-2 RNA in wastewater tracks community infection dynamics. Nat. Biotechnol. 2020;38(10):1164–1167. doi: 10.1038/s41587-020-0684-z.
  39. Petropoulos F., Makridakis S. Forecasting the novel coronavirus COVID-19. PLoS One. 2020;15(3) doi: 10.1371/journal.pone.0231236.
  40. Pollán M., Pérez-Gómez B., Pastor-Barriuso R., Oteo J., Hernán M.A., Pérez-Olmeda M., Sanmartín J.L., Fernández-García A., Cruz I., de Larrea N.F., et al. Prevalence of SARS-CoV-2 in Spain (ENE-COVID): a nationwide, population-based seroepidemiological study. Lancet. 2020;396(10250):535–544. doi: 10.1016/S0140-6736(20)31483-5.
  41. Proverbio D., Kemp F., Magni S., Husch A., Aalto A., Mombaerts L., Skupin A., Gonçalves J., Ameijeiras-Alonso J., Ley C. Dynamical SPQEIR model assesses the effectiveness of non-pharmaceutical interventions against COVID-19 epidemic outbreaks. PLoS One. 2021;16(5):1–21. doi: 10.1371/journal.pone.0252019.
  42. Quilliam R.S., Weidmann M., Moresco V., Purshouse H., O’Hara Z., Oliver D.M. COVID-19: the environmental implications of shedding SARS-CoV-2 in human faeces. Environ. Int. 2020;140 doi: 10.1016/j.envint.2020.105790.
  43. Randazzo W., Cuevas-Ferrando E., Sanjuán R., Domingo-Calap P., Sánchez G. Metropolitan wastewater analysis for COVID-19 epidemiological surveillance. Int. J. Hyg. Environ. Health. 2020;230 doi: 10.1016/j.ijheh.2020.113621.
  44. Reeves K., Liebig J., Feula A., Saldi T., Lasda E., Johnson W., Lilienfeld J., Maggi J., Pulley K., Wilkerson P.J., et al. High-resolution within-sewer SARS-CoV-2 surveillance facilitates informed intervention. Water Res. 2021;204 doi: 10.1016/j.watres.2021.117613.
  45. Ritchie H., Mathieu E., Rodés-Guirao L., Appel C., Giattino C., Ortiz-Ospina E., Hasell J., Macdonald B., Beltekian D., Roser M. Our World in Data; 2020. Coronavirus pandemic (COVID-19). https://ourworldindata.org/coronavirus
  46. Roda W.C., Varughese M.B., Han D., Li M.Y. Why is it difficult to accurately predict the COVID-19 epidemic? Inf. Dis. Mod. 2020;5:271–281. doi: 10.1016/j.idm.2020.03.001.
  47. Saguti F., Magnil E., Enache L., Churqui M.P., Johansson A., Lumley D., et al. Surveillance of wastewater revealed peaks of SARS-CoV-2 preceding those of hospitalized patients with COVID-19. Water Res. 2021;189 doi: 10.1016/j.watres.2020.116620.
  48. Sala-Comorera L., Reynolds L.J., Martin N.A., O’Sullivan J.J., Meijer W.G., Fletcher N.F. Decay of infectious SARS-CoV-2 and surrogates in aquatic environments. Water Res. 2021;201 doi: 10.1016/j.watres.2021.117090.
  49. Santosh K. Covid-19 prediction models and unexploited data. J. Med. Syst. 2020;44(9):1–4. doi: 10.1007/s10916-020-01645-z.
  50. Snoeck C.J., Vaillant M., Abdelrahman T., Satagopam V.P., Turner J.D., Beaumont K., Gomes C.P.C., Fritz J.V., Schröder V.E., Kaysen A., et al., on behalf of the CON-VINCE study group. 2020. Prevalence of SARS-CoV-2 Infection in the Luxembourgish Population – The CON-VINCE Study. medRxiv. 2020.05.11.20092916.
  51. Suhandynata R.T., Bevins N.J., Tran J.T., Huang D., Hoffman M.A., Lund K., Kelner M.J., McLawhon R.W., Gonias S.L., Nemazee D., Fitzgerald R.L. SARS-CoV-2 serology status detected by commercialized platforms distinguishes previous infection and vaccination adaptive immune responses. J. Appl. Lab. Med. 2021;6(5):1109–1122. doi: 10.1093/jalm/jfab080.
  52. Tiwari S.B., Gahlot P., Tyagi V.K., Zhang L., Zhou Y., Kazmi A., Kumar M. Surveillance of Wastewater for Early Epidemic Prediction (SWEEP): environmental and health security perspectives in the post COVID-19 Anthropocene. Environ. Res. 2021;195 doi: 10.1016/j.envres.2021.110831.
  53. Vallejo J.A., Trigo N., Rumbo-Feal S., Conde-Pérez K., Lopez-Oriona Á., Barbeito I., Vaamonde M., Tarrío-Saavedra J., Reif R., Ladra S., et al. Modeling the number of people infected with SARS-COV-2 from wastewater viral load in Northwest Spain. Sci. Total Environ. 2021;811 doi: 10.1016/j.scitotenv.2021.152334.
  54. Weidhaas J., Aanderud Z.T., Roper D.K., VanDerslice J., Gaddis E.B., Ostermiller J., Hoffman K., Jamal R., Heck P., Zhang Y., et al. Correlation of SARS-CoV-2 RNA in wastewater with COVID-19 disease burden in sewersheds. Sci. Total Environ. 2021;775 doi: 10.1016/j.scitotenv.2021.145790.
  55. Wilmes P., Zimmer J., Schulz J., Glod F., Veiber L., Mombaerts L., Rodrigues B., Aalto A., Pastore J., Snoeck C.J., et al. SARS-CoV-2 transmission risk from asymptomatic carriers: results from a mass screening programme in Luxembourg. Lancet Reg. Health-Eur. 2021;4 doi: 10.1016/j.lanepe.2021.100056.
  56. Wölfel R., Corman V.M., Guggemos W., Seilmaier M., Zange S., Müller M.A., Niemeyer D., Jones T.C., Vollmar P., Rothe C., et al. Virological assessment of hospitalized patients with COVID-2019. Nature. 2020;581(7809):465–469. doi: 10.1038/s41586-020-2196-x.
  57. Wurtzer S., Marechal V., Mouchel J., Maday Y., Teyssou R., Richard E., Almayrac J., Moulin L. Evaluation of lockdown effect on SARS-CoV-2 dynamics through viral genome quantification in waste water, Greater Paris, France, 5 March to 23 April 2020. Eurosurveillance. 2020;25(50) doi: 10.2807/1560-7917.ES.2020.25.50.2000776.
  58. Zhu Y., Oishi W., Maruo C., Saito M., Chen R., Kitajima M., Sano D. Early warning of COVID-19 via wastewater-based epidemiology: potential and bottlenecks. Sci. Total Environ. 2021;767 doi: 10.1016/j.scitotenv.2021.145124.

Associated Data

Supplementary Materials

mmc1.pdf (6.3MB, pdf)

Data Availability Statement

The wastewater and case numbers data that support the findings of this study are available from the websites listed in Section 2.1 and Supplementary Table 1. Luxembourg data for this study are available at gitlab.lcsb.uni.lu/SCG/cowwan.


Articles from The Science of the Total Environment are provided here courtesy of Elsevier