Philosophical transactions. Series A, Mathematical, physical, and engineering sciences. 2022 Aug 15;380(2233):20210305. doi: 10.1098/rsta.2021.0305

Refining epidemiological forecasts with simple scoring rules

Robert E Moore 1, Conor Rosato 1, Simon Maskell 1
PMCID: PMC9376716  PMID: 35965461

Abstract

Estimates from infectious disease models have constituted a significant part of the scientific evidence used to inform the response to the COVID-19 pandemic in the UK. These estimates can vary strikingly in their bias and variability. Epidemiological forecasts should be consistent with the observations that eventually materialize. We use simple scoring rules to refine the forecasts of a novel statistical model for multisource COVID-19 surveillance data by tuning its smoothness hyperparameter.

This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’.

Keywords: Bayesian, multisource, COVID-19, forecasting, scores, NSES

1. Introduction

Several epidemiological modelling groups use statistical models of infectious disease to generate forecasts that contribute to a body of scientific evidence that informs the response to the COVID-19 pandemic in the UK. The models developed by the University of Cambridge MRC Biostatistics Unit and Public Health England (PHE) [1], the University of Warwick [2] and the London School of Hygiene & Tropical Medicine [3] provide three notable examples of statistical models used to produce such estimates.

Although Cramer et al. [4] and Funk et al. [5] consider the assessment of quantile-format forecasts for COVID-19, it does not appear to be standard practice to assess full-distribution epidemiological forecasts by comparing them to the observations that eventually materialize. Maishman et al. [6] provide a set of anonymized estimates for the effective reproduction number Rt that highlights the striking differences in the bias and variability of estimates that different epidemiological models can produce. We contribute to the collective effort of modelling COVID-19 in this paper by introducing a statistical model for multisource COVID-19 surveillance data and by using simple scoring rules to refine its forecasts and improve its predictive performance. The statistical model is novel by its use of symptom report data from the NHS 111 telephone service, its compartments for convalescing and terminally ill individuals, and its implementation, which uses a bespoke numerical integrator to solve the system of ODEs for the transmission model.

We begin by describing the novel statistical model in §2 and then proceed to define a set of simple scoring rules in §3. In §4, we use the simple scoring rules to refine the forecasts of the statistical model. Finally, in §5, we bring the paper to an end by presenting our conclusion.

2. Statistical model

The model we describe in this section builds on a previous version whose implementation we contributed to the CoDatMo (Covid Data Models) organization on GitHub [7]. The purpose of CoDatMo is to provide a collection of COVID-19 models, all written in the statistical modelling language Stan [8]. In addition to models originally implemented in Stan, CoDatMo provides Stan instantiations of COVID-19 models, initially written in other programming languages. By hosting a set of well-documented models, all implemented in a common language, CoDatMo hopes to improve understanding of the set of existing models and make it easier for newcomers and established researchers within the domain of epidemiology to make extensions and potential improvements.

We note that CoDatMo has already had some success against its objectives, with a group, primarily based at Universidade Nove de Julho in Brazil, creating a related but simpler model [9]. We also note that the UK Health Security Agency uses a slightly more sophisticated version of the model presented in this paper to generate weekly estimates of the effective reproduction number and growth rate for regions of the UK [10].

The statistical model consists of two parts: a transmission model that captures a simplified mechanism for the spread of coronavirus through the population; and an observation model that encapsulates the assumptions about the connection between the states of the transmission model and the observed surveillance data used to calibrate the model.

(a) . Transmission model

The simple SIR compartmental model developed by Kermack & McKendrick [11] provides the theoretical basis for the transmission model. As with the SIR model, we assume a single geographical region with a large population of identical individuals who come into contact with one another uniformly at random but do not come into contact with individuals from other areas. In contrast to some other models, for example, those developed by Birrell et al. [1] and Keeling et al. [2], we treat the population as identical in terms of age and sex and only discriminate between individuals based on their disease states. We also assume that the population is closed, meaning that no births or deaths occur and no migration in or out of the population occurs.

The SIR model divides the population into three disease states: individuals who are (S) susceptible to infection; individuals who have been infected with the disease and are (I) infectious to other people; and individuals who have (R) ‘recovered’ either by recuperating from the disease or dying. We augment these compartments in the transmission model by adding disease states for individuals who have been (E) exposed to the virus but are not yet infectious, individuals that are no longer infectious and whose final disease states are (P) pending, and individuals who have (D) died of the disease. The exposed and dead compartments are standard extensions to the original SIR model. By contrast, we believe the pending compartments to be a novel aspect of the model. These compartments contain individuals who are either convalescing or terminally ill as a result of infection. In addition to adding these compartments, we redefine the (R) recovered population to include the living only.

Inspired by the work of the University of Cambridge MRC Biostatistics Unit and PHE [1], we partition each of the intermediate disease states (E, I, P) into two sub-states. Partitioning the sub-states in this way makes the model more realistic by implicitly constraining the times spent in each of these disease states to have Erlang rather than exponential distributions.
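As a short worked check of this claim, assume that an individual leaves each of the two sub-states of a disease state at rate $2/d$, where $d$ is the mean time spent in the whole state. The symbols $T$, $T_1$ and $T_2$ are introduced here for illustration only.

```latex
T = T_1 + T_2, \qquad T_1, T_2 \overset{\text{iid}}{\sim} \mathrm{Exponential}(2/d)
\;\Longrightarrow\; T \sim \mathrm{Erlang}(2,\, 2/d),

f_T(t) = \left(\frac{2}{d}\right)^{2} t \, e^{-2t/d}, \qquad
\mathbb{E}[T] = d, \qquad \mathrm{Var}[T] = \frac{d^{2}}{2}.
```

A single compartment with the same mean would give an exponential sojourn time with variance $d^2$ and a mode at zero, so splitting the state halves the variance and makes very short stays less likely.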

We assume that there is at least one individual in each susceptible, exposed, and infectious compartment on 17 February 2020, which is the beginning of time in the model. At this stage, we assume that no population members are pending, have recovered, or have died. Two parameters, α1 and α2, determine the allocation of the rest of the population to the first five compartments of the transmission model. α1 is the proportion of the remaining population initially in the susceptible compartment, and α2 is the proportion of the infected population that is not yet infectious at time zero. For simplicity, we divide the exposed and infectious populations equally between the respective sub-states. We can express the exact relationship between the two parameters, α1 and α2, and the initial state of the transmission model as a set of equations

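$$S(0) = (N-5)\,\alpha_1 + 1, \tag{2.1}$$
$$E_1(0) = \tfrac{1}{2}(N-5)(1-\alpha_1)\,\alpha_2 + 1, \tag{2.2}$$
$$E_2(0) = \tfrac{1}{2}(N-5)(1-\alpha_1)\,\alpha_2 + 1, \tag{2.3}$$
$$I_1(0) = \tfrac{1}{2}(N-5)(1-\alpha_1)(1-\alpha_2) + 1, \tag{2.4}$$
$$I_2(0) = \tfrac{1}{2}(N-5)(1-\alpha_1)(1-\alpha_2) + 1, \tag{2.5}$$
$$P_1(0) = 0, \tag{2.6}$$
$$P_2(0) = 0, \tag{2.7}$$
$$R(0) = 0 \tag{2.8}$$
and
$$D(0) = 0. \tag{2.9}$$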
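The initial conditions can be sketched in a few lines of Python. This is an illustration rather than the Stan implementation: the function name `initial_state` and the example values for the population size, α1 and α2 are assumptions chosen only to show that the compartments sum to N.

```python
def initial_state(n_pop, alpha1, alpha2):
    """Initial compartment sizes following equations (2.1) to (2.9)."""
    if not (0.0 <= alpha1 <= 1.0 and 0.0 <= alpha2 <= 1.0):
        raise ValueError("alpha1 and alpha2 must lie in [0, 1]")
    rest = n_pop - 5  # one individual is reserved for each of S, E1, E2, I1, I2
    infected = rest * (1.0 - alpha1)  # remaining population that is not susceptible
    return {
        "S": rest * alpha1 + 1.0,
        "E1": 0.5 * infected * alpha2 + 1.0,
        "E2": 0.5 * infected * alpha2 + 1.0,
        "I1": 0.5 * infected * (1.0 - alpha2) + 1.0,
        "I2": 0.5 * infected * (1.0 - alpha2) + 1.0,
        "P1": 0.0,
        "P2": 0.0,
        "R": 0.0,
        "D": 0.0,
    }


# Hypothetical values, for illustration only.
state0 = initial_state(n_pop=1_000_000, alpha1=0.99, alpha2=0.5)
print(state0, sum(state0.values()))
```

Because the five "+1" terms account for the five reserved individuals, the compartments always sum to N regardless of the values of α1 and α2.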

Figure 1 provides a graphical illustration of the transmission model that captures the assumptions relating to the flow of individuals between disease states.

Figure 1.

A graph of the transmission model. Individuals begin their journey in the susceptible (S) state. From here, they are infected and move into the exposed (E) state. After the virus has incubated for a while, they continue into the infectious (I) state. Next, they enter the pending (P) state, after which they either migrate into the recovered (R) state if convalescing or pass into the deceased (D) state if terminally ill.

We assume that the population randomly mixes as time elapses, with infectious and susceptible individuals coming into contact with one another, potentially transmitting the virus. Susceptible people who have become exposed through these contacts are not initially infectious. The virus replicates in their bodies for a time, known as the latent period, before they become infectious and have the potential to transmit the virus onto members of the remaining susceptible population. After being infectious for some time, we assume that individuals enter a state of pending before either recovering and becoming indefinitely immune to reinfection if they were convalescing or dying if they were terminally ill.

The number of individuals in each disease state varies with time according to a system of ordinary differential equations (ODEs)

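$$\frac{dS(t)}{dt} = -\beta(t)\,\frac{I_1(t)+I_2(t)}{N}\,S(t), \tag{2.10}$$
$$\frac{dE_1(t)}{dt} = \beta(t)\,\frac{I_1(t)+I_2(t)}{N}\,S(t) - \frac{2}{d_L}E_1(t), \tag{2.11}$$
$$\frac{dE_2(t)}{dt} = \frac{2}{d_L}\left[E_1(t) - E_2(t)\right], \tag{2.12}$$
$$\frac{dI_1(t)}{dt} = \frac{2}{d_L}E_2(t) - \frac{2}{d_I}I_1(t), \tag{2.13}$$
$$\frac{dI_2(t)}{dt} = \frac{2}{d_I}\left[I_1(t) - I_2(t)\right], \tag{2.14}$$
$$\frac{dP_1(t)}{dt} = \frac{2}{d_I}I_2(t) - \frac{2}{d_P}P_1(t), \tag{2.15}$$
$$\frac{dP_2(t)}{dt} = \frac{2}{d_P}\left[P_1(t) - P_2(t)\right], \tag{2.16}$$
$$\frac{dR(t)}{dt} = \frac{2}{d_P}P_2(t)\left[1-\omega\right] \tag{2.17}$$
and
$$\frac{dD(t)}{dt} = \frac{2}{d_P}P_2(t)\,\omega, \tag{2.18}$$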

where

  • S(t) is the number of susceptible individuals who have not yet been infected and are at risk of infection,

  • E1(t)+E2(t) is the number of exposed individuals who have been infected but are not yet infectious,

  • I1(t)+I2(t) is the number of infectious individuals,

  • P1(t)+P2(t) is the number of pending individuals who are either convalescing or are terminally ill,

  • R(t) is the number of recovered individuals,

  • D(t) is the number of dead individuals,

  • N=S(t)+E1(t)+E2(t)+I1(t)+I2(t)+P1(t)+P2(t)+R(t)+D(t) is the constant total number of individuals in the population,

  • dL is the mean time between infection and onset of infectiousness,

  • dI is the mean time for which individuals are infectious,

  • dP is the mean time for which individuals are pending,

  • ω, the infection fatality ratio (IFR), is the proportion of infected individuals who will die,

  • β(t) is the mean rate of contacts between individuals per unit time that are sufficient to lead to transmission if one of the individuals is infectious and the other is susceptible. β(t) is a continuous piecewise linear function of time

$$\beta(t) = \sum_{j=1}^{J} \beta_j(t)\,\chi_{[t_{j-1},\,t_j)}(t), \tag{2.19}$$

where the mean rate of effective contacts during the jth time interval, βj(t), is given by

$$\beta_j(t) = \frac{\beta_{j+1}-\beta_j}{t_j - t_{j-1}}\,(t - t_{j-1}) + \beta_j \tag{2.20}$$

and

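$$\chi_{[t_{i-1},\,t_i)}(t) = \begin{cases} 1 & \text{if } t \in [t_{i-1}, t_i), \\ 0 & \text{if } t \notin [t_{i-1}, t_i). \end{cases} \tag{2.21}$$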

The effective contact rate parameters $\beta_1, \beta_2, \ldots, \beta_{J+1}$ in equation (2.20) are the values $\beta(t)$ takes on a set of predefined dates $t_0, t_1, \ldots, t_J$. The first date is 17 February 2020. The second date is 24 March 2020, the first day after the prime minister announced the first national lockdown, and each date that follows is 7 days after the last.
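The transmission model can be sketched in Python as a piecewise-linear contact rate plus the right-hand side of the ODE system. This is an illustrative sketch, not the Stan implementation. The function names, the knot times (measured in days since 17 February 2020) and all parameter values are assumptions, and the equations for I2 and P1 are written as implied by the two-sub-state structure described above.

```python
from bisect import bisect_right


def beta_at(t, knots, values):
    """Continuous piecewise-linear beta(t), as in equations (2.19) and (2.20).

    knots are the times t_0 < t_1 < ... < t_J and values are the contact
    rates beta_1, ..., beta_{J+1} taken at those times. For convenience the
    final knot t_J is included in the last interval.
    """
    if not knots[0] <= t <= knots[-1]:
        raise ValueError("t lies outside the knot range")
    j = min(bisect_right(knots, t), len(knots) - 1)  # t is in [t_{j-1}, t_j]
    t_lo, t_hi = knots[j - 1], knots[j]
    b_lo, b_hi = values[j - 1], values[j]
    return (b_hi - b_lo) / (t_hi - t_lo) * (t - t_lo) + b_lo


def transmission_rhs(t, y, n_pop, d_l, d_i, d_p, omega, knots, values):
    """Time derivatives of (S, E1, E2, I1, I2, P1, P2, R, D)."""
    s, e1, e2, i1, i2, p1, p2, r, d = y
    new_infections = beta_at(t, knots, values) * (i1 + i2) / n_pop * s
    return [
        -new_infections,
        new_infections - 2.0 / d_l * e1,
        2.0 / d_l * (e1 - e2),
        2.0 / d_l * e2 - 2.0 / d_i * i1,
        2.0 / d_i * (i1 - i2),
        2.0 / d_i * i2 - 2.0 / d_p * p1,
        2.0 / d_p * (p1 - p2),
        2.0 / d_p * p2 * (1.0 - omega),
        2.0 / d_p * p2 * omega,
    ]


# Hypothetical knots and parameter values, for illustration only.
knots = [0.0, 36.0, 43.0]   # 17 February, 24 March and 31 March 2020
values = [0.4, 0.2, 0.15]
y0 = [989_996.0, 2_501.0, 2_501.0, 2_501.0, 2_501.0, 0.0, 0.0, 0.0, 0.0]
dydt = transmission_rhs(10.0, y0, 1_000_000.0, 4.0, 5.0, 13.0, 0.009, knots, values)
print(beta_at(18.0, knots, values), sum(dydt))
```

The derivatives sum to zero (up to floating-point error) because every outflow from one compartment is an inflow to another, which reflects the closed-population assumption.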

(b) . Observation model

Epidemiological modelling groups use many types of surveillance data to calibrate statistical models of infectious diseases. The observation model captures the assumptions about the relationship between the states of the transmission model and the surveillance data that we use for calibration.

We have designed the observation model to be extensible. Here, the model only has components for death, hospital admission and symptom report data to keep things simple. Nonetheless, the observation model can be extended to assimilate additional types of surveillance data, such as case data, by appending extra components similar in structure to those for ingesting the hospital admission and symptom report data.

(i). Death data

On their official website for coronavirus data [12], the UK government publishes a daily time series of the number of deaths of individuals whose death certificate mentioned COVID-19 as one of the causes. We assume that the number of deaths on day t, according to this definition, dobs(t), has a negative binomial distribution parameterized by a mean d(t) and parameter ϕdeaths which affects overdispersion

$$d_{\mathrm{obs}}(t) \sim \mathrm{NegativeBinomial}\big(d(t),\, \phi_{\mathrm{deaths}}\big), \tag{2.22}$$

where we use the alternative parameterization of the negative binomial distribution as defined by the Stan Development Team [13]

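$$\mathrm{NegativeBinomial}(n \mid \mu, \phi) = \binom{n+\phi-1}{n} \left(\frac{\mu}{\mu+\phi}\right)^{n} \left(\frac{\phi}{\mu+\phi}\right)^{\phi}. \tag{2.23}$$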

In equation (2.22), $d(t)$ is the difference between the population of the D state of the transmission model between days $t-1$ and $t$: $d(t) = D(t) - D(t-1)$.
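The death component can be illustrated with a minimal Python sketch of the negative binomial log-probability in equation (2.23) and of the daily differencing that yields d(t). The function names and the example numbers are assumptions made for illustration; the model itself relies on Stan's built-in distribution.

```python
import math


def neg_binomial_2_log_pmf(n, mu, phi):
    """Log of equation (2.23): mean mu and variance mu + mu**2 / phi."""
    # log of the generalised binomial coefficient C(n + phi - 1, n)
    log_binom = math.lgamma(n + phi) - math.lgamma(n + 1) - math.lgamma(phi)
    return (
        log_binom
        + n * (math.log(mu) - math.log(mu + phi))
        + phi * (math.log(phi) - math.log(mu + phi))
    )


def daily_deaths(cumulative_deaths):
    """d(t) = D(t) - D(t-1) for a sequence of daily values of the D state."""
    return [b - a for a, b in zip(cumulative_deaths, cumulative_deaths[1:])]


# Hypothetical values, for illustration only.
mu, phi = 10.0, 5.0
pmf = [math.exp(neg_binomial_2_log_pmf(n, mu, phi)) for n in range(2000)]
mean = sum(n * p for n, p in enumerate(pmf))
variance = sum((n - mean) ** 2 * p for n, p in enumerate(pmf))
print(sum(pmf), mean, variance)
print(daily_deaths([0.0, 2.0, 5.0, 9.0]))
```

Smaller values of ϕ inflate the variance above the Poisson value of μ, which is why ϕ controls overdispersion.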

(ii). Hospital admission data

The UK government also publishes a daily time series of the number of COVID-19 patients admitted to hospital on their official website for coronavirus data [12]. We assume that, like the number of deaths, the number of hospital admissions on day t, hobs(t), has a negative binomial distribution parameterized by h(t) and ϕadmissions:

$$h_{\mathrm{obs}}(t) \sim \mathrm{NegativeBinomial}\big(h(t),\, \phi_{\mathrm{admissions}}\big), \tag{2.24}$$

where

$$h(t) = \rho_{\mathrm{admissions}}(t) \times \frac{2}{d_I} I_2, \tag{2.25}$$

i.e. the mean number of hospital admissions on day t equals the ratio of hospital admissions to potential patients, ρadmissions(t), multiplied by the number of new members of the pending state. ρadmissions(t) is a continuous piecewise linear function of time

$$\rho_{\mathrm{admissions}}(t) = \sum_{k=1}^{K} \rho_{\mathrm{admissions},k}(t)\,\chi_{[t_{k-1},\,t_k)}(t), \tag{2.26}$$

where the ratio of hospital admissions to potential patients during the kth time interval, ρadmissions,k(t), is given by

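$$\rho_{\mathrm{admissions},k}(t) = \frac{\rho_{\mathrm{admissions},k+1} - \rho_{\mathrm{admissions},k}}{t_k - t_{k-1}}\,(t - t_{k-1}) + \rho_{\mathrm{admissions},k}, \tag{2.27}$$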

and the indicator function for the kth time interval, $\chi_{[t_{k-1},\,t_k)}(t)$, is defined in equation (2.21).

The parameters $\rho_{\mathrm{admissions},1}, \rho_{\mathrm{admissions},2}, \ldots, \rho_{\mathrm{admissions},K+1}$ in equation (2.27) are the values $\rho_{\mathrm{admissions}}(t)$ takes on a set of predefined dates $t_0, t_1, \ldots, t_K$. The first date is 24 March 2020, and each date that follows is 12 weeks after the last, with the second date being 16 June 2020.
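The knot dates and the mean in equation (2.25) can be sketched as follows. The helper names, the number of knots and the numerical inputs are assumptions for illustration; the interpolation of the ratio between knots has the same linear form as β(t) and is omitted here.

```python
from datetime import date, timedelta


def knot_dates(first, step_days, count):
    """Equally spaced knot dates starting from the given first date."""
    return [first + timedelta(days=step_days * k) for k in range(count)]


def mean_admissions(rho_t, d_i, i2):
    """h(t): the admission ratio times the flow out of I2 into the pending state."""
    return rho_t * 2.0 / d_i * i2


# Knots every 12 weeks from 24 March 2020; the count of 4 is hypothetical.
admission_knots = knot_dates(date(2020, 3, 24), 7 * 12, 4)
print([d.isoformat() for d in admission_knots])

# Hypothetical ratio, infectious period and I2 population.
print(mean_admissions(rho_t=0.1, d_i=5.0, i2=1000.0))
```

The second knot produced by this rule is 16 June 2020, in agreement with the date given above.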

(iii). Symptom report data

Every weekday up to the previous calendar day, NHS Digital publishes a daily time series of the number of assessments completed through the NHS 111 telephone service where callers reported potential coronavirus (COVID-19) symptoms [14]. Leclerc et al. found a strong correlation between the volume of these symptom reports and the number of COVID-19 deaths reported 16 days later [15]. We assume that, like the other types of surveillance data, the number of assessment calls to NHS 111 on day t where callers reported potential COVID-19 symptoms, cobs(t), has a negative binomial distribution parameterized by c(t) and ϕcalls

$$c_{\mathrm{obs}}(t) \sim \mathrm{NegativeBinomial}\big(c(t),\, \phi_{\mathrm{calls}}\big), \tag{2.28}$$

where

$$c(t) = \rho_{\mathrm{calls}}(t) \times \left(\frac{2}{d_L} E_2 + \frac{2}{d_I} I_2\right), \tag{2.29}$$

i.e. the mean number of assessment calls to NHS 111 on day t where callers reported potential COVID-19 symptoms equals the ratio of symptom reports to potential symptom reporters, ρcalls(t), multiplied by the sum of the number of new members of the infectious and pending states.

$\rho_{\mathrm{calls}}(t)$ is a continuous piecewise linear function of time almost identical to $\rho_{\mathrm{admissions}}(t)$, which is defined by equations (2.26) and (2.27). The only difference is that the parameters $\rho_{\mathrm{calls},1}, \rho_{\mathrm{calls},2}, \ldots, \rho_{\mathrm{calls},L+1}$ are associated with a different set of predefined dates $t_0, t_1, \ldots, t_L$. The first of these dates is 24 March 2020, and each date that follows is four weeks after the last, with the second date being 21 April 2020.

3. Scoring rules

Scoring rules produce real numbers, also called numerical scores, that summarize the quality of probabilistic forecasts. More concretely, consider a probabilistic forecast P of an uncertain future quantity X for which the observation x eventually materializes. In a scenario such as this, a scoring rule provides a numerical score s(P,x) that quantifies the statistical consistency between the predictive distribution P and the observation x. Table 1 shows the simple scoring rules that we use in this paper.

Table 1.

The set of simple scoring rules that feature in this paper. In the definitions, $p_x$ is the probability mass of the predictive distribution for an observed count $x$, $\|p\|^2 = \sum_{k=0}^{\infty} p_k^2$, $P_k$ is the value of the cumulative predictive distribution for a count $k$, $\mathbb{1}(\cdot)$ is the indicator function, and $\mu_P$ and $\sigma_P^2$ are the mean and variance of the predictive distribution.

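| scoring rule | definition | reference |
| --- | --- | --- |
| logarithmic score | $\mathrm{logs}(P,x) = -\log p_x$ | Good [16] |
| quadratic score | $\mathrm{qs}(P,x) = -2p_x + \|p\|^2$ | Wecker [17] |
| spherical score | $\mathrm{sphs}(P,x) = -\dfrac{p_x}{\|p\|}$ | Czado et al. [18] |
| ranked probability score | $\mathrm{rps}(P,x) = \sum_{k=0}^{\infty} \{P_k - \mathbb{1}(x \le k)\}^2$ | Epstein [19] |
| Dawid–Sebastiani score | $\mathrm{dss}(P,x) = \left(\dfrac{x-\mu_P}{\sigma_P}\right)^2 + 2\log\sigma_P$ | Gneiting & Raftery [20] |
| squared error score | $\mathrm{ses}(P,x) = (x-\mu_P)^2$ | Czado et al. [18] |
| normalized squared error score | $\mathrm{nses}(P,x) = \left(\dfrac{x-\mu_P}{\sigma_P}\right)^2$ | Carroll & Cressie [21] |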

The logarithmic, quadratic, spherical, ranked probability, Dawid–Sebastiani and squared error scores in table 1 are negatively oriented, with better forecasts typically resulting in lower scores. These scores are also said to be proper in the sense that a forecaster minimizes their expected value by quoting their true belief. Proper scoring rules are considered essential for incentivising honest forecasting, a position argued by Gneiting & Raftery [20].

By contrast, the normalized squared error score is improper, a quality that has resulted in it being discredited, for example, by Czado et al. [18], as a tool for evaluating probabilistic forecasts. We argue that viewing it through the lens of propriety leads to an underappreciation of its unique properties, particularly its ability to distinguish between over-confidence and over-caution. Indeed, we see the normalized squared error score as a valuable diagnostic tool with advantages over proper scoring rules in certain situations.
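A short worked expectation illustrates this diagnostic property. Assume, purely for illustration, that the quantity $X$ has true mean $\mu$ and true variance $\sigma^2$, symbols that are not part of the model notation.

```latex
\mathbb{E}\!\left[\mathrm{nses}(P, X)\right]
  = \frac{\mathbb{E}\!\left[(X - \mu_P)^2\right]}{\sigma_P^{2}}
  = \frac{\sigma^{2} + (\mu - \mu_P)^{2}}{\sigma_P^{2}}.
```

For an unbiased forecast ($\mu_P = \mu$) the expectation reduces to $\sigma^2/\sigma_P^2$. It equals one when the predictive variance matches the true variance, exceeds one when the predictive variance is too small (over-confidence) and falls below one when it is too large (over-caution).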

Interestingly, the normalized squared error score is popular in the tracking and data fusion community, which has studied performance measures for evaluating estimation algorithms, such as Li & Zhao’s absolute [22] and relative [23] error measures. Blasch et al. [24] and Chen et al. [25] describe a now-popular relative error measure called the normalized estimation error squared (NEES), which is a generalization of the normalized squared error score to multiple dimensions. Researchers in the community have used the NEES extensively as an easily understood approach to arguing the merits of, for example, different extensions of the Kalman filter to specific nonlinear settings where the extended Kalman filter is routinely over-confident [26].

Scores are typically reported as averages over multiple probabilistic forecasts, each for a distinct point in time. Following Czado et al. [18], we use uppercase to denote the mean score over several forecasts. The tables in this paper use the mean scores LogS, QS, SphS, RPS, DSS, SES and NSES.
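The scores in table 1 and their means can be sketched in Python for a count forecast. This is an illustrative sketch under stated assumptions: the predictive distribution is supplied as a list `pmf` whose kth entry is the probability of count k, the support is truncated at the length of that list, and the observation lies inside the support. The function names are not taken from the authors' code.

```python
import math


def scores(pmf, x):
    """Scores from table 1 for a predictive pmf over counts 0..len(pmf)-1."""
    if not 0 <= x < len(pmf):
        raise ValueError("observation must lie inside the truncated support")
    norm_sq = sum(p * p for p in pmf)
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    cdf, total = [], 0.0
    for p in pmf:
        total += p
        cdf.append(total)
    p_x = pmf[x]
    z_sq = (x - mean) ** 2 / var
    return {
        "logs": -math.log(p_x) if p_x > 0.0 else math.inf,
        "qs": -2.0 * p_x + norm_sq,
        "sphs": -p_x / math.sqrt(norm_sq),
        "rps": sum((P - (1.0 if x <= k else 0.0)) ** 2 for k, P in enumerate(cdf)),
        "dss": z_sq + math.log(var),  # log(var) equals 2 * log(sigma)
        "ses": (x - mean) ** 2,
        "nses": z_sq,
    }


def mean_scores(forecasts):
    """Average every score over a list of (pmf, observation) pairs."""
    rows = [scores(pmf, x) for pmf, x in forecasts]
    return {name: sum(row[name] for row in rows) / len(rows) for name in rows[0]}


# Toy three-point forecast, for illustration only.
example = scores([0.2, 0.5, 0.3], 1)
averaged = mean_scores([([0.2, 0.5, 0.3], 1), ([0.2, 0.5, 0.3], 2)])
print(example)
print(averaged)
```

In the toy example the predictive mean is 1.1 and the variance is 0.49, so observing a count of 1 gives a squared error of 0.01 and a normalized squared error of about 0.02.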

4. Computational experiments

(a) . Set-up

We implement the statistical model described in §2 in Stan [8]. Stan is a probabilistic programming language that allows users to articulate statistical models and calibrate them with data using a Markov chain Monte Carlo method called the No-U-Turn sampler (NUTS), proposed by Hoffman & Gelman [27]. Stan also provides diagnostic information, such as warnings about divergent transitions, to help users check it has sampled the posterior faithfully.

In addition to encoding the statistical model, the Stan implementation includes prior distributions for the model parameters. Table 2 gives details of the prior distributions that we have chosen to use in the computational experiments, along with the reasoning behind their selection. Four of the prior distributions, for the parameters dL, dI, dP and ω, are based on estimates from the published literature on COVID-19. We base the prior distributions for dL and dI on estimates of the time variables rather than estimates of their mean values, for which we could not identify reliable estimates. This decision results in looser prior distributions and, in turn, wider posterior distributions for these two parameters.

Table 2.

Prior distributions for the parameters of the statistical model with the rationale for their selection. We use the symbol + to indicate a distribution with its lower tail truncated at zero.

| parameter(s) | prior distribution | comment |
| --- | --- | --- |
| $\alpha_1$ | $\mathrm{beta}(5.0, 0.5)$ | this reflects our belief that most of the population is initially susceptible. |
| $\alpha_2$ | $\mathrm{beta}(1.1, 1.1)$ | this reflects our uninformed beliefs about the initial division of the infected population into those who are infectious and those who are not. |
| $\beta_1$ | $\mathrm{normal}^{+}(0.0, 0.5)$ | this is a generic, weakly informative prior inspired by the work of Gelman [28]. |
| $\beta_2, \ldots, \beta_J$ | $\mathrm{normal}^{+}(\beta_{i-1}, \sigma_\beta)$ | this is a random-walk prior with smoothness hyperparameter $\sigma_\beta$ that enforces correlation between contiguous $\beta_i$. |
| $d_L$ | $\mathrm{normal}^{+}(4.0, 3.0)$ | this is based on an estimate of the incubation period provided by Pellis et al. [29]. |
| $d_I$ | $\mathrm{normal}^{+}(5.0, 4.0)$ | this is based on an estimate of the delay from onset of symptoms to hospitalization provided by Pellis et al. [29]. |
| $d_P$ | $\mathrm{normal}^{+}(13.0, 4.0)$ | this is based on an estimate of the mean delay from hospitalization to death provided by Linton et al. [30]. |
| $\omega$ | $\mathrm{beta}(5.7, 624.1)$ | this is based on an estimate provided by Ward et al. [31]. |
| $1/\phi_{\mathrm{deaths}}$, $1/\phi_{\mathrm{admissions}}$, $1/\phi_{\mathrm{calls}}$ | $\mathrm{exponential}(5.0)$ | this is a containment prior for the overdispersion parameter, which is discussed by Simpson [32]. |
| $\rho_{\mathrm{admissions},k}$, $\rho_{\mathrm{calls},l}$ | $\mathrm{beta}(1.1, 1.1)$ | this reflects our uninformed beliefs about these ratio parameters. |

The software implementation, which is publicly available on GitHub (see Data accessibility), has two idiosyncrasies worthy of discussion. First, the current implementation does not use any of the integrators provided by Stan to solve the system of ODEs in equations (2.10) to (2.18). Instead, a bespoke implementation of the explicit trapezoidal method [33] solves the system of ODEs for the transmission model. Anecdotally, the trapezoidal integrator significantly reduces runtime while producing acceptable numerical errors. Second, the current implementation uses Stan's default initialization strategy, which draws values uniformly between −2 and 2 on the unconstrained parameter space, only for $1/\phi_{\mathrm{deaths}}$, $1/\phi_{\mathrm{admissions}}$, $1/\phi_{\mathrm{calls}}$, $\rho_{\mathrm{admissions},1}, \ldots, \rho_{\mathrm{admissions},K+1}$ and $\rho_{\mathrm{calls},1}, \ldots, \rho_{\mathrm{calls},L+1}$. For $\alpha_1$, $\alpha_2$, $\beta_1, \ldots, \beta_J$, $d_L$, $d_I$, $d_P$ and $\omega$, the implementation instead draws uniformly from custom intervals to prevent initialization failures caused by unrealistic parameter values.
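The explicit trapezoidal method, often called Heun's method, can be sketched generically as follows. This is not the bespoke Stan integrator; the function names, step size and test equation are assumptions chosen so that the result can be compared with a known exact solution.

```python
import math


def heun_step(f, t, y, h):
    """One explicit trapezoidal (Heun) step for the system y' = f(t, y)."""
    k1 = f(t, y)
    y_pred = [yi + h * ki for yi, ki in zip(y, k1)]  # Euler predictor
    k2 = f(t + h, y_pred)
    # Corrector: average the slopes at the start and at the predicted end.
    return [yi + 0.5 * h * (a + b) for yi, a, b in zip(y, k1, k2)]


def integrate(f, t0, y0, h, n_steps):
    """Apply n_steps Heun steps of size h and return every visited state."""
    t, y = t0, list(y0)
    path = [y]
    for _ in range(n_steps):
        y = heun_step(f, t, y, h)
        t += h
        path.append(y)
    return path


def decay(t, y):
    """Test problem y' = -y with exact solution exp(-t)."""
    return [-y[0]]


path = integrate(decay, 0.0, [1.0], 0.1, 10)
print(path[-1][0], math.exp(-1.0))
```

The method is second-order accurate, so halving the step size reduces the global error by roughly a factor of four, at the cost of two right-hand-side evaluations per step.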

We calibrate the model with data for England by using NUTS to draw six independent Markov chains for each of seven values of the smoothness hyperparameter σβ. Each chain draws 512 samples and discards the first 256 drawn during warmup. We calibrate over the period from 24 March 2020 to 31 December 2020 with the death, hospital admission, and symptom report data described in §2b. Each calibration job produces a posterior distribution for the statistical model’s parameters, shown in table 2. For each job, we generate two posterior predictive distributions for the daily number of deaths by simulating from the statistical model, first using the posterior samples for the parameters and second using only a point estimate for the parameters. We perceive that there are occasions where the use of point estimates can explain over-confident estimates. Specifically, we use the mean of the posterior samples as the point estimate for the parameters. The posterior predictive distributions span 17 February 2020 to 21 January 2021, with the last 21 elements being three-week forecasts for which we calculate the LogS, QS, SphS, RPS, DSS, SES and NSES.

(b) . Results

Mean scores for the forecasts that we generated with point estimates for the parameters and posterior samples for the parameters are presented in the top and bottom of table 3, respectively. The columns contain results for different values of the smoothness hyperparameter σβ defined in table 2. Smaller values of σβ make the random-walk prior on the effective contact rate β(t) tighter, causing it to vary more slowly and, if low enough, to underfit the data. Conversely, larger values of the smoothness hyperparameter loosen the random-walk prior and allow overfitting of the data if σβ is high enough.
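The effect of the smoothness hyperparameter can be seen by drawing contact-rate paths from the random-walk prior alone. The sketch below samples from the prior, not the posterior. The starting value, the number of knots, the seed and the function names are assumptions made for illustration.

```python
import random


def sample_contact_rates(beta_1, sigma_beta, n_knots, rng):
    """Draw a path from a random walk with normal steps truncated at zero."""
    betas = [beta_1]
    while len(betas) < n_knots:
        proposal = rng.gauss(betas[-1], sigma_beta)
        if proposal > 0.0:  # rejection implements the lower truncation at zero
            betas.append(proposal)
    return betas


def roughness(path):
    """Mean absolute change between contiguous contact rates."""
    return sum(abs(b - a) for a, b in zip(path, path[1:])) / (len(path) - 1)


# Hypothetical starting value and number of knots; the two sigma values are
# the smallest and largest considered in the experiments.
tight = sample_contact_rates(0.3, 0.0005, 46, random.Random(1))
loose = sample_contact_rates(0.3, 0.05, 46, random.Random(1))
print(roughness(tight), roughness(loose))
```

With the smallest value of σβ the path barely moves from its starting point, whereas the largest value permits week-to-week changes that are about two orders of magnitude bigger.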

Table 3.

Mean scores for the three-week forecasts generated with different σβ values. Top: Simulating from the statistical model with a point estimate for the parameters. Bottom: Simulating from the statistical model with posterior samples for the parameters. The best value for each mean score is in bold.

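| parameters | scoring rule | σβ = 0.0005 | σβ = 0.001 | σβ = 0.0025 | σβ = 0.005 | σβ = 0.01 | σβ = 0.025 | σβ = 0.05 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| point estimate | LogS | 9.595 | 7.403 | 9.212 | 9.711 | 9.752 | 9.260 | 10.000 |
| point estimate | QS | 0.002 | 0.001 | 0.002 | 0.004 | 0.005 | 0.006 | 0.990 |
| point estimate | SphS | −0.001 | −0.027 | −0.005 | −0.001 | 0.000 | −0.004 | 0.000 |
| point estimate | RPS | 693.406 | 176.730 | 349.403 | 202.763 | 201.986 | 124.211 | 0.000 |
| point estimate | DSS | 27.478 | 12.367 | 29.167 | 28.042 | 28.184 | 24.997 | 209338988 |
| point estimate | SES | 665563 | 81671 | 268904 | 89048 | 89110 | 33435 | 1055387 |
| point estimate | NSES | 16.908 | 1.689 | 19.666 | 19.743 | 19.895 | 17.410 | 209338994 |
| posterior samples | LogS | 9.624 | 7.313 | 9.195 | 6.988 | 6.949 | 6.076 | 6.200 |
| posterior samples | QS | 0.002 | 0.000 | 0.002 | 0.000 | 0.000 | −0.003 | −0.004 |
| posterior samples | SphS | −0.001 | −0.024 | −0.005 | −0.025 | −0.025 | −0.052 | −0.058 |
| posterior samples | RPS | 696.828 | 181.678 | 393.543 | 123.172 | 122.706 | 50.085 | 74.223 |
| posterior samples | DSS | 27.452 | 12.356 | 25.139 | 11.797 | 11.776 | 9.992 | 10.440 |
| posterior samples | SES | 669959 | 85439 | 261226 | 37660 | 37607 | 4690 | 44607 |
| posterior samples | NSES | 16.880 | 1.581 | 15.464 | 2.399 | 2.372 | 0.340 | 0.276 |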

Generally, the mean scores in table 3 are lower, or closer to one in the case of NSES, for the forecasts generated with posterior samples for the parameters than for those generated with point estimates. This observation highlights the importance of allowing uncertainty propagation from statistical inference to forecasting. The differences between the values produced by the two forecasting methods are small for the lowest smoothness hyperparameter value of 0.0005 but increase with σβ until they become large and pronounced.
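One way to see why uncertainty propagation matters is the law of total variance, stated here as a general identity rather than a result of the experiments. The symbols $\theta$ for the model parameters, $y$ for the calibration data and $\hat{\theta}$ for the point estimate are introduced for illustration.

```latex
\mathrm{Var}[X \mid y]
  = \mathbb{E}_{\theta \mid y}\!\left[\mathrm{Var}(X \mid \theta)\right]
  + \mathrm{Var}_{\theta \mid y}\!\left(\mathbb{E}[X \mid \theta]\right).
```

A forecast that simulates from a single point estimate has predictive variance $\mathrm{Var}(X \mid \hat{\theta})$ and so has no counterpart to the second term, which captures the spread of the predictive mean across the posterior. When that term is substantial, the plug-in forecast can be expected to be narrower than the forecast generated with posterior samples.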

We have emboldened the best value for each of the mean scores, corresponding to the lowest value for the mean proper scores, LogS, QS, SphS, RPS, DSS and SES, and the value closest to and less than one for NSES. We select the best NSES value in this way because values of less than one correspond to over-cautious forecasts, which are arguably less damaging in terms of their impact on decision making than over-confident forecasts, which have NSES values of greater than one. The majority of the best mean scores are for the posterior samples for the parameters and a σβ value of 0.025. We can see that this forecast, shown in figure 2a, is over-cautious, which the NSES value correctly summarizes. We can also see the forecast for the point estimate for the parameters in figure 2a. This forecast is biased because the mean resides outside the posterior distribution’s region of high probability mass, as it does for the Hybrid Rosenbrock distribution [34]. Although most of the best mean scores are for a σβ value of 0.025, the best values for the QS and SphS are for a smoothness hyperparameter value of 0.05. The differences, however, between the mean scores, 0.001 for QS and 0.006 for SphS, are relatively small and indicate little difference between the quality of the two forecasts when assessed with QS and SphS.
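The selection rule for NSES can be written as a short Python sketch using the posterior-sample NSES values reported in table 3. The function names and the label returned for a value of exactly one are assumptions made for illustration.

```python
sigma_beta = [0.0005, 0.001, 0.0025, 0.005, 0.01, 0.025, 0.05]
nses_posterior = [16.880, 1.581, 15.464, 2.399, 2.372, 0.340, 0.276]


def classify(nses):
    """Interpret a mean normalized squared error score."""
    if nses > 1.0:
        return "over-confident"
    if nses < 1.0:
        return "over-cautious"
    return "consistent"


def best_nses(values):
    """Index of the value closest to, and less than, one (None if there is none)."""
    below_one = [(1.0 - v, i) for i, v in enumerate(values) if v < 1.0]
    return min(below_one)[1] if below_one else None


best = best_nses(nses_posterior)
print(sigma_beta[best], nses_posterior[best], classify(nses_posterior[best]))
for s, v in zip(sigma_beta, nses_posterior):
    print(s, v, classify(v))
```

Restricting the search to values below one encodes the preference for over-cautious forecasts over over-confident ones, so a value such as 1.581 is not selected even though it is closer to one than 0.276.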

Figure 2.

Forecasts generated with point estimates and posterior samples for smoothness hyperparameter, σβ, values of (a) 0.025 and (b) 0.005. (Online version in colour.)

The forecast generated with the posterior samples for the parameters and a σβ value of 0.005 has an NSES value of 2.399, which indicates that it is over-confident. We can see in figure 2b that the forecast fails to predict most of the future observations and is indeed over-confident. The other scoring rules do not provide any information about the over-confidence of this forecast. We, therefore, believe that NSES’s ability to distinguish between over-confidence and over-caution, given a single forecast, makes it a valuable diagnostic tool that should be used alongside proper scoring rules.

5. Conclusion

We have shown how to use simple scoring rules to develop a statistical model and improve its forecasting performance. The computational experiments presented in §4 demonstrate that the statistical model introduced in §2 provides the best forecasts when we use posterior samples for the parameters and a smoothness hyperparameter σβ value of 0.025. We, therefore, advocate simple scoring rules for evaluating epidemiological forecasts and NSES specifically to establish if they are over-confident or over-cautious.

One of the significant limitations of simple scoring rules is that we can only use them to assess forecasts of observable variables. Epidemiological modellers cannot apply them to important latent quantities, such as the effective reproduction number Rt and the growth rate r, which they often forecast. Accordingly, the epidemiological community needs a method for assessing forecasts of quantities for which the truth is unknown. Simulation-based calibration (SBC) [35] is a candidate method for this task, worth investigating further.

There are four worthwhile directions in which we can extend the statistical model presented in §2. The first direction involves adding components to the observation model described in §2b to allow calibration with a greater quantity and diversity of surveillance data. The second direction involves modifying it to accommodate surveillance data from other countries. The third direction entails making the disease-specific, geography-independent parameters dL, dI, dP and ω global to facilitate information sharing between regions, as is done by Birrell et al. [1]. The fourth and final direction involves removing the assumption that recovered individuals are indefinitely immune to reinfection, allowing reinfection, which is more realistic.

Acknowledgements

We thank Public Health England (PHE), the Joint Biosecurity Centre (JBC) and the UK Health Security Agency (UKHSA) for their support. We thank Breck Baldwin and Jose Storopoli for their help in advancing CoDatMo. We thank Veronica Bowman, Alexander Phillips and John Harris for their suggestions and Matthew Carter for configuring the HPC environment. We also thank the anonymous reviewers for their insightful comments, which helped improve the paper significantly.

Footnotes

Data accessibility

The code and data for the computational experiments are publicly available on GitHub (https://github.com/codatmo/UniversityOfLiverpool_PaperSubmission).

Authors' contributions

R.E.M.: conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing; C.R.: data curation, investigation, software, validation, visualization, writing—review and editing; S.M.: conceptualization, funding acquisition, methodology, project administration, supervision, writing—review and editing.

All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

Conflict of interest declaration

We declare that we have no competing interests.

Funding

This work was supported by an ICASE Research Studentship jointly funded by EPSRC and AWE (EP/R512011/1); a Research Studentship jointly funded by EPSRC and the ESRC Centre for Doctoral Training on Quantification and Management of Risk and Uncertainty in Complex Systems Environments grant no. (EP/L015927/1); and EPSRC through the Big Hypotheses grant no. (EP/R018537/1).

References

  • 1.Birrell P, Blake J, van Leeuwen E, Gent N, De Angelis D. 2021. Real-time nowcasting and forecasting of COVID-19 dynamics in England: the first wave. Phil. Trans. R. Soc. B 376, 20200279. ( 10.1098/rstb.2020.0279) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Keeling MJ, Dyson L, Guyver-Fletcher G, Holmes A, Semple MG, Hill EM. 2020. Fitting to the UK COVID-19 outbreak, short-term forecasts and estimating the reproductive number. medRxiv.
  • 3.Abbott S et al. 2020. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts [version 2; peer review: 1 approved with reservations]. Wellcome Open Res. 5, 112. ( 10.12688/wellcomeopenres.16006.1) [DOI] [Google Scholar]
  • 4.Cramer EY, Lopez VK, Niemi J, George GE, Cegan JC, Dettwiller ID, England WP, Farthing MW, Hunter RH, Lafferty B, Linkov I. 2021. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the US. medRxiv.
  • 5.Funk S et al. 2020. Short-term forecasts to inform the response to the COVID-19 epidemic in the UK. medRxiv.
  • 6.Maishman T, Schaap S, Silk DS, Nevitt SJ, Woods DC, Bowman VE. 2021. Statistical methods used to combine the effective reproduction number, R(t), and other related measures of COVID-19 in the UK. (http://arxiv.org/abs/2103.01742)
  • 7.CoDatMo. 2021. Welcome to the CoDatMo site. See https://codatmo.github.io (accessed 9 September 2021).
  • 8.Carpenter B et al. 2017. Stan: a probabilistic programming language. J. Stat. Softw. 76, 1-32. (doi:10.18637/jss.v076.i01)
  • 9.Storopoli J, Santos ALMF, Pellini ACG, Baldwin B. 2021. Simulation-driven COVID-19 epidemiological modeling with social media. (http://arxiv.org/abs/2106.11686)
  • 10.UK Government. 2021. Reproduction number (R) and growth rate: methodology. See www.gov.uk/government/publications/reproduction-number-r-and-growth-rate-methodology/reproduction-number-r-and-growth-rate-methodology (accessed 21 September 2021).
  • 11.Kermack WO, McKendrick AG. 1927. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. A 115, 700-721.
  • 12.UK Government. 2021. Coronavirus (COVID-19) in the UK. See https://coronavirus.data.gov.uk (accessed 13 September 2021).
  • 13.Stan Development Team. 2021. Negative binomial distribution (alternative parameterization). See https://mc-stan.org/docs/2_27/functions-reference/nbalt.html (accessed 13 September 2021).
  • 14.NHS Digital. 2021. Potential Coronavirus (COVID-19) symptoms reported through NHS Pathways and 111 online. See https://digital.nhs.uk/data-and-information/publications/statistical/mi-potential-covid-19-symptoms-reported-through-nhs-pathways-and-111-online/latest (accessed 14 September 2021).
  • 15.Leclerc QJ, Nightingale ES, Abbott S, Jombart T. 2021. Analysis of temporal trends in potential COVID-19 cases reported through NHS pathways England. Sci. Rep. 11, 7106. (doi:10.1038/s41598-021-86266-3)
  • 16.Good IJ. 1952. Rational decisions. J. R. Stat. Soc. B, Methodol. 14, 107-114.
  • 17.Wecker WE. 1989. Assessing the accuracy of time series model forecasts of count observations. J. Bus. Econ. Stat. 7, 418-419.
  • 18.Czado C, Gneiting T, Held L. 2009. Predictive model assessment for count data. Biometrics 65, 1254-1261. (doi:10.1111/j.1541-0420.2009.01191.x)
  • 19.Epstein ES. 1969. A scoring system for probability forecasts of ranked categories. J. Appl. Meteorol. 8, 985-987.
  • 20.Gneiting T, Raftery AE. 2007. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 102, 359-378. (doi:10.1198/016214506000001437)
  • 21.Carroll SS, Cressie N. 1997. Spatial modeling of snow water equivalent using covariances estimated from spatial and geomorphic attributes. J. Hydrol. 190, 42-59. (doi:10.1016/S0022-1694(96)03062-4)
  • 22.Li XR, Zhao Z. 2001. Measures of performance for evaluation of estimators and filters. In Proc. SPIE 4473, Signal and Data Processing of Small Targets, 26 November 2001. (doi:10.1117/12.492751)
  • 23.Li XR, Zhao Z. 2005. Relative error measures for evaluation of estimation algorithms. In 7th Int. Conf. on Information Fusion, 25–28 July 2005, p. 8. (doi:10.1109/ICIF.2005.1591857)
  • 24.Blasch EP, Rice A, Yang C. 2006. Nonlinear tracking evaluation using absolute and relative metrics. In Proc. SPIE 6236, Signal and Data Processing of Small Targets, 19 May 2006, 62360L. (doi:10.1117/12.666463)
  • 25.Chen Z, Heckman C, Julier S, Ahmed N. 2018. Weak in the NEES?: auto-tuning Kalman filters with Bayesian optimization. In 21st Int. Conf. on Information Fusion (FUSION), 10–13 July 2018, pp. 1072–1079. (doi:10.23919/ICIF.2018.8454982)
  • 26.Longbin M, Xiaoquan S, Yiyu Z, Kang SZ, Bar-Shalom Y. 1998. Unbiased converted measurements for tracking. IEEE Trans. Aerosp. Electron. Syst. 34, 1023-1027. (doi:10.1109/7.705921)
  • 27.Hoffman MD, Gelman A. 2014. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1593-1623.
  • 28.Gelman A. 2020. Prior Choice Recommendations. See https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations (accessed 16 September 2021).
  • 29.Pellis L et al. 2021. Challenges in control of COVID-19: short doubling time and long delay to effect of interventions. Phil. Trans. R. Soc. B 376, 20200264. (doi:10.1098/rstb.2020.0264)
  • 30.Linton NM, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov AR, Jung S-m, Yuan B, Kinoshita R, Nishiura H. 2020. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data. J. Clin. Med. 9, 538. (doi:10.3390/jcm9020538)
  • 31.Ward H et al. 2021. SARS-CoV-2 antibody prevalence in England following the first peak of the pandemic. Nat. Commun. 12, 1-8. (doi:10.1038/s41467-020-20314-w)
  • 32.Simpson D. 2018. Justify my love. See https://statmodeling.stat.columbia.edu/2018/04/03/justify-my-love/ (accessed 16 September 2021).
  • 33.Ascher UM, Petzold LR. 1998. Computer methods for ordinary differential equations and differential-algebraic equations. Philadelphia, PA: SIAM.
  • 34.Pagani F, Wiegand M, Nadarajah S. 2021. An n-dimensional Rosenbrock distribution for Markov chain Monte Carlo testing. Scand. J. Stat. 49, 657-680. (doi:10.1111/sjos.12532)
  • 35.Talts S, Betancourt M, Simpson D, Vehtari A, Gelman A. 2020. Validating Bayesian inference algorithms with simulation-based calibration. (http://arxiv.org/abs/1804.06788)

Articles from Philosophical transactions. Series A, Mathematical, physical, and engineering sciences are provided here courtesy of The Royal Society