Highlights
-
•
A Bayesian semi-mechanistic model was applied to the Ebola Forecasting Challenge.
-
•
Model fits to simulated data were obtained from particle Markov-chain Monte Carlo.
-
•
Posterior samples of model parameters were used to generate forecast trajectories.
-
•
The forecasts were assessed using subsequently released simulation points.
Keywords: Forecasting, Real-time modelling, Infectious disease dynamics, Outbreak
Abstract
Real-time forecasts of infectious diseases can help public health planning, especially during outbreaks. If forecasts are generated from mechanistic models, they can be further used to target resources or to compare the impact of possible interventions. However, paremeterising such models is often difficult in real time, when information on behavioural changes, interventions and routes of transmission are not readily available. Here, we present a semi-mechanistic model of infectious disease dynamics that was used in real time during the 2013–2016 West African Ebola epidemic, and show fits to a Ebola Forecasting Challenge conducted in late 2015 with simulated data mimicking the true epidemic. We assess the performance of the model in different situations and identify strengths and shortcomings of our approach. Models such as the one presented here which combine the power of mechanistic models with the flexibility to include uncertainty about the precise outbreak dynamics may be an important tool in combating future outbreaks.
1. Introduction
Forecasting the incidence of infectious diseases is an important part of public health and intervention planning. This was especially true during the 2013–2016 West African Ebola epidemic, when the rapid expansion of the outbreak triggered an enormous national and international public health response in late summer 2014. From November 2014, the Centre for the Mathematical Modelling of Infectious Diseases (CMMID) at the London School of Hygiene & Tropical Medicine produced weekly situation reports presenting updates of publicly available epidemiological data, model fits and forecasts, and estimates of key epidemiological parameters. These reports were distributed to a wide range of public health planners, policy makers, field workers and academics in several countries by email, and were made publicly available on a dedicated web site (Center for the Mathematical Modelling of Infectious Diseases, 2015).
The forecasts in these situation reports were produced using a stochastic semi-mechanistic model of Ebola transmission (Camacho et al., 2015b). The model was mechanistic in the sense that it relied on a compartmental description of the epidemiological status of the population based on known aspects of Ebola infection such as the incubation rate or infectious period. To model transmission between individuals, however, we used a more phenomenological, stochastic approach. During an emergency such as the Ebola epidemic, it is difficult to determine the precise factors underlying disease transmission that are required to inform a fully mechanistic model. Information about the relative importance and intensity of transmission in the community, hospital or at funerals (Faye et al., 2015), about the exact extent of control measures and their impact (WHO Ebola Response Team, 2015), about behavioural changes in the community (Funk et al., 2014) as well as about the potential role of seasonality (Groseth et al., 2007) or genetic changes in the virus (Carroll et al., 2015) were not available in real-time. To capture the overall change in transmission arising from these different mechanisms, we modelled transmission between individuals using a time-varying stochastic rate.
Capturing the uncertainty in transmission in a stochastic term gives the model the flexibility to match the data in the presence of noise and uncertainty. In addition, the inferred trajectories of the transmission rate can directly be interpreted as change in the reproduction number and thus provide valuable information for decision makers, for example by indicating how far the outbreak is from being under control.
Here, we present model fits and forecasts generated as part of the Ebola Forecasting Challenge conducted in 2015 after the true epidemic had waned. The challenge was based on four scenarios of synthetic data inspired from the outbreak in Liberia outbreak, with increasing levels of noise and uncertainty (see Vespignani et al., in this issue). We used a similar model to the one used during the Ebola epidemic fitted to the simulated outbreak trajectories generated as part of the Challenge to produce forecasts of the number cases at upcoming time points. We particularly focus on methodological issues and forecasting performance. We report results on county-level data of scenario 1, and assess forecasts at time points 1, 2, 4 and 5 (time points 3 was not considered for logistical reasons). Results for the other scenarios are shown in the supplementary material.
2. Methods
Our semi-mechanistic model of Ebola dynamics is a modified Susceptible-Exposed-Infectious-Recovered (SEIR) model, accounting for delays in notification and time-varying transmission (Fig. 1). At time t, the force of infection experienced by susceptible individuals is λt = βtIt/N, where It is the overall number of infectious individuals, N the population size and βt the time-varying transmission rate which follows a random walk (Dureau et al., 2013):
(1) |
Here, Wt denotes a Wiener process (Durrett, 1984), σ the volatility of the transmission rate and the log-transform ensures positivity of βt. Modelling the time-varying transmission rate as a random walk means it is auto-correlated: the transmission rate on any day is most likely to be the same as on the previous one.
Fig. 1.
Flow between compartments of the transmission model.
Upon infection at rate λt, susceptible individuals (S) move from being exposed (E) to being infectious at a rate given by the reciprocal of the incubation period (1/Dinc). We used two exposed sub-compartments in sequence to obtain an Erlang-distribution of the incubation period with shape k = 2 (Lloyd, 2001), and split the estimated initial number of exposed individuals evenly between these two sub-compartments. To account for delays in reporting of new cases, the infectious compartment was split into two compartments representing infectious but not yet reported (C) and infectious and potentially reported (Q) cases. The transition between C and Q occurs at a randomly varying rate with a mean equal to the reciprocal of the average reporting delay (1/Drep) and 10% overdispersion following a Gamma distribution. This stochastic variation is introduced to capture non-independence in the time until cases get reported, in order to capture situations where, for example, several members of the same family might be reported simultaneously (Camacho et al., 2015b). Lastly, infectious individuals are removed (R) when they recover or die from the Q compartment at a rate equal to the reciprocal of the difference between the infectious period and the reporting delay (Dout = Dinf − Drep). The model can be formulated as a set of stochastic differential equations which was simulated with the noise term fixed for a time step of 1 day and a Runge–Kutta method solving the remaining ordinary differential components (Milstein and Tretyakov, 2004). The only stochastic components are the trajectory of the transmission rate and the reporting noise, in contrast to the model used for the situation reports during the true Ebola epidemic, which also included demographic noise.
The observation process was modelled to operate on the weekly incidence (Zt), given by the number of infectious individuals entering the Q compartment. The observed incidence () was assumed to follow a normal approximation (chosen for computational efficiency) to the negative binomial distribution with reporting probability p and overdispersion ϕ:
(2) |
where standard deviations smaller than 1 were rounded up to 1 to avoid the singularity at Zt = 0. Note that stochastic variation here captures variability in the probability that cases get reported, whereas the stochastic variation acting on the transition from C to Q captures variability in the delay until cases can get reported.
The model thus has 8 parameters, which we either estimated from the line list of cases, took from a study on a pre-2014 outbreak of Ebola (Camacho et al., 2014), or estimated from model fits process (Table 1). Prior ranges of the transmission rate volatility and reporting overdispersion were established in preliminary runs and chosen to be able to sufficiently capture sudden changes in cases without allowing a degree of variation that would render the algorithm unstable.
Table 1.
Parameters used in the model and their values/ranges.
Parameter | Value or prior range | Description | Source |
---|---|---|---|
Dinc | variable | Mean delay from infection to symptoms | line list (where available) |
6 days | (Camacho et al., 2014) | ||
Drep | 1 week | Mean delay from symptom onset to notification | assumption |
Dinf | variable | Mean delay from symptom onset to outcome | line list (where available) |
7.8 days | (Camacho et al., 2014) | ||
p | 0.7 | Proportion of cases reported | (Camacho et al., 2014) |
σ | Volatility of the transmission rate | Fitted | |
ϕ | Overdispersion in reporting | Fitted | |
E★ | Initial number of exposed individuals | Fitted | |
Initial reproduction number | Fitted |
We used a Metropolis-Hastings particle Markov chain Monte-Carlo (pMCMC) algorithm to sample from the joint posterior distributions of the estimated parameters and states of the model (i.e. the trajectories). In brief, at each MCMC step, a particle filter is used to estimate the likelihood of the proposed parameter set, and to generate a sampled trajectory of the states of the model and the transmission rate βt from their marginal posterior distribution (Andrieu et al., 2010).
Our forecasts were generated under a “no change” hypothesis: we assumed that the transmission rate would remain constant after the last observed data point. More precisely, we sampled 10,000 parameter sets from the posterior distribution in combination with the associated states and estimated values of βt at the last observed data point, and simulated the model forward one year. The future number of reported cases was generated by applying the observation process to the forecast incidence. Predicted reported peak incidence, death counts and final sizes were calculated from the sampled forecast observation trajectories. County-level forecasts were obtained under the assumption that no transmission occurred between counties.
Model fits were generated using a fully automated algorithm applied to all the regional and national data sets as follows, implemented to facilitate convergence of the computationally intensive pMCMC sampler and to avoid long burn-in and low effective sample sizes: First, Metropolis-Hastings MCMC was run on the deterministic equivalent of the transmission model (with constant transmission rate and no multiplicative noise) to determine a reasonable starting location in parameter space. Then, the covariance matrix of the multivariate normal proposal distribution of the MCMC was adapted using the empirical variance in each parameter within accepted proposals, iterating and adapting an overall scale factor until an acceptance rate between 0.1 and 0.5 was achieved. Using the last parameter sample from this procedure in the fully stochastic model, the number of particles in the particle filter was calibrated numerically as the minimum number of particles required to obtain a variance of the likelihood estimator below 1. Starting from the covariance matrix of the proposal distribution obtained by fitting the deterministic model, the fully stochastic model was fitted using pMCMC, with the calibrated number of particles, and the proposal distribution was then again adapted using the same procedure as in the first step. This adapted proposal was used to run a final pMCMC for 10,000 iterations. The procedure was implemented using libbi (Murray, 2013) and the R packages rbi (Jacob and Funk, 2016) and rbi.helpers (Funk, 2016). A typical model fit took 5–20 min on an 8-core 2.5 GHz node (with 8 parallel threads). All the code used is freely available at http://github.com/sbfnk/ebola_forecasting_challenge.
3. Results
Our model was able to reproduce the observed trajectories both in scenarios of exponential increase followed by rapid decline as well as more sustained unchanged transmission (see Fig. 2 for fits to an example time point of an example scenario, others are shown in the supplementary material). Greater variability in individual trajectories was captured by greater volatility in the transmission rate (σ) as well as overdispersion in the observation process (ϕ) (Fig. 3). The two corresponding noise terms enter the model in different ways, and the estimated variability in each depends on the trajectory in question. In the model, transmission noise is correlated in time because the transmission rate is assumed to follow a random walk. In other words, the transmission rate at any time varies stochastically around the transmission rate at the previous time. Observation noise, on the other hand, is uncorrelated, and the proportion of cases reported at any time is independent of the proportion of cases reported at a previous time.
Fig. 2.
Fitted (black) and predicted (blue) incidence at time point 4 (week 35) of scenario 1. Median lines and 50% (dark) and 95% (light) credible interval ranges are shown, calculated across all trajectories at every time point. Fitted data points are shown in red, future data points (not included in the fits) in black. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)
Fig. 3.
Parameter estimates of the transmission rate volatility σ (top, blue) and reporting overdispersion ϕ (bottom, red) at time point 4 (week 35) of scenario 1. Shown are the median (vertical bar), interquartile range (box), the most extreme values within 1.5 times the interquartile range (outer lines) and outliers (dots). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)
Trajectories of the regional reproduction numbers ( in the absence of significant depletion of susceptibles) showed a decline over the course of the epidemic in most counties, with occasional peaks and sometimes longer periods of a reproduction number close to 1, indicating the potential for sustained transmission (Fig. 4). There was no obvious pattern in the R0 trajectories, indicating that any attempt at finding a mechanistic basis for the time-varying behaviour of the transmission rate would have had to take into account differences between regions.
Fig. 4.
Fitted (black) and predicted (blue) trajectories of the transmission rate at time point 4 (week 35) of scenario 1, shown here rescaled with the infectious period to correspond to the reproduction number . Median lines and 50% (dark) and 95% (light) credible interval ranges are shown. The horizontal dashed lines are at . (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)
Retrospectively comparing the credible intervals of our district-level forecasts made at 4 with data provided later showed a decline in reliability as the forecast horizon increases (Fig. 5). The number of data points inside the 50% credible interval was slightly but consistently underestimated, with 50% and 60% of data points inside the interval, while errors were consistent with a level of 50%. This bias became stronger when we increased the forecasting horizon, with more than 70% of data points inside the 50% credible interval at 10 weeks forecast, and errors no longer consistent with a level of 50%. Further, the number of data points inside the 95% credible interval was slightly but consistently underestimated, while consistent with a level of 95%. At this level, no obvious change in performance could be observed as the forecasting horizon was expanded to 10 weeks. Lastly, our predictions were consistent with 50% of data greater than the predicted median. There was, however, a declining trend in the proportion of observations greater than the median, hinting at an overestimation of cases as the forecasting horizon increases.
Fig. 5.
Forecasting performance, shown as proportion of data points (across regions) within the 50% credible interval (left), 95% credible interval (centre) and above the median (right) as a function of the distance in weeks predicted ahead. Shown are the mean and Bayesian 95% confidence interval using a conjugate beta prior (Gelman et al., 2013) across time points and counties of scenario 1. Trend lines for the means were obtained using locally weighted smoothing.
4. Discussion
We used a stochastic semi-mechanistic model of Ebola transmission to fit the simulated trajectories in the Ebola Forecasting Challenge, and to produce forecasts that were compared to following data points. Our model is flexible enough to accommodate changes in the transmission rate that might occur due to unobserved processes. This is particularly useful in an outbreak situation where a combination of behavioural changes, potential biological changes such as viral evolution, and interventions by a variety of national and international organisations make it practically impossible to identify the key drivers of the outbreak dynamics required to inform a fully parameterised mechanistic model.
While the predictions of the model were good when nothing changes, there were divergences when predictions were made just before the epidemic peaks and the transmission rate declines (Fig. 2, Grand Gedeh), where there was resurgence or successive waves (Fig. 2, Gbarpolu), or where the epidemic was highly stochastic due to the small number of reported cases (Fig. 2, Sinoe). In the latter case, including demographic stochasticity (Keeling and Rohani, 2008) may have improved the reliability of the forecasts.
More generally, the quality of the forecasts could be seen to decrease as the forecasting horizon was expanded, with the number of data points inside the 50% credible interval increasing to more than 70%, and fewer than 40% of data points greater than the median. This would indicate that the model tended to overestimate the expected number of cases. A possible explanation for this is that all the data provided to us followed a scenario of an overall decreasing reproduction number, while our forecasts were derived assuming an assumption of no change in the transmission rate.
While the simulated trajectories presented in the Ebola Forecasting Challenge resembled the true trajectories of the West African Ebola epidemic, our model allowed for a broad range of possible scenarios including widespread epidemics where there was no clear evidence of a sustained decline. Some of our predictions may therefore appear to vastly overestimate future case numbers, but without any further information on the context of the epidemics, these scenarios could not be deemed unrealistic. Moreover, our model did not contain any mechanism for predicting increases or decreases before they occurred. In a real situation, other information could have been sourced to exclude some scenarios, or to find indicators of an impeding increase or decline. With respect to the Ebola Forecasting Challenge, using the provided situation reports may have helped yielded such information.
We chose not to change the model between forecasting time points, to provide a fair assessment of the performance of our model. However, a number of changes could have been made to improve the particular predictions requested. Our assumption of no change from the last data point meant that the forecasts were highly sensitive to fluctuations in the transmission rate at the prediction point. At that point, the model was relatively poorly informed by the data. The data at time T only inform the trajectory of the transmission rate up to time T − Drep − Dinc, that is the latest data point minus the delays because of reporting and incubation periods. After that, the transmission rate mostly follows an random walk on the logarithmic scale, which creates the potential for large variations in the transmission rate that are then propagated into the prediction. A possible improvement would have been be to average over the transmission rate over the last few time points. For the weekly forecasts we generated during the true West African Ebola epidemic, we presented both forecasts based on an average of transmission rates over the last three weeks and forecasts based on the last value.
Replacing our stochastic model for the transmission rate with a finite-variance random walk such as an Ornstein-Uhlenbeck process (Doob, 1942) would have allow us to propagate the estimated volatility instead of keeping the transmission rate fixed. Moreover, our parameter estimates including the volatility of transmission rate were informed by the entire fitted trajectories including early stages of the epidemic which may not have been relevant to later times, especially the declining phase. An approach where only the last few data points are used in the fits that inform the forecasts may have yielded more reliable forecasting results.
Because of its relatively simple mechanistic skeleton, our model is more suitable for short-term predictions of incidence than long-term predictions of final size or peak size and timing. The stochastic nature of the transmission rate allows the model to flexibly fit any trajectory and could mask underlying mis-specifications in the deterministic core. Our approach does, however, come with the important advantage of being able to test the impact of interventions or to inform vaccine trials, as has been successfully done during the West African Ebola epidemic (Camacho et al., 2015a). With this key capability, mechanistic models such as the one presented here, combined with improvements in forecasting methodology, can be expected to play a key role in informing the response to future outbreaks.
Acknowledgements
This work was supported by the Research for Health in Humanitarian Crises (R2HC) Programme, managed by Research for Humanitarian Assistance (Grant 13165). SF, AJK and AC were supported by fellowships from the UK Medical Research Council (SF: MR/K021680/1, AJK: MR/K021524/1, AC: MR/J01432X/1). RME was supported by the Innovative Medicines Initiative 2 (IMI2) Joint Undertaking under grant agreement EBOVAC1 (Grant 115854). The IMI2 is supported by the European Union Horizon 2020 Research and Innovation Programme and the European Federation of Pharmaceutical Industries and Associations.
Footnotes
Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.epidem.2016.11.003.
Appendix A. Supplementary data
The following are the supplementary data to this article:
Fits and predictions for all scenarios and time points. See the caption of Fig. 2 for details.
References
- Andrieu C., Doucet A., Holenstein R. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B. 2010;72(pt. 3):269–342. [Google Scholar]
- Camacho A., Kucharski A., Funk S., Breman J., Piot P., Edmunds W. Potential for large outbreaks of Ebola virus disease. Epidemics. 2014;9:70–78. doi: 10.1016/j.epidem.2014.09.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Camacho A., Eggo R.M., Funk S., Watson C.H., Kucharski A.J., Edmunds W.J. Estimating the probability of demonstrating vaccine efficacy in the declining Ebola epidemic: a Bayesian modelling approach. BMJ Open. 2015;5(12):e009346. doi: 10.1136/bmjopen-2015-009346. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Camacho A., Kucharski A., Aki-Sawyerr Y., White M.A., Flasche S., Baguelin M., Pollington T., Carney J.R., Glover R., Smout E., Tiffany A., Edmunds W.J., Funk S. Temporal changes in Ebola transmission in Sierra Leone and implications for control requirements: a real-time modelling study. PLoS Curr. 2015;7 doi: 10.1371/currents.outbreaks.406ae55e83ec0b5193e30856b9235ed2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carroll M.W., Matthews D.A., Hiscox J.A., Elmore M.J., Pollakis G., Rambaut A., Hewson R., García-Dorival I., Bore J.A., Koundouno R., Abdellati S., Afrough B., Aiyepada J., Akhilomen P., Asogun D., Atkinson B., Badusche M., Bah A., Bate S., Baumann J., Becker D., Becker-Ziaja B., Bocquin A., Borremans B., Bosworth A., Boettcher J.P., Cannas A., Carletti F., Castilletti C., Clark S., Colavita F., Diederich S., Donatus A., Duraffour S., Ehichioya D., Ellerbrok H., Fernandez-Garcia M.D., Fizet A., Fleischmann E., Gryseels S., Hermelink A., Hinzmann J., Hopf-Guevara U., Ighodalo Y., Jameson L., Kelterbaum A., Kis Z., Kloth S., Kohl C., Korva M., Kraus A., Kuisma E., Kurth A., Liedigk B., Logue C.H., Ldtke A., Maes P., McCowen J., Mély S., Mertens M., Meschi S., Meyer B., Michel J., Molkenthin P., Muñoz-Fontela C., Muth D., Newman E.N.C., Ngabo D., Oestereich L., Okosun J., Olokor T., Omiunu R., Omomoh E., Pallasch E., Pályi B., Portmann J., Pottage T. Temporal and spatial analysis of the 2014–2015 Ebola virus outbreak in West Africa. Nature. 2015;524(7563):97–101. doi: 10.1038/nature14594. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Center for the Mathematical Modelling of Infectious Diseases . 2015. Visualisation and Projections of the Ebola Outbreak in West Africa.http://ntncmch.github.io/ebola/ [Google Scholar]
- Doob J.L. The Brownian movement and stochastic equations. Ann. Math. 1942;43(2):351. [Google Scholar]
- Dureau J., Kalogeropoulos K., Baguelin M. Capturing the time-varying drivers of an epidemic using stochastic dynamical systems. Biostatistics. 2013;14(3):541–555. doi: 10.1093/biostatistics/kxs052. [DOI] [PubMed] [Google Scholar]
- Durrett R. The Wadsworth Mathematics Series; Wadsworth, Belmont: 1984. Brownian Motion and Martingales in Analysis. [Google Scholar]
- Faye O., Boëlle P.-Y., Heleze E., Faye O., Loucoubar C., Magassouba N., Soropogui B., Keita S., Gakou T., Bah E.H.I., Koivogui L., Sall A.A., Cauchemez S. Chains of transmission and control of Ebola virus disease in Conakry, Guinea, in 2014: an observational study. Lancet Infect. Dis. 2015;15(3):320–326. doi: 10.1016/S1473-3099(14)71075-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Funk S., Knight G.M., Jansen V.A.A. Ebola: the power of behaviour change. Nature. 2014;515(7528):492. doi: 10.1038/515492b. [DOI] [PubMed] [Google Scholar]
- Funk S. 2016. RBi. Helpers: RBi Helper Functions, R Package version 0.2.https://github.com/sbfnk/RBi.helpers [Google Scholar]
- Gelman A., Carlin J., Stern H., Dunson D., Vehtari A., Rubin D. 3rd edition. CRC Press; Boca Raton, USA: 2013. Bayesian Data Analysis. [Google Scholar]
- Groseth A., Feldmann H., Strong J.E. The ecology of Ebola virus. Trends Microbiol. 2007;15(9):408–416. doi: 10.1016/j.tim.2007.08.001. [DOI] [PubMed] [Google Scholar]
- Jacob P.E., Funk S. 2016. RBi: R Interface to LibBi. R Package Version 0.4.1.https://CRAN.R-project.org/package=rbi [Google Scholar]
- Keeling M.J., Rohani P. Princeton University Press; Princeton: 2008. Modeling Infectious Diseases in Humans and Animals. [Google Scholar]
- Lloyd A.L. Realistic distributions of infectious periods in epidemic models: changing patterns of persistence and dynamics. Theor. Popul. Biol. 2001;60(1):59–71. doi: 10.1006/tpbi.2001.1525. [DOI] [PubMed] [Google Scholar]
- Milstein G.N., Tretyakov M.V. Stochastic Numerics for Mathematical Physics. 2004. Numerical methods for SDEs with small noise. [Google Scholar]
- Murray, L.M., 2013. Bayesian state-space modelling on high-performance hardware using LibBi. http://arxiv.org/abs/1306.3277.
- WHO Ebola Response Team West African Ebola epidemic after one year – slowing but not yet under control. N. Engl. J. Med. 2015;372(6):584–587. doi: 10.1056/NEJMc1414992. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Fits and predictions for all scenarios and time points. See the caption of Fig. 2 for details.