Abstract
Background: Emerging pathogens such as Zika, chikungunya, Ebola, and dengue viruses are serious threats to national and global health security. Accurate forecasts of emerging epidemics and their severity are critical to minimizing subsequent mortality, morbidity, and economic loss. The recent introduction of chikungunya and Zika virus to the Americas underscores the need for better methods for disease surveillance and forecasting.
Methods: To explore the suitability of current approaches to forecasting emerging diseases, the Defense Advanced Research Projects Agency (DARPA) launched the 2014–2015 DARPA Chikungunya Challenge to forecast the number of cases and spread of chikungunya disease in the Americas. Challenge participants (n=38 during final evaluation) provided predictions of chikungunya epidemics across the Americas for a six-month period, from September 1, 2014 to February 16, 2015, to be evaluated by comparison with incidence data reported to the Pan American Health Organization (PAHO). This manuscript presents an overview of the challenge and a summary of the approaches used by the winners.
Results: Participant submissions were evaluated by a team of non-competing government subject matter experts based on numerical accuracy and methodology. Although this manuscript does not include in-depth analyses of the results, cursory analyses suggest that simpler models appear to outperform more complex approaches that included, for example, demographic information and transportation dynamics, because reporting biases can be implicitly captured in statistical models. Mosquito dynamics, population-specific information, and dengue-specific information correlated best with prediction accuracy.
Conclusion: We conclude that with careful consideration and understanding of the relative advantages and disadvantages of particular methods, implementation of an effective prediction system is feasible. However, there is a need to improve the quality of the data in order to more accurately predict the course of epidemics.
Keywords: Chikungunya, Forecasting, Morphological models, Mechanistic models
Background
Mathematical models for infectious diseases have been used to gain insight into disease dynamics for more than a century [1–4]. However, only recently have models and systems begun to be designed specifically for the task of providing regularly updated quantitative forecasts of infectious disease spread that are analogous to those available for weather prediction. Forecasting approaches vary substantially in both method and complexity; for example, some use human judgment or prediction markets, some use purely statistical or machine learning approaches, and others rely upon disease transmission models of varying complexity [5–8].
In parallel, recent experiences responding to outbreaks have highlighted the significant utility of infectious disease forecasts to support decision-making [9, 10]. Models provide critical insight in the face of limited data by forecasting the international spread of viruses, illustrating the value of different mitigation strategies, and assessing the risk of continued danger in cases such as the 2009 influenza pandemic [11, 12]. Early predictions for the 2014-2015 Ebola outbreak in West Africa indicated that incidence would continue to grow rapidly unless significant mitigation measures were undertaken [13]. This information helped galvanize the international response to the crisis and indicate the importance of rapid deployment of resources. As the outbreak progressed, incidence forecasts were used to inform the planning and execution of clinical trials for vaccines and therapeutics by ensuring that activities were responding to the rapidly changing situation and that decision makers had adequate time to develop contingency plans [14, 15].
Disease forecasting has received significant attention among the mathematical epidemiological community as well as decision makers. For example, the 2012 National Strategy for Biosurveillance [16] specifically identified forecasting as one of the core functions of a national biosurveillance enterprise. Building upon this, the 2013 National Biosurveillance Science and Technology Roadmap identified several key research priorities, including additional research and development for disease forecasting technology, which are critical to achieving the overall goal of providing decision makers with more accurate and timely information during biological incidents.
In response to this mandate, several United States (US) Government agencies have conducted challenge and prize competitions that involved infectious disease forecasting in an effort to help mature operational forecasting technologies. The Centers for Disease Control and Prevention has organized consecutive challenges for the 2013-2018 influenza seasons that have focused on predicting the timing and intensity of influenza-like illness (ILI) in the US at the regional level [17, 18]. In 2015, several departments in the US Government joined together with the support of the National Science and Technology Council to launch an open dengue challenge that strove to forecast disease incidence using previously unpublished data from Peru and Puerto Rico [19]. The 2014-2015 DARPA Chikungunya Challenge was conceived as an effort to mobilize a wide variety of participants to foster innovation and advance the state of the art by attempting to predict chikungunya incidence across the Americas [20].
Nonetheless, significant challenges remain for the development of operational forecasting as a mature technology [21]. The fundamental science of forecasting needs to be developed and supported by a robust research program. Data availability is often limited, especially during outbreak responses, and this hampers the ability to provide critical insights in a timely fashion. While some decision makers have embraced the use of modeling and forecasting, others remain skeptical, having been presented with forecasts that were inaccurate and that did not make the inherent underlying uncertainties clear.
This manuscript summarizes the challenge and provides a description of the top six solver submissions including data sources and methodologies.
Chikungunya challenge
Chikungunya is an emerging, debilitating viral disease that is transmitted among humans by mosquitoes and is rarely fatal [22]. There is no specific treatment for the disease, although palliative care has been shown to reduce its severity and duration. The chikungunya virus (CHIKV) was originally detected in Tanzania in 1952, with the name meaning ‘to become contorted’ in the Kimakonde language of Mozambique, referring to the effects of severe joint pain [23]. Chikungunya expanded to Asia and the Indo-Pacific islands, causing notably large outbreaks over the past 10-20 years.
The CHIKV epidemic was well suited for this Challenge because its spread to the Western Hemisphere had been expected for some years and presented a valuable opportunity to evaluate disease progression in a naive population. Further, there was a pre-existing reporting system via the Pan American Health Organization (PAHO) in place for tracking disease incidence across the Americas. The goal of the DARPA Chikungunya Challenge was to evaluate state-of-the-art epidemic modeling methods to forecast outbreaks of CHIKV throughout the Americas, to compare modeling strategies, and to provide insight into how different data streams could be incorporated into these models. The Challenge provided a baseline of current forecasting capabilities for infectious diseases and their applicability for vector-borne infectious diseases.
Design and execution of the DARPA Chikungunya challenge
The introduction of CHIKV into the Western Hemisphere had been anticipated, and the first case was recorded in Saint Martin in December 2013 [24]. Its emergence in the Caribbean caused substantial morbidity in the population and concern about subsequent spread in the Americas. After the first cases were reported, the virus spread throughout the Eastern Caribbean islands and into Central and South America, reaching the United States in mid-July 2014. Since then, CHIKV has been detected in several countries and territories of the Americas [25]. As of epidemiological week 35 of 2014 (August 29, 2014), when the DARPA Chikungunya Challenge was initiated, 659,367 cases, including 37 deaths, had been reported in the Americas. The disease was determined to be an ideal candidate for the DARPA Chikungunya Challenge because of the predictable spread of the virus among an immunologically naive population and the availability of incidence data reported by participating countries to PAHO [25].
The Department of Defense’s (DOD) role in global health includes conducting timely, relevant, and comprehensive health surveillance to promote, maintain, and enhance the health of both the military and associated populations. Tracking disease outbreaks and emergence of new pathogens is an intrinsic component of this effort. Force health protection and readiness, protection of civilian populations, medical stability operations, and partnership engagement are key components to this mandate. Conducting health surveillance that can detect, contain, and prevent impacts of intentional or natural biological events is a critical part of the DOD’s ability to maintain force health while promoting stability and security abroad. To accomplish this, there needs to be a proactive approach to anticipating the geographic and temporal trajectory of infectious disease outbreaks.
Mathematical and statistical models (grouped under the morphological category in this manuscript) are used not only to forecast the spatial-temporal evolution of real world outbreaks, but also to estimate the potential value of mitigation efforts. The latter requires an accurate understanding of both public policy and the behavior of people in novel situations. A further challenge is how existing methods account for delayed reporting and underreporting, and how to use additional data streams to reduce systematic errors (bias) and forecasting uncertainties. The DARPA Chikungunya Challenge addressed this data gap by promoting innovation in data integration techniques.
The DARPA Chikungunya Challenge asked participants to forecast the cumulative total cases (suspected and confirmed, the latter including imported-confirmed) per week per country. The format was selected to inspire innovative approaches and to encourage non-traditional participants, forecasting approaches, and data sources in order to improve overall infectious disease forecasting capabilities. The forecast submissions were evaluated and scored on a weighted basis (Table 1). The forecasts were submitted at various stages of the epidemic progression across the Americas (Fig. 1), which shows the epidemic progression as reported by PAHO at the time of each report [26]. Evaluation of methodology was performed by a panel of non-competing government subject matter experts in infectious disease modeling, CHIKV, and other vector-borne diseases.
Table 1.
Deliverable | Due date | Content | Max points |
---|---|---|---|
1 | September 1, 2014 | Initial methodology, documentation, and data sources | 5 |
2 | September 8, 2014 | Forecast for 6-month period (Epidemic week 36-9) | 5 |
3 | October 1, 2014 | Forecast for peak new cases | 10 |
4 | October 1, 2014 | Forecast for 5-month period (Epidemic week 36-9) | 15 |
5 | November 1, 2014 | Forecast for 4-month period (Epidemic week 36-9) | 20 |
6 | December 1, 2014 | Forecast for 3-month period (Epidemic week 36-9) | 15 |
7 | January 1, 2015 | Forecast for 2-month period (Epidemic week 36-9) | 10 |
8 | February 1, 2015 | Forecast for 1-month period (Epidemic week 36-9) | 5 |
9 | February 1, 2015 | Final methodology, documentation, and data sources | 15 |
Maximum total points | | | 100 |
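As a quick check of how the weighting in Table 1 is distributed, the point values can be tallied by type of deliverable. This is only an illustrative sketch: the grouping labels (`methodology`, `forecast`, `peak`) and the names `MAX_POINTS` and `points_by_kind` are introduced here and are not part of the Challenge documentation.

```python
# Maximum points per deliverable, transcribed from Table 1.
# The "kind" labels are an illustrative grouping, not Challenge terminology.
MAX_POINTS = {
    1: ("methodology", 5),
    2: ("forecast", 5),
    3: ("peak", 10),
    4: ("forecast", 15),
    5: ("forecast", 20),
    6: ("forecast", 15),
    7: ("forecast", 10),
    8: ("forecast", 5),
    9: ("methodology", 15),
}


def points_by_kind(table):
    """Sum the maximum points for each kind of deliverable."""
    totals = {}
    for kind, points in table.values():
        totals[kind] = totals.get(kind, 0) + points
    return totals


totals = points_by_kind(MAX_POINTS)
print(totals)                # {'methodology': 20, 'forecast': 70, 'peak': 10}
print(sum(totals.values()))  # 100
```

Under this grouping, the cumulative-case forecasts carry 70 of the 100 points, the two methodology reports 20, and the peak forecast 10.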
Accuracy was scored based on the predicted number of cases and spread of CHIKV in the Americas compared to weekly publicly-available PAHO reporting of suspected and confirmed cases. Participants were encouraged to utilize any publicly available data for modeling and forecasting such as climate, clinical surveillance data, genetic information, and social media. Proprietary data were permitted for incorporation into models if obtained independently by participants. Participants were not required to disclose the content of proprietary data but had to include a detailed description of how it was obtained and used in the Challenge methodology deliverables. The methodology reports required sections describing: (1) data sources used, (2) model robustness, (3) applicability, (4) presentation, and (5) computational requirements.
Methods
Summaries of participants’ approaches
DARPA awarded cash prizes to six leading participants, including $150,000 for first place, $100,000 for second place, and $50,000 to each of four honorable mentions. The leading participants used varying methodologies and model types to inform their forecasts. The following are descriptions of their overall approach, methodologies to forecast the spread of chikungunya in the Americas, and a brief summary of their results.
First place submission (henceforth participant 1)
A simple model for the recent outbreaks of chikungunya in the Americas
Modeling Approach: Participant 1 relied on estimating the growth rate $G(N)$ of the outbreak in each country as a function of $N$, where $G = dN/dt$, $N$ is a smooth interpolation of the total number of cases reported on the PAHO website, and $t$ is time in weeks. The function $G$ implicitly reflects the combined effects of the meteorological, geographic, human, and vector characteristics that describe vector-borne diseases. Participant 1 fitted $G$ to a quadratic or piecewise quadratic function $G_f$, which corresponds to $N$ being proportional to the number of infected and recovered individuals in a Susceptible-Infectious-Recovered (SIR) model [27]. Participant 1 solved the differential equation $dN/dt = G_f(N)$ and chose the parameters of $G_f$ so as to optimize two criteria: (C1) how well $G_f(N)$ approximates $G(N)$, and (C2) how well $N(t)$, obtained by solving $dN/dt = G_f(N)$, fits the reported cumulative epidemiological curve [28].
Results: Model parameters were estimated by hand, with the help of a MATLAB graphical user interface, displayed in Fig. 2. The top right plot shows how $G(N)$ (blue solid curve) for the Dominican Republic may be approximated by a quadratic function (inverted parabola in red). Parameter values are set by the sliders on the left. The bottom right plot compares the predicted and observed cumulative epidemiological curves: the red stars are the model predictions obtained by solving $dN/dt = G_f(N)$; the reported data are shown as blue circles. By observing how changes in the model parameters affected these plots, Participant 1 selected the parameter values that best fitted the data for each country. Participant 1 organized the PAHO countries into groups, depending on dengue and CHIKV incidence and on whether a quadratic or piecewise quadratic fit for $G$ was used. Attempts to connect these groups to economic (Gini coefficient, per capita Gross Domestic Product), demographic (population density and percent of population living in urban areas), connectivity (number of ports, number of port calls, and distance between islands), and health indices (infant mortality and life expectancy) were unsuccessful.
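To make the idea of solving $dN/dt = G_f(N)$ concrete, the following minimal sketch integrates a single quadratic growth rate numerically and compares the result with the known closed-form logistic solution. The specific form $G_f(N) = rN(1 - N/K)$, the parameter names `r` and `k`, the parameter values, and the use of a Runge-Kutta integrator are all assumptions made for illustration; the source states only that $G_f$ was quadratic or piecewise quadratic and was tuned by hand in MATLAB.

```python
import math


def growth_rate(n, r, k):
    """Assumed quadratic growth rate G_f(N) = r*N*(1 - N/K), an inverted parabola in N."""
    return r * n * (1.0 - n / k)


def integrate(n0, r, k, weeks, steps_per_week=100):
    """Integrate dN/dt = G_f(N) with classical RK4; return N at each whole week."""
    h = 1.0 / steps_per_week
    n = n0
    out = [n]
    for _ in range(weeks):
        for _ in range(steps_per_week):
            k1 = growth_rate(n, r, k)
            k2 = growth_rate(n + 0.5 * h * k1, r, k)
            k3 = growth_rate(n + 0.5 * h * k2, r, k)
            k4 = growth_rate(n + h * k3, r, k)
            n += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out.append(n)
    return out


def logistic_closed_form(t, n0, r, k):
    """Exact solution of dN/dt = r*N*(1 - N/K) with N(0) = n0."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))


# Hypothetical parameters: 100 initial cases, growth 0.5 per week, final size 50,000.
curve = integrate(n0=100.0, r=0.5, k=50000.0, weeks=40)
for week in (0, 10, 20, 40):
    print(week, round(curve[week]), round(logistic_closed_form(week, 100.0, 0.5, 50000.0)))
```

With a single quadratic $G_f$, the cumulative curve is the familiar S-shaped logistic; a piecewise quadratic $G_f$ would require switching parameters as $N$ crosses a breakpoint.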
Second place submission (henceforth participant 2)
Predicting the spread of chikungunya using a logistic S-curve
Modeling Approach: Participant 2 used a Bounded Geometric Growth approach (shown by a logistic function or S-curve in Fig. 3) to model CHIKV across the Americas. Participant 2 used a macro-enabled Excel workbook to manually fit each curve to the PAHO data for each country.
Results: This approach described the overall dynamics for about half the countries. The results show that the model worked better for countries with higher incidence than for countries with low incidence.
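The kind of curve fitting described for Participant 2 can be illustrated with a small sketch that scores candidate S-curves against a cumulative case series by their sum of squared errors. Everything here is hypothetical: the three-parameter form (`ceiling`, `rate`, `midpoint`), the synthetic data, and the coarse grid search, which stands in for the manual adjustment that Participant 2 performed in Excel.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    """S-curve for cumulative cases: rises toward `ceiling`, steepest at `midpoint`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def sse(params, weeks, observed):
    """Sum of squared errors between a candidate curve and the observed series."""
    ceiling, rate, midpoint = params
    return sum((logistic(t, ceiling, rate, midpoint) - y) ** 2
               for t, y in zip(weeks, observed))


# Synthetic cumulative counts generated from known (hypothetical) parameters.
weeks = list(range(30))
truth = (20000.0, 0.4, 12.0)
observed = [logistic(t, *truth) for t in weeks]

# A coarse grid of candidate parameter sets, standing in for manual tuning.
candidates = [(c, r, m)
              for c in (10000.0, 20000.0, 30000.0)
              for r in (0.2, 0.4, 0.6)
              for m in (8.0, 12.0, 16.0)]
best = min(candidates, key=lambda p: sse(p, weeks, observed))
print(best)  # (20000.0, 0.4, 12.0)
```

On real surveillance data no candidate would fit exactly, and the choice of ceiling is poorly constrained until the curve begins to flatten, which may be one reason such fits work less well for low-incidence countries.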
Honorable mention #1 (henceforth participant 3)
Forecasting chikungunya fever
Modeling Approach: Participant 3 implemented three different predictive models for each country, namely the logistic model, the Cauchy model, and an epidemiological SIR model, which were fitted to the smoothed PAHO data. The basic assumption shared by all three models is that the total number of cases for each country is a sigmoidal function of time (Fig. 4). The parameters of each model were estimated by regularized weighted non-linear least squares, using the iterative Gauss-Newton algorithm to minimize the error (or cost) function. The weighting procedure assigns more weight to recent data than to older data, reflecting the fact that data from the distant past contain less information about the future. Furthermore, because data are typically scarce, especially at the early stages of an outbreak, the minimization problem can be ill-determined; it was therefore regularized using Tikhonov (or ridge) regularization [29]. All of the models considered had only three parameters to be estimated.
Results: The forecasts were obtained for each country by projecting the estimated predictive model into the future. Confidence intervals were provided for the estimated parameter vector based on the covariance matrix. The computed confidence intervals were used to create upper and lower bounds for the predicted values. Figure 4 shows the three-month forecasts for the USA. Notice that the SIR prediction has the best performance for the USA, but the logistic or Cauchy predictions were found to perform better in other countries.
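One way to write down a regularized, weighted non-linear least-squares problem of this kind, together with the corresponding Gauss-Newton step, is sketched here. The notation is introduced for illustration only: $f(t;\theta)$ is any of the three-parameter sigmoidal models, $w_i$ are the weights, $\lambda$ is the Tikhonov parameter, and $\theta_0$ is a reference parameter vector. The geometric form of the weights and the choice of penalizing distance from $\theta_0$ are assumptions; the source does not specify them.

```latex
\begin{aligned}
\hat{\theta} &= \arg\min_{\theta}\; \sum_{i=1}^{n} w_i \bigl(y_i - f(t_i;\theta)\bigr)^2
               + \lambda \,\lVert \theta - \theta_0 \rVert_2^2,
\qquad w_i = \gamma^{\,t_n - t_i},\quad 0 < \gamma \le 1, \\[4pt]
\theta^{(k+1)} &= \theta^{(k)} + \bigl(J^{\top} W J + \lambda I\bigr)^{-1}
               \bigl(J^{\top} W r - \lambda\,(\theta^{(k)} - \theta_0)\bigr),
\end{aligned}
```

Here $r_i = y_i - f(t_i;\theta^{(k)})$ is the residual vector, $J_{ij} = \partial f(t_i;\theta)/\partial \theta_j$ is the Jacobian evaluated at $\theta^{(k)}$, and $W = \mathrm{diag}(w_1,\dots,w_n)$. The term $\lambda I$ keeps the linear system well conditioned when few data points are available, which is the role the text attributes to the regularization.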
Honorable mention #2 (henceforth participant 4)
A simple empirical approach to predict the spread of epidemics
Modeling Approach: Participant 4 used an empirical approach, fitting the observed incidence provided by PAHO by least squares. For epidemics where there is active transmission in a population, the incidence as a function of time, $I(t)$, can be fitted to the function $I(t) = A t^{m} e^{-nt}$, where $A$, $m$, and $n$ are constants and $m > 0$, as depicted in Fig. 5. The cumulative incidence for autochthonous and imported cases for each territory was obtained from the weekly PAHO data and used to derive the weekly incidence for each territory [30]. For simplicity, countries were considered to have either autochthonous transmission or imported cases. The cumulative number of cases was fitted to the incidence function for the model using the weekly incidence data derived from PAHO. Conditions were imposed to allow a solution to be derived. The solutions were found to be optimal when the total cases and the cases in the last six weeks predicted by the model matched the observed data, and transmission was assumed to last no longer than one year [31]. Imported cases were predicted to follow the total infections in the region, and were scaled to the historical proportion of imported cases to total cases for each country.
Results: This simple and robust method provides satisfactory solutions, which may circumvent some of the problems of classical analytic methods for basic epidemics. The method outlined gives a good approximation for short-term forecasting, especially with limited data, but it can neither give probabilistic forecasts nor provide an analytical model that can be refined using more detailed data on transmission, incident cases, and population movement.
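The shape of the incidence function $I(t) = A t^{m} e^{-nt}$ can be explored with a short sketch. Setting $dI/dt = 0$ gives a peak at $t = m/n$, which the code checks numerically on a weekly grid. The parameter values are hypothetical, and the requirement $n > 0$ (so that incidence eventually declines) is an assumption added here; the source states only that $m > 0$.

```python
import math


def incidence(t, a, m, n):
    """Empirical incidence curve I(t) = A * t**m * exp(-n*t)."""
    return a * t ** m * math.exp(-n * t)


def peak_time(m, n):
    """Time of maximum incidence, from dI/dt = 0 (requires n > 0)."""
    return m / n


# Hypothetical parameters, with t measured in weeks since the start of transmission.
a, m, n = 50.0, 3.0, 0.25
t_peak = peak_time(m, n)

# Evaluate weekly incidence over one year (the assumed maximum duration).
weekly = [incidence(t, a, m, n) for t in range(1, 53)]
peak_week = 1 + max(range(len(weekly)), key=weekly.__getitem__)
total_cases = sum(weekly)

print(t_peak)     # 12.0
print(peak_week)  # 12
```

Because the curve has only three constants, a constraint on the total cases plus a constraint on the most recent weeks, as described in the text, goes a long way toward pinning it down.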
Honorable mention #3 (henceforth participant 5)
Forecasting the Spread of Chikungunya Virus using a Coupled SEIR Transmission Model
Modeling Approach: Participant 5 used a stochastic, mechanistic model of transmission dynamics in each locality to forecast chikungunya epidemics for each country and territory in the PAHO data. A susceptible-exposed-infectious-recovered (SEIR) transmission model was developed to describe viral transmission between human and mosquito populations [32]. People in the susceptible class experience a force of infection and become infected at a rate that depends on the biting rate of mosquitoes ($\alpha$), the transmission efficiency of the virus from mosquitoes to humans ($\beta_1$), and the number of infectious mosquitoes per human ($Z/N$). The force of infection scales non-linearly with the number of infectious mosquitoes, governed by a parameter $\varphi_1 < 1$. The human force of infection also includes exposed individuals coming into the population from elsewhere at rate $\xi$, which was represented using a gravity model, with the rate of entry from another locality dependent on the sizes of each population and inversely proportional to the distance between the two populations [33]. This mechanistic model was implemented in a state-space modeling framework with an imperfect observation process on top of the transmission dynamics and stochasticity in both the infection and observation processes. Model parameter values were estimated using an iterated filtering method for calculating maximum likelihood estimates, implemented in the pomp package in R [34], and were then used to generate weekly forecasts.
Results: The weekly forecasts were calculated as the median of 2000 simulations (Fig. 6). The overall number of cases predicted was fairly accurate, particularly for the one- to four-month forecasts. The number of country forecasts that were significantly over- or underestimated also decreased over time. However, five large outbreaks (> 1000 reported cases) were severely (> 50%) underestimated in the five-month forecast.
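The gravity-model term for importation can be illustrated with a minimal sketch. The source says only that the rate depends on the sizes of the two populations and is inversely proportional to distance; the product form used here, the constant `scale`, the function names, and all numerical values are assumptions for illustration and are not taken from Participant 5's implementation.

```python
def gravity_import_rate(pop_i, pop_j, distance_km, scale=1e-12):
    """Assumed gravity form: rate into locality i from j is scale * N_i * N_j / d_ij."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return scale * pop_i * pop_j / distance_km


def total_import_rate(target_pop, others, scale=1e-12):
    """Sum the gravity contributions from every other locality."""
    return sum(gravity_import_rate(target_pop, o["pop"], o["dist_km"], scale)
               for o in others)


# Hypothetical localities: one large and near, one small and far.
neighbors = [
    {"pop": 2_000_000, "dist_km": 100.0},
    {"pop": 500_000, "dist_km": 400.0},
]
xi = total_import_rate(1_000_000, neighbors)
print(xi)
```

In a full model, a term of this kind would be further weighted by the current number of exposed or infectious people in each source locality, so that importation pressure rises and falls with neighboring outbreaks.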
Honorable mention #4 (henceforth participant 6)
Modeling the chikungunya epidemic in the Americas: Distributional ecology and population dynamics
Modeling Approach: Participant 6 used vector occurrence and climate variables [35] to generate ecological niche models (ENM) for vectors as multidimensional ellipsoid forms enclosing occurrences in a multidimensional environmental space, as described previously [36, 37]. The models depended on two main estimations: (i) rates at which the virus is transmitted locally, and (ii) rates of importation of infections. To obtain these estimates, four “ingredients” were employed: primary occurrence data for mosquito species, 50-year climate data averages, estimated pairwise city-to-city airline passenger travel rates, and case report data from PAHO [30]. Aedes aegypti and Aedes albopictus occurrences were drawn from Campbell et al. [38]. Principal components analysis (PCA) was applied to the original climate variables to reduce their number and correlation [35]; the first three components (which explained 84.9% of the overall variance) were used as axes to define the multidimensional environmental space (NicheA 3.0 [39]). To identify areas with environmental conditions ideal for transmission [40–44], Participant 6 divided the ellipsoid for each vector into 100 layers summarizing proximity to the niche centroid to identify areas close to or far from the ENM centroid. Thus, areas close to the niche centroid (i.e., areas ideal for transmission) were identified as potential transmission hotspots (Fig. 7).
Results: Participant 6 found that most countries showed a dramatic pattern of intensive reporting in early weeks of the epidemic, followed by reduced reporting in later stages. This phenomenon was termed “surveillance fatigue” to refer to the reduction of collection, reporting, and publication of epidemiological data after explosive and sustained disease outbreak events. These models support the idea of higher incidences than those reported during late surveillance, suggesting that reduced reported rates may be driven by a reduction in effort rather than a dramatic pause in local transmission. Countries closest to the centroid of vectors’ niches showed higher CHIKV prevalence. For a complete description of the model and methodology, please refer to [45].
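The idea of slicing a niche ellipsoid into layers by proximity to its centroid can be sketched as follows. This is not the NicheA procedure: it assumes, for illustration, an ellipsoid whose axes are aligned with the three principal components, layers of equal width in normalized distance, and hypothetical function names and values.

```python
import math


def ellipsoid_distance(point, centroid, semi_axes):
    """Normalized distance in environmental space: 0 at the centroid, 1 on the surface."""
    return math.sqrt(sum(((p - c) / s) ** 2
                         for p, c, s in zip(point, centroid, semi_axes)))


def niche_layer(point, centroid, semi_axes, n_layers=100):
    """Layer index from 1 (nearest the centroid) to n_layers; None if outside the niche."""
    d = ellipsoid_distance(point, centroid, semi_axes)
    if d > 1.0:
        return None
    return min(n_layers, int(d * n_layers) + 1)


# Hypothetical niche in the space of the first three principal components.
centroid = (0.0, 0.0, 0.0)
semi_axes = (2.0, 1.0, 0.5)

print(niche_layer((0.0, 0.0, 0.0), centroid, semi_axes))  # 1
print(niche_layer((1.0, 0.0, 0.0), centroid, semi_axes))  # 51
print(niche_layer((2.0, 0.0, 0.0), centroid, semi_axes))  # 100
print(niche_layer((3.0, 0.0, 0.0), centroid, semi_axes))  # None
```

Applied to every grid cell of a climate raster projected into the same component space, a function like this would yield a map in which low layer numbers mark the candidate transmission hotspots.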
Results
Reported PAHO data
The distribution of chikungunya cases across the 50 participating PAHO countries at three times during the Challenge is shown in Fig. 8, to complement the weekly incidences shown in Fig. 1. An interactive version of this map, showing the CHIKV epidemic progression across the Western Hemisphere, is available at the website: http://bsvgateway.org/chikv/ (courtesy and copyright, LANL). PAHO groups countries based on their geographic location into the following regions: North America (Bermuda, Canada, Mexico, USA); Central America (Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, and Panama); Latin Caribbean (Cuba, Dominican Republic, French Guiana, Guadeloupe, Haiti, Martinique, Puerto Rico, Saint Barthelemy, and Saint Martin (French part)); Andean Area (Bolivia, Colombia, Ecuador, Peru, and Venezuela); South Zone (Argentina, Brazil, Chile, Paraguay, and Uruguay); and the Non-Latin Caribbean countries (Anguilla, Antigua and Barbuda, Aruba, Bahamas, Cayman Islands, Curacao, Dominica, Grenada, Guyana, Jamaica, Montserrat, Saint Kitts and Nevis, Saint Lucia, Saint Vincent and Grenadines, Saint Martin (Dutch part), Suriname, Trinidad and Tobago, Turks and Caicos, US Virgin Islands, and UK Virgin Islands).
By week 36 of 2014 (corresponding to the week of September 6, 2014), at the beginning of the Challenge, 651,344 suspected cases had been reported to PAHO, mostly in the Latin Caribbean region, with 8,210 confirmed cases. The United States reported 762 imported cases. By week 48 of 2014, the epidemic was largely over in the Latin Caribbean region, but was peaking in Central America and the Andean region, with the total number of suspected cases at 914,960 and 15,906 confirmed cases. By the end of the Challenge in week 8 of 2015 (corresponding to the week of February 22, 2015), 1,247,359 cases had been reported to PAHO, of which 24,982 cases were confirmed. The epidemic had largely ended in the Latin Caribbean with a reported incidence of 2.2%, had subsided for the year in Central America with a reported incidence of 0.4%, and was still near a broad peak in the Andean area, with a reported incidence of 0.16%.
The 20 most-affected countries accounted for 98% of all reported chikungunya cases. The Dominican Republic reported the most cases, followed by El Salvador and Colombia. Both delayed and sporadic reporting were evident in the reported data, which should be kept in mind when this information is used to derive predictions of future epidemics. Accuracy and timeliness of the reported number of new cases may depend on the socio-economic structure, health care infrastructure, economic strength, and other factors.
We focused our discussion on a subset of the 50 PAHO countries with more complete data that allowed us to cross-check with alternative reports. The countries chosen represent the spectrum of variability associated with geography, socio-economic strata, population, weather, and other parameters. Specifically, we analyzed Guadeloupe, Martinique, Dominican Republic, Haiti, United States, Mexico, El Salvador, Guatemala, Colombia, and Venezuela. Below, we present an analysis of solver entries for these countries. We chose to highlight different solver entries in this analysis, including some that did not rank among the top 6, because certain submissions were more suitable for demonstrating a particular concept and certain methodologies merited attention even though the entries did not rank among the top 6.
Choice of models
To better understand the participant submissions, it is important to define and describe the general modeling approaches used by top participants. Classification of participant-submitted models was challenging, as participants typically used hybrid models that combined aspects of different approaches. For the purpose of this manuscript and the ensuing discussion, we have categorized the models submitted by all participants (not just the winning ones) into three broad categories: morphological models, mechanistic models, and subject matter expert (SME) models. Morphological models represent a curve-fitting approach, wherein the curves can be defined analytically or via a set of differential equations. The curves are fitted independently to each outbreak and/or derived from an entirely different outbreak (e.g., dengue), suitably scaled and translated (solvers 1-4 in this manuscript). Mechanistic models attempt to capture the dynamic interplay of outbreaks in multiple countries and/or describe a dynamic interplay between the host (humans) and vectors (mosquitoes) (solvers 5 and 6 in this manuscript). The SME-based model (i.e., one based on participant-defined subject matter experts), utilized by only one participant (who did not rank in the top 6 and is not discussed in this manuscript), required the consensus subjective opinion of various experts in the field and did not require any type of computation to generate a prediction. This approach relied exclusively on expert judgment as an alternative to explicit modeling, leveraging the collective expertise to maximize forecast accuracy while minimizing the number and strength of assumptions made. It is worth noting that this approach has traditionally been used by public health practitioners in the absence of models to inform their decisions. As expected from their descriptions, the model types overlap with each other in many cases. For example, many participants used subject matter expertise to inform mechanistic and morphological models.
Data sources for effective predictions of Chikungunya
Participants typically used several data sources to complement the information provided by PAHO. It is important to note that not all of these data sources were utilized to derive the predictions made in the final submissions. These data types included online web searches (e.g., Wikipedia, Google searches, government websites), climate information (e.g., temperature and humidity), vector-specific information (e.g., reporting of other mosquito-borne illnesses such as dengue in the same population, mosquito dynamics, ecology), and others (Table 2). Figure 9a shows the effect of the number of data sources used on the accuracy of prediction, differentiated by the main model categories defined above, for the top 10 participants of the Challenge. Participants with higher accuracy (i.e., 3, 4, 1, and 2) used between one and eight data sources. Interestingly, all four of these top-ranking participants used a morphological approach to arrive at their predictions.
Table 2.
Solver # | PAHO | Online/News | Population | Climate | Transportation | Economic Index | Vector | Dengue |
---|---|---|---|---|---|---|---|---|
1 | ⋆ | ⋆ | ⋆ | ⋆ | ⋆ | |||
2 | ⋆ | ⋆⋆⋆ | ⋆⋆⋆ | ⋆ | ||||
3 | ⋆ | |||||||
4 | ⋆ | ⋆ | ||||||
5 | ⋆ | ⋆⋆ | ⋆⋆⋆ | ⋆ | ⋆⋆ | ⋆ | ||
6 | ⋆ | ⋆ | ⋆ | ⋆ | ⋆⋆ |
There was no significant correlation between the number of data sources used and the accuracy of the forecasts, irrespective of the type of model being utilized. In short, more data does not necessarily translate into better forecasts. What mattered most was obtaining the right kind of data and using it appropriately. A regression analysis relating forecast accuracy to the types of data sources used by each participant (Figure 9b) showed that some data streams, such as those related to dengue epidemiology or mosquito dynamics, were used in models that had smaller forecasting errors. Conversely, models that exploited demographics and transportation data had worse forecast accuracy than models that did not use them. Online searches correlated positively with accurate outcomes, although the specificity of this data stream is difficult to define because of the wide variety of information types that can be tapped through the Internet. Arguably, the explanation is that Internet searches are used to validate, and sometimes correct, other data streams. In summary, not all data sources lead to improved forecasting accuracy. However, models that leverage specific data sources to fill gaps in surveillance data (e.g., dengue epidemiology data) or to help improve data quality (e.g., Internet searches) typically have more accurate forecasts.
Predicting the peak of the epidemic
Although the peak of an outbreak is one of the most significant features of an epidemic, it was relatively difficult for the solvers to predict. We analyzed the peak predictions provided by the top 11 participants for the 20 hardest-hit countries. As mentioned earlier, by the time the first prediction was submitted, the epidemic had ended in the Latin Caribbean countries and was just getting started in Central America and the Andean region. Since participants were not allowed to “back-cast” (i.e., predict in the past), the best choice for countries where the peak had already passed was to select week 40 as the peak week, a consequence of the challenge design. Figure 10 shows the peak predictions for a subset of countries. Only some of the 36 participants were able to accurately predict the exact week of the peak, and only in a few countries. The peak week as reported by PAHO clearly differs from the participant submissions. A statistical analysis of the predicted peaks indicates that some participants provided extremely conservative predictions with very little variation across the countries considered here (e.g., participants 1 and 4), whereas others showed more variation (e.g., participant 3) (data not shown). The standard deviation for the PAHO data was larger because the peaks for these countries were spread out, from week 8.5 for Saint Barthelemy to week 55 for Guyana (data not shown).
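To make the peak comparison concrete, the sketch below finds the peak week of a weekly case series, measures how far a predicted peak falls from the reported one, and summarizes the spread of predicted peaks across countries with a standard deviation. All country labels, case counts, and predicted weeks are hypothetical, and the function names are illustrative rather than taken from the Challenge evaluation.

```python
from statistics import pstdev


def peak_week(weekly_cases, first_week=1):
    """Return the week number with the most cases (earliest week on ties)."""
    if not weekly_cases:
        raise ValueError("weekly_cases must not be empty")
    best_index = max(range(len(weekly_cases)), key=lambda i: weekly_cases[i])
    return first_week + best_index


def peak_errors(reported_peaks, predicted_peaks):
    """Signed error in weeks (predicted - reported) per shared country."""
    return {
        country: predicted_peaks[country] - reported_peaks[country]
        for country in reported_peaks
        if country in predicted_peaks
    }


def peak_spread(peaks):
    """Population standard deviation of peak weeks across countries."""
    return pstdev(list(peaks.values()))


# Hypothetical weekly case counts, with the first entry at week 38.
REPORTED_SERIES = {
    "Country A": [5, 20, 60, 45, 10],
    "Country B": [2, 4, 9, 30, 28, 12],
}
# Hypothetical peak weeks submitted by one participant.
PREDICTED_PEAKS = {"Country A": 40, "Country B": 44}

if __name__ == "__main__":
    reported_peaks = {
        country: peak_week(series, first_week=38)
        for country, series in REPORTED_SERIES.items()
    }
    print("Reported peaks:", reported_peaks)
    print("Errors (weeks):", peak_errors(reported_peaks, PREDICTED_PEAKS))
    print("Spread of predictions:", peak_spread(PREDICTED_PEAKS))
```

A participant whose predicted peaks barely vary across countries would show a small spread here, while reported peaks scattered over many weeks would show a large one.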
Discussion
The ability to go beyond health surveillance and provide timely predictions of disease spread to mitigate disease outbreaks is a capability gap in global health. The DARPA Chikungunya Challenge (also referred to as the Challenge) attempted to address this gap by promoting innovation in data collection techniques and infectious disease modeling and prediction. The Challenge also aimed to identify and characterize methodologies, data streams, and approaches beyond the traditional winners that demonstrate critical value or lack thereof in predicting CHIKV outbreaks, with the intention of developing an integral multi-aspect forecasting system for future use.
It is a health security imperative to detect, contain, and prevent the impacts of intentional or natural biological events. To accomplish this, proactive anticipation of the trajectory of infectious disease outbreaks is required for public health planning. The results from this Challenge may inform future efforts in response to Zika outbreaks, or to outbreaks of existing vector-borne diseases such as dengue.
Although most participants used multiple data streams, a large number of data streams did not necessarily improve the accuracy of the predictions. Rather, it was the choice of data streams and how they were used that enabled successful predictions. Participants who used alternative data streams to understand gaps and limitations in the available data were better able to predict the epidemic. Mosquito dynamics, population-specific information, and dengue-specific information correlated best with prediction accuracy.
Conclusion
The results of this Challenge highlighted the fact that with careful consideration and understanding of the relative advantages and disadvantages of particular methods, implementation of an effective prediction system is feasible. Indeed, the ability of a model to forecast the reported data may not always translate into the ability of a model to forecast the epidemic. Furthermore, it may be of critical importance to also capture emergent behavior and mitigation strategies implemented in response to a deadly epidemic, which may require the use of more complex modeling approaches.
Improved data reporting might not always be possible, as this depends on the socio-economic and cultural framework of participating countries. However, uniform application of case definitions, reporting of geographic and demographic subsets of people, and reporting of dates of disease onset rather than dates of report may improve the overall usability of the reported data. Also, qualification of data with parallel epidemics (e.g., dengue, in this case) that rely on the same climatic factors and vector dynamics can significantly improve predictions. It is important for predictions to be judged against reliable reported data, such as a controlled test-bed, wherein different models and methodologies can be evaluated accurately and the value of various strategies clearly delineated. These findings, and further efforts to understand reported data and integrate multiple surveillance systems, could improve both the quality and quantity of reporting and the associated response to an outbreak, making the dream of an effective infectious disease forecasting architecture a reality.
Acknowledgements
The authors thank the Defense Advanced Research Projects Agency (DARPA) and in particular COL Matthew Hepburn, MD (Program Manager, BTO), Dr. Anne Cheever (Associate/Lead Scientist at Booz Allen Hamilton and Technical Advisor to DARPA) and Dr. David Fang (Sr. Lead Technologist at Booz Allen Hamilton and Technical Advisor to DARPA) for the design and execution of the Challenge, technical direction, and advice. In addition, we would like to thank the Pan American Health Organization (PAHO) for providing the data required for the Challenge, and for subsequent participation in discussions and deliberations. Many thanks are due to the volunteers who allowed for effective judging, and to the team of Subject Matter Experts that facilitated the design and execution of the Challenge. The LANL authors would like to thank Jonas Lukasczyk, a summer student who made the Chikungunya visualization shown in Fig. 8.
Funding
The LANL authors would like to thank DARPA for supporting the analysis of the Challenge, as well as for providing administrative assistance during the DARPA Chikungunya Challenge workshop. The challenge and the independent analysis performed by LANL were supported by DARPA. LANL is operated by Los Alamos National Security, LLC for the Department of Energy under contract DE-AC52-06NA25396.
Availability of data and materials
As this study involved the description of existing studies, all data supporting the described models can be obtained by contacting the respective participants.
Authors’ contributions
SYD, BM, NWH, and HM drafted the manuscript, independently analyzed the outcomes of the challenge, and assisted in evaluation of challenge entries. JA and RH contributed to the challenge design, evaluation of outcomes, and contributed to the manuscript. JCL, HEB (aka Participant 1), MEL (aka Participant 2), YP (aka Participant 3), DJR (aka Participant 4), SM (aka Participant 5), ATP, LEE, and HQ (aka Participant 6) participated in the challenge and provided summaries of their entries for this manuscript. All authors read, edited, and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Sara Y. Del Valle, Email: sdelvall@lanl.gov
Benjamin H. McMahon, Email: mcmahon@lanl.gov
Jason Asher, Email: Jason.Asher@hhs.gov
Richard Hatchett, Email: Richard.Hatchett@hhs.gov
Joceline C. Lega, Email: lega@math.arizona.edu
Heidi E. Brown, Email: heidibrown@email.arizona.edu
Mark E. Leany, Email: professor.leany@gmail.com
Yannis Pantazis, Email: yannis.pantazis@gmail.com
David J. Roberts, Email: david.roberts@ndcls.ox.ac.uk
Sean Moore, Email: mooresea@gmail.com
A Townsend Peterson, Email: town@ku.edu
Luis E. Escobar, Email: lescobar@umn.edu
Huijie Qiao, Email: qiaohj@ioz.ac.cn
Nicholas W. Hengartner, Email: nickh@lanl.gov
Harshini Mukundan, Email: harshini@lanl.gov