The COVID-19 pandemic brought mathematical modelling into the spotlight, as scientists rushed to use data to understand transmission patterns and disease severity, and to anticipate future epidemic outcomes. However, the use of COVID-19 modelling has been criticised, in part because of a few particularly erroneous projections at the start of the pandemic.1 More than 2 years into the pandemic, models continue to face serious obstacles as tools for informing outbreak response.1 Population-level health outcomes are difficult to predict accurately, especially cases and hospitalisations,2 as discussed in the International Institute of Forecasters blog. This Comment, drawn from our experiences with real-time prospective COVID-19 modelling, details these obstacles. We aim to highlight areas where further research and investment can improve the use of models for informing outbreak responses in the USA, with a summary of recommendations in the panel.
Panel. Summary of recommendations.
1. Invest in infrastructure for data collection
• Prioritise collection of timely high temporal and spatial resolution data
• Standardise reporting of data across jurisdictions (eg, US states)
• Pursue high-quality data that captures risk-reduction behaviours
• Expand genomic surveillance
2. Prioritise translational work
• Adopt Pollett and colleagues’ EPIFORGE guidelines to improve model transparency
• Document and share experiences on translational work and lessons learned
• Control public messaging around research
• Consider more interpretable targets and ways to express uncertainty
• Adopt incentive structures in academia to reward translational work
3. Build an information-sharing ecosystem that is better suited to the needs of outbreaks
• Strike a balance between speed and quality of publications
• Implement safeguards to prevent misuse of research by the public
• Create an organised, centralised home for epidemic research
Data quality is one of the most important drivers of model performance. If data are inconsistent or do not reflect reality, models have no reliable ground truth from which to learn or be evaluated. Unfortunately, the public health infrastructure in the USA was not equipped to provide timely, high-quality data on COVID-19 health outcomes, and required several disparate efforts to fill this need.3 However, inherent flaws remain in the COVID-19 data reporting system. For example, decision making on how to collect and share COVID-19 data fell to individual US states. Each US state has its own reporting idiosyncrasies (eg, defining what counts as a COVID-19 case or death, whether this definition includes probable cases or deaths, and how to define a probable case or death), limiting comparative analyses across locations. Additionally, artificial spikes or drops in the reported numbers of COVID-19 cases and deaths, which can result from backlogged testing results released from resource-constrained laboratories or batch death certificate reviews conducted by states, occur frequently and irregularly, and affect both the training and evaluation of models that rely on the data. Other COVID-19 data, such as vaccinations, testing, hospitalisations, and genomic surveillance, have their own quality issues, largely because of an inadequate data reporting infrastructure, absence of universal data standards, and sampling bias.3
In addition to data on health outcomes, many modellers have relied on human behavioural data for COVID-19 forecasting and scenario analysis with the aim of predicting transmission patterns more accurately, in particular at points when dynamics are rapidly changing. However, it is difficult to collect real-time behavioural data because human behaviour is inherently hard to track. Some COVID-19 risk-reduction behaviours were captured through surveys administered on Facebook, which represents a substantial step forward in collecting open and timely behavioural data; however, these data still have sampling and self-reporting bias, and data collection ended on June 25, 2022.4
New variants have also played a considerable role in surges in the number of COVID-19 cases and deaths worldwide. To this end, increased genomic surveillance has the potential to inform and improve predictions. As of Dec 31, 2021, only 5% of cases in the USA had been sequenced, compared with more than 50% in other countries, including the UK, Iceland, and Australia.5 To give modellers the best chance of success, we need to invest in a data system that provides open, timely, and standardised data at a high spatial and temporal resolution.
Because of the uncertainty and fear surrounding an outbreak of unprecedented scale, modelling results were sensationalised by the media and skewed to serve predetermined political purposes. Given that the misunderstanding of scientific findings can have serious consequences, modellers have a responsibility to facilitate appropriate interpretation of their work. Modellers must be explicit in stating how assumptions and limitations should shape interpretation, and conduct transparent reporting as outlined in Pollett and colleagues’ EPIFORGE guidelines.6 Additionally, modellers should be trained to communicate directly with the media to better explain the science and to help manage the corresponding public health messaging.
Models can also guide public health policy. To inform decision makers, the best approach is often direct collaboration with modellers. These mutually beneficial relationships allow modellers to better understand the needs of decision makers and help all stakeholders to better understand the details and limitations of epidemic and pandemic modelling. In addition, documenting the process of sharing models with decision makers is crucial to advance knowledge of best practices for science translation.7
One aspect of modelling that could be redesigned for easier interpretation and use by various stakeholders and the public is the selection of prediction targets. These targets have predominantly been the numbers of incident cases and deaths, despite poor forecast performance for these data during crucial moments for decision making, as discussed in the International Institute of Forecasters blog. Simpler and more interpretable targets that still convey useful information should be considered as alternatives. One example is a categorical target that predicts whether an indicator (eg, cases, deaths, or hospitalisations) in a future period will be in a state of rapid growth, moderate growth, no change, moderate decline, or rapid decline. Predicting a broader range of targets, especially if some targets allow increased forecast accuracy and reliability, could enhance public trust in modelling and better meet the needs of stakeholders.
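A categorical target of this kind can be sketched in a few lines of Python. The sketch below is illustrative only: the function name and the cut-offs (a relative change of 5% for moderate and 25% for rapid movement) are hypothetical assumptions chosen for the example, not thresholds proposed in this Comment or used by any forecasting hub.

```python
from typing import Sequence

# Hypothetical cut-offs on relative change between periods (assumptions,
# not values taken from this Comment or from any forecasting hub).
MODERATE = 0.05
RAPID = 0.25


def trend_category(recent: Sequence[float], future: Sequence[float]) -> str:
    """Map the relative change between two periods onto five categories.

    `recent` holds counts (eg, daily cases) for the baseline period and
    `future` holds observed or predicted counts for the target period.
    """
    baseline = sum(recent)
    if baseline <= 0:
        raise ValueError("baseline period must have a positive total")
    change = (sum(future) - baseline) / baseline
    if change >= RAPID:
        return "rapid growth"
    if change >= MODERATE:
        return "moderate growth"
    if change > -MODERATE:
        return "no change"
    if change > -RAPID:
        return "moderate decline"
    return "rapid decline"


if __name__ == "__main__":
    last_week = [100] * 7
    next_week = [110] * 7  # a 10% rise over the baseline week
    print(trend_category(last_week, next_week))
```

In practice, the cut-offs would need to be agreed with decision makers, because the boundary between moderate and rapid change depends on the indicator and on local capacity.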
Another crucial aspect of model translation is communicating the range of plausible outcomes instead of point predictions only. Modellers should clearly communicate uncertainty and translate statistical concepts into formats that are interpretable by stakeholders and the public. For example, the 50% and 95% CIs shown on the COVID-19 Forecast Hub often include both upward and downward trends. Without additional explanation, these confidence intervals can be difficult to interpret. One alternative might be for modellers to provide the percentage chance that the trend will be increasing, flat, or decreasing. Clearer communication of uncertainty can build trust in modelling and prevent misuse of models.
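As a rough illustration of this alternative, the Python sketch below converts a set of forecast samples into the percentage chance of each trend direction. It assumes that the model can supply samples of the future value (eg, from simulated trajectories); the function name and the 5% band used to define a flat trend are hypothetical choices made for the example.

```python
from typing import Dict, Sequence

# Hypothetical band: relative changes within +/-5% are treated as "flat".
FLAT_BAND = 0.05


def trend_probabilities(current: float, samples: Sequence[float]) -> Dict[str, float]:
    """Return the percentage of forecast samples in each trend direction.

    `current` is the latest observed value and `samples` are draws of the
    future value from a probabilistic forecast (an assumption of this sketch).
    """
    if current <= 0:
        raise ValueError("current value must be positive")
    if not samples:
        raise ValueError("at least one forecast sample is required")
    counts = {"increasing": 0, "flat": 0, "decreasing": 0}
    for value in samples:
        change = (value - current) / current
        if change > FLAT_BAND:
            counts["increasing"] += 1
        elif change < -FLAT_BAND:
            counts["decreasing"] += 1
        else:
            counts["flat"] += 1
    total = len(samples)
    return {trend: 100.0 * n / total for trend, n in counts.items()}


if __name__ == "__main__":
    draws = [80, 90, 100, 102, 120, 130, 140, 150, 160, 170]
    print(trend_probabilities(100, draws))
    # {'increasing': 60.0, 'flat': 20.0, 'decreasing': 20.0}
```

A statement such as "a 60% chance that cases will rise next week" carries the same information as the interval, but in a form that a non-specialist audience might find easier to act on.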
An important barrier to successful translation of models is the current state of research dissemination. Within 10 months of the first confirmed case, 125 000 COVID-19 scientific articles were shared, with 30 000 of them on preprint servers.8 Preprints excel at quickly sharing new research but do not have the quality assurance traditionally provided by peer review. Additionally, there is evidence that preprints can be misused in harmful ways to spread extremist ideologies and misleading medical information.8 Some of these harms might be mitigated by more transparent reporting on the limitations and proper interpretations of models.6 However, even within the scientific community, the sheer volume of information obstructs efficient synthesis of the literature to establish best practices.9 Efforts to address some of these problems exist, such as recruiting researchers to conduct rapid and publicly available reviews of papers.10 Nevertheless, these disparate efforts (including informal reviews on social media) still leave information scattered and difficult to synthesise. We need to strike a balance between publishing speed and quality, implement safeguards to prevent research from being misused, and develop a more organised, centralised way to vet and disseminate timely information.
Although COVID-19 forecasting and public health responses have been heavily dependent on partnerships with academic research teams, university-based modellers face considerable barriers when choosing to engage in crucial, but time-consuming, translational work (eg, building, maintaining, and communicating modelling results). Extant incentive structures do not recognise these efforts, and instead reward traditional forms of academic achievement (eg, peer-reviewed publications and secured grant funding). The value of this type of translational work needs to be recognised and elevated to sustain the academic community's engagement in real-time outbreak mitigation and maximise its impact. Establishing prestigious awards for outstanding work of this kind and encouraging journals to focus on effective messaging during times of crisis could prompt more publications to address these essential efforts, and more universities to recognise and reward academics accordingly.
For more on the translating data in a pandemic Series see www.thelancet.com/series/translating-data-in-a-pandemic
ECL received payment for expert testimony from Cohen Ziffer Frenchman & McKenna for a report related to COVID-19 epidemiology. KN, SJ, and LG submitted a model to the COVID-19 Forecast Hub. NGR is a coauthor on EPIFORGE 2020 model reporting guidelines, a codirector of the Forecast Hub, has submitted individual models to the Forecast Hub, and served in an advisory role for the US Scenario Modeling Hub. ST is a cofounder and member of the leadership team for the US Scenario Modeling Hub and has submitted individual models to both the Scenario and Forecast Hubs. ECL has submitted models to both the Forecast and Scenario Hubs. KN, SJ, MM, and LG were funded by the National Science Foundation (NSF) Rapid Response Research grants (2108526 and 2028604) and the Centers for Disease Control and Prevention (CDC) SHEPheRD Project (200-2016-91781). NGR has been supported by the CDC (1U01IP001122) and the National Institute of General Medical Sciences (NIGMS; R35GM119582). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIGMS or the National Institutes of Health. KG and FP were funded by the Society for Medical Decision Making Covid Modeling Accelerator and the CDC (U01CK000589). ST has been supported by the NSF (2127976) and the CDC SHEPheRD Project (200-2016-91781). The funders of the study had no role in the conceptualisation or writing of the Comment.
References
- 1. James LP, Salomon JA, Buckee CO, Menzies NA. The use and misuse of mathematical modeling for infectious disease policymaking: lessons for the COVID-19 pandemic. Med Decis Making. 2021;41:379–385. doi: 10.1177/0272989X21990391.
- 2. Cramer EY, Ray EL, Lopez VK, et al. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States. Proc Natl Acad Sci USA. 2022;119. doi: 10.1073/pnas.2113561119.
- 3. Gardner L, Ratcliff J, Dong E, Katz A. A need for open public data standards and sharing in light of COVID-19. Lancet Infect Dis. 2021;21:e80. doi: 10.1016/S1473-3099(20)30635-6.
- 4. Salomon JA, Reinhart A, Bilinski A, et al. The US COVID-19 trends and impact survey: continuous real-time measurement of COVID-19 symptoms, risks, protective behaviors, testing, and vaccination. Proc Natl Acad Sci USA. 2021;118. doi: 10.1073/pnas.2111454118.
- 5. Chen Z, Azman AS, Chen X, et al. Global landscape of SARS-CoV-2 genomic surveillance and data sharing. Nat Genet. 2022;54:499–507. doi: 10.1038/s41588-022-01033-y.
- 6. Pollett S, Johansson MA, Reich NG, et al. Recommended reporting items for epidemic forecasting and prediction research: the EPIFORGE 2020 guidelines. PLoS Med. 2021;18. doi: 10.1371/journal.pmed.1003793.
- 7. Meredith HR, Arehart E, Grantz KH, et al. Coordinated strategy for a model-based decision support tool for coronavirus disease, Utah, USA. Emerg Infect Dis. 2021;27. doi: 10.3201/eid2705.203075.
- 8. Fraser N, Brierley L, Dey G, et al. The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape. PLoS Biol. 2021;19. doi: 10.1371/journal.pbio.3000959.
- 9. Nixon K, Jindal S, Parker F, et al. An evaluation of prospective COVID-19 modeling: from data to science translation. medRxiv. 2022; published online April 19. doi: 10.1101/2022.04.18.22273992 (preprint).
- 10. Besançon L, Peiffer-Smadja N, Segalas C, et al. Open science saves lives: lessons from the COVID-19 pandemic. BMC Med Res Methodol. 2021;21:117. doi: 10.1186/s12874-021-01304-y.