Letter
Lancet Infect Dis. 2020 May 28;21(1):27–28. doi: 10.1016/S1473-3099(20)30437-0

COVID-19 and the difficulty of inferring epidemiological parameters from clinical data

Simon N Wood b, Ernst C Wit a, Matteo Fasiolo b, Peter J Green b
PMCID: PMC7255708  PMID: 32473661

Knowing the infection fatality ratio (IFR) is crucial for epidemic management: for immediate planning, for balancing the life-years saved against those lost to the consequences of management, and for considering the ethics of paying substantially more to save a life-year from the epidemic than from other diseases. Impressively, Robert Verity and colleagues¹ rapidly assembled case data and used statistical modelling to infer the IFR for COVID-19. We have attempted an in-depth statistical review of their paper, eschewing statistical nit-picking, but attempting to identify the extent to which the (necessarily compromised) data are more informative about the IFR than are the modelling assumptions. First, the data.

Individual-level data from outside China appear problematic because different countries have differing levels of ascertainment and different disease-severity thresholds, even for classification as a case. Using these data to estimate the IFR would require country-specific ascertainment parameters in the model, about which we have no information. Consequently, these data provide no useful information on the IFR.

Repatriation flight data provide the sole information on the prevalence in Wuhan, China (excepting the lower bound of confirmed cases). The 689 foreign nationals who were eligible for repatriation are unlikely to be representative of the susceptible population of Wuhan. Hence, it is difficult to see how the six positive cases from this sample can be usefully incorporated.
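
As a rough illustration of how little the repatriation sample pins down even before representativeness is considered, the sketch below computes a Wilson score interval for six positives out of 689. The choice of the Wilson interval and the function name `wilson_interval` are assumptions made for this example, not the method of Verity and colleagues or of the appendix.

```python
import math

def wilson_interval(positives, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    p_hat = positives / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Counts quoted in the letter: six positives among 689 repatriated individuals.
positives, n = 6, 689
p_hat = positives / n
low, high = wilson_interval(positives, n)

print(f"point estimate of prevalence: {p_hat:.4f}")
print(f"approximate 95% interval:     {low:.4f} to {high:.4f}")
```

The interval reflects binomial sampling error only. It says nothing about whether the repatriated group resembles the susceptible population of Wuhan, which is the concern raised above.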

Case mortality data from China provide an upper bound for the IFR, and with extra assumptions these data also supply information on how the IFR varies with age. Because prevalence is unknown, the data contain no information for estimating the absolute IFR magnitude.
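
The upper-bound argument can be written out in one line. The symbols are introduced here for illustration only: D is the number of deaths, C the number of confirmed cases, and I the number of infections, under the simplifying assumption that all deaths from infection are counted among confirmed cases.

```latex
\mathrm{CFR} = \frac{D}{C}, \qquad
\mathrm{IFR} = \frac{D}{I} = \mathrm{CFR}\cdot\frac{C}{I}, \qquad
C \le I \;\Longrightarrow\; \mathrm{IFR} \le \mathrm{CFR}.
```

The factor C/I is the ascertainment fraction. With prevalence unknown, I is unknown, so the case data alone fix the bound but not the IFR itself.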

Because of extensive testing, the outbreak on the Diamond Princess, the quarantined cruise liner (used only for validation by Verity and colleagues), supplies data on infections and symptomatic cases, with fewer ascertainment problems. These data appear directly informative about the IFR, although the comorbidity load on the Diamond Princess is unlikely to fully represent any population of serious interest (perhaps having fewer individuals with very severe but more with mild comorbidities).

Second, the modelling assumptions, in which we see two primary problems. The first problem is that Verity and colleagues correct the Chinese case data by assuming that ascertainment differences across age groups determine case rate differences. Outside of Wuhan, the authors replace observed case data by the cases that would have occurred if each age group had the same per-capita observed case rate as the 50–59 years age group. The authors assume complete ascertainment for this age group. These are very strong modelling assumptions that will greatly affect the results, but the published uncertainty bounds reflect no uncertainty about these assumptions. In Wuhan, the complete ascertainment assumption is relaxed by introducing a parameter, but one for which the data appear uninformative, so the results will be driven by the assumed uncertainty.
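
The correction described for the data outside Wuhan can be sketched as follows. All populations and case counts are hypothetical values chosen for illustration; they come neither from this letter nor from Verity and colleagues, and the variable names are likewise assumptions.

```python
# Hypothetical populations and observed case counts by age group (illustration only).
population = {"30-39": 1_000_000, "50-59": 800_000, "70-79": 300_000}
observed_cases = {"30-39": 400, "50-59": 640, "70-79": 180}

# The reference group is assumed to be completely ascertained.
reference = "50-59"
reference_rate = observed_cases[reference] / population[reference]

# Replace observed cases by the cases each group would have had
# at the reference group's per-capita observed case rate.
adjusted_cases = {age: reference_rate * population[age] for age in population}

# The ascertainment fraction this assumption implies for each group.
implied_ascertainment = {
    age: observed_cases[age] / adjusted_cases[age] for age in population
}

for age in population:
    print(age, round(adjusted_cases[age]), round(implied_ascertainment[age], 2))
```

Every adjusted count is fixed once the reference group and its complete-ascertainment assumption are chosen. The data never test that choice, so an interval computed downstream of it carries no uncertainty about it.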

The second problem is that, generically, Bayesian models describe uncertainty both in the data and in prior beliefs about the studied system. Only when data are informative about the targets of modelling can we be sure that prior beliefs play a small role in what the model tells us about the world. In this case, the data are especially uninformative: we suspect that the results are mostly a consequence of the prior beliefs.
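
A toy conjugate Beta-binomial calculation shows the general point about prior dominance. It is not the model used by Verity and colleagues or in the appendix; the priors, the data sets, and the function name `posterior_mean` are hypothetical.

```python
def posterior_mean(prior_a, prior_b, events, trials):
    """Posterior mean of a proportion under a Beta(prior_a, prior_b) prior
    after observing `events` successes in `trials` binomial trials."""
    return (prior_a + events) / (prior_a + prior_b + trials)

# Two hypothetical priors that disagree by a factor of four (means 0.5% and 2%).
priors = {"prior mean 0.5%": (1, 199), "prior mean 2%": (4, 196)}

# Hypothetical data sets: one weakly informative, one strongly informative.
datasets = {"weak data": (0, 20), "strong data": (100, 10_000)}

results = {}
for data_name, (events, trials) in datasets.items():
    for prior_name, (a, b) in priors.items():
        results[(data_name, prior_name)] = posterior_mean(a, b, events, trials)
        print(f"{data_name:11s} | {prior_name:15s} | "
              f"posterior mean = {results[(data_name, prior_name)]:.4f}")
```

With the weak data set the two posterior means remain about as far apart as the priors were. With the strong data set they nearly coincide, which is the situation in which prior beliefs can safely be said to play a small role.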

Taken together, these problems indicate that Verity and colleagues' IFRs should be treated very cautiously when planning epidemic management. While awaiting actual measurements, we would base IFRs on the Diamond Princess outbreak data, with the Chinese case-fatality data informing the dependence of IFR on age. We have included a crude Bayesian model with its IFR estimates by age in the appendix. IFR estimates for corresponding populations are China 0·43% (95% credible interval 0·23–0·65), UK 0·55% (0·30–0·82), and India 0·20% (0·11–0·30). The strong assumptions that this approach also requires emphasise the need for improved data. We should replace complex models of inadequate clinical data with simpler models of epidemiological prevalence data from appropriately designed random sampling using antibody or PCR tests.
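
One way a single set of age-specific IFRs can give different population-level figures is through age structure: the population IFR is then a weighted average of the age-specific values, weighted by each age group's share of infections. The sketch below assumes infections are spread in proportion to population share. Whether the appendix proceeds exactly this way is an assumption, and every number here is hypothetical.

```python
def population_ifr(ifr_by_age, share_by_age):
    """Age-weighted IFR, assuming infections are distributed in proportion
    to each age group's share of the population."""
    if abs(sum(share_by_age.values()) - 1.0) > 1e-9:
        raise ValueError("age shares must sum to 1")
    return sum(ifr_by_age[age] * share_by_age[age] for age in ifr_by_age)

# Hypothetical age-specific IFRs (as proportions), for illustration only.
ifr_by_age = {"0-39": 0.0002, "40-69": 0.003, "70+": 0.03}

# Two hypothetical age structures.
older_population = {"0-39": 0.45, "40-69": 0.40, "70+": 0.15}
younger_population = {"0-39": 0.70, "40-69": 0.26, "70+": 0.04}

ifr_older = population_ifr(ifr_by_age, older_population)
ifr_younger = population_ifr(ifr_by_age, younger_population)

print(f"older population:   {100 * ifr_older:.2f}%")
print(f"younger population: {100 * ifr_younger:.2f}%")
```

Under these made-up inputs the older population has the higher overall IFR even though the age-specific values are identical, so population-level figures are only comparable once age structure is taken into account.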

Acknowledgments

We declare no competing interests.

Supplementary Material

Supplementary appendix
mmc1.pdf (254.4KB, pdf)

Reference

  1. Verity R, Okell LC, Dorigatti I. Estimates of the severity of coronavirus disease 2019: a model-based analysis. Lancet Infect Dis 2020; published online March 30. doi: 10.1016/S1473-3099(20)30243-7.


Articles from The Lancet. Infectious Diseases are provided here courtesy of Elsevier
