
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2020 Mar 30:2020.03.26.20044693. [Version 1] doi: 10.1101/2020.03.26.20044693

Why estimating population-based case fatality rates during epidemics may be misleading

Lucas Böttcher 1,*, Mingtao Xia 2, Tom Chou 1,2
PMCID: PMC7276002  PMID: 32511575

Abstract

Different ways of calculating mortality ratios during epidemics can yield widely different results, particularly during the COVID-19 pandemic. We formulate both a survival probability model and an associated infection duration-dependent SIR model to define individual- and population-based estimates of dynamic mortality ratios. The key parameters that affect the dynamics of the different mortality estimates are the incubation period and the length of time individuals were infected before confirmation of infection. We stress that none of these ratios are accurately represented by the often misinterpreted case fatality ratio (CFR), the number of deaths to date divided by the total number of infected cases to date. Using available data on the recent SARS-CoV-2 outbreaks and simple assumptions, we estimate and compare the different dynamic mortality ratios and highlight their differences. Informed by our modeling, we propose a more systematic method to determine mortality ratios during epidemic outbreaks and discuss sensitivity to confounding effects and errors in the data.

I. INTRODUCTION

The mortality ratio is a key metric describing the severity of a viral disease. It changes in time and can be measured in a number of ways during an epidemic. One common metric is the case fatality ratio (CFR), found by dividing the total number of deaths to date, D(t), by the total number of all cases to date N(t) [1, 2].

In the recent outbreaks of SARS-CoV-2, the CFR has been estimated from aggregated population data. We show examples of CFR curves in Fig. 1 and in the Supplemental Information (SI). As of March 26, 2020, the global CFR(t) = D(t)/N(t) = 21,306/472,762 ≈ 4.5% [3], while CFRs in individual regions vary significantly. Clearly, this estimate would correspond to an actual mortality ratio only if all the remaining unresolved individuals recover. However, some of these unresolved cases will lead to death and thus to a gradual increase of the estimated mortality ratio over time. Although this type of population-based measurement underestimates mortality, it is still commonly used by health officials and is often inconsistently defined as deaths/(deaths + recovereds), even though the difference between the two definitions has been clearly established [4].

FIG. 1. Mortality-ratio estimates.

(a) Evolution of the cumulative number of infected (red), death (black), and recovered (green) cases. The size of the circles indicates the number of cases in the respective compartments on a certain day. (b–c) Estimates of mortality ratios (see Eqs. (8) and (14)) of SARS-CoV-2 infections in China and Italy. The “delayed” mortality-ratio estimate CFRd corresponds to the number of deaths to date divided by the total number of cases at time t − τres. Many studies use CFRd, although this metric underestimates the individual-based mortality (defined below). Another population-based mortality ratio is Mp(t), the number of deaths divided by the sum of death and recovered cases, up to time t. The data are based on Ref. [5].

During the 2003 severe acute respiratory syndrome (SARS) outbreak in Hong Kong, the World Health Organization (WHO) also used the aforementioned estimate to obtain a CFR of 4.5% while the final values approached 17.0% [6, 7]. For the ongoing SARS-CoV-2 outbreaks, analyses by WHO and other institutions still use the CFR = D(t)/N(t) metric (see Table I). Since actual mortality probabilities are important measures for assessing the risks associated with epidemic outbreaks, the typical underestimation by CFRs may lead to insufficient countermeasures and a more severe epidemic [8, 9].

TABLE I.

Different CFR estimates of COVID-19.

Reference | CFR
Xu et al. [1, 11] and Mahase [12] | 2%
Wu et al. [2] | 0.1–1% (outside Wuhan)
World Health Organization [13, 14] | 2–4%

An unambiguous, physiologically-based definition of mortality ratio is the probability that a single, newly infected individual will eventually die of the disease. If there are sufficient individual-level or cohort data, these probabilities can be further stratified according to patient age, gender, health condition, etc. [10]. The mortality ratio or probability of death should be an intrinsic property of the virus and the infected individual, depending on age, health, access to health care, etc. This intrinsic probability ought not to be directly dependent on the population-level dynamics of infected and recovered individuals. Thus, it can in principle be framed by a model for the survival probability of a single infected individual. Whether this individual infects others does not directly affect their probability of eventually dying. In section IIA, we derive a model describing the probability M1(t) that an infected individual has died by time t, conditioned on the infection having resolved (through death or recovery) by then. Importantly, this model incorporates the duration of infection (including an incubation period) before a patient tests positive at time t = 0.

However, as mentioned above, the CFR and other mortality measures are typically reported based on population data. Do these population-based measures, including CFR, provide reasonable measures of the probability of death of an individual? In section IIB, we describe how mortality ratios are defined within population-level models, specifically, a disease duration-structured SIR model. We will show that population-based estimates are typically not a meaningful measure of mortality, but that under simplifying assumptions, the mortality ratio Mp(t) is more closely related to the number of deaths to date divided by the number of dead plus the number of recovered individuals to date [4]. In the simplest approximation, the mortality ratio is currently (as of March 26, 2020) 21,306/(21,306+114,749) ≈ 15.7% [3], significantly higher than the March 26, 2020 CFR≈ 4.5% estimate.
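The two headline numbers quoted above can be reproduced directly from the global counts cited for March 26, 2020. The following is a minimal sketch; the function names `cfr` and `resolved_mortality` are illustrative and not taken from the paper.

```python
def cfr(deaths, cases):
    """Case fatality ratio D(t) / N(t)."""
    return deaths / cases


def resolved_mortality(deaths, recovered):
    """Population-based ratio D(t) / (D(t) + R(t)) over resolved cases only."""
    return deaths / (deaths + recovered)


# Global counts quoted in the text for March 26, 2020 [3].
DEATHS, CASES, RECOVERED = 21_306, 472_762, 114_749

global_cfr = cfr(DEATHS, CASES)                    # about 4.5%
global_mp = resolved_mortality(DEATHS, RECOVERED)  # about 15.7%
```

The gap between the two values comes entirely from the unresolved cases, which appear in the CFR denominator but not in the denominator of the resolved-case ratio.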

We use the same estimates for the rate parameters in our individual and population models to compute the different mortality ratios. Note that in general, both the individual mortality probability M1(t) and the population-based estimates Mp(t) depend on the time of measurement t. By critically analyzing these estimates and another ad hoc “delayed” ratio CFRd, we illustrate and interpret the differences among these measures and discuss how changes or uncertainty in the data affect them. In section III, we summarize our results and identify a correction factor to transform population-level mortality estimates into individual mortality probabilities.

II. RESULTS

A. Intrinsic individual mortality rate

Consider an individual that, at the time of positive testing (t = 0), had been infected for a duration τ1. A “survival” probability density can be defined such that P(τ,t|τ1)dτ is the probability that the patient is still alive and infected (not recovered) at time t > 0 and has been infected for a duration between τ = t+τ1 and τ +dτ. Since τ1 is unknown, it must be estimated or averaged over. The individual survival probability evolves according to

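$$\frac{\partial P(\tau,t|\tau_1)}{\partial t} + \frac{\partial P(\tau,t|\tau_1)}{\partial \tau} = -\left(\mu(\tau,t|\tau_1) + c(\tau,t|\tau_1)\right) P(\tau,t|\tau_1), \tag{1}$$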

where the death and recovery rates, μ(τ,t|τ1) and c(τ,t|τ1), depend explicitly on the duration of infection at time t and implicitly on patient health and age a [15]. They may also depend explicitly on time t to reflect changes in clinical policy or available health care. For example, enhanced medical care may decrease the death rate μ, giving the individual’s intrinsic physiological processes a chance to cure the patient. These intrinsic individual-based death and recovery rates do not directly depend on population-level viral transmission.

Equation (1), assuming an initial condition of one particular individual who has been infected for time τ1 at the time of positive test, can be solved using the method of characteristics. From the solution P(τ,t|τ1) one can derive the probabilities of death and recovery by time t as

$$P_d(t|\tau_1) = \int_0^t ds\, \mu(\tau_1+s,s)\, P(\tau_1+s,s|\tau_1), \qquad P_r(t|\tau_1) = \int_0^t ds\, c(\tau_1+s,s)\, P(\tau_1+s,s|\tau_1). \tag{2}$$

The probability that an individual died before time t, conditioned on resolution (either death or recovery), is then defined as

$$M_1(t|\tau_1) = \frac{P_d(t|\tau_1)}{P_d(t|\tau_1) + P_r(t|\tau_1)}, \tag{3}$$

where we have explicitly indicated the dependence on the duration of the infection prior to confirmation of infection. These formulae also depend on all other relevant patient attributes such as age, accessibility to health care, etc. In the long-time limit, when resolution has occurred (Pd(∞) + Pr(∞) = 1), the individual mortality ratio is simply M1(∞) = Pd(∞). This result relies only on intrinsic individual rate parameters and is completely independent of disease transmission at the population level. In order to capture the dependence of death and recovery rates on the time an individual has been infected, we propose a constant recovery rate c and a simple piece-wise constant death rate μ(τ|τ1) that is not explicitly a function of time t:

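$$c(\tau,t|\tau_1) = c, \qquad \mu(\tau|\tau_1) = \begin{cases} 0, & \tau \le \tau_{\rm inc}, \\ \mu_1, & \tau > \tau_{\rm inc}, \end{cases} \tag{4}$$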

where τinc is the incubation-time parameter, the time after infection during which an individual remains asymptomatic. During this incubation period, the patient has zero death rate but can recover by clearing the virus. In other words, some patients fully recover without ever developing serious symptoms.

For coronavirus infections, the incubation period appears to be highly variable with a mean of τinc ≈ 6.4 days [17]. We can estimate μ1 and c using individual patient data where 19 patients (outside Hubei) had been tracked from the date on which their first symptoms occurred until the disease resolved [16].

Two out of 19 patients died, on average, 20.5 days after first symptoms occurred and the mean recovery time of the remaining 17 patients is 16.8 days. We show the recovery-time distribution in Fig. 2(a). Since we know that the mortality ratio in this dataset is 2/19, we can determine the dependence between μ1 and c according to μ1/(μ1 +c) ≈ 2/19 (or c/μ1 ≈ 8.5). The constant recovery and after-incubation period death rates [18] are estimated to be

$$c = \frac{1}{20.5}/\text{day} \approx 0.049/\text{day} \quad\text{and}\quad \mu_1 = c/8.5 \approx 0.006/\text{day}. \tag{5}$$

FIG. 2. Individual mortality ratio.

(a) Recovery time after first symptoms occurred based on individual data of 17 patients [16]. (b) Death and recovery rates as defined in Eq. (4). The death rate μ(τ1) approaches μ1 for τ1 > τinc, where τinc is the incubation period and τ1 is the time the patient has been infected before first being tested positive. (c) The individual mortality ratio M1(t|τ1) for τinc = 6.4 days at different values of τ1. Note that the individual death probability Pd(t|τ1) and M1(t|τ1) are nonzero only after t > τinc − τ1. (d) The asymptotic individual mortality ratio M1(∞) (see Eq. (3)) as a function of τ1.

Using these numbers, the recovery and death rate functions c(τ,t|τ1) and μ(τ|τ1) are plotted as functions of τ in Fig. 2(b). We show the evolution of M1(t|τ1) at different values of τ1 in Fig. 2(c). The corresponding long-time limit M1(∞) is readily apparent in Fig. 2(d): for τ1 ≥ τinc, M1(∞) = μ1/(μ1+c) ≈ 0.105, while M1(∞) < μ1/(μ1+c) when τ1 < τinc. The smaller expected mortality associated with early identification of infection arises from the remaining incubation time during which the patient has a chance to recover without possibility of death. When conditioned on testing positive at or after the incubation period, the patient immediately suffers a positive death rate, increasing their M1(∞).
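For the piecewise-constant rates of Eq. (4), Eq. (1) can be integrated by hand along τ = t + τ1, which gives closed-form expressions for Pd, Pr, and M1. The sketch below is our own integration under that assumption and is not quoted from the SI; the helper names (`death_recovery_probs`, `m1`) and the convention M1 = 0 when nothing has resolved yet are illustrative choices.

```python
import math

C = 1 / 20.5     # recovery rate per day, Eq. (5)
MU1 = C / 8.5    # post-incubation death rate per day, Eq. (5)
TAU_INC = 6.4    # incubation period in days


def death_recovery_probs(t, tau1, c=C, mu1=MU1, tau_inc=TAU_INC):
    """Return (P_d, P_r) at time t after a positive test at infection age tau1."""
    t0 = max(0.0, tau_inc - tau1)  # incubation time still remaining at t = 0
    if t <= t0:
        # Still incubating: only recovery is possible.
        return 0.0, 1.0 - math.exp(-c * t)
    survive_inc = math.exp(-c * t0)                     # not recovered by t0
    resolved = 1.0 - math.exp(-(mu1 + c) * (t - t0))    # resolved after t0
    p_d = survive_inc * mu1 / (mu1 + c) * resolved
    p_r = (1.0 - survive_inc) + survive_inc * c / (mu1 + c) * resolved
    return p_d, p_r


def m1(t, tau1, **rates):
    """Individual mortality ratio M1(t | tau1) of Eq. (3)."""
    p_d, p_r = death_recovery_probs(t, tau1, **rates)
    total = p_d + p_r
    return p_d / total if total > 0.0 else 0.0  # convention: 0 if unresolved


late_test = m1(1000.0, tau1=TAU_INC)   # tends to mu1 / (mu1 + c) = 2/19
early_test = m1(1000.0, tau1=0.0)      # reduced by exp(-c * tau_inc)
```

In this sketch the long-time limit is exp(−c·max(0, τinc − τ1))·μ1/(μ1 + c), which reproduces the plateau at about 0.105 for τ1 ≥ τinc and the smaller values for earlier testing described above.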

Finally, in order to infer M1 (and also indirectly μ and c) during an outbreak, a number of statistical issues must be considered. First, if the outbreak is ongoing, there may not be sufficient long-time cohort data. Second, τ1 is unknown. Since testing typically occurs at the onset of symptoms, most positive patients will have been infected a few days earlier. The uncertainty in τ1 can be represented by a probability density ρ(τ1) for the individual. The expected mortality can then be constructed as an average over ρ(τ1):

$$\bar{M}_1(t) = \frac{\bar{P}_d(t)}{\bar{P}_d(t) + \bar{P}_r(t)}, \tag{6}$$

where $\bar{P}_d(t)$ and $\bar{P}_r(t)$ are the τ1-averaged death and cure probabilities. Note that this averaging differs from the population-level average $\int_0^\infty M_1(t|\tau_1)\rho(\tau_1)\,d\tau_1$, which would describe the average of mortality ratios over a population with heterogeneous initial durations τ1.

Some properties of the distribution ρ(τ1) can be inferred from the behavior of patients. Before symptoms arise, only very few patients will know they have been infected, seek medical care, and get their case confirmed (i.e., ρ(τ1) ≈ 0 for τ1 ≈ 0). The majority of patients will contact hospitals/doctors when they have been infected for a duration of τinc. The distribution ρ(τ1) thus reaches its maximum near or shortly after τinc. Since patients are most likely to test positive after experiencing symptoms, we choose a gamma distribution

$$\rho(\tau_1;n,\gamma) = \frac{\gamma^n}{\Gamma(n)} \tau_1^{n-1} e^{-\gamma\tau_1} \tag{7}$$

with shape parameter n = 8 and rate parameter γ = 1.25/day so that the mean n/γ is equal to τinc = 6.4 days.

Upon using the rates in Eqs. (4) and averaging over ρ(τ1), we derived expressions for $\bar{P}(t)$, $\bar{P}_d(t)$, and $\bar{P}_r(t)$, which are given explicitly in the SI. Using the values in Eq. (5), we find an expected individual mortality ratio $\bar{M}_1(t)$ (plotted in Fig. 3) and its asymptotic value $\bar{M}_1(\infty) = \bar{P}_d(\infty) = 0.101$. Of course, it is also possible to account for more complex time-dependent forms of c and μ1 [19], but we will primarily use Eqs. (4) in our subsequent analyses.
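The asymptotic value of about 0.101 can be checked numerically without the SI expressions. The sketch below assumes the long-time death probability for a fixed τ1 is exp(−c·max(0, τinc − τ1))·μ1/(μ1 + c), which is our own integration of Eq. (1) with the rates of Eq. (4), and averages it over the gamma density of Eq. (7) with a plain trapezoidal rule. Function names and the integration cut-off are illustrative.

```python
import math


def gamma_pdf(tau1, n=8, rate=1.25):
    """Gamma density rho(tau1; n, gamma) of Eq. (7)."""
    if tau1 <= 0.0:
        return 0.0
    return rate**n / math.gamma(n) * tau1 ** (n - 1) * math.exp(-rate * tau1)


def averaged_asymptotic_mortality(c, mu1, tau_inc, upper=60.0, steps=60_000):
    """Trapezoidal estimate of the tau1-averaged P_d(infinity)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        tau1 = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        remaining_incubation = max(0.0, tau_inc - tau1)
        p_d_inf = math.exp(-c * remaining_incubation) * mu1 / (mu1 + c)
        total += weight * gamma_pdf(tau1) * p_d_inf
    return total * h


c = 1 / 20.5
mu1 = c / 8.5
m1_bar_inf = averaged_asymptotic_mortality(c, mu1, tau_inc=6.4)
```

Because every case eventually resolves, the averaged death probability at long times equals the averaged mortality ratio of Eq. (6), so `m1_bar_inf` is directly comparable with the quoted value.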

FIG. 3. Population-level mortality-ratio estimates.

Outbreak evolution and mortality ratios without containment measures (a,c) and with quarantine (b,d). The curves are based on numerical solutions of Eqs. (9) using the initial condition I(τ,0) = ρ(τ;8,1.25) (see Eq. (7)). The death and recovery rates are defined in Eqs. (4) and (5). We use a constant infection rate β1S(0) = 0.158/day, which we estimated from the basic reproduction number of SARS-CoV-2 [17]. To model quarantine effects, we set β1 = 0 for t > 50. We show the mortality-ratio estimates Mp0(t) and Mp1(t) (see Eq. (14)) and CFRd(t,τres) (see Eqs. (8), (11), (12), and (14)).

In the next section, we define population-based estimates for mortality ratios, Mp(t), and explore how they can be computed using SIR-type models. By comparing M1(t) to Mp(t), we gain insight into whether population-based metrics are good proxies for individual mortality ratios. We will outline the mathematical differences and additional errors that confound population-level estimates.

B. Infection duration-dependent SIR model

While individual mortalities can be estimated by tracking many individuals from infection to recovery or death, oftentimes, the available data are not resolved at the individual level and only total populations are given. Typically, one has the total number of cases accumulated up to time t, N(t), the number of deaths to date D(t), and the number of cured/recovered patients to date R(t) (see Fig. 1). The CFR is simply D(t)/N(t). Note that N(t) includes unresolved cases and that N(t) ≥ R(t)+D(t). Resolution (death or recovery) of all patients, N(∞) = R(∞)+D(∞), occurs only well after the epidemic passes.

A variant of the CFR commonly used in the literature [1, 2] is the delayed CFR

$$\mathrm{CFR}_d(t,\tau_{\rm res}) = \frac{D(t)}{N(t-\tau_{\rm res})}, \tag{8}$$

where τres is a corresponding time lag that accounts for the duration from the day when first symptoms occurred to the day of cure/death. Many estimates of the COVID-19 mortality ratio assume that τres = 0 [1, 2] and thus underestimate the number of death cases D(t) that result from a certain number of infected individuals. Similar underestimations using CFRd have been reported in previous epidemic outbreaks of SARS [4, 6] and Ebola [20].
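As a small illustration of Eq. (8), the sketch below evaluates the delayed CFR on cumulative daily series. The function name `cfr_delayed` and the counts are made up for illustration and are not real outbreak data.

```python
def cfr_delayed(deaths, cases, t, tau_res=0):
    """CFR_d(t, tau_res) = D(t) / N(t - tau_res) for cumulative daily series."""
    if tau_res < 0 or t - tau_res < 0:
        raise ValueError("need 0 <= tau_res <= t")
    denominator = cases[t - tau_res]
    if denominator == 0:
        raise ValueError("no cases recorded at time t - tau_res")
    return deaths[t] / denominator


# Made-up cumulative counts for a growing outbreak (not real data).
cases = [10, 20, 40, 80, 160]
deaths = [0, 0, 1, 2, 4]

plain_cfr = cfr_delayed(deaths, cases, t=4)              # 4 / 160
lagged_cfr = cfr_delayed(deaths, cases, t=4, tau_res=2)  # 4 / 40
```

While case numbers are still growing, the lagged denominator is smaller than the current one, so the delayed estimate is larger than the undelayed CFR.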

Alternatively, a simple and interpretable population-level mortality ratio is Mp(t) = D(t)/(R(t)+D(t)), the death ratio of all resolved cases. To provide a concrete model for D(t) and R(t), and hence Mp(t), we will use a variant of the standard infection duration-dependent susceptible-infected-recovered (SIR)-type model described by [21]

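$$\frac{dS(t)}{dt} = -S(t)\int_0^\infty d\tau\, \beta(\tau,t) I(\tau,t), \qquad \frac{\partial I(\tau,t)}{\partial t} + \frac{\partial I(\tau,t)}{\partial \tau} = -\left(\mu(\tau,t) + c(\tau,t)\right) I(\tau,t), \tag{9}$$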

and $dR(t)/dt = \int_0^\infty d\tau\, c(\tau,t) I(\tau,t)$, where S(t) is the number of susceptibles, I(τ,t) is the density of individuals at time t who have been infected for time τ, and R(t) is the number of recovered individuals. The rate at which an individual infected for time τ at time t transmits the infection to a susceptible is denoted by β(τ,t)S(t).

Note that the equation for I(τ,t) is identical to the equation for the survival probability described by Eq. (1). It is also equivalent to McKendrick age-structured models [22, 23] [24]. Infection of susceptibles is described by the boundary condition

$$I(\tau=0,t) = S(t)\int_0^\infty d\tau\, \beta(\tau,t) I(\tau,t), \tag{10}$$

which is similar to that used in age-structured models to represent birth [22]. Finally, we use an initial condition consistent with the infection duration density given by Eq. (7): I(τ,0) = ρ(τ; n = 8, γ = 1.25). Note that Eq. (10) assumes that all newly infected individuals are immediately identified; i.e., these newly infected individuals start with τ1 = 0. After solving for the infected population density, the total number of deaths and recoveries to date can be found via

$$D_0(t) = \int_0^t dt' \int_0^\infty d\tau\, \mu(\tau,t') I(\tau,t'), \qquad R_0(t) = \int_0^t dt' \int_0^\infty d\tau\, c(\tau,t') I(\tau,t'). \tag{11}$$

The corresponding total number of cases N(t) in Eq. (8) is

$$N_0(t) = R_0(t) + D_0(t) + \int_0^\infty d\tau\, I(\tau,t). \tag{12}$$

In the definitions of D0(t), R0(t), and N0(t), we account for all possible death and recovery cases to date (see SI) and assume that newly infected individuals are immediately identified. We use these case numbers as approximations of the reported case numbers to study the evolution of mortality-ratio estimates. Mortality ratios based on these numbers underestimate the actual individual mortality M1 (see section IIA) since they involve individuals that have been infected for different durations τ, particularly recently infected individuals who have not yet died.

An alternative way to compute populations is to exclude the newly infecteds and consider only the initial cohort. The corresponding populations in this case are defined as

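$$D_1(t) = \int_0^t dt' \int_{t'}^\infty d\tau\, \mu(\tau,t') I(\tau,t'), \qquad R_1(t) = \int_0^t dt' \int_{t'}^\infty d\tau\, c(\tau,t') I(\tau,t'). \tag{13}$$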

Since D1(t) and R1(t) do not include infecteds with τ < t, they exclude the effect of newly infected individuals, but may yield more accurate mortality ratios as they are based on an initial cohort of individuals in the distant past. The infections that occur after t = 0 contribute only to I(τ < t,t); thus, D1(t) and R1(t) do not depend on the transmission rate β or the number of susceptibles S(t). Note that all the populations derived above implicitly average over ρ(τ1;n,γ) for the first cohort of identified infecteds (but not subsequent infecteds). Moreover, the population density I(τ ≥ t,t) follows the same equation as $\bar{P}(t|\tau_1)$ provided the same ρ(τ1;n,γ) is used in their respective calculations.

The two different ways of partitioning populations (Eqs. (11) and (13)) lead to two different population-level mortality ratios

$$M_p^0(t) = \frac{D_0(t)}{D_0(t) + R_0(t)} \quad\text{and}\quad M_p^1(t) = \frac{D_1(t)}{D_1(t) + R_1(t)}. \tag{14}$$

Since the populations D0(t) and R0(t), and hence Mp0(t), depend on disease transmission through β(τ,t) and S(t), we expect Mp0(t) to carry a different interpretation from M1(t) and Mp1(t).

In the special case in which μ and c are constants, the time-integrated populations $\int_0^t dt' \int_0^\infty d\tau\, I(\tau,t')$ and $\int_0^t dt' \int_{t'}^\infty d\tau\, I(\tau,t')$ factor out of Mp0(t) and Mp1(t), rendering them time-independent and

$$M_p^{0,1} = \frac{\mu_1}{\mu_1 + c} = M_1. \tag{15}$$

Thus, only in the special time-homogeneous case do both population-based mortality ratios become independent of the population (and transmission β) and coincide with the individual death probability.
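The cancellation behind Eq. (15) can be written out in one line. As a sketch for the case of constant rates μ(τ,t) = μ1 and c(τ,t) = c, and writing $\mathcal{I}_0(t)$ for the time-integrated infected population (a shorthand introduced here, not a symbol from the paper), Eqs. (11) and (14) give

```latex
\begin{aligned}
\mathcal{I}_0(t) &\equiv \int_0^t dt' \int_0^\infty d\tau\, I(\tau,t'), \\
D_0(t) &= \mu_1\, \mathcal{I}_0(t), \qquad R_0(t) = c\, \mathcal{I}_0(t), \\
M_p^0(t) &= \frac{\mu_1\, \mathcal{I}_0(t)}{\mu_1\, \mathcal{I}_0(t) + c\, \mathcal{I}_0(t)}
          = \frac{\mu_1}{\mu_1 + c}.
\end{aligned}
```

The same steps apply to Mp1(t) with the inner integral restricted to τ ≥ t', as in Eq. (13). Once μ depends on τ, the death rate can no longer be pulled out of the integral and the cancellation fails.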

To illustrate the differences between M1(t), Mp0,1(t), and CFRd(t,τres) in more general cases, we use the simple death and cure rate functions given by Eqs. (4) in solving Eqs. (1) and (9). For β(τ,t) in Eq. (10), we account for incubation effects by neglecting transmission during the asymptomatic incubation period (τ ≤ τinc) and assume

$$\beta(\tau,t) = \begin{cases} 0, & \tau \le \tau_{\rm inc}, \\ \beta_1, & \tau > \tau_{\rm inc}. \end{cases} \tag{16}$$

We use the estimated basic reproductive number $R_0 = \beta_1 S(0)/(\mu_1 + c) \approx 2.91$ [17] to fix $\beta_1 S(0) = (\mu_1 + c) R_0 \approx 0.158$/day. We also first assume that the susceptible population does not change appreciably before quarantine and set S(t) = S(0). Thus, we only need to solve for I(τ,t) in Eqs. (9) and (10). We solve Eqs. (9) and (10) numerically (see the Methods section for further details) and use these numerical solutions to compute D0,1(t), R0,1(t), and N0,1(t) (see Fig. 3(a) and (b)), which are then used to evaluate Eqs. (14) and CFRd(t,τres). To determine a realistic value of the time lag τres, we use data on death/recovery periods of 36 tracked patients [16] and find that patients recover/die, on average, τres = 16.5 days after first symptoms occurred.
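The transmission parameter follows from the rates of Eq. (5) and the quoted reproduction number by simple arithmetic. A minimal sketch, with an illustrative function name:

```python
def transmission_rate(r0, mu1, c):
    """Return beta_1 * S(0) implied by R0 = beta_1 * S(0) / (mu1 + c)."""
    return (mu1 + c) * r0


c = 1 / 20.5     # recovery rate per day, Eq. (5)
mu1 = c / 8.5    # post-incubation death rate per day, Eq. (5)

beta1_s0 = transmission_rate(2.91, mu1, c)  # roughly 0.158 per day
```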

We show in Figs. 3(c) and (d) that Mp1(t) approaches the individual mortality ratio $\bar{M}_1(\infty) \approx 0.1$ of section IIA. This occurs because the models for P(τ,t) and I(τ,t) are equivalent and we assumed that the initial distribution of τ for both quantities is given by ρ(τ;8,1.25). However, the population-level mortality ratios CFRd(t,τres) and Mp0(t) also take into account recently infected individuals who may recover before symptoms. This difference yields different mortality ratios because newly infecteds are implicitly assumed to be detected immediately and all have τ1 = 0. Thus, the underlying infection-time distribution is not the same as that used to compute Mp1(t) (see SI for further details). The mortality ratios CFRd(t,τres) and Mp0(t) should not be used to quantify the individual mortality probability of individuals who tested positive after their incubation period. During the course of an outbreak, the measures CFRd(t,τres) and Mp0(t) are subject to another confounding influence. Since D(t), R(t), and N(t) do not change with the same rates at the same time, these population-level mortality estimates only reach their steady state after sufficiently long times (see Fig. 3(c) and (d)).

To summarize, we described two confounding factors that complicate the direct use of population-level mortality ratios to estimate individual mortality probabilities. First, infection-time distributions ρ(τ;n,γ) that are meaningful on an individual level may not correspond to those in population-level data. Second, population-level mortality ratios are often time-dependent and most informative only in the steady state after the outbreak has stopped.

The evolution of the mortality ratios in Fig. 3 qualitatively resembles the behavior of the mortality-ratio estimates in Fig. 1. As shown in Fig. 1, the population-based estimates for coronavirus vary, decreasing in time for China but fluctuating for Italy. These changes could result from changing practices in data collection, or from explicitly time-inhomogeneous parameters μ(τ,t), c(τ,t), and/or β(τ,t).

Although population-level quarantining does not directly affect the individual mortality M1(t|τ1) or M¯1(t), it can be easily incorporated into the SIR-type population dynamics equations through changes in β(τ,t)S(t). For example, we have set S(t > tq) = 0 to represent implementation of a quarantine after tq = 50 days of the outbreak. After tq = 50 days, no new infections occur and the estimates CFRd(t,τres) and Mp0(t) start converging immediately towards their steady-state values (see Fig. 3(d)). Since the number of deaths decreases after the implementation of quarantine measures, the delayed CFRd(t,τres = 17) is first decreasing until t = tq +τres = 67. For t > 67, the CFRd(t,τres = 17) measures no new cases and is thus equal to the CFR.

III. DISCUSSION AND SUMMARY

After an outbreak, it is important to assess the severity of the disease by estimating its mortality and other disease characteristics. Assuming accurate data, the often-used CFR and delayed CFR typically underestimate the true, final death ratio. For example, during the SARS outbreaks in Hong Kong, the WHO first estimated the fatality rate at 2.5% (March 30, 2003) whereas the final estimates reached values of about 17.0% (June 30, 2003) [7]. Standard metrics like the CFR are seen to be easily confounded by and sensitive to uncertainty in intrinsic disease parameters such as the incubation period and the time τ1 a patient had been infected before clinical confirmation of infection. For the recent COVID-19 outbreaks, CFR-based measures may still provide reasonable estimates of the actual mortality across different age classes due to a counter-acting error in the numbers of unreported mild-symptom cases.

Here, we stress that more mechanistically meaningful and interpretable metrics can be defined and be as easily estimated from data as CFRs. Our proposed mortality ratios for viral epidemics are defined in terms of (i) individual survival probabilities and (ii) population ratios using numbers of deaths and recovered individuals. Both of these measures are based on the within-host evolution of the disease, and in the case of Mp0,1(t), the population-level transmission dynamics. Thus, these metrics directly incorporate key parameters operating on the weeks or months timescale, the incubation time τinc and time of prior infection τ, through the solution of age-structured PDEs. Among the metrics we describe, Mp1(t) is structurally closest to M¯1(t) in that both are independent of transmission β since new infections are not considered. Both of these converge after an incubation time τinc to a value smaller than or equal to μ1/(μ1 + c).

The most accurate estimates of M1 can be obtained if we keep track of the fate of cohorts that were infected within a small time window in the past. By following only these individuals, one can track how many of them died as a function of time. As more cases arise, one should stratify them according to estimated τ to gather improving statistics for M1(∞). These data should also be collated according to the other central factor in COVID-19 mortality: patient age. With the further spread of SARS-CoV-2 in different countries, data on more individual cases of death and recovery can be more easily stratified by age, health condition, and other individual characteristics. Using identical initial infection time distributions ρ(τ1;n,γ) (see Eq. (7)), the long-time limit of Mp1(t) approaches the individual mortality $\bar{M}_1(\infty)$ (see Eq. (6)).

Besides accurate cohort data, which are currently scarce for coronavirus, cumulative population data have been used to estimate the mortality ratio. The metrics Mp0(t) and CFR(t) are based on these aggregate populations but implicitly depend on new infections and the transmission rate β. Despite this confounding factor, Mp0(t) and CFRd(t,τres) approach $e^{-c\tau_{\rm inc}}\mu_1/(\mu_1+c)$ as t → ∞, where $e^{-c\tau_{\rm inc}}$ is the probability that no recovery occurred during the incubation time τinc. Based on these results, we can establish the following connection between the different mortality ratios for initial infection times with distribution ρ(τ1;n,γ) and mean $\bar{\tau} = n/\gamma$:

$$\mathrm{CFR}_d(\infty) = M_p^0(\infty) \approx e^{-c\bar{\tau}} M_p^1(\infty) = e^{-c\bar{\tau}} \bar{M}_1(\infty). \tag{17}$$

According to Eq. (17), population-level mortality estimates (e.g., CFR and Mp0) can be transformed, at least approximately, into individual mortality probabilities using the correction factor $e^{c\bar{\tau}}$ with $\bar{\tau} \approx \tau_{\rm inc}$.
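To make the correction concrete, the sketch below assumes the asymptotic relation between the population-level and individual ratios is Mp0(∞) ≈ exp(−c·τ̄)·M̄1(∞), so the individual-level value is recovered by multiplying by exp(c·τ̄). The function name is illustrative, and the example uses the rates of Eq. (5) with τ̄ = τinc = 6.4 days.

```python
import math


def individual_from_population(m_population, c, tau_bar):
    """Approximate individual mortality from an asymptotic population ratio.

    Assumes m_population ~ exp(-c * tau_bar) * m_individual.
    """
    return m_population * math.exp(c * tau_bar)


c = 1 / 20.5
mu1 = c / 8.5
tau_inc = 6.4

# Asymptotic population-level value exp(-c * tau_inc) * mu1 / (mu1 + c).
m_p0_inf = math.exp(-c * tau_inc) * mu1 / (mu1 + c)
m_individual = individual_from_population(m_p0_inf, c, tau_bar=tau_inc)
```

With these numbers the factor exp(c·τ̄) is roughly 1.37, so the correction is sizeable even for an incubation period of under a week.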

Besides the mathematical differences between M1(t) and the population-level metrics Mp0(t) and CFR, estimates of Mp0(t) and CFR(t) from aggregate populations implicitly incorporate a number of confounding factors that lead to variability. In Fig. 4, we plot the population-level mortality-ratio estimates Mp0 against the CFR for different regions and observe large variations and very little correlation between countries [25]. As of March 26, 2020, the value of Mp0 in Italy is almost 45% and can increase further if the current conditions (e.g., treatment methods, age group proportion of infecteds, etc.) do not change. Differences between the mortality ratios in China and Italy (see Figs. 1(b) and (c)) might be a result of varying medical treatment strategies, different practices in data collection (e.g., post-mortem testing), and differences in the age demographics between the countries.

FIG. 4. Region-dependence of COVID-19 mortality-ratio estimates.

Mortality-ratio estimates of COVID-19 in different regions (see Eqs. (8) and (14), with τres = 0). We used data on the cumulative number of cases, recoveries, and deaths in Ref. [5] as of March 24, 2020. The marker sizes indicate the population of the corresponding countries. The metrics Mp0(t) and CFR are largely uncorrelated, with correlation coefficient 0.33.

In general, even if the cohort initially tested was only a fraction of the total infected population, tracking $\bar{M}_1(t)$ or Mp1(t) of this cohort still provides an accurate estimation of the mortality rate. However, the newly infecteds that contribute to CFR and Mp0(t) at later times may not all be tested or may be tested at different times after they were infected. A reported/tested fraction f < 1 would not directly affect the CFRs or mortality ratios if the unreported/untested individuals die and recover in the same proportion as the tested infecteds. Undertesting will lead to overestimation of the true CFR or mortality rates if the untested infecteds are less likely to die than the tested infecteds. In other words, if the untested population (presumably untested because they were mildly symptomatic or asymptomatic) predominantly recovers instead of dying, the actual CFR and mortality ratios would be significantly lower than those based on tested individuals. If untested infecteds do not die, the asymptotic mortality of all infected individuals is $f M_p^{0,1}(\infty)$ (see the SI). Current estimates show that only a minority of SARS-CoV-2 infections are reported (e.g., f ≈ 14% in China before January 23, 2020) [26].
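The limiting case in which untested infecteds never die can be illustrated with a one-line rescaling: all deaths then occur among the tested fraction f, so the mortality over all infected individuals is f times the mortality measured among the tested. This is a sketch of that limiting case only; the function name is illustrative, and the inputs are the values quoted in the text (f ≈ 14%, asymptotic mortality of about 0.101 among tested individuals).

```python
def mortality_all_infected(m_tested, f):
    """Mortality over all infecteds if only a tested fraction f can die.

    Assumes untested infecteds always recover (the limiting case in the text).
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("tested fraction f must lie in (0, 1]")
    return f * m_tested


m_all = mortality_all_infected(m_tested=0.101, f=0.14)
```

Any nonzero death rate among untested infecteds would place the true value between this lower bound and the mortality measured among tested individuals.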

Besides under-reporting, the delay in transmission after becoming infected will also affect Mp0(t). Although we have assumed that transmission occurs only after the incubation period when symptoms arise, there is evidence of asymptomatic transmission of coronavirus [26, 27]. Asymptomatic transmission can be modeled by setting β(τ) > 0 even for τ < τinc. An undelayed transmission in a nonquarantine scenario causes relatively more new infecteds who have not had the chance to die yet, leading to a smaller mortality ratio Mp0(t). Within our SIR model, delaying transmission reduces the number of infected individuals and deaths at any given time but increases the measured mortality ratio Mp0(t). Without quarantine, the asymptotic values Mp0(∞) and CFR(∞) will also change as a result of changing the transmission latency period, as shown in the SI. With perfect quarantining, the asymptote Mp0(∞) is eventually determined by a cohort that does not include new infections and is thus independent of the transmission delay.

In this work, we have explicitly defined a number of interpretable mathematical metrics that represent the risk of death. By rigorously defining these metrics, we are able to reveal the inherent assumptions and factors that affect their estimation. Within survival probability and SIR-type models, we explicitly illustrate how physiologically important parameters such as incubation time, death rate, cure rate, and transmissibility influence the metrics. We also discussed how statistical factors such as time of testing after infection (τ1) and testing ratio (f) affect our estimates. Given the uncertainty in the testing fraction, we conclude that M1(t) and Mp1(t) are best interpreted as approximately the mortality probability conditioned on being tested positive. In practice, these are probably also good estimates of mortality of patients conditioned on showing symptoms. In addition to our metrics and mathematical models, we emphasize the importance of curating individual cohort data. These data are more directly related to the probability of death M1(t) and are subject to the fewest confounding factors and statistical uncertainty.

METHODS

To numerically solve Eqs. (9) and (10), we used a uniform discretization τk = kΔτ, k = 0,1,…,K. A backward difference operator [I(τk,t) − I(τk−1,t)]/Δτ is used to approximate ∂τI(τ,t), and a predictor-corrector Euler scheme is used to advance time [28]. Setting the cut-off I(−Δτ,t) ≡ 0 and I(KΔτ,t) ≡ 0, the resulting discretized equations for the full SIR model are

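$$\begin{aligned}
S(t+\Delta t) &= S(t) - \Delta t\, S(t) \sum_{k=0}^{K} \beta(\tau_k,t) I(\tau_k,t) \Delta\tau, \\
\tilde{I}(\tau_k,t) &= I(\tau_k,t) - \Delta t\, \frac{I(\tau_k,t) - I(\tau_{k-1},t)}{\Delta\tau} - \Delta t \left(c(\tau_k,t) + \mu(\tau_k,t)\right) I(\tau_k,t), \\
I(\tau_k,t+\Delta t) &= I(\tau_k,t) - \frac{\Delta t}{2} \Big[ \frac{I(\tau_k,t) - I(\tau_{k-1},t)}{\Delta\tau} + \left(c(\tau_k,t) + \mu(\tau_k,t)\right) I(\tau_k,t) \\
&\quad + \frac{\tilde{I}(\tau_k,t) - \tilde{I}(\tau_{k-1},t)}{\Delta\tau} + \left(c(\tau_k,t+\Delta t) + \mu(\tau_k,t+\Delta t)\right) \tilde{I}(\tau_k,t) \Big] \\
&\quad + \delta_{k,0} \frac{\Delta t}{\Delta\tau} S(t) \sum_{j=0}^{K} \beta(\tau_j,t) I(\tau_j,t) \Delta\tau,
\end{aligned} \tag{18}$$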

where $\tilde{I}$ is the initial predicted guess, and the last term proportional to δk,0 encodes the boundary condition Eq. (10). Note that we use $\sum_{k=0}^{K} \beta(\tau_k,t) I(\tau_k,t) \Delta\tau$ to indicate the numerical evaluation of $\int_0^\infty d\tau\, \beta(\tau,t) I(\tau,t)$. Quadrature methods such as Simpson’s rule and the trapezoidal rule can be used to approximate the integral more efficiently.

The total deaths, recovereds, and infecteds at time t are found by

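$$D_0(m\Delta t) = \frac{1}{2} \sum_{j=0}^{m} \sum_{k=0}^{K} \mu(k\Delta\tau, j\Delta t) \left[ I(k\Delta\tau, j\Delta t) + \tilde{I}(k\Delta\tau, j\Delta t) \right] \Delta\tau \Delta t,$$
$$R_0(m\Delta t) = \frac{1}{2} \sum_{j=0}^{m} \sum_{k=0}^{K} c(k\Delta\tau, j\Delta t) \left[ I(k\Delta\tau, j\Delta t) + \tilde{I}(k\Delta\tau, j\Delta t) \right] \Delta\tau \Delta t,$$
$$I(m\Delta t) = \sum_{k=0}^{K} I(k\Delta\tau, m\Delta t) \Delta\tau,$$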

with analogous expressions for D1(mΔt) and R1(mΔt). To obtain a stable integration scheme, the time steps Δt and Δτ have to satisfy Δt/(2Δτ) < 1. In all of our numerical computations, we thus set Δt = 0.002, Δτ = 0.02, and $K = 10^4$. In the SI, we show additional plots of the magnitude of I(τ,t) in the t–τ plane.
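The scheme above can be sketched in a few dozen lines of plain Python. This is an illustrative reimplementation, not the authors' code. It assumes S(t) = S(0) (so the argument `beta1_s0` stands for the product β1S(0)), uses time-independent rates from Eqs. (4) and (16), and runs on a much coarser grid (Δt = 0.2, Δτ = 0.4) than the paper's so that it finishes quickly. Setting `beta1_s0 = 0` follows only the initial cohort and is used here as a stand-in for Mp1.

```python
import math


def gamma_pdf(tau, n=8, rate=1.25):
    """Initial infection-duration density rho(tau; n, gamma) of Eq. (7)."""
    if tau <= 0.0:
        return 0.0
    return rate**n / math.gamma(n) * tau ** (n - 1) * math.exp(-rate * tau)


def simulate_mp(beta1_s0, mu1, c, tau_inc,
                t_end=100.0, dt=0.2, dtau=0.4, tau_max=140.0):
    """Return D / (D + R) at t_end for the duration-structured model."""
    K = int(round(tau_max / dtau))
    taus = [k * dtau for k in range(K + 1)]
    mu = [mu1 if tau > tau_inc else 0.0 for tau in taus]
    beta = [beta1_s0 if tau > tau_inc else 0.0 for tau in taus]
    I = [gamma_pdf(tau) for tau in taus]
    I[K] = 0.0  # cut-off I(K * dtau, t) = 0
    deaths = recoveries = 0.0

    for _ in range(int(round(t_end / dt))):
        # Predictor: explicit Euler step with a backward difference in tau.
        I_pred = [0.0] * (K + 1)
        for k in range(K):
            left = I[k - 1] if k > 0 else 0.0  # I(-dtau, t) = 0
            I_pred[k] = (I[k] - dt * (I[k] - left) / dtau
                         - dt * (c + mu[k]) * I[k])

        # Accumulate deaths and recoveries from the averaged densities.
        for k in range(K + 1):
            weight = 0.5 * (I[k] + I_pred[k]) * dtau * dt
            deaths += mu[k] * weight
            recoveries += c * weight

        # New infections entering at tau = 0 (boundary condition, Eq. (10)).
        inflow = sum(b * i for b, i in zip(beta, I)) * dtau

        # Corrector: average the slopes evaluated at I and at I_pred.
        I_new = [0.0] * (K + 1)
        for k in range(K):
            left = I[k - 1] if k > 0 else 0.0
            left_pred = I_pred[k - 1] if k > 0 else 0.0
            slope = (I[k] - left) / dtau + (c + mu[k]) * I[k]
            slope_pred = ((I_pred[k] - left_pred) / dtau
                          + (c + mu[k]) * I_pred[k])
            I_new[k] = I[k] - 0.5 * dt * (slope + slope_pred)
        I_new[0] += dt / dtau * inflow
        I = I_new

    return deaths / (deaths + recoveries)


C = 1 / 20.5
MU1 = C / 8.5
mp1 = simulate_mp(0.0, MU1, C, tau_inc=6.4)    # initial cohort only
mp0 = simulate_mp(0.158, MU1, C, tau_inc=6.4)  # with ongoing transmission
```

Passing a negative `tau_inc` makes the death rate constant in τ, in which case the returned ratio equals μ1/(μ1 + c) at every time, which is a convenient sanity check against Eq. (15). Quantitative agreement with the paper's figures should not be expected on this coarse grid.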

ACKNOWLEDGEMENTS

LB acknowledges financial support from the SNF Early Postdoc.Mobility fellowship on “Multispecies interacting stochastic systems in biology”. The authors also acknowledge financial support from the Army Research Office (W911NF-18-1-0345), the NIH (R01HL146552), and the National Science Foundation (DMS-1814364).

Footnotes

DATA AVAILABILITY

The datasets that we used in this study are stored in the publicly accessible repositories of Refs. [3, 5, 16].

COMPETING INTERESTS

The authors declare no competing interests.

References

Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
