2022 Feb;31(2):348–360. doi: 10.1177/09622802211061927

A sequential test to compare the real-time fatality rates of a disease among multiple groups with an application to COVID-19 data

Yuanke Qu 1, Chun Yin Lee 2, KF Lam 1,3
PMCID: PMC8832113  PMID: 34878362

Abstract

Infectious diseases, such as the ongoing COVID-19 pandemic, pose a significant threat to public health globally. The fatality rate serves as a key indicator for the effectiveness of potential treatments or interventions. With limited time and understanding of novel emerging epidemics, comparisons of the fatality rates in real-time among different groups, say, divided by treatment, age, or area, have an important role to play in informing public health strategies. We propose a statistical test for the null hypothesis of equal real-time fatality rates across multiple groups during an ongoing epidemic. An elegant property of the proposed test statistic is that it converges to a Brownian motion under the null hypothesis, which allows one to develop a sequential testing approach for rejecting the null hypothesis at the earliest possible time when statistical evidence accumulates. This property is particularly important as scientists and clinicians are competing with time to identify possible treatments or effective interventions to combat the emerging epidemic. The method is widely applicable as it only requires the cumulative numbers of confirmed cases, deaths, and recoveries. A large-scale simulation study shows that the finite-sample performance of the proposed test is highly satisfactory. The proposed test is applied to compare the difference in disease severity among Wuhan, Hubei province (excluding Wuhan) and mainland China (excluding Hubei) from February to March 2020. The result suggests that the disease severity is potentially associated with the health care resource availability during the early phase of the COVID-19 pandemic in mainland China.

Keywords: Brownian motion, COVID-19, emerging infectious disease, fatality rates, sequential test

1. Introduction

The incidence of emerging infectious diseases has increased worldwide in recent decades and has posed one of the greatest threats to public health globally.1 In particular, the ongoing coronavirus pandemic (COVID-19), first identified in Wuhan city of China in December 2019, is affecting 217 countries and territories across the world with a death toll of over 1.7 million out of around 79 million cases by the end of 2020.2 The COVID-19 crisis has become a public health emergency and has seriously disrupted every aspect of our life, economies, and societies. For this deadly infectious disease caused by a novel pathogen, its lethality is one of the most important characteristics of the virulence of the disease for evaluating the effectiveness of responding strategies.

The case fatality rate (CFR) is one of the most essential epidemiological quantities to measure the virulence of an infectious disease, which is commonly defined as the proportion of deaths among all confirmed cases. The CFR has been adopted by health authorities in the current COVID-19 pandemic as a severity indicator.3 However, it was reported that this simple estimator only performs well at the end of an epidemic when all the cases have been resolved (affected individuals either died or recovered), but may not be a reliable indicator during an ongoing epidemic.4,5 Various statistical approaches have been proposed to provide a more accurate estimate for disease severity by adjusting for the reporting delay from illness onset to death during an epidemic.6–8 Among others, Yip et al.9 suggested that the fatality rate of an emerging epidemic should be time-varying in nature, and a decreasing trend in fatality rate could be a reflection of an effective measure. To provide some critical guidance on developing prompt decisive policies during an outbreak, they proposed to use the real-time fatality rate (RTFR) to measure the severity of an epidemic as opposed to the traditional CFR. Specifically, the RTFR is defined as the probability of death conditioned on a transition to death or recovery based on a counting process approach. Relative to the CFR, the RTFR was shown to be more sensitive in capturing changes in fatality rate during the course of an epidemic. To detect a change in RTFR statistically, Lam et al.10 developed a one-sample sequential test for the null hypothesis of constant fatality rate, which was applied to investigate the effectiveness of the interventions in Hong Kong and Beijing during the severe acute respiratory syndrome epidemic in 2003. Therein, the testing procedure starts before the implementation of a potential intervention. 
Under the null hypothesis, the RTFR remains constant, which means that the intervention is not effective in suppressing the fatality, at least in a short-term period, say two months. Hence, a significant reduction in RTFR in the short run can be attributed to an effective intervention. However, one should be cautious about the test results in the long run, as a progressive reduction in RTFR can be caused by other factors, such as a rise in temperature, improved medical health care, and mutations that make the virus less lethal. A more promising way to identify a potential factor that affects the severity is to compare the RTFRs among multiple independent groups over time, so that the effects of the above-mentioned confounding factors are shared by all groups.

There exists a modest statistical literature for comparing virulence among different subgroups. Reich et al.11 defined the relative CFR as the CFR of one group divided by that of another reference group. They compared the group-specific fatality rates using a generalized linear model framework, which was adopted by Chen et al.12 for estimating CFR based on the maximum profile-likelihood approach. However, the assumption of time-invariant CFRs in their approach is quite restrictive and is presumably more suitable for chronic diseases rather than novel emerging infectious diseases. Apart from that, most of the contemporary studies for epidemics compared the disease severity among different subgroups in a pre-specified study period and drew a conclusion on the performance of a certain intervention only at the end of the study. This approach fails to assess the efficacy of the implemented measures in a timely manner even if there is strong statistical evidence supporting differences in performance among subgroups during the observation period. Moreover, these conclusions may not be applied directly to future episodes of the same epidemic even if the viruses are of the same strain, because the characteristics of the viruses may change. During the outbreak of a novel infectious epidemic, there is an urgent need to identify an effective intervention at the earliest possible time so that prompt action can be taken to secure public health in an effective way. Also, the collection of complete and complex data is extraordinarily difficult due to various administrative reasons. For instance, information such as the times to death and recovery of patients is non-trivial and hard to assess. On the other hand, it is relatively easy to obtain the summary data on the number of confirmed cases, deaths, and recoveries from countries, such as those compiled daily during the ongoing COVID-19 pandemic. 
In addressing the aforementioned problems, a statistical test that provides a timely comparison of the fatality rates among multiple groups based on a simple data structure is warranted.

Motivated by the idea of Yip et al.,9 where they captured the progressive changes in disease severity efficiently using the RTFR, we propose a sequential test for the null hypothesis of equal RTFRs among different subgroups over a time period $[0, \tau]$. The null hypothesis can be rejected at a time $t$, where $t \le \tau$, as soon as statistical evidence accumulates. With the proposed method, one can test for the difference in RTFRs among neighboring areas, different age groups, or different treatment arms of a clinical trial to inform and formulate public health strategies. For example, a single-arm clinical trial conducted in March 2020 found clinical improvement in patients with severe COVID-19 receiving Remdesivir, the first drug recommended for treating COVID-19.13 To further study this potential antiviral agent, a growing number of controlled clinical trials are being conducted to judge its efficacy.14,15 In this case, a potential usage of a multiple sub-group test is to compare the RTFRs of patients receiving Remdesivir and standard treatment over time. Essentially, a rejection of the null hypothesis before the end of the study indicates the superior effectiveness of one treatment over the other(s). Another example, as will be illustrated in the “Application” section, is to compare the difference in RTFRs across different neighboring areas to identify the target areas that need assistance in medical health care resources during public health emergencies.

The test statistic for the two-sample comparison and its asymptotic properties are studied in section 2. The generalization of the two-sample case to the $K$-sample case ($K \ge 3$) is delineated in section 3. In section 4, a large-scale simulation study is carried out to evaluate its finite-sample performance in various scenarios. In section 5, the proposed test is applied to the COVID-19 epidemic data of mainland China to investigate the difference in disease severity among three separate area clusters: Wuhan, Hubei province (excluding Wuhan) and mainland China (excluding Hubei) during the disease outbreak. Discussions and recommendations are given in section 6.

2. A two-sample test for equality of RTFRs

We consider two populations, classified by age, treatment, or any other categories that are of our interest, subject to infection during an epidemic. Some basic epidemiological data are collected in real-time. Very often, public health officials aim to examine the difference in disease severity among two subgroups. Typically, clinicians are interested in tackling the following questions:

  • Is the newly proposed treatment more effective than the standard treatment (placebo) in treating the specific infectious disease?

  • Compared with area A where no measures have been taken, is the fatality rate lower in area B with effective policies?

  • Do patients from resource-poor area A have a higher fatality rate than those from area B?

These questions have primary importance to guide the decision-making process during the outbreak of an infectious disease.

2.1. The test statistic

We set the observation period to be $[0, \tau]$ in the hope that a reliable decision can be made at time $\tau$. Time $0$ can be set as the day where a certain intervention, treatment, or response strategy is implemented on a particular group. We partition $[0, \tau]$ into $H$ regular intervals (naturally in days or weeks), and the information regarding the numbers of inpatients, deaths and recoveries for the two subgroups is collected in sequence at the end of the $h$th interval, $h = 1, \ldots, H$. Denote the numbers of deaths and recoveries in group $k$ ($k = 1, 2$) in the $h$th interval by $n_{k,D}(h)$ and $n_{k,R}(h)$, respectively. We further denote the cumulative numbers of deaths and recoveries in group $k$ at the end of the $h$th interval by $N_{k,D}(h)$ and $N_{k,R}(h)$, respectively, where $N_{k,D}(h) = \sum_{j=1}^{h} n_{k,D}(j)$ and $N_{k,R}(h) = \sum_{j=1}^{h} n_{k,R}(j)$. Let $I_k(h-1)$ be the number of inpatients just before the start of the $h$th interval for group $k$. We assume that in the $h$th interval, each of the $I_k(h-1)$ inpatients will either die, recover or remain in the hospital with respective probabilities $p_{k,D}(h)$, $p_{k,R}(h)$ and $1 - p_{k,D}(h) - p_{k,R}(h)$. Conditional on the past information, we have

$$\left(n_{k,D}(h),\, n_{k,R}(h) \mid I_k(h-1)\right) \sim \text{Multinomial}\left(I_k(h-1);\, p_{k,D}(h),\, p_{k,R}(h)\right) \tag{1}$$

Let $\mathcal{F}_h$ be the filtration or history generated by the observed data, which satisfies the usual regularity conditions.16 For group $k$, a discrete RTFR17 for the $h$th interval, obtained by considering recovery and death as two competing risks, is defined as

$$\pi_k(h) = \frac{p_{k,D}(h)}{p_{k,D}(h) + p_{k,R}(h)} \tag{2}$$

which can be treated as the probability of a death conditioned on an event of death or recovery. The maximum likelihood estimator (MLE) of $\pi_k(h)$ can be easily shown to be

$$\hat{\pi}_k(h) = \frac{\hat{p}_{k,D}(h)}{\hat{p}_{k,D}(h) + \hat{p}_{k,R}(h)} = \frac{n_{k,D}(h)}{n_{k,D}(h) + n_{k,R}(h)}$$

where $\hat{p}_{k,D}(h) = n_{k,D}(h)/I_k(h-1)$ and $\hat{p}_{k,R}(h) = n_{k,R}(h)/I_k(h-1)$ are the respective sample death and recovery proportions for group $k$ in the $h$th interval. The above framework, together with a smoothed version of the RTFR estimator, was summarized in Yip et al.17 With its sensitivity in picking up the changes in severity over time, the RTFR in (2) can be used to compare the virulence of the disease between two subgroups. That is, to test

$$H_0: \pi_1(h) = \pi_2(h) \ \text{for all } h \le H \quad \text{versus} \quad H_1: \pi_1(h) > \pi_2(h) \ \text{for some } h \le H$$

which can be reformulated as

$$H_0: \frac{p_{1,D}(h)}{p_{1,R}(h)} = \frac{p_{2,D}(h)}{p_{2,R}(h)} \ \text{for all } h \le H \quad \text{versus} \quad H_1: \frac{p_{1,D}(h)}{p_{1,R}(h)} > \frac{p_{2,D}(h)}{p_{2,R}(h)} \ \text{for some } h \le H \tag{3}$$
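To make the RTFR estimator concrete, the following minimal Python sketch computes the interval-wise estimate, deaths divided by deaths plus recoveries, from daily counts. The function name `rtfr_estimates` and the counts are hypothetical and chosen only for illustration; intervals with no resolved case are returned as NaN, which is a convention assumed here rather than one stated in the text.

```python
def rtfr_estimates(deaths, recoveries):
    """Interval-wise RTFR estimates n_D(h) / {n_D(h) + n_R(h)}."""
    estimates = []
    for n_d, n_r in zip(deaths, recoveries):
        resolved = n_d + n_r
        # The ratio is undefined when nobody dies or recovers in the interval.
        estimates.append(n_d / resolved if resolved > 0 else float("nan"))
    return estimates


# Hypothetical daily counts for one group (not taken from the paper).
daily_deaths = [4, 6, 3, 0]
daily_recoveries = [16, 14, 27, 0]
print(rtfr_estimates(daily_deaths, daily_recoveries))
```

Note that the number of inpatients cancels in the ratio, so only the death and recovery counts are needed.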

When the null hypothesis of equal RTFRs between two subgroups ($H_0$) holds true, we expect that the ratios $n_{1,D}(h)/n_{1,R}(h)$ and $n_{2,D}(h)/n_{2,R}(h)$ are similar throughout the whole observation period. Therefore, we propose the following two-sample test statistic:

$$Z_2(h) = \sum_{j=1}^{h} w(j)\left\{n_{1,D}(j)\,n_{2,R}(j) - n_{2,D}(j)\,n_{1,R}(j)\right\} \tag{4}$$

where $w(j)$ is a locally bounded, non-negative predictable weight process. The subscript 2 in $Z_2(h)$ corresponds to the 2-sample case discussed in this section. The proposed test statistic has mean zero under $H_0$ for all $h$ but has a positive expected value under $H_1$ for some $h$. A typical choice of weights in (4) is $w(j) = 1$ for $j = 1, \ldots, H$, which represents the situation that the contribution from every interval is weighted equally. Presumably, one can introduce different sets of weights to the test statistic to allow extra flexibility. For example, one choice of weights allocates a heavier weight to the periods with more inpatients in the groups, but a lighter weight to the periods with fewer inpatients, as the fluctuations can be erratic in these intervals. Another set of intuitive weights is $w(j) = 1/\{I_1(j-1)\,I_2(j-1)\}$, which makes the changes in fatality rate contribute equally to the test statistic throughout the whole study period regardless of the number of inpatients, and the resulting test statistic is denoted by

$$Z_2^*(h) = \sum_{j=1}^{h} \left\{\hat{p}_{1,D}(j)\,\hat{p}_{2,R}(j) - \hat{p}_{2,D}(j)\,\hat{p}_{1,R}(j)\right\}$$
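The two-sample statistic in (4) is a running sum of cross-products of counts, which the sketch below computes for every $h$. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the default weights are taken to be equal to one, and `inverse_product_weights` builds the weights that divide by the product of the two inpatient counts, so that the statistic is expressed in sample proportions. The list `i1[j]` stands for the number of inpatients in group 1 at the start of interval `j + 1`.

```python
def two_sample_statistic(n1d, n1r, n2d, n2r, weights=None):
    """Cumulative two-sample statistic for h = 1..H, as a list."""
    num_intervals = len(n1d)
    if weights is None:
        weights = [1.0] * num_intervals  # assumed default: equal weights
    z, path = 0.0, []
    for j in range(num_intervals):
        z += weights[j] * (n1d[j] * n2r[j] - n2d[j] * n1r[j])
        path.append(z)
    return path


def inverse_product_weights(i1, i2):
    """Weights 1 / (I_1 * I_2) per interval, giving the proportion-based form."""
    return [1.0 / (a * b) for a, b in zip(i1, i2)]


# Hypothetical counts for two groups over two intervals.
n1d, n1r = [10, 12], [20, 18]
n2d, n2r = [5, 6], [25, 30]
print(two_sample_statistic(n1d, n1r, n2d, n2r))
print(two_sample_statistic(n1d, n1r, n2d, n2r,
                           inverse_product_weights([100, 100], [100, 120])))
```

A positive running sum points towards a higher RTFR in group 1, in line with the one-sided alternative.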

2.2. Asymptotic properties of the test statistic

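Denote $\mathcal{H} = \{1, \ldots, H\}$ as the time indices for the discrete time process. For $j \in \mathcal{H}$, $\hat{p}_j = \left(\hat{p}_{1,D}(j), \hat{p}_{1,R}(j), \hat{p}_{2,D}(j), \hat{p}_{2,R}(j)\right)^T$ is the vector of MLEs of $p_j = \left(p_{1,D}(j), p_{1,R}(j), p_{2,D}(j), p_{2,R}(j)\right)^T$. Under the multinomial setting in (1), we have

$$\hat{p}_j - p_j \xrightarrow{D} N(0, \Sigma_j)$$

where

$$\Sigma_j = \begin{pmatrix}
\dfrac{p_{1,D}(j)\{1 - p_{1,D}(j)\}}{I_1(j-1)} & -\dfrac{p_{1,D}(j)\,p_{1,R}(j)}{I_1(j-1)} & 0 & 0 \\
-\dfrac{p_{1,D}(j)\,p_{1,R}(j)}{I_1(j-1)} & \dfrac{p_{1,R}(j)\{1 - p_{1,R}(j)\}}{I_1(j-1)} & 0 & 0 \\
0 & 0 & \dfrac{p_{2,D}(j)\{1 - p_{2,D}(j)\}}{I_2(j-1)} & -\dfrac{p_{2,D}(j)\,p_{2,R}(j)}{I_2(j-1)} \\
0 & 0 & -\dfrac{p_{2,D}(j)\,p_{2,R}(j)}{I_2(j-1)} & \dfrac{p_{2,R}(j)\{1 - p_{2,R}(j)\}}{I_2(j-1)}
\end{pmatrix}$$

Similarly, $g(\hat{p}_j) = I_1(j-1)\,I_2(j-1)\left\{\hat{p}_{1,D}(j)\,\hat{p}_{2,R}(j) - \hat{p}_{2,D}(j)\,\hat{p}_{1,R}(j)\right\}$ is the MLE of $g(p_j)$, and by the Delta method,18 we have

$$g(\hat{p}_j) - g(p_j) \xrightarrow{D} N\left(0,\ \frac{\partial g(p_j)}{\partial p_j}^T \Sigma_j\, \frac{\partial g(p_j)}{\partial p_j}\right)$$

When the null hypothesis of equal RTFRs between the two groups holds true, $g(p_j) = 0$ for all $j \in \mathcal{H}$, which implies

$$g(\hat{p}_j) \xrightarrow{D} N\left(0,\ \frac{\partial g(p_j)}{\partial p_j}^T \Sigma_j\, \frac{\partial g(p_j)}{\partial p_j}\right)$$

Moreover, for any $j \ne j'$, $g(\hat{p}_j)$ and $g(\hat{p}_{j'})$ are independent. Consequently, we have

$$Z_2(h) = \sum_{j=1}^{h} w(j)\,g(\hat{p}_j) = \sum_{j=1}^{h} w(j)\left\{n_{1,D}(j)\,n_{2,R}(j) - n_{2,D}(j)\,n_{1,R}(j)\right\} \xrightarrow{D} N\left(0, \sigma^2(h)\right) \tag{5}$$

where $\sigma^2(h) = \sum_{j=1}^{h} w(j)^2\, \frac{\partial g(p_j)}{\partial p_j}^T \Sigma_j\, \frac{\partial g(p_j)}{\partial p_j}$, and it can be consistently estimated by

$$\hat{\sigma}^2(h) = \sum_{j=1}^{h} w(j)^2 \Big( I_1(j-1)\left[n_{2,D}^2(j)\,\hat{p}_{1,R}(j) + n_{2,R}^2(j)\,\hat{p}_{1,D}(j) - \left\{n_{2,D}(j)\,\hat{p}_{1,R}(j) - n_{2,R}(j)\,\hat{p}_{1,D}(j)\right\}^2\right] + I_2(j-1)\left[n_{1,D}^2(j)\,\hat{p}_{2,R}(j) + n_{1,R}^2(j)\,\hat{p}_{2,D}(j) - \left\{n_{1,D}(j)\,\hat{p}_{2,R}(j) - n_{1,R}(j)\,\hat{p}_{2,D}(j)\right\}^2\right] \Big) \tag{6}$$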
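As a check on (6), the Delta-method variance can be worked out explicitly. The sketch below assumes that $g(p_j) = I_1 I_2\,(p_{1,D}\,p_{2,R} - p_{2,D}\,p_{1,R})$, which is the form implied by (5), and suppresses the arguments $j$ and $j-1$ for brevity. The gradient with respect to $(p_{1,D}, p_{1,R}, p_{2,D}, p_{2,R})$ is

$$\frac{\partial g}{\partial p_j} = I_1 I_2 \left(p_{2,R},\ -p_{2,D},\ -p_{1,R},\ p_{1,D}\right)^T$$

so that, using the block-diagonal covariance matrix of the multinomial proportions,

$$\frac{\partial g}{\partial p_j}^T \Sigma_j\, \frac{\partial g}{\partial p_j} = I_1 I_2^2 \left\{p_{2,R}^2\,p_{1,D}(1 - p_{1,D}) + p_{2,D}^2\,p_{1,R}(1 - p_{1,R}) + 2\,p_{1,D}\,p_{1,R}\,p_{2,D}\,p_{2,R}\right\} + I_1^2 I_2 \left\{p_{1,R}^2\,p_{2,D}(1 - p_{2,D}) + p_{1,D}^2\,p_{2,R}(1 - p_{2,R}) + 2\,p_{1,D}\,p_{1,R}\,p_{2,D}\,p_{2,R}\right\}$$

Writing $n_{2,D} = I_2\,p_{2,D}$ and $n_{2,R} = I_2\,p_{2,R}$ and collecting the squared terms, the first brace becomes

$$I_1\left[n_{2,D}^2\,p_{1,R} + n_{2,R}^2\,p_{1,D} - \left(n_{2,D}\,p_{1,R} - n_{2,R}\,p_{1,D}\right)^2\right]$$

which is the first term of (6) once the probabilities are replaced by their estimates; the second term follows by exchanging the roles of the two groups.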

With the above definitions, it is easy to obtain the test statistic $Z_2(H)$ and its corresponding variance estimate $\hat{\sigma}^2(H)$ evaluated at the endpoint $H$. It follows that a straightforward test statistic for equal RTFRs between two subgroups is given by

$$V(H) = \frac{Z_2(H)}{\hat{\sigma}(H)} \tag{7}$$

which asymptotically follows a standard normal distribution under the null hypothesis. Therefore, a decision can be made at the end of the observation period and one can reject $H_0$ if $V(H) > z_\alpha$ at the $\alpha$ level of significance, where $z_\alpha$ satisfies $\Phi(z_\alpha) = 1 - \alpha$ and $\Phi$ is the distribution function of a standard normal random variable.
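The variance estimate and the fixed-horizon test can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions: the function names are hypothetical, equal weights are the assumed default, `i1[j]` and `i2[j]` hold the inpatient counts at the start of each interval, and the significance level of 0.05 is an example value rather than one taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def variance_estimate(n1d, n1r, n2d, n2r, i1, i2, weights=None):
    """Cumulative variance estimate of the two-sample statistic for h = 1..H."""
    num_intervals = len(n1d)
    if weights is None:
        weights = [1.0] * num_intervals  # assumed default: equal weights
    total, path = 0.0, []
    for j in range(num_intervals):
        p1d, p1r = n1d[j] / i1[j], n1r[j] / i1[j]
        p2d, p2r = n2d[j] / i2[j], n2r[j] / i2[j]
        # Contribution of the sampling variability in group 1 and in group 2.
        term1 = i1[j] * (n2d[j] ** 2 * p1r + n2r[j] ** 2 * p1d
                         - (n2d[j] * p1r - n2r[j] * p1d) ** 2)
        term2 = i2[j] * (n1d[j] ** 2 * p2r + n1r[j] ** 2 * p2d
                         - (n1d[j] * p2r - n1r[j] * p2d) ** 2)
        total += weights[j] ** 2 * (term1 + term2)
        path.append(total)
    return path


def fixed_horizon_test(z_end, var_end, alpha=0.05):
    """One-sided test at the end of the period: reject if V > z_alpha."""
    v = z_end / sqrt(var_end)
    return v, v > NormalDist().inv_cdf(1 - alpha)


# Hypothetical single-interval example with 100 inpatients per group.
var_path = variance_estimate([10], [20], [5], [25], [100], [100])
print(var_path, fixed_horizon_test(10 * 25 - 5 * 20, var_path[-1]))
```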

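Note that $\hat{\sigma}^2(h)$ in (6) is a non-decreasing function of $h$ and, for $h_1 \le h_2$, we have $\mathrm{Cov}\{Z_2(h_1), Z_2(h_2)\} = \sigma^2(h_1)$. Therefore, $Z_2$ is asymptotically a Gaussian process with independent increments. Hence, we have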

$$Z_2 \xrightarrow{D} W \circ \sigma^2 \quad \text{on } D[0, \tau]$$

where $W$ is a standard Brownian motion. With these properties, the asymptotic normality of $\{Z_2(h),\ h = 1, \ldots, H\}$ with variance estimate $\hat{\sigma}^2(h)$ can be applied to develop a sequential testing procedure, discussed in the next section.

2.3. The sequential testing procedure

Consider the above statistical test conducted over the observation period $[0, \tau]$ with a pre-specified value of $\tau$. The idea of the sequential test is to conduct a test at the end of each of the $H$ non-overlapping intervals until a decision is made. We can think of it as a test running for $H$ days or weeks. To be specific, for $h = 1, \ldots, H$, a test statistic $Z_2(h)$ is calculated based on the filtration $\mathcal{F}_h$ and a simple rule is used to decide when to stop: if $Z_2(h)$ exceeds the corresponding critical value $b_h$, we reject the null hypothesis and stop the test at the end of the $h$th interval. To maintain the overall type I error rate $\alpha$ in a sequential design, the cumulative type I error is governed by a non-decreasing function $\alpha(h)$ defined for each interval such that $\alpha(0) = 0$ and $\alpha(H) = \alpha$. Adjustment to the significance level for each interval can be made through the $\alpha$-spending function approach as proposed in Gordon Lan and DeMets19 with

$$\alpha(h) = 4 - 4\,\Phi\left(\frac{z_{\alpha/4}}{\sqrt{h/H}}\right), \quad h = 1, 2, \ldots, H \tag{8}$$
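The spending function in (8) is straightforward to evaluate numerically. The sketch below uses only the Python standard library; the function name `alpha_spent` and the default level of 0.05 are assumptions made for illustration, and $z_{\alpha/4}$ is taken to be the upper $\alpha/4$ quantile of the standard normal distribution so that the full level is spent at the final interval.

```python
from math import sqrt
from statistics import NormalDist


def alpha_spent(h, num_intervals, alpha=0.05):
    """Cumulative type I error spent by the end of interval h (1 <= h <= H)."""
    normal = NormalDist()
    z_quarter = normal.inv_cdf(1 - alpha / 4)  # upper alpha/4 quantile
    return 4 - 4 * normal.cdf(z_quarter / sqrt(h / num_intervals))


# Very little error is spent early on; the full level is reached at h = H.
for h in (1, 10, 25, 50):
    print(h, alpha_spent(h, 50))
```

With this shape, early looks require very strong evidence, while the final look retains most of the nominal level.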

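Then, the set of rejection boundaries $\{b_h,\ h = 1, \ldots, H\}$ is chosen such that

$$1 - \alpha = P\left[\bigcap_{h=1}^{H} \{Z_2(h) < b_h\}\right]$$

Suppose that $H_0$ is first rejected at the end of the $T$th interval, where $T \le H$; we have

$$P(T = h) = P\left[\{Z_2(h) > b_h\} \cap \{Z_2(j) < b_j,\ 1 \le j \le h-1\}\right]$$

Based on the $\alpha$-spending function in (8), we have the recursive relationship

$$\alpha(1) = P(T = 1) = P\{Z_2(1) > b_1\}, \tag{9}$$
$$\alpha(2) - \alpha(1) = P(T = 2) = P\left[\{Z_2(2) > b_2\} \cap \{Z_2(1) < b_1\}\right], \tag{10}$$
$$\alpha(h) - \alpha(h-1) = P(T = h) \quad \text{for } h = 3, \ldots, H \tag{11}$$

We have shown in Section 2.2 that the sequence of test statistics $\{Z_2(h)\}$ is a Brownian motion with independent increments under the null hypothesis. We have $Z_2(1) \sim N\left(0, \hat{\sigma}^2(1)\right)$ and, for each $h \ge 2$, $Z_2(h) - Z_2(h-1) \sim N\left(0, \hat{\sigma}^2(h) - \hat{\sigma}^2(h-1)\right)$, independent of $Z_2(1), \ldots, Z_2(h-1)$. The calculation of the rejection boundaries can be simplified and obtained recursively by solving

$$P(T = 1) = \alpha(1) = 1 - \Phi\left(\frac{b_1}{\hat{\sigma}(1)}\right)$$

and, for $h \ge 2$,

$$P(T = h) = \alpha(h) - \alpha(h-1) = \int_{b_h}^{\infty} \int_{-\infty}^{b_{h-1}} \cdots \int_{-\infty}^{b_1} \prod_{j=1}^{h} \frac{1}{\sqrt{2\pi\left\{\hat{\sigma}^2(j) - \hat{\sigma}^2(j-1)\right\}}} \exp\left\{-\frac{(u_j - u_{j-1})^2}{2\left\{\hat{\sigma}^2(j) - \hat{\sigma}^2(j-1)\right\}}\right\} du_1 \cdots du_h$$

with the conventions $u_0 = 0$ and $\hat{\sigma}^2(0) = 0$. The multiple integral is evaluated using a Gaussian quadrature that replaces each integral by a weighted sum. Details regarding the numerical computation for sequential methods are given in Chapter 19 of Jennison and Turnbull.20 Therefore, we can compare the test statistic $Z_2(h)$ with its corresponding rejection boundary $b_h$ and one can reject the null hypothesis of equal RTFRs at the end of the $h$th interval, where $Z_2(h) > b_h$. If $Z_2(h) \le b_h$ for $h = 1, \ldots, H$, then one may conclude that the null hypothesis is not rejected at the end of the observation period.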
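For readers who want to experiment with the boundary recursion, the sketch below approximates the rejection boundaries by Monte Carlo simulation instead of the Gaussian quadrature described above. This is a deliberately swapped-in technique that is easier to write down but less accurate. Paths of a Gaussian process with independent increments are simulated at the cumulative variances, and at each look the boundary is placed so that the share of surviving paths crossing it matches the increment of the spending function. The function names, the level of 0.05, the number of paths and the seed are all assumptions for illustration.

```python
import random
from math import inf, sqrt
from statistics import NormalDist


def alpha_spent(h, num_intervals, alpha):
    """Spending function 4 - 4 * Phi(z_{alpha/4} / sqrt(h / H))."""
    normal = NormalDist()
    return 4 - 4 * normal.cdf(normal.inv_cdf(1 - alpha / 4) / sqrt(h / num_intervals))


def mc_boundaries(var_path, alpha=0.05, n_paths=20000, seed=1):
    """Monte Carlo approximation of the boundaries b_1, ..., b_H.

    var_path[h - 1] is the cumulative variance estimate at look h.
    """
    rng = random.Random(seed)
    num_intervals = len(var_path)
    alive = [0.0] * n_paths      # paths that have not crossed a boundary yet
    crossed = 0                  # number of paths stopped so far
    prev_var, bounds = 0.0, []
    for h in range(1, num_intervals + 1):
        sd = sqrt(var_path[h - 1] - prev_var)
        alive = [u + rng.gauss(0.0, sd) for u in alive]
        # Number of paths that should cross at this look.
        target = round(alpha_spent(h, num_intervals, alpha) * n_paths) - crossed
        if target <= 0:
            bounds.append(inf)   # too little error to spend at this resolution
        else:
            alive.sort()
            # Put the boundary between the target-th largest path and the next.
            bounds.append(0.5 * (alive[-target] + alive[-target - 1]))
            alive = alive[:-target]
            crossed += target
        prev_var = var_path[h - 1]
    return bounds


# Hypothetical cumulative variances for five looks.
print(mc_boundaries([1.0, 2.0, 3.0, 4.0, 5.0]))
```

An infinite boundary at an early look simply means that the error to be spent there is smaller than one simulated path; increasing `n_paths` refines it.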

3. Generalization of the two-sample test to the $K$-sample test

In the last section, we proposed a sequential test for equal RTFRs between two subgroups. This test can be easily generalized to accommodate the $K$-sample case for handling complicated clinical issues during an epidemic. For instance, the test can be applied to identify the most effective drug among several candidate treatments; it is also useful when the health authority wants to determine whether the disease severity is associated with some continuous or ordinal scale measurements, such as age or the level of hospital-based health care technology. Moreover, it is of epidemiological importance to compare the severity of the disease among different areas or countries. For the ongoing COVID-19 pandemic, one can compare the RTFRs among different areas in China, or among different countries, to exchange information and learn from the experiences of areas with comparatively improved fatality rates.

Analogous to the two-sample test, we consider the observation period $[0, \tau]$ that contains $H$ equally spaced time intervals. We aim to test the null hypothesis that the RTFRs of a specific disease are equal across $K$ independent subgroups against the alternative that the RTFRs increase with age or other measurements or factors of interest over time. To be more specific, the hypotheses are

$$H_0: \frac{p_{1,D}(h)}{p_{1,R}(h)} = \cdots = \frac{p_{K,D}(h)}{p_{K,R}(h)} \ \text{for all } h \le H \quad \text{versus} \quad H_1: \frac{p_{1,D}(h)}{p_{1,R}(h)} > \cdots > \frac{p_{K,D}(h)}{p_{K,R}(h)} \ \text{for some } h \le H$$

The test statistic is proposed as follows:

$$Z_K(h) = \sum_{j=1}^{h} \sum_{k=1}^{K-1} w_{k,k+1}(j)\left\{n_{k,D}(j)\,n_{k+1,R}(j) - n_{k+1,D}(j)\,n_{k,R}(j)\right\} \tag{12}$$

where $n_{k,D}(j)$ and $n_{k,R}(j)$ are the numbers of deaths and recoveries in interval $j$ for the $k$th subgroup, respectively. In particular, $w_{k,k+1}(j)$ is a set of predetermined weights regarding the size of population in groups $k$ and $k+1$. For illustration, we set $K = 3$ in (12) to accommodate the comparison of RTFRs among three subgroups. The proposed test statistic becomes

$$Z_3(h) = \sum_{j=1}^{h} \sum_{k=1}^{2} w_{k,k+1}(j)\left\{n_{k,D}(j)\,n_{k+1,R}(j) - n_{k+1,D}(j)\,n_{k,R}(j)\right\} = \sum_{j=1}^{h} \Big[ w_{1,2}(j)\left\{n_{1,D}(j)\,n_{2,R}(j) - n_{2,D}(j)\,n_{1,R}(j)\right\} + w_{2,3}(j)\left\{n_{2,D}(j)\,n_{3,R}(j) - n_{3,D}(j)\,n_{2,R}(j)\right\} \Big] \tag{13}$$

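Similarly, let $\hat{p}_j^* = \left(\hat{p}_{1,D}(j), \hat{p}_{1,R}(j), \hat{p}_{2,D}(j), \hat{p}_{2,R}(j), \hat{p}_{3,D}(j), \hat{p}_{3,R}(j)\right)^T$ be the vector of the MLEs of the probabilities of death and recovery in the $j$th interval for the three groups. We can show that the test statistic can be rewritten as

$$Z_3(h) = \sum_{j=1}^{h} I_1(j-1)\,I_2(j-1)\,I_3(j-1) \left[ \hat{p}_{2,D}(j)\left\{\frac{w_{2,3}(j)\,\hat{p}_{3,R}(j)}{I_1(j-1)} - \frac{w_{1,2}(j)\,\hat{p}_{1,R}(j)}{I_3(j-1)}\right\} - \hat{p}_{2,R}(j)\left\{\frac{w_{2,3}(j)\,\hat{p}_{3,D}(j)}{I_1(j-1)} - \frac{w_{1,2}(j)\,\hat{p}_{1,D}(j)}{I_3(j-1)}\right\} \right] = \sum_{j=1}^{h} I_1(j-1)\,I_2(j-1)\,I_3(j-1)\; g^*(\hat{p}_j^*)$$

where $I_k(j-1)$ denotes the number of inpatients for group $k$ at the start of the $j$th interval, and $g^*(\hat{p}_j^*)$ is a function of $\hat{p}_j^*$. It follows that the asymptotic variance of the test statistic in (13) can be derived easily by the Delta method, which is given by

$$\hat{\sigma}^{*2}(h) = \sum_{j=1}^{h} I_1^2(j-1)\,I_2^2(j-1)\,I_3^2(j-1)\; \frac{\partial g^*(\hat{p}_j^*)}{\partial p_j^*}^T \Sigma_j^*\, \frac{\partial g^*(\hat{p}_j^*)}{\partial p_j^*} \tag{14}$$

where $\Sigma_j^* = \mathrm{diag}\left(\Sigma_{1,j}^*, \Sigma_{2,j}^*, \Sigma_{3,j}^*\right)$ with

$$\Sigma_{k,j}^* = \begin{pmatrix}
\dfrac{p_{k,D}(j)\{1 - p_{k,D}(j)\}}{I_k(j-1)} & -\dfrac{p_{k,D}(j)\,p_{k,R}(j)}{I_k(j-1)} \\
-\dfrac{p_{k,D}(j)\,p_{k,R}(j)}{I_k(j-1)} & \dfrac{p_{k,R}(j)\{1 - p_{k,R}(j)\}}{I_k(j-1)}
\end{pmatrix}$$

for $k = 1, 2, 3$ and $j = 1, \ldots, H$.

Analogous to the two-sample case, a typical set of weights for the test statistic in (13) is $w_{k,k+1}(j) = 1$ for all $k$ and $j$. Another set of intuitive weights is $w_{k,k+1}(j) = 1/\{I_k(j-1)\,I_{k+1}(j-1)\}$, which yields the corresponding test statistic:

$$Z_3^*(h) = \sum_{j=1}^{h} \left\{\hat{p}_{1,D}(j)\,\hat{p}_{2,R}(j) - \hat{p}_{2,D}(j)\,\hat{p}_{1,R}(j) + \hat{p}_{2,D}(j)\,\hat{p}_{3,R}(j) - \hat{p}_{3,D}(j)\,\hat{p}_{2,R}(j)\right\} \tag{15}$$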

We can easily show that the test statistic in (13) enjoys the same asymptotic properties as in the two-sample test. It converges to a Brownian motion under the null hypothesis, and the sequential testing procedure described in Section 2.3 can be readily adopted using (13) and (14). Therefore, the differences of the RTFRs among $K$ independent groups can be identified at the earliest possible time when enough statistical evidence accumulates.
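The multi-group statistic in (12) sums the two-sample cross-products over adjacent pairs of groups, as the following minimal sketch shows. The function name and the counts are hypothetical; `deaths[k][j]` and `recoveries[k][j]` are assumed to hold the counts for group `k + 1` in interval `j + 1`, and all weights default to one.

```python
def k_sample_statistic(deaths, recoveries, weights=None):
    """Cumulative statistic over adjacent group pairs for h = 1..H."""
    num_groups, num_intervals = len(deaths), len(deaths[0])
    z, path = 0.0, []
    for j in range(num_intervals):
        for k in range(num_groups - 1):
            # weights[k][j] is the weight for the pair (k, k + 1) in interval j.
            w = 1.0 if weights is None else weights[k][j]
            z += w * (deaths[k][j] * recoveries[k + 1][j]
                      - deaths[k + 1][j] * recoveries[k][j])
        path.append(z)
    return path


# Hypothetical counts for three groups over two intervals.
deaths = [[10, 12], [5, 6], [2, 3]]
recoveries = [[20, 18], [25, 30], [30, 33]]
print(k_sample_statistic(deaths, recoveries))
```

With two groups this reduces to the two-sample statistic of Section 2.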

4. Simulation study

A large-scale simulation is carried out to assess the finite-sample performance of the proposed two- and three-sample sequential tests. We assume that surveillance data are routinely reported while the exact death and discharge times are generally unknown. This mimics real-world epidemiological data, for which only a summary of aggregated counts is available during the outbreak. We assume a 50-day observation period (i.e. $\tau = 50$), which is divided into $H$ equal intervals, and the daily number of inpatients $I_k(h-1)$ is prespecified. We consider different scenarios that imitate how the RTFRs change over time in practice, based on the prespecification of the death and recovery probabilities $p_{k,D}(h)$ and $p_{k,R}(h)$ on day $h$ for group $k$, respectively. The daily numbers of deaths and recoveries are then generated under the multinomial setting in (1). Based on the filtration $\mathcal{F}_h$ on day $h$, the test statistics $Z_2(h)$ and $Z_3(h)$ can be calculated and the sequential test can be conducted. The overall level of significance is set at a prespecified $\alpha$, and the $\alpha$-spending function described in (8) is adopted throughout the simulation.
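The data-generating step of such a simulation can be sketched as follows. Each inpatient independently dies, recovers or stays in hospital within an interval, which reproduces the multinomial model in (1). The inpatient count of 2000 and the probabilities 0.01 and 0.05 are hypothetical values chosen for illustration and are not the settings used in the study; the function name is also an assumption.

```python
import random


def simulate_group(inpatients, p_death, p_recovery, rng):
    """Draw (deaths, recoveries) for each interval from the multinomial model."""
    deaths, recoveries = [], []
    for size, p_d, p_r in zip(inpatients, p_death, p_recovery):
        n_d = n_r = 0
        for _ in range(size):
            u = rng.random()
            if u < p_d:
                n_d += 1            # the patient dies in this interval
            elif u < p_d + p_r:
                n_r += 1            # the patient recovers in this interval
        deaths.append(n_d)
        recoveries.append(n_r)
    return deaths, recoveries


rng = random.Random(2020)
num_intervals = 50                   # a 50-day observation period
inpatients = [2000] * num_intervals  # hypothetical daily number of inpatients
deaths, recoveries = simulate_group(
    inpatients, [0.01] * num_intervals, [0.05] * num_intervals, rng)
print(deaths[:5], recoveries[:5])
```

Repeating this for each group and each replicate, and feeding the counts into the test statistic, gives the empirical size or power of the sequential test.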

For each scenario, independent simulated data sets were generated. Under $H_0$, various scenarios with equal RTFRs among subgroups were considered to evaluate the empirical rejection rates of the sequential test, and the results for the two- and three-sample tests are summarized in Tables 1 and 3, respectively. We can see that the empirical sizes for both tests match closely with the nominal level $\alpha$ in all cases, suggesting that the proposed tests are empirically unbiased.

Table 1.

Simulation results for the empirical sizes (%) of the proposed two-sample test under different scenarios when $H_0$ is true.

Table 2.

Simulation results for the empirical powers (%) of the proposed two-sample test under different scenarios when $H_1$ is true.

Table 3.

Simulation results of the proposed three-sample test under different scenarios when $H_0$ is true.

We consider 24 scenarios under the alternative hypothesis for the two-sample comparison. The results are summarized in Table 2, where $\bar{T}$ in the last column represents the empirical average of $T$, the day at which the null hypothesis is first rejected (among those replicates with $H_0$ being rejected). In the first eight scenarios, the RTFRs of the two groups are different only within a specific time interval. The second eight scenarios correspond to the situation that the RTFRs of the two groups remain the same at first, but the RTFR of group 2 drops suddenly at a given day. One can see that the proposed test is reasonably powerful (with empirical powers over 95%) in detecting a sudden change in RTFR between groups. Also, the null hypothesis can be rejected within a short period of time, say 7 days, after a change has been imposed on the RTFR of group 2. Nevertheless, we may observe a relatively small power in some cases where the change in RTFR in group 2 is modest or small. For example, the scenario in the seventh row of Table 2 only attains a power of 50.80% due to a relatively small jump size in the recovery probabilities in group 2. When we increase the jump size from 0.01 to 0.02 (the next row in Table 2), the empirical power increases from 50.80% to 95.56%, and $\bar{T}$ becomes closer to the day where the change occurs. The remaining 8 scenarios correspond to the situation that the RTFR of group 1 is uniformly higher than that of group 2, and the empirical powers are high in general. Table 4 demonstrates the good performance of the proposed three-sample test under $H_1$. Specifically, the RTFR is always the highest in group 1 and the lowest in group 3 throughout the observation period. The empirical powers are close to 1 in all cases and the null hypothesis can be rejected quickly as soon as there is enough statistical evidence supporting the alternative hypothesis.

Table 4.

Simulation results of the three-sample sequential test under the alternative hypothesis when $H_1$ is true.

In addition to the results reported in Tables 1 to 4, we have tried different sequences of the daily number of inpatients in the simulation setup, as well as a small sample size of around 800. We also tried another weight function corresponding to the test statistics $Z_2^*(h)$ and $Z_3^*(h)$ in replacement of $Z_2(h)$ and $Z_3(h)$. The results obtained in Tables 1 to 4 are quite robust to these changes, hence those findings are not reported here. Moreover, when compared with the non-sequential test discussed in (7), the sequential test achieves the same level of power in all cases with no additional cost, but it allows a conclusion to be made at a much earlier time.

In addition, to assess the effect of the choice of $\tau$, and hence the number of intervals $H$ (say, in days or weeks), on the performance of the proposed test, other values of $\tau$ were also considered. For the cases with a sudden change in the RTFR under $H_1$, the power and $\bar{T}$ for different values of $\tau$ are virtually identical. For the cases with a gradual increase in the difference in RTFRs over time, it is natural to expect a higher power based on a larger value of $\tau$ as more statistical evidence would accumulate over time, but the difference is minimal. On the other hand, a larger value of $\tau$ also means that the significance level assigned to each interim analysis is smaller, which will also lead to a slight increase in $\bar{T}$. In practice, we suggest setting $\tau$ to be reasonably large to allow accumulation of more statistical evidence, at the expense of a slight delay in the decision if there is a difference.

5. Application

In December 2019, several cases of novel coronavirus infection, now known as COVID-19, were reported in Wuhan, Hubei province, China. Despite the implementation of a strict lockdown in Wuhan on 23 January 2020,21 the virus rapidly spread from the epicenter to different regions across China. By the end of February 2020, 79,394 cases including 2838 deaths were reported in mainland China.22 Thereof, 66,337 occurred in Hubei province with a death toll of 2727, suggesting a CFR of 4.11% at first glance, which contrasts with about 0.85% in other areas of mainland China. A previous study suggested that the accessibility level of health care resources may be the cause of the considerable gap in mortality among different areas.23 According to the level of medical resource availability during the outbreak, we partition mainland China into three clusters, namely Wuhan city, Hubei province excluding Wuhan city, and mainland China excluding Hubei province. The main objective of the analysis is to explore the difference in disease severity based on the RTFR among these clusters and to investigate the potential effects of medical resource availability (i.e. the numbers of doctors and hospital beds) on the fatality rate in China. The cumulative numbers of confirmed cases, deaths, and recoveries between 1 February and 31 March for each cluster were summarized and extracted from the public domain.24
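The crude CFRs implied by the counts quoted above can be recomputed directly, which is a quick way to check the arithmetic. The sketch below only uses the four totals given in the text; the variable names are hypothetical.

```python
# Counts quoted in the text for the end of February 2020.
total_cases, total_deaths = 79394, 2838      # mainland China
hubei_cases, hubei_deaths = 66337, 2727      # Hubei province

# Crude CFR: deaths divided by confirmed cases.
cfr_hubei = hubei_deaths / hubei_cases
cfr_elsewhere = (total_deaths - hubei_deaths) / (total_cases - hubei_cases)

print(f"Hubei: {100 * cfr_hubei:.2f}%")
print(f"Mainland China excluding Hubei: {100 * cfr_elsewhere:.2f}%")
```

These are naive end-of-period ratios and ignore unresolved cases, which is precisely the limitation that motivates the RTFR.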

A smoothed version of the RTFR estimator17 for the three separate clusters over the observation period is shown in Figure 1. We can see that there exist clear disparities in severity among areas in mainland China during the early phase of the COVID-19 epidemic. In this regard, we provide some explanatory notes to describe the observed pattern. As most of the cases were concentrated in Hubei province at the beginning of the outbreak, the hospitals and local health care systems were suddenly overwhelmed. Especially in Wuhan, many patients did not receive timely treatment, causing Wuhan to have the highest fatality rate, followed by the remaining cities in Hubei province. In contrast, the lockdown measures implemented in Wuhan city delayed the epidemic growth in other provinces and provided valuable time for them to prepare. Therefore, the number of infections grew at a much slower rate compared with the supply of health care resources, contributing to a lower fatality rate in the rest of mainland China. To meet the shortage of medical resources in the worst-hit areas, the Chinese government mobilized all the necessary resources nationwide to support virus control in Hubei and the city of Wuhan. Two new field hospitals, namely Huoshenshan and Leishenshan, were built in a few days and were put into use in Wuhan in early February. Over 25,000 medical professionals from other provinces of China had rushed to Wuhan for assistance as of 14 February.25 The remaining 16 cities in Hubei province also received one-to-one paired assistance from other provinces.26 In the meantime, a number of temporary hospitals, namely the Fangcang shelter hospitals, were constructed to provide enough beds to treat patients with mild to moderate symptoms. 
These temporary hospitals relieved the huge pressure on the health care system and allowed the designated hospitals to concentrate on treating patients with severe and critical conditions.27 As of 28 February 2020, Wuhan had established 16 temporary hospitals, and the demand for hospital beds in Hubei was met. While the RTFR of mainland China stabilized at a low level, the RTFRs of Wuhan and Hubei declined continuously owing to the increasing number of hospital beds and sufficient medical resources. Eventually, the RTFRs for the three clusters reached a similar level by the end of February and remained relatively low throughout March.

Figure 1.

The estimated real-time fatality rate for the outbreak of COVID-19 in three separate clusters.

The proposed method is applied to examine the difference in RTFRs among areas over time. By treating a day as the unit, the observation period is divided into $H$ daily intervals, with day 1 being 1 February 2020. We conduct the two-sample and three-sample tests of equal RTFRs against the one-sided alternatives described in Sections 2 and 3, respectively. The typical weight function is used and the overall significance level is set to be $\alpha$. Specifically, on day $h$, we compared the test statistics $Z_2(h)$ and $Z_3(h)$ with their corresponding critical values $b_h$, respectively. The sequential test is terminated at day $h$ if the test statistic at $h$ exceeds the critical value $b_h$. In line with the considerable gap in fatality rates among areas shown in Figure 1, the null hypothesis of equal RTFRs is rejected quickly on the seventh day (7 February) and fourth day (4 February) based on the two- and three-sample tests, respectively. We then conduct the same pair of tests for the period from 1 March to 1 April, during which the medical resource availability is more or less the same across different clusters. As expected, both the two- and three-sample tests fail to reject the null hypothesis, which is further supported by the similar fatality rates in March 2020 among the three areas as displayed in Figure 1. This example shows that our proposed tests are sensitive in picking up changes in RTFRs, and that they are useful for providing real-time signals to the health authority on whether the existing measures or medical resources are adequate in some areas to contain the epidemic.
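The stopping rule used here, comparing the running statistic with the boundary for the same day and stopping at the first exceedance, can be written as a short helper. The function name and the numbers are hypothetical; in practice the statistic and the boundaries would come from the procedures of Sections 2 and 3.

```python
def first_rejection(z_path, boundaries):
    """Return the first interval h (1-based) with Z(h) > b_h, or None."""
    for h, (z, b) in enumerate(zip(z_path, boundaries), start=1):
        if z > b:
            return h
    return None


# Hypothetical statistic path and boundaries over four days.
z_path = [1.0, 2.0, 5.0, 9.0]
boundaries = [float("inf"), 4.0, 4.5, 4.0]
print(first_rejection(z_path, boundaries))
```

A return value of `None` corresponds to the case where the null hypothesis is not rejected by the end of the observation period.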

6. Discussion

A statistical test is proposed in this paper to compare the RTFRs among independent groups during the course of an ongoing epidemic. As the implementation of an effective control measure can reduce disease severity and save more lives, our method can provide an evidence-based assessment of the effectiveness of an implemented intervention and inform the policy-making process during an emerging epidemic. Because the test statistic converges to a Brownian motion under the null hypothesis, a sequential design can be adopted naturally. Therefore, the null hypothesis of no difference in severity among subgroups can be rejected as soon as sufficient evidence has accumulated over time. This property is particularly useful during an emerging epidemic, as government officials can identify effective control measures at the earliest time and issue recommendations for disease control promptly. A large-scale simulation study shows that the proposed test performs well in the two- and three-sample cases, both in terms of unbiasedness and in its sensitivity to differences in severity among groups.

The proposed statistical test is applied to the COVID-19 data in mainland China to examine the difference in severity among three separate clusters. The results suggest that the severity of COVID-19 in mainland China is possibly associated with the accessibility of local health care resources. This emphasizes that medical supplies and resources play an important role in lowering the RTFR. As many countries are now struggling with the COVID-19 outbreak, these findings may offer guidance for disease prevention and control worldwide. Resource-limited countries in particular should at least slow down the surge of infections to avoid overwhelming the local medical system. The illustrated example demonstrates that our method is simple to use and is widely applicable to all emerging infectious diseases.

We have shown that the proposed two-sample test can be easily generalized to accommodate the multi-sample situation. Essentially, this enables us to deal with more clinical questions in practice. For example, investigating the discrepancy in RTFRs between multiple age groups could help to minimize the confounding effect and help us gain an in-depth understanding of the other factors that affect fatality. Most importantly, noting the discrepancy in fatality rates between different treatment arms in clinical trials helps clinicians to identify the most effective treatment for curbing the disease. Taking COVID-19 as an example, hundreds of clinical trials have so far been registered worldwide on clinical trial registries to evaluate the performance of possible treatments.28–30 Our proposed method can be one of the essential tools to evaluate the efficacy of different potential treatments, where superiority over other candidate treatments is indicated by a relatively lower fatality rate along the timeline.

The proposed tests have the advantage of using minimal information to obtain a timely, quantitative assessment of the effectiveness of potential treatments or implemented measures. During an outbreak of an emerging epidemic, the surveillance data are always incomplete, and individual data such as the time-at-infection, time-to-recovery, and time-to-death are difficult to obtain. This is true especially for areas with low public health awareness and a poor health care system. For the ongoing COVID-19 pandemic, the epidemiological data are hard to obtain and, for most countries, only the cumulative counts of cases, deaths, and recoveries are recorded. It is important for public health officials to make use of this simplest data structure to gain more insight into the disease so that prompt actions can be taken to suppress the disease fatality at the earliest possible time.
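
To illustrate how much can be read from cumulative counts alone, the following Python sketch converts cumulative deaths and recoveries into a crude windowed fatality summary. It is only an illustration under stated assumptions and is not the RTFR estimator used in this paper: the function name `windowed_death_share`, the trailing-window construction, and the default window length are all hypothetical choices.

```python
from typing import List, Sequence


def windowed_death_share(cum_deaths: Sequence[int],
                         cum_recoveries: Sequence[int],
                         window: int = 7) -> List[float]:
    """For each day t >= window, return new deaths divided by new deaths
    plus new recoveries over the trailing `window` days.

    A crude summary built from cumulative counts only; NaN is returned
    for windows in which nobody died or recovered.
    """
    if len(cum_deaths) != len(cum_recoveries):
        raise ValueError("the two series must cover the same days")
    if window < 1:
        raise ValueError("window must be at least one day")
    shares = []
    for t in range(window, len(cum_deaths)):
        # Differences of cumulative counts give the counts within the window.
        deaths = cum_deaths[t] - cum_deaths[t - window]
        recoveries = cum_recoveries[t] - cum_recoveries[t - window]
        resolved = deaths + recoveries
        shares.append(deaths / resolved if resolved > 0 else float("nan"))
    return shares


# Made-up cumulative counts for five consecutive days.
toy_deaths = [0, 1, 2, 4, 6]
toy_recoveries = [0, 0, 2, 6, 14]
print(windowed_death_share(toy_deaths, toy_recoveries, window=2))
```

Working with differences of the cumulative series means no individual-level records are needed, which is the data limitation described above.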

Footnotes

Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The authors received no financial support for the research, authorship, and/or publication of this article.

References

  • 1. Jones KE, Patel NG, Levy MA et al. Global trends in emerging infectious diseases. Nature 2008; 451: 990–993.
  • 2. World Health Organisation (WHO). Coronavirus disease (COVID-19) weekly epidemiological update and weekly operational update, 2020. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports (2020, accessed 10 December 2020).
  • 3. World Health Organisation (WHO). Estimating mortality from COVID-19, 2020. https://www.who.int/publications/i/item/WHO-2019-nCoV-Sci-Brief-Mortality-2020.1 (2020, accessed 18 November 2020).
  • 4. Ghani AC, Donnelly CA, Cox DR et al. Methods for estimating the case fatality ratio for a novel, emerging infectious disease. Am J Epidemiol 2005; 162: 479–486.
  • 5. World Health Organisation (WHO). Estimating mortality from COVID-19: Scientific brief, 4 August 2020. Technical report, 2020.
  • 6. Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in west Africa. The Lancet 2014; 384: 1260.
  • 7. Rajgor DD, Lee MH, Archuleta S et al. The many estimates of the COVID-19 case fatality rate. Lancet Infect Dis 2020; 20: 776–777.
  • 8. Mizumoto K, Saitoh M, Chowell G et al. Estimating the risk of Middle East respiratory syndrome (MERS) death during the course of the outbreak in the Republic of Korea, 2015. Int J Infect Dis 2015; 39: 7–9.
  • 9. Yip PSF, Lam KF, Lau EHY et al. A comparison study of realtime fatality rates: severe acute respiratory syndrome in Hong Kong, Singapore, Taiwan, Toronto and Beijing, China. J R Stat Soc A Stat 2005; 168: 233–243.
  • 10. Lam KF, Deshpande JV, Lau EHY et al. A test for constant fatality rate of an emerging epidemic: with applications to severe acute respiratory syndrome in Hong Kong and Beijing. Biometrics 2008; 64: 869–876.
  • 11. Reich NG, Lessler J, Cummings DA et al. Estimating absolute and relative case fatality ratios from infectious disease surveillance data. Biometrics 2012; 68: 598–606.
  • 12. Chen Z, Akazawa K, Nakamura T. Estimating the case fatality rate using a constant cure-death hazard ratio. Lifetime Data Anal 2009; 15: 316–329.
  • 13. Grein J, Ohmagari N, Shin D et al. Compassionate use of remdesivir for patients with severe COVID-19. New Engl J Med 2020; 382: 2327–2336.
  • 14. Wang Y, Zhang D, Du G et al. Remdesivir in adults with severe COVID-19: A randomised, double-blind, placebo-controlled, multicentre trial. The Lancet 2020; 395: 1569–1578.
  • 15. Beigel JH, Tomashek KM, Dodd LE et al. Remdesivir for the treatment of COVID-19 preliminary report. New Engl J Med 2020; 383: 1813–1826.
  • 16. Andersen PK, Borgan O, Gill RD et al. Statistical models based on counting processes. New York: Springer Science & Business Media, 2012.
  • 17. Yip PSF, Lau EHY, Lam KF et al. A chain multinomial model for estimating the real-time fatality rate of a disease, with an application to severe acute respiratory syndrome. Am J Epidemiol 2005; 161: 700–706.
  • 18. Oehlert GW. A note on the delta method. Am Stat 1992; 46: 27–29.
  • 19. Gordon Lan KK, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika 1983; 70: 659–663.
  • 20. Jennison C, Turnbull BW. Group sequential methods with applications to clinical trials. London: Chapman & Hall, 2020.
  • 21. Lau H, Khosrawipour V, Kocbach P et al. The positive impact of lockdown in Wuhan on containing the COVID-19 outbreak in China. J Travel Med 2020.
  • 22. Wu JT, Leung K, Bushman M et al. Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China. Nat Med 2020; 26: 506–510.
  • 23. Ji Y, Ma Z, Peppelenbosch MP et al. Potential association between COVID-19 mortality and health-care resource availability. Lancet Glob Health 2020; 8: e480.
  • 24. Tencent. Tencent News, 2020. https://news.qq.com/zt2020/page/feiyan.htm# (2019, accessed 12 January 2021).
  • 25. Jia J, Ding J, Liu S et al. Modeling the control of COVID-19: Impact of policy interventions and meteorological factors. Electron J Differ Equ 2020; 2020: 1–24.
  • 26. Chen T, Wang Y, Hua L. Pairing assistance the effective way to solve the breakdown of health services system caused by COVID-19 pandemic. Int J Equity Health 2020; 19: 1–4.
  • 27. Chen S, Zhang Z, Yang J et al. Fangcang shelter hospitals: a novel concept for responding to public health emergencies. The Lancet 2020; 395: 1305–1314.
  • 28. Yao X, Ye F, Zhang M et al. In vitro antiviral activity and projection of optimized dosing design of hydroxychloroquine for the treatment of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Clin Infect Dis 2020; 71: 732–739.
  • 29. Cao B, Wang Y, Wen D et al. A trial of lopinavir–ritonavir in adults hospitalized with severe COVID-19. New Engl J Med 2020.
  • 30. Gautret P, Lagier JC, Parola P et al. Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial. Int J Antimicrob Agents 2020. [Retracted]

Articles from Statistical Methods in Medical Research are provided here courtesy of SAGE Publications
