Implementation of an Alternative Method for Assessing Competing Risks: Restricted Mean Time Lost

Hongji Wu; Hao Yuan; Zijing Yang; Yawen Hou; Zheng Chen

doi:10.1093/aje/kwab235

2021 Sep 22;191(1):163–172. doi: 10.1093/aje/kwab235

PMCID: PMC9180943 PMID: 34550319

Abstract

In clinical and epidemiologic studies, hazard ratios are often applied to compare treatment effects between 2 groups for survival data. For competing-risks data, the corresponding quantities of interest are cause-specific hazard ratios and subdistribution hazard ratios. However, they both have some limitations related to model assumptions and clinical interpretation. Therefore, we recommend restricted mean time lost (RMTL) as an alternative measure that is easy to interpret in a competing-risks framework. Based on the difference in RMTL (RMTLd), we propose a new estimator, hypothesis test, and sample-size formula. Simulation results show that estimation of the RMTLd is accurate and that the RMTLd test has robust statistical performance (both type I error and statistical power). The results of 3 example analyses also verify the performance of the RMTLd test. From the perspectives of clinical interpretation, application conditions, and statistical performance, we recommend that the RMTLd be reported along with the hazard ratio in analyses of competing-risks data and that the RMTLd even be regarded as the primary outcome when the proportional hazards assumption fails.

Keywords: competing risks, hazard ratio, hypothesis testing, restricted mean time lost, sample size, survival analysis

Abbreviations:

cHR: cause-specific hazard ratio
CI: confidence interval
CIF: cumulative incidence function
COVID-19: coronavirus disease 2019
CSH: cause-specific hazard function
HR: hazard ratio
RMTL: restricted mean time lost
RMTLd: difference in restricted mean time lost
SDH: subdistribution hazard function
sHR: subdistribution hazard ratio

Clinical trials of treatments and preventative measures for coronavirus disease 2019 (COVID-19) have received global attention. In published and ongoing randomized trials for COVID-19 treatments, the time-to-event endpoint of interest, such as the time to clinical improvement (or recovery), has been the most commonly used primary outcome (1). The corresponding method used has been the Kaplan-Meier method, and the effect size has been the hazard ratio (HR). However, patients may die of COVID-19 before improvement (or recovery), so competing-risks problems occur (2); that is, the occurrence of the event of interest (improvement or recovery) may be precluded by a competing event (death). In this setting, the commonly applied single-event survival analysis techniques, which treat subjects who experience a competing event as censored, may lead to biased results (3, 4). Therefore, competing-risks analysis should be applied in such situations.

There are 2 widely used approaches to competing-risks analysis based on hazards (5). One is based on a cause-specific hazard function (CSH), which refers to the instantaneous rate of occurrence of a specific event among the individuals who are still event-free; its corresponding statistical test is the log-rank test, and the statistical measure—that is, the cause-specific hazard ratio (cHR)—can be estimated through a cause-specific Cox regression model. The other approach is the subdistribution hazard function (SDH), which refers to the instantaneous rate of the event of interest in subjects who have not yet experienced the given event. The statistical test is the Gray test, and the estimated effect of one group relative to another—that is, the subdistribution hazard ratio (sHR)—can be calculated using the Fine-Gray model. Meanwhile, the clinical or epidemiologic interests in this approach are characterized by the cumulative incidence function (CIF), the probability of one event of interest occurring by a particular time in the presence of other events, which reflects the risk of the cause of interest without ignoring the presence of other competing events.

In the clinical analysis of competing-risks data, the estimations and statistical tests based on the cHR and sHR still have some limitations. First, the HR (both the cHR and the sHR) should be described as a relative rate, not as a relative risk (6). Without the assumption of independence of competing events, the cHR cannot be linked to the comparison of CIFs for an event between 2 groups (7), which means that cHR > 1 does not necessarily imply CIF₁ > CIF₀; that is, even if the hazard due to a main cause in a control group is always higher than that in a treatment group, the risk of the main cause in the control group is not necessarily always higher than that in the treated group. Although the sHR can affect the comparison of CIFs—that is, sHR > 1 can indicate that CIF₁ > CIF₀ and vice versa—it reflects the relative change in the instantaneous rates of occurrence of a given type of event in subjects who have not yet experienced that event between 2 groups. Researchers may find it difficult to interpret the results when individuals who had a competing event are retained in the risk set (8). Second, both the cause-specific Cox model and the Fine-Gray model depend on an assumption of the proportionality of the CSH and the SDH; as a consequence, researchers in many published survival analyses report only a single cHR or sHR, which is an average of specific HRs at different time points. However, if the above assumption is violated, a single HR is difficult to interpret because the true HR varies over time. Third, because of the semiparametric nature of the 2 regression models, the “relative” hazard rates cHR and sHR are not interchangeable with the “absolute” hazard rate without baseline hazards, which may make their clinical interpretation difficult to conceptualize.

Considering the above limitations, especially the problem of clinical interpretation, some researchers recommended an alternative statistic (9–11): restricted mean time lost (RMTL). RMTL can be estimated as the area under the CIF curve up to a specified time point and interpreted as the mean amount of time lost due to a specific cause during a predefined time window. Thus, compared with that of HRs, the clinical interpretation of the RMTL, which is based on a time scale, can easily be understood by physicians and patients (12–14). The difference in RMTL (RMTLd) is used to quantify the treatment effect and is also directly associated with comparisons of CIFs.

Although Anderson (9) and Zhao et al. (10) introduced the concept of RMTL, neither of them discussed the corresponding estimation and hypothesis test based on the RMTLd. Lyu et al. (11) presented a statistical inference framework and sample-size estimator based on the RMTLd, but it seemed to be relatively conservative on the basis of simulations. Therefore, in this article, we introduce a new RMTLd-based statistical inference framework and sample-size formula and demonstrate its performance through simulation and illustrative examples.

METHODS

Without loss of generality, only 1 event of interest ($j = 1$) and 1 competing event ($j = 2$) are assumed. $T$ is defined as the observed time (time to event or censoring time).

Estimation of the RMTLd

The nonparametric estimation of the CIF is

$$\hat{F}_j(t) = \sum_{t_i \le t} \hat{S}(t_{i-1}) \frac{d_{ji}}{n_i},$$

where $t_i$ is the ith ordered event time, $d_{ji}$ is the number of events of cause $j$ that occur at time $t_i$, $n_i$ is the number of subjects at risk at time $t_i$, and $\hat{S}(t_{i-1})$ is the event-free survival probability. Tau (τ) is the chosen time point, and τ ≤ T. For simplicity, we denote the RMTL of the event of interest to be $\mu$; then, the nonparametric estimation of $\mu$ is given by

$$\hat{\mu} = \int_0^{\tau} \hat{F}_1(t)\,dt,$$

which can be interpreted as the mean amount of time lost due to a specific cause within the τ-year window. The variance of $\hat{\mu}$, written $\widehat{\mathrm{Var}}(\hat{\mu})$, can be estimated based on the derivation of the martingale approximation (15); for the detailed process, see Web Appendix 1, available at https://doi.org/10.1093/aje/kwab235.

Let $\mu_k$ be the RMTL of the event of interest in group $k$ ($k = 0, 1$); then, $\hat{\mu}_k$ denotes the estimated RMTL, and $\widehat{\mathrm{Var}}(\hat{\mu}_k)$ corresponds to the variance of $\hat{\mu}_k$. Then, the RMTLd between 2 groups is $\hat{\Delta} = \hat{\mu}_1 - \hat{\mu}_0$, and the corresponding variance is $\widehat{\mathrm{Var}}(\hat{\Delta}) = \widehat{\mathrm{Var}}(\hat{\mu}_1) + \widehat{\mathrm{Var}}(\hat{\mu}_0)$. In large samples, the confidence interval (CI) of the RMTLd is estimated as

$$\hat{\Delta} \pm z_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}(\hat{\Delta})},$$

where $z_{1-\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.
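The estimation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the standard nonparametric (Aalen-Johansen) form of the CIF estimator, a status coding of 0 = censored, 1 = event of interest, 2 = competing event, and hypothetical function names `cif_steps` and `rmtl`. The martingale-based variance is not reproduced here because its expression is given only in Web Appendix 1.

```python
import numpy as np

def cif_steps(time, status, cause=1):
    """Nonparametric CIF for `cause`; returns (event_times, cif_values).

    Assumed coding: status 0 = censored, 1 = event of interest, 2 = competing event.
    """
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=int)
    event_times = np.unique(time[status > 0])  # sorted distinct event times
    surv = 1.0  # event-free survival just before the current event time
    cif = 0.0
    values = []
    for t in event_times:
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (status == cause))
        d_any = np.sum((time == t) & (status > 0))
        cif += surv * d_cause / at_risk   # CIF increment at t
        surv *= 1.0 - d_any / at_risk     # update event-free survival
        values.append(cif)
    return event_times, np.array(values)

def rmtl(time, status, tau, cause=1):
    """Area under the CIF step function on [0, tau]."""
    t, f = cif_steps(time, status, cause)
    keep = t < tau
    t, f = t[keep], f[keep]
    # Each CIF value holds from its event time until the next one (or tau).
    widths = np.diff(np.append(t, tau))
    return float(np.sum(f * widths))
```

For example, with four subjects observed at times 1, 2, 3, 4 with statuses 1, 2, 1, 0 and τ = 4, `rmtl` returns 1.0, which matches the hand calculation (3 + 1)/4 for the two events of interest.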

Hypothesis test

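The null and alternative hypotheses of the RMTLd test are $H_0: \mu_1 - \mu_0 = 0$ and $H_1: \mu_1 - \mu_0 \ne 0$, respectively. Under the null hypothesis $H_0$, the RMTLd test statistic can be computed as

$$Z = \frac{\hat{\mu}_1 - \hat{\mu}_0}{\sqrt{\widehat{\mathrm{Var}}(\hat{\mu}_1) + \widehat{\mathrm{Var}}(\hat{\mu}_0)}},$$

which asymptotically follows a standard normal distribution.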
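As a rough illustration of the test, the sketch below computes the statistic, a 2-sided P value, and a large-sample CI from group-level estimates. It assumes the 2 groups are independent, so that the variance of the difference is the sum of the group variances; the function name `rmtld_test`, the default `alpha=0.05`, and the input values in the usage note are hypothetical, and the variance estimates must be supplied by the caller.

```python
import math
from statistics import NormalDist

def rmtld_test(mu1, var1, mu0, var0, alpha=0.05):
    """Wald-type test and CI for the difference in RMTL (group 1 minus group 0)."""
    diff = mu1 - mu0
    se = math.sqrt(var1 + var0)            # independent groups assumed
    z = diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {
        "rmtld": diff,
        "z": z,
        "p_value": p_value,
        "ci": (diff - crit * se, diff + crit * se),
    }
```

Calling `rmtld_test(1.0, 0.02, 1.3, 0.02)` with these made-up inputs gives a difference of -0.3 years and a statistic of about -1.5; a negative difference means less time lost to the event of interest in group 1.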

Sample size

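Suppose $n_0$ and $n_1$ are the required sample sizes in the control group and the treatment group, respectively, and that $r = n_1/n_0$ is the ratio of sample sizes. Assume we test the null hypothesis $H_0$ with statistical power $1 - \beta$ at a 2-sided significance level $\alpha$. Under alternative hypothesis $H_1$, we then have

$$\frac{|\mu_1 - \mu_0|}{\sqrt{\sigma_0^2/n_0 + \sigma_1^2/n_1}} = z_{1-\alpha/2} + z_{1-\beta}.$$

Hence, the total sample size (for the detailed derivation, see Web Appendix 2) is

$$n = n_0 + n_1 = \frac{(1 + r)\,(z_{1-\alpha/2} + z_{1-\beta})^2\,(\sigma_0^2 + \sigma_1^2/r)}{(\mu_1 - \mu_0)^2}.$$

$z_p$ is the inverse standard normal distribution function at probability $p$, and the population variance of group $k$ can be estimated as $\hat{\sigma}_k^2 = n_k \widehat{\mathrm{Var}}(\hat{\mu}_k)$, where $n_k$ and $\widehat{\mathrm{Var}}(\hat{\mu}_k)$ can be obtained through a pilot study or previous study.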
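A sample-size calculation of this kind can be sketched as follows. The code is an assumption-laden illustration rather than the authors' formula verbatim: it takes the allocation ratio as r = n₁/n₀, treats the per-subject variances `sigma2_0` and `sigma2_1` as known from a pilot study, uses the usual normal-approximation argument for a 2-sided test, and rounds each group size up. The function name and the defaults `alpha=0.05` and `power=0.80` are hypothetical.

```python
import math
from statistics import NormalDist

def rmtld_sample_size(delta, sigma2_0, sigma2_1, r=1.0, alpha=0.05, power=0.80):
    """Return (n0, n1, total) for detecting an RMTL difference `delta`.

    Assumes r = n1 / n0 and Var(difference) = sigma2_0 / n0 + sigma2_1 / n1.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1.0 - alpha / 2.0) + z(power)
    n0 = math.ceil(z_sum ** 2 * (sigma2_0 + sigma2_1 / r) / delta ** 2)
    n1 = math.ceil(r * n0)
    return n0, n1, n0 + n1
```

With made-up inputs `delta=-0.3`, `sigma2_0=sigma2_1=2.0`, and equal allocation, this gives 349 subjects per group. Because the required size scales with 1/delta², halving the detectable difference roughly quadruples the sample size.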

Simulation setup

In the simulation setup, we assessed the performance of the estimation of the RMTLd, the RMTLd test, and the RMTLd-based sample size under 6 different scenarios: 1) no difference between groups (Figure 1A); 2) a proportional SDH with sHR ≈ 0.905 (Figure 1B); 3) a proportional SDH with sHR ≈ 0.741 (Figure 1C); 4) an early difference between groups (Figure 1D); 5) a late difference with curves separated at t = 1 year (Figure 1E); and 6) a late difference with curves separated at t = 2 years (Figure 1F).

Figure 1. Scenarios considered in a simulation study comparing the statistical performance of the Gray test and the RMTLd test. A) No difference in the event of interest between groups; B) a proportional subdistribution hazard function with a small difference; C) a proportional subdistribution hazard function with a large difference; D) an early difference; E) a late difference with a large difference; F) a late difference with a small difference. RMTLd, difference in restricted mean time lost.

Let the event types (event of interest and competing event) be generated through binomial distributions with size $N$ and probabilities $p$ and $1 - p$, respectively, where $N$ is defined as the sample size of each group and $p$ is the maximum cumulative incidence of events of interest, which is set to a prespecified value. The parameter settings of the failure times for the event of interest and the competing event under the different situations are shown in Web Table 1, and the censoring times of the 2 groups are based on uniform distributions parameterized by $a$ and $b$, respectively. Next, define the observed time $T$ as the minimum of the failure time and the censoring time, together with the corresponding event indicator. The censoring rates are required to be similar between the 2 groups and can be set at approximately 0%, 15%, 30%, or 45% by changing the settings of $a$ and $b$. For the sample size, we consider both a balanced design (n₀ = n₁ = 300, 500, 1,000) and an unbalanced design (n₀ = 300, n₁ = 500; n₀ = 500, n₁ = 1,000). For all scenarios, a nominal significance level $\alpha$ is applied, and the specific time point τ is selected as the minimum of the maximum follow-up time of the 2 groups (16). All simulations are performed using 10,000 replications.
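The data-generating scheme described above can be imitated with a short sketch. It is only a stand-in: the actual failure-time distributions and parameter values are in Web Table 1 and are not reproduced here, so the code assumes exponential failure times with hypothetical rates, uniform censoring on [0, `censor_upper`], and a hypothetical probability `p` of the event of interest. All function and parameter names are illustrative.

```python
import numpy as np

def simulate_group(n, p, rate_interest, rate_competing, censor_upper, rng):
    """Simulate one group of competing-risks data (illustrative settings only).

    Returns (time, status) with status 0 = censored, 1 = event of interest,
    2 = competing event.
    """
    # Event type: 1 with probability p, otherwise 2.
    cause = np.where(rng.random(n) < p, 1, 2)
    # Failure time given the event type (exponential is an assumption).
    rate = np.where(cause == 1, rate_interest, rate_competing)
    failure = rng.exponential(1.0 / rate)
    # Uniform censoring; a larger upper bound lowers the censoring rate.
    censor = rng.uniform(0.0, censor_upper, size=n)
    time = np.minimum(failure, censor)
    status = np.where(failure <= censor, cause, 0)
    return time, status
```

A call such as `simulate_group(300, 0.6, 1.0, 0.5, 5.0, np.random.default_rng(0))` yields one simulated group; in a full study, the upper censoring bound would be tuned per group to reach the target censoring rate.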

To evaluate the performance of the RMTLd estimation, we determined the true RMTLd at τ = 4 years with a total sample size of n = 1,000,000 (n₀ = n₁ = 500,000) under the different scenarios. The true RMTLd values between groups for the event of interest under the 6 scenarios shown in Figure 1 (scenarios A–F) are 0.00004, −0.3935, −0.5141, −0.2986, −0.3517, and −0.1729 years, respectively, over a period of 4 years. Then, according to the above settings, we sampled from this large sample to calculate the mean relative bias, the root mean squared error, the relative standard error, and the coverage of the RMTLd (17) to measure the performance of the estimation of the RMTLd.
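Performance measures of this kind can be computed from the replicate estimates as in the sketch below. The definitions used here (relative bias as mean error divided by the true value, and coverage as the share of normal-approximation CIs containing the true value) are common conventions and are assumed rather than taken from reference 17; the relative standard error is omitted because its exact definition is not given in the text. The function name and the 95% default are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def summarize_estimates(estimates, std_errors, truth, level=0.95):
    """Relative bias, RMSE, and CI coverage across simulation replicates."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    rel_bias = float(np.mean(est - truth) / truth)
    rmse = float(np.sqrt(np.mean((est - truth) ** 2)))
    covered = (est - crit * se <= truth) & (truth <= est + crit * se)
    return {"rel_bias": rel_bias, "rmse": rmse, "coverage": float(np.mean(covered))}
```

Relative bias becomes unstable when the true value is close to 0, as in the scenario with no difference between groups, so the root mean squared error and coverage are the more informative summaries there.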

Meanwhile, we compared the type I error and statistical power of the Gray test and the proposed RMTLd test to evaluate the performance of the RMTLd test. To evaluate the type I error rate, the CIFs of the event of interest and the competing event were each assumed to be the same in both groups, so the failure times in both groups were generated from the same distribution, given the event type, as shown in Figure 1A.

To assess the statistical power, we considered several situations (Figures 1B–1F). In the first situation, the proportional SDH assumption was met: Failure times were generated from CIFs (18) for the event of interest and the competing event that depend on the group indicator $Z$ ($Z = 0$ and $Z = 1$ for the control group and the treatment group, respectively). Meanwhile, we considered 2 scenarios, sHR ≈ 0.905 and sHR ≈ 0.741, corresponding to Figure 1B and Figure 1C, respectively. In the second situation, the proportional SDH assumption was violated: Both the early difference (Figure 1D) and the late difference (Figures 1E and 1F) in the CIFs were considered. The failure time was generated on the basis of CIFs with piecewise Weibull distributions, each specified by a scale parameter and a shape parameter (19). The specific parameter settings of all scenarios are presented in Web Table 1.

To evaluate the performance of the proposed sample-size estimation, we fixed the significance level $\alpha$ and the type II error rate $\beta$ (the targeted power was 80%) and generated the necessary parameters by averaging over each simulation to calculate the RMTLd-based sample sizes under different situations (Figures 1B–1F). Next, we simulated the observed power of the Gray test and the RMTLd test based on the calculated sample sizes through 10,000 simulations.