Abstract
The proper analysis of composite endpoints consisting of both death and non-fatal events is an intriguing and sometimes contentious topic. The current practice of analyzing time to the first event often draws criticism for ignoring the unequal importance of the component events and for leaving recurrent-event data unused. Novel methods that address these limitations have recently been proposed. To compare the novel and traditional approaches, we review three typical models for composite endpoints based on time to the first event, the composite event process, and pairwise hierarchical comparisons. The pros and cons of these models are discussed with reference to the relevant regulatory guidelines, such as the recently released ICH-E9(R1) Addendum “Estimands and Sensitivity Analysis in Clinical Trials”. We also discuss the impact of censoring when the model assumptions are violated and explore sensitivity analysis strategies. Simulation studies are conducted to assess the performance of the reviewed methods under different settings. As a demonstration, we use publicly available R-packages to analyze real data from a major cardiovascular trial.
Keywords: Clinical trials, Estimand, Recurrent event, Semi-competing risks, Time to first event, Win ratio
1. Introduction
Composite endpoints frequently arise in modern phase-III trials, especially for cardiovascular disease and cancer. These endpoints typically contain death and non-fatal events as components and are often used for the primary efficacy analysis. For example, many cardiovascular trials use as the primary endpoint major adverse cardiac events (MACE) including death and non-fatal events such as heart failure, myocardial infarction, and stroke. In cancer trials, death and tumor progression are often combined into the composite endpoint of progression-free survival.
A principal consideration for combining multiple types of outcome is to collect more events, which usually increases statistical power (Freemantle et al., 2003). Indeed, it used to be common for cardiovascular trials to focus on mortality as the single endpoint. As cardiovascular mortality has declined steadily over the past several decades, however, unusually large cohorts would now be needed to adequately power a trial on mortality alone. To reduce costs, non-fatal cardiovascular events have been added to the primary endpoint in an effort to extract more information per patient. In addition to statistical efficiency, a composite endpoint also avoids the need for multiplicity adjustment, which often results in overly conservative tests. Finally, analysis of a properly synthesized univariate outcome produces a single measure of treatment effect, which is easier to interpret than multiple component-specific effect sizes.
Despite the many advantages of composite endpoints, the specific methods for their analysis are by no means straightforward. It is particularly challenging to deal with composite endpoints consisting of death and non-fatal events. For example, how should we handle the competing risk of death, i.e., the fact that death precludes the occurrence of future events? What is the most appropriate way to account for the different clinical importance between death and non-fatal events? What is more, the ICH-E9(R1) Addendum (ICH, 2017), recently released by the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH), places heavy emphasis on estimand construction and sensitivity analysis in clinical trials. The implications of such general guidelines for composite endpoints in particular have not been thoroughly discussed or understood.
In this article, we review some commonly used statistical models for composite endpoints of death and non-fatal events and evaluate their pros and cons against the principles set forth in relevant regulatory guidelines. In Section 2, we formally define the composite endpoint, review the corresponding guidelines from international councils and regulatory agencies, and discuss the resulting challenges for statistical methodology. In Section 3, we provide detailed accounts of three typical models, namely, the proportional hazards (PH) model for time to the first event, the proportional mean (PM) model for the composite event process, and finally, the proportional win-fractions (PW) model for the hierarchical endpoint. For each model, we lay out the model structure, define the treatment effect estimand, discuss the model-generating mechanisms, and describe the estimation procedures. In keeping with the guidelines of the ICH-E9(R1) Addendum, we also explore alternative estimands and sensitivity analysis strategies when the model assumptions are violated. Numerical studies are conducted in Section 4 to compare the performance of the different methods using simulated and real data. Concluding remarks are offered in Section 5.
2. Definition, Guidelines, and Challenges
2.1. Composite Endpoint as Univariatized Outcome
Widely referenced by clinicians and statisticians alike, “composite endpoint” remains a loosely-defined term encompassing potentially anything that involves multiple outcomes. To fix ideas, however, it would be helpful to define the subject in unequivocal terms. In a recently published book, Rauch et al. (2018a) observed that “a composite endpoint is generally defined as an outcome combining several endpoints of interest within a single variable”. This definition shall be adopted here because it highlights a couple of key features of composite endpoints.
To begin with, a composite endpoint must draw on multiple outcomes (hence “several endpoints of interest”), such as the bivariate/multivariate survival and non-fatal event times. Equally important, it must combine the multiple outcomes into a “single variable” (or single process). As an example, the progression-free survival time qualifies as a composite endpoint because it combines the bivariate times to death and tumor progression into a univariate time to the first event. On the contrary, uncombined multivariate/semi-competing risks outcomes under joint (or componentwise marginal) models (see, e.g., Ghosh and Lin, 2000; Fine et al., 2001; Ghosh and Lin, 2002; Ye et al., 2007) shall not be classified as composite endpoints. By the above standards, a composite endpoint is fully characterized by the multiple outcomes it draws upon and the (many-to-one) function it uses to combine them. These defining elements, particularly the rule of combination, will be revisited when we discuss specific methodologies for composite endpoints in Section 3.
2.2. Regulatory Guidelines
Many regulatory guidelines shed light on the construction, analysis, and reporting of composite endpoints. For example, the ICH-E9 “Statistical Principles for Clinical Trials” (ICH, 1998) states that “[t]here should generally be only one primary variable” and that “[i]f a single primary variable cannot be selected from multiple measurements associated with the primary objective, another useful strategy is to integrate or combine the multiple measurements into a single or composite variable, using a predefined algorithm”. These recommendations are consistent with our definition of the composite endpoint as a univariatized outcome and our emphasis on the rule of combination as integral to the endpoint itself. The document also stresses the importance of pre-specifying the rule: “[t]he method of combining the multiple measurements should be specified in the protocol”. When it comes to statistical reporting, “an interpretation of the resulting scale should be provided in terms of the size of a clinically relevant benefit”.
The European Network for Health Technology Assessment (EUnetHTA) issued a guideline in 2015 entitled “Endpoints used for Relative Effectiveness Assessment – Composite Endpoints” (EUnetHTA, 2015). The guideline recommends combining outcomes “of similar clinical importance and sensitivity to intervention”, adding that “[i]f adequate, mortality should however be included if it is likely to have a censoring effect on the observation of other components”. Moreover, “[a]ll components of a composite endpoint should be separately defined as secondary endpoints and reported with the results of the primary analysis”.
The U.S. Food and Drug Administration (FDA) has recently released a Guidance for Industry: “Multiple Endpoints in Clinical Trials” (FDA, 2017). It observes that “[c]omposite endpoints are often assessed as the time to first occurrence of any one of the components”, but adds that “it also may be possible to analyze total endpoint events”. For reporting, “[t]he treatment effect on the composite rate can be interpreted as characterizing the overall clinical effect when the individual events all have reasonably similar clinical importance”. Besides the primary analysis, “analyses of the components of the composite endpoint are important and can influence interpretation of the overall study results”, echoing the suggestion by the EUnetHTA guideline.
Last but not least, the recently released ICH-E9(R1) Addendum “Estimands and Sensitivity Analysis in Clinical Trials” (ICH, 2017) has attracted much attention and become the subject of intense discussion (Akacha et al., 2017a,b; Ratitch et al., 2020). As the title suggests, the Addendum is centered around estimand construction for the treatment effect and sensitivity analysis against possible violations of assumptions. According to the Addendum, “an estimand defines in detail what needs to be estimated to address a specific scientific question of interest”. In the case of a composite endpoint, the “scientific question of interest” concerns the multiple outcomes in the background. Hence, to construct a meaningful estimand for a composite endpoint, we must first (1) define the rule of combining the multiple outcomes (e.g., by time to the first event), and then (2) specify the treatment effect measure on the univariatized variable (e.g., hazard ratio, restricted mean event-free survival time, etc.). On the other hand, censoring caused by study termination or (uninformative) patient dropout is not of scientific interest and thus should not play a role in the estimand. Indeed, the official training material for the Addendum states unequivocally that “missing data and loss-to-follow-up are irrelevant to the construction of estimands” (ICH, 2018). Patient death, however, should be treated not as censoring but as an “intercurrent event”; intercurrent events are events “that occur after treatment initiation and either preclude observation of the variable or affect its interpretation”. Unlike censoring, intercurrent events are considered part of the scientific process and “need to be considered in the description of a treatment effect on a variable of interest”. One strategy to account for an intercurrent event such as death, according to the Addendum, is to incorporate it as a component of a composite endpoint (the “composite strategy”).
In sum, the following are consensus recommendations reached by all of the guidelines: a composite endpoint should

- be pre-specified in the trial protocol;
- consist of components of similar clinical importance;
- treat death as an intercurrent event rather than censoring;
- provide a meaningful scale for the overall treatment effect;
- be supplemented with component-wise secondary analyses.
Some of the above are straightforward to follow. For example, by regarding death as an intercurrent event, it is clear that we should be concerned only with the observable outcomes in the presence of death and not bother with any latent data with death hypothetically removed (see, e.g., Huang and Wang, 2004; Liu et al., 2004; Zeng and Lin, 2009). Other recommendations are more challenging to implement. For example, it is suggested that the composite endpoint include components of similar clinical importance. When mortality is included as a component, however, it is usually regarded as more important than non-fatal events such as hospitalization. The traditional approach focuses on time to the first event and disregards the unequal importance of the components. Moreover, events that occur afterward are ignored. Recently, novel methods have been proposed to better prioritize survival over non-fatal events and to make fuller use of recurrent-event data. To apply these methods in real studies, it is important that we understand their assumptions, estimands, and statistical properties, as well as their pros and cons with respect to the aforementioned regulatory guidelines.
3. Three Classes of Statistical Models
3.1. General Framework
We start by introducing the full and observed data. The full data refer to the multivariate outcomes absent censoring; the observed data refer to their censored versions. For the full data, let D denote the survival time and write ND(t) = I(D ≤ t), where I(·) is the indicator function. Let N(t) denote the number of (possibly recurrent) non-fatal events by time t. With death as an intercurrent event, the counting process N(·) is terminated at time D and so remains flat on [D, ∞). Use Y(t) = {ND(u), N(u) : 0 ≤ u ≤ t} to denote the event history up to time t. The full outcome data can thus be denoted by Y := Y(∞). In a randomized controlled trial, use Z = 1, 0 to indicate the treatment and control arms, respectively. To facilitate the reporting of treatment effect, it generally helps to build a model for Y against Z, with the effect size estimand as a built-in parameter. In this way, provided that the model holds true, the estimand is naturally free from influence by the censoring times, because the latter are not involved in the model specification. For all models considered in this paper, we can adjust for additional baseline covariates by including them in the predictor Z.
To introduce the observed data, let C denote the independent censoring time such that

C ⊥ Y | Z. (1)
Because death is a terminal event, the observed length of follow-up is X = D ∧ C, where a ∧ b = min(a, b). Hence, the outcome process is terminated at X, giving rise to the observed data {Y(X), X, Z}. Here we do not need a censoring indicator, as traditionally needed to distinguish death from censoring, since ND(·) is itself a component of Y. A random n-sample of {Y(X), X, Z} is denoted by

{Yi(Xi), Xi, Zi}, i = 1, …, n. (2)
Recall that a composite endpoint is characterized not only by the underlying multivariate outcome (in this case Y), but also by the algorithm used to combine the components. As a result, the first step in describing a composite-endpoint model is to formally define the endpoint by articulating its rule of construction. In light of this, we will follow a three-step routine for each of the models considered below. First, we define the target composite endpoint by a univariate counting process formulated as a map of Y(t). Second, we introduce a model for this process against the treatment indicator Z. Third, we describe the estimation procedures based on the observed data (2) under the independent censoring assumption (1).
3.2. The Models
3.2.1. Proportional Hazards Model for Time to First Event
The traditional approach to composite endpoints is to analyze time to the first event using standard univariate techniques such as the Kaplan–Meier curve, log-rank test, and the Cox proportional hazards (PH) model, the last of which provides a simple and intuitive effect size estimand.
The counting process for this endpoint can be written as

Ñ1(t) = I{ND(t) + N(t) ≥ 1}. (3)
Let λk(t) denote the hazard function for time to the first event in the kth treatment arm (k = 1, 0), i.e., λk(t) = lim_{Δt↓0} pr{Ñ1(t + Δt) − Ñ1(t) = 1 | Ñ1(t) = 0, Z = k}/Δt. The PH model specifies that the arm-specific hazard functions are proportional over time, i.e.,

λ1(t) = exp(β)λ0(t), t ∈ [0, τ], (4)
where β is the log-hazard ratio and τ is the maximum length of follow-up. Model (4) can be generated by the bivariate Lehmann model (Oakes, 2016)

pr(D > s, T > t | Z) = {H(s, t)}^exp(βZ), (5)

where T is the time to the first non-fatal event and H(·,·) is some baseline bivariate survival function. Indeed, if we denote S(t | Z) = pr(D ∧ T > t | Z), it is clear from (5) that S(t | Z) = {H(t, t)}^exp(βZ), which means that (4) holds with λ0(t) = −d log{H(t, t)}/dt.
Given the observed data (2), the hazard ratio exp(β) can be easily estimated using the familiar partial likelihood, which can be formulated as an integral of counting-process martingale residuals (for details, see Ch. 8 of Fleming and Harrington, 1991). Cumulative sums of the observed martingale processes can be used to check the proportionality assumption in (4) (Lin et al., 1993). If proportionality holds, these processes should contain nothing but random noise scattered around zero; a systematic pattern is indicative of deviation from proportionality.
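To make this concrete, here is a minimal R sketch of the time-to-first-event analysis with the survival package; the one-row-per-subject data frame dat and its columns time1 (follow-up time to the first event or censoring), event1 (first-event indicator), and Z are hypothetical. Note that cox.zph checks proportionality through scaled Schoenfeld residuals, a readily available alternative to the cumulative martingale-residual test of Lin et al. (1993) described above.

```r
library(survival)

# Cox PH model (4) for time to the first (fatal or non-fatal) event;
# 'dat' is a hypothetical one-row-per-subject data frame.
fit <- coxph(Surv(time1, event1) ~ Z, data = dat)
summary(fit)  # exp(coef) estimates the hazard ratio exp(beta)

# Diagnostic for the proportionality assumption: a small p-value or a
# systematic trend in the smoothed residuals suggests non-proportionality.
zph <- cox.zph(fit)
print(zph)
plot(zph)
```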
The estimated hazard ratio serves as the effect size measure: it is the number of times a patient in the treatment arm is as likely to experience a first adverse event as a patient in the control arm. In addition to estimation, we can also use a standardized estimate of β as the test statistic for the between-arm difference, with its null distribution approximated by the standard normal distribution. The test is consistent, i.e., with rejection probability tending to one as n → ∞, against the alternative hypothesis that λ1 ≺ λ0. Here and in the sequel, f ≺ g means that f(t) ≤ g(t) for all t ∈ [0, τ] with strict inequality for some t. Hence, we have a large probability of detecting the treatment effect if the risk for the first event in the treatment arm is always lower than or equal to that in the control arm, and strictly lower at some points.
A major limitation of this type of analysis is that it treats death and non-fatal events as if on an equal footing (Lubsen and Kirwan, 2002; Freemantle et al., 2003; Montori et al., 2005; Ferreira-González et al., 2007a,b). To reflect the greater importance of survival, various component-wise weighting schemes have been proposed for the time-to-first-event analysis (Sampson et al., 2010; Bakal et al., 2015; Rauch et al., 2018b).
3.2.2. Proportional Mean Model for Composite Event Process
By focusing on the first event, the traditional approach leaves the later events untapped, which usually reduces statistical efficiency. The inadequate capture of the patient’s full experience also detracts from the clinical relevance of the endpoint. A natural way to address such limitations is to consider in its entirety the (recurrent) composite event process

Ñ2(t) = ND(t) + N(t). (6)
Compared to Ñ1(t) in (3), Ñ2(t) counts all events, whether fatal or non-fatal. Let μk(t) = E{Ñ2(t) | Z = k} denote the mean frequency function for the total events in the kth arm. The proportional mean (PM) model (Mao and Lin, 2016) specifies that the arm-specific mean functions are proportional over time, i.e.,

μ1(t) = exp(β)μ0(t), t ∈ [0, τ]. (7)
Hence, exp(β) is the mean ratio for the number of events comparing the treatment to control. Model (6) can be generated by a terminated homogeneous Poisson process with proportional intensity structure (Mao and Lin, 2016). Specifically, suppose that N*(t) is a Poisson process with intensity λ0 exp(βZ) for some λ0 > 0. If , where τ* is some random stopping time (Fleming and Harrington, 1991), then it can be shown that (6) holds with λ0(t) = λ0E(t ∧ τ*) (Mao and Lin, 2016).
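For intuition, the following sketch simulates this terminated-Poisson mechanism; the function name, the Bernoulli treatment assignment, and the Unif(0, horizon) choice of τ* are our own illustrative assumptions.

```r
# Simulate the mechanism behind (7): a homogeneous Poisson process with
# rate lam0 * exp(beta * Z), stopped at a random time tau*.
sim_pm <- function(n, beta, lam0 = 1, horizon = 5) {
  Z    <- rbinom(n, 1, 0.5)
  taus <- runif(n, 0, horizon)          # random stopping times tau*
  events <- lapply(seq_len(n), function(i) {
    rate <- lam0 * exp(beta * Z[i])
    m <- rpois(1, rate * taus[i])       # number of events on [0, tau*]
    sort(runif(m, 0, taus[i]))          # given the count, event times are uniform
  })
  # Implied mean function: mu_0(t) = lam0 * E{min(t, tau*)}, so (7) holds.
  list(Z = Z, tau = taus, event_times = events)
}
```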
Given the observed data (2), the mean ratio exp(β) can be estimated by an estimating equation that mimics the partial likelihood score function. The competing risk of death is accounted for using inverse probability censoring weighting (IPCW) (Robins and Rotnitzky, 1992). To check the proportional mean assumption in (7), a martingale-type residual process can be utilized in similar ways to the martingale process in the PH model.
The estimated mean ratio can be interpreted as the number of times a patient in the treatment arm experiences as many adverse events as a patient in the control arm. If we use a standardized estimate of β to test the treatment effect, the test is consistent against μ1 ≺ μ0, that is, when the average cumulative frequency of total events in the treatment arm is always lower than or equal to that in the control arm, and strictly lower at some points.
As in the time-to-first-event analysis, we can attach different weights to the fatal and non-fatal events in Ñ2(t) to reflect their perceived degrees of severity. Some details about weight construction can be found in Mao and Lin (2016).
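For instance, to let a death count twice as much as a non-fatal event, one can pass component weights to Wcompo::CompoML, whose unweighted use appears in Section 4.2. A minimal sketch, assuming subject-level vectors id, time, status (0 = censoring, 1 = death, 2 = non-fatal event) and a treatment indicator Z:

```r
library(Wcompo)

# Weighted PM analysis: death carries weight 2, non-fatal events weight 1.
obj <- CompoML(id, time, status, Z, c(2, 1))
obj  # prints the estimated (weighted) mean ratio with a 95% confidence interval
```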
3.2.3. Proportional Win-Fractions Model for Hierarchical Endpoint
A novel approach that prioritizes death over non-fatal events without artificial weighting was proposed by Pocock et al. (2012) and has since been gaining momentum among clinicians and statisticians alike (Montgomery et al., 2014; Abdalla et al., 2016; Wang et al., 2017; Timsit et al., 2017; Fergusson et al., 2018). The win ratio, unlike other methods, derives its estimator based on pairwise comparison between the two arms. For each pair of patients, a winner is decided upon first by the order of their survival times and, if that cannot be determined due to censoring, then by the order of their first non-fatal event times. This sequential process thus prioritizes survival over non-fatal events in a hierarchical way. The win ratio is defined as the fraction of wins over the fraction of losses comparing the treatment to control.
The statistical properties of the win ratio have been studied extensively (see, e.g., Luo et al., 2015; Bebu and Lachin, 2015; Oakes, 2016; Luo et al., 2017; Dong et al., 2018; Mao, 2019; Follmann et al., 2020; Dong et al., 2020a). A drawback of the win ratio is that its estimand generally depends on the censoring distribution (Rauch et al., 2014; Bebu and Lachin, 2015; Dong et al., 2020b). Recently, a proportional win-fractions (PW) model (Mao and Wang, 2020) has been developed under which the win ratio estimand is invariant to the censoring distribution.
First, we define the composite endpoint for the PW model. Unlike in the PH and PM models, the endpoint is based on a pair of subjects. Let W(Yi, Yj)(t) denote the indicator of Yi winning against Yj by time t. By the construction of Pocock et al. (2012), this indicator is defined by

W(Yi, Yj)(t) = I(Dj < Di ∧ t) + I(Di ∧ Dj > t, Tj < Ti ∧ t), (8)

where Ti and Tj are the times to the first non-fatal event for subjects i and j, respectively. (The first term on the right hand side of (8) indicates a win by survival and the second a win by non-fatal event.) For inference, we target the win-loss functions

ω1(t) = pr{W(Yi, Yj)(t) = 1 | Zi = 1, Zj = 0} and ω0(t) = pr{W(Yj, Yi)(t) = 1 | Zi = 1, Zj = 0}.

That is, ω1(t) and ω0(t) are the fractions of wins and losses, respectively, comparing the treatment to the control at time t.
Now, the proportional win-fractions (PW) model specifies that the loss and win fractions are proportional over time, i.e.,

ω0(t) = exp(β)ω1(t), t ∈ [0, τ]. (9)
Hence, exp(β) is the “loss ratio” (the inverse of the win ratio) between the treatment and the control (regardless of follow-up time).
Interestingly, the PW model can be generated by the same Lehmann model (5), with the loss ratio equal to the hazard ratio on time to the first event (Oakes, 2016). As a special case, the Gumbel–Hougaard copula model has frequently been used to generate a time-invariant win/loss ratio:

pr(D > s, T > t | Z) = exp(−[{ΛD(s)exp(βZ)}^α + {Λ(t)exp(βZ)}^α]^(1/α)), (10)
where ΛD and Λ are marginal cumulative hazard functions for D and T, respectively, and α ≥ 1 controls the degree of correlation between the two components.
Given the observed data (2), the loss ratio exp(β) can be estimated using estimating equations in the form of a covariate-specific weight process integrated by a pairwise martingale-type residual process. The details can be found in Mao and Wang (2020). The estimated loss ratio can be interpreted as the number of times a patient in the treatment arm is as likely to produce a “worse” outcome as a patient in the control arm (accounting for the relative importance of the component events).
The hypothesis test based on a standardized estimate of β is consistent against the alternative hypothesis that ω0 ≺ ω1, that is, when the probability of loss by the treatment is always smaller than or equal to that of win, and strictly smaller at some points. Simple sufficient conditions for ω0 ≺ ω1 in terms of the bivariate outcome distributions are given by Luo et al. (2015) and Mao (2019).
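To fix ideas, the sketch below evaluates the win indicator (8) and the win/loss fractions ω1(t0) and ω0(t0) by brute-force pairwise comparison of complete (uncensored) data; the function and variable names are ours, and estimation from censored data requires the machinery of Mao and Wang (2020) instead.

```r
# Win indicator (8): i wins against j by t0 if j dies first within [0, t0],
# or neither dies by t0 and j has the earlier first non-fatal event.
# Convention: the first non-fatal event time is Inf if no such event occurs.
win <- function(Di, Ti, Dj, Tj, t0) {
  (Dj < min(Di, t0)) + (min(Di, Dj) > t0) * (Tj < min(Ti, t0))
}

# Empirical win/loss fractions comparing treatment (Z = 1) to control (Z = 0)
win_fractions <- function(D, Tnf, Z, t0) {
  trt <- which(Z == 1); ctr <- which(Z == 0)
  w <- l <- 0
  for (i in trt) for (j in ctr) {
    w <- w + win(D[i], Tnf[i], D[j], Tnf[j], t0)  # treatment wins
    l <- l + win(D[j], Tnf[j], D[i], Tnf[i], t0)  # treatment losses
  }
  c(omega1 = w, omega0 = l) / (length(trt) * length(ctr))
}
```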
3.3. Comparisons, Recommendations, and Caveats
A summary of the comparisons between the PH, PM, and PW models is provided in Table 1. In particular, the R-packages that implement these models are openly available at the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/).
Table 1.
A summary of the PH, PM, and PW models for composite endpoints of ND(·) and N(·).
| | |
|---|---|
| | Rule of univariatization |
| PH | Time to the first event: Ñ1(t) = I{ND(t) + N(t) ≥ 1} |
| PM | Composite event process: Ñ2(t) = ND(t) + N(t) |
| PW | Pairwise win process: W(Yi, Yj)(t) defined in (8) |
| | Modeling target (k = 1, 0) |
| PH | Hazard function λk(t) of time to the first event |
| PM | Mean function μk(t) = E{Ñ2(t) ∣ Z = k} |
| PW | Win-loss functions ω1(t) and ω0(t) |
| | Model assumption |
| PH | λ1(t)/λ0(t) ≡ exp(β) |
| PM | μ1(t)/μ0(t) ≡ exp(β) |
| PW | ω0(t)/ω1(t) ≡ exp(β) |
| | Estimand & interpretation |
| PH | Hazard ratio: treatment exp(β) times as likely to experience a first event |
| PM | Mean ratio: treatment experiencing exp(β) times as many events |
| PW | Loss ratio: treatment exp(β) times as likely to produce a worse outcome, adjusting for the priority of death over non-fatal events |
| | Accounting for semi-competing risks |
| PH | Not needed |
| PM | Inverse probability of censoring weighting |
| PW | Not needed if death is hierarchically prioritized |
| | R-function |
| PH | survival::coxph |
| PM | Wcompo::CompoML |
| PW | WR::pwreg |
The three methods have their pros and cons. A major advantage of the PH model is its familiarity to applied researchers. From a statistical point of view, however, efficiency may suffer as a result of its exclusive focus on the first event. Hence, the PH model is more appealing when recurrent events are few (e.g., tumor progression and death in cancer trials), so that little is lost by focusing on the first one. In contrast, the PM model utilizes all events that occur over the course of the study and may thus be better suited to chronic disease trials with many recurrent events (e.g., cardiovascular hospitalizations in O’Connor et al. (2009)). As warned by Anker and McMurray (2012), however, analysis of total events may prove non-robust against outlying patients who experience unusually large numbers of events. In such cases, descriptive and residual analyses will help to identify influential data points for further examination. Finally, the PW model (or the win ratio approach in general) is able to prioritize death over the non-fatal events in a natural way. It is thus most attractive when mortality is considered the main outcome, with non-fatal events only as auxiliaries. As suggested by the various guidelines, all methods need to be supplemented by component-wise secondary analyses for better explanation of the overall treatment effect.
3.4. Model Critiques and Sensitivity Analysis
A common feature of the three models considered in Section 3.2 is that they all rest on a certain “proportionality” assumption. Such an assumption is of a global kind that brings together outcomes censored at different times to produce an overall, time-invariant effect size, whether it be a hazard ratio, mean ratio, or loss ratio. We have illustrated some model-generating mechanisms that guarantee proportionality for each model. These mechanisms are plausible but somewhat restrictive, and thus deserve a closer examination.
Both the PH and PW models can be generated by the Lehmann model (5), which implies marginal proportional hazards models for death and the first non-fatal event with the same hazard ratio. Outside (5), the PH model may still hold, but under somewhat stringent conditions. For instance, let λkD(t) and λkH(t) denote the cause-specific hazard functions for D and T in the kth arm (k = 1, 0). Then, the time-dependent hazard ratio on time to the first event is

λ1(t)/λ0(t) = {λ1D(t) + λ1H(t)}/{λ0D(t) + λ0H(t)}.

It is clear from the right hand side that the component-wise (cause-specific) hazard ratios need not be constant or identical in order for the PH model to hold. As a simple example, if the cause-specific hazard functions are all proportional to one another, then the PH model is always satisfied even if λ1D(t)/λ0D(t) ≠ λ1H(t)/λ0H(t).
Compared to the PH model, it may be even harder for the PW model to hold with differential component-wise treatment effects. Indeed, since death as a competing risk is prioritized over the non-fatal events, it plays a bigger role as follow-up progresses. Thus, for the loss ratio to be constant, the time-dependent component-wise treatment effects must change in a way that exactly offsets the temporal change in the relative contributions of the corresponding components. A numerical example of such “incidental” PW models is provided in the online Supporting Information of Mao and Wang (2020).

As we have seen, the model-generating mechanisms for the PH and PW models are closely related. However, outside the Lehmann model, which forces equality of the component-wise hazard ratios, the two models rarely coincide due to their different prioritization schemes. On the other hand, the PM model targets event frequency rather than event risk, which makes it unlikely to hold simultaneously with either the PH or the PW model, except under the null.
When the corresponding proportionality assumption is violated, the hazard ratio, mean ratio, and loss ratio estimands generally take the form of weighted averages of the corresponding time-dependent ratios, with the weights determined by the censoring distribution (Struthers and Kalbfleisch, 1986; Dong et al., 2020b). Take the loss ratio for example. Let G(t) denote the cumulative distribution function of C1 ∧ C0, where Ck is a generic censoring time in the kth arm (k = 1, 0). Oakes (2016) showed that, in general, the loss ratio estimand is

∫ω0(t)dG(t) / ∫ω1(t)dG(t),

that is, a ratio of averages of the time-dependent loss and win fractions weighted by the censoring distribution.
Similar expressions hold true for the PH and PM models. For deeper discussions about the impact of censoring on the win/loss ratio estimands, see Dong et al. (2020b) and Mao and Wang (2020).
In light of the ICH-E9(R1) Addendum, sensitivity analysis should be performed using alternative estimands when the proportionality assumption is in doubt. A natural substitute for the global hazard, mean, and loss ratios is their local versions, that is,

λ1(t0)/λ0(t0), μ1(t0)/μ0(t0), and ω0(t0)/ω1(t0)

at some pre-specified time t0 > 0. To estimate the local hazard ratio λ1(t0)/λ0(t0), one can estimate the λk(t0) under parametric models such as the Weibull distribution (Rauch et al., 2018b). The local mean ratio μ1(t0)/μ0(t0) can be estimated using nonparametric estimates of the μk(t0) by the Ghosh and Lin (2000) type of methods. Finally, using the results in Oakes (2016), the local (or “curtailed”) loss ratio ω0(t0)/ω1(t0) can be estimated based on (nonparametric or model-based) estimates of the outcome distribution, i.e., the “integral approach” according to Dong et al. (2020b).
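As an illustration of the parametric option, one could fit arm-specific Weibull models (so that the two shapes, and hence the hazard ratio, may vary with time) and evaluate the ratio at t0. Below is a minimal sketch with the survival package, reusing the hypothetical one-row-per-subject data (time1, event1, Z) from Section 3.2.1; t0 = 24 is an arbitrary landmark.

```r
library(survival)

# Weibull hazard implied by a survreg fit: with S(t) = exp{-(t/b)^p},
# b = exp(intercept) and p = 1/scale, the hazard is (p/b) * (t/b)^(p-1).
weib_hazard <- function(fit, t) {
  p <- 1 / fit$scale
  b <- exp(coef(fit)[1])
  (p / b) * (t / b)^(p - 1)
}

fit1 <- survreg(Surv(time1, event1) ~ 1, data = subset(dat, Z == 1))
fit0 <- survreg(Surv(time1, event1) ~ 1, data = subset(dat, Z == 0))
t0 <- 24  # pre-specified landmark time
HR_local <- weib_hazard(fit1, t0) / weib_hazard(fit0, t0)
```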
4. Numerical Studies
4.1. Simulations
We conducted simulation studies to compare the performance of the PH, PM, and PW models in different scenarios. Because of the generally incompatible modeling assumptions, it is impossible to compare all three methods in the estimation of treatment effects across the board. Nonetheless, we can still compare the PH and PW under the Lehmann model (5), which lies in the intersection of the two models. Let ΛD(t) = 0.2t, Λ(t) = 2t, and α = 2 in the special case of the Gumbel–Hougaard copula (10). This leads to exponential marginal distributions for death and the first non-fatal event with Kendall’s concordance coefficient 0.5 (Oakes, 1989). Let C ~ min{Expn(0.2), Unif[1, 4]}, i.e., the minimum of an exponential variable with rate 0.2 and a uniform variable on [1, 4]. Under this set-up, the death rate is about 18% and the non-fatal event rate about 75%.
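The following sketch reproduces this data-generating mechanism; the Gumbel–Hougaard dependence is sampled with the copula package, and the marginal transforms then impose the Lehmann structure (10). All variable names are ours.

```r
library(copula)

n <- 200; beta <- -0.2; alpha <- 2
Z  <- rbinom(n, 1, 0.5)
uv <- rCopula(n, gumbelCopula(alpha, dim = 2))  # Kendall's tau = 1 - 1/alpha = 0.5

# Marginal transforms: pr(D > s | Z) = exp{-0.2 s exp(beta Z)} and
#                      pr(T > t | Z) = exp{-2 t exp(beta Z)}
D   <- -log(uv[, 1]) * exp(-beta * Z) / 0.2  # death time
Tnf <- -log(uv[, 2]) * exp(-beta * Z) / 2    # time to first non-fatal event
C   <- pmin(rexp(n, 0.2), runif(n, 1, 4))    # censoring: min of Expn(0.2) and Unif[1, 4]

X      <- pmin(D, C)          # observed follow-up
status <- as.numeric(D <= C)  # 1 = death observed, 0 = censored
```

The misspecification scenario of Table 3 uses the same transforms with margin-specific effects βD and βH in place of the common β.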
Under log hazard/loss ratios β = 0,−0.2, and −0.5, we assessed the estimation of β by the PH and PW models. The results are summarized in Table 2. Both methods provide consistent estimates of the treatment effect, with largely accurate confidence intervals. However, it can be seen that the PH model is slightly more efficient (i.e., with smaller standard errors) than the PW model. This is not surprising since the partial-likelihood estimator is known to be efficient under PH models (Fleming and Harrington, 1991).
Table 2.
Simulation results on the estimation of hazard/loss ratios by PH and PW models.
| n | β | PH | | | | PW | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | EST | SE | SEE | CP | EST | SE | SEE | CP |
| 200 | 0 | 0.009 | 0.149 | 0.151 | 0.957 | 0.011 | 0.165 | 0.168 | 0.952 |
| | −0.2 | −0.206 | 0.155 | 0.160 | 0.951 | −0.202 | 0.173 | 0.171 | 0.955 |
| | −0.5 | −0.503 | 0.162 | 0.163 | 0.948 | −0.504 | 0.175 | 0.177 | 0.953 |
| 500 | 0 | −0.002 | 0.098 | 0.096 | 0.947 | 0.003 | 0.104 | 0.107 | 0.949 |
| | −0.2 | −0.205 | 0.096 | 0.101 | 0.954 | −0.203 | 0.107 | 0.109 | 0.956 |
| | −0.5 | −0.496 | 0.102 | 0.104 | 0.949 | −0.501 | 0.110 | 0.112 | 0.951 |
Note: EST and SE are the mean and standard error of the parameter estimator; SEE is the mean of the standard error estimator; CP is the coverage probability of the 95% confidence interval. Each scenario is based on 2,000 replicates.
Next, we assessed how the PH and PW estimators behave when the corresponding model assumptions do not hold. We first considered a scenario where the PH model is satisfied while the PW model is not. The set-up is similar to the first set of simulations except that we allowed for differential treatment effects on D and T. Specifically, we used the joint model pr(D > s, T > t | Z) = exp(−[{exp(βDZ)s}² + {2exp(βHZ)t}²]^(1/2)) and set (βD, βH) = (0, 0), (−0.5, −0.25), and (−0.25, −0.5). This leads to a PH model for time to the first event with constant log-hazard ratio β = 0, −0.29, and −0.44, respectively. The effect size for the PW model is ill-defined because the loss ratio varies over time. The censoring distribution was generated in the same way as in the first set of simulations. In the second scenario, we used the set-up described in Section S1.1.2 of Mao and Wang (2020). Basically, the treatment has a time-varying effect on D but no effect on T. The end result is a constant loss ratio but a time-varying hazard ratio on time to the first event. We set the log-loss ratio β = 0, −0.25, and −0.5 and generated the censoring time from Unif[0, 1].
The simulation results with n = 200 are summarized in Table 3. Under true PH models, the PH estimator provides valid inference on the target effect size while the PW estimator performs poorly. Due to the prioritization of death, the bias of the PW estimator changes direction depending on the relative magnitudes of the component-wise effect sizes: overestimation if the effect on death outweighs that on the non-fatal event, and underestimation if vice versa. It is thus not surprising that under (βD, βH) = (−0.5, −0.25) (log-hazard ratio = −0.29), the PW overestimates the effect size, resulting in a more significant test than the PH model, while the opposite is true under (βD, βH) = (−0.25, −0.5) (log-hazard ratio = −0.44). In the scenario with true PW models, the PW estimator provides valid inference on the log-loss ratio. The PH estimator in this case always underestimates the effect size because the treatment effect is driven solely by death, which is de-emphasized by the time-to-first-event analysis as compared to the PW analysis.
Table 3.
Simulation results on the estimation of hazard/loss ratios by PH and PW models under possibly violated assumptions.
| True | β | PH | | | | PW | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | EST | SE | SEE | p-value | EST | SE | SEE | p-value |
| PH | 0 | −0.003 | 0.159 | 0.156 | 0.487 | 0.007 | 0.170 | 0.170 | 0.511 |
| | −0.29 | −0.284 | 0.152 | 0.153 | 0.173 | −0.434 | 0.176 | 0.174 | 0.035 |
| | −0.44 | −0.444 | 0.157 | 0.155 | 0.047 | −0.294 | 0.174 | 0.173 | 0.216 |
| PW | 0 | 0.011 | 0.181 | 0.177 | 0.498 | 0.004 | 0.204 | 0.207 | 0.500 |
| | −0.25 | −0.176 | 0.192 | 0.183 | 0.384 | −0.257 | 0.214 | 0.216 | 0.230 |
| | −0.50 | −0.347 | 0.189 | 0.191 | 0.174 | −0.505 | 0.222 | 0.228 | 0.044 |
Note: See note to Table 2. β denotes the log-hazard ratio under true PH models and log-loss ratio under true PW models; p-value denotes the average p-value for testing the null. Each scenario is based on 2,000 replicates.
Finally, we compared the power of the three methods in hypothesis testing, particularly in relation to the frequency of recurrent events. Denote the gap times of the non-fatal events by U1, U2, …, UJ, where J is the maximum number of events. We generated the outcomes under a Gamma frailty joint model for (D, U1, …, UJ) with conditionally independent components. Let D | ξ ~ Expn{ξ exp(−γZ)λD} and Uj | ξ ~ Expn{ξ exp(−γZ)λH}, where we set λD = 0.1, λH = 2, and ξ ~ Gamma(σ^(−2), σ^(−2)) with σ² = 0.1. The parameter γ is the common log-hazard ratio for the survival time and the recurrent-event gap times comparing the control to the treatment. The power of the tests should hence increase with the magnitude of γ. The parameter σ² is the variance of ξ and thus controls the degree of correlation among the components. The censoring distribution was the same as in the first set of simulations.
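A sketch of this data-generating mechanism (function and variable names are ours):

```r
# Gamma-frailty model of the power study: given xi, the survival time D and
# the gap times U_1, ..., U_J are conditionally independent exponentials.
sim_frailty <- function(n, J, gamma, lamD = 0.1, lamH = 2, sig2 = 0.1) {
  Z    <- rbinom(n, 1, 0.5)
  xi   <- rgamma(n, shape = 1 / sig2, rate = 1 / sig2)    # E(xi) = 1, Var(xi) = sig2
  rate <- xi * exp(-gamma * Z)
  D    <- rexp(n, rate * lamD)                            # survival time
  U    <- matrix(rexp(n * J, rep(rate * lamH, J)), n, J)  # gap times U_1, ..., U_J
  Tcal <- t(apply(U, 1, cumsum))                          # calendar times of non-fatal events
  C    <- pmin(rexp(n, 0.2), runif(n, 1, 4))              # censoring as in the first set-up
  # Only the non-fatal events with Tcal[i, j] <= min(D[i], C[i]) are observed.
  list(Z = Z, D = D, Tcal = Tcal, C = C)
}
```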
We considered two scenarios with J = 2 and 8 and evaluated the power of the tests associated with the three models as a function of γ under n = 100. The results are summarized in Table 4. The performances of the PH and PW are similar across the two scenarios. The PM is considerably less powerful than PH and PW for J = 2 but is comparable to or more powerful than the latter two for J = 8. These results suggest that the number of recurrent events plays an important role in the statistical efficiency of the PM, but not of the PH or PW (since they draw upon only the first non-fatal event).
Table 4.
Empirical type I error and power by PH, PM, and PW methods.
| γ | J = 2 | | | J = 8 | | |
|---|---|---|---|---|---|---|
| | PH | PM | PW | PH | PM | PW |
| 0 | 0.057 | 0.053 | 0.061 | 0.051 | 0.048 | 0.045 |
| 0.2 | 0.135 | 0.124 | 0.147 | 0.141 | 0.214 | 0.145 |
| 0.5 | 0.572 | 0.437 | 0.579 | 0.579 | 0.664 | 0.569 |
| 0.7 | 0.874 | 0.736 | 0.866 | 0.865 | 0.871 | 0.861 |
| 0.9 | 0.978 | 0.865 | 0.979 | 0.965 | 0.960 | 0.972 |

Note: Each scenario is based on 2,000 replicates.
4.2. A Real Example
Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training (HF-ACTION) was conducted on a total of 2,331 medically stable outpatients with heart failure and reduced ejection fraction, recruited between April 2003 and February 2007 at 82 centers in the USA, Canada, and France (O’Connor et al., 2009). The objective of the trial was to evaluate the efficacy and safety of exercise training among heart failure patients. The cohort was randomized to usual care alone (control) or usual care plus aerobic exercise training (treatment) consisting of 36 supervised sessions. The primary endpoint was a composite of all-cause death and all-cause hospitalization (non-fatal event).
To illustrate the methods described in Section 3, we consider a subset of the study data consisting of 451 non-ischemic patients. The data can be accessed as the non_ischemic dataset in the WR R-package downloadable from CRAN. Below are the first 10 records of the dataset. The variable ID is a unique identifier for the patient, time is the event time in months, status indicates the event type (0 for censoring, 1 for death, and 2 for hospitalization), and Training is an indicator variable for the treatment. Some descriptive statistics on the dataset are tabulated by arm in Table 5. The treatment group appears to have appreciably lower rates of both death and hospitalization.
Table 5.
Descriptive statistics for the non-ischemic subgroup in the HF-ACTION study.
| | Exercise training | Usual care |
|---|---|---|
| N | 220 | 231 |
| Median follow-up | 32 months | 31 months |
| Death rate | 0.036 per year | 0.064 per year |
| Hospitalization rate | 0.234 per year | 0.274 per year |
| | ID | time | status | Training |
|---|---|---|---|---|
| 1 | 1 | 7.2459016 | 2 | 0 |
| 2 | 1 | 12.5573770 | 0 | 0 |
| 3 | 2 | 0.7540984 | 2 | 0 |
| 4 | 2 | 45.9016393 | 0 | 0 |
| 5 | 5 | 0.2295082 | 2 | 0 |
| 6 | 5 | 0.3278689 | 1 | 0 |
| 7 | 6 | 47.4754098 | 2 | 0 |
| 8 | 6 | 47.9016393 | 0 | 0 |
| 9 | 7 | 32.5901639 | 0 | 1 |
| 10 | 8 | 0.9836066 | 2 | 0 |
Now, we analyze the data using the PH, PM, and PW models. For the PH model on time to the first event, the partial-likelihood estimate for the hazard ratio is 0.80, with 95% confidence interval (0.64,1.01). That is, patients in exercise training are 0.80 times as likely to experience a first event as those in usual care. The model-based estimates for the event-free survival functions are plotted in the left panel of Figure 1.
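The PH fit above can be reproduced along the following lines, assuming the non_ischemic data frame carries the columns ID, time, status, and Training displayed earlier (in the packaged dataset the treatment indicator may be stored under a different name):

```r
library(survival)
library(WR)
data(non_ischemic)

# Reduce the recurrent-event records to time to the first event per patient:
# after sorting, the earliest record is the first event (status > 0) or censoring.
dat   <- non_ischemic[order(non_ischemic$ID, non_ischemic$time), ]
first <- dat[!duplicated(dat$ID), ]

fit <- coxph(Surv(time, status > 0) ~ Training, data = first)
summary(fit)  # hazard ratio about 0.80, 95% CI (0.64, 1.01), as reported above
```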
Fig. 1.
Left, PH-model-based event-free survival rates by arm; right, PM-model-based mean frequency functions of total events by arm.
To fit a PM model, we use the CompoML function from the Wcompo R-package with the following code:
obj.PM = CompoML(id, time, status, Training, c(1, 1)) # unweighted analysis
obj.PM
where the last argument specifies that death and non-fatal event are weighted by a 1:1 ratio. The output is as follows:
Point and interval estimates for mean ratios:
Mean Ratio 95% lower CL 95% higher CL
Training 0.8298503 0.7149027 0.963280
That is, patients in treatment experience 0.83 times as many composite events as those in control, with 95% confidence interval (0.71,0.96). The model-based estimates for the mean functions by arm are plotted in the right panel of Figure 1.
Finally, we fit a PW model using the pwreg function from the WR R-package. We use the following code:
obj.PW = pwreg(time = time, status = status, Z = Training, ID = id)
obj.PW
The output is as follows:
Point and interval estimates for win ratios:
Win Ratio 95% lower CL 95% higher CL
Training 1.233259 0.9719297 1.564854
That is, patients in treatment are 1/1.23 = 0.81 times as likely to have an overall less favorable outcome (adjusting for the priority of death over hospitalization) as compared to those in control, with 95% confidence interval (0.64,1.03).
Therefore, all three models suggest that exercise training improves the cardiovascular outcomes for non-ischemic heart failure patients compared to usual care alone. Judging from the 95% confidence intervals of their corresponding estimands, the treatment effect is significant at the 0.05 level in the PM model and is only borderline significant in the PH and PW models.
5. Concluding Remarks
We have reviewed three common statistical models for composite endpoints of death and non-fatal events and discussed their pros and cons in relation to the relevant regulatory guidelines. All three models entail scientifically meaningful estimands for the treatment effect (provided that the model assumptions hold) and are implemented in easy-to-use R-packages. The proportional hazards (PH) model for time to the first event is the most commonly used approach but suffers from underutilization of outcome data and from lack of differentiation between death and non-fatal events. The proportional mean (PM) model captures the entire outcome process but has limited ability to contain the influence of frequent non-fatal events. The proportional win-fractions (PW) model prioritizes death over non-fatal events in a natural, hierarchical way. It is most suitable when death is considered the main outcome, though it may be less powerful when the treatment effect on death is relatively small.
The methodologies reviewed in this article are representative but by no means exhaustive of the vast statistical literature on composite endpoints. In keeping with the emphasis of the ICH-E9(R1) Addendum on scientifically-driven estimands, we have omitted methods that focus on hypothesis testing (e.g., Claggett et al. (2018)). Besides model-based global estimands, one can choose to make inference on local estimands as well (see Section 3.4), e.g., 5-year progression-free survival rate, restricted mean survival time (Royston and Parmar, 2013), and curtailed win ratio (Oakes, 2016; Finkelstein and Schoenfeld, 2019). Furthermore, all models considered here are multiplicative in nature; additive models, with equally interpretable estimands, may sometimes fit the data better (Lin et al., 1998; Sun et al., 2019).
For simplicity, we have assumed that all components of the composite endpoint are subject to the same censoring mechanism. In practice, death and non-fatal events can be censored at different times. For example, while investigators can no longer collect non-fatal event data after the patient drops out of the study, it is sometimes feasible (and ethically permissible) to retrieve his/her mortality information from public records. A recent paper by Diao et al. (2018) discussed semiparametric regression of composite endpoints under such component-specific censoring mechanisms. Finally, the inchoate PW model can be further developed to make fuller use of the outcome data. For example, the rule of pairwise comparison can be modified to take into consideration the frequency as well as the timings of recurrent non-fatal events. Extensions along such lines are expected to appear in the statistical literature in the near future.
Acknowledgment
This research was supported by the National Institutes of Health grant R01 HL149875.
References
- Abdalla S, Montez-Rath ME, Parfrey PS, and Chertow GM (2016), “The Win Ratio Approach to Analyzing Composite Outcomes: An Application to the EVOLVE Trial,” Contemporary Clinical Trials, 48, 119–124.
- Akacha M, Bretz F, Ohlssen D, Rosenkranz G, and Schmidli H (2017a), “Estimands and their role in clinical trials,” Statistics in Biopharmaceutical Research, 9(3), 268–271.
- Akacha M, Bretz F, and Ruberg S (2017b), “Estimands in clinical trials – broadening the perspective,” Statistics in Medicine, 36(1), 5–19.
- Anker SD, and McMurray JV (2012), “Time to move on from “time-to-first”: should all events be included in the analysis of clinical trials?,” European Heart Journal, 33, 2764–2765.
- Bakal JA, Westerhout CM, and Armstrong PW (2015), “Impact of Weighted Composite Compared to Traditional Composite Endpoints for the Design of Randomized Controlled Trials,” Statistical Methods in Medical Research, 24, 980–988.
- Bebu I, and Lachin JM (2015), “Large Sample Inference for a Win Ratio Analysis of a Composite Outcome Based on Prioritized Components,” Biostatistics, 17, 178–187.
- Claggett B, Tian L, Fu H, Solomon SD, and Wei L-J (2018), “Quantifying the Totality of Treatment Effect With Multiple Event-Time Observations in the Presence of a Terminal Event from a Comparative Clinical Study,” Statistics in Medicine, 37, 3589–3598.
- Diao G, Zeng D, Ke C, Ma H, Jiang Q, and Ibrahim JG (2018), “Semiparametric regression analysis for composite endpoints subject to componentwise censoring,” Biometrika, 105, 403–418.
- Dong G, Hoaglin DC, Qiu J, Matsouaka RA, Chang Y-W, Wang J, and Vandemeulebroecke M (2020a), “The win ratio: on interpretation and handling of ties,” Statistics in Biopharmaceutical Research, 12, 99–106.
- Dong G, Huang B, Chang Y-W, Seifu Y, Song J, and Hoaglin DC (2020b), “The win ratio: impact of censoring and follow-up time and use with nonproportional hazards,” Pharmaceutical Statistics, 19, 168–177.
- Dong G, Qiu J, Wang D, and Vandemeulebroecke M (2018), “The Stratified Win Ratio,” Journal of Biopharmaceutical Statistics, 28, 778–796.
- EUnetHTA (2015), “Endpoints used for relative effectiveness assessment: Composite endpoints,” Online.
- FDA (2017), “Guidance for industry: Multiple endpoints in clinical trials,” Online; accessed Nov 29, 2019.
- Fergusson NA, Ramsay T, Chassé M, English SW, and Knoll GA (2018), “The Win Ratio Approach Did Not Alter Study Conclusions and May Mitigate Concerns Regarding Unequal Composite End Points in Kidney Transplant Trials,” Journal of Clinical Epidemiology, 98, 9–15.
- Ferreira-González I, Permanyer-Miralda G, Busse J, Bryant DM, Montori VM, Alonso-Coello P, Walter SD, and Guyatt GH (2007a), “Methodologic Discussions for Using and Interpreting Composite Endpoints Are Limited, but Still Identify Major Concerns,” Journal of Clinical Epidemiology, 60, 651–657.
- Ferreira-González I, Permanyer-Miralda G, Domingo-Salvany A, Busse JW, Heels-Ansdell D, Montori VM, Akl EA, Bryant DM, Alonso-Coello P, Alonso J, et al. (2007b), “Problems With Use of Composite End Points in Cardiovascular Trials: Systematic Review of Randomised Controlled Trials,” British Medical Journal, 334, 786.
- Fine JP, Jiang H, and Chappell R (2001), “On Semi-Competing Risks Data,” Biometrika, 88, 907–919.
- Finkelstein DM, and Schoenfeld DA (2019), “Graphing the Win Ratio and Its Components Over Time,” Statistics in Medicine, 38, 53–61.
- Fleming TR, and Harrington DP (1991), Counting Processes and Survival Analysis, Hoboken, NJ: John Wiley & Sons.
- Follmann D, Fay MP, Hamasaki T, and Evans S (2020), “Analysis of ordered composite endpoints,” Statistics in Medicine, 39, 602–616.
- Freemantle N, Calvert M, Wood J, Eastaugh J, and Griffin C (2003), “Composite Outcomes in Randomized Trials: Greater Precision but With Greater Uncertainty,” Journal of the American Medical Association, 289, 2554–2559.
- Ghosh D, and Lin D (2000), “Nonparametric analysis of recurrent events and death,” Biometrics, 56, 554–562.
- Ghosh D, and Lin DY (2002), “Marginal regression models for recurrent and terminal events,” Statistica Sinica, 12, 663–688.
- Huang C-Y, and Wang M-C (2004), “Joint modeling and estimation for recurrent event processes and failure time data,” Journal of the American Statistical Association, 99, 1153–1165.
- ICH (1998), “Statistical Principles for Clinical Trials,” London: European Medicines Evaluation Agency. Online; accessed Nov 29, 2019.
- ICH (2017), “ICH E9 (R1) Addendum on Estimands and Sensitivity Analysis in Clinical Trials to the Guideline on Statistical Principles for Clinical Trials, Step 2b,” London: European Medicines Evaluation Agency. Online; accessed Nov 29, 2019.
- ICH (2018), “The ICH E9(R1) Step 2 Training Material,” https://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E9/E9-R1EWG_Modules1-3_Step2_COMPILATION_TrainingMaterial_2018_0703.pdf.
- Lin D, Oakes D, and Ying Z (1998), “Additive hazards regression with current status data,” Biometrika, 85, 289–298.
- Lin DY, Wei L-J, and Ying Z (1993), “Checking the Cox Model With Cumulative Sums of Martingale-Based Residuals,” Biometrika, 80, 557–572.
- Liu L, Wolfe RA, and Huang X (2004), “Shared frailty models for recurrent events and a terminal event,” Biometrics, 60, 747–756.
- Lubsen J, and Kirwan B-A (2002), “Combined Endpoints: Can We Use Them?,” Statistics in Medicine, 21, 2959–2970.
- Luo X, Qiu J, Bai S, and Tian H (2017), “Weighted Win Loss Approach for Analyzing Prioritized Outcomes,” Statistics in Medicine, 36, 2452–2465.
- Luo X, Tian H, Mohanty S, and Tsai WY (2015), “An Alternative Approach to Confidence Interval Estimation for the Win Ratio Statistic,” Biometrics, 71, 139–145.
- Mao L (2019), “On the Alternative Hypotheses for the Win Ratio,” Biometrics, 75, 347–351.
- Mao L, and Lin D (2016), “Semiparametric Regression for the Weighted Composite Endpoint of Recurrent and Terminal Events,” Biostatistics, 17, 390–403.
- Mao L, and Wang T (2020), “A class of proportional win-fraction regression models for composite outcomes,” Biometrics, doi:10.1111/biom.13382.
- Montgomery A, Abuan T, and Kollef M (2014), “The Win Ratio Method: A Novel Hierarchical Endpoint for Pneumonia Trials in Patients on Mechanical Ventilation,” Critical Care, 18, P260.
- Montori VM, Permanyer-Miralda G, Ferreira-González I, Busse JW, Pacheco-Huergo V, Bryant D, Alonso J, Akl EA, Domingo-Salvany A, Mills E, et al. (2005), “Validity of Composite End Points in Clinical Trials,” British Medical Journal, 330, 594–596.
- Oakes D (1989), “Bivariate Survival Models Induced by Frailties,” Journal of the American Statistical Association, 84, 487–493.
- Oakes D (2016), “On the Win-Ratio Statistic in Clinical Trials With Multiple Types of Event,” Biometrika, 103, 742–745.
- O’Connor CM, Whellan DJ, Lee KL, Keteyian SJ, Cooper LS, Ellis SJ, Leifer ES, Kraus WE, Kitzman DW, Blumenthal JA, et al. (2009), “Efficacy and Safety of Exercise Training in Patients With Chronic Heart Failure: HF-ACTION Randomized Controlled Trial,” Journal of the American Medical Association, 301, 1439–1450.
- Pocock S, Ariti C, Collier T, and Wang D (2012), “The Win Ratio: A New Approach to the Analysis of Composite Endpoints in Clinical Trials Based on Clinical Priorities,” European Heart Journal, 33, 176–182.
- Ratitch B, Bell J, Mallinckrodt C, Bartlett JW, Goel N, Molenberghs G, O’Kelly M, Singh P, and Lipkovich I (2020), “Choosing estimands in clinical trials: Putting the ICH E9 (R1) into practice,” Therapeutic Innovation & Regulatory Science, 54(2), 324–341.
- Rauch G, Jahn-Eimermacher A, Brannath W, and Kieser M (2014), “Opportunities and Challenges of Combined Effect Measures Based on Prioritized Outcomes,” Statistics in Medicine, 33, 1104–1120.
- Rauch G, Kunzmann K, Kieser M, Wegscheider K, König J, and Eulenburg C (2018b), “A Weighted Combined Effect Measure for the Analysis of a Composite Time-to-First-Event Endpoint With Components of Different Clinical Relevance,” Statistics in Medicine, 37, 749–767.
- Rauch G, Schüler S, and Kieser M (2018a), Planning and Analyzing Clinical Trials With Composite Endpoints, New York: Springer.
- Robins JM, and Rotnitzky A (1992), “Recovery of information and adjustment for dependent censoring using surrogate markers,” in AIDS Epidemiology, New York: Springer, pp. 297–331.
- Royston P, and Parmar MK (2013), “Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome,” BMC Medical Research Methodology, 13, 152.
- Sampson UK, Metcalfe C, Pfeffer MA, Solomon SD, and Zou KH (2010), “Composite Outcomes: Weighting Component Events According to Severity Assisted Interpretation but Reduced Statistical Power,” Journal of Clinical Epidemiology, 63, 1156–1158.
- Struthers CA, and Kalbfleisch JD (1986), “Misspecified proportional hazard models,” Biometrika, 73, 363–369.
- Sun X, Ding J, and Sun L (2019), “A semiparametric additive rates model for the weighted composite endpoint of recurrent and terminal events,” Lifetime Data Analysis, 26, 471–492.
- Timsit J-F, de Kraker ME, Sommer H, Weiss E, Bettiol E, Wolkewitz M, Nikolakopoulos S, Wilson D, Harbarth S, et al. (2017), “Appropriate Endpoints for Evaluation of New Antibiotic Therapies for Severe Infections: A Perspective from COMBACTE’s STAT-Net,” Intensive Care Medicine, 43, 1002–1012.
- Wang H, Peng J, Zheng JZ, Wang B, Lu X, Chen C, Tu XM, and Feng C (2017), “Win Ratio – An Intuitive and Easy-to-Interpret Composite Outcome in Medical Studies,” Shanghai Archives of Psychiatry, 29, 55–60.
- Ye Y, Kalbfleisch JD, and Schaubel DE (2007), “Semiparametric analysis of correlated recurrent and terminal events,” Biometrics, 63, 78–87.
- Zeng D, and Lin D (2009), “Semiparametric transformation models with random effects for joint analysis of recurrent and terminal events,” Biometrics, 65, 746–752.