Abstract
The win ratio is gaining traction as a simple and intuitive approach to analysis of prioritized composite endpoints in clinical trials. To extend it from two-sample comparison to regression, we propose a novel class of semiparametric models that includes as special cases both the two-sample win ratio and the traditional Cox proportional hazards model on time to the first event. Under the assumption that the covariate-specific win and loss fractions are proportional over time, the regression coefficient is unrelated to the censoring distribution and can be interpreted as the log win ratio resulting from one-unit increase in the covariate. U-statistic estimating functions, in the form of an arbitrary covariate-specific weight process integrated by a pairwise residual process, are constructed to obtain consistent estimators for the regression parameter. The asymptotic properties of the estimators are derived using uniform weak convergence theory for U-processes. Visual inspection of a “score” process provides useful clues as to the plausibility of the proportionality assumption. Extensive numerical studies using both simulated and real data from a major cardiovascular trial show that the regression methods provide valid inference on covariate effects and outperform the two-sample win ratio in both efficiency and robustness. The proposed methodology is implemented in the R-package WR, publicly available from the Comprehensive R Archive Network (CRAN).
Keywords: cardiovascular trials, prioritized endpoints, probabilistic index models, proportionality assumption, U-processes, win ratio
1 |. INTRODUCTION
Cardiovascular (CV) and cancer clinical trials often feature composite endpoints consisting of death and nonfatal events such as CV hospitalization or tumor progression (Lewis, 1999). The traditional approach to the analysis of composite endpoints has a major limitation—it focuses on time to the first event (TFE), and thus, ignores the unequal importance between the component events. Recently, Pocock et al. (2012), following earlier work of Finkelstein and Schoenfeld (1999) and Buyse (2010) who proposed joint tests on multiple outcomes using generalized pairwise comparisons, developed a win ratio approach to prioritize death over nonfatal events. Specifically, all possible pairs of patients are compared between the treatment and control, first with regard to the survival time and then, in the case of censoring, by time to the nonfatal event. The win ratio is the ratio between the fractions of “wins” and “losses” the treatment scores against the control. It is commonly interpreted as the number of times a patient is more likely to have a favorable outcome in the treatment as compared to the control.
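As a rough illustration of the pairwise comparison just described, the following Python sketch counts wins and losses for the treatment arm under Pocock's rule with a single nonfatal event type. The record layout (keys `x`, `died`, `t_nf`) and the toy data are hypothetical and are not taken from any trial; this is a minimal sketch, not the implementation in the WR package.

```python
import math

def compare(a, b):
    """Return 1 if patient a wins over b, -1 if a loses, 0 if indeterminate.

    Hypothetical record layout (an assumption of this sketch):
      x    - observed follow-up time (death or censoring)
      died - True if follow-up ended in death
      t_nf - time of the first nonfatal event, or None if none was observed
    """
    t = min(a["x"], b["x"])  # both patients are compared over the shared follow-up
    a_dead = a["died"] and a["x"] <= t
    b_dead = b["died"] and b["x"] <= t
    # Step 1: compare on survival within the shared window.
    if a_dead or b_dead:
        if a_dead and b_dead:
            return 0  # simultaneous deaths: indeterminate
        return 1 if b_dead else -1
    # Step 2: neither died in the window, so compare on the first nonfatal event.
    a_nf = a["t_nf"] if a["t_nf"] is not None and a["t_nf"] <= t else math.inf
    b_nf = b["t_nf"] if b["t_nf"] is not None and b["t_nf"] <= t else math.inf
    if b_nf < a_nf:
        return 1
    if a_nf < b_nf:
        return -1
    return 0

def count_wins_losses(treated, control):
    """Compare every treated patient with every control patient."""
    results = [compare(a, b) for a in treated for b in control]
    return results.count(1), results.count(-1)

# Toy data, for illustration only.
treated = [
    {"x": 5.0, "died": False, "t_nf": None},
    {"x": 3.0, "died": True, "t_nf": 1.0},
]
control = [
    {"x": 4.0, "died": True, "t_nf": 2.0},
    {"x": 6.0, "died": False, "t_nf": 2.0},
    {"x": 2.0, "died": True, "t_nf": None},
]
wins, losses = count_wins_losses(treated, control)
win_ratio = wins / losses  # assumes at least one loss
```

In this toy example, the treatment arm scores four wins and two losses, so the win ratio is 2.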
The win ratio approach has drawn keen interest from statisticians. Bebu and Lachin (2015) and Luo et al. (2015) developed analytic variance formulas for the win ratio statistic using U-statistic theory. Luo et al. (2015) and Mao (2019) studied the null and alternative hypotheses for the win ratio as a testing procedure. Luo et al. (2017) and Dong et al. (2018) extended the win ratio to weighted and stratified versions, respectively. Besides hypothesis testing, Oakes (2016) offered important insight into the estimand of the win ratio and introduced the notion of “curtailed win ratio,” ie, the win ratio evaluated on subjects all followed up to a fixed time point. Meanwhile, many clinicians have started to use the win ratio in real-life trials of both CV (eg, Kotalik et al., 2019) and non-CV diseases (eg, Montgomery et al., 2014; Wang et al., 2017; Finkelstein and Schoenfeld, 2019).
Despite its popularity, the existing literature about the win ratio has been for the most part restricted to the two-sample setup and lacks a rigorous treatment of the regression problem. In addition, the dependency of the win ratio estimand on the censoring distribution has not been satisfactorily addressed. This dependency is particularly problematic in regression analysis, which aims to model the true clinical outcomes against covariates, regardless of the study-dependent length of follow-up. To address these challenges, we develop a novel class of semiparametric proportional win-fractions (PW) regression models to extend the two-sample win ratio. The PW models allow for a user-specified rule of comparison for the relative favorability between outcomes. Under the assumption of proportionality in the covariate-specific win fractions, the regression coefficients are invariant to the follow-up time and can be interpreted as the log win ratios associated with unit increases in the covariates.
The rest of the paper is organized as follows. In Section 2, we lay out the model structure and assumptions, both in terms of the outcome events in the absence of censoring. In Section 3, we introduce the observed data, based on which inference and model-checking procedures are developed. Sections 4 and 5 are devoted to numerical studies using simulated and real data, respectively, on the special PW model under Pocock’s rule of comparison (Pocock et al., 2012). We conclude the paper with some discussions on future research directions in Section 6.
2 |. A GENERAL FRAMEWORK FOR WIN-RATIO REGRESSION
2.1 |. The full data
Let D denote the survival time and write ND(t) = I(D ≤ t), where I(·) is the indicator function. For generality, we allow for multiple hierarchically ranked nonfatal event types (Oakes, 2016). In CV trials, for instance, minor symptoms such as mild chest pain may count as a less serious event than hospitalization. Specifically, let N1(t),…,NK(t) denote the counting processes of K (possibly recurrent) nonfatal event types ranked in descending order of importance. Typically, K = 1 so that N1(t) counts all nonfatal events indiscriminately.
For a generic counting process N(·), we denote its event-history process by $\bar N(t) = \{N(u): 0 \le u \le t\}$. Write $Y(t) = \{\bar N_D(t), \bar N_1(t), \ldots, \bar N_K(t)\}$. Thus, Y(t) contains information on all clinical outcomes accrued up to time t. Hence, if the maximum length of follow-up is τ, the full outcome data are captured in Y(τ). To facilitate comparison on an event process N(·), define $\tilde T(N)(t) = \inf\{u \in [0, t]: N(u) \ge 1\}$, where we adopt the convention that inf(∅) = ∞. That is, $\tilde T(N)(t)$ is the TFE if it occurs by time t and is ∞ if it does not.
2.2 |. General specification of win function
For all notation introduced above and later, we use subscripts i and j to denote the corresponding quantity from two independent patients. To formalize the rule of comparison, we need an indicator function for, eg, Yi winning over Yj. However, in order to apply the rule to patients under varying lengths of follow-up, we need to incorporate the time dimension into the comparison. To that purpose, write $\mathcal{W}(Y_i, Y_j)(t)$ for the indicator that subject i wins over subject j by time t.
We impose the following conditions on the win function for every t ∈ [0, τ]:
(A1) $\mathcal{W}(Y_i, Y_j)(t)$ is a function only of $Y_i(t)$ and $Y_j(t)$;
(A2) $\mathcal{W}(Y_i, Y_j)(t) + \mathcal{W}(Y_j, Y_i)(t) \in \{0, 1\}$;
(A3) $\mathcal{W}(Y_i, Y_j)(t) = \mathcal{W}(Y_i, Y_j)(D_i \wedge D_j \wedge t)$.
Condition (A1) ensures comparability of the two patients by subjecting them to the same length of follow-up. Condition (A2) is an “antisymmetry” condition to ensure that “winning” can only go one way. In the case that $\mathcal{W}(Y_i, Y_j)(t) = \mathcal{W}(Y_j, Y_i)(t) = 0$, the comparison is said to be indeterminate. Condition (A3) can be viewed as a “noncompeting-risks” requirement because it forbids change of win-loss status beyond the occurrence of a terminal event (ie, death).
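Two simple and intuitive win functions that satisfy conditions (A1)–(A3) are listed below. Here and after, we treat ∞ as if it were a definite maximum of the real numbers so that ∞ > α for all α ∈ ℝ and that ∞ > ∞ is false.
- Comparison on the TFE: $\mathcal{W}_{\mathrm{TFE}}(Y_i, Y_j)(t) = I\{\tilde T_i(t) > \tilde T_j(t)\}$, where $\tilde T(t)$ is the time to the first event of any type, fatal or nonfatal, if it occurs by time t and is ∞ otherwise.
- Sequential comparisons based on survival time and times to first nonfatal events: $\mathcal{W}_{\mathrm{P}}(Y_i, Y_j)(t) = I\{\tilde D_i(t) > \tilde D_j(t)\} + I\{\tilde D_i(t) = \tilde D_j(t) = \infty\} \sum_{k=1}^{K} I\{\tilde T_{ki}(t) > \tilde T_{kj}(t)\} \prod_{l=1}^{k-1} I\{\tilde T_{li}(t) = \tilde T_{lj}(t)\}$, where $\tilde D(t)$ and $\tilde T_k(t)$ are the death time and the time to the first nonfatal event of type k, respectively, if they occur by time t and are ∞ otherwise.
The function $\mathcal{W}_{\mathrm{P}}$ is precisely the win function employed in Pocock’s win ratio (Pocock et al., 2012). Since we are chiefly interested in extending Pocock’s win ratio to regression, we will be focusing on $\mathcal{W}_{\mathrm{P}}$ in applications. Nevertheless, it is straightforward to think about other sensible choices for $\mathcal{W}$. For instance, instead of comparing on times to the first nonfatal events alone, one could also draw on their frequencies (see Supporting Information for details).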
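To make the two rules of comparison concrete, here is a minimal Python sketch for the case K = 1 on uncensored outcomes. It assumes each outcome is a pair (D, T1) with `math.inf` standing for an event that never occurs; the function names are hypothetical and the code is an illustration of the verbal rules above, not the authors' software.

```python
import math

def first_event_by(time, t):
    """Event time if it occurs by t, else infinity (the inf(empty set) = infinity convention)."""
    return time if time <= t else math.inf

def win_tfe(y_i, y_j, t):
    """Priority-unadjusted rule: i wins if j has the earlier first event by time t."""
    tfe_i = first_event_by(min(y_i), t)
    tfe_j = first_event_by(min(y_j), t)
    return int(tfe_i > tfe_j)  # inf > inf is False, so no event on either side is a tie

def win_pocock(y_i, y_j, t):
    """Priority-adjusted rule: compare on death first, then on the nonfatal event."""
    d_i, d_j = first_event_by(y_i[0], t), first_event_by(y_j[0], t)
    if d_i != d_j:
        return int(d_i > d_j)      # the longer survivor wins
    if d_i != math.inf:
        return 0                   # both died at the same time: indeterminate
    n_i, n_j = first_event_by(y_i[1], t), first_event_by(y_j[1], t)
    return int(n_i > n_j)          # both alive at t: later (or no) nonfatal event wins

# Patient i: nonfatal event at 2, death at 5. Patient j: death at 3, no nonfatal event.
y_i, y_j = (5.0, 2.0), (3.0, math.inf)
```

With these two patients, i wins at t = 4 under the priority-adjusted rule because j has died, whereas j wins under the TFE rule because i's first event came earlier. At t = 2.5, before either death, j wins under both rules, which shows why the time dimension has to enter the comparison.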
2.3 |. The proportional win-fractions regression models
Now, we are ready to build a model for the composite outcome Y, summarized through a user-specified win function $\mathcal{W}$, against Z, a p-dimensional vector of covariates. To do so, let Zi and Zj denote the covariates associated with independent subjects i and j, respectively. Each subject can be viewed as randomly drawn from a subpopulation of patients with the same covariate values. If all patients in the two subpopulations are paired up as in the two-sample case of Pocock et al. (2012), the resulting win ratio is
$$\mathrm{WR}(t; Z_i, Z_j) = \frac{E\{\mathcal{W}(Y_i, Y_j)(t) \mid Z_i, Z_j\}}{E\{\mathcal{W}(Y_j, Y_i)(t) \mid Z_i, Z_j\}}. \tag{1}$$
Note that, by taking conditional expectation on the right-hand side of (1), the quantity $\mathrm{WR}(t; Z_i, Z_j)$ is a summary measure of between-population comparison. In fact, $\mathrm{WR}(t; Z_i, Z_j)$ is a covariate-specific version of the “curtailed win ratio” defined by Oakes (2016) since all subjects under comparison are followed up to the same time t.
A regression model for the win ratio then boils down to specifying $\mathrm{WR}(t; Z_i, Z_j)$ up to a finite-dimensional regression parameter. By definition, the covariate-specific win ratio must be nonnegative and satisfy $\mathrm{WR}(t; Z_i, Z_j) = \mathrm{WR}(t; Z_j, Z_i)^{-1}$ for all Zi, Zj, and t ∈ [0, τ]. A natural model that meets these requirements is
$$\mathrm{WR}(t; Z_i, Z_j) = \exp\{\beta^{\mathrm{T}}(Z_i - Z_j)\}, \tag{2}$$
where β = (β1,…,βp)T is a p-dimensional regression parameter. Because the numerator and denominator of the covariate-specific win ratio are the subpopulation win fractions of Zi against Zj and vice versa, respectively, a key assumption in model (2) is that the win fractions are proportional over time. We hence call (2) the proportional win-fractions (PW) models. Obviously, proportionality of the covariate-specific win fractions is the same as time invariance of the win ratio. The proportional win-fractions assumption here is analogous, but not equivalent, to the proportional hazards assumption in the standard Cox model (their relationships under different choices of the win function $\mathcal{W}$ are discussed later). In model (2), exp(βj) can be interpreted as the win ratio resulting from a one-unit increase in the jth coordinate of Z, with the other coordinates held fixed, regardless of the follow-up time.
Expression (2) formulates a class of PW models indexed by the user-supplied win function $\mathcal{W}$. We have illustrated two examples for $\mathcal{W}$ in Section 2.2, the TFE win function $\mathcal{W}_{\mathrm{TFE}}$ and Pocock’s win function $\mathcal{W}_{\mathrm{P}}$. For a given $\mathcal{W}$, let $\mathrm{PW}(\mathcal{W})$ denote the particular PW model under $\mathcal{W}$. In the Supporting Information, we prove that $\mathrm{PW}(\mathcal{W}_{\mathrm{TFE}})$ is equivalent to the Cox proportional hazards (PH) model on the TFE. This strengthens the results of Thas et al. (2012), which established that the Cox model implies a probabilistic index model (PIM) for pairwise regression of univariate uncensored outcomes.
For $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$, with Z ∈ {1, 0}, exp(β) is equal to the estimand of Pocock’s two-sample win ratio comparing group Z = 1 against group Z = 0 under a time-invariance assumption (Oakes, 2016). To distinguish between $\mathrm{PW}(\mathcal{W}_{\mathrm{TFE}})$, ie, the Cox PH model on the TFE, and $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$, we call the former the “priority-unadjusted” PW model and the latter the “priority-adjusted” PW model.
Like $\mathrm{PW}(\mathcal{W}_{\mathrm{TFE}})$, $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$ is also closely related to PH models. We show in the Supporting Information that a multivariate Lehmann model with component-wise marginal PH structures is a special case of $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$, with the win ratios equal to the inverse of the hazard ratios. This extends the two-sample results in Oakes (2016) to the regression setting. On the other hand, $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$ is more general and is not confined to joint PH models such as the Lehmann model. This is not surprising since a joint PH model for a (K + 1)-variate outcome imposes distributional constraints on $[0, \tau]^{\otimes (K+1)}$, whereas the PW model is constrained only on a unidimensional time interval [0, τ]. Examples for $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$ with nonproportional hazards marginal structures are given in the Supporting Information.
3 |. INFERENCE PROCEDURES
3.1 |. Observed data
The PW models in Section 2 are formulated on patients all followed up to τ. In practice, however, subjects are censored due to staggered entry and/or loss to follow-up before study termination. Let C denote the censoring time and assume that C is independent of Y given Z, ie, C ⫫ Y | Z. The observed length of follow-up is thus X = D ∧ C. The observed composite outcome is Y(X), ie, the event history up to time X. Overall, the observed data consist of {Y(X), X, Z}. Note that one does not need a “censoring indicator” to distinguish death from censoring because the former is a component of Y(·).
Due to censoring, the win indicators in (1) are not directly computable, because Y(t) need not be observed if the subject is censored before t. Thanks to independent censoring, however, we can show that the win ratio in (1) is unchanged if the win functions in its numerator and denominator are both curtailed at the earlier of the follow-up times.
Lemma 1.
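Under model (2), conditions (A1)–(A3), and the independent censoring assumption, we have that
$$\frac{E\{\delta_{ij}(t) \mid Z_i, Z_j\}}{E\{\delta_{ji}(t) \mid Z_i, Z_j\}} = \exp\{\beta^{\mathrm{T}}(Z_i - Z_j)\} \tag{3}$$
for all t ∈ [0, τ], where $\delta_{ij}(t) = \mathcal{W}(Y_i, Y_j)(X_i \wedge X_j \wedge t)$ is the win indicator curtailed at the earlier of the two follow-up times.
The proof of Lemma 1 is given in the Supporting Information. Note that, by (A1), δij(t) and δji(t) are both functions of Yi(Xi ∧ Xj ∧ t) and Yj(Xi ∧ Xj ∧ t), which are subsets of the observed outcomes Yi(Xi) and Yj(Xj), respectively. Hence, the observed win indicators δij(t) are indeed computable.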
Lemma 1 translates the PW model (2), specified on the full data, into an identity based on the observed data. For regression models, however, statisticians generally prefer formulations in terms of mean-zero errors, not least because such formulations facilitate construction of estimating equations. In our case, we can construct the following pairwise residual process
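$$M_{ij}(t \mid Z_i, Z_j; \beta) = \delta_{ij}(t) - R_{ij}(t)\,\mu(Z_i, Z_j; \beta), \tag{4}$$
where Rij(t) = δij(t) + δji(t) and
$$\mu(Z_i, Z_j; \beta) = \frac{\exp\{\beta^{\mathrm{T}}(Z_i - Z_j)\}}{1 + \exp\{\beta^{\mathrm{T}}(Z_i - Z_j)\}}.$$
By elementary probabilistic arguments, it is not hard to show that (3) is equivalent to
$$E\{M_{ij}(t \mid Z_i, Z_j; \beta) \mid Z_i, Z_j\} = 0. \tag{5}$$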
Indeed, if one views δij(t) as the counting process for subject i winning over subject j and Rij(t) as a “determinacy” indicator, then μ(Zi, Zj; β) is the conditional probability of winning given a determinate comparison. In that sense, Mij is reminiscent of the standard univariate counting process martingale (Fleming and Harrington, 1991).
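The reading of μ as a conditional win probability can be checked numerically. The sketch below assumes the logistic form μ = expit{βT(Zi − Zj)}, which is the form implied by a win ratio of exp{βT(Zi − Zj)}, and a residual of the form δij − Rij μ; the function names are hypothetical.

```python
import math

def mu(z_i, z_j, beta):
    """Conditional probability that i wins over j given a determinate comparison."""
    lin = sum(b * (a - c) for b, a, c in zip(beta, z_i, z_j))
    return 1.0 / (1.0 + math.exp(-lin))

def residual(delta_ij, delta_ji, z_i, z_j, beta):
    """Pairwise residual: observed win indicator minus its model-based expectation."""
    r_ij = delta_ij + delta_ji  # 1 if the comparison is determinate, 0 otherwise
    return delta_ij - r_ij * mu(z_i, z_j, beta)

beta = [0.5]
z_i, z_j = [1.0], [0.0]
m = mu(z_i, z_j, beta)
odds_of_winning = m / (1.0 - m)  # should equal the win ratio exp(0.5)
```

An indeterminate pair contributes a residual of zero, and the residuals of the ordered pairs (i, j) and (j, i) sum to zero, which is why only one ordering per pair is needed in an estimating equation.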
3.2 |. Weighted U-estimating equations
We want to make inference on the regression parameter β based on the observed data {Yi(Xi), Xi, Zi} (i = 1,…,n), a random n-sample of {Y(X), X, Z}. By the conditional mean-zero property of the residual process in (5), we can use the following covariate-weighted U-statistic estimating equations:
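$$U_n(\beta) = \binom{n}{2}^{-1} \sum_{i<j} \int_0^\tau \hat h(t; Z_i, Z_j; \beta)\,(Z_i - Z_j)\, dM_{ij}(t \mid Z_i, Z_j; \beta) = 0, \tag{6}$$
where $\hat h(t; Z_i, Z_j; \beta)$ is some real-valued symmetric function converging uniformly in probability to some fixed function h0(t;·,·;β).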
An estimator $\hat\beta$ can be obtained by solving (6) through the Newton-Raphson algorithm. Computationally, this root-finding problem is indistinguishable from that of a weighted logistic regression. Therefore, convergence of the Newton-Raphson algorithm is theoretically guaranteed. For variance estimation, however, U-statistic theory is needed to account for the correlation between the summands on the left-hand side of the equation (see Section 3.3).
One has considerable flexibility in specifying the weight $\hat h$. Different choices do not alter the interpretation of the regression parameter, but do affect the efficiency of the resulting estimator. The simplest choice is the constant weight $\hat h \equiv 1$. It can be shown that the regression parameter estimator under the constant weight for $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$ with a binary covariate reduces to Pocock’s two-sample win ratio statistic (see Supporting Information). In Section 4.1, we explore additional time-dependent weights in simulations. However, none of the more complex weights shows any edge in statistical efficiency. Thus, we recommend the use of the constant weight for now, pending development of a complete efficiency theory for the PW models.
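The link to logistic regression and to the two-sample win ratio can be seen in a small numerical sketch. The Python code below assumes a scalar covariate, the constant weight, and win indicators already evaluated at the end of each pair's shared follow-up; under those assumptions the estimating function reduces to a sum over pairs of (Zi − Zj){δij − Rij μ}. The function name and the toy data are hypothetical, and this is not the routine used in the WR package.

```python
import math

def fit_pw_scalar(z, wins, tol=1e-10, max_iter=50):
    """Newton-Raphson for a scalar beta under the constant weight.

    z    : list of scalar covariates, one per subject
    wins : wins[i][j] = 1 if subject i wins over subject j, else 0
    """
    n = len(z)
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if wins[i][j] + wins[j][i] == 0:
                    continue  # indeterminate pairs carry no information
                dz = z[i] - z[j]
                m = 1.0 / (1.0 + math.exp(-beta * dz))
                score += dz * (wins[i][j] - m)     # estimating function
                info += dz * dz * m * (1.0 - m)    # minus its derivative
        if info == 0.0:
            raise ValueError("no informative pairs")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy example: binary covariate, uncensored scalar outcome, larger is better.
z = [1, 1, 0, 0]
y = [5.0, 2.0, 3.0, 1.0]
wins = [[int(a > b) for b in y] for a in y]
beta_hat = fit_pw_scalar(z, wins)
```

In this toy data set the group with Z = 1 scores three wins and one loss against the group with Z = 0, and exp(beta_hat) recovers the two-sample win ratio of 3.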
3.3 |. Asymptotic properties and variance estimation
Write Oi = {Yi(Xi), Xi, Zi} (i = 1,…,n). With mild regularity conditions on the win function $\mathcal{W}$ and the weight function $\hat h$, the following theorem establishes the consistency and asymptotic linearity (normality) of $\hat\beta$.
Theorem 1.
Under (A1)–(A3) for the win function and regularity conditions (C1)-(C5) listed in the Supporting Information, we have that $\hat\beta$ is consistent for β and that
$$\sqrt{n}\,(\hat\beta - \beta) = n^{-1/2} \sum_{i=1}^{n} \psi(O_i) + o_p(1), \tag{7}$$
where ψ(Oi) is a mean-zero influence function given in the Supporting Information.
For variance estimation, we can estimate ψ(Oi) in (7) by its empirical analog $\hat\psi(O_i)$. Then, the variance matrix of $\hat\beta$ can be estimated by $\hat\Sigma = n^{-2} \sum_{i=1}^{n} \hat\psi(O_i)\hat\psi(O_i)^{\mathrm{T}}$. For α ∈ (0, 1), a 100(1 – α)% confidence interval for exp(βj), the win ratio resulting from a one-unit increase in the jth component of Z, is estimated by $\exp(\hat\beta_j \pm z_{1-\alpha/2}\,\hat\sigma_j)$, where z1−α∕2 is the (1 − α∕2)th quantile of the standard normal distribution and $\hat\sigma_j$ is the square root of the jth diagonal element of $\hat\Sigma$. The algebraic formulas for the variance estimator and the estimating function are given in the Supporting Information.
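As a small worked example of the interval construction, the Python sketch below converts a log-win-ratio estimate and its standard error into a win ratio with a Wald-type confidence interval on the exponentiated scale. The function name and the input values are hypothetical; the standard error is taken as given rather than computed from influence functions.

```python
import math
from statistics import NormalDist

def win_ratio_ci(beta_hat, se, alpha=0.05):
    """Win ratio exp(beta_hat) with a 100(1 - alpha)% interval exp(beta_hat -/+ z * se)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    lower = math.exp(beta_hat - z * se)
    upper = math.exp(beta_hat + z * se)
    return math.exp(beta_hat), (lower, upper)

# Illustrative inputs only: a null effect with standard error 0.1.
wr, (lower, upper) = win_ratio_ci(0.0, 0.1)
```

Because the interval is built on the log scale and then exponentiated, it is symmetric around the estimate multiplicatively rather than additively, so the product of the two limits equals the squared win ratio.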
3.4 |. Graphical assessment of the proportionality assumption
Under a sound proportionality assumption, the pairwise residual process Mij(t | Zi, Zj; β) has conditional mean zero for all t ∈ [0, τ]. Thus, similarly to the case with the Cox PH model (Lin et al., 1993), we consider the following (mean-zero) “score process”
$$U_n(t; \hat\beta) = \binom{n}{2}^{-1} \sum_{i<j} \int_0^t \hat h(u; Z_i, Z_j; \hat\beta)\,(Z_i - Z_j)\, dM_{ij}(u \mid Z_i, Z_j; \hat\beta), \tag{8}$$
where $\hat h$ is the weight used in the estimating equations (6).
To assess the proportionality assumption on the jth component of Z, one can plot a standardized jth component of $U_n(t; \hat\beta)$ over [0, τ]. When proportionality holds, the standardized score process randomly fluctuates around zero. A systematic trend, on the other hand, is indicative of nonproportionality. For example, if the process goes down initially and then bumps up, it means that there are fewer wins at earlier times and more at later times than adequately explained by the PW model (see Section 4.2). Formal goodness-of-fit tests based on the suprema of the score process and other cumulative sums of the residual processes are considered in the Supporting Information.
For the priority-adjusted model $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$, in particular, a possible cause of nonproportionality is component-wise differential covariate effects. Indeed, as follow-up progresses, greater emphasis is being shifted from the nonfatal events to death. This is because death is prioritized and is more likely to occur with a longer follow-up. As a result, if the covariate effects on death and the nonfatal events are different, the win ratio is likely to change over time. When the proportionality assumption fails, the scale of the estimated regression coefficient should be interpreted with caution. Nonetheless, one can always use the standardized estimator for testing purposes.
4 |. SIMULATION STUDIES
In this section and the next, we focus on the priority-adjusted model $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$, the regression version of Pocock’s win ratio (Pocock et al., 2012). For all simulations, let K = 1 so that the composite outcome consists of (D, T1), a bivariate vector of survival time and time to the first nonfatal event.
4.1 |. Model-based estimation
In the first set of simulations, we assessed the estimation of regression parameters under a correctly specified $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$. Let Z = (Z1, Z2)T, where Z1 follows the standard normal distribution truncated on [−1, 1] and Z2 ~ 2Bernoulli(0.5) − 1. We simulated outcome data under a conditional Gumbel-Hougaard copula model (Oakes, 1989; Luo et al., 2015; Oakes, 2016), a special case of the Lehmann model considered in the Supporting Information with baseline survival function $S_0(s, t) = \exp[-\{(\lambda_D s)^{\alpha} + (\lambda_H t)^{\alpha}\}^{1/\alpha}]$, where α ≥ 1 controls the correlation between D and T1. We set λD = 0.2, λH = 2, and α = 1, 2 (corresponding to Kendall’s tau values 0% and 50%, respectively, between D and T1). Let C ~ min{Un[1, 4], Expn(0.2)}. Under this setup, the death rate is about 18%, nonfatal event rate about 75%, and censoring rate about 82%.
For the weighted U-estimating equations (6), we chose two weight functions, the constant weight and a survival weight built from $\hat S_D(t \mid Z)$, where $\hat S_D(t \mid Z)$ is the conditional survival function estimated from a Cox PH model on D. With β = (β1, β2)T = (−.5, .5)T, (0, 0)T, and (.5, −.5)T, we simulated 2000 replicates for each scenario and computed the parameter estimates, with standard errors estimated by the formula given in Section 3.3. On a MacBook Pro with 2.7 GHz processor, it takes about 0.5, 2.5, and 10 seconds to fit the model to a sample of size 200, 500, and 1000, respectively.
The results for the estimation of β1 are summarized in Table 1. Both estimators exhibit minimal bias across different sample sizes. Their estimated standard errors are in good agreement with the observed variations in the estimators. The coverage probabilities of the 95% confidence interval are fairly close to the nominal rate. Moreover, there does not appear to be any appreciable difference in efficiency between the two weight functions. Additional simulations under a higher mortality, however, show considerable efficiency loss for the survival-weighted estimator (see Table S1 in the Supporting Information).
TABLE 1.
Simulation results on the estimation of β1
| α | n | β1 | EST (constant) | SE (constant) | SEE (constant) | CP (constant) | EST (survival) | SE (survival) | SEE (survival) | CP (survival) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 200 | −.5 | −0.508 | 0.160 | 0.163 | 0.951 | −0.506 | 0.165 | 0.163 | 0.953 |
| | | 0 | 0.003 | 0.155 | 0.154 | 0.956 | 0.002 | 0.154 | 0.155 | 0.953 |
| | | .5 | 0.497 | 0.166 | 0.163 | 0.942 | 0.508 | 0.159 | 0.163 | 0.949 |
| | 500 | −.5 | −0.500 | 0.103 | 0.102 | 0.952 | −0.501 | 0.103 | 0.102 | 0.946 |
| | | 0 | 0.001 | 0.097 | 0.096 | 0.950 | 0.001 | 0.099 | 0.097 | 0.943 |
| | | .5 | 0.501 | 0.102 | 0.102 | 0.950 | 0.496 | 0.105 | 0.102 | 0.944 |
| | 1000 | −.5 | −0.498 | 0.072 | 0.072 | 0.948 | −0.500 | 0.074 | 0.072 | 0.943 |
| | | 0 | 0.002 | 0.068 | 0.068 | 0.950 | −0.001 | 0.069 | 0.068 | 0.944 |
| | | .5 | 0.497 | 0.072 | 0.072 | 0.950 | 0.501 | 0.072 | 0.072 | 0.950 |
| 2.0 | 200 | −.5 | −0.506 | 0.172 | 0.169 | 0.948 | −0.497 | 0.174 | 0.173 | 0.946 |
| | | 0 | 0.004 | 0.155 | 0.160 | 0.958 | 0.000 | 0.163 | 0.163 | 0.954 |
| | | .5 | 0.502 | 0.166 | 0.169 | 0.955 | 0.504 | 0.176 | 0.173 | 0.950 |
| | 500 | −.5 | −0.502 | 0.106 | 0.106 | 0.948 | −0.500 | 0.108 | 0.108 | 0.951 |
| | | 0 | −0.001 | 0.100 | 0.100 | 0.944 | −0.001 | 0.103 | 0.102 | 0.946 |
| | | .5 | 0.501 | 0.109 | 0.106 | 0.946 | 0.501 | 0.109 | 0.108 | 0.952 |
| | 1000 | −.5 | −0.501 | 0.074 | 0.074 | 0.949 | −0.498 | 0.075 | 0.076 | 0.950 |
| | | 0 | 0.001 | 0.070 | 0.070 | 0.952 | −0.001 | 0.072 | 0.072 | 0.952 |
| | | .5 | 0.498 | 0.077 | 0.074 | 0.942 | 0.502 | 0.077 | 0.076 | 0.949 |
Note: EST and SE are the mean and standard error of the parameter estimator; SEE is the mean of the standard error estimator; CP is the coverage probability of the 95% confidence interval. Each scenario is based on 2000 replicates.
To investigate more weighting options, we experimented with two additional weights and summarized the simulation results in Tables S2 and S3 of the Supporting Information. Interestingly, neither of the two weights outperforms the simple constant weight in statistical efficiency. It thus appears advisable to use the constant weight in practice due to its computational simplicity.
4.2 |. Proportionality and score processes
To evaluate the sensitivity of the score processes to violations of the proportionality assumption, we assessed their behavior under both proportional and nonproportional win fractions. To do so, we simulated the outcome data under the following conditional Gumbel-Hougaard copula model
$$\mathrm{pr}(D > s, T_1 > t \mid Z) = \exp\left(-\left[\{\exp(-\beta_D^{\mathrm{T}} Z)\lambda_D s\}^{\alpha} + \{\exp(-\beta_H^{\mathrm{T}} Z)\lambda_H t\}^{\alpha}\right]^{1/\alpha}\right). \tag{9}$$
This model implies marginal Cox PH models for D and T1 with regression parameters −βD and −βH, respectively. Clearly, when βD = βH = β, (9) reduces to the model used in Section 4.1. Under different βD and βH, however, the win ratio is time-varying. In fact, by the arguments in Section 3.4, the following three scenarios for Z1 can be constructed: (a) βD = βH = (.5, 0)T, constant win ratio; (b) βD = (.5, 0)T and βH = (0, 0)T, increasing win ratio; (c) βD = (0, 0)T and βH = (.5, 0)T, decreasing win ratio. For n = 1000, we plotted the standardized score processes under the three scenarios in Figure 1. The first scenario shows curves that fluctuate randomly around zero with suprema bounded by 2 (mostly bounded by 1); the second and third show curves with initial decrease/increase and subsequent return to zero, which suggest excess wins at later/earlier times, respectively. The clear visual differences between the scenarios demonstrate a fair degree of sensitivity of the score processes to departures from proportionality. As a practical rule of thumb, if the supremum of the standardized score process exceeds 2 by a substantial margin (such as in the latter two panels of Figure 1), there is cause for concern over the proportionality assumption. In such cases, the investigator should interpret the regression coefficients with caution, or seek alternative means to analyze the data.
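The claim about the marginal PH structure can be verified numerically. The Python sketch below assumes the usual Gumbel-Hougaard parametrization, in which the joint survival function is exp(−[{exp(−βDTZ)λD s}^α + {exp(−βHTZ)λH t}^α]^{1/α}) and Kendall's tau equals 1 − 1/α; this form is an assumption chosen to agree with the stated marginal parameters −βD and −βH and with the tau values quoted in Section 4.1. Function names and default values are for illustration only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def joint_survival(s, t, z, beta_d, beta_h, lam_d=0.2, lam_h=2.0, alpha=2.0):
    """pr(D > s, T1 > t | Z = z) under the assumed Gumbel-Hougaard form."""
    a = math.exp(-dot(beta_d, z)) * lam_d * s
    b = math.exp(-dot(beta_h, z)) * lam_h * t
    return math.exp(-((a ** alpha + b ** alpha) ** (1.0 / alpha)))

def kendall_tau(alpha):
    """Kendall's tau between D and T1 for the Gumbel-Hougaard copula."""
    return 1.0 - 1.0 / alpha

# Scenario (b) of the text: covariate effect on death only.
z = [1.0, 0.0]
beta_d, beta_h = [0.5, 0.0], [0.0, 0.0]
marginal_d = joint_survival(1.5, 0.0, z, beta_d, beta_h)  # pr(D > 1.5 | Z)
marginal_h = joint_survival(0.0, 0.4, z, beta_d, beta_h)  # pr(T1 > 0.4 | Z)
```

Setting one argument to zero recovers an exponential marginal whose rate is multiplied by exp(−βTZ), which is a PH model with regression parameter −β.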
FIGURE 1.

Standardized score processes for Z1 under different scenarios with n = 1000. Each panel displays 10 random realizations
4.3 |. Hypothesis testing
Finally, we compared $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$ with the two-sample win ratio of Pocock et al. (2012) as hypothesis testing procedures. Let Z1 ~ Bernoulli{expit(γZ2)}, where expit(x) = exp(x){1 + exp(x)}⁻¹ and Z2 ~ N(0, 1) is a potential confounder. We generated the composite outcomes using the same Gumbel-Hougaard model as in Section 4.1 with α = 2, β2 = −.5, and β1 = 0, .2, .4 (corresponding to win ratios 1.0, ~1.2, and ~1.5, respectively). We tested the effect of Z1 under γ = 0 (no confounding) and .5 (confounding by Z2) using the two-sample win ratio and the estimate of β1 from $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$ (by the constant weight) with (Z1, Z2)T as covariates.
The empirical type I error (under win ratio 1.0) and power (under win ratios 1.2 and 1.5) are summarized in Table 2. Each scenario is based on 2000 replicates. In the absence of confounding, both the two-sample and regression-based tests maintain correct type I error. However, the latter is substantially more powerful than the former, owing to adjustment of the additional predictor Z2. In the presence of confounding, the type I error for the two-sample test is severely inflated, especially under the larger sample size, while that for the regression-based test is close to the nominal level. These results suggest that covariate adjustment improves both efficiency and robustness of the test. More simulations comparing $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$ and the TFE analysis can be found in the Supporting Information.
TABLE 2.
Empirical type I error and power by Pocock’s two-sample win ratio comparing Z1 = 1 with Z1 = 0 and by tests on β1 under $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$
| Confounding | Win ratio | Two-sample (n = 200) | Regression (n = 200) | Two-sample (n = 500) | Regression (n = 500) |
|---|---|---|---|---|---|
| No | 1.0 | 0.056 | 0.052 | 0.047 | 0.051 |
| | 1.2 | 0.166 | 0.203 | 0.408 | 0.459 |
| | 1.5 | 0.529 | 0.587 | 0.918 | 0.948 |
| Yes | 1.0 | 0.266 | 0.055 | 0.562 | 0.049 |
| | 1.2 | 0.052 | 0.184 | 0.062 | 0.432 |
| | 1.5 | 0.126 | 0.577 | 0.262 | 0.932 |
5 |. ANALYSIS OF THE HF-ACTION STUDY
The HF-ACTION (Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training) study was conducted on a group of 2331 heart failure patients recruited between April 2003 and February 2007 at 82 sites in the USA, Canada, and France (O’Connor et al., 2009). The study aimed to assess the effect of adding aerobic exercise training to usual care on the patient’s CV outcomes. The primary endpoint was a composite of all-cause death and all-cause hospitalization. The full dataset was analyzed in O’Connor et al. (2009) using TFE under a Cox PH model adjusting for etiology (ischemic versus nonischemic). The results show a moderate beneficial effect of exercise training in reducing the risk of the first composite event with a nonsignificant hazard ratio of 0.93.
To illustrate the use of the proposed methodology, we reanalyze the data using $\mathrm{PW}(\mathcal{W}_{\mathrm{P}})$ to prioritize death over hospitalization. The data consist of 2109 subjects with complete measurements of baseline covariates of interest. Among them, 1051 received aerobic exercise training consisting of 36 supervised sessions in addition to usual care, and the remaining 1058 received usual care alone. The median follow-up time was about 31 months, with a death rate of 16.5% and a first-hospitalization rate of 53.8%.
In addition to the treatment indicator, we also include baseline covariates such as patient age, sex, disease etiology (nonischemic vs ischemic), CPX duration test (maximum duration of cardiopulmonary exercise before reporting of discomfort), histories of atrial fibrillation/flutter or diabetes, and geographical region of the study site (the USA, Canada, or France). Table 3 summarizes the estimated win ratios from the multiple regression analysis using the constant weight, with confidence intervals (CI) constructed as described in Section 3.3. Controlling for covariates, patients under exercise training are 6% more likely to have a favorable (priority-adjusted) outcome than those under usual care. However, this effect is nonsignificant at the conventional .05 level. On the other hand, nonischemic etiology, female gender, longer CPX duration, non-US study site, and absence of history of atrial fibrillation are significantly associated with a favorable outcome. These results are comparable to those from the Cox model on TFE, ie, $\mathrm{PW}(\mathcal{W}_{\mathrm{TFE}})$ (see Table S5 in the Supporting Information). In addition, a joint chi-square test on the two terms for geographical region yields a P-value of 1.4 × 10⁻⁴, suggesting highly significant differences in patient outcomes between American, Canadian, and French sites.
TABLE 3.
Priority-adjusted PW multiple regression analysis of the HF-ACTION study
| Covariate | Win Ratio | 95% CI | P-value |
|---|---|---|---|
| Training vs Usual | 1.06 | (0.95–1.19) | .275 |
| Non-Ischemic vs Ischemic | 1.15 | (1.02–1.31) | .027 |
| Age (decade) | 1.02 | (0.97–1.07) | .468 |
| Male vs Female | 0.72 | (0.63–0.82) | <.001 |
| CPX Duration (minute) | 1.11 | (1.09–1.13) | <.001 |
| Canada vs USA | 1.34 | (1.09–1.66) | .007 |
| France vs USA | 1.95 | (1.32–2.89) | .001 |
| Atrial Fibrillation (y vs n) | 0.80 | (0.70–0.92) | .002 |
| Diabetes (y vs n) | 0.98 | (0.87–1.11) | .726 |
Note: CI, confidence interval.
Finally, we assess the proportionality assumption on all covariates using the graphical procedure described in Section 3.4. The standardized score processes are plotted in Figure 2. Most curves are reasonably well behaved and none exhibits such conspicuous patterns as displayed in the latter two panels of Figure 1. It can thus be cautiously concluded that the proportionality assumption is approximately true. A closer examination, however, reveals a decreasing trend for the win ratio of age, suggesting that older patients may “win” less at later times than explained by the PW model. As discussed in Section 3.4, this may signify differential effects of age on mortality and hospitalization. Component-wise secondary analysis substantiates our conjecture that older age has a greater negative impact on patient survival than on hospitalization (see Table S6 in the Supporting Information). Similar mechanisms may underlie the apparent nonproportionality in “Canada vs USA” as well.
FIGURE 2.

Standardized score processes for all covariates in the priority-adjusted PW regression analysis of the HF-ACTION study
6 |. DISCUSSIONS
The contributions of the proposed class of semiparametric proportional win-fractions models are threefold. First, it generalizes the traditional Cox proportional hazards model for TFE by allowing for priority adjustment among different event types. Second, it extends Pocock’s two-sample win ratio to the regression setting. Third, it extends the PIM of Thas et al. (2012) for pairwise regression from univariate uncensored outcome to multivariate censored outcomes. In addition to composite time-to-event outcomes, the general framework established here is also applicable to other outcome types, such as longitudinal measurements combined with a survival endpoint (Finkelstein and Schoenfeld, 1999), so long as a suitable win function is specified to embody a sensible rule of comparison.
The proportional win-fractions assumption is key to ensuring that the regression parameters are scientifically interpretable. Early approaches to a meaningful win ratio estimand have relied on special cases with component-wise proportional hazards structures (Kotalik et al., 2019; Follmann et al., 2020). Our model specification in terms of the time-curtailed win ratio (Oakes, 2016) allows us to directly model the covariate-specific win ratio, bypassing restrictive joint models. Nonetheless, the proportionality assumption itself is still strong and must be checked, as we did in Section 5, whenever one uses the PW models to analyze real data.
When the proportionality assumption fails, the regression coefficient estimates will depend on the censoring distribution, as is the case with the Cox model under nonproportional hazards (Struthers and Kalbfleisch, 1986). The inverse probability of censoring weighting (IPCW) approach is a useful tool for addressing such dependency on censoring and has recently been applied to the two-sample win ratio (Dong et al., 2020). A similar approach may be adopted in our regression framework to relax the proportionality assumption, possibly by inversely weighting the win indicator using covariate-specific survival functions for the censoring time. This deserves careful treatment in the future.
In addition to Thas et al. (2012), regression models with U-statistic estimating functions have been studied in a number of different settings, including the Box-Cox power transformation models (ie, a generalization of the accelerated failure time model) for univariate survival outcome (Cai et al., 2005) and the receiver-operating characteristic curve for disease classification (Cai and Dodd, 2008). Compared to these previous cases, our estimating functions are further complicated by the presence of a time-dependent weight process. Nonetheless, with only mild regularity conditions, we are able to use general weak convergence theory for U-processes (indexed by infinite-dimensional parameters) to control the asymptotic variations in the estimating function and thereby to prove that the resulting estimator is consistent and asymptotically normal. These theoretical results increase one’s flexibility in the choice of the weight.
On the other hand, the optimal weight with regard to statistical efficiency has been left as an open problem. This problem is nontrivial since a full-fledged efficiency theory for the simpler case of the PIM (Thas et al., 2012) has not yet been developed. Given the unique structure of such pairwise regression models, a systematic approach drawing on the first principles of semiparametric efficiency theory (Bickel et al., 1993) is likely needed to elucidate the problem.
Under the proportionality assumption, the pairwise comparison in the PW models cancels out the time effect on the win-loss status. To shed more light on the temporal trend, it might be appealing to reformulate the models explicitly in terms of a “baseline” win-loss function of time together with the time-invariant regression parameter (eg, similarly to the specification of the Cox model). Whether such equivalent expressions exist for the PW models would be an interesting question for future research.
ACKNOWLEDGMENTS
This research was supported by NIH grant R01HL149875. The HF-ACTION study data analyzed in Section 5 were provided by the Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) of the National Heart, Lung, and Blood Institute. We thank the Editor, Associate Editor, and two anonymous referees for helpful comments.
Funding information
National Heart, Lung, and Blood Institute, Grant/Award Number: R01HL149875
DATA AVAILABILITY STATEMENT
The data that support the findings in this paper are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
SUPPORTING INFORMATION
Web Appendices, Tables, and Figures referenced in Sections 2, 3, 4, and 5 are available with this paper at the Biometrics website on Wiley Online Library. An R-package WR that implements the proposed methodology is available on the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/web/packages/WR/) along with an illustrative example using a subset of the HF-ACTION study data.
REFERENCES
- Bebu I and Lachin JM (2015) Large sample inference for a win ratio analysis of a composite outcome based on prioritized components. Biostatistics, 17, 178–187.
- Bickel PJ, Klaassen CAJ, Ritov Y and Wellner JA (1993) Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press.
- Buyse M (2010) Generalized pairwise comparisons of prioritized outcomes in the two-sample problem. Statistics in Medicine, 29, 3245–3257.
- Cai T and Dodd LE (2008) Regression analysis for the partial area under the ROC curve. Statistica Sinica, 18, 817–836.
- Cai T, Tian L and Wei LJ (2005) Semiparametric Box-Cox power transformation models for censored survival observations. Biometrika, 92, 619–632.
- Dong G, Mao L, Huang B, Gamalo-Siebers M, Wang J, Yu G et al. (2020) The inverse-probability-of-censoring weighting (IPCW) adjusted win ratio statistic: an unbiased estimator in the presence of independent censoring. Journal of Biopharmaceutical Statistics, 30, 882–899.
- Dong G, Qiu J, Wang D and Vandemeulebroecke M (2018) The stratified win ratio. Journal of Biopharmaceutical Statistics, 28, 778–796.
- Finkelstein DM and Schoenfeld DA (1999) Combining mortality and longitudinal measures in clinical trials. Statistics in Medicine, 18, 1341–1354.
- Finkelstein DM and Schoenfeld DA (2019) Graphing the win ratio and its components over time. Statistics in Medicine, 38, 53–61.
- Fleming TR and Harrington DP (1991) Counting Processes and Survival Analysis. Hoboken: John Wiley & Sons.
- Follmann D, Fay MP, Hamasaki T and Evans S (2020) Analysis of ordered composite endpoints. Statistics in Medicine, 39, 602–616.
- Kotalik A, Eaton A, Lian Q, Serrano C, Connett J and Neaton JD (2019) A win ratio approach to the re-analysis of Multiple Risk Factor Intervention Trial. Clinical Trials, 16, 626–634.
- Lewis JA (1999) Statistical principles for clinical trials (ICH E9): an introductory note on an international guideline. Statistics in Medicine, 18, 1903–1942.
- Lin DY, Wei LJ and Ying Z (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika, 80, 557–572.
- Luo X, Qiu J, Bai S and Tian H (2017) Weighted win loss approach for analyzing prioritized outcomes. Statistics in Medicine, 36, 2452–2465.
- Luo X, Tian H, Mohanty S and Tsai WY (2015) An alternative approach to confidence interval estimation for the win ratio statistic. Biometrics, 71, 139–145.
- Mao L (2019) On the alternative hypotheses for the win ratio. Biometrics, 75, 347–351.
- Montgomery A, Abuan T and Kollef M (2014) The win ratio method: a novel hierarchical endpoint for pneumonia trials in patients on mechanical ventilation. Critical Care, 18, P260.
- Oakes D (1989) Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487–493.
- Oakes D (2016) On the win-ratio statistic in clinical trials with multiple types of event. Biometrika, 103, 742–745.
- Pocock SJ, Ariti CA, Collier TJ and Wang D (2012) The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal, 33, 176–182.
- Struthers CA and Kalbfleisch JD (1986) Misspecified proportional hazard models. Biometrika, 73, 363–369.
- Thas O, Neve JD, Clement L and Ottoy JP (2012) Probabilistic index models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74, 623–671.
- Wang H, Peng J, Zheng JZ, Wang B, Lu X, Chen C et al. (2017) Win ratio-an intuitive and easy-to-interpret composite outcome in medical studies. Shanghai Archives of Psychiatry, 29, 55–60.