An optimal Wilcoxon–Mann–Whitney test of mortality and a continuous outcome

Roland A Matsouaka; Aneesh B Singhal; Rebecca A Betensky

doi:10.1177/0962280216680524

Stat Methods Med Res. Author manuscript; available in PMC: 2018 Aug 1.

Published in final edited form as: Stat Methods Med Res. 2016 Dec 29;27(8):2384–2400. doi: 10.1177/0962280216680524



PMCID: PMC5393279 NIHMSID: NIHMS847056 PMID: 27920364

Abstract

We consider a two-group randomized clinical trial, where mortality affects the assessment of a follow-up continuous outcome. Using the worst-rank composite endpoint, we develop a weighted Wilcoxon–Mann–Whitney test statistic to analyze the data. We determine the optimal weights for the Wilcoxon–Mann–Whitney test statistic that maximize its power. We derive a formula for its power and demonstrate its accuracy in simulations. Finally, we apply the method to data from an acute ischemic stroke clinical trial of normobaric oxygen therapy.

Keywords: Missing data, survivor bias, multiple endpoints, weighted Wilcoxon–Mann–Whitney test, censored-by-death, composite endpoints

1 Introduction

In many randomized clinical trials, the difference between treatment groups is evaluated using measurements of an outcome of interest after a pre-specified follow-up time. However, for some participants, follow-up measurements may be missing if a disease-related event, such as death (or withdrawal due to worsening disease condition), has occurred prior to the end of follow-up time. Our motivating example is a clinical trial of acute ischemic stroke conducted at Massachusetts General Hospital in Boston, MA. In this trial, patients who had acute ischemic stroke were randomized to either normobaric oxygen (NBO) therapy or room air and assessed serially to monitor their functional ability. Among other measures, patients’ neurological recovery was assessed and quantified using the NIH Stroke Scale (NIHSS) score, a function rating scale used to quantify neurological deficit due to stroke.^1,2 However, investigators were confronted with early deaths, which precluded measurements of NIHSS scores for some participants at the end of the three-month follow-up period. Any analysis of the data that includes solely the subjects who survived would be biased and give spurious results.³

One approach to handle this issue is to combine the primary endpoint and mortality into a single composite endpoint: the worst-rank composite endpoint. It is calculated by considering death as the worst outcome on the same scale as the measured outcome and analyzed using ranks of these combined outcomes.^4–6 Unlike traditional analyses of composite endpoints that treat all of the component endpoints equally and focus on each study participant’s first occurring event, worst-rank composite endpoints incorporate a hierarchical ranking of these individual outcomes based on their clinical importance, frequency of occurrence or severity. Moreover, in contrast to the typical “time-to-event” analyses, worst-rank composite endpoints allow us to combine individual outcomes from multiple clinical domains, while accounting for their heterogeneity. Such outcomes could include clinical events (e.g., death), continuous variables, or other clinical measurements (e.g., biomarker or quality-of-life measures).⁷

Ranking individual outcomes that characterize various aspects of patients’ disease experience according to a prespecified hierarchy suggests the existence of an implicit weighting scheme. In fact, several authors have suggested the use of a priori determined utility (or sometimes severity) weights to reflect the relative importance of the components of composite outcomes and add another layer of discrimination beyond hierarchical ordering alone.^8,9 Such weighting may be based on subjective criteria or elicitation of experts. However, deriving such a priori weights and finding a consensus about them have proven to be difficult.^10–14

Building upon our previous work on this topic,⁶ and assuming there is a pre-specified hierarchy of various components of a composite outcome, we introduce an optimal approach that not only acknowledges such a hierarchy, but also estimates the weights so as to maximize the power to detect globally any treatment effect when present.

The use of multivariate tests to compare treatment effects on multivariate outcomes has gained interest in clinical trials of multifaceted complex diseases, whose clinical course manifests itself through a host of clinical outcomes. A global test statistic for composite endpoints that accounts for the complexity of the disease, rather than evaluating individual components separately, provides a comprehensive and more efficient way to evaluate the efficacy of a treatment.^15,16 Tests such as O’Brien’s test,¹⁷ Wei and Johnson’s test,¹⁸ Finkelstein and Schoenfeld’s test,¹⁹ and Moyé et al.’s test^20,21 are rank-based tests developed using U-statistics. Some of these tests of combined endpoints are weighted tests where the optimal weights are determined by maximizing the power of the test statistic under a particular alternative hypothesis: this is the framework we will focus on in this paper.

In this paper, we use the given hierarchy of outcomes to construct a worst-rank composite endpoint such that death (or a missing continuous outcome due to worsening of the disease condition) is considered a worse outcome than any observed primary endpoint measurement. Furthermore, two subjects who died are ranked with respect to their survival times.^4–6 In Section 2, we give the rationale for the weighted Wilcoxon–Mann–Whitney (WMW) test statistic for such a worst-rank composite endpoint. We then derive data-based optimal weights that maximize the power of the weighted WMW test statistic along with its analytical power formula. We demonstrate that the optimal-weighted WMW test statistic has greater power than the ordinary WMW test statistic. We illustrate the accuracy of our results through simulation studies (Section 3). Finally, we apply the procedures to the clinical trial of the NBO therapy for acute ischemic stroke patients.

2 Weighted WMW

2.1 Notations

In this section, we present the ordinary WMW test for the worst-rank composite outcome and its analytical power formula that we previously derived.⁶ Then, we motivate its extension to a weighted WMW test through a decomposition of the WMW U-statistic.

Consider a randomized clinical trial in which m and n subjects are assigned, respectively, to the control treatment (group 1) and the active treatment (group 2) and then followed for a time period T. For subject j in group i, X_ij denotes the value of the continuous endpoint at the end of the follow-up time, t_ij denotes the time to death or disease-related withdrawal (for simplicity, we will refer to both as death), δ_ij = I(t_ij ≤ T) indicates early death (i.e., by time T), and p_i = E(δ_ij) = P(t_ij ≤ T) denotes the probability of early death for subjects in group i.

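If the subject died before T, X is unknown. Thus, following the assumed hierarchy of outcomes, this subject is assigned a worst-rank score equal to η + t_ij, which is a function of his or her survival time, where η = min(X) − 1 − T and min(X) is the smallest observed value of the continuous outcome. Since t_ij ≤ T for such a subject, the score η + t_ij is at most min(X) − 1, i.e., below every observed value of X.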

Without loss of generality, we assume larger values of X correspond to better health outcome. For each subject, the worst-rank composite endpoint is thus

{\tilde{X}}_{i j} = (1 - δ_{i j}) X_{i j} + δ_{i j} (η + t_{i j}), i = 1, 2 and j = 1, \dots, N

(1)
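As an illustration, the worst-rank scores of equation (1) can be computed as in the following minimal Python/NumPy sketch. It is not the authors' implementation; the function name `worst_rank_scores` is hypothetical, and it assumes that min(X) is taken over the observed outcomes of all survivors in both groups, so the function is meant to be applied to the pooled sample.

```python
import numpy as np

def worst_rank_scores(x, t, T):
    """Worst-rank composite endpoint of equation (1); larger is better.

    x : continuous outcome at time T (ignored, may be NaN, for subjects who died)
    t : time to death; values greater than T mean the subject was alive at T
    T : length of follow-up
    Apply to the pooled sample so that min(X) is taken over both groups.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    died = t <= T                          # delta_ij = I(t_ij <= T)
    eta = np.min(x[~died]) - 1.0 - T       # eta = min(observed X) - 1 - T
    # Deaths get eta + t <= min(X) - 1 (ordered by survival time); survivors keep X.
    return np.where(died, eta + t, x)
```

For example, with T = 3 and observed outcomes 2.0, −1.0 and 0.5, we get η = −5, so deaths at t = 1 and t = 2.5 receive the scores −4 and −2.5, both below the smallest observed outcome −1.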

Let F_i and G_i be, respectively, the cumulative conditional distributions of the informative event times and observed non-fatal outcome for patients in group i, i.e. F_i(v) = P(t_ij ≤ v|0 < t_ij ≤ T) and G_i(x) = P(X_ij ≤ x|t_ij > T). The distribution of ${\tilde{X}}_{i}$ is given by

{\tilde{G}}_{i} (x) = p_{i} F_{i} (x - η) I (x < ζ) + [p_{i} + (1 - p_{i}) G_{i} (x)] I (x \geq ζ), ζ = \min (X) - 1

(2)
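A quick Monte Carlo check of the distribution in equation (2) can be sketched in Python. For illustration only, it assumes X ~ N(0, 1) among survivors, exponential death times with P(t ≤ T) = 0.3 (so F is a truncated exponential CDF), and a fixed η placed well below the support of X instead of the data-dependent η = min(X) − 1 − T. Because every death lies below ζ = η + T, the CDF for x ≥ ζ already contains the full death mass p_i, i.e., it equals p_i + (1 − p_i)G_i(x).

```python
import math
import numpy as np

def Phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def g_tilde(x, p, lam, T, eta):
    """CDF of the worst-rank score, assuming X ~ N(0, 1) for survivors and
    exponential(lam) death times (illustrative assumptions)."""
    zeta = eta + T
    if x < zeta:                                   # region occupied by deaths
        v = max(x - eta, 0.0)
        F = (1 - math.exp(-lam * v)) / (1 - math.exp(-lam * T))  # P(t <= v | t <= T)
        return p * F
    return p + (1 - p) * Phi(x)                    # all deaths lie below zeta

rng = np.random.default_rng(0)
T, lam, N = 3.0, -math.log(0.7) / 3.0, 200_000    # P(t <= T) = 0.3
eta = -10.0 - T                                    # fixed eta, far below the support of X
t = rng.exponential(1 / lam, N)
x = rng.normal(0.0, 1.0, N)
scores = np.where(t <= T, eta + t, x)
p = 1 - math.exp(-lam * T)
```

Comparing `np.mean(scores <= point)` with `g_tilde(point, p, lam, T, eta)` at points on both sides of ζ shows agreement up to Monte Carlo error.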

We would like to test the null hypothesis that the two treatments are equivalent with respect to both survival and the non-fatal outcome

H_{0} : G_{1} (x) = G_{2} (x) and F_{1} (t) = F_{2} (t) for all x and t

(3)

against the uni-directional alternative hypothesis that the active treatment is at least as effective as the control treatment for both mortality and the non-fatal outcome and is not harmful for either, i.e.

H_{1} : G_{1} (x) \geq G_{2} (x) and F_{1} (t) \geq F_{2} (t), for some x and / or t

(4)

with G₁(x) = G₂(x) and F₁(t) = F₂(t) not both holding for all x and t (i.e., at least one of the inequalities is strict for some x or t).

2.2 Ordinary WMW test

We will now define the ordinary WMW test using the framework of the worst-rank composite endpoint $\tilde{X}$ of the previous section. The ordinary WMW U-statistic is defined by

U = {(m n)}^{- 1} \sum_{k = 1}^{m} \sum_{l = 1}^{n} I ({\tilde{X}}_{1 k} < {\tilde{X}}_{2 l})

(5)

Using equation (1), we note that $I ({\tilde{X}}_{1 k} < {\tilde{X}}_{2 l})$ is equal to

δ_{1 k} δ_{2 l} I (t_{1 k} < t_{2 l}) + δ_{1 k} (1 - δ_{2 l}) + (1 - δ_{1 k}) (1 - δ_{2 l}) I (X_{1 k} < X_{2 l})

(6)

Therefore

\begin{array}{l} μ_{1} = E (U) = π_{U 1} \\ σ_{1}^{2} = Var (U) = {(mn)}^{- 1} [π_{U 1} (1 - π_{U 1}) + (m - 1) (π_{U 2} - π_{U 1}^{2}) + (n - 1) (π_{U 3} - π_{U 1}^{2})] \end{array}

(7)

where

\begin{array}{l} q_{i} = 1 - p_{i}, π_{t 1} = P (t_{1 k} < t_{2 l} | t_{1 k} \leq T, t_{2 l} \leq T) \\ π_{t 2} = P (t_{1 k} < t_{2 l}, t_{1 k'} < t_{2 l} | t_{1 k} \leq T, t_{1 k'} \leq T, t_{2 l} \leq T) \\ π_{t 3} = P (t_{1 k} < t_{2 l}, t_{1 k} < t_{2 l'} | t_{1 k} \leq T, t_{2 l} \leq T, t_{2 l'} \leq T) \\ π_{x 1} = P (X_{1 k} < X_{2 l}), π_{x 2} = P (X_{1 k} < X_{2 l}, X_{1 k'} < X_{2 l}) \\ π_{x 3} = P (X_{1 k} < X_{2 l}, X_{1 k} < X_{2 l'}) \\ π_{U 1} = p_{1} p_{2} π_{t 1} + p_{1} q_{2} + q_{1} q_{2} π_{x 1} \\ π_{U 2} = p_{1}^{2} q_{2} + p_{1}^{2} p_{2} π_{t 2} + 2 p_{1} q_{1} q_{2} π_{x 1} + q_{1}^{2} q_{2} π_{x 2} \\ π_{U 3} = p_{1} q_{2}^{2} + p_{1} p_{2}^{2} π_{t 3} + 2 p_{1} p_{2} q_{2} π_{t 1} + q_{1} q_{2}^{2} π_{x 3} \end{array}

(see the proof in Appendix 1).

Under the null hypothesis (H₀) of no difference between the two treatment groups, μ₀ = E₀(U) = 1/2 and $σ_{0}^{2} = \mathrm{Var}_{0} (U) = (n + m + 1) / (12 m n)$. The distribution of the ordinary WMW test statistic

Z = \frac{U - E_{0} (U)}{\sqrt{\mathrm{Var}_{0} (U)}}

(8)

converges to the standard normal distribution N(0, 1) as m and n tend to infinity with m/(m + n) → ρ, 0 < ρ < 1.
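The statistics in equations (5) and (8) can be computed directly from the worst-rank scores. The sketch below uses hypothetical helper names (`wmw_u`, `wmw_z`), compares all m × n pairs, and uses the null moments stated above, assuming there are no ties.

```python
import math
import numpy as np

def wmw_u(s1, s2):
    """Ordinary WMW U-statistic of equation (5): proportion of pairs with s1 < s2."""
    s1 = np.asarray(s1, dtype=float)[:, None]   # shape (m, 1)
    s2 = np.asarray(s2, dtype=float)[None, :]   # shape (1, n)
    return float(np.mean(s1 < s2))

def wmw_z(s1, s2):
    """Standardized statistic Z of equation (8), with E0(U) = 1/2 and
    Var0(U) = (m + n + 1) / (12 m n) (no ties assumed)."""
    m, n = len(s1), len(s2)
    u = wmw_u(s1, s2)
    return (u - 0.5) / math.sqrt((m + n + 1) / (12.0 * m * n))
```

A two-sided test at α = 0.05 rejects H₀ when |Z| > 1.96. For large samples, a rank-sum computation in O((m + n) log(m + n)) time would avoid forming the m × n comparison matrix.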

The power of this WMW test is given by

Φ (\frac{σ_{0}}{σ_{1}} Z_{\frac{α}{2}} + \frac{μ_{1} - μ_{0}}{σ_{1}}) + Φ (\frac{σ_{0}}{σ_{1}} Z_{\frac{α}{2}} - \frac{μ_{1} - μ_{0}}{σ_{1}}) \approx Φ (\frac{σ_{0}}{σ_{1}} Z_{\frac{α}{2}} + \frac{| μ_{1} - μ_{0} |}{σ_{1}})

(9)

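where μ₁ = E(U) and $σ_{1}^{2} = \mathrm{Var} (U)$ are evaluated under the alternative hypothesis (H₁), and z_{α/2} = Φ⁻¹(α/2) is the lower α/2 quantile of the standard normal distribution (see the proof in Matsouaka and Betensky).⁶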
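For planning purposes, equations (7) and (9) can be evaluated directly once p₁, p₂ and the probabilities π_tγ, π_xγ (γ = 1, 2, 3) are specified. The sketch below uses hypothetical function names and the standard-library `statistics.NormalDist` for Φ; it returns μ₁ and σ₁² and then the exact two-tailed power (left-hand side of (9)). Under H₀ (p₁ = p₂, π_t1 = π_x1 = 1/2, and the remaining probabilities equal to 1/3) it reproduces E₀(U) = 1/2, Var₀(U) = (m + n + 1)/(12mn), and a power equal to α.

```python
import math
from statistics import NormalDist

def wmw_moments(p1, p2, m, n, pt1, pt2, pt3, px1, px2, px3):
    """Mean and variance of U under an alternative, equation (7)."""
    q1, q2 = 1 - p1, 1 - p2
    pU1 = p1 * p2 * pt1 + p1 * q2 + q1 * q2 * px1
    pU2 = p1**2 * q2 + p1**2 * p2 * pt2 + 2 * p1 * q1 * q2 * px1 + q1**2 * q2 * px2
    pU3 = p1 * q2**2 + p1 * p2**2 * pt3 + 2 * p1 * p2 * q2 * pt1 + q1 * q2**2 * px3
    var = (pU1 * (1 - pU1) + (m - 1) * (pU2 - pU1**2) + (n - 1) * (pU3 - pU1**2)) / (m * n)
    return pU1, var

def wmw_power(mu1, var1, m, n, alpha=0.05):
    """Two-sided power of equation (9), summing both rejection tails."""
    nd = NormalDist()
    z = nd.inv_cdf(alpha / 2)                     # z_{alpha/2} < 0
    s0 = math.sqrt((m + n + 1) / (12 * m * n))    # null SD of U
    s1 = math.sqrt(var1)
    d = (mu1 - 0.5) / s1
    return nd.cdf(s0 / s1 * z + d) + nd.cdf(s0 / s1 * z - d)
```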

2.3 Weighted WMW test

To motivate our weighted test, we now write the WMW U-statistic applied to the worst-rank scores (5) as a sum of three dependent WMW U-statistics. Then, we demonstrate that to optimally compare two treatment groups using worst-rank scores, we need to use a weighted statistic that takes into account the dependence that exists among the three statistics.

Assume there exist weights w = (w₁, w₂), w₁ + w₂ = 1, such that equation (1) becomes

{\tilde{X}}_{i j} = w_{1} δ_{i j} (η + t_{i j}) + w_{2} (1 - δ_{i j}) X_{i j}, i = 1, 2 and j = 1, \dots, N

(10)

The U-statistic (5) then becomes $U_{w} = w_{1}^{2} U_{t} + w_{1} w_{2} U_{t x} + w_{2}^{2} U_{x},$ where U_t, U_tx and U_x are defined by

\begin{array}{l} U_{t} = {(mn)}^{- 1} \sum_{k = 1}^{m} \sum_{l = 1}^{n} δ_{1 k} δ_{2 l} I (t_{1 k} < t_{2 l}) \\ U_{tx} = {(mn)}^{- 1} \sum_{k = 1}^{m} \sum_{l = 1}^{n} δ_{1 k} (1 - δ_{2 l}) \\ U_{x} = {(mn)}^{- 1} \sum_{k = 1}^{m} \sum_{l = 1}^{n} (1 - δ_{1 k}) (1 - δ_{2 l}) I (X_{1 k} < X_{2 l}) \end{array}

(11)

Using vector notation, we can write U_w as U_w = c′U where we define U′ = (U_t, U_tx, U_x) and $c' = (c_{1}, c_{2}, c_{3}) = (w_{1}^{2}, w_{1} w_{2}, w_{2}^{2})$ . Notice that c₁ + 2c₂ + c₃ = (w₁ + w₂)² = 1.
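The decomposition in equation (11) can be checked numerically: the three components sum to the U of equation (5), and U_w = c′U reduces to w₁²U when w₁ = w₂. In this minimal NumPy sketch (hypothetical names), the continuous outcome of subjects who died is ignored, so any finite placeholder value may be passed for them.

```python
import numpy as np

def u_components(t1, x1, t2, x2, T):
    """Vector U' = (U_t, U_tx, U_x) of equation (11).

    t1, x1: death times and continuous outcomes in group 1 (control)
    t2, x2: death times and continuous outcomes in group 2 (active treatment)
    """
    t1, x1, t2, x2 = (np.asarray(a, dtype=float) for a in (t1, x1, t2, x2))
    d1 = (t1 <= T)[:, None]     # delta_1k as an (m, 1) column
    d2 = (t2 <= T)[None, :]     # delta_2l as a (1, n) row
    u_t = np.mean(d1 & d2 & (t1[:, None] < t2[None, :]))   # both died
    u_tx = np.mean(d1 & ~d2)                                # control died, treated alive
    u_x = np.mean(~d1 & ~d2 & (x1[:, None] < x2[None, :]))  # both alive
    return np.array([u_t, u_tx, u_x])

def u_weighted(u_vec, w1):
    """U_w = c'U with c = (w1^2, w1 w2, w2^2) and w2 = 1 - w1."""
    w2 = 1.0 - w1
    c = np.array([w1**2, w1 * w2, w2**2])
    return float(c @ u_vec)
```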

Using the results in Appendix 2, we have

\begin{array}{l} μ_{1 w} = E (U_{w}) = c' (p_{1} p_{2} π_{t 1}, p_{1} q_{2}, q_{1} q_{2} π_{x 1})' \\ σ_{1 w}^{2} = \mathrm{Var} (U_{w}) = c' \Sigma c \end{array}

where Σ = Var(U) is a 3 × 3 matrix given in Appendix 2.

Under the null hypothesis

\begin{array}{l} μ_{0 w} = E_{0} (U_{w}) = \frac{1}{2} c' (p^{2}, 2 p q, q^{2})' \\ = \frac{1}{2} [w_{1}^{2} p^{2} + 2 w_{1} w_{2} p q + w_{2}^{2} q^{2}] = \frac{1}{2} {[w_{1} p + w_{2} q]}^{2} \\ σ_{0 w}^{2} = \mathrm{Var}_{0} (U_{w}) = c' \Sigma_{0} c \end{array}

with Σ₀ = Var₀(U) a 3 × 3 matrix given in Appendix 2.

2.3.1 Pre-specified weights

When pre-specified weights are available, usually chosen to reflect the relative importance or severity of the component outcomes, they can be used to calculate the weighted WMW test statistic

Z_{w} = \frac{U_{w} - E_{0} (U_{w})}{\sqrt{\mathrm{Var}_{0} (U_{w})}}

(12)

Under H₀, Z_w converges to the standard normal distribution N(0, 1) as m and n tend to infinity with m/(m + n) → ρ, 0 < ρ < 1.

The corresponding power is given by

Φ (\frac{σ_{0 w}}{σ_{1 w}} z_{\frac{α}{2}} + \frac{μ_{1 w} - μ_{0 w}}{σ_{1 w}}) + Φ (\frac{σ_{0 w}}{σ_{1 w}} z_{\frac{α}{2}} - \frac{μ_{1 w} - μ_{0 w}}{σ_{1 w}}) \approx Φ (\frac{σ_{0 w}}{σ_{1 w}} z_{\frac{α}{2}} + \frac{| μ_{1 w} - μ_{0 w} |}{σ_{1 w}})

(13)

For instance, after surveying a panel of clinical investigators, Bakal et al.⁹ used pre-specified weights in a study with a composite endpoint of death, cardiogenic shock (Shock), congestive heart failure (CHF), and recurrent myocardial infarction (RE-MI). The weights were 1 for death, 0.5 for Shock, 0.3 for hospitalization for CHF, and 0.2 for RE-MI, i.e., in this context and after normalizing the weights to sum to 1, $w = \frac{1}{2} (1, 0.5, 0.3, 0.2)$. In another example,²² the composite outcome consisted of events weighted according to their severity: RE-MI (weight w₁ = 0.415), CHF that required the use of open-label angiotensin-converting enzyme (ACE) inhibitors (weight w₂ = 0.17), and hospitalization to treat CHF (weight w₃ = 0.415).

Although the use of pre-specified weights provides a more nuanced approach to the importance of individual endpoints of a composite outcome, recognizes the potential underlying differences that exist among them, and facilitates the interpretation of results compared with traditional composite endpoints, the selection of appropriate weights is not straightforward, since it is inherently subjective.^22–24 However, when such utility (or severity) weights exist, failing to use them to highlight the clinical importance of the component outcomes of a composite endpoint amounts to assuming equal weights, which is sometimes even worse.^23–25

We note that when the weights w₁ and w₂ are equal, i.e., $c_{1} = c_{2} = c_{3} = w_{1}^{2}$, the test statistic Z_w coincides with the (ordinary) WMW test statistic Z given in equation (8). Indeed, in that case, $c' U = w_{1}^{2} [U_{t} + U_{t x} + U_{x}] = w_{1}^{2} U$ with U given by equation (5). Thus, $c' E_{0} (U) = w_{1}^{2} E_{0} (U)$ and $\mathrm{Var}_{0} (c' U) = w_{1}^{4} \mathrm{Var}_{0} (U)$, which implies that Z_w = Z.

2.3.2 Optimal weights

Now we want to estimate the optimal weights w for the weighted WMW test statistic

Z_{c} = \frac{c' (U - E_{0} (U))}{\sqrt{\mathrm{Var}_{0} (c' U)}} = \frac{c' (U - E_{0} (U))}{\sqrt{c' \mathrm{Var}_{0} (U) c}}

(14)

with U′ = (U_t, U_tx, U_x) and $c' = (c_{1}, c_{2}, c_{3}) = (w_{1}^{2}, w_{1} w_{2}, w_{2}^{2})$. Optimal weights c₁, c₂, and c₃ for the test statistic Z_c are those that maximize its power.

We will use the power formula of Z_c to derive its optimal weights. Then, we introduce the optimal-weighted WMW test statistic Z_opt and highlight some of its properties and characteristics.

From the definition of U, we show in Appendix 2 that

\begin{array}{l} E (U) = (E (U_{t}), E (U_{t x}), E (U_{x}))' \\ = (π_{t 1} p_{1} p_{2}, p_{1} q_{2}, π_{x 1} q_{1} q_{2})' \end{array}

(15)

and Var(U) = Σ, where $\Sigma = (mn)^{-1} (\Sigma_{ij})_{1 \leq i, j \leq 3}$ is a 3 × 3 matrix.

Under the null hypothesis of no difference between the two groups, with respect to both survival and nonfatal outcome, we have p₁ = p₂ = p, q₁ = q₂ = q = 1 − p, π_t₁ = π_x₁ = 1/2, and π_t₂ = π_x₂ = π_t₃ = π_x₃ = 1/3. Thus

E_{0} (U) = \frac{1}{2} (p^{2}, 2 p q, q^{2})' \quad \text{and} \quad \mathrm{Var}_{0} (U) = \Sigma_{0}

(16)

where $\Sigma_{0} = (mn)^{-1} (\Sigma_{0ij})_{1 \leq i, j \leq 3}$ is a symmetric matrix whose rows and columns follow the order U′ = (U_t, U_tx, U_x), with

\begin{array}{l} \Sigma_{011} = \frac{p^{2}}{12} A (p), \Sigma_{012} = \Sigma_{021} = \frac{p^{2} q}{2} ((n - 1) q - m p), \Sigma_{013} = \Sigma_{031} = - \frac{p^{2} q^{2}}{4} (n + m - 1) \\ \Sigma_{022} = p q (n q^{2} + m p^{2} + p q), \Sigma_{023} = \Sigma_{032} = \frac{p q^{2}}{2} ((m - 1) p - n q), \Sigma_{033} = \frac{q^{2}}{12} A (q) \\ A (x) = 6 + 4 (n + m - 2) x - 3 (n + m - 1) x^{2} \end{array}

Moreover, since Var₀(U_w) = Var₀(c′U) = c′Σ₀c ≥ 0 for every c, the matrix Σ₀ is positive semidefinite.
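The null moments can be coded as follows. This sketch (hypothetical name `null_moments`) orders the rows and columns as U′ = (U_t, U_tx, U_x), as in equation (14); each entry is the corresponding variance or covariance of the kernels in equation (11) under H₀ with p₁ = p₂ = p. Because U = U_t + U_tx + U_x, the elements of Σ₀ must add up to Var₀(U) = (m + n + 1)/(12mn), which gives a simple consistency check.

```python
import numpy as np

def null_moments(p, m, n):
    """E0(U) and Var0(U) for U' = (U_t, U_tx, U_x) under H0 with p1 = p2 = p."""
    q = 1.0 - p
    A = lambda z: 6 + 4 * (n + m - 2) * z - 3 * (n + m - 1) * z**2
    s_tt = p**2 / 12 * A(p)                          # Var(U_t), times mn
    s_cc = p * q * (n * q**2 + m * p**2 + p * q)     # Var(U_tx), times mn
    s_xx = q**2 / 12 * A(q)                          # Var(U_x), times mn
    s_tc = p**2 * q / 2 * ((n - 1) * q - m * p)      # Cov(U_t, U_tx), times mn
    s_tx = -p**2 * q**2 / 4 * (n + m - 1)            # Cov(U_t, U_x), times mn
    s_cx = p * q**2 / 2 * ((m - 1) * p - n * q)      # Cov(U_tx, U_x), times mn
    Sigma0 = np.array([[s_tt, s_tc, s_tx],
                       [s_tc, s_cc, s_cx],
                       [s_tx, s_cx, s_xx]]) / (m * n)
    EU0 = 0.5 * np.array([p**2, 2 * p * q, q**2])
    return EU0, Sigma0
```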

The power formula for the weighted WMW, similar to equation (9), is

Φ (\frac{σ_{0 w}}{σ_{1 w}} z_{\frac{α}{2}} + \frac{μ_{1 w} - μ_{0 w}}{σ_{1 w}}) + Φ (\frac{σ_{0 w}}{σ_{1 w}} z_{\frac{α}{2}} - \frac{μ_{1 w} - μ_{0 w}}{σ_{1 w}}) \approx Φ [\frac{σ_{0 w}}{σ_{1 w}} (z_{\frac{α}{2}} + \frac{| μ_{1 w} - μ_{0 w} |}{σ_{0 w}})]

(17)

where μ_1w = c′E(U), μ_0w = c′E₀(U), σ²_1w = c′Σc, and σ²_0w = c′Σ₀c.
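Given E(U) and Σ under H₁ and E₀(U) and Σ₀ under H₀, the power (13)/(17) of a weighted test takes a few lines of code. The sketch below (hypothetical name `weighted_power`) evaluates the exact two-tailed expression on the left-hand side of (17); the moment inputs are assumed to be supplied by the user, for instance from the formulas above or from simulation.

```python
import math
import numpy as np
from statistics import NormalDist

def weighted_power(c, EU, Sigma, EU0, Sigma0, alpha=0.05):
    """Two-sided power of the weighted WMW test, equations (13)/(17).

    c            : weights (c1, c2, c3) applied to U' = (U_t, U_tx, U_x)
    EU, Sigma    : E(U) and Var(U) under the alternative
    EU0, Sigma0  : E0(U) and Var0(U) under the null
    """
    c = np.asarray(c, dtype=float)
    mu1, mu0 = c @ EU, c @ EU0
    s1 = math.sqrt(c @ Sigma @ c)      # sigma_1w
    s0 = math.sqrt(c @ Sigma0 @ c)     # sigma_0w
    nd = NormalDist()
    z = nd.inv_cdf(alpha / 2)          # z_{alpha/2} < 0
    d = (mu1 - mu0) / s1
    return nd.cdf(s0 / s1 * z + d) + nd.cdf(s0 / s1 * z - d)
```

Because μ_1w − μ_0w, σ_0w and σ_1w all scale with c, the power is unchanged when c is multiplied by any nonzero constant; only the direction of c matters.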

Under the assumptions that

n/(m + n) converges to a constant ρ (0 < ρ < 1),
both $\sqrt{N} \{F_{1} (t) - F_{2} (t)\}$ and $\sqrt{N} \{G_{1} (x) - G_{2} (x)\}$ are bounded, so that $\frac{σ_{0 w}}{σ_{1 w}}$ converges to 1 as N = m + n → ∞,

a weight vector c maximizes the power (17) if and only if it maximizes |μ_1w − μ_0w|/σ_0w.

We prove in Appendix 3 that the optimal-weight vector c_opt is given by

c_{opt} = \frac{\Sigma_{0}^{-1} μ}{b' \Sigma_{0}^{-1} μ}

(18)

for b′ = (1, 2, 1) and $μ = E (U) - E_{0} (U) = (π_{t 1} p_{1} p_{2} - \frac{1}{2} p^{2}, p_{1} q_{2} - p q, π_{x 1} q_{1} q_{2} - \frac{1}{2} q^{2})'$. Therefore, from equation (14), the corresponding optimal test statistic Z_c (denoted here Z_opt) is given by

Z_{opt} = \frac{c_{opt}' (U - E_{0} (U))}{\sqrt{c_{opt}' \Sigma_{0} c_{opt}}} = \frac{μ' \Sigma_{0}^{-1} (U - E_{0} (U))}{\sqrt{μ' \Sigma_{0}^{-1} μ}}

(19)
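Equations (18) and (19) translate directly into linear algebra. In the sketch below (hypothetical names), Σ₀⁻¹μ is obtained with `numpy.linalg.solve` rather than by forming an explicit inverse, and the statistic is equation (14) evaluated at the supplied weights. The two expressions in (19) coincide when b′Σ₀⁻¹μ > 0; otherwise they differ only in sign, which does not affect a two-sided test.

```python
import numpy as np

def optimal_weights(Sigma0, mu):
    """c_opt of equation (18), normalized so that b'c = c1 + 2 c2 + c3 = 1."""
    b = np.array([1.0, 2.0, 1.0])
    a = np.linalg.solve(Sigma0, mu)      # Sigma0^{-1} mu without forming the inverse
    return a / (b @ a)

def z_weighted(c, U, EU0, Sigma0):
    """Weighted statistic of equation (14): c'(U - E0(U)) / sqrt(c' Sigma0 c)."""
    c = np.asarray(c, dtype=float)
    return float(c @ (U - EU0) / np.sqrt(c @ Sigma0 @ c))
```

By the Cauchy–Schwarz inequality, no other weight vector c gives a larger |c′μ|/√(c′Σ₀c) than c_opt, which is why c_opt maximizes the asymptotic power.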

2.3.3 Remarks

The test statistic Z_opt given by equation (19) encompasses the contributions of the effects of treatment on both mortality (via U_t) and the non-fatal outcome (via U_x) as well as the corresponding proportions of deaths and survivors in both treatment groups (via U_tx) and their relative importance and magnitude, where each component is weighted accordingly through c_opt.
As demonstrated, the ordinary WMW test statistic is a special case of the weighted WMW test statistic (the one with equal weights). Thus, both the ordinary and the optimal-weighted WMW test statistics belong to the same family of weighted WMW tests.
Note that the optimal weight vector $c_{opt} \propto \Sigma_{0}^{-1} μ$ depends on the unknown population parameters π_t₁, π_x₁, p₁, p₂, and p, which must be estimated in practice. A good estimation method for these parameters is therefore needed to calculate the test statistic Z_opt given by equation (19):
1. When the distributions of the primary endpoint, X, and the survival time, t, are known approximately, we can estimate analytically the probabilities π_t₁ and π_x₁, p₁, p₂ (as we have done in Appendix 4 for our simulation studies) and calculate an estimate of the probability p under the null hypothesis (H₀) as $\hat{p} = (m {\hat{p}}_{1} + n {\hat{p}}_{2}) / (m + n)$ (pooled sample proportion).
  
  In general, the distributions of both the primary endpoint and the survival time are not known. Optimal weights are estimated using either data from a pilot study (or from previous studies, when available) or the data at hand.
2. If we have data from prior studies, we can leverage them to estimate these parameters. Using Bayesian methods, we can elicit expert opinions to define prior distributions associated with Σ₀ and μ that best reflect the characteristics of the disease under study and determine posterior distributions to provide a more accurate assessment of the optimal weights.²⁶ Alternatively, if the data are structured such that we have multiple strata available (e.g., different enrollment periods or different clinical centers for patients), we can use an adaptive weighting scheme to estimate Σ₀ and μ.^27,28
3. In the absence of data from prior studies, we recommend a bootstrap approach to estimate the weights. To do this, we generate B bootstrap samples (e.g., B = 500, 1000, or 2000) and, for each bootstrap sample, estimate the corresponding optimal weight vector c_opt. We then average the B estimated weight vectors and, using these average weights, compute the test statistic Z_opt on the original sample and test the null hypothesis.
4. With the data at hand, we can also use K-fold cross-validation. We divide the data into K subsets of roughly equal size and estimate the weights c_opt,k and the test statistic Z_opt,k exactly K times. At the k-th time, k = 1,…, K, we use the k-th subset as validation data to calculate the weights c_opt,k and combine the remaining K − 1 subsets as training data to estimate the test statistic Z_opt,k using the weights obtained at the validation stage. We then estimate the test statistic Z_opt by averaging the K test statistics $Z_{opt,k}, k = 1, \dots, K$, and run the hypothesis test.
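The bootstrap procedure of item 3 above can be sketched as follows. This is our illustration under stated assumptions, not the authors' code, and all helper names are hypothetical: within each bootstrap sample (drawn with replacement separately in each treatment group), μ is estimated by the plug-in U − Ê₀(U) and Σ₀ by its null formula, both evaluated at the pooled death proportion p̂ = (m p̂₁ + n p̂₂)/(m + n); the weight vectors are averaged and the averaged weights are used in equation (14) on the original sample. Plug-in weights can be unstable when b′Σ̂₀⁻¹μ̂ is close to zero, i.e., when the estimated effects are small.

```python
import numpy as np

B_VEC = np.array([1.0, 2.0, 1.0])          # b' = (1, 2, 1) in equation (18)

def _components(t1, x1, t2, x2, T):
    # U' = (U_t, U_tx, U_x) of equation (11)
    d1, d2 = (t1 <= T)[:, None], (t2 <= T)[None, :]
    return np.array([np.mean(d1 & d2 & (t1[:, None] < t2[None, :])),
                     np.mean(d1 & ~d2),
                     np.mean(~d1 & ~d2 & (x1[:, None] < x2[None, :]))])

def _null_moments(p, m, n):
    # E0(U) and Var0(U) with rows/columns ordered (U_t, U_tx, U_x)
    q = 1.0 - p
    A = lambda z: 6 + 4 * (n + m - 2) * z - 3 * (n + m - 1) * z**2
    s_tt, s_xx = p**2 * A(p) / 12, q**2 * A(q) / 12
    s_cc = p * q * (n * q**2 + m * p**2 + p * q)
    s_tc = p**2 * q / 2 * ((n - 1) * q - m * p)
    s_tx = -p**2 * q**2 * (n + m - 1) / 4
    s_cx = p * q**2 / 2 * ((m - 1) * p - n * q)
    S0 = np.array([[s_tt, s_tc, s_tx], [s_tc, s_cc, s_cx], [s_tx, s_cx, s_xx]]) / (m * n)
    return 0.5 * np.array([p**2, 2 * p * q, q**2]), S0

def _pooled_p(t1, t2, T):
    # pooled proportion of early deaths, (m p1_hat + n p2_hat) / (m + n)
    return (np.sum(t1 <= T) + np.sum(t2 <= T)) / (len(t1) + len(t2))

def _plugin_weights(t1, x1, t2, x2, T):
    # plug-in c_opt: mu estimated by U - E0(U) evaluated at the pooled p
    EU0, S0 = _null_moments(_pooled_p(t1, t2, T), len(t1), len(t2))
    a = np.linalg.solve(S0, _components(t1, x1, t2, x2, T) - EU0)
    return a / (B_VEC @ a)

def bootstrap_z_opt(t1, x1, t2, x2, T, B=1000, seed=0):
    """Average plug-in weights over B within-group bootstrap samples, then
    evaluate equation (14) on the original data with the averaged weights."""
    t1, x1, t2, x2 = (np.asarray(a, dtype=float) for a in (t1, x1, t2, x2))
    rng = np.random.default_rng(seed)
    m, n = len(t1), len(t2)
    weights = []
    for _ in range(B):
        i = rng.integers(0, m, size=m)     # resample controls with replacement
        j = rng.integers(0, n, size=n)     # resample treated subjects with replacement
        weights.append(_plugin_weights(t1[i], x1[i], t2[j], x2[j], T))
    c = np.mean(weights, axis=0)           # averaged weights
    EU0, S0 = _null_moments(_pooled_p(t1, t2, T), m, n)
    U = _components(t1, x1, t2, x2, T)
    return c, float(c @ (U - EU0) / np.sqrt(c @ S0 @ c))
```

A call such as `bootstrap_z_opt(t1, x1, t2, x2, T=3.0, B=2000)` returns the averaged weights and Z_opt; at α = 0.05, a two-sided test rejects H₀ when |Z_opt| > 1.96. The K-fold variant of item 4 could reuse `_plugin_weights` on each validation fold.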

3 Simulation studies

We conducted simulation studies to assess the performance of the weighted test statistic. We generated data sets that follow the pattern seen in stroke trials, where the outcome of interest (a patient’s improvement on the NIHSS score over a three-month period) may be missing for some patients due to death. We simulated death times under a proportional hazards model with t_1k ~ Exp(λ₁) and t_2l ~ Exp(λ₂), such that q₂ = exp(−λ₂T) and HR = λ₁/λ₂, with T = 3 months, HR = 1.0, 1.2, 1.4, 1.6, 2.0, 2.4, 3.0, and q₂ = 0.6, 0.8. For the non-fatal outcome, X_1k ~ N(0, 1) and $X_{2l} \sim N (\sqrt{2} Δ_{x}, 1)$, k = 1,…,m; l = 1,…,n, with $Δ_{x} = (μ_{x_{2}} - μ_{x_{1}}) / (σ_{x_{1}} \sqrt{2}) = 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6$. The conditional probabilities π_tγ and π_xγ, γ = 1, 2, 3, are given in Appendix 4. We computed power for the weighted WMW test for n = m = 50 patients, using the analytical power formula (17) and a two-sided α = 0.05. In addition, we estimated power empirically by averaging over 10,000 simulated data sets.
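The simulation design can be sketched as below for the ordinary WMW test on worst-rank scores (the setting of panel (c) of Table 1). Assumptions of this sketch: death times are exponential with λ₂ = −log(q₂)/T and λ₁ = HR·λ₂, the test uses the normal approximation with the null variance of equation (8), and the function name is hypothetical. With this parameterization, P(X_1k < X_2l) = Φ(Δ_x), because X_2l − X_1k ~ N(√2Δ_x, 2). The Monte Carlo standard error with nsim replicates is about √(power(1 − power)/nsim).

```python
import math
import numpy as np

def simulate_power_ordinary(hr, q2, delta_x, m=50, n=50, T=3.0, nsim=2000, seed=0):
    """Monte Carlo power of the ordinary WMW test on worst-rank scores
    (two-sided alpha = 0.05, normal approximation)."""
    rng = np.random.default_rng(seed)
    lam2 = -math.log(q2) / T               # q2 = exp(-lam2 * T)
    lam1 = hr * lam2                       # HR = lam1 / lam2
    sd0 = math.sqrt((m + n + 1) / (12 * m * n))
    rejections = 0
    for _ in range(nsim):
        t1 = rng.exponential(1 / lam1, m)
        t2 = rng.exponential(1 / lam2, n)
        x1 = rng.normal(0.0, 1.0, m)
        x2 = rng.normal(math.sqrt(2) * delta_x, 1.0, n)
        alive = np.concatenate([x1[t1 > T], x2[t2 > T]])
        eta = alive.min() - 1 - T          # eta = min(observed X) - 1 - T
        s1 = np.where(t1 <= T, eta + t1, x1)
        s2 = np.where(t2 <= T, eta + t2, x2)
        u = np.mean(s1[:, None] < s2[None, :])
        rejections += int(abs((u - 0.5) / sd0) > 1.96)
    return rejections / nsim
```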

The results, given in Table 1, illustrate the accuracy of the analytical power formula (17). They also indicate that the weighted WMW test statistic is generally more powerful than the ordinary WMW test for the worst-rank composite outcome. The largest differences are seen in two scenarios:

The standardized difference in the non-fatal outcome Δ_x is small (Δ_x < 0.3) and the difference in mortality is moderate or high (HR ≥ 1.2)
The difference in mortality is small (HR < 1.2) and the standardized difference in the non-fatal outcome Δ_x is moderate or high (Δ_x ≥ 0.3).

Table 1.

Power comparisons for a continuous outcome under proportional hazards for time to death.

Δ_x	HR
	q₂ = 60 %							q₂ = 80 %
	1.0	1.2	1.4	1.6	2.0	2.4	3.0	1.0	1.2	1.4	1.6	2.0	2.4	3.0
(a) Analytical power for the weighted WMW test
0.0	0.05^a	0.11	0.24	0.41	0.73	0.90	0.98	0.05^a	0.08	0.15	0.24	0.45	0.68	0.87
0.1	0.08	0.12	0.25	0.42	0.73	0.90	0.98	0.09	0.12	0.18	0.28	0.51	0.70	0.88
0.2	0.15	0.19	0.30	0.46	0.75	0.91	0.98	0.21	0.24	0.30	0.38	0.58	0.75	0.90
0.3	0.27	0.30	0.40	0.53	0.78	0.92	0.98	0.39	0.41	0.46	0.53	0.69	0.82	0.93
0.4	0.41	0.44	0.51	0.61	0.82	0.93	0.98	0.59	0.61	0.64	0.69	0.79	0.88	0.95
0.5	0.55	0.57	0.62	0.70	0.86	0.94	0.99	0.76	0.77	0.79	0.81	0.87	0.92	0.97
0.6	0.68	0.68	0.72	0.77	0.89	0.95	0.99	0.88	0.88	0.89	0.90	0.93	0.96	0.98
(b) Empirical power for the weighted WMW test
0.0	0.05^a	0.10	0.23	0.40	0.72	0.91	0.99	0.05^a	0.08	0.15	0.24	0.45	0.67	0.87
0.1	0.08	0.12	0.24	0.41	0.73	0.90	0.99	0.09	0.12	0.18	0.28	0.51	0.70	0.89
0.2	0.15	0.19	0.29	0.47	0.75	0.91	0.99	0.21	0.24	0.30	0.38	0.58	0.76	0.91
0.3	0.26	0.30	0.40	0.53	0.78	0.92	0.99	0.39	0.41	0.46	0.54	0.69	0.83	0.94
0.4	0.39	0.43	0.51	0.63	0.81	0.93	0.99	0.59	0.61	0.65	0.71	0.81	0.89	0.96
0.5	0.54	0.56	0.63	0.71	0.87	0.94	0.99	0.76	0.78	0.81	0.83	0.90	0.94	0.98
0.6	0.67	0.68	0.73	0.79	0.89	0.96	0.99	0.89	0.89	0.91	0.92	0.95	0.97	0.99
(c) Empirical power for the ordinary WMW test in worst-rank scores
0.0	0.05	0.09	0.17	0.31	0.62	0.84	0.98	0.05	0.06	0.09	0.13	0.29	0.48	0.74
0.1	0.06	0.12	0.22	0.38	0.67	0.87	0.98	0.08	0.11	0.17	0.24	0.42	0.61	0.82
0.2	0.07	0.16	0.30	0.44	0.74	0.90	0.99	0.14	0.21	0.29	0.37	0.56	0.72	0.89
0.3	0.12	0.22	0.37	0.53	0.78	0.93	0.99	0.26	0.33	0.43	0.53	0.70	0.83	0.94
0.4	0.16	0.29	0.44	0.59	0.82	0.94	0.99	0.40	0.50	0.59	0.66	0.81	0.89	0.96
0.5	0.22	0.36	0.52	0.66	0.86	0.96	0.99	0.57	0.66	0.73	0.79	0.89	0.95	0.98
0.6	0.30	0.44	0.59	0.70	0.88	0.97	0.99	0.71	0.78	0.84	0.88	0.94	0.97	0.99


The weights are equal and fixed to 1. We assumed the treatment is better either on both mortality and the non-fatal outcome or on one outcome and not different from the control on the other outcome. We used exponential distributions for the survival times, normal distributions for the non-fatal outcome, and the same number of subjects in each group (n₁ = n₂ = 50). Δ_x: standardized mean difference on the non-fatal outcome of interest; HR: hazard ratio; q₂: survival probability (proportion of patients alive) at three months in the treatment group. (a) Estimated using formula (17); (b) and (c) proportion of simulated data sets for which |Z_opt| > 1.96 and |Z| > 1.96, respectively.

Overall, these results suggest that whenever the effect on the primary outcome is small, a larger difference in mortality is diluted when the overall difference is assessed with the ordinary WMW test, in which mortality and the non-fatal outcome are weighted equally. Likewise, if the difference in mortality is small but the difference in the non-fatal outcome is moderate or high, the ordinary WMW test on the composite outcome has less power than the weighted WMW test.

4 Application to a stroke clinical trial

A clinical trial of NBO therapy was conducted at Massachusetts General Hospital for patients who had an acute ischemic stroke.^1,2 In this trial, 85 patients were randomly assigned to either NBO therapy (43 patients) or to room air (control) for 8 h and assessed serially with clinical function scores. The primary efficacy and safety endpoints were, respectively, the mean change in NIHSS from baseline to 4 h (during therapy) and 24 hours (after therapy).¹ For illustration purposes, we focused on the secondary endpoint and examined the mean change in NIHSS scores from baseline to three months or at discharge.

Twenty-four of the 85 patients died, 17 of whom were in the NBO group. Fifty-three patients (31 of them in the control group) were discharged before the end of the three-month follow-up period. Subjects with missing three-month NIHSS scores were included in the log-rank test but excluded from the assessment of the change in NIHSS scores. The log-rank test of survival was significant (χ² = 6 with 1 d.f., p = 0.016), indicating that the active treatment had an unfavorable effect on mortality. The ordinary WMW test applied to the survivors was not significant (W = 572.5, p = 0.27). Using the untied worst-rank composite endpoint of death times and NIHSS scores, we found a significant result with the ordinary WMW test (W = 1112.5, p = 0.01).

Finally, we applied the proposed method, estimating the weights and the test statistic Z_opt using B = 2000 bootstrap samples, as explained in item 3 of the Remarks in Section 2.3.3. The estimated weight vector c′, the variance-covariance matrix Σ₀ of U under the null, the mean difference μ, and the probability p were, respectively,

c' = (0.45, 0.16, 0.24), \quad \Sigma_{0} = \begin{pmatrix} 0.59 & 0.50 & -0.90 \\ 0.50 & 4.77 & -1.27 \\ -0.90 & -1.27 & 5.16 \end{pmatrix}, \quad μ = -(0.016, 0.098, 0.073)', \quad \text{and} \quad p = 0.283.

This corresponds to w₁ = 0.61 and w₂ = 0.39 (since c₁ + c₂ = w₁(w₁ + w₂) = w₁ and c₂ + c₃ = w₂), which means that mortality was weighted more heavily (61 % of the weight) than the NIHSS score, in addition to death being ranked worse than any value of the continuous outcome (NIHSS score). The optimally weighted WMW test statistic Z_opt was equal to 3.42, with a corresponding p value of 6.2 × 10⁻⁴. This result is stronger than that from the ordinary WMW test because it captures the significant difference in mortality between the two treatment groups, and it illustrates the efficiency of our test statistic.

5 Discussion

In this paper, we have generalized the notion of the WMW test for a worst-rank composite outcome by deriving the optimally weighted WMW test. Against the null hypothesis of no difference on both mortality and the continuous endpoint, we have focused on the alternative hypothesis that “the active treatment has a preponderance of positive effects on the multiple outcomes considered, while not being harmful for any.”²⁹ We have motivated the worst-rank composite outcome in the context of clinical trials with a non-mortality primary outcome, where the assessment of the primary outcome of interest at a pre-specified time point may be precluded by death, any other debilitating event, or worsening of the disease condition. The corresponding composite outcome takes into account all patients enrolled in the trial, including those who had terminal events before the end of follow-up.

When there exists a hierarchy of the constituent endpoints of a composite outcome, the method we have presented in this paper enables different components of the WMW test statistic to be weighted differentially. Using weights allows for an additional level of discrimination between the component outcomes beyond ranks alone. While the worst-rank score mechanism pertains to how the different component outcomes of the composite endpoint are aggregated, assigning weights strengthens (or lessens) the influence these prioritized component outcomes exert on the overall composite. We considered weights obtained or elicited from expert judgment (utility weights) or determined so that the corresponding WMW test statistic has maximum power. Based on a U-statistic approach, we first provided the test statistic and the power of the weighted WMW test when utility (or severity) weights, determined a priori, are available. We also demonstrated that the ordinary (unweighted) WMW test on the worst-rank score outcome is a special case of the weighted WMW test, i.e., when the weights are all equal. Then, we derived the optimal weights that maximize the power of the corresponding weighted WMW test statistic. Finally, we conducted simulation studies to evaluate the accuracy of our power formula and confirmed, in the process, that the weighted WMW test is generally more powerful than the ordinary WMW test.

We applied the proposed method to the data from a clinical trial of NBO therapy for patients with acute ischemic stroke. Patients’ improvement was assessed using National Institutes of Health Stroke Scale (NIHSS) scores. The results indicated a statistically significant difference between NBO therapy and room air, using either the proposed method or the ordinary WMW test on the worst-rank composite outcome of death and change in NIHSS, which we could not detect using the ordinary WMW test on the survivors alone.

The difference between NBO therapy and room air was driven by the difference in mortality, since a disproportionate number of NBO-treated patients died. It was for this reason that the Data and Safety Monitoring Board (DSMB) stopped the trial after 85 of the projected 240 patients had been enrolled. The stark imbalance between the two treatment groups, although not attributed to the treatment, made it untenable to continue the trial.^1,30

The end result of the NBO trial is one of the dreaded scenarios in the (traditional) analysis of composite endpoints. Our alternative hypothesis H₁ (stated in equation (4)) requires the active treatment to be better than the control on one or both of the constituent outcomes (mortality and non-fatal outcome) and not worse on either of them; this was clearly not the case in the NBO trial. While the active treatment was equivalent to the control treatment in change in NIHSS, the data also showed that NBO therapy increased mortality. Ideally, the components of a composite endpoint should have similar clinical importance, frequency, and treatment effect. However, this is rarely the case, as outcomes of different levels of severity are usually combined; to facilitate the interpretation of such results, several authors have suggested running complementary analyses on the components of the composite outcome.^31–38

When the impact of the active treatment on mortality is of greater clinical importance than its effect on the primary outcome of interest, the weighted WMW test statistic we have presented can be included in a set of testing procedures that ensure that the treatment is not inferior on both mortality and the outcome of interest and that it is superior on at least one of these endpoints. In the context of ischemic stroke, the clinical investigators desired a treatment that would have a positive impact on mortality while also improving survivors’ functional outcomes. Testing procedures have been discussed that incorporate the contribution of each individual component of the composite while penalizing any disadvantage of the active treatment when the treatment operates in opposite directions on the components of the composite outcome.^39–42 For the analysis of the NBO clinical trial, we propose two different stepwise procedures using this weighted test: (1) two individual non-inferiority tests on mortality and the non-fatal outcome followed (if non-inferiority is established) by a global test using the optimal-weighted WMW test on the worst-rank composite endpoint; or (2) a global test using the optimal-weighted WMW test on the worst-rank composite endpoint followed (if the global test is significant) by two individual non-inferiority tests and then individual superiority tests on mortality and the non-fatal outcome. In either scenario, the overall type I error is preserved.^39,40,43,44

The method presented in this paper can be applied or extended to many other settings of composite endpoints beyond the realm of death-censored observations. The rationale, advantages (and limitations), and recommendations for using composite outcomes, based on clinical information, expert knowledge, or practical matters, abound in the literature.^14,35,45 One can also accommodate ties as well as non-informative censoring in the definition of the WMW U-statistic. In particular, when non-informative censoring is present (and, without loss of generality, assuming there are no ties), survival times can be compared using Gehan’s U-statistic, an extension of the WMW U-statistic to right-censored data.⁴⁶ In this case, I(t_1k < t_2l) equals 1 if subject l in group 2 is known to have lived longer than subject k in group 1, and 0 otherwise, including when censoring makes it uncertain which subject lived longer.
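A minimal sketch of the pairwise indicator just described, written as a 1/0 score to match the text (Gehan's original score instead takes the values +1, −1, or 0). It assumes no tied observed times, and the `(observed_time, died)` data layout and the function names `gehan_indicator` and `gehan_u` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# (observed_time, died): died = 1 if the death was observed at observed_time,
# 0 if the subject was right-censored (still alive) at observed_time.
# This data layout is a hypothetical choice for illustration.
Subject = Tuple[float, int]


def gehan_indicator(s1: Subject, s2: Subject) -> int:
    """1 if subject s2 (group 2) is known to have outlived subject s1 (group 1), else 0.

    Assumes no tied observed times. The order is known only when s1's death
    was observed and s2 was still under observation after that time; if s1
    was censored, or s2 was observed for a shorter time, the pair scores 0.
    """
    t1, died1 = s1
    t2, _ = s2
    return int(died1 == 1 and t1 < t2)


def gehan_u(group1: List[Subject], group2: List[Subject]) -> float:
    """Average the pairwise indicators over all m * n (group 1, group 2) pairs."""
    total = sum(gehan_indicator(a, b) for a in group1 for b in group2)
    return total / (len(group1) * len(group2))


# Hypothetical data: group 1 has a death at day 2 and a censoring at day 5.
print(gehan_u([(2.0, 1), (5.0, 0)], [(3.0, 1), (1.0, 0)]))  # 0.25
```

Only one of the four pairs in the usage line is ordered with certainty (the group 1 death at day 2 against the group 2 death at day 3), which gives 1/4.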

Our method can be applied in many disease areas in which different outcomes are clinically related and represent manifestations of the same underlying condition. Clinical trials of unstable angina and non-ST-segment elevation myocardial infarction are examples of such applications.^47,48 The method can also be applied in clinical trials where the overall effect of treatment on a disease depends on a hierarchy of meaningful, yet heterogeneous, outcomes of different importance, magnitude, and impact. For instance, in clinical trials of asthma or of benign prostatic hyperplasia (BPH), several outcomes are necessary to capture the multifaceted manifestations of the disease. For patients with asthma, four outcomes (forced expiratory volume in 1 second (FEV₁), peak expiratory flow (PEF) rate, symptom score, and additional rescue medication use) are necessary to measure the different manifestations of the disease.⁴⁹ Because of the subjective nature of BPH symptoms, measures to assess disease progression include, in addition to the BPH symptom score index, prostate-specific antigen (PSA), urinary cytology, post-void residual volume (PVR), urine flow rate, cystoscopy, urodynamic pressure-flow study, and ultrasound of the kidney or the prostate.

Our method does not immediately apply to the case where the treatment effect is assessed by stratifying for a confounding variable (baseline scores, baseline disease severity, age, …) pre-specified in the study design.^50,51 For the NBO trial, had the investigators anticipated the imbalance between subjects on some baseline variables (e.g. large infarcts, advanced age, co-morbidities, and, most importantly, withdrawal of care based on pre-expressed wishes or family preference), they could have stratified the study population with respect to these variables.^1,30 The test statistic we have proposed does not adjust for such baseline covariates, as the appropriately weighted WMW test for this case must take into account stratum-specific characteristics in addition to the specificities of the worst-rank procedure; this is a topic for future investigation.

A strong case may be made for preferring analysis of covariance over the analysis of change from baseline score that we have used in this paper.⁵² In reality, however, the issues are more nuanced, and the approach to use depends closely on the nature of the data as well as on the clinical question of interest.^53–58 For the difference in NIHSS scores (from baseline to three months), the fundamental question of interest was “on average, how much did NBO-treated patients change over the three-month period compared with patients assigned to room air?” The change-from-baseline-score paradigm assumes that the same measure is used before and after the treatment and that these two measures are highly correlated.^59,60 In the stroke literature, it has been shown that change from baseline in NIHSS satisfies this assumption, since baseline NIHSS is a strong predictor of outcome after stroke.^61,62 Moreover, change in the NIHSS score has been shown to be a useful tool for measuring treatment effect in acute stroke trials (see, for instance, the papers by Bruno et al.⁶³ and by Parsons et al.⁶⁴). This justifies our choice of improvement (or change) in NIHSS score as the outcome of interest in this paper.

We have assumed throughout this paper that death is worse than any impact ischemic stroke may have on patients. Our assumption stems from the common view that ranks death as inferior to any quality-of-life measure; such a view is advocated in several medical fields.^7,8,65–70 However, some people (patients, their family members, or caregivers) may argue otherwise and affirm that some levels of stroke are worse than death. For instance, in a study of the effects of thrombolytic therapy in reducing damage from a myocardial infarction, the hierarchy of the quality of component outcomes was “stroke resulting in a vegetative state, death, serious morbidity requiring major assistance, serious morbidity but capable of self-care, excess spontaneous hemorrhage (≥ 3 blood transfusions), and 1–2 transfusions”.¹⁰ A number of papers in the causal inference literature offer an alternative approach based on Rosenbaum’s proposal of using different “placements of death”.⁷¹ However, as Rubin pointed out, this elegant idea “maybe difficult to convey to consumers”,⁷² and we have not pursued this avenue here.

Finally, the null hypothesis (3) for the WMW test stipulates that the treatment does not change the outcome distribution, which means that the treatment has no effect on any patient. However, some studies may require a weaker version of the null hypothesis, i.e. that the treatment does not affect the average group response.^73,74 In such a case, the WMW test is not an asymptotically valid test of the weaker null hypothesis.^75,76 As an alternative, one can use the Brunner–Munzel test,⁷⁷ in which the marginal distribution functions of the two treatment groups are not assumed to be equal and may have different shapes, even under the null hypothesis. In this paper, we have chosen the WMW test because it is simple, widely used, efficient, and does not rely on parametric distributional assumptions. The use of a weighted Brunner–Munzel test for the analysis of the worst-rank composite outcome of death and a quality-of-life measure (such as the NIHSS score) warrants further investigation and is beyond the scope of this paper.

Acknowledgments

The content of this paper is solely the responsibility of the authors and does not necessarily represent the official view of the National Institutes of Health.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants P50-NS051343, R01-CA075971, T32 NS048005, and UL1TR001117 awarded by the National Institutes of Health. This work was also supported by grants 1R01HL118336-01 (PI: Anastasios Tsiatis) and R01-NS051412 awarded by the National Institutes of Health.

Appendix 1. Mean and variance of the U-statistic

Consider the untied worst-rank adjusted values for subjects in the control and active treatment groups ${\tilde{X}}_{1 k} = (1 - δ_{1 k}) X_{1 k} + δ_{1 k} (η + t_{1 k})$ , for k = 1,…, m and ${\tilde{X}}_{2 l} = (1 - δ_{2 l}) X_{2 l} + δ_{2 l} (η + t_{2 l}),$ for l = 1,…, n.

Define the WMW U-statistic

U = (mn)^{-1} \sum_{k=1}^{m} \sum_{l=1}^{n} U_{kl}, \quad \text{where } U_{kl} = I(\tilde{X}_{1k} < \tilde{X}_{2l})

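Since $U_{kl} = 1$ exactly when one of the three disjoint events $\{t_{1k} < t_{2l}$ and $\delta_{1k}\delta_{2l} = 1\}$, $\{\delta_{1k} = 1$ and $\delta_{2l} = 0\}$, or $\{X_{1k} < X_{2l}$ and $\delta_{1k} = \delta_{2l} = 0\}$ occurs, we have $U_{kl} = I(t_{1k} < t_{2l}, \delta_{1k}\delta_{2l} = 1) + I(\delta_{1k} = 1, \delta_{2l} = 0) + I(X_{1k} < X_{2l}, \delta_{1k} = \delta_{2l} = 0)$.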

Therefore

\begin{array}{l} E(U) = E(U_{kl}) \\ = P(t_{1k} < t_{2l} \mid \delta_{1k}\delta_{2l} = 1) P(\delta_{1k}\delta_{2l} = 1) + P(\delta_{1k} = 1, \delta_{2l} = 0) + P(X_{1k} < X_{2l}) P(\delta_{1k} = \delta_{2l} = 0) \\ = p_{1} p_{2} \, P(t_{1k} < t_{2l} \mid \delta_{1k} = \delta_{2l} = 1) + p_{1} q_{2} + q_{1} q_{2} \, P(X_{1k} < X_{2l}) \\ = p_{1} p_{2} \pi_{t1} + p_{1} q_{2} + q_{1} q_{2} \pi_{x1} = \pi_{U1} \end{array}

(20)

where $q_{1} = 1 - p_{1}$, $q_{2} = 1 - p_{2}$, $\pi_{t1} = P(t_{1k} < t_{2l} \mid \delta_{1k} = \delta_{2l} = 1)$, and $\pi_{x1} = P(X_{1k} < X_{2l})$.
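The decomposition of $U_{kl}$ used in this derivation can be checked numerically. The sketch below builds the worst-rank adjusted values $\tilde{X} = (1 - \delta)X + \delta(\eta + t)$ and confirms that the direct pairwise count agrees with the three-event decomposition. It assumes, for illustration only, that larger values of the non-fatal outcome are better, that death times lie in $[0, T]$, and that $\eta$ is set to `min(X) - T - 1` over survivors so that every death ranks below every survivor (earlier deaths ranking lowest). The helper names and the simulated data are hypothetical.

```python
import random


def worst_rank_values(times, died, outcomes, eta):
    """X~ = (1 - delta) X + delta (eta + t): deaths get eta + t, survivors keep X."""
    return [eta + t if d == 1 else x for t, d, x in zip(times, died, outcomes)]


def u_direct(x1, x2):
    """U = (mn)^{-1} sum_{k,l} I(X~_1k < X~_2l)."""
    return sum(a < b for a in x1 for b in x2) / (len(x1) * len(x2))


def u_decomposed(t1, d1, y1, t2, d2, y2):
    """Same U from the three disjoint events that make U_kl = 1."""
    total = 0
    for tk, dk, yk in zip(t1, d1, y1):
        for tl, dl, yl in zip(t2, d2, y2):
            total += int(dk == 1 and dl == 1 and tk < tl)  # both died; k died first
            total += int(dk == 1 and dl == 0)              # only k died
            total += int(dk == 0 and dl == 0 and yk < yl)  # both survived; compare X
    return total / (len(t1) * len(t2))


def simulate(size, p_death, horizon, mean_x, rng):
    """Hypothetical data: uniform death times on [0, T], N(mean_x, 1) outcomes."""
    times = [rng.uniform(0.0, horizon) for _ in range(size)]
    died = [int(rng.random() < p_death) for _ in range(size)]
    outcomes = [rng.gauss(mean_x, 1.0) for _ in range(size)]
    return times, died, outcomes


rng = random.Random(2016)
T = 90.0
t1, d1, y1 = simulate(30, 0.3, T, 0.0, rng)   # group 1 (control)
t2, d2, y2 = simulate(40, 0.2, T, 0.5, rng)   # group 2 (active)
eta = min(y for y, d in zip(y1 + y2, d1 + d2) if d == 0) - T - 1.0
x1 = worst_rank_values(t1, d1, y1, eta)
x2 = worst_rank_values(t2, d2, y2, eta)
print(u_direct(x1, x2), u_decomposed(t1, d1, y1, t2, d2, y2))
```

Both functions return the same integer count divided by $mn$, so the two printed values coincide.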

\begin{array}{l} Var(U) = (mn)^{-2} \left[ \sum_{k=1}^{m} \sum_{l=1}^{n} Var(U_{kl}) + \sum_{k=1}^{m} \sum_{l=1}^{n} \sum_{k'=1}^{m} \sum_{l'=1}^{n} Cov(U_{kl}, U_{k'l'}) \right], \text{ with } k \neq k' \text{ or } l \neq l' \text{ or both} \\ = (mn)^{-1} \left[ Var(U_{kl}) + (m - 1) Cov(U_{kl}, U_{k'l}) + (n - 1) Cov(U_{kl}, U_{kl'}) \right] \end{array}

Note that $Cov(U_{kl}, U_{k'l'}) = 0$ for $k \neq k'$ and $l \neq l'$, because the two pairs share no subject, while $Cov(U_{kl}, U_{k'l}) = E(U_{kl} U_{k'l}) - E(U_{kl}) E(U_{k'l})$ for $k \neq k'$ and $Cov(U_{kl}, U_{kl'}) = E(U_{kl} U_{kl'}) - E(U_{kl}) E(U_{kl'})$ for $l \neq l'$. In addition, because $U_{kl} = I(\tilde{X}_{1k} < \tilde{X}_{2l})$ follows a Bernoulli distribution with probability $\pi_{U1}$, its variance is $Var(U_{kl}) = E(U_{kl})[1 - E(U_{kl})] = \pi_{U1}(1 - \pi_{U1})$.

\begin{array}{l} E(U_{kl} U_{k'l}) = P(U_{kl} U_{k'l} = 1) \\ = P(\delta_{1k}\delta_{1k'} = 1, \delta_{2l} = 0) + P(t_{1k} < t_{2l}, t_{1k'} < t_{2l} \mid \delta_{1k}\delta_{1k'}\delta_{2l} = 1) P(\delta_{1k}\delta_{1k'}\delta_{2l} = 1) \\ + P(X_{1k'} < X_{2l}) P(\delta_{1k} = 1, \delta_{1k'} = \delta_{2l} = 0) + P(X_{1k} < X_{2l}) P(\delta_{1k} = 0, \delta_{1k'} = 1, \delta_{2l} = 0) \\ + P(X_{1k} < X_{2l}, X_{1k'} < X_{2l}) P(\delta_{1k} = \delta_{1k'} = \delta_{2l} = 0) \\ = p_{1}^{2} q_{2} + p_{1}^{2} p_{2} \pi_{t2} + 2 p_{1} q_{1} q_{2} \pi_{x1} + q_{1}^{2} q_{2} \pi_{x2} \\ E(U_{kl} U_{kl'}) = P(U_{kl} U_{kl'} = 1) \\ = P(\delta_{1k} = 1, \delta_{2l} = \delta_{2l'} = 0) + P(t_{1k} < t_{2l}, t_{1k} < t_{2l'} \mid \delta_{1k}\delta_{2l}\delta_{2l'} = 1) P(\delta_{1k}\delta_{2l}\delta_{2l'} = 1) \\ + P(t_{1k} < t_{2l} \mid \delta_{1k}\delta_{2l} = 1, \delta_{2l'} = 0) P(\delta_{1k}\delta_{2l} = 1, \delta_{2l'} = 0) \\ + P(t_{1k} < t_{2l'} \mid \delta_{1k} = 1, \delta_{2l} = 0, \delta_{2l'} = 1) P(\delta_{1k} = 1, \delta_{2l} = 0, \delta_{2l'} = 1) \\ + P(X_{1k} < X_{2l}, X_{1k} < X_{2l'}) P(\delta_{1k} = \delta_{2l} = \delta_{2l'} = 0) \\ = p_{1} q_{2}^{2} + p_{1} p_{2}^{2} \pi_{t3} + 2 p_{1} p_{2} q_{2} \pi_{t1} + q_{1} q_{2}^{2} \pi_{x3} \\ \text{with } \pi_{t2} = P(t_{1k} < t_{2l}, t_{1k'} < t_{2l} \mid \delta_{1k} = \delta_{1k'} = \delta_{2l} = 1), \ \pi_{x2} = P(X_{1k} < X_{2l}, X_{1k'} < X_{2l}), \\ \pi_{t3} = P(t_{1k} < t_{2l}, t_{1k} < t_{2l'} \mid \delta_{1k} = \delta_{2l} = \delta_{2l'} = 1), \text{ and } \pi_{x3} = P(X_{1k} < X_{2l}, X_{1k} < X_{2l'}) \end{array}

In summary

Var (U) = {(mn)}^{- 1} [π_{U 1} (1 - π_{U 1}) + (m - 1) (π_{U 2} - π_{U 1}^{2}) + (n - 1) (π_{U 3} - π_{U 1}^{2})]

(21)

where $\pi_{U2} = p_{1}^{2} q_{2} + p_{1}^{2} p_{2} \pi_{t2} + 2 p_{1} q_{1} q_{2} \pi_{x1} + q_{1}^{2} q_{2} \pi_{x2}$ and $\pi_{U3} = p_{1} q_{2}^{2} + p_{1} p_{2}^{2} \pi_{t3} + 2 p_{1} p_{2} q_{2} \pi_{t1} + q_{1} q_{2}^{2} \pi_{x3}$.
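As a sketch of how these expressions combine, the functions below evaluate $\pi_{U1}$, $\pi_{U2}$, $\pi_{U3}$ and $Var(U)$ of equation (21) with exact rational arithmetic. Plugging in the null values derived in this appendix ($\pi_{t1} = \pi_{x1} = 1/2$ and $\pi_{t2} = \pi_{t3} = \pi_{x2} = \pi_{x3} = 1/3$) should give $\pi_{U1} = 1/2$, $\pi_{U2} = \pi_{U3} = 1/3$ and $Var(U) = (m + n + 1)/(12mn)$. The function names and the value $p = 0.3$ are hypothetical.

```python
from fractions import Fraction


def pi_u(p1, p2, pt1, pt2, pt3, px1, px2, px3):
    """pi_U1, pi_U2, pi_U3 as in (20) and the definitions following (21)."""
    q1, q2 = 1 - p1, 1 - p2
    pu1 = p1 * p2 * pt1 + p1 * q2 + q1 * q2 * px1
    pu2 = p1**2 * q2 + p1**2 * p2 * pt2 + 2 * p1 * q1 * q2 * px1 + q1**2 * q2 * px2
    pu3 = p1 * q2**2 + p1 * p2**2 * pt3 + 2 * p1 * p2 * q2 * pt1 + q1 * q2**2 * px3
    return pu1, pu2, pu3


def var_u(m, n, pu1, pu2, pu3):
    """Var(U) from equation (21)."""
    return (pu1 * (1 - pu1) + (m - 1) * (pu2 - pu1**2)
            + (n - 1) * (pu3 - pu1**2)) / (m * n)


# Null values: pi_t1 = pi_x1 = 1/2 and pi_t2 = pi_t3 = pi_x2 = pi_x3 = 1/3.
half, third = Fraction(1, 2), Fraction(1, 3)
p = Fraction(3, 10)  # hypothetical common death probability
pu_null = pi_u(p, p, half, third, third, half, third, third)
print(pu_null, var_u(5, 7, *pu_null))
```

Using `Fraction` instead of floats makes the comparison with $(m + n + 1)/(12mn)$ exact rather than approximate.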

Under the null hypothesis of no difference between the two groups, with respect to survival and non-fatal outcome, we have F₁ = F₂ = F, G₁ = G₂ = G, and p₁ = p₂ = p, q₁ = q₂ = q. This implies

\begin{array}{l} \pi_{t1} = P(t_{1k} < t_{2l} \mid t_{1k} \leq T, t_{2l} \leq T) = \frac{1}{p^{2}} \int_{0}^{T} F(t) \, dF(t) = \frac{1}{2p^{2}} [F(T)^{2} - F(0)^{2}] = \frac{1}{2} \\ \pi_{t2} = P(t_{1k} < t_{2l}, t_{1k'} < t_{2l} \mid t_{1k} \leq T, t_{1k'} \leq T, t_{2l} \leq T) = \frac{1}{p^{3}} \int_{0}^{T} F(t)^{2} \, dF(t) = \frac{1}{3p^{3}} [F(T)^{3} - F(0)^{3}] = \frac{1}{3} \\ \pi_{t3} = P(t_{1k} < t_{2l}, t_{1k} < t_{2l'} \mid t_{1k} \leq T, t_{2l} \leq T, t_{2l'} \leq T) = \frac{1}{p^{3}} \int_{0}^{T} [F(T) - F(t)]^{2} \, dF(t) = \frac{1}{3p^{3}} \{[F(T) - F(0)]^{3} - [F(T) - F(T)]^{3}\} = \frac{1}{3} \\ \pi_{x1} = P(X_{1k} < X_{2l}) = \int_{-\infty}^{\infty} G(x) \, dG(x) = \frac{1}{2} [G(x)^{2}]_{-\infty}^{\infty} = \frac{1}{2} \\ \pi_{x2} = P(X_{1k} < X_{2l}, X_{1k'} < X_{2l}) = \int_{-\infty}^{\infty} G(x)^{2} \, dG(x) = \frac{1}{3} [G(x)^{3}]_{-\infty}^{\infty} = \frac{1}{3} \\ \pi_{x3} = P(X_{1k} < X_{2l}, X_{1k} < X_{2l'}) = \int_{-\infty}^{\infty} [1 - G(x)]^{2} \, dG(x) = -\frac{1}{3} [\{1 - G(x)\}^{3}]_{-\infty}^{\infty} = \frac{1}{3} \end{array}

Therefore

\begin{array}{l} \pi_{U1} = p_{1} p_{2} \pi_{t1} + p_{1} q_{2} + q_{1} q_{2} \pi_{x1} = \frac{1}{2} p^{2} + pq + \frac{1}{2} q^{2} = \frac{1}{2} (p + q)^{2} = \frac{1}{2} \\ \pi_{U2} = p_{1}^{2} q_{2} + p_{1}^{2} p_{2} \pi_{t2} + 2 p_{1} q_{1} q_{2} \pi_{x1} + q_{1}^{2} q_{2} \pi_{x2} = p^{2} q + \frac{1}{3} p^{3} + p q^{2} + \frac{1}{3} q^{3} = \frac{1}{3} (p + q)^{3} = \frac{1}{3} \\ \pi_{U3} = p_{1} q_{2}^{2} + p_{1} p_{2}^{2} \pi_{t3} + 2 p_{1} p_{2} q_{2} \pi_{t1} + q_{1} q_{2}^{2} \pi_{x3} = p q^{2} + \frac{1}{3} p^{3} + p^{2} q + \frac{1}{3} q^{3} = \frac{1}{3} (p + q)^{3} = \frac{1}{3} \end{array}

The mean and variance become

\begin{array}{l} \mu_{0} = E_{0}(U) = \pi_{U1} = \frac{1}{2} \\ \sigma_{0}^{2} = Var_{0}(U) = (mn)^{-1} [\pi_{U1}(1 - \pi_{U1}) + (m - 1)(\pi_{U2} - \pi_{U1}^{2}) + (n - 1)(\pi_{U3} - \pi_{U1}^{2})] \\ = (mn)^{-1} [\frac{1}{2}(1 - \frac{1}{2}) + (m - 1)(\frac{1}{3} - (\frac{1}{2})^{2}) + (n - 1)(\frac{1}{3} - (\frac{1}{2})^{2})] \\ = (mn)^{-1} [\frac{1}{4} + \frac{1}{12}(m - 1) + \frac{1}{12}(n - 1)] = \frac{m + n + 1}{12mn} \end{array}
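An independent check of $\mu_0 = 1/2$ and $\sigma_0^2 = (m + n + 1)/(12mn)$ is to enumerate the exact permutation distribution of $U$ for small samples. This sketch assumes no ties, so that under $H_0$ the $m + n$ worst-rank adjusted values are exchangeable and every assignment of ranks to the two groups is equally likely. It is an illustration only, not part of the method, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def exact_null_moments(m, n):
    """Exact E_0(U) and Var_0(U) by enumerating all rank assignments.

    Under H0 and without ties, the m + n worst-rank adjusted values are
    exchangeable, so each set of m ranks for group 1 is equally likely.
    """
    ranks = range(m + n)
    values = []
    for group1 in combinations(ranks, m):
        members = set(group1)
        group2 = [r for r in ranks if r not in members]
        wins = sum(a < b for a in group1 for b in group2)  # pairs with X~_1k < X~_2l
        values.append(Fraction(wins, m * n))
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


for m, n in [(2, 3), (3, 4), (4, 4)]:
    mean, var = exact_null_moments(m, n)
    print(m, n, mean, var, Fraction(m + n + 1, 12 * m * n))
```

The enumeration grows as $\binom{m+n}{m}$, so it is only practical for very small groups; the closed form is what one would use in practice.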

Appendix 2. Mean and variance of the weighted U-statistic

Consider the weights $w = (w_{1}, w_{2})$ and define the vector $c' = (c_{1}, c_{2}, c_{3}) = (w_{1}^{2}, w_{1} w_{2}, w_{2}^{2})$. Let $\tilde{X}_{1k} = w_{1} \delta_{1k} (\eta + t_{1k}) + w_{2} (1 - \delta_{1k}) X_{1k}$, for $k = 1, \ldots, m$, and $\tilde{X}_{2l} = w_{1} \delta_{2l} (\eta + t_{2l}) + w_{2} (1 - \delta_{2l}) X_{2l}$, for $l = 1, \ldots, n$.

We define the weighted WMW U-statistic by $c'U$, where $U' = (U_{t}, U_{tx}, U_{x})$ and

\begin{array}{l} U_{t} = {(m n)}^{- 1} \sum_{k = 1}^{m} \sum_{l = 1}^{n} δ_{1 k} δ_{2 l} I (t_{1 k} < t_{2 l}) \\ U_{t x} = {(m n)}^{- 1} \sum_{k = 1}^{m} \sum_{l = 1}^{n} δ_{1 k} (1 - δ_{2 l}) \\ U_{x} = {(m n)}^{- 1} \sum_{k = 1}^{m} \sum_{l = 1}^{n} (1 - δ_{1 k}) (1 - δ_{2 l}) I (X_{1 k} < X_{2 l}) \end{array}

(22)

\begin{array}{l} E(U) = (P(\delta_{1k} = 1) P(\delta_{2l} = 1) P(t_{1k} < t_{2l} \mid \delta_{1k} = \delta_{2l} = 1), \ P(\delta_{1k} = 1) P(\delta_{2l} = 0), \ P(\delta_{1k} = 0) P(\delta_{2l} = 0) P(X_{1k} < X_{2l}))' \\ = (p_{1} p_{2} \, P(t_{1k} < t_{2l} \mid \delta_{1k} = \delta_{2l} = 1), \ p_{1} q_{2}, \ q_{1} q_{2} \, P(X_{1k} < X_{2l}))' \\ = (p_{1} p_{2} \pi_{t1}, \ p_{1} q_{2}, \ q_{1} q_{2} \pi_{x1})' \end{array}

(23)

where $q_{1} = 1 - p_{1}$, $q_{2} = 1 - p_{2}$, $\pi_{t1} = P(t_{1k} < t_{2l} \mid \delta_{1k} = \delta_{2l} = 1)$, and $\pi_{x1} = P(X_{1k} < X_{2l})$. Moreover, $Var(U) = \Sigma$, where $\Sigma = (mn)^{-1} (\Sigma_{ij})_{1 \leq i, j \leq 3}$ is a $3 \times 3$ matrix such that

\begin{array}{l} \Sigma_{11} = E[(U_{t} - p_{1} p_{2} \pi_{t1})(U_{t} - p_{1} p_{2} \pi_{t1})] \\ \quad = p_{1} p_{2} [\pi_{t1}(1 - \pi_{t1}) + p_{1}(m - 1)(\pi_{t2} - \pi_{t1}^{2}) + p_{2}(n - 1)(\pi_{t3} - \pi_{t1}^{2}) + \pi_{t1}^{2}(m p_{1} q_{2} + (n - 1) p_{2} q_{1} + q_{1})] \\ \Sigma_{12} = \Sigma_{21} = E[(U_{t} - p_{1} p_{2} \pi_{t1})(U_{tx} - p_{1} q_{2})] = \pi_{t1} p_{1} p_{2} q_{2} [(n - 1) q_{1} - m p_{1}] \\ \Sigma_{13} = \Sigma_{31} = E[(U_{t} - p_{1} p_{2} \pi_{t1})(U_{x} - q_{1} q_{2} \pi_{x1})] = -\pi_{t1} \pi_{x1} (m + n - 1) p_{1} q_{1} p_{2} q_{2} \\ \Sigma_{22} = E[(U_{tx} - p_{1} q_{2})(U_{tx} - p_{1} q_{2})] = p_{1} q_{2} [m p_{1} p_{2} + (n - 1) q_{1} q_{2} + q_{1}] \\ \Sigma_{23} = \Sigma_{32} = E[(U_{tx} - p_{1} q_{2})(U_{x} - q_{1} q_{2} \pi_{x1})] = \pi_{x1} p_{1} q_{1} q_{2} [(m - 1) p_{2} - n q_{2}] \\ \Sigma_{33} = q_{1} q_{2} [\pi_{x1}(1 - \pi_{x1}) + q_{1}(m - 1)(\pi_{x2} - \pi_{x1}^{2}) + q_{2}(n - 1)(\pi_{x3} - \pi_{x1}^{2}) + \pi_{x1}^{2}(m q_{1} p_{2} + (n - 1) q_{2} p_{1} + p_{1})] \end{array}

Therefore

Var(c'U) = c' \Sigma c

Under the null hypothesis of no difference between the two groups with respect to both survival and the non-fatal outcome, we have $p_{1} = p_{2} = p$, $q_{1} = q_{2} = q = 1 - p$, $\pi_{t1} = \pi_{x1} = 1/2$, and $\pi_{t2} = \pi_{x2} = \pi_{t3} = \pi_{x3} = 1/3$. Thus

E_{0}(U) = \frac{1}{2} (p^{2}, 2pq, q^{2})' \quad \text{and} \quad Var_{0}(U) = \Sigma_{0}

(24)

where $\Sigma_{0} = (mn)^{-1} (\Sigma_{0ij})_{1 \leq i, j \leq 3}$ is a symmetric matrix with

\begin{array}{l} \Sigma_{011} = \frac{p^{2}}{12} A(p), \quad \Sigma_{012} = \Sigma_{021} = \frac{p^{2} q}{2} ((n - 1) q - m p), \quad \Sigma_{013} = \Sigma_{031} = -\frac{p^{2} q^{2}}{4} (n + m - 1) \\ \Sigma_{022} = pq (n q^{2} + m p^{2} + pq), \quad \Sigma_{023} = \Sigma_{032} = \frac{p q^{2}}{2} ((m - 1) p - n q), \quad \Sigma_{033} = \frac{q^{2}}{12} A(q) \\ A(x) = 6 + 4(n + m - 2) x - 3(n + m - 1) x^{2} \end{array}

Moreover, since $Var_{0}(c'U) = c' \Sigma_{0} c \geq 0$ for every vector $c$, the matrix $\Sigma_{0}$ is positive semi-definite. In practice, $p$ is estimated by the pooled sample proportion $\hat{p} = (m \hat{p}_{1} + n \hat{p}_{2}) / (m + n)$, and both $E_{0}(U)$ and $Var_{0}(U)$ are calculated accordingly.
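The components in (22) and the weighted statistic $c'U$ can also be computed directly from data. In this sketch each subject is described by a death time $t$, a death indicator $d$ (1 if the subject died before the end of follow-up) and a non-fatal outcome $y$ that is used only for survivors; this layout, the function names and the toy data are hypothetical. With equal weights $w_1 = w_2 = 1/2$ we get $c = (1/4, 1/4, 1/4)$, so $c'U = (U_t + U_{tx} + U_x)/4$ is a rescaled version of the ordinary WMW U-statistic.

```python
def u_components(t1, d1, y1, t2, d2, y2):
    """(U_t, U_tx, U_x) of equation (22); group 1 is indexed by k, group 2 by l."""
    m, n = len(t1), len(t2)
    u_t = u_tx = u_x = 0
    for tk, dk, yk in zip(t1, d1, y1):
        for tl, dl, yl in zip(t2, d2, y2):
            u_t += int(dk == 1 and dl == 1 and tk < tl)    # both died, k first
            u_tx += int(dk == 1 and dl == 0)               # only k died
            u_x += int(dk == 0 and dl == 0 and yk < yl)    # both survived
    return u_t / (m * n), u_tx / (m * n), u_x / (m * n)


def weighted_u(components, w1, w2):
    """c'U with c = (w1^2, w1*w2, w2^2), as defined in Appendix 2."""
    c = (w1 * w1, w1 * w2, w2 * w2)
    return sum(ci * ui for ci, ui in zip(c, components))


# Hypothetical toy data: death times t, death indicators d, outcomes y
# (y is ignored for subjects who died; t is ignored for survivors).
t1, d1, y1 = [10.0, 50.0, 0.0], [1, 1, 0], [0.0, 0.0, 3.0]
t2, d2, y2 = [30.0, 0.0, 0.0], [1, 0, 0], [0.0, 5.0, 1.0]
comps = u_components(t1, d1, y1, t2, d2, y2)
print(comps)                         # U_t = 1/9, U_tx = 4/9, U_x = 1/9
print(weighted_u(comps, 0.5, 0.5))   # equal weights: (U_t + U_tx + U_x) / 4
print(weighted_u(comps, 0.7, 0.3))
```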

Appendix 3. Optimal weights

From equation (17), we have

\mu_{1w} - \mu_{0w} = c_{1} (\pi_{t1} p_{1} p_{2} - \frac{1}{2} p^{2}) + c_{2} (p_{1} q_{2} - pq) + c_{3} (\pi_{x1} q_{1} q_{2} - \frac{1}{2} q^{2}) = c' \mu

where $c' = (c_{1}, c_{2}, c_{3})$, $c_{1} + 2c_{2} + c_{3} = 1$, $\mu' = (\pi_{t1} p_{1} p_{2} - \frac{1}{2} p^{2}, \ p_{1} q_{2} - pq, \ \pi_{x1} q_{1} q_{2} - \frac{1}{2} q^{2})$, and $p$ is estimated by $\hat{p} = (m p_{1} + n p_{2}) / (m + n)$.

We assume that $\det(\Sigma_{0}) > 0$; since $\Sigma_{0}$ is positive semi-definite, this means that $\Sigma_{0}$ is positive definite. Maximizing $|\mu_{1w} - \mu_{0w}| / \sigma_{0w}$ with respect to $c$, subject to $c_{1} + 2c_{2} + c_{3} = 1$, corresponds to maximizing the Lagrange function

O(c, \lambda) = |c' \mu| (c' \Sigma_{0} c)^{-\frac{1}{2}} - \lambda (c' b - 1)

with respect to the vector $c$ and $\lambda$, where $\lambda$ is the Lagrange multiplier and $b' = (1, 2, 1)$.

Let $K(c) = \mathrm{sign}(c' \mu) (c' \Sigma_{0} c)^{-\frac{3}{2}}$; we have

\frac{\partial}{\partial c} O(c, \lambda) = K(c) [(c' \Sigma_{0} c) \mu - (\Sigma_{0} c)(c' \mu)] - \lambda b = 0

(25)

\frac{\partial}{\partial λ} O (c, λ) = c' b - 1 = 0

(26)

From equations (25) and (26), we have

0 = c' \{K(c) [(c' \Sigma_{0} c) \mu - (\Sigma_{0} c)(c' \mu)] - \lambda b\} = K(c) [(c' \Sigma_{0} c)(c' \mu) - (c' \Sigma_{0} c)(c' \mu)] - \lambda c' b = -\lambda

because both $(c' \Sigma_{0} c)$ and $(c' \mu)$ are scalars and $c' b = c_{1} + 2c_{2} + c_{3} = 1$. Hence $\lambda = 0$.

Then equation (25) implies $(c' \Sigma_{0} c) \mu = (\Sigma_{0} c)(c' \mu)$, i.e. $\mu = (\Sigma_{0} c) \frac{c' \mu}{c' \Sigma_{0} c} = \Sigma_{0} \frac{c' \mu}{c' \Sigma_{0} c} c$. Since we assume that the matrix $\Sigma_{0}^{-1}$ exists, this implies

\Sigma_{0}^{-1} \mu = \frac{c' \mu}{c' \Sigma_{0} c} c

(27)

and thus $b' \Sigma_{0}^{-1} \mu = \frac{c' \mu}{c' \Sigma_{0} c} b' c = \frac{c' \mu}{c' \Sigma_{0} c}$.

Replacing $\frac{c' \mu}{c' \Sigma_{0} c}$ by $b' \Sigma_{0}^{-1} \mu$ in equation (27) yields $\Sigma_{0}^{-1} \mu = (b' \Sigma_{0}^{-1} \mu) c$. Therefore, the optimal weight vector is

c_{opt} = \frac{\Sigma_{0}^{-1} \mu}{b' \Sigma_{0}^{-1} \mu}

(28)

as long as $b' \Sigma_{0}^{-1} \mu \neq 0$. In addition,

\begin{array}{l} \frac{\partial^{2}}{\partial c \, \partial c'} O(c) \Big|_{c = c_{opt}} = \mathrm{sign}(c' \mu) (c' \Sigma_{0} c)^{-\frac{3}{2}} [2 \mu c' \Sigma_{0} - (\Sigma_{0} c) \mu' - (c' \mu) \Sigma_{0}]_{c = c_{opt}} - 3 \, \mathrm{sign}(c' \mu) (c' \Sigma_{0} c)^{-\frac{5}{2}} \{[(c' \Sigma_{0} c) \mu - (\Sigma_{0} c)(c' \mu)] c' \Sigma_{0}\}_{c = c_{opt}} \\ = (b' \Sigma_{0}^{-1} \mu)^{2} (\mu' \Sigma_{0}^{-1} \mu)^{-\frac{3}{2}} [\mu \mu' - (\mu' \Sigma_{0}^{-1} \mu) \Sigma_{0}] \quad \text{because } (c' \Sigma_{0} c) \mu = (\Sigma_{0} c)(c' \mu) \text{ at } c = c_{opt} \end{array}

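Since $\Sigma_{0}$ is positive definite, the matrix $\mu \mu' - (\mu' \Sigma_{0}^{-1} \mu) \Sigma_{0}$ is negative semi-definite (by the Cauchy–Schwarz inequality, $(x' \mu)^{2} \leq (x' \Sigma_{0} x)(\mu' \Sigma_{0}^{-1} \mu)$ for every $x$), and we can show that the border-preserving principal minors of order $k > 2$ have sign $(-1)^{k}$. Therefore, $c = c_{opt} = \frac{\Sigma_{0}^{-1} \mu}{b' \Sigma_{0}^{-1} \mu}$ maximizes $O(c)$.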

Let us define two vectors, $d_{1}' = (1, 1, 0)$ and $d_{2}' = b' - d_{1}' = (0, 1, 1)$. To calculate $w_{1}$ and $w_{2}$, we just need to consider the relationships $c = (w_{1}^{2}, w_{1} w_{2}, w_{2}^{2})$ and $w_{1} + w_{2} = 1$. We have $d_{1}' c = w_{1}^{2} + w_{1}(1 - w_{1}) = w_{1}$. Therefore, using the result given in equation (28), we can deduce $w_{1} = d_{1}' c = \frac{d_{1}' \Sigma_{0}^{-1} \mu}{b' \Sigma_{0}^{-1} \mu}$ and $w_{2} = 1 - d_{1}' c = \frac{(b' - d_{1}') \Sigma_{0}^{-1} \mu}{b' \Sigma_{0}^{-1} \mu} = \frac{d_{2}' \Sigma_{0}^{-1} \mu}{b' \Sigma_{0}^{-1} \mu}$.
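A minimal sketch of the optimal-weight computation in (28), using exact rational arithmetic. It builds $(mn)\Sigma_0$ from the entries in (24), solves $\Sigma_0 x = \mu$ with a small hand-written $3 \times 3$ solver (an implementation choice, not from the paper), and returns $c_{opt}$ together with $w_1 = d_1' c_{opt}$ and $w_2 = d_2' c_{opt}$. The design values ($m = n = 50$, $p_1 = 0.3$, $p_2 = 0.2$, $\pi_{t1} = 0.6$, $\pi_{x1} = 0.65$) are hypothetical, $p$ is taken as the pooled proportion $(m p_1 + n p_2)/(m + n)$, and $\Sigma_0$ is assumed positive definite as in the text.

```python
from fractions import Fraction


def sigma0(m, n, p):
    """(mn) * Sigma_0 from equation (24); the (mn)^{-1} factor cancels in c_opt."""
    q, total = 1 - p, n + m

    def a(x):
        return 6 + 4 * (total - 2) * x - 3 * (total - 1) * x * x

    s11 = p * p * a(p) / 12
    s12 = p * p * q * ((n - 1) * q - m * p) / 2
    s13 = -p * p * q * q * (total - 1) / 4
    s22 = p * q * (n * q * q + m * p * p + p * q)
    s23 = p * q * q * ((m - 1) * p - n * q) / 2
    s33 = q * q * a(q) / 12
    return [[s11, s12, s13], [s12, s22, s23], [s13, s23, s33]]


def solve3(a, b):
    """Solve the 3x3 system a x = b (Gauss-Jordan elimination, partial pivoting)."""
    rows = [list(r) + [v] for r, v in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(3):
            if r != col:
                f = rows[r][col] / rows[col][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]


def optimal_weights(m, n, p1, p2, pt1, px1):
    """c_opt of (28) and (w1, w2) = (d1'c_opt, d2'c_opt)."""
    q1, q2 = 1 - p1, 1 - p2
    p = (m * p1 + n * p2) / (m + n)  # pooled death probability
    q = 1 - p
    mu = [pt1 * p1 * p2 - p * p / 2, p1 * q2 - p * q, px1 * q1 * q2 - q * q / 2]
    s_inv_mu = solve3(sigma0(m, n, p), mu)               # Sigma_0^{-1} mu
    denom = s_inv_mu[0] + 2 * s_inv_mu[1] + s_inv_mu[2]  # b' Sigma_0^{-1} mu
    c = [v / denom for v in s_inv_mu]
    return c, c[0] + c[1], c[1] + c[2]


# Hypothetical design values (group 1 = control, group 2 = active treatment).
F = Fraction
c_opt, w1, w2 = optimal_weights(50, 50, F(3, 10), F(1, 5), F(3, 5), F(13, 20))
print([float(x) for x in c_opt], float(w1), float(w2))
```

Because the ratio $|c'\mu| / \sqrt{c'\Sigma_0 c}$ is unchanged when $c$ is rescaled, dropping the $(mn)^{-1}$ factor of $\Sigma_0$ does not affect $c_{opt}$. A useful sanity check on the entries of (24) is that, with $c = (1, 1, 1)$, $c'U = U$, so the sum of all entries of $(mn)\Sigma_0$ must equal $(m + n + 1)/12$.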

Appendix 4. Conditional probabilities

D.1. Exponential distribution

Suppose that the death times $t_{1}$ and $t_{2}$ follow exponential distributions with hazards $\lambda_{1}$ and $\lambda_{2}$, respectively, and denote $\theta = \lambda_{1} / \lambda_{2}$, $q_{2} = e^{-T \lambda_{2}}$, and $q_{1} = q_{2}^{\theta}$. Given that $P(\delta_{1k} = 1) = p_{1}$ and $P(\delta_{2l} = 1) = p_{2}$, we have

\begin{array}{l} \pi_{t1} = P(t_{1k} < t_{2l} \mid \delta_{1k} = \delta_{2l} = 1) = (p_{1} p_{2})^{-1} \int_{0}^{T} (1 - e^{-\lambda_{1} u}) \lambda_{2} e^{-\lambda_{2} u} \, du \\ = \frac{1}{1 - q_{2}^{\theta}} \left[1 - \frac{1 - q_{2}^{1 + \theta}}{(1 + \theta)(1 - q_{2})}\right] \\ \pi_{t2} = P(t_{1k} < t_{2l}, t_{1k'} < t_{2l} \mid \delta_{1k} = \delta_{1k'} = \delta_{2l} = 1) = p_{1}^{-2} p_{2}^{-1} \int_{0}^{T} (1 - e^{-\lambda_{1} u})^{2} \lambda_{2} e^{-\lambda_{2} u} \, du \\ = (1 - q_{2}^{\theta})^{-2} \left\{1 + \frac{1}{1 - q_{2}} \left[\frac{1 - q_{2}^{1 + 2\theta}}{1 + 2\theta} - \frac{2(1 - q_{2}^{1 + \theta})}{1 + \theta}\right]\right\} \\ \pi_{t3} = P(t_{1k} < t_{2l}, t_{1k} < t_{2l'} \mid \delta_{1k} = \delta_{2l} = \delta_{2l'} = 1) = p_{1}^{-1} p_{2}^{-2} \int_{0}^{T} (e^{-\lambda_{2} T} - e^{-\lambda_{2} u})^{2} \lambda_{1} e^{-\lambda_{1} u} \, du \\ = \left(\frac{q_{2}}{1 - q_{2}}\right)^{2} \left[1 + \frac{\theta (1 - q_{2}^{2 + \theta})}{(2 + \theta)(1 - q_{2}^{\theta}) q_{2}^{2}} - \frac{2\theta (1 - q_{2}^{1 + \theta})}{(1 + \theta)(1 - q_{2}^{\theta}) q_{2}}\right] \end{array}
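The closed forms in D.1 are easy to mistype, so a numerical cross-check is useful. This sketch evaluates them and compares them with composite Simpson-rule integration of the defining integrals; with $\theta = 1$ the closed forms should reduce to the null values $1/2$, $1/3$, $1/3$. The hazards, the 90-day horizon and the function names are hypothetical.

```python
import math


def exp_pi_t(theta, q2):
    """Closed forms of D.1 (theta = lambda1 / lambda2, q2 = exp(-T * lambda2))."""
    q1 = q2 ** theta
    pt1 = (1 - (1 - q2 ** (1 + theta)) / ((1 + theta) * (1 - q2))) / (1 - q1)
    pt2 = (1 + ((1 - q2 ** (1 + 2 * theta)) / (1 + 2 * theta)
                - 2 * (1 - q2 ** (1 + theta)) / (1 + theta)) / (1 - q2)) / (1 - q1) ** 2
    pt3 = (q2 / (1 - q2)) ** 2 * (
        1 + theta * (1 - q2 ** (2 + theta)) / ((2 + theta) * (1 - q1) * q2 ** 2)
        - 2 * theta * (1 - q2 ** (1 + theta)) / ((1 + theta) * (1 - q1) * q2))
    return pt1, pt2, pt3


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def exp_pi_t_numeric(lam1, lam2, horizon):
    """pi_t1, pi_t2, pi_t3 by integrating the defining integrals of D.1."""
    p1 = 1 - math.exp(-lam1 * horizon)
    p2 = 1 - math.exp(-lam2 * horizon)

    def f1(u):
        return (1 - math.exp(-lam1 * u)) * lam2 * math.exp(-lam2 * u)

    def f2(u):
        return (1 - math.exp(-lam1 * u)) ** 2 * lam2 * math.exp(-lam2 * u)

    def f3(u):
        return (math.exp(-lam2 * horizon) - math.exp(-lam2 * u)) ** 2 * lam1 * math.exp(-lam1 * u)

    return (simpson(f1, 0.0, horizon) / (p1 * p2),
            simpson(f2, 0.0, horizon) / (p1 ** 2 * p2),
            simpson(f3, 0.0, horizon) / (p1 * p2 ** 2))


# Hypothetical hazards (per day) and a 90-day horizon.
lam1, lam2, T = 0.004, 0.002, 90.0
print(exp_pi_t(lam1 / lam2, math.exp(-T * lam2)))
print(exp_pi_t_numeric(lam1, lam2, T))
```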

D.2. Normal distribution

Suppose that the non-fatal outcomes $X_{1}$ and $X_{2}$ follow normal distributions $N(\mu_{x_{1}}, \sigma_{x_{1}}^{2})$ and $N(\mu_{x_{2}}, \sigma_{x_{2}}^{2})$, respectively.

Consider $Δ_{x} = \frac{μ_{x_{2}} - μ_{x_{1}}}{\sqrt{σ_{x_{1}}^{2} + σ_{x_{2}}^{2}}}$ , $ρ_{x_{j}} = \frac{σ_{x_{j}}^{2}}{σ_{x_{1}}^{2} + σ_{x_{2}}^{2}}$ , and $Z_{k l} = \frac{x_{1 k} - x_{2 l} - (μ_{x_{1}} - μ_{x_{2}})}{\sqrt{σ_{x_{1}}^{2} + σ_{x_{2}}^{2}}}$

We can show that

\begin{array}{l} \pi_{x1} = P(X_{1k} < X_{2l}) = \Phi(\Delta_{x}) \\ \pi_{x2} = P(X_{1k} < X_{2l}, X_{1k'} < X_{2l}) = P(Z_{kl} < \Delta_{x}, Z_{k'l} < \Delta_{x}) \\ \pi_{x3} = P(X_{1k} < X_{2l}, X_{1k} < X_{2l'}) = P(Z_{kl} < \Delta_{x}, Z_{kl'} < \Delta_{x}) \\ (Z_{kl}, Z_{k'l})' \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_{x_{2}} \\ \rho_{x_{2}} & 1 \end{pmatrix}\right) \text{ and } (Z_{kl}, Z_{kl'})' \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_{x_{1}} \\ \rho_{x_{1}} & 1 \end{pmatrix}\right) \end{array}
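The bivariate normal probabilities in D.2 have no elementary closed form for general $\Delta_x$. One way to evaluate them, sketched here, is to condition on one coordinate, $P(Z_1 < a, Z_2 < a) = \int_{-\infty}^{a} \phi(z)\,\Phi\big((a - \rho z)/\sqrt{1 - \rho^2}\big)\,dz$, and integrate numerically with the standard-library `statistics.NormalDist`; a dedicated bivariate normal CDF routine from a numerical library could be used instead. For $\Delta_x = 0$ the result can be checked against the classical orthant formula $1/4 + \arcsin(\rho)/(2\pi)$, which gives $1/3$ when $\rho = 1/2$ (equal variances), matching the null values. The truncation point, grid size, function names and example parameters are hypothetical choices.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf/cdf of N(0, 1)


def bvn_orthant(a, rho, n=4000, lower=-10.0):
    """P(Z1 < a, Z2 < a) for a standard bivariate normal with correlation rho, |rho| < 1.

    Conditioning on Z1 = z gives Z2 | z ~ N(rho z, 1 - rho^2), so the probability is
    the integral over z < a of phi(z) * Phi((a - rho z) / sqrt(1 - rho^2)); the lower
    limit is truncated at `lower` and the integral is evaluated with Simpson's rule.
    """
    s = math.sqrt(1.0 - rho * rho)

    def f(z):
        return STD_NORMAL.pdf(z) * STD_NORMAL.cdf((a - rho * z) / s)

    h = (a - lower) / n
    total = f(lower) + f(a)
    total += 4 * sum(f(lower + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(lower + i * h) for i in range(2, n, 2))
    return total * h / 3


def normal_pi_x(mu1, var1, mu2, var2):
    """pi_x1, pi_x2, pi_x3 of D.2 for X1 ~ N(mu1, var1) and X2 ~ N(mu2, var2)."""
    total_var = var1 + var2
    delta = (mu2 - mu1) / math.sqrt(total_var)
    rho1, rho2 = var1 / total_var, var2 / total_var
    return (STD_NORMAL.cdf(delta),
            bvn_orthant(delta, rho2),   # (Z_kl, Z_k'l) share X_2l
            bvn_orthant(delta, rho1))   # (Z_kl, Z_kl') share X_1k


# Hypothetical outcome distributions (e.g. changes in a clinical score).
print(normal_pi_x(0.0, 4.0, 1.0, 4.0))
```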

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1.Singhal AB. Normobaric oxygen therapy in acute ischemic stroke trial. ClinicalTrials.gov Database. http://clinicaltrials.gov/ct2/show/NCT00414726 (accessed 7 November 2016)
2.Singhal AB. A review of oxygen therapy in ischemic stroke. Neurol Res. 2007;29:173–183. doi: 10.1179/016164107X181815.
3.Little RJ, Rubin DB. Statistical analysis with missing data. Hoboken, New Jersey: Wiley; 2002.
4.Lachin J. Worst-rank score analysis with informatively missing observations in clinical trials. Control Clin Trials. 1999;20:408–422. doi: 10.1016/s0197-2456(99)00022-7.
5.McMahon R, Harrell F., Jr Power calculation for clinical trials when the outcome is a composite ranking of survival and a nonfatal outcome. Control Clin Trials. 2000;21:305–312. doi: 10.1016/s0197-2456(00)00052-0.
6.Matsouaka RA, Betensky RA. Power and sample size calculations for the Wilcoxon-Mann-Whitney test in the presence of death-censored observations. Stat Med. 2015;34:406–431. doi: 10.1002/sim.6355.
7.Felker GM, Maisel AS. A global rank end point for clinical trials in acute heart failure. Circ Heart Fail. 2010;3:643–646. doi: 10.1161/CIRCHEARTFAILURE.109.926030.
8.Follmann D, Wittes J, Cutler JA. The use of subjective rankings in clinical trials with an application to cardiovascular disease. Stat Med. 1992;11:427–437. doi: 10.1002/sim.4780110402.
9.Bakal JA, Westerhout CM, Armstrong PW. Impact of weighted composite compared to traditional composite endpoints for the design of randomized controlled trials. Stat Methods Med Res. 2012;24:980–988. doi: 10.1177/0962280211436004.
10.Hallstrom A, Litwin P, Douglas Weaver W. A method of assigning scores to the components of a composite outcome: an example from the MITI trial. Control Clin Trials. 1992;13:148–155. doi: 10.1016/0197-2456(92)90020-z.
11.Neaton J, Gray G, Zuckerman B, et al. Key issues in end point selection for heart failure trials: composite end points. J Cardiac Fail. 2005;11:567–575. doi: 10.1016/j.cardfail.2005.08.350.
12.Califf R, DeMets D. Principles from clinical trials relevant to clinical practice: part I. Circulation. 2002;106:1015. doi: 10.1161/01.cir.0000023260.78078.bb.
13.Braunwald E, Cannon C, McCabe C. An approach to evaluating thrombolytic therapy in acute myocardial infarction. The ‘unsatisfactory outcome’ end point. Circulation. 1992;86:683. doi: 10.1161/01.cir.86.2.683.
14.Moyé L. Multiple analyses in clinical trials: fundamentals for investigators. New York City, New York: Springer Verlag; 2003.
15.Huang P, Tilley BC, Woolson RF, et al. Adjusting O’Brien’s test to control type i error for the generalized nonparametric behrens-fisher problem. Biometrics. 2005;61:532–539. doi: 10.1111/j.1541-0420.2005.00322.x.
16.Häberle L, Pfahlberg A, Gefeller O. Assessment of multiple ordinal endpoints. Biometrical J. 2009;51:217–226. doi: 10.1002/bimj.200810502.
17.O’Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984;40:1079–1087.
18.Wei L, Johnson W. Combining dependent tests with incomplete repeated measurements. Biometrika. 1985;72:359.
19.Finkelstein D, Schoenfeld D. Combining mortality and longitudinal measures in clinical trials. Stat Med. 1999;18:1341–1354. doi: 10.1002/(sici)1097-0258(19990615)18:11<1341::aid-sim129>3.0.co;2-7.
20.Moyé L, Davis B, Hawkins C. Analysis of a clinical trial involving a combined mortality and adherence dependent interval censored endpoint. Stat Med. 1992;11:1705–1717. doi: 10.1002/sim.4780111305.
21.Moyé LA, Lai D, Jing K, et al. Combining censored and uncensored data in a u-statistic: design and sample size implications for cell therapy research. Int J Biostat. 2011;7:1–29. doi: 10.2202/1557-4679.1286.
22.Sampson UK, Metcalfe C, Pfeffer MA, et al. Composite outcomes: weighting component events according to severity assisted interpretation but reduced statistical power. J Clin Epidemiol. 2010;63:1156–1158. doi: 10.1016/j.jclinepi.2010.01.019.
23.Ahmad Y, Nijjer S, Cook CM, et al. A new method of applying randomised control study data to the individual patient: a novel quantitative patient-centred approach to interpreting composite end points. Int J Cardiol. 2015;195:216–224. doi: 10.1016/j.ijcard.2015.05.109.
24.Wilson RF, Berger AK. Are all end points created equal? The case for weighting. J Am Coll Cardiol. 2011;57:546–548. doi: 10.1016/j.jacc.2010.10.014.
25.Armstrong PW, Westerhout CM, Van de Werf F, et al. Refining clinical trial composite outcomes: An application to the assessment of the safety and efficacy of a new thrombolytic-3 (assent-3) trial. Am Heart J. 2011;161:848–854. doi: 10.1016/j.ahj.2010.12.026.
26.Minas G, Rigat F, Nichols TE, et al. A hybrid procedure for detecting global treatment effects in multivariate clinical trials: theory and applications to fMRI studies. Stat Med. 2012;31:253–268. doi: 10.1002/sim.4395.
27.Fisher LD. Self-designing clinical trials. Stat Med. 1998;17:1551–1562. doi: 10.1002/(sici)1097-0258(19980730)17:14<1551::aid-sim868>3.0.co;2-e.
28.Ramchandani R, Schoenfeld DA, Finkelstein DM. Global rank tests for multiple, possibly censored, outcomes. Biometrics. 2016;72:s1–s10. doi: 10.1111/biom.12475.
29.Lachin JM, Bebu I. Application of the wei-lachin multivariate one-directional test to multiple event-time outcomes. Clin Trials. 2015;12:627–633. doi: 10.1177/1740774515601027.
30.Samson K. News from the AAN annual meeting: why a trial of normobaric oxygen in acute ischemic stroke was halted early. Neurol Today. 2013;13:34–35.
31.Freemantle N, Calvert M, Wood J, et al. Composite outcomes in randomized trials: greater precision but with greater uncertainty? JAMA. 2003;289:2554. doi: 10.1001/jama.289.19.2554.
32.Cordoba G, Schwartz L, Woloshin S, et al. Definition, reporting, and interpretation of composite outcomes in clinical trials: systematic review. Br Med J. 2010;341:c3920. doi: 10.1136/bmj.c3920.
33.Tomlinson G, Detsky AS. Composite end points in randomized trials: there is no free lunch. JAMA. 2010;303:267–268. doi: 10.1001/jama.2009.2017.
34.Ferreira-Gonzalez I, Permanyer-Miralda G, Busse J, et al. Composite outcomes can distort the nature and magnitude of treatment benefits in clinical trials. Ann Intern Med. 2009;150:566. doi: 10.7326/0003-4819-150-8-200904210-00016.
35.Ferreira-Gonzalez I, Permanyer-Miralda G, Busse JW, et al. Methodologic discussions for using and interpreting composite endpoints are limited, but still identify major concerns. J Clin Epidemiol. 2007;60:651–657. doi: 10.1016/j.jclinepi.2006.10.020.
36.Ferreira-Gonzalez I, Permanyer-Miralda G, Domingo-Salvany A, et al. Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials. BMJ. 2007;334:786. doi: 10.1136/bmj.39136.682083.AE.
37.Lubsen J, Just H, Hjalmarsson A, et al. Effect of pimobendan on exercise capacity in patients with heart failure: main results from the Pimobendan in Congestive Heart Failure (PICO) trial. Heart. 1996;76:223. doi: 10.1136/hrt.76.3.223.
38.Lubsen J, Kirwan BA. Combined endpoints: can we use them? Stat Med. 2002;21:2959–2970. doi: 10.1002/sim.1300.
39.Huque MF, Alosh M, Bhore R. Addressing multiplicity issues of a composite endpoint and its components in clinical trials. J Biopharm Stat. 2011;21:610–634. doi: 10.1080/10543406.2011.551327.
40.Mascha EJ, Turan A. Joint hypothesis testing and gatekeeping procedures for studies with multiple endpoints. Anesth Anal. 2012;114:1304–1317. doi: 10.1213/ANE.0b013e3182504435. [DOI] [PubMed] [Google Scholar]
41.Dmitrienko A, D’Agostino RB, Huque MF. Key multiplicity issues in clinical drug development. Stat Med. 2013;32:1079–1111. doi: 10.1002/sim.5642. [DOI] [PubMed] [Google Scholar]
42.Sankoh AJ, Li H, D’Agostino RB. Use of composite endpoints in clinical trials. Stat Med. 2014;33:4709–4714. doi: 10.1002/sim.6205. [DOI] [PubMed] [Google Scholar]
43.Logan B, Tamhane A. Superiority inferences on individual endpoints following noninferiority testing in clinical trials. Biometrical J. 2008;50:693–703. doi: 10.1002/bimj.200710447. [DOI] [PubMed] [Google Scholar]
44.Röhmel J, Gerlinger C, Benda N, et al. On testing simultaneously non-inferiority in two multiple primary endpoints and superiority in at least one of them. Biometrical J. 2006;48:916–933. doi: 10.1002/bimj.200510289. [DOI] [PubMed] [Google Scholar]
45.Gómez G, Lagakos SW. Statistical considerations when using a composite endpoint for comparing treatment groups. Stat Med. 2013;32:719–738. doi: 10.1002/sim.5547. [DOI] [PubMed] [Google Scholar]
46.Gehan EA. A generalized wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika. 1965;52:203–223. [PubMed] [Google Scholar]
47.Braunwald E, Antman EM, Beasley JW, et al. ACC/AHA 2002 guideline update for the management of patients with unstable angina and non-st-segment elevation myocardial infarctionsummary article: a report of the American College of Cardiology/American Heart Association Task force on practice guidelines (committee on the management of patients with unstable angina) J Am Coll Cardiol. 2002;40:1366–1374. doi: 10.1016/s0735-1097(02)02336-7. [DOI] [PubMed] [Google Scholar]
48.Grech E, Ramsdale D. Acute coronary syndrome: unstable angina and non-st segment elevation myocardial infarction. BMJ. 2003;326:1259. doi: 10.1136/bmj.326.7401.1259. [DOI] [PMC free article] [PubMed] [Google Scholar]
49.National Asthma Education and Prevention Program (National Heart, Lung, and Blood Institute) Third Expert Panel on the Management of Asthma. Expert panel report 3: guidelines for the diagnosis and management of asthma. NIH Publication: US Department of Health and Human Services, National Institutes of Health, National Heart, Lung, and Blood Institute; 2007. [Google Scholar]
50.Van Elteren P. On the combination of independent two-sample tests of Wilcoxon. Bull Int Stat Inst. 1960;37:351–361. [Google Scholar]
51.Zhao Y. Sample size estimation for the van Elteren test - a stratified Wilcoxon-Mann-Whitney test. Stat Med. 2006;25:2675–2687. doi: 10.1002/sim.2441. [DOI] [PubMed] [Google Scholar]
52.Senn S. Change from baseline and analysis of covariance revisited. Stat Med. 2006;25:4334–4344. doi: 10.1002/sim.2682. [DOI] [PubMed] [Google Scholar]
53.Fitzmaurice G. A conundrum in the analysis of change. Nutrition. 2001;17:360–361. doi: 10.1016/s0899-9007(00)00593-1. [DOI] [PubMed] [Google Scholar]
54.van Breukelen GJ. Ancova versus change from baseline in nonrandomized studies: the difference. Multivariate Behav Res. 2013;48:895–922. doi: 10.1080/00273171.2013.831743. [DOI] [PubMed] [Google Scholar]
55.Shahar E, Shahar DJ. Causal diagrams and change variables. J Eval Clin Pract. 2012;18:143–148. doi: 10.1111/j.1365-2753.2010.01540.x. [DOI] [PubMed] [Google Scholar]
56.Pearl J. Technical report. Citeseer; 2014. Lord’s paradox revisited-(oh lord! kumbaya!) [Google Scholar]
57.Oakes JM, Feldman HA. Statistical power for nonequivalent pretest-posttest designs the impact of change-score versus ancova models. Eval Rev. 2001;25:3–28. doi: 10.1177/0193841X0102500101. [DOI] [PubMed] [Google Scholar]
58.Willett JB. Questions and answers in the measurement of change. Rev Res Edu. 1988;15:345–422. [Google Scholar]
59.Bonate PL. Analysis of pretest-posttest designs. Boca Raton, Florida: CRC Press; 2000. [Google Scholar]
60.Campbell DT, Kenny DA. A primer on regression artifacts. New York City, New York: Guilford Publications; 1999. [Google Scholar]
61.Young FB, Weir CJ, Lees KR, et al. Comparison of the national institutes of health stroke scale with disability outcome measures in acute stroke trials. Stroke. 2005;36:2187–2192. doi: 10.1161/01.STR.0000181089.41324.70. [DOI] [PubMed] [Google Scholar]
62.Adams H, Jr, Davis P, Leira E, et al. Baseline NIH Stroke Scale score strongly predicts outcome after stroke: a report of the Trial of Org 10172 in Acute Stroke Treatment (TOAST) Neurology. 1999;53:126. doi: 10.1212/wnl.53.1.126. [DOI] [PubMed] [Google Scholar]
63.Bruno A, Saha C, Williams LS. Using change in the national institutes of health stroke scale to measure treatment effect in acute stroke trials. Stroke. 2006;37:920–921. doi: 10.1161/01.STR.0000202679.88377.e4. [DOI] [PubMed] [Google Scholar]
64.Parsons M, Spratt N, Bivard A, et al. A randomized trial of tenecteplase versus alteplase for acute ischemic stroke. N Engl J Med. 2012;366:1099–1107. doi: 10.1056/NEJMoa1109842. [DOI] [PubMed] [Google Scholar]
65.Brittain E, Palensky J, Blood J, et al. Blinded subjective rankings as a method of assessing treatment effect: a large sample example from the systolic hypertension in the elderly program (SHEP) Stat Med. 1997;16:681–693. doi: 10.1002/(sici)1097-0258(19970330)16:6<681::aid-sim487>3.0.co;2-h. [DOI] [PubMed] [Google Scholar]
66.Felker G, Anstrom K, Rogers J. A global ranking approach to end points in trials of mechanical circulatory support devices. J Cardiac Fail. 2008;14:368–372. doi: 10.1016/j.cardfail.2008.01.009. [DOI] [PubMed] [Google Scholar]
67.Allen LA, Hernandez AF, O’Connor CM, et al. End points for clinical trials in acute heart failure syndromes. J Am Coll Cardiol. 2009;53:2248–2258. doi: 10.1016/j.jacc.2008.12.079. [DOI] [PMC free article] [PubMed] [Google Scholar]
68.Sun H, Davison BA, Cotter G, et al. Evaluating treatment efficacy by multiple endpoints in phase ii acute heart failure clinical trials: analyzing data using a global method. Circulation. 2012;5:742–749. doi: 10.1161/CIRCHEARTFAILURE.112.969154. [DOI] [PubMed] [Google Scholar]
69.Subherwal S, Anstrom KJ, Jones WS, et al. Use of alternative methodologies for evaluation of composite end points in trials of therapies for critical limb ischemia. Am Heart J. 2012;164:277. doi: 10.1016/j.ahj.2012.07.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
70.Berry JD, Miller R, Moore DH, et al. The combined assessment of function and survival (cafs): a new endpoint for als clinical trials. Amyotroph Lateral Scler Frontotemp Degen. 2013;14:162–168. doi: 10.3109/21678421.2012.762930. [DOI] [PubMed] [Google Scholar]
71.Rosenbaum PR. Comment: the place of death in the quality of life. Stat Sci. 2006;21:313–316. [Google Scholar]
72.Rubin DB. Rejoinder:causal inference through potential outcomes and principal stratification: Application to studies with “censoring” due to death. Stat Sci. 2006;21:319–321. [Google Scholar]
73.Fay MP, Proschan MA. Wilcoxon-Mann-Whitney or t-test? on assumptions for hypothesis tests and multiple interpretations of decision rules. Stat Surv. 2010;4:1. doi: 10.1214/09-SS051. [DOI] [PMC free article] [PubMed] [Google Scholar]
74.Gail MH, Mark SD, Carroll RJ, et al. On design considerations and randomization-based inference for community intervention trials. Stat Med. 1996;15:1069–1092. doi: 10.1002/(SICI)1097-0258(19960615)15:11<1069::AID-SIM220>3.0.CO;2-Q. [DOI] [PubMed] [Google Scholar]
75.Pratt JW. Robustness of some procedures for the two-sample location problem. J Am Stat Assoc. 1964;59:650–665. [Google Scholar]
76.Chung E, Romano JP. Asymptotically valid and exact permutation tests based on two-sample U-statistics. J Stat Plann Infer. 2016;168:97–105. [Google Scholar]
77.Brunner E, Munzel U. The nonparametric behrens-fisher problem: asymptotic theory and a small-sample approximation. Biometrical J. 2000;42:17–25. [Google Scholar]

