A Weighted Rank-Sum Procedure for Comparing Samples with Multiple Endpoints

Qizhai Li; Aiyi Liu; Kai Yu; Kai F Yu

doi:10.4310/sii.2009.v2.n2.a9

Author manuscript; available in PMC: 2009 Oct 9.

Published in final edited form as: Stat Interface. 2009 Jan 1;2(2):197–201. doi: 10.4310/sii.2009.v2.n2.a9

PMCID: PMC2759535 NIHMSID: NIHMS99059 PMID: 19823699

Summary

For comparing the distributions of two samples with multiple endpoints, O’Brien (1984) proposed rank-sum-type test statistics. Huang et al. (2005) extended these statistics to the general nonparametric Behrens-Fisher hypothesis problem and obtained improved test statistics by replacing the ad hoc variance with the asymptotic variance of the rank-sum statistics. In this paper we generalize the work of O’Brien (1984) and Huang et al. (2005) and propose a weighted rank-sum statistic. We show that the weighted rank-sum statistic is asymptotically normally distributed, permitting the computation of power, p-values and confidence intervals. We further demonstrate via simulation that the weighted rank-sum statistic controls the type I error rate effectively and, under certain alternatives, is more powerful than the statistics of O’Brien (1984) and Huang et al. (2005).

Keywords: Asymptotic normality, Behrens-Fisher problem, Case-Control, Clinical trials, Multiple endpoints, Rank-sum statistics, Weights

1. Introduction

Comparison of two or more samples with multiple endpoints is a common statistical problem in biomedical research. As an example, O’Brien (1984) described a randomized clinical trial of two therapies for the treatment of diabetes to investigate whether the experimental therapy yields better nerve function as measured by 34 electromyographic variables. Huang et al. (2005) gave another example of a clinical trial of Coenzyme Q₁₀ in treating early Parkinson’s disease to slow the functional decline of the disease, as indexed by a number of outcome measures, including mentation, motor and average daily living scales. Other examples can be found in Pocock, Geller and Tsiatis (1987), Shames et al. (1998), Tilley et al. (1999), and Li, Zhao and Paty (2001), to name a few.

Hotelling’s T² and the Bonferroni procedure are two popular approaches for comparing two multivariate samples. Hotelling’s T² is a global test statistic and makes no distinction between variables in their direction of change. The Bonferroni procedure assigns the Type I error for each variable and then tests the null hypothesis concerning each individual variable. Noting these drawbacks of the two methods, O’Brien (1984) proposed a nonparametric procedure, a rank-sum-type test, which is based on the rank of each individual variable among the combined observations from the two samples. Under the null hypothesis that the two multivariate samples have the same distribution, O’Brien’s (1984) rank-sum test statistic asymptotically is distribution-free and follows a standard normal distribution. Huang et al. (2005) noticed that under a more general null hypothesis in the Behrens-Fisher problem, e.g. Troendle (2002), O’Brien’s (1984) test statistics are no longer distribution-free and can substantially inflate the Type I error rate when used for testing the general Behrens-Fisher hypothesis. Subsequently Huang et al. (2005) provided a modification of O’Brien’s (1984) test by adjusting for the variances of the rank sums.

Generalizing O’Brien’s (1984) rank-sum test and the modified test of Huang et al. (2005), we propose a weighted rank-sum statistic for testing the general nonparametric Behrens-Fisher hypothesis. The weights can be chosen to be constants emphasizing the importance of the individual variables, or they can be chosen to minimize the variance of the weighted rank-sum statistic. Under mild conditions, the weighted rank-sum statistic is asymptotically normally distributed, thus permitting the computation of power, p-values and confidence intervals. Simulation studies demonstrate that the weighted rank-sum statistic controls the type I error rate effectively and is more powerful than the statistics of O’Brien (1984) and Huang et al. (2005) for certain alternatives.

2. Weighted-Rank-Sum Statistics for the Behrens-Fisher Problem

Suppose our interest is to compare the distribution of two p-dimensional variables, X = (X₁, ···, X_p)’, and Y = (Y₁, ···, Y_p)’, representing the outcomes of p endpoints from subjects in, say, the standard therapy arm and the experimental therapy arm in a clinical trial, or the controls and cases in a case-control study, respectively. We assume that X and Y follow distributions F and G, with marginal distributions F_a and G_a of X_a and Y_a respectively, where a = 1, ···, p. Following Huang et al. (2005), we define

θ_{a} = \Pr (X_{a} < Y_{a}) - \Pr (X_{a} > Y_{a}), a = 1, \dots, p,

(1)

and consider testing the null hypothesis

H_{0} : θ_{1} = \dots = θ_{p} = 0 .

(2)

This is a nonparametric version of the Behrens-Fisher problem. The null space under H₀, {(F, G): θ₁ = ··· = θ_p = 0}, is larger than the usual null space {(F, G): F = G} under the more restrictive hypothesis $H_{0}^{'}$ : F = G. In a clinical trial setting θ_a can be viewed as a measure of the marginal treatment efficacy (corresponding to the ath endpoint) of the experimental therapy relative to the standard therapy, assuming that larger outcomes indicate better treatment results. Thus a larger positive value of θ_a indicates better treatment results with respect to the ath outcome variable for the experimental therapy than for the standard therapy.
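As a small illustration of Eq. (1), θ_a can be estimated for a single endpoint by the proportion of (x, y) pairs with x < y minus the proportion with x > y, which is the estimator used in Section 3. The sketch below is only illustrative; the function name `theta_hat` and the toy data are assumptions, not part of the original paper.

```python
def theta_hat(x, y):
    """Empirical estimate of Pr(X < Y) - Pr(X > Y) for one endpoint."""
    total = 0
    for xi in x:
        for yj in y:
            # Each pair contributes +1, -1, or 0 (for a tie).
            total += (xi < yj) - (xi > yj)
    return total / (len(x) * len(y))


# Hypothetical toy data for one endpoint.
x_sample = [1.0, 2.0, 3.0]
y_sample = [2.0, 4.0]
estimate = theta_hat(x_sample, y_sample)
print(estimate)
```

A positive estimate points toward larger outcomes in the Y-sample, and comparing a sample with itself gives exactly zero because every pair is counted once in each direction.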

2.1 Rank-sum type test statistics

Let x_i = (x_i1, ···, x_ip)’, i = 1, ···, m, be the outcomes for the ith subject from the X-sample and y_j = (y_j1, ···, y_jp)’, j = 1, ···, n, be the outcomes of the jth subject from the Y-sample, and write N = m + n. For the ath outcome variable, a = 1, ···, p, we combine the two samples and rank the N observations x_1a, ···, x_ma, y_1a, ···, y_na, and denote by R_xia and R_yja the midranks of x_ia and y_ja, respectively. Then we obtain $S_{x i} = \sum_{a = 1}^{p} R_{x i a}$ for each subject, i = 1, ···, m, from the X-sample, and $S_{y j} = \sum_{a = 1}^{p} R_{y j a}$ for each subject, j = 1, ···, n, from the Y-sample, by summing up the ranks of the p variables. O’Brien (1984) suggested reducing the problem of comparing two multivariate distributions to one of comparing the rank sums between {S_xi, i = 1, ···, m} and {S_yj, j = 1, ···, n} using the usual two-sample t-tests. This yields the t-test statistics

T_{1} = \frac{{\overset{‒}{S}}_{y} - {\overset{‒}{S}}_{x}}{\hat{σ} \sqrt{1 ∕ m + 1 ∕ n}}, T_{2} = \frac{{\overset{‒}{S}}_{y} - {\overset{‒}{S}}_{x}}{\sqrt{{\hat{σ}}_{x}^{2} ∕ m + {\hat{σ}}_{y}^{2} ∕ n}},

(3)

analogous to the usual two-sample t-test with equal variances and unequal variances, respectively, where ${\overset{‒}{S}}_{x} = \sum_{i = 1}^{m} S_{x i} ∕ m$ , ${\overset{‒}{S}}_{y} = \sum_{j = 1}^{n} S_{y j} ∕ n$ , ${\hat{σ}}_{x}^{2} = \sum_{i = 1}^{m} {(S_{x i} - {\overset{‒}{S}}_{x})}^{2} ∕ (m - 1)$ , ${\hat{σ}}_{y}^{2} = \sum_{j = 1}^{n} {(S_{y j} - {\overset{‒}{S}}_{y})}^{2} ∕ (n - 1)$ and ${\hat{σ}}^{2} = ((m - 1) {\hat{σ}}_{x}^{2} + (n - 1) {\hat{σ}}_{y}^{2}) ∕ (N - 2)$ .
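The construction of T₁ and T₂ in Eq. (3) can be sketched in a few lines: rank each endpoint in the pooled sample, sum the midranks per subject, then apply the pooled-variance and unequal-variance t-type formulas. The helper names (`midranks`, `obrien_tests`) and the toy data are assumptions made for illustration only.

```python
from math import sqrt


def midranks(values):
    """1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def obrien_tests(xs, ys):
    """xs: m rows of p outcomes, ys: n rows of p outcomes. Returns (T1, T2)."""
    m, n, p = len(xs), len(ys), len(xs[0])
    rank_sums = [0.0] * (m + n)
    for a in range(p):
        pooled = [row[a] for row in xs] + [row[a] for row in ys]
        for k, r in enumerate(midranks(pooled)):
            rank_sums[k] += r
    sx, sy = rank_sums[:m], rank_sums[m:]
    mean_x, mean_y = sum(sx) / m, sum(sy) / n
    var_x = sum((v - mean_x) ** 2 for v in sx) / (m - 1)
    var_y = sum((v - mean_y) ** 2 for v in sy) / (n - 1)
    pooled_var = ((m - 1) * var_x + (n - 1) * var_y) / (m + n - 2)
    t1 = (mean_y - mean_x) / sqrt(pooled_var * (1 / m + 1 / n))
    t2 = (mean_y - mean_x) / sqrt(var_x / m + var_y / n)
    return t1, t2


# Hypothetical toy data: 3 subjects per arm, 2 endpoints.
xs = [[1, 10], [2, 30], [3, 20]]
ys = [[4, 40], [6, 60], [5, 50]]
t1, t2 = obrien_tests(xs, ys)
```

With equal sample sizes the pooled and unpooled denominators coincide, so the two statistics agree, which matches the near-identical T₁ and T₂ columns of Table 1 when m = n.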

Huang et al. (2005) noticed that under the more restricted null hypothesis, $H_{0}^{'}$ : F = G, both T₁ and T₂ asymptotically follow the standard normal distribution. However, when F ≠ G these two statistics remain asymptotically normally distributed, but with nonunit variances. When used to test the Behrens-Fisher hypothesis H₀, these test statistics can substantially inflate the Type I error rate, as demonstrated in Huang et al. (2005). To make O’Brien’s (1984) test suitable for testing the null hypothesis H₀, Huang et al. (2005) derived the asymptotic variances of the two statistics and suggested using the following two modified test statistics for H₀:

T_{1 a} = \frac{{\overset{‒}{S}}_{y} - {\overset{‒}{S}}_{x}}{\hat{σ} \sqrt{{\hat{h}}_{1} (1 ∕ m + 1 ∕ n)}}, T_{2 a} = \frac{{\overset{‒}{S}}_{y} - {\overset{‒}{S}}_{x}}{\sqrt{{\hat{h}}_{2} ({\hat{σ}}_{x}^{2} ∕ m + {\hat{σ}}_{y}^{2} ∕ n)}},

(4)

where ${\hat{h}}_{1}$ and ${\hat{h}}_{2}$ are consistent estimates of

h_{1} = \frac{\sum_{a = 1}^{p} \sum_{b = 1}^{p} {(1 + λ)}^{2} (c_{a b} + d_{a b} λ)}{\sum_{a = 1}^{p} \sum_{b = 1}^{p} (e_{a b} λ^{3} + (d_{a b} + 2 f_{a b}) λ^{2} + (c_{a b} + 2 η_{a b}) λ + ξ_{a b})},

and

h_{2} = \frac{\sum_{a = 1}^{p} \sum_{b = 1}^{p} {(1 + λ)}^{2} (c_{a b} + d_{a b} λ)}{\sum_{a = 1}^{p} \sum_{b = 1}^{p} (d_{a b} λ^{3} + (e_{a b} + 2 η_{a b}) λ^{2} + (ξ_{a b} + 2 f_{a b}) λ + c_{a b})},

respectively, with $λ = m ∕ n$ , $c_{a b} = Cov (G_{a} (X_{a}), G_{b} (X_{b}))$ , $d_{a b} = Cov (F_{a} (Y_{a}), F_{b} (Y_{b}))$ , $e_{a b} = Cov (F_{a} (X_{a}), F_{b} (X_{b}))$ , $f_{a b} = Cov (F_{a} (X_{a}), G_{b} (X_{b}))$ , $ξ_{a b} = Cov (G_{a} (Y_{a}), G_{b} (Y_{b}))$ , and $η_{a b} = Cov (G_{a} (Y_{a}), F_{b} (Y_{b}))$ .

Huang et al. (2005) further showed that, under the null hypothesis H₀, the two test statistics asymptotically follow the standard normal distribution and thus the Type I error rate can be controlled at significance level α by rejecting H₀ if the magnitude of the test statistic exceeds the critical value of Φ^-1(1 - α/2), where Φ is the standard normal distribution function.
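The rejection rule above only needs the standard normal quantile Φ^-1(1 - α/2). A minimal sketch using Python's standard library follows; the function name `two_sided_decision` and the example value 2.3 are hypothetical.

```python
from statistics import NormalDist


def two_sided_decision(t_stat, alpha=0.05):
    """Return (reject, critical_value, p_value) for a two-sided normal test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    critical = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))
    return abs(t_stat) > critical, critical, p_value


# A hypothetical observed statistic, used only to exercise the function.
reject, critical, p_value = two_sided_decision(2.3)
```

At α = 0.05 the critical value is approximately 1.96, so a statistic of 2.3 in absolute value leads to rejection.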

2.2 Weighted rank-sum statistics

O’Brien’s (1984) rank-sum test and the modified version of Huang et al. (2005) gave equal weights to the rank of each individual variable. In many situations unequal weights are desirable so that different emphasis can be assigned to different variables. Moreover, statistical optimization requires that the weights be proportional to the reciprocals of the variances of the variables to be combined, when the variables are mutually independent. Taking these arguments into consideration, we propose using weighted rank-sum statistics. Let w_a ≥ 0, a = 1, . . ., p, be a constant or a random variable. The weighted rank for the ith subject in the X-sample is defined as $R_{x i} = \sum_{a = 1}^{p} w_{a} R_{x i a}$ , i = 1, ···, m, and the weighted rank for the jth subject in the Y-sample is $R_{y j} = \sum_{a = 1}^{p} w_{a} R_{y j a}$ , j = 1, ···, n. Setting w_a = 1 for every a, a = 1, . . . , p, leads to the test statistics considered by O’Brien (1984) or Huang et al. (2005). Moreover, if only a subset of the p variables is of interest, we can set the weights to be one for variables in the subset and zero for variables not in the subset.
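The weighted ranks R_xi and R_yj defined above can be computed directly from the pooled midranks. The sketch below uses hypothetical helper names and toy data; it shows that unit weights recover the unweighted rank sums and that a 0/1 weight vector restricts attention to a subset of endpoints.

```python
def midranks(values):
    """1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def weighted_rank_sums(xs, ys, w):
    """Per-subject weighted ranks for the X-sample and the Y-sample."""
    m, p = len(xs), len(xs[0])
    totals = [0.0] * (m + len(ys))
    for a in range(p):
        pooled = [row[a] for row in xs] + [row[a] for row in ys]
        for k, r in enumerate(midranks(pooled)):
            totals[k] += w[a] * r
    return totals[:m], totals[m:]


# Hypothetical toy data: 3 subjects per arm, 2 endpoints.
xs = [[1, 10], [2, 30], [3, 20]]
ys = [[4, 40], [6, 60], [5, 50]]
equal_x, equal_y = weighted_rank_sums(xs, ys, [1, 1])  # unweighted rank sums
first_x, first_y = weighted_rank_sums(xs, ys, [1, 0])  # first endpoint only
```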

Using the weighted rank sum, we propose the following two test statistics:

T_{w 1} = \frac{{\overset{‒}{R}}_{y} - {\overset{‒}{R}}_{x}}{\hat{σ} \sqrt{{\hat{h}}_{w 1} (1 ∕ m + 1 ∕ n)}}, T_{w 2} = \frac{{\overset{‒}{R}}_{y} - {\overset{‒}{R}}_{x}}{\sqrt{{\hat{h}}_{w 2} ({\hat{σ}}_{x}^{2} ∕ m + {\hat{σ}}_{y}^{2} ∕ n)}},

(5)

where ${\overset{‒}{R}}_{x} = \sum_{i = 1}^{m} R_{x i} ∕ m$ , ${\overset{‒}{R}}_{y} = \sum_{j = 1}^{n} R_{y j} ∕ n$ , ${\hat{σ}}_{x}^{2} = \sum_{i = 1}^{m} {(R_{x i} - {\overset{‒}{R}}_{x})}^{2} ∕ (m - 1)$ , ${\hat{σ}}_{y}^{2} = \sum_{j = 1}^{n} {(R_{y j} - {\overset{‒}{R}}_{y})}^{2} ∕ (n - 1)$ and ${\hat{σ}}^{2} = [(m - 1) {\hat{σ}}_{x}^{2} + (n - 1) {\hat{σ}}_{y}^{2}] ∕ (m + n - 2)$ , and where ${\hat{h}}_{w 1}$ and ${\hat{h}}_{w 2}$ are consistent estimates (e.g. empirical estimates) of

h_{w 1} = \frac{\sum_{a = 1}^{p} \sum_{b = 1}^{p} {(1 + λ)}^{2} w_{a} w_{b} (c_{a b} + d_{a b} λ)}{\sum_{a = 1}^{p} \sum_{b = 1}^{p} w_{a} w_{b} [e_{a b} λ^{3} + (d_{a b} + 2 f_{a b}) λ^{2} + (c_{a b} + 2 η_{a b}) λ + ξ_{a b}]},

and

h_{w 2} = \frac{\sum_{a = 1}^{p} \sum_{b = 1}^{p} {(1 + λ)}^{2} w_{a} w_{b} (c_{a b} + d_{a b} λ)}{\sum_{a = 1}^{p} \sum_{b = 1}^{p} w_{a} w_{b} [d_{a b} λ^{3} + (e_{a b} + 2 η_{a b}) λ^{2} + (ξ_{a b} + 2 f_{a b}) λ + c_{a b}]},

respectively, with c_ab, d_ab, e_ab, f_ab, ξ_ab and η_ab as for Eq. (4).

We have the following results.

Theorem 1

Under the null hypothesis H₀, T_w1 and T_w2 both converge in distribution to the standard normal distribution as min{m, n} → ∞ with m/n → λ, 0 < λ < ∞.

Therefore H₀ is rejected if |T_w1| or |T_w2| is larger than Φ^-1(1 - α/2); both tests asymptotically maintain the Type I error rate at the nominal level of α.

Proof of Theorem 1

Denote by I_{·} the indicator function of a set. Let Λ = (λ_ab), with $λ_{a b} = Cov ({\overset{‒}{R}}_{y a} - {\overset{‒}{R}}_{x a}, {\overset{‒}{R}}_{y b} - {\overset{‒}{R}}_{x b})$ and let w = (w₁, ···, w_p)’. Then

{\overset{‒}{R}}_{y} - {\overset{‒}{R}}_{x} = \frac{m + n}{2 m n} \sum_{a = 1}^{p} w_{a} \sum_{i = 1}^{m} \sum_{j = 1}^{n} \{I_{\{x_{i a} < y_{j a}\}} - I_{\{x_{i a} > y_{j a}\}}\} = \sqrt{m + n} w^{'} U,

where

U = \frac{\sqrt{m + n}}{2 m n} \left(\sum_{i = 1}^{m} \sum_{j = 1}^{n} \{I_{\{x_{i 1} < y_{j 1}\}} - I_{\{x_{i 1} > y_{j 1}\}}\}, \dots, \sum_{i = 1}^{m} \sum_{j = 1}^{n} \{I_{\{x_{i p} < y_{j p}\}} - I_{\{x_{i p} > y_{j p}\}}\}\right)^{'} .

Note that U is a p-dimensional vector with each element being a U-statistic. It follows from standard asymptotic theory on U-statistics that, under the null hypothesis H₀, U converges to a p-variate normal distribution with mean 0 = (0, ···, 0)’ and variance-covariance matrix Δ = Cov(U) as min{m, n} → ∞ and 0 < m/n < ∞. Therefore $({\overset{‒}{R}}_{y} - {\overset{‒}{R}}_{x}) ∕ \sqrt{w^{'} Δ w}$ asymptotically follows a standard normal distribution under the null hypothesis H₀. Hence, it suffices to show that $\hat{σ} \sqrt{{\hat{h}}_{w 1} (1 ∕ m + 1 ∕ n)}$ and $\sqrt{{\hat{h}}_{w 2} ({\hat{σ}}_{x}^{2} ∕ m + {\hat{σ}}_{y}^{2} ∕ n)}$ are consistent estimates of $\sqrt{w^{'} Δ w}$ , which can be derived following the proof of Theorem 1 and Theorem 2 in Huang et al. (2005).
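The first identity in the proof, which rewrites the difference of mean weighted ranks as a weighted sum of pairwise comparisons, can be checked numerically, including in the presence of ties. The sketch below uses hypothetical function names and toy data and simply evaluates both sides.

```python
def midranks(values):
    """1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mean_rank_difference(xs, ys, w):
    """Left-hand side: weighted mean rank of Y minus that of X."""
    m, n, p = len(xs), len(ys), len(xs[0])
    diff = 0.0
    for a in range(p):
        r = midranks([row[a] for row in xs] + [row[a] for row in ys])
        diff += w[a] * (sum(r[m:]) / n - sum(r[:m]) / m)
    return diff


def pairwise_form(xs, ys, w):
    """Right-hand side: (m + n) / (2 m n) times the weighted pair counts."""
    m, n, p = len(xs), len(ys), len(xs[0])
    total = 0.0
    for a in range(p):
        count = sum((x[a] < y[a]) - (x[a] > y[a]) for x in xs for y in ys)
        total += w[a] * count
    return (m + n) / (2 * m * n) * total


# Hypothetical toy data with ties, m = 4, n = 3, p = 2.
xs = [[1, 5], [2, 2], [4, 7], [2, 3]]
ys = [[2, 6], [3, 1], [5, 7]]
w = [0.3, 0.7]
lhs = mean_rank_difference(xs, ys, w)
rhs = pairwise_form(xs, ys, w)
```

The two values agree up to floating-point rounding, which is why the asymptotics of the weighted rank-sum reduce to those of a vector of two-sample U-statistics.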

3. Selection of Weights

The weights w can be chosen to meet practical needs, for example, to exclude some variables by setting their weights to zero. In other situations the weights can be determined so that the variance of the weighted rank-sum is minimized at certain parameter values in the null or alternative hypothesis space. Here we search for the w that minimizes the variance V(w) of ${\overset{‒}{R}}_{y} - {\overset{‒}{R}}_{x}$ under the null hypothesis. To this end, we first give the following definitions. For any a ∈ {1, ···, p}, define R_y(x_ia) to be the midrank of x_ia among {x_ia, y_1a, ···, y_na}, R_x(x_ia) the midrank of x_ia among {x_1a, ···, x_ma}, R_x(y_ja) the midrank of y_ja among {x_1a, ···, x_ma, y_ja}, and R_y(y_ja) the midrank of y_ja among {y_1a, ···, y_na}. Let I_k be the identity matrix of order k, J_k be the column vector of order k whose elements are all 1, and define $H_{k} = I_{k} - J_{k} J_{k}^{'} ∕ k$ . Then, following Huang et al. (2005), we can obtain consistent estimates ${\hat{h}}_{w 1}$ and ${\hat{h}}_{w 2}$ of h_w1 and h_w2 as

{\hat{h}}_{w 1} = \frac{s^{2}}{m n} \times \frac{w^{'} (P^{'} P + U^{'} U) w}{w^{'} [{(P + Q)}^{'} (P + Q) + {(U + V)}^{'} (U + V)] w},

and

{\hat{h}}_{w 2} = s^{2} \times \frac{w^{'} (P^{'} P + U^{'} U) w}{w^{'} [n^{2} {(P + Q)}^{'} (P + Q) + m^{2} {(U + V)}^{'} (U + V)] w},

respectively, where s = m + n, P = (p_ia)_m×p with $p_{i a} = 2 R_{y} (x_{i a}) - 2 - n + n {\hat{θ}}_{a}$ , Q = (q_ia)_m×p with q_ia = 2R_x(x_ia) - 1 - m, U = (u_ja)_n×p with $u_{j a} = 2 R_{x} (y_{j a}) - 2 - m - m {\hat{θ}}_{a}$ , and V = (v_ja)_n×p with v_ja = 2R_y(y_ja) - 1 - n, where i = 1, ···, m, a = 1, ···, p, j = 1, ···, n, and ${\hat{θ}}_{a} = \sum_{i = 1}^{m} \sum_{j = 1}^{n} \{I_{\{x_{i a} < y_{j a}\}} - I_{\{x_{i a} > y_{j a}\}}\} ∕ (m n)$ .
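The estimates of h_w1 and h_w2 depend on the matrices only through the vectors Pw, Qw, Uw and Vw, since w'(P'P)w is the squared length of Pw. The sketch below exploits this. It takes s = m + n, which is an assumption chosen to agree with the variance estimates V̂₁ and V̂₂ given in this section; the helper names and toy data are likewise hypothetical.

```python
def midranks(values):
    """1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_in(value, others):
    """Midrank of `value` within the set formed by `value` and `others`."""
    below = sum(o < value for o in others)
    ties = sum(o == value for o in others)
    return 1 + below + 0.5 * ties


def h_hat(xs, ys, w):
    """Empirical (h_w1, h_w2), computed through Pw, Qw, Uw and Vw."""
    m, n, p = len(xs), len(ys), len(xs[0])
    Pw, Qw = [0.0] * m, [0.0] * m
    Uw, Vw = [0.0] * n, [0.0] * n
    for a in range(p):
        xa = [row[a] for row in xs]
        ya = [row[a] for row in ys]
        theta = sum((x < y) - (x > y) for x in xa for y in ya) / (m * n)
        rx, ry = midranks(xa), midranks(ya)
        for i in range(m):
            Pw[i] += w[a] * (2 * rank_in(xa[i], ya) - 2 - n + n * theta)
            Qw[i] += w[a] * (2 * rx[i] - 1 - m)
        for j in range(n):
            Uw[j] += w[a] * (2 * rank_in(ya[j], xa) - 2 - m - m * theta)
            Vw[j] += w[a] * (2 * ry[j] - 1 - n)
    numerator = sum(v * v for v in Pw) + sum(v * v for v in Uw)
    pq = sum((u + v) ** 2 for u, v in zip(Pw, Qw))
    uv = sum((u + v) ** 2 for u, v in zip(Uw, Vw))
    s = m + n  # ASSUMPTION: s taken to be the total sample size
    h1 = s ** 2 / (m * n) * numerator / (pq + uv)
    h2 = s ** 2 * numerator / (n ** 2 * pq + m ** 2 * uv)
    return h1, h2


# Hypothetical toy data with m = n = 4 and p = 2.
xs = [[1, 5], [2, 2], [4, 7], [2, 3]]
ys = [[2, 6], [3, 1], [5, 7], [1, 4]]
h1, h2 = h_hat(xs, ys, [0.3, 0.7])
```

Two properties follow from the formulas and make convenient sanity checks: the two estimates coincide when m = n, and both are unchanged when w is multiplied by a positive constant.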

Hence, for T_w1 and T_w2, we have the estimated variances of ${\overset{‒}{R}}_{y} - {\overset{‒}{R}}_{x}$ ,

\hat{V_{1} (w)} = \frac{{(m + n)}^{3}}{m^{2} n^{2} (m + n - 2)} \times \frac{w^{'} [M_{x}^{'} H_{m} M_{x} + M_{y}^{'} H_{n} M_{y}] w w^{'} (P^{'} P + U^{'} U) w}{w^{'} [{(P + Q)}^{'} (P + Q) + {(U + V)}^{'} (U + V)] w},

and

\hat{V_{2} (w)} = {(m + n)}^{2} \times \frac{w^{'} [\frac{M_{x}^{'} H_{m} M_{x}}{m (m - 1)} + \frac{M_{y}^{'} H_{n} M_{y}}{n (n - 1)}] w w^{'} (P^{'} P + U^{'} U) w}{w^{'} [n^{2} {(P + Q)}^{'} (P + Q) + m^{2} {(U + V)}^{'} (U + V)] w},

where M_x = (R_xia) and M_y = (R_yja), the rank matrix for the X-sample and Y-sample, respectively.

The optimal weights w₁ (or w₂) are those that minimize the variance of ${\overset{‒}{R}}_{y} - {\overset{‒}{R}}_{x}$ , and they can be estimated by

{\hat{w}}_{1} = {argmin}_{w^{'} J_{p} = 1, w \geq 0} \hat{V_{1} (w)}, {\hat{w}}_{2} = {argmin}_{w^{'} J_{p} = 1, w \geq 0} \hat{V_{2} (w)} .

The weights and their estimates can be computed only numerically, since there are no closed forms. Furthermore numerical results show that ${\hat{w}}_{1} \approx {\hat{w}}_{2}$ . This is understandable since, as pointed out earlier, both $\hat{σ} \sqrt{{\hat{h}}_{w 1} (1 ∕ m + 1 ∕ n)}$ and $\sqrt{{\hat{h}}_{w 2} ({\hat{σ}}_{x}^{2} ∕ m + {\hat{σ}}_{y}^{2} ∕ n)}$ are consistent estimates of $\sqrt{w^{'} Δ w}$ .
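Because no closed form is available, one simple numerical option for p = 2 is a grid search over the simplex w = (t, 1 - t), 0 ≤ t ≤ 1. The sketch below evaluates the estimated variance V̂₂(w) on the grid, using the fact that w'M_x'H_m M_x w equals (m - 1) times the sample variance of the weighted ranks R_xi. The grid search, the helper names and the toy data are assumptions for illustration; the paper does not specify which numerical optimizer was used.

```python
def midranks(values):
    """1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_in(value, others):
    """Midrank of `value` within the set formed by `value` and `others`."""
    return 1 + sum(o < value for o in others) + 0.5 * sum(o == value for o in others)


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def v2_hat(xs, ys, w):
    """Estimated variance V2(w) of the weighted mean-rank difference."""
    m, n, p = len(xs), len(ys), len(xs[0])
    Rx, Ry = [0.0] * m, [0.0] * n
    Pw, Qw, Uw, Vw = [0.0] * m, [0.0] * m, [0.0] * n, [0.0] * n
    for a in range(p):
        xa, ya = [r[a] for r in xs], [r[a] for r in ys]
        pooled, rx, ry = midranks(xa + ya), midranks(xa), midranks(ya)
        theta = sum((x < y) - (x > y) for x in xa for y in ya) / (m * n)
        for i in range(m):
            Rx[i] += w[a] * pooled[i]
            Pw[i] += w[a] * (2 * rank_in(xa[i], ya) - 2 - n + n * theta)
            Qw[i] += w[a] * (2 * rx[i] - 1 - m)
        for j in range(n):
            Ry[j] += w[a] * pooled[m + j]
            Uw[j] += w[a] * (2 * rank_in(ya[j], xa) - 2 - m - m * theta)
            Vw[j] += w[a] * (2 * ry[j] - 1 - n)
    numerator = sum(v * v for v in Pw) + sum(v * v for v in Uw)
    pq = sum((u + v) ** 2 for u, v in zip(Pw, Qw))
    uv = sum((u + v) ** 2 for u, v in zip(Uw, Vw))
    scale = sample_variance(Rx) / m + sample_variance(Ry) / n
    return (m + n) ** 2 * scale * numerator / (n ** 2 * pq + m ** 2 * uv)


def best_weights_p2(xs, ys, steps=100):
    """Grid search over w = (t, 1 - t); returns (weights, minimized variance)."""
    best = None
    for k in range(steps + 1):
        w = [k / steps, 1 - k / steps]
        v = v2_hat(xs, ys, w)
        if best is None or v < best[1]:
            best = (w, v)
    return best


# Hypothetical toy data with m = n = 5 and p = 2.
xs = [[1.2, 3.1], [0.4, 2.2], [2.5, 0.9], [1.9, 4.0], [0.7, 1.5]]
ys = [[1.5, 2.8], [2.9, 5.1], [0.9, 0.3], [3.3, 3.6], [2.1, 6.2]]
w_opt, v_opt = best_weights_p2(xs, ys)
```

For larger p a grid becomes impractical, and a constrained optimizer over the simplex would be the natural replacement.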

4. Simulation Study and Real Data Example

4.1 Simulation studies

In this section, we conduct a simulation study to evaluate the type I error rate and power of the proposed tests, T_w1 and T_w2, for comparison with those of O’Brien (1984), T₁ and T₂, and Huang et al. (2005), T_h1 and T_h2 (denoted T_1a and T_2a in Eq. (4)). To this end, we consider X_i = (X_i1, X_i2)’, i = 1, . . ., m, random samples from a bivariate normal distribution with mean (0, 0.5)’ and variance-covariance matrix $(\begin{matrix} 1 & 0.8 \\ 0.8 & 1 \end{matrix})$ , and Y_j = (Y_j1, Y_j2)’, j = 1, . . ., n, random samples from a bivariate normal distribution with mean (0, 0.5)’ and variance-covariance matrix $(\begin{matrix} 4 & 7.2 \\ 7.2 & 16 \end{matrix})$ . Clearly the null hypothesis holds with these two distributions, i.e., for any i and j, Pr(X_i1 < Y_j1) - Pr(X_i1 > Y_j1) = Pr(X_i2 < Y_j2) - Pr(X_i2 > Y_j2) = 0. We generate 10,000 replicates for each pair of m and n selected from {50, 100, 200}. For each replicate the optimal weights are estimated from the simulated data using the method described in Section 3. The simulated type I error rate is the proportion of replicates in which the null hypothesis H₀ is rejected at the nominal significance level of 0.05 (two-sided).

The simulated power is obtained similarly under the same settings except that the mean vector of the X-samples is set to (0,-0.5)’, and the variance-covariance matrices are set to $(\begin{matrix} 1 & 1.13 \\ 1.13 & 2 \end{matrix})$ for the X-sample and $(\begin{matrix} 2 & 2.55 \\ 2.55 & 4 \end{matrix})$ for the Y-sample.
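One way to draw the bivariate normal samples described above is through a Cholesky factor of the covariance matrix. The following sketch generates a single replicate under the null configuration; the function names, the seed and the choice m = 50, n = 100 are assumptions for illustration and do not reproduce the authors' simulation code.

```python
import random
from math import sqrt


def cholesky_2x2(cov):
    """Lower-triangular factor (l11, l21, l22) of a 2 x 2 covariance matrix."""
    l11 = sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = sqrt(cov[1][1] - l21 ** 2)
    return l11, l21, l22


def bivariate_normal_sample(mean, cov, size, rng):
    """Draw `size` observations from N(mean, cov) using the Cholesky factor."""
    l11, l21, l22 = cholesky_2x2(cov)
    sample = []
    for _ in range(size):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        sample.append([mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2])
    return sample


rng = random.Random(2009)  # arbitrary seed, for reproducibility only
# Null configuration: equal means, unequal covariance matrices.
xs = bivariate_normal_sample([0, 0.5], [[1, 0.8], [0.8, 1]], 50, rng)
ys = bivariate_normal_sample([0, 0.5], [[4, 7.2], [7.2, 16]], 100, rng)
```

The power configuration is obtained in the same way by changing the X-sample mean and the two covariance matrices to the values given above.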

Table 1 presents the simulation results for the type I error and power. The results indicate that both the proposed tests and those of Huang et al. (2005) effectively maintain the nominal type I error rate, with minor discrepancies possibly due to simulation variation. In comparison, O’Brien’s (1984) tests produce inconsistent type I error rates, mostly inflated above the nominal significance level of 0.05. For example, with m = 200, n = 100, the empirical type I error rates are 0.101 and 0.059 for O’Brien’s tests, 0.051 and 0.050 for the tests of Huang et al. (2005), and 0.054 and 0.053 for the two proposed tests. The power of the proposed tests is substantially higher than that of the tests of O’Brien (1984) and Huang et al. (2005). For example, with m = 100, n = 50, the power values are 0.394 and 0.351 for O’Brien’s (1984) tests, 0.360 and 0.355 for the tests of Huang et al. (2005), and 0.567 and 0.563 for the proposed tests, more than 15 percentage points higher than the other tests. It is worth noting that even when O’Brien’s (1984) test T₁ produces a type I error rate below the nominal level (m = 50, n = 100 and m = 100, n = 200), the proposed tests still achieve considerably higher power than the other tests.

Table 1.

Simulated type I error rates and power at the nominal significance level of 0.05 (10,000 replicates)

m	n	T₁	T₂	T_h1	T_h2	T_w1	T_w2
Type I error rate
50	50	0.064	0.063	0.052	0.051	0.056	0.055
100	100	0.066	0.066	0.052	0.051	0.052	0.052
200	200	0.061	0.061	0.048	0.048	0.049	0.049
50	100	0.031	0.069	0.051	0.051	0.053	0.053
100	200	0.029	0.067	0.049	0.050	0.050	0.051
100	50	0.105	0.061	0.056	0.054	0.060	0.059
200	100	0.101	0.059	0.051	0.050	0.054	0.053

Power
50	50	0.311	0.311	0.316	0.315	0.512	0.511
100	100	0.534	0.533	0.535	0.535	0.734	0.734
200	200	0.825	0.825	0.825	0.825	0.933	0.933
50	100	0.375	0.431	0.436	0.433	0.637	0.635
100	200	0.664	0.716	0.717	0.716	0.866	0.865
100	50	0.394	0.351	0.360	0.355	0.567	0.563
200	100	0.641	0.591	0.596	0.593	0.790	0.788
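As a rough yardstick for reading the type I error rows of Table 1, the Monte Carlo standard error of a rejection proportion estimated from 10,000 replicates can be computed from the binomial variance. The sketch below does this for a true rate of 0.05; the function name is hypothetical and the ±1.96 band is the usual normal approximation, not a calculation reported in the paper.

```python
from math import sqrt


def mc_standard_error(rate, replicates):
    """Binomial standard error of an estimated rejection proportion."""
    return sqrt(rate * (1 - rate) / replicates)


se = mc_standard_error(0.05, 10_000)
lower = 0.05 - 1.96 * se
upper = 0.05 + 1.96 * se
print(f"standard error = {se:.4f}, approximate 95% band = ({lower:.4f}, {upper:.4f})")
```

The standard error is about 0.0022, so simulated rates within roughly 0.046 to 0.054 are compatible with a true rate of 0.05 under this approximation.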


4.2 An example

Investigating the role of certain growth hormones is one major objective of The Growth and Maturation in Children with Autism or Autistic Spectrum Disorder (ASD) Study (the Autism/ASD Study), a case-control study conducted by the Eunice Kennedy Shriver National Institute of Child Health and Human Development in 2002-2005; see Mills et al. (2007) for details of subject enrollment and data collection. The study enrolled eighty-one subjects, 75 boys and 6 girls, diagnosed as having autism/ASD, and eighty age-matched controls (59 boys and 21 girls). Blood samples were assayed for insulin-like growth factors (IGF-1, IGF-2), insulin-like growth factor binding protein (IGFBP-3), and growth hormone binding protein (GHBP), as well as for dehydroepiandrosterone (DHEA) and DHEA-sulphate (DHEAS).

To illustrate the proposed methods in comparison with other approaches, we exclude from the analysis the data from the girls, due to their small sample size, and four boys in the case group who did not provide blood samples, thus yielding 71 cases and 59 controls in the analysis. We confine our attention to five hormones: insulin-like growth factor 1 (IGF-1), insulin-like growth factor 2 (IGF-2), IGF binding protein (IGFBP-3), growth hormone binding protein (GHBP), and dehydroepiandrosterone (DHEA). DHEA-sulphate (DHEAS) was not included in the analysis since its levels were undetectable in more than half of the subjects (Mills et al., 2007). The question under investigation is whether the levels of any of these growth-related hormones differ between cases and controls.

We applied the proposed tests and the tests of O’Brien (1984) and Huang et al. (2005) to the five growth-related hormone levels in cases and controls. The P-values were 2.86 × 10^-6 and 1.75 × 10^-6 for O’Brien’s tests and 6.28 × 10^-7 and 6.30 × 10^-7 for the two tests of Huang et al. (2005). In contrast, the two proposed tests give P-values of 1.93 × 10^-7 and 3.33 × 10^-7, respectively. For this example, the proposed method appears more powerful than O’Brien’s method, but only slightly better than the tests of Huang et al. (2005). This is partly because the differences between cases and controls all fall in the same direction; that is, for each hormone, the level among cases is higher than among controls.

5. Discussion

For testing the Behrens-Fisher hypothesis, we proposed a weighted rank-sum test statistic that effectively maintains the type I error rate and possesses higher power than the tests of O’Brien (1984) and Huang et al. (2005). The optimal weights do not have closed forms and need to be estimated using available data.

All the tests discussed are nonparametric in nature and are “global” in the sense that they summarize the multi-dimensional data into one-dimensional statistics. Under the most restricted null hypothesis that the two multivariate distributions are identical these tests are asymptotically equivalent. However, under the less restrictive Behrens-Fisher hypotheses, they perform differently. It would be interesting to see how these test statistics behave under other hypotheses.

The proposed test gains its power by accumulating evidence across comparisons on each individual outcome, but may still have low power in situations where the differences in the outcomes between the two samples, as measured by the θ_a’s, exist but fall in different directions (some differences are positive and some are negative or zero). In this regard, more robust tests, such as the one described in Yu et al. (2005), could serve as plausible alternatives.

Acknowledgements

We would like to thank Dr. B.J. Stone for help. The authors are supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (AL, KFY), and the National Cancer Institute (QL, KY), National Institutes of Health. The opinions expressed in the article are not necessarily those of the National Institutes of Health. Q Li is also supported in part by the Knowledge Innovation Program of the Chinese Academy of Sciences, No. 30465W0 and 30475V0. The authors thank the referee, an Associate Editor and the Editor for helpful comments and suggestions.

References

Huang P, Tilley BC, Woolson RF, Lipsitz S. Adjusting O’Brien’s test to control type I error for the generalized nonparametric Behrens-Fisher problem. Biometrics. 2005;61:532–539. doi: 10.1111/j.1541-0420.2005.00322.x.
Li DK, Zhao GJ, Paty DW. Randomized controlled trial of interferon-beta-1a in secondary progressive MS: MRI results. Neurology. 2001;56:1505–1513. doi: 10.1212/wnl.56.11.1505.
Mills JL, Hediger ML, Molloy CA, Chrousos GP, Manning-Courtney P, Yu KF, Brasington M, England LJ. Elevated levels of growth-related hormones in autism and autism spectrum disorder. Clinical Endocrinology. 2007;67:230–237. doi: 10.1111/j.1365-2265.2007.02868.x.
O’Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984;40:1079–1087.
Pocock SJ, Geller NL, Tsiatis AA. The analysis of multiple endpoints in clinical trials. Biometrics. 1987;43:487–498.
Shames RS, Heilbron DC, Janson SL, Kishiyama JL, Au DS, Adelman DC. Clinical differences among women with and without self-reported perimenstrual asthma. Annals of Allergy Asthma Immunology. 1998;81:65–72. doi: 10.1016/S1081-1206(10)63111-0.
Tilley BC, Pillemer SR, Heyse SP, Li S, Clegg DO, Alarcón GS. Global statistical tests for comparing multiple outcomes in rheumatoid arthritis trials. Arthritis and Rheumatism. 1999;42:1879–1888. doi: 10.1002/1529-0131(199909)42:9<1879::AID-ANR12>3.0.CO;2-1.
Troendle JF. A likelihood ratio test for the nonparametric Behrens-Fisher problem. Biometrical Journal. 2002;44:813–824.
Yu K, Gu C, Xiong CJ, An P, Province MA. Global transmission/disequilibrium tests for haplotypes reconstructed from multiple genes. Genetic Epidemiology. 2005;29:323–335. doi: 10.1002/gepi.20102.
