Published in final edited form as: Biometrics. 2012 Sep;68(3):983–989. doi: 10.1111/j.1541-0420.2011.01723.x

Median Tests for Censored Survival Data; a Contingency Table Approach

Shaowu Tang and Jong-Hyeon Jeong

Summary

The median failure time is often utilized to summarize survival data because it has a more straightforward interpretation for investigators in practice than the popular hazard function. However, existing methods for comparing median failure times for censored survival data either require estimation of the probability density function or involve complicated formulas to calculate the variance of the estimates. In this article, we modify a K-sample median test for censored survival data (Brookmeyer and Crowley, 1982, Journal of the American Statistical Association 77, 433–440) through a simple contingency table approach, where each cell counts the number of observations in each sample falling above, or at or below, the pooled median. Under censoring, this approach generates noninteger entries for the cells in the contingency table. We propose to construct a weighted asymptotic test statistic that aggregates dependent χ²-statistics formed at the nearest integer points to the original noninteger entries. We show that this statistic approximately follows a χ²-distribution with k − 1 degrees of freedom. For a small sample case, we propose a test statistic based on combined p-values from Fisher’s exact tests, which follows a χ²-distribution with 2 degrees of freedom. Simulation studies show that the proposed method provides reasonable type I error probabilities and powers. The proposed method is illustrated with two real datasets from phase III breast cancer clinical trials.

Keywords: Censoring, Median failure time, Mood’s median test, Quantile, Survival data

1. Introduction

In clinical research, investigators are often interested in estimating the mean or median time to occurrence of an event in the population under study because the mean or median failure time is intuitively more interpretable than the popular hazard function-based results. For example, in the analysis of data from clinical trials, it would be more straightforward to summarize the efficacy of a new drug in terms of extending the mean or median time from randomization to an event of interest or death by a certain number of years (Perry, Herndon, and Eaton, 1998; Hoskins, Swenerton, and Pike, 2002; Cunningham, Humblet, and Siena, 2004), because the hazard ratio only concerns the constant ratio of two instantaneous failure rates. Also, in one of the real examples we use in Section 4, breast cancer patients in the experimental group were treated with tamoxifen (a hormonal therapy); investigators as well as patients would therefore be interested in knowing how much longer a patient’s remaining life can be prolonged by taking tamoxifen, rather than in the reduction in the hazard ratio due to the hormonal therapy. Because the median is less sensitive to outliers than the mean, the median failure time is often preferred as a natural choice to summarize a time-to-event distribution (Reid, 1981).

The asymptotic variance formulas for 2- or K-sample test statistics for the median involve the probability density function of the underlying distribution (Brookmeyer and Crowley, 1982; Wang and Hettmansperger, 1990). This means that to evaluate a test statistic for an observed dataset one needs to estimate the probability density function under censoring, which is not always straightforward. To avoid estimating the probability density function, Su and Wei (1993) proposed a nonparametric test statistic for comparing two median failure times, based on the minimum dispersion statistic (Basawa and Koul, 1988). However, the algorithm to minimize the dispersion statistic can be cumbersome and time consuming, and their simulation results indicated that the test was conservative in type I error probability, which generally leads to a reduction in power.

In this article, we modify Brookmeyer and Crowley’s K-sample test for censored survival data (Brookmeyer and Crowley, 1982) through a simple contingency table approach, where each cell counts the number of observations in each sample falling above, or at or below, the pooled median. In the method proposed in Brookmeyer and Crowley (1982), the variance formula for the test statistic involves the probability density function of the distribution of failure times, even though it disappears under the null hypothesis with balanced sample proportions. Furthermore, their standardized statistic involves a generalized inverse (g-inverse) of a matrix that might not have full rank, which led the authors to estimate the median from the extrapolated (continuous) version of the weighted Kaplan–Meier estimates, which seems unnatural, especially for small sample cases. The proposed method is more straightforward in that the test statistic is a weighted sum of chi-square statistics from contingency tables formed at the nearest integer points to the original noninteger entries generated under censoring. This makes the implementation much simpler in practice for applied statisticians and medical investigators.

The rest of the article is organized as follows. In Section 2, Brookmeyer and Crowley’s K-sample test is simplified in a contingency table setting, and new statistics are proposed to test the equality of the medians. In Section 3, extensive numerical studies are performed to assess the finite sample properties of the proposed test statistics. In Section 4, the proposed method is illustrated with breast cancer datasets from the National Surgical Adjuvant Breast and Bowel Project (NSABP). We conclude with a brief remark in Section 5.

2. Median Test Statistics—Old and New

Assume that Xij is the failure time of the jth independent subject drawn from group i, with 1 ≤ i ≤ k and 1 ≤ j ≤ ni, and let θ̂pool be the median of the pooled sample. Then for each group one can count the number of observations that are greater than θ̂pool for a median test (Mood, 1950). The results can be summarized in the following 2 × k contingency table:

            Group 1   Group 2   …   Group k   Total
> θ̂pool    n11       n21       …   nk1       m1
≤ θ̂pool    n12       n22       …   nk2       m2
Total       n1        n2        …   nk        N

Here N = m1 + m2 = n1 + ⋯ + nk. For each nis, one can define the expected number µis as

$$\mu_{is} = \frac{n_i m_s}{N}, \qquad i = 1, 2, \ldots, k; \; s = 1, 2.$$

Then a χ²-test statistic can be formed as

$$V = \sum_{i=1}^{k} \sum_{s=1}^{2} \frac{(n_{is} - \mu_{is})^2}{\mu_{is}} \sim \chi^2_{k-1}. \tag{1}$$

It is well known that the chi-square approximation improves as µis increases, and µis ≥ 5 is usually sufficient for a good approximation.
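To make the construction concrete, the following is a minimal sketch of this complete-data median test in Python; the function name mood_median_test and the simulated exponential inputs are illustrative assumptions, not part of the original paper.

```python
import numpy as np
from scipy.stats import chi2

def mood_median_test(groups):
    """groups: list of 1-d arrays of (uncensored) failure times, one per group."""
    pooled = np.concatenate(groups)
    theta = np.median(pooled)                       # pooled sample median
    n_above = np.array([np.sum(g > theta) for g in groups], dtype=float)
    n_below = np.array([len(g) for g in groups]) - n_above
    obs = np.vstack([n_above, n_below])             # the 2 x k table
    N = obs.sum()
    mu = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / N   # mu_is = n_i * m_s / N
    V = np.sum((obs - mu) ** 2 / mu)                # equation (1)
    return V, chi2.sf(V, df=len(groups) - 1)

rng = np.random.default_rng(1)
V, p = mood_median_test([rng.exponential(1.0, 50), rng.exponential(1.5, 50)])
print(f"V = {V:.3f}, p-value = {p:.4f}")
```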

Because the median test described above was designed only for complete data, in this section, we modify it for censored survival data. Let X be time to an event, which is a nonnegative random variable from a homogeneous population, and C be an independent censoring variable from an arbitrary distribution. Denote the corresponding survival function and density function as S(x) and f(x), respectively. Under censoring, one can only observe pairs of random variables (T, δ), where T = min(X,C) and δ = 1 for an event and δ = 0 for a censored observation. The median of the distribution of X, θ, is defined as

$$\theta = \min\{x : S(x) \le \tfrac{1}{2}\}. \tag{2}$$

Now suppose that ni independent observations are drawn from the ith population (i = 1, 2, …, k). Let Tij be the jth observed survival time from the ith population (1 ≤ j ≤ ni), with δij the associated censoring indicator. One can estimate the pooled median failure time θ̂pool as

$$\hat{\theta}_{\mathrm{pool}} = \min\{t : \hat{S}_{\mathrm{pool}}(t) \le \tfrac{1}{2}\},$$

where Ŝpool(t) is the weighted Kaplan–Meier estimate (Brookmeyer and Crowley, 1982) defined as

$$\hat{S}_{\mathrm{pool}}(t) = \sum_{i=1}^{k} \lambda_i^N \hat{S}_i(t) \quad \text{with} \quad \lambda_i^N = \frac{n_i}{N}, \tag{3}$$

where Ŝi(t) is the Kaplan–Meier estimate for the ith population (Kaplan and Meier, 1958).
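As an illustration of how θ̂pool might be computed, the following sketch implements a basic Kaplan–Meier estimator and the pooled estimator (3); km_survival and pooled_median are hypothetical helper names, and this is a simplified sketch rather than the authors’ code.

```python
import numpy as np

def km_survival(times, events):
    """Kaplan-Meier estimate: returns a step function t -> S_hat(t)."""
    t = np.asarray(times, dtype=float)
    d = np.asarray(events, dtype=int)
    uniq = np.unique(t[d == 1])                         # distinct event times
    at_risk = np.array([np.sum(t >= u) for u in uniq])  # numbers at risk
    deaths = np.array([np.sum((t == u) & (d == 1)) for u in uniq])
    surv = np.cumprod(1.0 - deaths / at_risk)           # product-limit estimate
    def S(x):
        idx = np.searchsorted(uniq, x, side="right") - 1
        return 1.0 if idx < 0 else float(surv[idx])
    return S

def pooled_median(groups):
    """groups: list of (times, events) pairs; returns theta_hat_pool and the S_i."""
    N = sum(len(t) for t, _ in groups)
    S_list = [km_survival(t, d) for t, d in groups]
    def S_pool(x):                                      # equation (3)
        return sum(len(t) / N * S(x) for (t, _), S in zip(groups, S_list))
    grid = np.sort(np.concatenate([t for t, _ in groups]))
    # min{t : S_pool(t) <= 1/2}; assumes the pooled estimate drops below 1/2
    theta = next(x for x in grid if S_pool(x) <= 0.5)
    return theta, S_list
```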

Note that for 1 ≤ i ≤ k, the entries in the above contingency table can be generalized for censored data as

$$n_{i1} = \sum_{j=1}^{n_i} \Pr(X_{ij} > \theta_{\mathrm{pool}}),$$

which can be estimated by

$$\hat{n}_{i1} = \sum_{j=1}^{n_i} \Pr(X_{ij} > \hat{\theta}_{\mathrm{pool}}),$$

where Xij is the true failure time of the jth observation in the ith population, and Pr(Xij > θ̂pool) is the estimated probability that the true failure time of the jth observation exceeds θ̂pool; it can be viewed as a ‘pseudocount’ because it does not necessarily take integer values. Specifically, for i = 1, 2, …, k and j = 1, …, ni, we have (Brookmeyer and Crowley, 1982)

$$\Pr(X_{ij} > \hat{\theta}_{\mathrm{pool}}) = \begin{cases} 1, & \text{if } T_{ij} > \hat{\theta}_{\mathrm{pool}} \text{ and } \delta_{ij} = 1, \\ 1, & \text{if } T_{ij} \ge \hat{\theta}_{\mathrm{pool}} \text{ and } \delta_{ij} = 0, \\ 0, & \text{if } T_{ij} \le \hat{\theta}_{\mathrm{pool}} \text{ and } \delta_{ij} = 1, \\ \hat{\alpha}_{ij}, & \text{if } T_{ij} < \hat{\theta}_{\mathrm{pool}} \text{ and } \delta_{ij} = 0, \end{cases}$$

where α̂ij satisfies

$$\hat{\alpha}_{ij} = \Pr(X_{ij} > \hat{\theta}_{\mathrm{pool}} \mid T_{ij} < \hat{\theta}_{\mathrm{pool}}, \delta_{ij} = 0) = \Pr(X_{ij} > \hat{\theta}_{\mathrm{pool}} \mid X_{ij} > T_{ij}) = \frac{\Pr(X_{ij} > \hat{\theta}_{\mathrm{pool}})}{\Pr(X_{ij} > T_{ij})} = \frac{\hat{S}_i(\hat{\theta}_{\mathrm{pool}})}{\hat{S}_i(T_{ij})}.$$

Now one can define

$$\hat{n}_{i2} = \sum_{j=1}^{n_i} \Pr(X_{ij} \le \hat{\theta}_{\mathrm{pool}}) = n_i - \hat{n}_{i1}.$$
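The pseudocounts above translate directly into code. The following sketch reuses km_survival and pooled_median from the previous sketch; pseudocounts is again an illustrative name, and the simulated censored data are assumptions for demonstration only.

```python
import numpy as np

def pseudocounts(times, events, theta_pool, S_i):
    """Estimated Pr(X_ij > theta_hat_pool) for each observation in one group,
    following the four cases above; S_i is that group's Kaplan-Meier function."""
    out = np.empty(len(times))
    for j, (t, d) in enumerate(zip(times, events)):
        if d == 1:                                    # observed event
            out[j] = 1.0 if t > theta_pool else 0.0
        else:                                         # censored observation
            out[j] = 1.0 if t >= theta_pool else S_i(theta_pool) / S_i(t)
    return out

# Demonstration on simulated censored exponential data (an assumption).
rng = np.random.default_rng(7)
def censor(n):
    x, c = rng.exponential(1.0, n), rng.uniform(0.0, 3.0, n)
    return np.minimum(x, c), (x <= c).astype(int)

groups = [censor(50), censor(50)]
theta, S_list = pooled_median(groups)
n_hat1 = [pseudocounts(t, d, theta, S).sum() for (t, d), S in zip(groups, S_list)]
print(theta, n_hat1)   # pooled median and the pseudocount column totals n_hat_i1
```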

Therefore, the 2 × k contingency table for the modified median test can be displayed as follows:

            Group 1   Group 2   …   Group k   Total
> θ̂pool    n̂11       n̂21       …   n̂k1       m̂1
≤ θ̂pool    n̂12       n̂22       …   n̂k2       m̂2
Total       n1        n2        …   nk        N

Note that although n̂i1 + n̂i2 = ni holds for i = 1, 2, …, k, in general n̂i1, n̂i2, and m̂s are not integers. To generate 2 × k contingency tables with neighboring integer entries, let us define

$$\tilde{n}_{i1} = \sup\{n \in \mathbb{N} : n \le \hat{n}_{i1}\} \quad \text{and} \quad \tilde{n}_{i2} = n_i - \tilde{n}_{i1},$$

where ℕ is the set of positive integers. Obviously, if n̂i1 happens to be an integer, then ñi1 = n̂i1 and ñi2 = n̂i2. Otherwise we have

$$\tilde{n}_{i1} < \hat{n}_{i1} < \tilde{n}_{i1} + 1 \quad \text{and} \quad \tilde{n}_{i2} - 1 < \hat{n}_{i2} < \tilde{n}_{i2}.$$

Given each set of the nearest integer points indexed by l (l = 1, 2, …, 2^k), one can construct the following 2 × k contingency table:

            Group 1    Group 2    …   Group k    Total
> θ̂pool    ñ11(l)     ñ21(l)     …   ñk1(l)     m̃1(l)
≤ θ̂pool    ñ12(l)     ñ22(l)     …   ñk2(l)     m̃2(l)
Total       n1         n2         …   nk         N

where ñi1(l) = ñi1 or ñi1 + 1, ñi2(l) = ni − ñi1(l), and m̃s(l) = ñ1s(l) + ⋯ + ñks(l) (s = 1, 2). This means that, for each i, there are only two choices for ñi1(l), so one can construct 2^k such 2 × k contingency tables.

After a χ² statistic with k − 1 degrees of freedom, Vl (l = 1, 2, …, 2^k), is formed for each of the 2^k contingency tables, one can aggregate those 2^k test statistics by assigning weights to the Vl’s. To this end, for i = 1, 2, …, k, let us define

$$\hat{n}_{i1} = \tilde{n}_{i1} + \eta_i \quad \text{with} \quad 0 \le \eta_i < 1,$$

and

$$\varpi_i^{(l)} = \begin{cases} 1 - \eta_i, & \text{if } \tilde{n}_{i1}^{(l)} = \tilde{n}_{i1}, \\ \eta_i, & \text{if } \tilde{n}_{i1}^{(l)} = \tilde{n}_{i1} + 1, \end{cases}$$

and therefore the weight corresponding to the lth 2 × k contingency table is defined as

$$\varpi^{(l)} = \prod_{i=1}^{k} \varpi_i^{(l)}. \tag{4}$$

For example, for k = 2, the four weights can be given as

$$\varpi^{(1)} = (1-\eta_1)(1-\eta_2), \quad \varpi^{(2)} = \eta_1(1-\eta_2), \quad \varpi^{(3)} = (1-\eta_1)\eta_2, \quad \varpi^{(4)} = \eta_1\eta_2. \tag{5}$$

In this case, if both η1 and η2 are small, then more weight is assigned to V1, and if both of them are large, then more weight is assigned to V4.

Therefore, a way of aggregating the information from the 2^k subtables is through a weighted test statistic defined as

$$U = \sum_{l=1}^{2^k} \varpi^{(l)} V_l. \tag{6}$$

It will be shown that U follows a χ2-distribution with (k − 1) degrees of freedom. Proofs of the following lemma and theorem are provided in the Web Appendix.

Lemma 1. Under the null hypothesis of k equal medians, as n1, n2, …, nk → ∞,

$$\mathrm{Corr}(Z_g, Z_h) = 1,$$

where Zl = (ñ11(l), ñ21(l), …, ñk1(l), ñ12(l), ñ22(l), …, ñk2(l)), l = 1, 2, …, 2^k.

Theorem 1. Suppose that Vl (l = 1, 2, …, 2^k) follows a χ²-distribution with (k − 1) degrees of freedom. Then, under the null hypothesis of k equal medians, the statistic $U = \sum_{l=1}^{2^k} \varpi^{(l)} V_l$, where $\sum_{l=1}^{2^k} \varpi^{(l)} = 1$, approximately follows a χ²-distribution with (k − 1) degrees of freedom.
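The following minimal sketch assembles equations (4)–(6): it enumerates the 2^k neighboring integer tables, computes the weight (4) for each, and sums the weighted Pearson statistics. The helper names are illustrative assumptions; as a usage check, feeding in the B-04 pseudocount totals from Section 4 reproduces the combined statistic U = 50.75 reported there.

```python
import numpy as np
from itertools import product
from scipy.stats import chi2

def chi2_stat(obs):
    """Pearson chi-square statistic for a 2 x k table, as in equation (1)."""
    N = obs.sum()
    mu = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / N
    return np.sum((obs - mu) ** 2 / mu)

def weighted_median_stat(n_hat1, n):
    """n_hat1: pseudocount totals above theta_hat_pool; n: group sizes."""
    floor = np.floor(n_hat1)                     # the n~_i1
    eta = n_hat1 - floor                         # fractional parts eta_i
    U = 0.0
    for bits in product([0, 1], repeat=len(n)):  # the 2^k neighbor tables
        top = floor + np.array(bits)             # n~_i1 or n~_i1 + 1
        w = np.prod(np.where(bits, eta, 1.0 - eta))   # weight (4)
        U += w * chi2_stat(np.vstack([top, n - top]))
    return U, chi2.sf(U, df=len(n) - 1)

# The B-04 pseudocount totals from Section 4 reproduce U = 50.75:
U, p = weighted_median_stat(np.array([608.89, 223.57]), np.array([1079, 586]))
print(f"U = {U:.2f}, p = {p:.2e}")
```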

When µis in (1) is small, we can adopt an alternative approach that directly combines information through p-values. Assuming that pl is the p-value associated with Vl (l = 1, 2, …, 2^k) computed from Fisher’s exact test, one can define the weighted p-value statistic associated with U as

$$Q = \sum_{l=1}^{2^k} \varpi^{(l)} Q_l = \sum_{l=1}^{2^k} \varpi^{(l)} \{-2\log(p_l)\},$$

where Ql = −2 log(pl) follows a χ²-distribution with 2 degrees of freedom (Fisher, 1932) under the null hypothesis. Similarly as in Theorem 1, one can show that the statistic Q follows a χ²-distribution with 2 degrees of freedom under the null hypothesis.
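A minimal sketch of this small-sample variant for k = 2, combining two-sided Fisher’s exact p-values across the four neighbor tables, is given below; whether the printed value matches the CombP statistic of 1.88 in Section 4 exactly depends on the two-sided convention of the exact test, so this is an approximate check rather than a definitive implementation.

```python
import numpy as np
from itertools import product
from scipy.stats import chi2, fisher_exact

def combp_median_stat(n_hat1, n):
    """Small-sample variant for k = 2: combine Fisher's exact p-values."""
    floor = np.floor(n_hat1)
    eta = n_hat1 - floor
    Q = 0.0
    for bits in product([0, 1], repeat=2):            # the 2^2 = 4 tables
        top = (floor + np.array(bits)).astype(int)
        w = np.prod(np.where(bits, eta, 1.0 - eta))   # weight (4)
        _, p_l = fisher_exact(np.vstack([top, n - top]))
        Q += w * (-2.0 * np.log(p_l))                 # Q_l = -2 log p_l
    return Q, chi2.sf(Q, df=2)

# The B-14 pseudocount totals from Section 4:
Q, p = combp_median_stat(np.array([12.84, 21.21]), np.array([30, 38]))
print(f"Q = {Q:.2f}, p = {p:.3f}")
```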

3. Simulation Study—Type I Errors and Powers

Extensive simulation studies have been performed to study the type I error probabilities and powers of the proposed method at various nominal levels α, sample sizes ni, and censoring proportions. As in Su and Wei (1993), two scenarios are used to generate failure times for investigating type I error probabilities:

  1. S1(t) = S2(t) = exp(−t).

  2. S1(t) = exp(−t) and S2(t) = 1 − Φ(log(1.44t)).

In case 1, both samples are taken from the same distribution, so the medians are equal. In case 2, the two samples are taken from different distributions that share the same median to two decimal places. Censoring times are generated from Uniform(0, Ci), where Ci determines the censoring proportion for distribution Si(t), i = 1, 2. The Ci’s were chosen so that the censoring proportions are close to those reported in Su and Wei (1993) for comparison. For sample sizes of 30, 50, and 100 per group and various censoring scenarios, 1000 iterations were carried out, and the proportion of replications declaring a significant difference in median failure times between the two samples was evaluated at a given nominal level. The simulation results are summarized in Tables 1 and 2 for significance levels of 5%, 10%, 15%, and 20%. One can notice that the proposed approach tends to be less conservative than Su and Wei’s method (Table 1, Su and Wei, 1993).
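To make the setup concrete, the following minimal sketch runs one replicate of case 1, reusing pooled_median, pseudocounts, and weighted_median_stat from the sketches in Section 2; the censoring bound c_max and all helper names are illustrative assumptions rather than the authors’ simulation code.

```python
import numpy as np

def one_rejection(n=50, c_max=3.0, alpha=0.05, rng=None):
    """One replicate of case 1: two exp(1) samples with Uniform(0, c_max) censoring."""
    if rng is None:
        rng = np.random.default_rng()
    groups = []
    for _ in range(2):
        x = rng.exponential(1.0, n)                   # failure times
        c = rng.uniform(0.0, c_max, n)                # independent censoring
        groups.append((np.minimum(x, c), (x <= c).astype(int)))
    theta, S_list = pooled_median(groups)             # sketch in Section 2
    n_hat1 = np.array([pseudocounts(t, d, theta, S).sum()
                       for (t, d), S in zip(groups, S_list)])
    _, p = weighted_median_stat(n_hat1, np.array([n, n]))
    return p < alpha

rng = np.random.default_rng(2012)
rate = np.mean([one_rejection(rng=rng) for _ in range(1000)])
print(f"empirical type I error at alpha = 0.05: {rate:.3f}")
```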

Table 1.

Empirical type I error probabilities for case 1: S1(t) = S2(t) = exp(−t)

                    Mean censoring proportions
n_i   α      0.43,0.43  0.28,0.28  0.1,0.1  0.01,0.01  0.1,0.28  0.1,0.43
 30   0.05   0.020      0.043      0.050    0.072      0.050     0.040
      0.10   0.066      0.064      0.083    0.086      0.091     0.084
      0.15   0.105      0.120      0.156    0.210      0.159     0.124
      0.20   0.150      0.184      0.203    0.188      0.192     0.187
 50   0.05   0.025      0.035      0.048    0.042      0.055     0.049
      0.10   0.066      0.092      0.085    0.095      0.082     0.078
      0.15   0.103      0.134      0.136    0.125      0.154     0.129
      0.20   0.159      0.173      0.181    0.206      0.178     0.184
100   0.05   0.027      0.034      0.037    0.050      0.040     0.045
      0.10   0.077      0.082      0.097    0.108      0.107     0.081
      0.15   0.110      0.111      0.156    0.132      0.126     0.126
      0.20   0.155      0.177      0.192    0.214      0.184     0.193

Table 2.

Empirical type I error probabilities for case 2: S1(t) = exp(−t), S2(t) = 1 − Φ(log(1.44t))

                    Mean censoring proportions
n_i   α      0.43,0.43  0.28,0.28  0.1,0.1  0.01,0.01  0.1,0.28  0.1,0.43
 30   0.05   0.032      0.044      0.047    0.070      0.043     0.038
      0.10   0.085      0.093      0.094    0.076      0.093     0.096
      0.15   0.109      0.143      0.161    0.185      0.122     0.138
      0.20   0.163      0.191      0.187    0.187      0.212     0.165
 50   0.05   0.030      0.034      0.050    0.046      0.038     0.037
      0.10   0.077      0.076      0.091    0.100      0.091     0.079
      0.15   0.108      0.119      0.151    0.125      0.135     0.126
      0.20   0.157      0.172      0.207    0.197      0.197     0.175
100   0.05   0.035      0.040      0.044    0.051      0.035     0.052
      0.10   0.063      0.074      0.096    0.113      0.098     0.076
      0.15   0.108      0.130      0.140    0.135      0.138     0.114
      0.20   0.168      0.188      0.189    0.180      0.175     0.177

Power analysis has also been performed for various sample sizes and censoring proportions by increasing the median difference at a significance level of 0.05. Data were generated similarly as before from the two survival distributions S1(t) = exp(−t) and S2(t) = exp(−t + t*), a location shift, where t* is the median difference between the two exponential distributions. Table 3 summarizes the proportions of the proposed test statistic U exceeding the upper fifth percentile of the χ²-distribution with 1 degree of freedom, which, as expected, increase quickly with larger median differences and sample sizes.

Table 3.

Empirical powers for S1(t) = exp(−t) and S2(t) = exp(−t + t*), where t* is the median difference; significance level = 0.05

                    Mean censoring proportions
n_i   t*     0.43,0.43  0.28,0.28  0.1,0.1  0.01,0.01  0.1,0.28  0.1,0.43
 50   0.1    0.051      0.076      0.077    0.080      0.078     0.062
      0.2    0.123      0.126      0.167    0.169      0.168     0.158
      0.3    0.248      0.291      0.313    0.323      0.312     0.260
      0.4    0.383      0.448      0.474    0.481      0.450     0.434
      0.5    0.571      0.636      0.669    0.676      0.653     0.608
100   0.1    0.091      0.089      0.117    0.134      0.101     0.086
      0.2    0.243      0.281      0.284    0.305      0.266     0.258
      0.3    0.468      0.522      0.532    0.565      0.558     0.517
      0.4    0.717      0.753      0.811    0.812      0.788     0.775
      0.5    0.892      0.913      0.927    0.937      0.919     0.916
200   0.1    0.122      0.141      0.169    0.185      0.168     0.128
      0.2    0.478      0.494      0.496    0.501      0.491     0.486
      0.3    0.835      0.848      0.847    0.855      0.838     0.831
      0.4    0.953      0.976      0.977    0.978      0.975     0.970
      0.5    0.997      0.998      0.998    0.999      0.998     0.998

We have also compared the powers of the proposed median tests, chi-square and p-value based (CombP), with other existing tests, the log-rank, Gehan (1965), and Brookmeyer and Crowley tests, using the same simulation scenario as in Brookmeyer and Crowley (1982) for the double exponential case. In more detail, for the baseline group, 50 failure times were generated from the double exponential distribution with median 100, and another 50 failure times were generated for the experimental group from the same distribution with decreasing medians, i.e., 100, 99.8, 99.6, 99.4, and 99.2, so that the median differences would be 0, 0.2, 0.4, 0.6, and 0.8, respectively. Censoring times were also generated from the double exponential distribution with median 100.5. Figure 1 shows a Kaplan–Meier plot for one realization of the simulated data when the median difference is 0.6, which exhibits nonproportional hazards (p = 0.027). In fact, we noticed that the double exponential scenario generates a mix of proportional and nonproportional hazards data. Figure 2 shows that all of the compared tests perform reasonably at the significance level of 0.05 under the null hypothesis, even though the proposed methods tend to be slightly conservative. The powers of the proposed tests are close to those from Brookmeyer and Crowley (1982), especially for larger differences in medians. The log-rank test performs poorly compared to the other tests due to the nonproportionality in this case, especially as the difference in the medians increases. We note that this type of comparison could be misleading, because a general test such as the log-rank or Gehan test compares the overall failure time distributions, whereas a quantile test such as the median test compares a specific percentile. For example, the median test might perform better when there is a moderate difference between two failure time distributions around the middle, while the general tests might perform better when there is an early or a late difference between two distributions, with the maximum difference occurring before or after the median failure time. We therefore believe that the comparison with Su and Wei’s and Brookmeyer and Crowley’s test statistics is more direct and meaningful, even though the latter was unnaturally based on the extrapolated version of the weighted Kaplan–Meier estimates, as pointed out in Section 1. Overall, our simulation results indicate that both of the proposed methods are less conservative than Su and Wei’s and comparable to Brookmeyer and Crowley’s, and that the simple p-value-based method might be more powerful than the other methods, especially for small samples.

Figure 1. Kaplan–Meier estimates of a simulated dataset from the double exponential distribution as in Brookmeyer and Crowley (1982).

Figure 2. Power comparison among the proposed methods (chi-square and CombP tests) and other methods (log-rank and Gehan’s tests) based on simulated datasets from the double exponential distribution as in Brookmeyer and Crowley (1982).

4. Application to NSABP Data

In this section, we apply the proposed approach to two datasets from the NSABP studies (B-04 and B-14). The B-04 study (Fisher, Jeong, and Anderson, 2002) was designed to compare radical mastectomy with a less extensive surgery (total mastectomy) with or without radiation therapy. A total of 1079 women with clinically negative axillary nodes received either radical mastectomy or total mastectomy without axillary dissection (with dissection if their nodes became positive). A total of 586 women with clinically positive axillary nodes received either radical mastectomy or total mastectomy without axillary dissection but with postoperative irradiation. The censoring proportion is about 27% in node-negative patients and about 16% in node-positive patients. The second dataset comes from the NSABP B-14 study, in which patients with primary breast cancer, negative axillary nodes, and estrogen receptor positive tumors were randomized to receive either tamoxifen (a hormonal therapy) or placebo following surgery. The trial itself is described in detail in the literature (Fisher, Costantino, and Redmond, 1989). To demonstrate a small sample case, only 68 eligible patients with tumor size greater than 5 cm (30 from the placebo group and 38 from the tamoxifen group) were included in the second analysis. The comparison groups were nodal status (negative versus positive) in B-04 and treatment (placebo versus tamoxifen) in B-14.

The 2 × 2 table containing the sum of the pseudocounts for each cell in the B-04 dataset is given as follows:

            Node negative   Node positive   Total
> θ̂pool    608.89          223.57          832.46
≤ θ̂pool    470.11          362.43          832.54
Total       1079            586             1665

With η1 = 0.89 and η2 = 0.57, equation (5) gives the weights as ϖ(1) = 0.047, ϖ(2) = 0.383, ϖ(3) = 0.063, and ϖ(4) = 0.507. Therefore, the four 2 × 2 contingency tables with neighboring integer entries are given by

Case 1:
            Node negative   Node positive   Total
> θ̂pool    608             223             831
≤ θ̂pool    471             363             834
Total       1079            586             1665

Case 2:
            Node negative   Node positive   Total
> θ̂pool    609             223             832
≤ θ̂pool    470             363             833
Total       1079            586             1665

Case 3:
            Node negative   Node positive   Total
> θ̂pool    608             224             832
≤ θ̂pool    471             362             833
Total       1079            586             1665

Case 4:
            Node negative   Node positive   Total
> θ̂pool    609             224             833
≤ θ̂pool    470             362             832
Total       1079            586             1665

The four corresponding test statistics are V1 = 50.84, V2 = 51.35, V3 = 49.89, and V4 = 50.40, and equation (6) gives the combined test statistic U = 50.75 with a p-value of 1.05 × 10⁻¹², which implies a significant difference in median failure times between node-negative and node-positive women, consistent with the results from Jeong, Jung, and Costantino (2008). Note that the value of the combined test statistic U for the B-04 data is similar to those of the Vl, l = 1, 2, 3, 4, in this large sample case. The following table compares the results with other methods, including the combined p-value approach (CombP):

Method                  Test statistic   p-value
Log-rank                54.8             1.34 × 10⁻¹³
Gehan                   69.1             1.11 × 10⁻¹⁶
Chi-square (proposed)   50.75            1.05 × 10⁻¹²
CombP (proposed)        54.63            1.37 × 10⁻¹²

In this example, even though the statistical conclusions are all the same, Gehan’s test gives the most significant p-value due to the nonproportionality of hazards between node-negative and node-positive patients (p-value from the nonproportionality test = 0.000165).

Now let us consider the NSABP B-14 dataset. The 2 × 2 contingency table containing the sum of pseudocounts from each cell is given by

            Placebo   Tamoxifen   Total
> θ̂pool    12.84     21.21       34.05
≤ θ̂pool    17.16     16.79       33.95
Total       30        38          68

Because η1 = 0.84 and η2 = 0.21 in this small sample example, the weights are calculated as ϖ(1) = 0.13, ϖ(2) = 0.66, ϖ(3) = 0.03, and ϖ(4) = 0.18. Therefore, the four 2 × 2 contingency tables with neighboring integer entries are given by

Case 1:
            Placebo   Tamoxifen   Total
> θ̂pool    12        21          33
≤ θ̂pool    18        17          35
Total       30        38          68

Case 2:
            Placebo   Tamoxifen   Total
> θ̂pool    13        21          34
≤ θ̂pool    17        17          34
Total       30        38          68

Case 3:
            Placebo   Tamoxifen   Total
> θ̂pool    12        22          34
≤ θ̂pool    18        16          34
Total       30        38          68

Case 4:
            Placebo   Tamoxifen   Total
> θ̂pool    13        22          35
≤ θ̂pool    17        16          33
Total       30        38          68

The four corresponding test statistics are V1 = 1.56, V2 = 0.95, V3 = 2.15, and V4 = 1.42, and hence the combined test statistic is U = 1.15 with a χ² (1 degree of freedom) p-value of 0.28, which implies that the difference in median failure times between the two treatment groups among high-risk patients is not significant. The following table presents a comparison with other methods as before:

Method                  Test statistic   p-value
Log-rank                0.7              0.411
Gehan                   1.2              0.269
Chi-square (proposed)   1.15             0.28
CombP (proposed)        1.88             0.390

In this small sample example, the assumption of proportional hazards holds (p = 0.16), so the proposed combined p-value approach provides a result similar to that of the log-rank test, which is optimal under proportional hazards.

5. Conclusion

In this article, Brookmeyer and Crowley’s median test (Brookmeyer and Crowley, 1982) has been modified through a contingency table approach. The proposed method is simple, easy to implement, and does not involve estimation of the probability density function to evaluate the variance of the test statistic. The interpretation of results based on median failure times is generally more straightforward and specific than that of results from the hazard function approach. Our simulation results indicate that the proposed method is less conservative than the results in Su and Wei (1993) and comparable to Brookmeyer and Crowley’s test. The proposed method can easily be extended to adjust for other important (categorical) covariates by stratification; that is, a test statistic can be formed in each subcategory of the other covariates and a stratified test statistic formed by combining them. This article considered only inference on the median, but the results can be easily generalized to any quantile.


Acknowledgements

We are thankful for thoughtful comments from a coeditor, an associate editor, and a referee, which improved the clarity of the article and substantially broadened its scope. This research was supported in part by National Institutes of Health (NIH) grants 5-U10-CA69974-09 and 5-U10-CA69651-11.

Footnotes

Supplementary Materials

The Web Appendix referenced in Section 2 is available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.

REFERENCES

  1. Basawa IV, Koul HL. Large-sample statistics based on quadratic dispersion. International Statistical Review. 1988;56:199–219.
  2. Brookmeyer R, Crowley J. A K-sample median test for censored data. Journal of the American Statistical Association. 1982;77:433–440.
  3. Cunningham D, Humblet Y, Siena S. Cetuximab monotherapy and cetuximab plus irinotecan in irinotecan-refractory metastatic colorectal cancer. New England Journal of Medicine. 2004;351:337–345. doi:10.1056/NEJMoa033025.
  4. Fisher RA. Statistical Methods for Research Workers. London: Oliver and Boyd; 1932.
  5. Fisher B, Costantino J, Redmond C. A randomized clinical trial evaluating tamoxifen in the treatment of patients with node negative breast cancer who have estrogen-receptor-positive tumors. New England Journal of Medicine. 1989;320:479–484. doi:10.1056/NEJM198902233200802.
  6. Fisher B, Jeong J, Anderson S. Twenty-five year findings from a randomized clinical trial comparing radical mastectomy with total mastectomy and with total mastectomy followed by radiation therapy. New England Journal of Medicine. 2002;347:567–575. doi:10.1056/NEJMoa020128.
  7. Gehan EA. A generalized Wilcoxon test for comparing arbitrarily singly censored samples. Biometrika. 1965;52:203–223.
  8. Hoskins PJ, Swenerton KD, Pike JA. Paclitaxel and carboplatin, alone or with irradiation, in advanced or recurrent endometrial cancer: A phase II study. Journal of Clinical Oncology. 2002;19:4048–4053. doi:10.1200/JCO.2001.19.20.4048.
  9. Jeong J-H, Jung SH, Costantino JP. Nonparametric inference on median residual life function. Biometrics. 2008;64:157–163. doi:10.1111/j.1541-0420.2007.00826.x.
  10. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53:457–481.
  11. Mood AM. Introduction to the Theory of Statistics. New York: McGraw-Hill; 1950.
  12. Perry MC, Herndon JE III, Eaton WL. Thoracic radiation therapy added to chemotherapy for small-cell lung cancer: An update of Cancer and Leukemia Group B study 8083. Journal of Clinical Oncology. 1998;16:2466–2467. doi:10.1200/JCO.1998.16.7.2466.
  13. Reid N. Estimating the median survival time. Biometrika. 1981;68:601–608.
  14. Satterthwaite FE. An approximate distribution of estimates of variance components. Biometrics Bulletin. 1946;2:110–114.
  15. Su JQ, Wei LJ. Nonparametric estimation for the difference or ratio of median failure times. Biometrics. 1993;49:603–607.
  16. Wang J-L, Hettmansperger TP. Two-sample inference for median survival times based on one-sample procedures for censored survival data. Journal of the American Statistical Association. 1990;85:529–536.
