Author manuscript; available in PMC: 2017 Aug 30.
Published in final edited form as: J Biopharm Stat. 2016 Feb 16;27(1):124–134. doi: 10.1080/10543406.2016.1148711

Sample Size Calculation for Testing Differences Between Cure Rates with the Optimal Log-rank Test

Jianrong Wu
PMCID: PMC5575886  NIHMSID: NIHMS898238  PMID: 26882262

Abstract

In this article, sample size calculations are developed for use when the main interest is in the differences between the cure rates of two groups. Following the work of Ewell and Ibrahim, the asymptotic distribution of the weighted log-rank test is derived under the local alternative. The optimal log-rank test under the proportional distributions alternative is discussed, and sample size formulae for the optimal and standard log-rank tests are derived. Simulation results show that the proposed formulae provide adequate sample size estimation for trial designs and that the optimal log-rank test is more efficient than the standard log-rank test, particularly when both cure rates and percentages of censoring are small.

Keywords: clinical trial, cure model, log-rank test, optimal test, sample size

1 Introduction

When survival data include a portion of cured patients or long-term survivors, cure models are useful for analyzing the data and designing clinical trials. Recently, various parametric and semiparametric cure models have been proposed by Farewell (1982), Peng et al. (1998), and Kuk and Chen (1992). A maximum-likelihood expectation-maximization (EM) algorithm for parametric and semiparametric cure models has been proposed by Peng and Dear (2000) and Sy and Taylor (2000). A SAS macro PSPMCM, developed by Corbiere and Joly (2007), is available to fit both parametric and semiparametric cure models. Thus, survival data in which a portion of patients are cured can be analyzed using these methods for the purpose of designing clinical trials using the selected cure models.

In a cancer clinical trial in which a portion of patients experience long-term survival, the main interest is often in the differences between cure rates. Examples from the Children’s Cancer Group trials are given by Lee and Sather (1995). To develop an appropriate test for testing the differences between cure rates in a two-arm randomized trial, Gray and Tsiatis (1989) proposed a family of cure models with a proportional distributions alternative. The optimal log-rank test was discussed under the proportional distributions alternative, which has the form of a $G^{\rho}$ test where ρ = −1 (Harrington and Fleming, 1982), and its efficacy relative to that of the standard log-rank test was also investigated. Ewell and Ibrahim (1997) extended the work of Gray and Tsiatis by deriving the large sample distribution of the weighted log-rank test under a more general sequence of local alternatives that allows for treatment effects on both short- and long-term survival. They also derived a power calculation for the weighted log-rank test assuming exponential failure times.

In this article, we focus on the situation where the main interest is in the differences between the cure rates of two groups. Following the work of Ewell and Ibrahim, sample size formulae are derived for both the standard log-rank test and the optimal weighted log-rank test. The relative efficacy of the two tests is also discussed.

The rest of the paper is organized as follows. A mixture cure model is introduced in Section 2. The sample size formula for the weighted log-rank test is derived in Section 3. The optimal log-rank test and its sample size formula are obtained in Section 4. In Section 5, comparisons of the efficiency and robustness of the two tests are presented, and simulations are conducted to study the performance of the proposed sample size formulae. Section 6 illustrates clinical trial design using the proposed methods. Conclusions are presented in Section 7.

2 Cure Models

The failure time, $T^*$, is assumed to be $T^* = \nu T + (1-\nu)\infty$, where $\nu$ is an indicator of whether a subject will eventually ($\nu = 1$) or never ($\nu = 0$) experience treatment failure, and $T$ denotes the failure time if the subject is not cured. The survival distribution of $T$, $S(t)$, is the conditional distribution for patients who will experience treatment failure and is often called the latency distribution. Thus, the unconditional survival distribution of $T^*$ is a mixture model of a cure rate $\pi = P(\nu = 0)$ and a latency distribution $S(t)$ given by

$$S^*(t) = \pi + (1-\pi)S(t).$$

Let $\lambda^*(t)$ and $\lambda(t)$ be the hazard functions of $T^*$ and $T$, respectively. We then have the following relation between the two hazard functions:

$$\lambda^*(t) = \frac{(1-\pi)S(t)}{\pi + (1-\pi)S(t)}\,\lambda(t).$$

For a two-arm randomized survival trial, let $T_{ij}$ and $C_{ij}$ denote the survival and censoring times, respectively, of patient $i$ in the $j$th group, where $j = 1, 2$ (1 for the control group and 2 for the treatment group). The observed data then consist of $\{X_{ij}, \Delta_{ij};\ i = 1, \ldots, n_j,\ j = 1, 2\}$, where $X_{ij} = T_{ij} \wedge C_{ij}$ and $\Delta_{ij} = I(T_{ij} \le C_{ij})$. It is commonly assumed that $\{(T_{ij}, C_{ij}),\ i = 1, \ldots, n_j\}$ are independent and identically distributed samples of $(T_j, C_j)$ for control ($j = 1$) and treatment ($j = 2$) and that $T_{ij}$ is independent of $C_{ij}$. Let $S_j(t)$ denote the unconditional survival distribution and let $\lambda_j(t)$ denote its hazard function for the $j$th group. When the main interest is in testing for differences between cure rates, it is reasonable to assume that the conditional survival distributions are the same for the two groups and are denoted by $S(t)$, with the hazard function and cumulative hazard function being denoted by $\lambda(t)$ and $\Lambda(t)$, respectively. The cure rate for the $j$th group is defined by $\pi_j$, where $0 \le \pi_j < 1$. Then, the survival distribution of the mixture cure model for the $j$th group is given by

$$S_j(t) = \pi_j + (1-\pi_j)S(t), \tag{1}$$

and the hazard function for the jth group is given by

$$\lambda_j(t) = \frac{(1-\pi_j)S(t)}{\pi_j + (1-\pi_j)S(t)}\,\lambda(t).$$

We are interested in testing the following hypothesis:

$$H_0: \pi_1 = \pi_2 \quad \text{vs.} \quad H_1: \pi_1 \ne \pi_2. \tag{2}$$

Furthermore, we define the parameters γ and π0 as follows:

$$\gamma = \frac{1}{2}\log\frac{1-\pi_2}{1-\pi_1},$$

$$\pi_0 = 1 - [(1-\pi_1)(1-\pi_2)]^{1/2},$$

where γ is the half-log ratio of the failure rates, and π0 is the proportion of cured patients under the null hypothesis. Then, hypothesis (2) is equivalent to the following hypothesis:

$$H_0: \gamma = 0 \quad \text{vs.} \quad H_1: \gamma \ne 0. \tag{3}$$

The mixture cure model (1) can be written as

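$$S_j(t) = 1 - e^{(-1)^j\gamma}(1-\pi_0)\{1 - S(t)\}, \tag{4}$$

and the corresponding hazard function is given by

$$\lambda_j(t) = \frac{e^{(-1)^j\gamma}(1-\pi_0)S(t)}{1 - e^{(-1)^j\gamma}(1-\pi_0) + e^{(-1)^j\gamma}(1-\pi_0)S(t)}\,\lambda(t). \tag{5}$$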

This alternative implies that the unconditional failure distributions for two groups are proportional; it is called a proportional distributions alternative by Gray and Tsiatis (1989).
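The equivalence between models (1) and (4) can be checked numerically. The following is a minimal sketch; the function names and the cure rates 0.10 and 0.30 are illustrative choices, not values prescribed by the paper.

```python
import math

def reparametrize(pi1, pi2):
    """Return (gamma, pi0) as defined just before hypothesis (3)."""
    gamma = 0.5 * math.log((1.0 - pi2) / (1.0 - pi1))
    pi0 = 1.0 - math.sqrt((1.0 - pi1) * (1.0 - pi2))
    return gamma, pi0

def surv_mixture(s, pi_j):
    """Model (1): unconditional survival, with s = S(t) the latency survival."""
    return pi_j + (1.0 - pi_j) * s

def surv_proportional(s, j, gamma, pi0):
    """Model (4): the same survival written in terms of gamma and pi0."""
    return 1.0 - math.exp((-1) ** j * gamma) * (1.0 - pi0) * (1.0 - s)

pi1, pi2 = 0.10, 0.30                      # illustrative cure rates
gamma, pi0 = reparametrize(pi1, pi2)
for s in (1.0, 0.6, 0.2, 0.0):             # a few values of S(t)
    gap1 = surv_mixture(s, pi1) - surv_proportional(s, 1, gamma, pi0)
    gap2 = surv_mixture(s, pi2) - surv_proportional(s, 2, gamma, pi0)
    print(s, gap1, gap2)                   # both gaps are zero up to rounding
```

The check works because $e^{(-1)^j\gamma}(1-\pi_0) = 1-\pi_j$ for $j = 1, 2$, so the failure distribution $1 - S_j(t)$ is $(1-\pi_j)\{1-S(t)\}$ in both forms.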

To test hypothesis (2) or (3), or the difference in the unconditional failure distributions, a weighted score test can be used, which is given by

$$U_w = n^{-1/2}\int_0^\infty W(t)\left\{\frac{Y_1(t)}{Y(t)}\,dN_2(t) - \frac{Y_2(t)}{Y(t)}\,dN_1(t)\right\},$$

where n = n1 + n2 is the total sample size of two groups, W(t) is a weight function that converges in probability to w(t), Nj(t) is the number of observed failures by time t, Yj(t) is the number of subjects at risk just prior to t in groups j = 1, 2, and Y (t) = Y1(t) + Y2(t). By the martingale central limit theorem (Fleming and Harrington, 1991), under the null hypothesis, Uw converges in distribution to a normal variable with a mean of zero and variance

$$\sigma_w^2 = p(1-p)(1-\pi_0)\int_0^\infty w^2(t)G(t)S(t)\,d\Lambda(t), \tag{6}$$

where $p = \lim_{n\to\infty} n_1/n$, and $G(t)$ is the common survival distribution of the censoring time of two groups (see appendix). The variance $\sigma_w^2$ in (6) can be estimated by

$$\hat{\sigma}_w^2 = n^{-1}\int_0^\infty W^2(t)\frac{Y_1(t)Y_2(t)}{Y^2(t)}\,dN(t),$$

where $N(t) = N_1(t) + N_2(t)$. Therefore, under the null hypothesis, the weighted log-rank test $L_w = U_w/\hat{\sigma}_w$ is asymptotically standard normal. Thus, given a significance level α, we reject the null hypothesis if $|L_w| > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.
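As a minimal sketch of how $L_w$ can be computed from observed data, the Python function below evaluates $U_w$ and $\hat{\sigma}_w^2$ at the distinct observed failure times. The function name `weighted_logrank`, its argument layout, the toy data, and the choice of weight $W(t) = \{K(t-)\}^{\rho}$ (with $K$ the pooled Kaplan-Meier estimate, so that ρ = 0 gives the standard log-rank test and ρ = −1 the weight discussed in Section 4) are illustrative assumptions, not code from the paper.

```python
import math

def weighted_logrank(times, events, groups, rho=0.0):
    """Return L_w = U_w / sigma_hat_w for groups coded 1 (control) and 2 (treatment).

    times  : observed times X_ij
    events : 1 if the failure was observed, 0 if censored
    rho    : exponent of the weight W(t) = K(t-)**rho (assumed weight family)
    """
    n = len(times)
    data = list(zip(times, events, groups))
    km = 1.0                     # left-continuous pooled Kaplan-Meier value K(t-)
    u_sum = 0.0
    v_sum = 0.0
    for t in sorted({x for x, e, _ in data if e}):
        y1 = sum(1 for x, _, g in data if x >= t and g == 1)    # at risk, group 1
        y2 = sum(1 for x, _, g in data if x >= t and g == 2)    # at risk, group 2
        d1 = sum(1 for x, e, g in data if x == t and e and g == 1)
        d2 = sum(1 for x, e, g in data if x == t and e and g == 2)
        y = y1 + y2
        w = km ** rho
        u_sum += w * (y1 / y * d2 - y2 / y * d1)
        v_sum += w * w * y1 * y2 / (y * y) * (d1 + d2)
        km *= 1.0 - (d1 + d2) / y            # update K after using K(t-)
    u_w = u_sum / math.sqrt(n)               # U_w
    sigma2_hat = v_sum / n                   # estimated variance of U_w
    return u_w / math.sqrt(sigma2_hat)

# Toy data (hypothetical): group 2 tends to fail later than group 1.
times = [1, 3, 5, 2, 4, 6, 7]
events = [1, 1, 0, 1, 0, 1, 1]
groups = [1, 1, 1, 2, 2, 2, 2]
print(weighted_logrank(times, events, groups, rho=0.0))
print(weighted_logrank(times, events, groups, rho=-1.0))
```

With this sign convention a positive value indicates more failures than expected in group 2. The function does not guard against the degenerate case in which the variance estimate is zero.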

3 Sample Size Formula

To derive the sample size formula, we need to know the asymptotic distribution of the weighted log-rank test under the alternative hypothesis. Consider a sequence of local alternatives

$$S_j^{(n)}(t) = 1 - e^{(-1)^j\gamma_n}(1-\pi_0)\{1 - S(t)\},$$

where $n^{1/2}\gamma_n = \gamma_a$. Under the local alternatives, as shown in the appendix, the weighted log-rank test $L_w = U_w/\hat{\sigma}_w$ converges in distribution to a normal variable with unit variance and mean $\mu(w, \gamma_a)/\sigma_w$, where $\sigma_w^2$ is given by (6), and

$$\mu(w, \gamma_a) = 2p(1-p)(1-\pi_0)\gamma_a\int_0^\infty w(t)\{S_0(t)\}^{-1}G(t)S(t)\,d\Lambda(t), \tag{7}$$

for which $S_0(t) = \pi_0 + (1-\pi_0)S(t)$.

Therefore, on the basis of the limiting distribution of Lw under the local alternative, given a type I error of α, to achieve a power of 1 − β, the total sample size n of two groups must approximately satisfy the following equation:

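$$1 - \beta \approx \Phi\{\mu(w, \gamma_a)/\sigma_w - z_{1-\alpha/2}\}.$$

For a local alternative γ, we replace $\gamma_a$ by $n^{1/2}\gamma$. Then, the sample size required to detect a local alternative γ can be determined by

$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\sigma_w^2}{\mu(w, \gamma)^2}. \tag{8}$$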
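The step from the power equation to (8) can be spelled out as follows. This is a sketch that takes $\mu(w, \gamma) > 0$ for the sign of the power approximation (an assumption made here; squaring removes the sign in the final formula).

```latex
\begin{aligned}
\mu(w, n^{1/2}\gamma) &= n^{1/2}\,\mu(w, \gamma)
  && \text{because (7) is linear in } \gamma_a, \\
z_{1-\beta} &= n^{1/2}\,\mu(w, \gamma)/\sigma_w - z_{1-\alpha/2}
  && \text{from } 1 - \beta = \Phi\{\mu(w, n^{1/2}\gamma)/\sigma_w - z_{1-\alpha/2}\}, \\
n^{1/2} &= (z_{1-\alpha/2} + z_{1-\beta})\,\sigma_w/\mu(w, \gamma),
  && \text{which gives (8) after squaring.}
\end{aligned}
```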

Substituting equations (6) and (7) into (8), the total sample size for the weighted log-rank test can be calculated by

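$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\int_0^\infty w^2(t)G(t)S(t)\,d\Lambda(t)}{4p(1-p)(1-\pi_0)\gamma^2\left[\int_0^\infty w(t)\{S_0(t)\}^{-1}G(t)S(t)\,d\Lambda(t)\right]^2}. \tag{9}$$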

4 Optimal Log-rank Test

It is well known that the log-rank test is optimal against the proportional hazards model. However, the cure model (1) does not satisfy the proportional hazards assumption; thus, the log-rank test is not an optimal test, and a study design based on the log-rank test is not fully efficient. Therefore, it is desirable to find an optimal test for the cure model (1) under the local proportional distributions alternative. As the mean of the weighted log-rank test is proportional to

$$\int_0^\infty w(t)\{S_0(t)\}^{-1}h(t)\,dt,$$

where $h(t) = G(t)S(t)\lambda(t)$, by using the Cauchy-Schwarz inequality, we obtain the following inequality:

$$\int_0^\infty w(t)\{S_0(t)\}^{-1}h(t)\,dt \le \left\{\int_0^\infty w^2(t)h(t)\,dt\int_0^\infty \{S_0(t)\}^{-2}h(t)\,dt\right\}^{1/2},$$

with equality if and only if $w(t)$ is proportional to $\{S_0(t)\}^{-1}$. That is, the optimal weight function $w(t)$ is proportional to $\{S_0(t)\}^{-1}$, which minimizes the sample size given by formula (9). Thus, taking the weight function $W(t) = \{K(t)\}^{-1}$, where $K(t)$ is the left-continuous version of the Kaplan-Meier estimate computed from the pooled sample of two groups, gives the asymptotically optimal test for the proportional distributions alternative. Hence, by substituting $w(t) = \{S_0(t)\}^{-1}$ into formula (9), the sample size for the optimal log-rank test $L_K$ is given by

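$$n_K = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{4p(1-p)(1-\pi_0)\gamma^2\int_0^\infty \{S_0(t)\}^{-2}G(t)S(t)\,d\Lambda(t)}, \tag{10}$$

and by substituting $w(t) = 1$ into formula (9), the sample size for the standard log-rank test $L$ is given by

$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\int_0^\infty G(t)S(t)\,d\Lambda(t)}{4p(1-p)(1-\pi_0)\gamma^2\left[\int_0^\infty \{S_0(t)\}^{-1}G(t)S(t)\,d\Lambda(t)\right]^2}. \tag{11}$$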

The asymptotic relative efficiency $\rho = n/n_K$ (Randles and Wolfe, 1979) of the optimal test compared to the standard log-rank test is given by

$$\rho = \frac{\int_0^\infty \{S_0(t)\}^{-2}G(t)S(t)\,d\Lambda(t)\int_0^\infty G(t)S(t)\,d\Lambda(t)}{\left[\int_0^\infty \{S_0(t)\}^{-1}G(t)S(t)\,d\Lambda(t)\right]^2}. \tag{12}$$

In the special case when there is no censoring, that is, when $G(t) = 1$, the asymptotic relative efficiency ρ in (12) is reduced to

$$\rho = \frac{(1-\pi_0)^2}{\pi_0[\log(\pi_0)]^2}.$$
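The closed form follows from the substitution $u = S(t)$, under which $S(t)\,d\Lambda(t) = -du$: with $G(t) = 1$ the three integrals in (12) become $1/\pi_0$, $1$, and $-\log(\pi_0)/(1-\pi_0)$. The short Python sketch below evaluates the closed form and prints it next to the "None" row of Table 1; the function name `are_no_censoring` is an illustrative choice.

```python
import math

def are_no_censoring(pi0):
    """Closed form of (12) when G(t) = 1 (no censoring)."""
    return (1.0 - pi0) ** 2 / (pi0 * math.log(pi0) ** 2)

# "None" row of Table 1, keyed by the cure rate pi0.
table1_none = {0.1: 1.528, 0.2: 1.235, 0.3: 1.127, 0.4: 1.072, 0.5: 1.041,
               0.6: 1.022, 0.7: 1.011, 0.8: 1.004, 0.9: 1.001}

for pi0, reported in table1_none.items():
    print(pi0, round(are_no_censoring(pi0), 3), reported)
```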

5 Comparison

We investigated three important issues. First, we studied the relative efficiency of the optimal log-rank test versus the standard log-rank test. Second, we evaluated the robustness of the optimal and standard log-rank tests when the hazard parameter was misspecified in the trial design. Third, we investigated the performance of the two sample size formulae under various design scenarios.

The relative efficiency ρ given in equation (12) was calculated for selected cure rates under the exponential cure model with an uncured hazard parameter λ = 1. Assume a uniform accrual over [0, τ] and no follow-up period, where τ is determined by the percentage of censoring ranging from 0% to 50%. The results (Table 1) showed that when the cure rate π0 was at most 10% and there was no censoring, the gain in efficiency of the optimal log-rank test versus the standard log-rank test was more than 50%, whereas if the cure rate π0 was at least 50%, the gain in efficiency was less than 5%. If the percentage of censoring was more than 50%, then the gain in efficiency was less than 10%, regardless of the cure rate. We also investigated the relative efficiency through the sample size calculations. Under the same assumptions, sample sizes were calculated under various combinations of the cure rates of two groups. Similarly, the largest gain in efficiency was achieved when both the cure rate and percentage of censoring were small (Table 2).

Table 1.

The relative efficiency ρ of the optimal log-rank test compared to the standard log-rank test under the exponential model with a hazard parameter λ = 1 and a uniform accrual over the interval [0, τ], where τ is determined by the percentage of censoring.

Cure rate π0

Cens 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
None 1.528 1.235 1.127 1.072 1.041 1.022 1.011 1.004 1.001
10% 1.490 1.221 1.120 1.068 1.039 1.021 1.010 1.004 1.001
20% 1.399 1.190 1.105 1.061 1.035 1.019 1.009 1.004 1.001
30% 1.272 1.144 1.084 1.050 1.029 1.016 1.008 1.003 1.001
40% 1.166 1.099 1.061 1.037 1.022 1.012 1.006 1.002 1.001
50% 1.095 1.061 1.040 1.026 1.016 1.009 1.005 1.002 1.000

Censoring time was uniformly distributed over [0, τ], with the value of τ being chosen so that the probability of the failure time being censored for a subject who was not cured was the specified censoring percentage. Abbreviation: Cens: censoring.

Table 2.

Sample sizes for the optimal and standard log-rank tests for various cure rates in two groups with a nominal type I error of 5% and power of 90%. Here, sample sizes were calculated under the exponential model, with a hazard parameter λ = 1 and a uniform accrual over the interval [0, τ], where τ is determined by the percentage of censoring.

Cure rate (π1, π2)

(.05, .15) (.05, .2) (.1, .2) (.1, .3) (.2, .4) (.3, .5) (.4, .6)

Test Cens Sample size
L None   598   301 738 217 257 279 281
  10%   766   379 916 263 304 323 321
  20% 1067   513 1218 338 375 388 379
  30% 1566   730 1697 453 481 483 460
  40% 2323 1058 2415 623 632 616 573
  50% 3479 1577 3509 881 860 813 740

LK None   394   215 554 177 230 261 270
  10%   517   275 698 217 272 303 310
  20%   766   391 964 286 341 367 367
  30% 1233   597 1423 398 445 461 448
  40% 1993   926 2144 569 597 594 562
  50% 3180 1437 3262 831 828 794 729

Censoring time was uniformly distributed over [0, τ], with the value of τ being chosen so that the probability of the failure time being censored for a subject who was not cured was the specified censoring percentage. Abbreviations: Cens: censoring; L: standard log-rank test; LK: optimal log-rank test.

To evaluate the robustness of the two tests, sample sizes ($n$) were calculated under exponential models with hazard parameters λ = 0.1 and 1. Cure rates were set to $\pi_1 = 0.1$ and $\pi_2 = \pi_1 e^{\gamma_0}/(1 - \pi_1 + \pi_1 e^{\gamma_0})$, where $\gamma_0$ ranged from 1.5 to 2.0, with accrual time $t_a = 1$ and follow-up time $t_f = 2$. Sample sizes ($\tilde{n}$) were also calculated under misspecification of the hazard parameter within a range of $\lambda \pm 20\%\lambda$. The $\%\text{diff} = 100(\tilde{n} - n)/n$ was calculated for the evaluation of robustness. The results showed that both tests were sensitive to the misspecification of the hazard parameter. However, the %diff was similar for both tests, and the optimal test was slightly more sensitive than the standard log-rank test (Table 3).
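The two quantities used in this robustness study are simple to compute. In the sketch below, the function names are illustrative; the formula for the treatment cure rate is the one stated above, which is equivalent to treating $\gamma_0$ as the log odds ratio of the two cure rates. The sample sizes passed to `pct_diff` are the first row of Table 3.

```python
import math

def treatment_cure_rate(pi1, gamma0):
    """pi2 = pi1 * exp(gamma0) / (1 - pi1 + pi1 * exp(gamma0))."""
    odds_term = pi1 * math.exp(gamma0)
    return odds_term / (1.0 - pi1 + odds_term)

def pct_diff(n_true, n_misspecified):
    """%diff = 100 * (n_misspecified - n_true) / n_true."""
    return 100.0 * (n_misspecified - n_true) / n_true

for gamma0 in (1.5, 1.6, 1.7, 1.8, 1.9, 2.0):
    print(gamma0, round(treatment_cure_rate(0.1, gamma0), 4))

# Standard log-rank test, gamma0 = 1.5, true hazard 0.1 (Table 3, first row).
print(round(pct_diff(2282, 2885), 1), round(pct_diff(2282, 1880), 1))
```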

Table 3.

Sample sizes for the exponential cure models under misspecification of the hazard parameter λ, with cure rates $\pi_1 = 0.1$ and $\pi_2 = \pi_1 e^{\gamma_0}/(1 - \pi_1 + \pi_1 e^{\gamma_0})$, uniform accrual with accrual time $t_a = 1$ and follow-up time $t_f = 2$, and nominal type I error of 5% and power of 90%.

True λ = 0.1; misspecified λ = 0.08 and λ = 0.12

Test γ0 $n$ (λ = 0.1) $\tilde{n}$ (λ = 0.08) %diff $\tilde{n}$ (λ = 0.12) %diff
L 1.5 2282 2885 26.4 1880 −17.6
1.6 1873 2367 26.4 1545 −17.5
1.7 1554 1963 26.3 1283 −17.4
1.8 1302 1643 26.2 1075 −17.4
1.9 1100 1387 26.1 909 −17.4
2.0 938 1181 25.9 776 −17.3

LK 1.5 2274 2879 26.6 1872 −17.7
1.6 1868 2363 26.5 1538 −17.7
1.7 1550 1959 26.4 1278 −17.5
1.8 1298 1640 26.3 1071 −17.5
1.9 1097 1385 26.3 906 −17.4
2.0 935 1179 26.1 773 −17.3

True λ = 1; misspecified λ = 0.8 and λ = 1.2

Test γ0 $n$ (λ = 1) $\tilde{n}$ (λ = 0.8) %diff $\tilde{n}$ (λ = 1.2) %diff
L 1.5 219 259 18.3 197 −10.0
1.6 185 218 17.8 167 −9.7
1.7 158 185 17.1 144 −8.9
1.8 137 159 16.1 124 −9.5
1.9 119 138 16.0 109 −8.4
2.0 105 121 15.2 96 −8.6

LK 1.5 193 233 20.7 170 −11.9
1.6 164 198 20.7 146 −11.0
1.7 142 169 19.0 127 −10.6
1.8 123 146 18.7 111 −9.8
1.9 108 128 18.5 98 −9.3
2.0 96 112 16.7 87 −9.4

%diff: change in sample size due to the misspecified hazard parameter λ, i.e., $\%\text{diff} = 100 \times (\tilde{n} - n)/n$, where $n$ is the sample size calculated under the true λ and $\tilde{n}$ is the sample size calculated under the misspecified λ. Abbreviations: L: standard log-rank test; LK: optimal log-rank test.

To investigate the performance of the sample size formulae for the optimal and standard log-rank tests, we calculated sample sizes under the cure model (1), where cure rates were set as in Table 3, and the conditional survival distribution was Weibull, $S(t) = e^{-\lambda t^{\kappa}}$, or log-logistic, $S(t) = 1/(1 + \lambda t^{\kappa})$. The scale parameter λ was set to 0.4, and the shape parameter κ was set to 0.5, 1, or 2, reflecting a decreasing, constant, and increasing hazard function, respectively, for the Weibull distribution; and a decreasing and single-mode hazard function for the log-logistic distribution. We assumed that subjects were recruited with a uniform distribution over the accrual period $t_a = 1$, with a follow-up period $t_f = 2$. We further assumed that no subject was lost to follow-up during the study. Then, the censoring time was uniformly distributed over the interval $[t_f, t_a + t_f]$, that is, the censoring survival distribution was $G(t) = 1$ if $t \le t_f$; $G(t) = (t_a + t_f - t)/t_a$ if $t_f \le t \le t_a + t_f$; and $G(t) = 0$ otherwise. Given a two-sided nominal significance level of 0.05 and power of 90%, the required sample sizes were calculated for each design scenario under each distribution. The empirical type I errors and powers of the corresponding designs were simulated based on 100,000 runs.

The simulation results presented in Table 4 can be summarized as follows. First, the empirical powers of both the optimal and standard log-rank tests were close to the nominal level of 90%. Thus, the sample sizes were adequately estimated. Second, the empirical type I errors of both tests were close to the nominal level of 5%. Thus, both tests preserved type I error well. Third, the sample sizes calculated from the optimal test were smaller than those calculated for the standard log-rank test.
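The data-generating step of such a simulation can be sketched as follows for the Weibull case. This is an illustrative sketch, not the author's simulation code: the function names, the seed, and the arm size are assumptions, and the log-logistic case is not covered.

```python
import math
import random

def censoring_survival(t, t_a, t_f):
    """G(t) for uniform accrual over t_a, follow-up t_f, and no loss to follow-up."""
    return min(1.0, max(0.0, (t_a + t_f - t) / t_a))

def simulate_arm(n, cure_rate, lam, kappa, t_a, t_f, rng):
    """Draw (observed time, event indicator) pairs from the Weibull mixture cure model."""
    arm = []
    for _ in range(n):
        c = t_f + t_a * rng.random()        # administrative censoring time on [t_f, t_a + t_f]
        if rng.random() < cure_rate:
            arm.append((c, 0))              # cured: never fails, so always censored
        else:
            u = 1.0 - rng.random()          # uniform on (0, 1]
            t = (-math.log(u) / lam) ** (1.0 / kappa)   # invert S(t) = exp(-lam * t**kappa)
            arm.append((min(t, c), 1 if t <= c else 0))
    return arm

rng = random.Random(2016)                   # arbitrary seed for reproducibility
control = simulate_arm(500, 0.1, 0.4, 1.0, 1.0, 2.0, rng)
print(sum(d for _, d in control) / len(control))   # observed failure fraction
```

A full power simulation would generate both arms under the design cure rates, apply the weighted log-rank test to each replicate, and record the rejection frequency.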

Table 4.

Sample sizes ($n$) and corresponding simulated empirical type I errors ($\hat{\alpha}$) and powers ($1 - \hat{\beta}$) for the optimal and standard log-rank tests under the Weibull and log-logistic distributions, with a scale parameter λ = 0.4, cure rates $\pi_1 = 0.1$ and $\pi_2 = \pi_1 e^{\gamma_0}/(1 - \pi_1 + \pi_1 e^{\gamma_0})$, nominal type I error of 0.05, power of 90%, and uniform accrual with accrual time $t_a = 1$ and follow-up time $t_f = 2$.

Dist Test γ0 | κ = 0.5: $n$ $\hat{\alpha}$ $1 - \hat{\beta}$ | κ = 1: $n$ $\hat{\alpha}$ $1 - \hat{\beta}$ | κ = 2: $n$ $\hat{\alpha}$ $1 - \hat{\beta}$
WB L 1.5 841 .048 .905 510 .053 .905 222 .052 .914
1.6 695 .049 .900 424 .050 .899 188 .052 .914
1.7 580 .051 .901 355 .045 .906 161 .050 .922
1.8 488 .050 .903 301 .050 .906 139 .051 .924
1.9 415 .049 .905 258 .048 .907 121 .052 .921
2.0 356 .051 .907 222 .053 .906 106 .050 .925

LK 1.5 827 .049 .901 490 .053 .904 195 .051 .919
1.6 683 .045 .901 408 .048 .910 166 .055 .919
1.7 571 .051 .902 343 .051 .902 143 .048 .925
1.8 481 .049 .905 291 .050 .906 125 .052 .928
1.9 410 .053 .904 250 .052 .909 110 .052 .926
2.0 351 .052 .906 216 .047 .910 97 .052 .932

LG L 1.5 1112 .048 .900 762 .052 .908 404 .048 .906
1.6 916 .050 .907 630 .049 .903 337 .049 .908
1.7 763 .047 .908 526 .053 .904 284 .051 .908
1.8 641 .047 .905 443 .050 .907 241 .050 .916
1.9 544 .049 .903 377 .050 .907 207 .049 .915
2.0 465 .048 .900 324 .054 .907 180 .050 .912

LK 1.5 1100 .048 .902 746 .051 .903 382 .053 .907
1.6 907 .049 .906 617 .045 .897 319 .053 .909
1.7 755 .050 .908 516 .052 .898 270 .051 .914
1.8 635 .048 .903 436 .051 .906 230 .050 .911
1.9 539 .049 .908 371 .056 .903 198 .051 .910
2.0 461 .053 .904 319 .049 .910 172 .050 .916

Abbreviations: Cens: censoring; Dist: distribution; WB: Weibull; LG: log-logistic; L: standard log-rank test; LK: optimal log-rank test.

Overall, the results showed that the derived sample size formulae provide adequate sample size estimation for trial design if the main interest is to detect the differences between the cure rates of two groups and that the optimal test is more efficient than the standard log-rank test, particularly when both cure rates and percentage censoring are small.

6 Example

We illustrate study design under a parametric cure model by using the data from the Eastern Cooperative Oncology Group (ECOG) trial e1684. The ECOG trial e1684 was a two-arm phase III clinical trial to compare the relapse-free survival (RFS) of patients with melanoma who were treated with high-dose interferon alpha-2b or placebo as postoperative adjuvant therapy. The trial accrued patients between 1984 and 1990 and remained blinded under analysis until 1993 (Kirkwood et al., 1996). Researchers have studied this dataset extensively using cure models (Corbiere and Joly, 2007). There were 92 deaths among the 146 patients in the treatment group. The SAS macro PSPMCM was applied to fit the treatment arm data under the Weibull cure model (Figure 1), with an estimated shape parameter κ of 1.018, scale parameter λ of 0.836, and a cure rate of 35%. Suppose we wish to design a two-arm randomized phase III trial to detect a 20% difference between the cure rate in the arm that receives the new treatment and that in the control arm that receives the same therapy as the treatment arm of the ECOG trial, with a two-sided type I error of 0.05, power of 90% at the alternative, a uniform accrual with a 5-year accrual period and 5 years of follow-up, no loss to follow-up, and equal allocation between the two groups. Then, the required sample sizes calculated using formulae (10) and (11) under the Weibull cure model are 266 and 280 patients, respectively. The corresponding simulated empirical type I error and power are 0.05 and 91.4% for the optimal log-rank test, and 0.05 and 90.7% for the standard log-rank test. As the cure rate is relatively high, the gain in efficiency is only approximately 5% in this example.
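Formulae (10) and (11) can be evaluated numerically for a design of this kind. The sketch below is illustrative rather than the author's implementation. It assumes that the fitted parameters follow the Weibull parametrization $S(t) = e^{-\lambda t^{\kappa}}$ used in Section 5, that the 20% difference is an absolute increase in cure rate from 35% to 55%, and that time is measured in years. It integrates over $u = S(t)$ with the midpoint rule, using $S(t)\,d\Lambda(t) = -du$; the function name and step count are hypothetical choices.

```python
import math
from statistics import NormalDist

def cure_sample_sizes(pi1, pi2, lam, kappa, t_a, t_f,
                      alpha=0.05, power=0.90, p=0.5, steps=20000):
    """Return (n_K, n) from formulae (10) and (11) for a Weibull latency distribution."""
    gamma = 0.5 * math.log((1.0 - pi2) / (1.0 - pi1))
    pi0 = 1.0 - math.sqrt((1.0 - pi1) * (1.0 - pi2))
    int_g = int_g_s0 = int_g_s02 = 0.0
    h = 1.0 / steps
    for k in range(steps):
        u = (k + 0.5) * h                               # midpoint value of u = S(t)
        t = (-math.log(u) / lam) ** (1.0 / kappa)       # invert the Weibull survival function
        g = min(1.0, max(0.0, (t_a + t_f - t) / t_a))   # censoring survival G(t)
        s0 = pi0 + (1.0 - pi0) * u                      # S_0(t)
        int_g += g * h                                  # integral of G S dLambda
        int_g_s0 += g / s0 * h                          # integral of S_0^{-1} G S dLambda
        int_g_s02 += g / s0 ** 2 * h                    # integral of S_0^{-2} G S dLambda
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    scale = 4.0 * p * (1.0 - p) * (1.0 - pi0) * gamma ** 2
    n_optimal = z ** 2 / (scale * int_g_s02)                 # formula (10)
    n_standard = z ** 2 * int_g / (scale * int_g_s0 ** 2)    # formula (11)
    return n_optimal, n_standard

n_opt, n_std = cure_sample_sizes(0.35, 0.55, lam=0.836, kappa=1.018, t_a=5.0, t_f=5.0)
print(math.ceil(n_opt), math.ceil(n_std))
```

Under these assumptions the printed values can be compared with the 266 and 280 patients reported above; any discrepancy would point to a difference in parametrization or in how the 20% difference is read.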

Figure 1. Relapse-free survival for ECOG e1684 data. The step function is the Kaplan-Meier survival curve. The solid curve is the fitted Weibull cure model.

7 Conclusion

For cancer clinical trials in which a portion of patients are cured, the main interest is in demonstrating the differences between the cure rates in the two treatment groups. In this article, sample size formulae are derived for both the optimal and standard log-rank tests. Because the proposed cure model is not a proportional hazards model, the standard log-rank test is not fully efficient. Thus, a sample size calculation derived under the optimal test can ensure the efficacy of the study design. The optimal log-rank test is implemented in the standard statistical software R by using the survdiff function with the option rho = −1. The simulation results demonstrated that the sample size formula for the optimal test provides adequate sample size estimation and is more efficient than the formula for the standard log-rank test. Finally, if trials are planned to include interim analyses to enable them to be halted early if futility or efficacy is demonstrated, then the group sequential methods developed by Lee and Sather (1995) can be used.

Acknowledgments

The author acknowledges an anonymous reviewer for his/her valuable comments that improved an earlier version of the paper. This work was supported in part by the National Cancer Institute support grant CA21765 and ALSAC.

Appendix: Derivation of the asymptotic distribution of the weighted log-rank test

The weighted score test is given by

$$U_w = n^{-1/2}\int_0^\infty W(t)\left\{\frac{Y_1(t)}{Y(t)}\,dN_2(t) - \frac{Y_2(t)}{Y(t)}\,dN_1(t)\right\},$$

where $n = n_1 + n_2$ is the total sample size of two groups, $W(t)$ is a weight function that converges in probability to $w(t)$, $N_j(t)$ is the number of observed failures by time $t$, $Y_j(t)$ is the number of subjects at risk just prior to $t$ in groups $j = 1, 2$, and $Y(t) = Y_1(t) + Y_2(t)$. If we define martingale processes such that $M_j(t) = N_j(t) - \int_0^t \lambda_j(u)Y_j(u)\,du$, $j = 1, 2$, where $\lambda_j(t)$ is given in equation (5), then the weighted score test can be written as

$$U_w = n^{-1/2}\int_0^\infty W(t)\left\{\frac{Y_1(t)}{Y(t)}\,dM_2(t) - \frac{Y_2(t)}{Y(t)}\,dM_1(t)\right\} + \int_0^\infty W(t)\frac{Y_1(t)Y_2(t)}{nY(t)}\,n^{1/2}\{\lambda_2(t) - \lambda_1(t)\}\,dt.$$

Under the null hypothesis $H_0: \gamma = 0$, we have $\lambda_1(t) = \lambda_2(t) = \lambda_0(t)$, where

$$\lambda_0(t) = \frac{(1-\pi_0)S(t)}{\pi_0 + (1-\pi_0)S(t)}\,\lambda(t).$$

Hence, by the martingale property, the mean of Uw is 0 and the variance of Uw is given by

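$$\mathrm{Var}(U_w) = n^{-1}E\int_0^\infty W^2(t)\frac{Y_1(t)Y_2(t)}{Y(t)}\,d\Lambda_0(t),$$

where $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$. As $n \to \infty$,

$$n^{-1}\frac{Y_1(t)Y_2(t)}{Y(t)} = \frac{n_1 n_2}{n^2}\,\frac{\{Y_1(t)/n_1\}\{Y_2(t)/n_2\}}{Y(t)/n} \to p(1-p)\frac{\pi_1(t)\pi_2(t)}{\pi(t)},$$

where $p = \lim_{n\to\infty} n_1/n$, $\pi_j(t) = P(X_{ij} > t)$, and $\pi(t) = p\pi_1(t) + (1-p)\pi_2(t)$. Under the null hypothesis, $\pi_1(t) = \pi_2(t) = \pi(t) = G(t)S_0(t)$. Thus, by the martingale central limit theorem (Fleming and Harrington, 1991), $U_w \xrightarrow{D} N(0, \sigma_w^2)$, where

$$\sigma_w^2 = p(1-p)\int_0^\infty w^2(t)G(t)S_0(t)\lambda_0(t)\,dt,$$

for which $S_0(t) = \pi_0 + (1-\pi_0)S(t)$ and $G(t)$ is the common survival distribution of the censoring time of the two groups. By noting that $S_0(t)\lambda_0(t) = (1-\pi_0)S(t)\lambda(t)$, we have

$$\sigma_w^2 = p(1-p)(1-\pi_0)\int_0^\infty w^2(t)G(t)S(t)\lambda(t)\,dt. \tag{13}$$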

The variance $\sigma_w^2$ can be estimated by

$$\hat{\sigma}_w^2 = n^{-1}\int_0^\infty W^2(t)\frac{Y_1(t)Y_2(t)}{Y(t)}\,d\hat{\Lambda}_0(t),$$

where $d\hat{\Lambda}_0(t) = dN(t)/Y(t)$ and $N(t) = N_1(t) + N_2(t)$. Therefore, the weighted log-rank test $L_w = U_w/\hat{\sigma}_w$ is asymptotically standard normal under the null hypothesis.

To derive the asymptotic distribution of the weighted log-rank test under the alternative, consider a sequence of local alternatives $H_1^{(n)}: S_j^{(n)}(t) = 1 - e^{(-1)^j\gamma_n}(1-\pi_0)\{1 - S(t)\}$, or

$$\lambda_j^{(n)}(t) = \frac{e^{(-1)^j\gamma_n}(1-\pi_0)S(t)}{1 - e^{(-1)^j\gamma_n}(1-\pi_0) + e^{(-1)^j\gamma_n}(1-\pi_0)S(t)}\,\lambda(t),$$

where $n^{1/2}\gamma_n = \gamma_a < \infty$, and define martingale processes as $M_j^{(n)}(t) = N_j(t) - \int_0^t Y_j(u)\lambda_j^{(n)}(u)\,du$. Then, $U_w = U_{1w} + U_{2w}$, where

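$$U_{1w} = n^{-1/2}\int_0^\infty W(t)\left\{\frac{Y_1(t)}{Y(t)}\,dM_2^{(n)}(t) - \frac{Y_2(t)}{Y(t)}\,dM_1^{(n)}(t)\right\},$$

and

$$U_{2w} = n^{-1/2}\int_0^\infty W(t)\frac{Y_1(t)Y_2(t)}{Y(t)}\{\lambda_2^{(n)}(t) - \lambda_1^{(n)}(t)\}\,dt.$$

As $\gamma_n \to 0$, $H_1^{(n)} \to H_0$ and $\lambda_j^{(n)}(t) \to \lambda_0(t)$, and by the martingale central limit theorem, $U_{1w}$ converges to a normal variable with mean $EU_{1w} = 0$ and variance

$$\begin{aligned}
EU_{1w}^2 &= n^{-1}E\int_0^\infty W^2(t)\left\{\frac{Y_2^2(t)}{Y^2(t)}Y_1(t)\lambda_1^{(n)}(t) + \frac{Y_1^2(t)}{Y^2(t)}Y_2(t)\lambda_2^{(n)}(t)\right\}dt \\
&\to p(1-p)\int_0^\infty w^2(t)\left\{(1-p)\frac{\pi_2^2(t)\pi_1(t)}{\pi^2(t)}\lambda_0(t) + p\frac{\pi_1^2(t)\pi_2(t)}{\pi^2(t)}\lambda_0(t)\right\}dt \\
&= p(1-p)\int_0^\infty w^2(t)\frac{\pi_1(t)\pi_2(t)}{\pi(t)}\lambda_0(t)\,dt \\
&= p(1-p)\int_0^\infty w^2(t)G(t)S_0(t)\lambda_0(t)\,dt = \sigma_w^2.
\end{aligned}$$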

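By Taylor’s expansion of $\lambda_j^{(n)}(t)$ at $\gamma_n = 0$, we have

$$\lambda_j^{(n)}(t) \approx \frac{(1-\pi_0)S(t)}{\pi_0 + (1-\pi_0)S(t)}\,\lambda(t) + \frac{(1-\pi_0)S(t)}{\{\pi_0 + (1-\pi_0)S(t)\}^2}\,\lambda(t)(-1)^j\gamma_n.$$

It then follows that

$$\lim_{n\to\infty} n^{1/2}\{\lambda_2^{(n)}(t) - \lambda_1^{(n)}(t)\} = \frac{2\gamma_a(1-\pi_0)S(t)\lambda(t)}{\{\pi_0 + (1-\pi_0)S(t)\}^2}.$$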
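This limit can be checked numerically by evaluating the hazard in equation (5) at small values of γ. The sketch below works with the ratio $\lambda_j(t)/\lambda(t)$ at a fixed value $s = S(t)$; the function names and the values $\pi_0 = 0.3$ and $s = 0.5$ are illustrative assumptions.

```python
import math

def hazard_ratio_factor(s, j, gamma, pi0):
    """lambda_j(t) / lambda(t) from equation (5), with s = S(t)."""
    a = math.exp((-1) ** j * gamma) * (1.0 - pi0)
    return a * s / (1.0 - a + a * s)

def limit_factor(s, pi0):
    """(1 - pi0) S(t) / S_0(t)^2, the coefficient of 2 * gamma_a * lambda(t) in the limit."""
    s0 = pi0 + (1.0 - pi0) * s
    return (1.0 - pi0) * s / s0 ** 2

pi0, s = 0.3, 0.5                          # illustrative values
for gamma in (1e-2, 1e-3, 1e-4):
    diff = hazard_ratio_factor(s, 2, gamma, pi0) - hazard_ratio_factor(s, 1, gamma, pi0)
    print(gamma, diff / (2.0 * gamma), limit_factor(s, pi0))
```

As γ shrinks, the scaled difference approaches the closed-form coefficient, which is what the first-order expansion predicts.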

By substituting this into $U_{2w}$, we have shown that $U_{2w}$ converges in probability to $\mu(w, \gamma_a)$, where

$$\mu(w, \gamma_a) = 2p(1-p)(1-\pi_0)\gamma_a\int_0^\infty w(t)\{S_0(t)\}^{-1}G(t)S(t)\lambda(t)\,dt.$$

Thus, under the local alternatives $H_1^{(n)}$, the weighted log-rank test is asymptotically normal with mean $\mu(w, \gamma_a)/\sigma_w$ and unit variance, that is,

$$L_w = U_w/\hat{\sigma}_w \xrightarrow{D} N(\mu(w, \gamma_a)/\sigma_w, 1).$$

References

  1. Corbiere F, Joly P. A SAS macro for parametric and semiparametric mixture cure models. Computer Methods and Programs in Biomedicine. 2007;85:173–180. doi: 10.1016/j.cmpb.2006.10.008.
  2. Ewell M, Ibrahim JG. The large sample distribution of the weighted log rank statistic under general local alternatives. Lifetime Data Analysis. 1997;3:5–12. doi: 10.1023/a:1009690200504.
  3. Farewell VT. The use of mixture models for the analysis of survival data with long-term survivors. Biometrics. 1982;38:1041–1046.
  4. Fleming TR, Harrington DP. Counting processes and survival analysis. John Wiley and Sons; New York: 1991.
  5. Gray RJ, Tsiatis AA. A linear rank test for use when the main interest is in differences in cure rates. Biometrics. 1989;45:899–904.
  6. Harrington DP, Fleming TR. A class of rank test procedures for censored survival data. Biometrika. 1982;69:553–566.
  7. Kirkwood JM, Strawderman MH, Ernstoff MS, Smith TJ, Borden EC, Blum RH. Interferon alfa-2b adjuvant therapy of high-risk resected cutaneous melanoma: the Eastern Cooperative Oncology Group Trial EST 1684. Journal of Clinical Oncology. 1996;14:7–17. doi: 10.1200/JCO.1996.14.1.7.
  8. Kuk AYC, Chen CH. A mixture model combining logistic regression with proportional hazards regression. Biometrika. 1992;79:531–541.
  9. Lee JW, Sather HN. Group sequential methods for comparison of cure rates in clinical trials. Biometrics. 1995;51:756–763.
  10. Peng Y, Dear KBG. A nonparametric mixture model for cure rate estimation. Biometrics. 2000;56:237–243. doi: 10.1111/j.0006-341x.2000.00237.x.
  11. Peng Y, Dear KBG, Denham JW. A generalized F mixture model for cure rate estimation. Statistics in Medicine. 1998;17:813–830. doi: 10.1002/(sici)1097-0258(19980430)17:8<813::aid-sim775>3.0.co;2-#.
  12. Randles RH, Wolfe DA. Introduction to the theory of nonparametric statistics. John Wiley & Sons; New York: 1979.
  13. Sy JP, Taylor JMG. Estimation in a Cox proportional hazards cure model. Biometrics. 2000;56:227–236. doi: 10.1111/j.0006-341x.2000.00227.x.
