Sample Size Calculation for Testing Differences Between Cure Rates with the Optimal Log-rank Test

Jianrong Wu

doi:10.1080/10543406.2016.1148711

Author manuscript; available in PMC: 2017 Aug 30.

Published in final edited form as: J Biopharm Stat. 2016 Feb 16;27(1):124–134. doi: 10.1080/10543406.2016.1148711


PMCID: PMC5575886 NIHMSID: NIHMS898238 PMID: 26882262

Abstract

In this article, sample size calculations are developed for use when the main interest is in the differences between the cure rates of two groups. Following the work of Ewell and Ibrahim, the asymptotic distribution of the weighted log-rank test is derived under the local alternative. The optimal log-rank test under the proportional distributions alternative is discussed, and sample size formulae for the optimal and standard log-rank tests are derived. Simulation results show that the proposed formulae provide adequate sample size estimation for trial designs and that the optimal log-rank test is more efficient than the standard log-rank test, particularly when both cure rates and percentages of censoring are small.

Keywords: clinical trial, cure model, log-rank test, optimal test, sample size

1 Introduction

When survival data include a proportion of cured patients or long-term survivors, cure models are useful both for analyzing the data and for designing clinical trials. Various parametric and semiparametric cure models have been proposed by Farewell (1982), Kuk and Chen (1992), and Peng et al. (1998). Expectation-maximization (EM) algorithms for maximum-likelihood estimation in parametric and semiparametric cure models have been proposed by Peng and Dear (2000) and Sy and Taylor (2000), and the SAS macro PSPMCM developed by Corbiere and Joly (2007) fits both parametric and semiparametric cure models. Thus, survival data in which a proportion of patients are cured can be analyzed with these methods, and the fitted cure models can then be used to design new clinical trials.

In a cancer clinical trial in which a proportion of patients experience long-term survival, the main interest is often in the difference between cure rates; examples from Children’s Cancer Group trials are given by Lee and Sather (1995). To develop an appropriate test for differences between cure rates in a two-arm randomized trial, Gray and Tsiatis (1989) proposed a family of cure models under a proportional distributions alternative. They discussed the optimal log-rank test under this alternative, which has the form of a G^ρ test with ρ = −1 (Harrington and Fleming, 1982), and investigated its efficiency relative to the standard log-rank test. Ewell and Ibrahim (1997) extended the work of Gray and Tsiatis by deriving the large-sample distribution of the weighted log-rank test under a more general sequence of local alternatives that allows for treatment effects on both short- and long-term survival. They also derived a power calculation for the weighted log-rank test assuming exponential failure times.

In this article, we focus on the situation in which the main interest is in the differences between the cure rates of two groups. Following the work of Ewell and Ibrahim, sample size formulae are derived for both the standard log-rank test and the optimal weighted log-rank test, and the relative efficiency of the two tests is discussed.

The rest of the paper is organized as follows. A mixture cure model is introduced in Section 2. The sample size formula for the weighted log-rank test is derived in Section 3. The optimal log-rank test and its sample size formula are obtained in Section 4. In Section 5, the efficiency and robustness of the two tests are compared, and simulations are conducted to study the performance of the proposed sample size formulae. Section 6 illustrates clinical trial design using the proposed methods, and conclusions are presented in Section 7.

2 Cure Models

The failure time is assumed to be T^∗ = vT + (1 − v)∞, where v indicates whether a subject will eventually (v = 1) or never (v = 0) experience treatment failure, and T denotes the failure time of a subject who is not cured. The survival function S(t) of T is the conditional distribution for patients who will experience treatment failure and is often called the latency distribution. Thus, the unconditional survival distribution of T^∗ is a mixture of the cure rate π = P(v = 0) and the latency distribution S(t):

S^{*}(t) = π + (1 - π) S(t).

Let λ^∗(t) and λ(t) be the hazard functions of T^∗ and T, respectively. We then have the following relation between the two hazard functions:

λ^{*}(t) = \frac{(1 - π) S(t)}{π + (1 - π) S(t)} λ(t).

For a two-arm randomized survival trial, let $T_{ij}^{*}$ and $C_{ij}$ denote the survival and censoring times, respectively, of patient i in the j^th group, where j = 1 for the control group and j = 2 for the treatment group. The observed data then consist of {X_ij, Δ_ij; i = 1, …, n_j; j = 1, 2}, where $X_{ij} = T_{ij}^{*} \wedge C_{ij}$ and $Δ_{ij} = I(T_{ij}^{*} \leq C_{ij})$. It is commonly assumed that $\{(T_{ij}^{*}, C_{ij}), i = 1, \dots, n_{j}\}$ are independent and identically distributed copies of $(T_{j}^{*}, C_{j})$ for control (j = 1) and treatment (j = 2) and that $T_{ij}^{*}$ is independent of $C_{ij}$. Let $S_{j}^{*}(t)$ denote the unconditional survival distribution and $λ_{j}^{*}(t)$ its hazard function for the j^th group. When the main interest is in testing for differences between cure rates, it is reasonable to assume that the conditional (latency) survival distributions are the same for the two groups; this common distribution is denoted by S(t), with hazard function λ(t) and cumulative hazard function Λ(t). The cure rate of the j^th group is denoted by π_j, where 0 ≤ π_j < 1. Then, the survival distribution of the mixture cure model for the j^th group is

S_{j}^{*}(t) = π_{j} + (1 - π_{j}) S(t), \qquad (1)

and the hazard function for the j^th group is given by

λ_{j}^{*} (t) = \frac{(1 - π_{j}) S (t)}{π_{j} + (1 - π_{j}) S (t)} λ (t) .

We are interested in testing the following hypothesis:

H_{0}: π_{1} = π_{2} \quad \text{vs.} \quad H_{1}: π_{1} \neq π_{2}. \qquad (2)

Furthermore, we define the parameters γ and π₀ as follows:

γ = \frac{1}{2} \log \frac{1 - π_{2}}{1 - π_{1}},

π_{0} = 1 - {[(1 - π_{1}) (1 - π_{2})]}^{1 / 2},

where γ is one-half of the log ratio of the probabilities of eventual failure, 1 − π₂ and 1 − π₁, and π₀ is the common cure rate under the null hypothesis (π₀ = π₁ = π₂ when π₁ = π₂). Then, hypothesis (2) is equivalent to the following hypothesis:

H_{0}: γ = 0 \quad \text{vs.} \quad H_{1}: γ \neq 0. \qquad (3)

The mixture cure model (1) can be written as

S_{j}^{*}(t) = 1 - e^{(-1)^{j} γ} (1 - π_{0}) \{1 - S(t)\}, \qquad (4)

and the corresponding hazard function is given by

λ_{j}^{*}(t) = \frac{e^{(-1)^{j} γ} (1 - π_{0}) S(t)}{1 - e^{(-1)^{j} γ} (1 - π_{0}) + e^{(-1)^{j} γ} (1 - π_{0}) S(t)} λ(t). \qquad (5)

This alternative implies that the unconditional failure distributions of the two groups are proportional, since $1 - S_{2}^{*}(t) = e^{2γ}\{1 - S_{1}^{*}(t)\}$ for all t; Gray and Tsiatis (1989) call it a proportional distributions alternative.
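As a quick numerical check of the reparameterization, the Python sketch below computes γ and π₀ from two cure rates and confirms that models (1) and (4) give identical survival probabilities and that the failure distributions are proportional with ratio e^{2γ}. The function names are illustrative, and the cure rates and the values of S(t) are arbitrary choices rather than values from the paper.

```python
import math


def reparameterize(pi1, pi2):
    """Return (gamma, pi0) for cure rates pi1 (control) and pi2 (treatment)."""
    gamma = 0.5 * math.log((1 - pi2) / (1 - pi1))
    pi0 = 1 - math.sqrt((1 - pi1) * (1 - pi2))
    return gamma, pi0


def surv_mixture(pi_j, s):
    """Model (1): S_j^*(t) = pi_j + (1 - pi_j) S(t), where s = S(t)."""
    return pi_j + (1 - pi_j) * s


def surv_reparam(j, gamma, pi0, s):
    """Model (4): S_j^*(t) = 1 - exp((-1)^j gamma) (1 - pi0) {1 - S(t)}."""
    return 1 - math.exp((-1) ** j * gamma) * (1 - pi0) * (1 - s)


pi1, pi2 = 0.3, 0.5                      # arbitrary illustrative cure rates
gamma, pi0 = reparameterize(pi1, pi2)
for s in (0.9, 0.5, 0.2, 0.0):           # values of the latency survival S(t)
    s1 = surv_reparam(1, gamma, pi0, s)
    s2 = surv_reparam(2, gamma, pi0, s)
    # (1) and (4) describe the same survival curves in both arms
    assert abs(s1 - surv_mixture(pi1, s)) < 1e-12
    assert abs(s2 - surv_mixture(pi2, s)) < 1e-12
    # proportional distributions: 1 - S_2^*(t) = exp(2 gamma) {1 - S_1^*(t)}
    assert abs((1 - s2) - math.exp(2 * gamma) * (1 - s1)) < 1e-12
print(f"gamma = {gamma:.4f}, pi0 = {pi0:.4f}")
```

Because 1 − π₀ is the geometric mean of 1 − π₁ and 1 − π₂, π₀ always lies between the two cure rates.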

To test hypothesis (2), or equivalently (3), that is, to test for a difference between the unconditional failure distributions, the following weighted score statistic can be used:

U_{w} = n^{-1/2} \int_{0}^{\infty} W(t) \left\{\frac{Y_{1}(t)}{Y(t)} dN_{2}(t) - \frac{Y_{2}(t)}{Y(t)} dN_{1}(t)\right\},

where n = n₁ + n₂ is the total sample size of two groups, W(t) is a weight function that converges in probability to w(t), N_j(t) is the number of observed failures by time t, Y_j(t) is the number of subjects at risk just prior to t in groups j = 1, 2, and Y (t) = Y₁(t) + Y₂(t). By the martingale central limit theorem (Fleming and Harrington, 1991), under the null hypothesis, U_w converges in distribution to a normal variable with a mean of zero and variance

σ_{w}^{2} = p (1 - p) (1 - π_{0}) \int_{0}^{\infty} w^{2}(t) G(t) S(t) \, dΛ(t), \qquad (6)

where $p = \lim_{n \to \infty} n_{1}/n$ and G(t) is the survival function of the censoring time, assumed common to the two groups (see the Appendix). The variance $σ_{w}^{2}$ in (6) can be estimated by

{\hat{σ}}_{w}^{2} = n^{- 1} \int_{0}^{\infty} W^{2} (t) \frac{Y_{1} (t) Y_{2} (t)}{Y^{2} (t)} d N (t),

where N(t) = N₁(t) + N₂(t). Therefore, under the null hypothesis, the weighted log-rank statistic $L_{w} = U_{w} / {\hat{σ}}_{w}$ is asymptotically standard normal. Given a significance level α, we reject the null hypothesis if $|L_{w}| > z_{1-α/2}$, where $z_{1-α/2}$ is the 100(1 − α/2)th percentile of the standard normal distribution.
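The statistic can be computed directly from the counting-process definitions above. The Python sketch below is a minimal, unoptimized illustration: the function name and arguments are hypothetical, it follows the variance estimator σ̂_w² literally (with no correction for tied failure times), and with `optimal=True` it uses the pooled Kaplan-Meier weight {K(t⁻)}⁻¹ of Section 4. It has not been checked against any packaged implementation.

```python
import math
from statistics import NormalDist


def weighted_logrank(time, event, group, optimal=False):
    """Weighted log-rank statistic L_w = U_w / sigma_hat_w and two-sided p-value.

    time:  observed times X_ij
    event: 1 if a failure was observed (Delta_ij = 1), 0 if censored
    group: 1 for control, 2 for treatment
    optimal=False -> W(t) = 1 (standard log-rank test L)
    optimal=True  -> W(t) = 1 / K(t-), pooled left-continuous Kaplan-Meier (L_K)
    """
    n = len(time)
    failure_times = sorted({t for t, d in zip(time, event) if d == 1})
    km_left = 1.0   # K(t-), the pooled Kaplan-Meier estimate just before t
    u_sum = 0.0     # sum of W(t) {Y1/Y dN2 - Y2/Y dN1}
    v_sum = 0.0     # sum of W(t)^2 Y1 Y2 / Y^2 dN
    for t in failure_times:
        y1 = sum(1 for x, g in zip(time, group) if x >= t and g == 1)
        y2 = sum(1 for x, g in zip(time, group) if x >= t and g == 2)
        d1 = sum(1 for x, d, g in zip(time, event, group) if x == t and d == 1 and g == 1)
        d2 = sum(1 for x, d, g in zip(time, event, group) if x == t and d == 1 and g == 2)
        y, d = y1 + y2, d1 + d2
        w = 1.0 / km_left if optimal else 1.0
        u_sum += w * (y1 / y * d2 - y2 / y * d1)
        v_sum += w * w * y1 * y2 / y ** 2 * d
        km_left *= 1.0 - d / y   # update K only after it has been used as K(t-)
    u_w = u_sum / math.sqrt(n)
    sigma_hat = math.sqrt(v_sum / n)
    l_w = u_w / sigma_hat
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(l_w)))
    return l_w, p_value
```

For example, `weighted_logrank([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 2, 2])` returns a negative statistic, reflecting fewer failures than expected in group 2. Without ties, the square of the standard statistic equals the usual log-rank chi-square.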

3 Sample Size Formula

To derive the sample size formula, we need to know the asymptotic distribution of the weighted log-rank test under the alternative hypothesis. Consider a sequence of local alternatives

S_{j}^{*(n)}(t) = 1 - e^{(-1)^{j} γ_{n}} (1 - π_{0}) \{1 - S(t)\},

where $n^{1/2} γ_{n} = γ_{a}$. Under these local alternatives, as shown in the Appendix, the weighted log-rank statistic $L_{w} = U_{w} / {\hat{σ}}_{w}$ converges in distribution to a normal variable with unit variance and mean μ(w, γ_a)/σ_w, where $σ_{w}^{2}$ is given by (6) and

μ(w, γ_{a}) = 2 p (1 - p) (1 - π_{0}) γ_{a} \int_{0}^{\infty} w(t) \{S_{0}^{*}(t)\}^{-1} G(t) S(t) \, dΛ(t), \qquad (7)

for which $S_{0}^{*} (t) = π_{0} + (1 - π_{0}) S (t)$ .

Therefore, on the basis of the limiting distribution of L_w under the local alternative, for a two-sided type I error α and power 1 − β, and ignoring the negligible probability of rejection in the wrong direction, the total sample size n of the two groups must approximately satisfy

1 - β ≃ Φ\{μ(w, γ_{a}) / σ_{w} - z_{1-α/2}\}.

For a given alternative γ, we replace γ_a by $n^{1/2}γ$; because μ(w, γ_a) is linear in γ_a, $μ(w, n^{1/2}γ) = n^{1/2} μ(w, γ)$. Solving the power equation for n, the sample size required to detect the alternative γ is

n = \frac{(z_{1-α/2} + z_{1-β})^{2} σ_{w}^{2}}{μ(w, γ)^{2}}. \qquad (8)

Substituting equations (6) and (7) into (8), the total sample size for the weighted log-rank test can be calculated by

n = \frac{(z_{1-α/2} + z_{1-β})^{2} \int_{0}^{\infty} w^{2}(t) G(t) S(t) \, dΛ(t)}{4 p (1 - p) (1 - π_{0}) γ^{2} \left[\int_{0}^{\infty} w(t) \{S_{0}^{*}(t)\}^{-1} G(t) S(t) \, dΛ(t)\right]^{2}}. \qquad (9)
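Formula (9) can be evaluated numerically for any weight. A convenient device, assumed here, is the change of variable u = S(t): when S is continuous and decreases from 1 to 0, S(t)dΛ(t) = −dS(t), so every integral in (9) becomes an integral over u in (0, 1), and the weight depends on t only through $S_{0}^{*}(t) = π_{0} + (1 - π_{0})u$. The Python sketch below uses a simple midpoint rule; the helper names are hypothetical, and equal allocation (p = 0.5) is only a default.

```python
import math
from statistics import NormalDist


def integrate_unit(f, m=20000):
    """Midpoint rule on (0, 1); the endpoints u = 0 and u = 1 are never evaluated."""
    h = 1.0 / m
    return h * sum(f((k + 0.5) * h) for k in range(m))


def sample_size_weighted(pi1, pi2, weight, censor_surv_at, p=0.5, alpha=0.05, power=0.9):
    """Total sample size n from formula (9), unrounded.

    weight(s0):        limiting weight w(t), written as a function of S_0^*(t)
    censor_surv_at(u): censoring survival G(t) at the time t where S(t) = u
    """
    z = NormalDist().inv_cdf
    zz = (z(1 - alpha / 2) + z(power)) ** 2
    gamma = 0.5 * math.log((1 - pi2) / (1 - pi1))
    pi0 = 1 - math.sqrt((1 - pi1) * (1 - pi2))
    s0 = lambda u: pi0 + (1 - pi0) * u          # S_0^*(t) when S(t) = u
    # Substituting u = S(t) turns S(t) dLambda(t) = -dS(t) into du on (0, 1).
    num = integrate_unit(lambda u: weight(s0(u)) ** 2 * censor_surv_at(u))
    den = integrate_unit(lambda u: weight(s0(u)) / s0(u) * censor_surv_at(u))
    return zz * num / (4 * p * (1 - p) * (1 - pi0) * gamma ** 2 * den ** 2)


no_censoring = lambda u: 1.0
n_std = sample_size_weighted(0.35, 0.55, lambda s0: 1.0, no_censoring)       # w = 1
n_opt = sample_size_weighted(0.35, 0.55, lambda s0: 1.0 / s0, no_censoring)  # w = 1/S_0^*
print(math.ceil(n_std), math.ceil(n_opt))
```

The cure rates 35% and 55% are illustrative inputs. With no censoring the integrals have simple closed forms, which gives a convenient check of the numerical rule.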

4 Optimal Log-rank Test

It is well known that the log-rank test is asymptotically optimal against proportional hazards alternatives. However, the cure model (1) does not satisfy the proportional hazards assumption; thus, the log-rank test is not optimal, and a study design based on it is not fully efficient. It is therefore desirable to find the optimal test for the cure model (1) under the local proportional distributions alternative. By (6) and (7), the standardized mean μ(w, γ_a)/σ_w of the weighted log-rank statistic is proportional to

\frac{\int_{0}^{\infty} w(t) \{S_{0}^{*}(t)\}^{-1} h(t) \, dt}{\left\{\int_{0}^{\infty} w^{2}(t) h(t) \, dt\right\}^{1/2}},

where h(t) = G(t)S(t)λ(t). By the Cauchy-Schwarz inequality,

\int_{0}^{\infty} w(t) \{S_{0}^{*}(t)\}^{-1} h(t) \, dt \leq \left\{\int_{0}^{\infty} w^{2}(t) h(t) \, dt \int_{0}^{\infty} \{S_{0}^{*}(t)\}^{-2} h(t) \, dt\right\}^{1/2},

with equality if and only if w(t) is proportional to $\{S_{0}^{*}(t)\}^{-1}$. Hence the standardized mean is maximized, and the sample size given by formula (9) is minimized, when w(t) is proportional to $\{S_{0}^{*}(t)\}^{-1}$. Because the left-continuous Kaplan-Meier estimate K(t⁻) computed from the pooled sample of the two groups converges to $S_{0}^{*}(t)$ under the null hypothesis and the local alternatives, taking the weight function W(t) = {K(t⁻)}⁻¹ gives the asymptotically optimal test for the proportional distributions alternative. Substituting $w(t) = \{S_{0}^{*}(t)\}^{-1}$ into formula (9), the sample size for the optimal log-rank test L_K is

n_{K} = \frac{(z_{1-α/2} + z_{1-β})^{2}}{4 p (1 - p) (1 - π_{0}) γ^{2} \int_{0}^{\infty} \{S_{0}^{*}(t)\}^{-2} G(t) S(t) \, dΛ(t)}, \qquad (10)

and by substituting w(t) = 1 into formula (9), the sample size for the standard log-rank test L is given by

n = \frac{(z_{1-α/2} + z_{1-β})^{2} \int_{0}^{\infty} G(t) S(t) \, dΛ(t)}{4 p (1 - p) (1 - π_{0}) γ^{2} \left[\int_{0}^{\infty} \{S_{0}^{*}(t)\}^{-1} G(t) S(t) \, dΛ(t)\right]^{2}}. \qquad (11)
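The Cauchy-Schwarz argument can also be seen numerically. The sketch below (hypothetical helper names, same u = S(t) substitution as before) evaluates only the weight-dependent part of (9) for weights w(t) = {S₀^∗(t)}^ρ, with ρ the exponent of the G^ρ family rather than the relative efficiency of (12), over a grid of ρ values. The minimizer should be ρ = −1 both without censoring and under an arbitrary illustrative censoring pattern (exponential latency with rate 1 and censoring uniform on [0, 3]).

```python
import math


def integrate_unit(f, m=4000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / m
    return h * sum(f((k + 0.5) * h) for k in range(m))


def relative_size(rho, pi0, censor_surv_at):
    """n from (9) for w = (S_0^*)^rho, up to factors that do not depend on w."""
    s0 = lambda u: pi0 + (1 - pi0) * u
    num = integrate_unit(lambda u: s0(u) ** (2 * rho) * censor_surv_at(u))
    den = integrate_unit(lambda u: s0(u) ** (rho - 1) * censor_surv_at(u))
    return num / den ** 2


def best_rho(pi0, censor_surv_at, grid):
    """Grid value of rho giving the smallest sample size."""
    return min(grid, key=lambda r: relative_size(r, pi0, censor_surv_at))


grid = [-2.0 + 0.25 * k for k in range(13)]        # rho from -2 to 1
no_censoring = lambda u: 1.0
# Illustrative censoring: latency Exp(1), so t = -log(u); censoring uniform on
# [0, 3], so G(t) = max(0, 1 - t / 3).
uniform_censoring = lambda u: max(0.0, 1.0 + math.log(u) / 3.0)
print(best_rho(0.1, no_censoring, grid), best_rho(0.1, uniform_censoring, grid))
```

The two calls should both select ρ = −1, the weight that defines L_K; ρ = 0 corresponds to the standard log-rank test.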

The asymptotic relative efficiency ρ = n/n_K (Randles and Wolfe, 1979) of the optimal test compared with the standard log-rank test is given by

ρ = \frac{\int_{0}^{\infty} \{S_{0}^{*}(t)\}^{-2} G(t) S(t) \, dΛ(t) \int_{0}^{\infty} G(t) S(t) \, dΛ(t)}{\left[\int_{0}^{\infty} \{S_{0}^{*}(t)\}^{-1} G(t) S(t) \, dΛ(t)\right]^{2}}. \qquad (12)
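Under the same u = S(t) substitution, equation (12) depends on the model only through π₀ and the censoring pattern expressed on the u scale. The Python sketch below (hypothetical helper names) evaluates (12) numerically and compares it with the closed form for the no-censoring case stated next in the text.

```python
import math


def integrate_unit(f, m=20000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / m
    return h * sum(f((k + 0.5) * h) for k in range(m))


def are_numeric(pi0, censor_surv_at):
    """rho = n / n_K from equation (12), with integrals taken over u = S(t)."""
    s0 = lambda u: pi0 + (1 - pi0) * u
    a = integrate_unit(lambda u: censor_surv_at(u) / s0(u) ** 2)
    b = integrate_unit(censor_surv_at)
    c = integrate_unit(lambda u: censor_surv_at(u) / s0(u))
    return a * b / c ** 2


def are_no_censoring(pi0):
    """Closed form of (12) when G(t) = 1."""
    return (1 - pi0) ** 2 / (pi0 * math.log(pi0) ** 2)


for pi0 in (0.1, 0.3, 0.5, 0.9):
    print(pi0, round(are_numeric(pi0, lambda u: 1.0), 3), round(are_no_censoring(pi0), 3))
```

Rounded to three decimals, the closed form gives 1.528, 1.127, 1.041, and 1.001 for π₀ = 0.1, 0.3, 0.5, and 0.9, which agree with the "None" row of Table 1.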

In the special case of no censoring, that is, G(t) = 1, the asymptotic relative efficiency ρ in (12) reduces to

ρ = \frac{{(1 - π_{0})}^{2}}{π_{0} {[\log (π_{0})]}^{2}} .
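The reduction follows from the substitution u = S(t), which turns S(t)dΛ(t) into −dS(t) (assuming S is continuous with S(0) = 1 and S(∞) = 0). A worked version of the three integrals in (12) with G(t) = 1 is:

```latex
\begin{aligned}
\int_{0}^{\infty} S(t)\,d\Lambda(t) &= \int_{0}^{1} du = 1,\\
\int_{0}^{\infty} \{S_{0}^{*}(t)\}^{-1} S(t)\,d\Lambda(t)
  &= \int_{0}^{1} \frac{du}{\pi_{0} + (1-\pi_{0})u} = \frac{-\log \pi_{0}}{1-\pi_{0}},\\
\int_{0}^{\infty} \{S_{0}^{*}(t)\}^{-2} S(t)\,d\Lambda(t)
  &= \int_{0}^{1} \frac{du}{\{\pi_{0} + (1-\pi_{0})u\}^{2}} = \frac{1}{\pi_{0}},\\
\rho &= \frac{(1/\pi_{0}) \cdot 1}{\{\log \pi_{0}/(1-\pi_{0})\}^{2}}
      = \frac{(1-\pi_{0})^{2}}{\pi_{0}\{\log \pi_{0}\}^{2}}.
\end{aligned}
```

In particular, without censoring ρ does not depend on the latency distribution S(t), and ρ tends to 1 as π₀ tends to 1.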

5 Comparison

We investigated three important issues. First, we studied the relative efficiency of the optimal log-rank test versus the standard log-rank test. Second, we evaluated the robustness of the optimal and standard log-rank tests when the hazard parameter was misspecified in the trial design. Third, we investigated the performance of the two sample size formulae under various design scenarios.

The relative efficiency ρ given in equation (12) was calculated for selected cure rates under the exponential cure model, with hazard parameter λ = 1 for uncured patients. We assumed uniform accrual over [0, τ] and no follow-up period, where τ was chosen to give a censoring percentage ranging from 0% to 50%. The results (Table 1) showed that when the cure rate π₀ was at most 10% and there was no censoring, the gain in efficiency of the optimal log-rank test over the standard log-rank test was more than 50%, whereas when π₀ was at least 50%, the gain was less than 5%. With 50% censoring, the gain in efficiency was less than 10% regardless of the cure rate. We also investigated the relative efficiency through sample size calculations under the same assumptions for various combinations of the cure rates of the two groups. Again, the largest gain in efficiency occurred when both the cure rates and the percentage of censoring were small (Table 2).

Table 1.

The relative efficiency ρ of the optimal log-rank test compared to the standard log-rank test under the exponential model with a hazard parameter λ = 1 and a uniform accrual over the interval [0, τ], where τ is determined by the percentage of censoring.

Cens	π₀ = 0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
None	1.528	1.235	1.127	1.072	1.041	1.022	1.011	1.004	1.001
10%	1.490	1.221	1.120	1.068	1.039	1.021	1.010	1.004	1.001
20%	1.399	1.190	1.105	1.061	1.035	1.019	1.009	1.004	1.001
30%	1.272	1.144	1.084	1.050	1.029	1.016	1.008	1.003	1.001
40%	1.166	1.099	1.061	1.037	1.022	1.012	1.006	1.002	1.001
50%	1.095	1.061	1.040	1.026	1.016	1.009	1.005	1.002	1.000


Censoring time was uniformly distributed over [0, τ], with the value of τ being chosen so that the probability of the failure time being censored for a subject who was not cured was the specified censoring percentage. Abbreviation: Cens: censoring.
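The censoring setup of Table 1 can be sketched as follows. For an exponential latency with rate λ and censoring uniform on [0, τ], the probability that an uncured subject is censored is P(C < T) = (1 − e^{−λτ})/(λτ), which decreases in τ, so τ can be found by bisection for a target percentage and then used in (12). The Python code below is our own illustration with hypothetical helper names; its output has not been checked against every entry of Table 1.

```python
import math


def integrate_unit(f, m=20000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / m
    return h * sum(f((k + 0.5) * h) for k in range(m))


def censoring_prob(tau, lam=1.0):
    """P(C < T) for T ~ Exp(lam) and C ~ Uniform[0, tau]."""
    x = lam * tau
    return -math.expm1(-x) / x


def solve_tau(cens, lam=1.0):
    """Bisection for tau with censoring_prob(tau) = cens (0 < cens < 1)."""
    lo, hi = 1e-9, 1e9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if censoring_prob(mid, lam) > cens:
            lo = mid          # too little censoring: tau must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


def are_uniform_censoring(pi0, cens, lam=1.0):
    """ARE (12) with exponential latency and censoring uniform on [0, tau]."""
    tau = solve_tau(cens, lam)
    # G(t) = max(0, 1 - t / tau) evaluated at t = -log(u) / lam, where S(t) = u
    g = lambda u: max(0.0, 1.0 + math.log(u) / (lam * tau))
    s0 = lambda u: pi0 + (1 - pi0) * u
    a = integrate_unit(lambda u: g(u) / s0(u) ** 2)
    b = integrate_unit(g)
    c = integrate_unit(lambda u: g(u) / s0(u))
    return a * b / c ** 2


print([round(are_uniform_censoring(0.1, c), 3) for c in (0.1, 0.2, 0.3, 0.4, 0.5)])
```

As in Table 1, the relative efficiency should shrink toward 1 as the censoring percentage grows.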

Table 2.

Sample sizes for the optimal and standard log-rank tests for various cure rates in two groups with a nominal type I error of 5% and power of 90%. Here, sample sizes were calculated under the exponential model, with a hazard parameter λ = 1 and a uniform accrual over the interval [0, τ], where τ is determined by the percentage of censoring.

Test	Cens	(π₁, π₂) = (.05, .15)	(.05, .2)	(.1, .2)	(.1, .3)	(.2, .4)	(.3, .5)	(.4, .6)
L	None	598	301	738	217	257	279	281
	10%	766	379	916	263	304	323	321
	20%	1067	513	1218	338	375	388	379
	30%	1566	730	1697	453	481	483	460
	40%	2323	1058	2415	623	632	616	573
	50%	3479	1577	3509	881	860	813	740

L_K	None	394	215	554	177	230	261	270
	10%	517	275	698	217	272	303	310
	20%	766	391	964	286	341	367	367
	30%	1233	597	1423	398	445	461	448
	40%	1993	926	2144	569	597	594	562
	50%	3180	1437	3262	831	828	794	729


To evaluate the robustness of the two tests, sample sizes (n) were calculated under exponential models with hazard parameters λ = 0.1 and 1. The cure rates were set to π₁ = 0.1 and $π_{2} = π_{1} e^{γ_{0}} / (1 - π_{1} + π_{1} e^{γ_{0}})$, so that γ₀ is the log odds ratio of the cure rates, with γ₀ ranging from 1.5 to 2.0; the accrual time was t_a = 1 and the follow-up time t_f = 2. Sample sizes (n^∗) were also calculated with the hazard parameter misspecified by ±20% (that is, 0.8λ and 1.2λ), and the percentage difference %diff = 100(n^∗ − n)/n was used to evaluate robustness. Both tests were sensitive to misspecification of the hazard parameter; however, the %diff was similar for the two tests, with the optimal test slightly more sensitive than the standard log-rank test (Table 3).

Table 3.

Sample sizes for the exponential cure models under misspecification of the hazard parameter λ, with cure rates π₁ = 0.1 and π₂ = π₁e^{γ₀}/(1 − π₁ + π₁e^{γ₀}), uniform accrual with accrual time t_a = 1, follow-up time t_f = 2, nominal type I error of 5%, and power of 90%.

	True λ		Misspecified λ

		λ = 0.1	λ = 0.08		λ = 0.12

Test	γ₀	n	n^∗	%diff	n^∗	%diff
L	1.5	2282	2885	26.4	1880	−17.6
	1.6	1873	2367	26.4	1545	−17.5
	1.7	1554	1963	26.3	1283	−17.4
	1.8	1302	1643	26.2	1075	−17.4
	1.9	1100	1387	26.1	909	−17.4
	2.0	938	1181	25.9	776	−17.3

L_K	1.5	2274	2879	26.6	1872	−17.7
	1.6	1868	2363	26.5	1538	−17.7
	1.7	1550	1959	26.4	1278	−17.5
	1.8	1298	1640	26.3	1071	−17.5
	1.9	1097	1385	26.3	906	−17.4
	2.0	935	1179	26.1	773	−17.3

		λ = 1	λ = 0.8		λ = 1.2

Test	γ₀	n	n^∗	%diff	n^∗	%diff
L	1.5	219	259	18.3	197	−10.0
	1.6	185	218	17.8	167	−9.7
	1.7	158	185	17.1	144	−8.9
	1.8	137	159	16.1	124	−9.5
	1.9	119	138	16.0	109	−8.4
	2.0	105	121	15.2	96	−8.6

L_K	1.5	193	233	20.7	170	−11.9
	1.6	164	198	20.7	146	−11.0
	1.7	142	169	19.0	127	−10.6
	1.8	123	146	18.7	111	−9.8
	1.9	108	128	18.5	98	−9.3
	2.0	96	112	16.7	87	−9.4


%diff: percentage change in sample size due to misspecification of the hazard parameter λ, that is, %diff = 100 × (n^∗ − n)/n, where n is the sample size calculated under the true λ and n^∗ is the sample size calculated under the misspecified λ. Abbreviations: L: standard log-rank test; L_K: optimal log-rank test.
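The cure-rate specification and the robustness measure used for Table 3 are simple to compute. In the Python sketch below (hypothetical names), the treatment cure rate is obtained from π₁ and γ₀, γ₀ is confirmed to be the log odds ratio of the two cure rates, and %diff is evaluated for the first row of Table 3.

```python
import math


def treatment_cure_rate(pi1, gamma0):
    """pi2 = pi1 e^{gamma0} / (1 - pi1 + pi1 e^{gamma0})."""
    e = math.exp(gamma0)
    return pi1 * e / (1 - pi1 + pi1 * e)


def log_odds_ratio(pi1, pi2):
    """log{pi2 / (1 - pi2)} - log{pi1 / (1 - pi1)}."""
    return math.log(pi2 / (1 - pi2)) - math.log(pi1 / (1 - pi1))


def pct_diff(n_true, n_misspecified):
    """%diff = 100 (n* - n) / n."""
    return 100.0 * (n_misspecified - n_true) / n_true


for gamma0 in (1.5, 2.0):
    pi2 = treatment_cure_rate(0.1, gamma0)
    print(gamma0, round(pi2, 4), round(log_odds_ratio(0.1, pi2), 10))
# First row of Table 3 (standard log-rank test, true lambda = 0.1)
print(round(pct_diff(2282, 2885), 1), round(pct_diff(2282, 1880), 1))
```

The last line reproduces the tabulated 26.4 and −17.6.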

To investigate the performance of the sample size formulae for the optimal and standard log-rank tests, we calculated sample sizes under the cure model (1), with cure rates set as in Table 3 and a conditional survival distribution that was either Weibull, $S(t) = e^{-λ t^{κ}}$, or log-logistic, $S(t) = 1/(1 + λ t^{κ})$. The scale parameter λ was set to 0.4, and the shape parameter κ was set to 0.5, 1, or 2; for the Weibull distribution these give decreasing, constant, and increasing hazard functions, respectively, and for the log-logistic distribution they give decreasing hazards (κ = 0.5, 1) and a unimodal hazard (κ = 2). We assumed that subjects were recruited uniformly over an accrual period t_a = 1, followed by a follow-up period t_f = 2, and that no subject was lost to follow-up. The censoring time was then uniformly distributed over [t_f, t_a + t_f]; that is, the censoring survival distribution is G(t) = 1 if t ≤ t_f, G(t) = (t_a + t_f − t)/t_a if t_f ≤ t ≤ t_a + t_f, and G(t) = 0 otherwise. Given a two-sided nominal significance level of 0.05 and power of 90%, the required sample sizes were calculated for each design scenario under each distribution, and the empirical type I errors and powers of the corresponding designs were estimated from 100,000 simulation runs. The results in Table 4 can be summarized as follows. First, the empirical powers of both the optimal and standard log-rank tests were close to the nominal 90%, slightly exceeding it (up to about 93%) for the Weibull model with κ = 2; thus, the sample sizes were adequately estimated. Second, the empirical type I errors of both tests were close to the nominal 5%; thus, both tests preserved the type I error well. Third, the sample sizes for the optimal test were smaller than those for the standard log-rank test.
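The formula-based sample sizes behind Table 4 can be sketched with the same u = S(t) substitution, so that only the inverse of the latency survival function is needed; the inverses follow from the Weibull and log-logistic forms given above. Equal allocation (p = 0.5) is our assumption, as are the helper names. The sketch does not reproduce the 100,000-run simulations, and its output has not been checked against every entry of Table 4.

```python
import math
from statistics import NormalDist


def integrate_unit(f, m=20000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / m
    return h * sum(f((k + 0.5) * h) for k in range(m))


def weibull_inverse(lam, kappa):
    """Return t(u) solving exp(-lam * t**kappa) = u."""
    return lambda u: (-math.log(u) / lam) ** (1.0 / kappa)


def loglogistic_inverse(lam, kappa):
    """Return t(u) solving 1 / (1 + lam * t**kappa) = u."""
    return lambda u: ((1.0 / u - 1.0) / lam) ** (1.0 / kappa)


def uniform_accrual_G(ta, tf):
    """Censoring survival G(t): uniform accrual over ta, then tf of follow-up."""
    def G(t):
        if t <= tf:
            return 1.0
        return max(0.0, (ta + tf - t) / ta)
    return G


def treatment_cure_rate(pi1, gamma0):
    e = math.exp(gamma0)
    return pi1 * e / (1 - pi1 + pi1 * e)


def design_sizes(pi1, pi2, inv_S, G, p=0.5, alpha=0.05, power=0.9):
    """Unrounded (n, n_K) from formulas (11) and (10)."""
    z = NormalDist().inv_cdf
    zz = (z(1 - alpha / 2) + z(power)) ** 2
    gamma = 0.5 * math.log((1 - pi2) / (1 - pi1))
    pi0 = 1 - math.sqrt((1 - pi1) * (1 - pi2))
    g = lambda u: G(inv_S(u))              # G(t) at the time t where S(t) = u
    s0 = lambda u: pi0 + (1 - pi0) * u     # S_0^*(t) at that time
    a = integrate_unit(lambda u: g(u) / s0(u) ** 2)
    b = integrate_unit(g)
    c = integrate_unit(lambda u: g(u) / s0(u))
    base = zz / (4 * p * (1 - p) * (1 - pi0) * gamma ** 2)
    return base * b / c ** 2, base / a


G = uniform_accrual_G(ta=1.0, tf=2.0)
pi1, pi2 = 0.1, treatment_cure_rate(0.1, 1.5)
for name, inverse in (("WB", weibull_inverse), ("LG", loglogistic_inverse)):
    for kappa in (0.5, 1.0, 2.0):
        n, n_k = design_sizes(pi1, pi2, inverse(0.4, kappa), G)
        print(name, kappa, math.ceil(n), math.ceil(n_k))
```

By the Cauchy-Schwarz argument of Section 4, n_K can never exceed n for the same design.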

Table 4.

Sample sizes (n) and corresponding simulated empirical type I errors $(\hat{α})$ and powers $(1 - \hat{β})$ for the optimal and standard log-rank tests under the Weibull and log-logistic distributions, with a scale parameter λ = 0.4, cure rates π₁ = 0.1 and $π_{2} = π_{1} e^{γ_{0}} / (1 - π_{1} + π_{1} e^{γ_{0}})$ , nominal type I error of 0.05, power of 90%, and uniform accrual with accrual time t_a = 1 and follow-up time t_f = 2.

			κ = 0.5			κ = 1			κ = 2
Dist	Test	γ₀	n	\hat{α}	1 - \hat{β}	n	\hat{α}	1 - \hat{β}	n	\hat{α}	1 - \hat{β}
WB	L	1.5	841	.048	.905	510	.053	.905	222	.052	.914
		1.6	695	.049	.900	424	.050	.899	188	.052	.914
		1.7	580	.051	.901	355	.045	.906	161	.050	.922
		1.8	488	.050	.903	301	.050	.906	139	.051	.924
		1.9	415	.049	.905	258	.048	.907	121	.052	.921
		2.0	356	.051	.907	222	.053	.906	106	.050	.925

	L_K	1.5	827	.049	.901	490	.053	.904	195	.051	.919
		1.6	683	.045	.901	408	.048	.910	166	.055	.919
		1.7	571	.051	.902	343	.051	.902	143	.048	.925
		1.8	481	.049	.905	291	.050	.906	125	.052	.928
		1.9	410	.053	.904	250	.052	.909	110	.052	.926
		2.0	351	.052	.906	216	.047	.910		.052	.932

LG	L	1.5	1112	.048	.900	762	.052	.908	404	.048	.906
		1.6	916	.050	.907	630	.049	.903	337	.049	.908
		1.7	763	.047	.908	526	.053	.904	284	.051	.908
		1.8	641	.047	.905	443	.050	.907	241	.050	.916
		1.9	544	.049	.903	377	.050	.907	207	.049	.915
		2.0	465	.048	.900	324	.054	.907	180	.050	.912

	L_K	1.5	1100	.048	.902	746	.051	.903	382	.053	.907
		1.6	907	.049	.906	617	.045	.897	319	.053	.909
		1.7	755	.050	.908	516	.052	.898	270	.051	.914
		1.8	635	.048	.903	436	.051	.906	230	.050	.911
		1.9	539	.049	.908	371	.056	.903	198	.051	.910
		2.0	461	.053	.904	319	.049	.910	172	.050	.916


Abbreviations: Cens: censoring; Dist: distribution; WB: Weibull; LG: log-logistic; L: standard log-rank test; L_K: optimal log-rank test.

Overall, the results showed that the derived sample size formulae provide adequate sample size estimates for trial design when the main interest is in detecting differences between the cure rates of two groups, and that the optimal test is more efficient than the standard log-rank test, particularly when both the cure rates and the percentage of censoring are small.

6 Example

We illustrate study design under a parametric cure model using data from the Eastern Cooperative Oncology Group (ECOG) trial e1684. ECOG e1684 was a two-arm phase III clinical trial comparing the relapse-free survival (RFS) of patients with melanoma treated with high-dose interferon alpha-2b or placebo as postoperative adjuvant therapy. The trial accrued patients between 1984 and 1990 and remained blinded until its analysis in 1993 (Kirkwood et al., 1996). This dataset has been studied extensively with cure models (Corbiere and Joly, 2007). There were 92 deaths among the 146 patients in the treatment group. The SAS macro PSPMCM was used to fit a Weibull cure model to the treatment-arm data (Figure 1), giving an estimated shape parameter κ of 1.018, scale parameter λ of 0.836, and cure rate of 35%. Suppose we wish to design a two-arm randomized phase III trial to detect a 20% difference in cure rate (35% versus 55%) between an arm receiving a new treatment and a control arm receiving the same therapy as the treatment arm of the ECOG trial, with a two-sided type I error of 0.05, power of 90%, uniform accrual over a 5-year accrual period followed by 5 years of follow-up, no loss to follow-up, and equal allocation between the two groups. The required sample sizes calculated from formulae (10) and (11) under the Weibull cure model are then 266 and 280 patients, respectively. The corresponding simulated empirical type I error and power are 0.05 and 91.4% for the optimal log-rank test and 0.05 and 90.7% for the standard log-rank test. Because the cure rate is relatively high, the gain in efficiency is only about 5% in this example.

Figure 1. Relapse-free survival for the ECOG e1684 data. The step function is the Kaplan-Meier survival curve; the solid curve is the fitted Weibull cure model.
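The example design can be sketched in the same way. The Python code below assumes the Weibull latency is parameterized as S(t) = exp(−λt^κ), as in Section 5 (the parameterization used by PSPMCM may differ), reads the 20% difference as 35% versus 55%, and assumes equal allocation. Under these assumptions the unrounded sample sizes should be close to the 266 and 280 reported above, but the code is an illustration rather than the author's implementation, and `ecog_design` is a hypothetical name.

```python
import math
from statistics import NormalDist


def integrate_unit(f, m=20000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / m
    return h * sum(f((k + 0.5) * h) for k in range(m))


def ecog_design(pi1=0.35, pi2=0.55, lam=0.836, kappa=1.018, ta=5.0, tf=5.0,
                p=0.5, alpha=0.05, power=0.9):
    """Unrounded (n, n_K) from formulas (11) and (10) for the example design."""
    z = NormalDist().inv_cdf
    zz = (z(1 - alpha / 2) + z(power)) ** 2
    gamma = 0.5 * math.log((1 - pi2) / (1 - pi1))
    pi0 = 1 - math.sqrt((1 - pi1) * (1 - pi2))

    def g(u):
        # Assumed Weibull latency S(t) = exp(-lam t^kappa), inverted at S(t) = u
        t = (-math.log(u) / lam) ** (1.0 / kappa)
        # Uniform accrual over ta followed by tf of follow-up, as in Section 5
        return 1.0 if t <= tf else max(0.0, (ta + tf - t) / ta)

    s0 = lambda u: pi0 + (1 - pi0) * u
    a = integrate_unit(lambda u: g(u) / s0(u) ** 2)
    b = integrate_unit(g)
    c = integrate_unit(lambda u: g(u) / s0(u))
    base = zz / (4 * p * (1 - p) * (1 - pi0) * gamma ** 2)
    return base * b / c ** 2, base / a


n, n_k = ecog_design()
print(math.ceil(n_k), math.ceil(n), round(n / n_k, 3))
```

Because the latency curve has almost reached zero by the end of the 5-year minimum follow-up, censoring changes these values only slightly from their no-censoring counterparts.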

7 Conclusion

For cancer clinical trials in which a proportion of patients are cured, the main interest is often in demonstrating a difference between the cure rates of the two treatment groups. In this article, sample size formulae are derived for both the optimal and standard log-rank tests. Because the cure model considered here is not a proportional hazards model, the standard log-rank test is not fully efficient; a sample size calculation based on the optimal test can therefore improve the efficiency of the study design. The optimal log-rank test can be performed in R with the survdiff function of the survival package, using the option rho = -1. The simulation results demonstrated that the sample size formula for the optimal test provides adequate sample size estimates and yields smaller sample sizes than the formula for the standard log-rank test. Finally, if a trial is planned with interim analyses so that it can be stopped early for futility or efficacy, the group sequential methods developed by Lee and Sather (1995) can be used.

Acknowledgments

The author acknowledges an anonymous reviewer for his/her valuable comments that improved an earlier version of the paper. This work was supported in part by the National Cancer Institute support grant CA21765 and ALSAC.

Appendix: Derivation of the asymptotic distribution of the weighted log-rank test

The weighted score test is given by

U_{w} = n^{- 1 / 2} \int_{0}^{\infty} W (t) {\frac{Y_{1} (t)}{Y (t)} d N_{2} (t) - \frac{Y_{2} (t)}{Y (t)} d N_{1} (t)},

where n = n₁ + n₂ is the total sample size of the two groups, W(t) is a weight function that converges in probability to w(t), N_j(t) is the number of observed failures by time t, Y_j(t) is the number of subjects at risk just prior to t in group j = 1, 2, and Y(t) = Y₁(t) + Y₂(t). Defining the martingale processes $M_{j}(t) = N_{j}(t) - \int_{0}^{t} λ_{j}^{*}(u) Y_{j}(u) \, du$, j = 1, 2, where $λ_{j}^{*}(t)$ is given in equation (5), the weighted score statistic can be written as

U_{w} = n^{-1/2} \int_{0}^{\infty} W(t) \left\{\frac{Y_{1}(t)}{Y(t)} dM_{2}(t) - \frac{Y_{2}(t)}{Y(t)} dM_{1}(t)\right\} + \int_{0}^{\infty} W(t) \frac{Y_{1}(t) Y_{2}(t)}{n Y(t)} n^{1/2} \{λ_{2}^{*}(t) - λ_{1}^{*}(t)\} \, dt.

Under the null hypothesis H₀ : γ = 0, we have $λ_{1}^{*} (t) = λ_{2}^{*} (t) = λ_{0}^{*} (t)$ , where

λ_{0}^{*} (t) = \frac{(1 - π_{0}) S (t)}{π_{0} + (1 - π_{0}) S (t)} λ (t) .

Hence, by the martingale property, the mean of U_w is 0 and the variance of U_w is given by

Var (U_{w}) = n^{- 1} E \int_{0}^{\infty} W^{2} (t) \frac{Y_{1} (t) Y_{2} (t)}{Y (t)} d Λ_{0}^{*} (t),

where $Λ_{0}^{*} (t) = \int_{0}^{t} λ_{0}^{*} (u) d u$ . As

n^{-1} \frac{Y_{1}(t) Y_{2}(t)}{Y(t)} = \frac{n_{1} n_{2}}{n^{2}} \frac{\{Y_{1}(t)/n_{1}\}\{Y_{2}(t)/n_{2}\}}{Y(t)/n} \to p(1 - p) \frac{π_{1}(t) π_{2}(t)}{π(t)},

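where $p = \lim_{n \to \infty} n_{1}/n$, $π_{j}(t) = P(X_{ij} \geq t) = G(t) S_{j}^{*}(t)$ is the limiting proportion of group j still at risk at time t (not to be confused with the cure rate π_j), and π(t) = pπ₁(t) + (1 − p)π₂(t); under the null hypothesis, π₁(t) = π₂(t) = π(t) = G(t)S₀^∗(t). Thus, by the martingale central limit theorem (Fleming and Harrington, 1991), $U_{w} \to N(0, σ_{w}^{2})$ in distribution, where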

σ_{w}^{2} = p (1 - p) \int_{0}^{\infty} w^{2} (t) G (t) S_{0}^{*} (t) λ_{0}^{*} (t) d t,

for which $S_{0}^{*}(t) = π_{0} + (1 - π_{0}) S(t)$ and G(t) is the common survival distribution of the censoring time of the two groups. Noting that $S_{0}^{*}(t) λ_{0}^{*}(t) = (1 - π_{0}) S(t) λ(t)$, we have

σ_{w}^{2} = p (1 - p) (1 - π_{0}) \int_{0}^{\infty} w^{2}(t) G(t) S(t) λ(t) \, dt. \qquad (13)

The variance $σ_{w}^{2}$ can be estimated by

{\hat{σ}}_{w}^{2} = n^{- 1} \int_{0}^{\infty} W^{2} (t) \frac{Y_{1} (t) Y_{2} (t)}{Y (t)} d {\hat{Λ}}_{0}^{*} (t),

where $d{\hat{Λ}}_{0}^{*}(t) = dN(t)/Y(t)$ and N(t) = N₁(t) + N₂(t). Therefore, the weighted log-rank statistic $L_{w} = U_{w} / {\hat{σ}}_{w}$ is asymptotically standard normal under the null hypothesis.

To derive the asymptotic distribution of the weighted log-rank statistic under the alternative, consider a sequence of local alternatives $H_{1}^{(n)}: S_{j}^{*(n)}(t) = 1 - e^{(-1)^{j} γ_{n}} (1 - π_{0}) \{1 - S(t)\}$, or equivalently

λ_{j}^{*(n)}(t) = \frac{e^{(-1)^{j} γ_{n}} (1 - π_{0}) S(t)}{1 - e^{(-1)^{j} γ_{n}} (1 - π_{0}) + e^{(-1)^{j} γ_{n}} (1 - π_{0}) S(t)} λ(t),

where $n^{1/2} γ_{n} = γ_{a} < \infty$, and define the martingale processes $M_{j}^{(n)}(t) = N_{j}(t) - \int_{0}^{t} Y_{j}(u) λ_{j}^{*(n)}(u) \, du$. Then $U_{w} = U_{1w} + U_{2w}$, where

U_{1w} = n^{-1/2} \int_{0}^{\infty} W(t) \left\{\frac{Y_{1}(t)}{Y(t)} dM_{2}^{(n)}(t) - \frac{Y_{2}(t)}{Y(t)} dM_{1}^{(n)}(t)\right\},

and

U_{2w} = n^{-1/2} \int_{0}^{\infty} W(t) \frac{Y_{1}(t) Y_{2}(t)}{Y(t)} \{λ_{2}^{*(n)}(t) - λ_{1}^{*(n)}(t)\} \, dt.

As γ_n → 0, $H_{1}^{(n)} \to H_{0}$ and $λ_{j}^{*(n)}(t) \to λ_{0}^{*}(t)$; hence, by the martingale central limit theorem, $U_{1w}$ converges in distribution to a normal variable with mean $E U_{1w} = 0$ and variance

E U_{1w}^{2} = n^{-1} E \int_{0}^{\infty} W^{2}(t) \left\{\frac{Y_{2}^{2}(t)}{Y^{2}(t)} Y_{1}(t) λ_{1}^{*(n)}(t) + \frac{Y_{1}^{2}(t)}{Y^{2}(t)} Y_{2}(t) λ_{2}^{*(n)}(t)\right\} dt \to p(1 - p) \int_{0}^{\infty} w^{2}(t) \left\{(1 - p) \frac{π_{2}^{2}(t) π_{1}(t)}{π^{2}(t)} + p \frac{π_{1}^{2}(t) π_{2}(t)}{π^{2}(t)}\right\} λ_{0}^{*}(t) \, dt = p(1 - p) \int_{0}^{\infty} w^{2}(t) \frac{π_{1}(t) π_{2}(t)}{π(t)} λ_{0}^{*}(t) \, dt = p(1 - p) \int_{0}^{\infty} w^{2}(t) G(t) S_{0}^{*}(t) λ_{0}^{*}(t) \, dt = σ_{w}^{2}.

By a first-order Taylor expansion of $λ_{j}^{*}(t)$ in γ_n about γ_n = 0, we have

λ_{j}^{*} (t) ≃ \frac{(1 - π_{0}) S (t)}{π_{0} + (1 - π_{0}) S (t)} λ (t) + \frac{(1 - π_{0}) S (t)}{{π_{0} + (1 - π_{0}) S (t)}^{2}} λ (t) {(- 1)}^{j} γ_{n} .

It then follows that

\lim_{n \to \infty} n^{1 / 2} {λ_{2}^{*} (t) - λ_{1}^{*} (t)} = \frac{2 γ_{a} (1 - π_{0}) S (t) λ (t)}{{π_{0} + (1 - π_{0}) S (t)}^{2}} .
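The first-order coefficient in this expansion can be verified directly. Writing a = e^{(−1)^j γ}(1 − π₀), so that (5) reads λ_j^∗(t) = aS(t)λ(t)/[1 − a{1 − S(t)}], the chain rule gives

```latex
\begin{aligned}
\frac{\partial \lambda_{j}^{*}(t)}{\partial a}
  &= \frac{S(t)\lambda(t)\,\bigl[\{1 - a(1 - S(t))\} + a\{1 - S(t)\}\bigr]}{\{1 - a(1 - S(t))\}^{2}}
   = \frac{S(t)\lambda(t)}{\{1 - a(1 - S(t))\}^{2}},
  \qquad \frac{\partial a}{\partial \gamma} = (-1)^{j} a,\\
\left.\frac{\partial \lambda_{j}^{*}(t)}{\partial \gamma}\right|_{\gamma = 0}
  &= (-1)^{j} \frac{(1 - \pi_{0}) S(t)\lambda(t)}{\{\pi_{0} + (1 - \pi_{0}) S(t)\}^{2}},
\end{aligned}
```

since a = 1 − π₀ and 1 − a{1 − S(t)} = S₀^∗(t) at γ = 0. Multiplying by γ_n = n^{−1/2}γ_a and taking the difference between j = 2 and j = 1 gives the factor 2γ_a above.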

Substituting this into $U_{2w}$ shows that $U_{2w}$ converges in probability to μ(w, γ_a), where

μ (w, γ_{a}) = 2 p (1 - p) (1 - π_{0}) γ_{a} \int_{0}^{\infty} w (t) {S_{0}^{*} (t)}^{- 1} G (t) S (t) λ (t) d t .

Because $\hat{σ}_{w}^{2}$ remains consistent for $σ_{w}^{2}$ under the contiguous local alternatives $H_{1}^{(n)}$, Slutsky's theorem implies that the weighted log-rank statistic is asymptotically normal with mean μ(w, γ_a)/σ_w and unit variance, that is,

L_{w} = U_{w} / {\hat{σ}}_{w} \to N (μ (w, γ_{a}) / σ_{w}, 1) .

References

Corbiere F, Joly P. A SAS macro for parametric and semiparametric mixture cure models. Computer Methods and Programs in Biomedicine. 2007;85:173–180. doi: 10.1016/j.cmpb.2006.10.008. [DOI] [PubMed] [Google Scholar]
Ewell M, Ibrahim JG. The large sample distribution of the weighted log rank statistic under general local alternatives. Lifetime Data Analysis. 1997;3:5–12. doi: 10.1023/a:1009690200504. [DOI] [PubMed] [Google Scholar]
Farewell VT. The use of mixture models for the analysis of survival data with long-term survivors. Biometrics. 1982;38:1041–1046. [PubMed] [Google Scholar]
Fleming TR, Harrington DP. Counting processes and survival analysis. John Wiley and Sons; New York: 1991. [Google Scholar]
Gray RJ, Tsiatis AA. A linear rank test for use when the main interest is in differences in cure rates. Biometrics. 1989;45:899–904. [PubMed] [Google Scholar]
Harrington DP, Fleming TR. A class of rank test procedures for censored survival data. Biometrika. 1982;69:553–566. [Google Scholar]
Kirkwood JM, Straderman MH, Ernstoff MS, Smith TJ, Borden EC, Blum RH. Interferon alfa-2b adjuvant therapy of high-risk resected cutaneous melanoma: the Eastern Cooperative Oncology Group Trial EST 1684. Journal of Clinical Oncology. 1996;14:7–17. doi: 10.1200/JCO.1996.14.1.7. [DOI] [PubMed] [Google Scholar]
Kuk AYC, Chen CH. A mixture model combining logistic regression with proportional hazards regression. Biometrika. 1992;79:531–541.
Lee JW, Sather HN. Group sequential methods for comparison of cure rates in clinical trials. Biometrics. 1995;51:756–763.
Peng Y, Dear KBG. A nonparametric mixture model for cure rate estimation. Biometrics. 2000;56:237–243. doi: 10.1111/j.0006-341x.2000.00237.x.
Peng Y, Dear KBG, Denham JW. A generalized F mixture model for cure rate estimation. Statistics in Medicine. 1998;17:813–830. doi: 10.1002/(sici)1097-0258(19980430)17:8<813::aid-sim775>3.0.co;2-#.
Randles RH, Wolfe DA. Introduction to the theory of nonparametric statistics. John Wiley & Sons; New York: 1979.
Sy JP, Taylor JMG. Estimation in a Cox proportional hazards cure model. Biometrics. 2000;56:227–236. doi: 10.1111/j.0006-341x.2000.00227.x.

