Single-Arm Phase II Group Sequential Trial Design with Survival Endpoint at a Fixed Time Point

Jianrong Wu; Xiaoping Xiong

doi:10.1080/19466315.2014.923325

Author manuscript; available in PMC: 2017 Sep 7.

Published in final edited form as: Stat Biopharm Res. 2014 May 27;6(4):289–301. doi: 10.1080/19466315.2014.923325

PMCID: PMC5589207 NIHMSID: NIHMS899029 PMID: 28890756

Abstract

In this paper, three non-parametric test statistics are proposed to design single-arm phase II group sequential trials for monitoring survival probability. The small-sample properties of these test statistics are studied through simulations. Sample size formulas are derived for the fixed sample test. The Brownian motion property of the test statistics allowed us to develop a flexible group sequential design using a sequential conditional probability ratio test procedure (Xiong, 1995). An example is given to illustrate the trial design by using the proposed method.

Keywords: Brownian motion, Interim analysis, Nelson-Aalen estimate, Single-arm phase II trial, Sample size, Survival probability

1 Introduction

Phase II trial evaluation is an important step in the development of new anti-cancer drugs and treatments. The goal of phase II cancer clinical trials is to assess the efficacy and toxicity of a new agent or treatment. Evidence of anti-cancer efficacy demonstrated in a phase II trial often determines whether the new agent or treatment warrants further study in a large, confirmatory randomized phase III trial. Phase II trials are often designed as single-arm studies with a limited number of subjects. Due to ethical concerns, interim monitoring rules are frequently used to terminate the trial early for futility, thereby reducing the number of subjects exposed to an ineffective therapy. The proportion of subjects achieving complete or partial response is often chosen as a surrogate endpoint of the study. Such trials with binary endpoints are often designed and monitored using Simon’s optimal or minimax two-stage design. However, for many phase II cancer trials, the probability of a subject surviving beyond some clinically meaningful landmark point x is the primary endpoint. Designing such trials with interim monitoring rules is challenging because of incomplete follow-up for some subjects at the time of the interim analyses. A naive approach is to estimate the survival probability as a simple proportion of all subjects surviving the required time at the interim analyses. There are several disadvantages to this approach. First, subjects who have not been followed for the required time will not contribute to the estimate of the survival probability, resulting in an inefficient estimate. Second, simply treating subjects who are lost to follow-up as treatment failures or excluding them from the analysis will result in a biased estimate. Finally, suspending accrual to wait for all the subjects to be followed up for the required length of time for an interim analysis is often impractical. Long trial suspensions can ruin a trial’s momentum, increase study length, and increase costs (Case and Morgan, 2003).

To design a trial with incomplete follow-up data, survival analysis can be used. There is much literature available to assist in the design of randomized phase III survival trials with multi-stage interim analyses, such as Jennison and Turnbull (2000) and references therein. However, there are few such methods for the design of single-arm phase II survival trials with interim monitoring rules. Recently, several methods have been proposed for monitoring the survival probabilities at a fixed time point for single-arm phase II trials. For example, Case and Morgan (2003) proposed an optimal two-stage phase II design to minimize the expected sample size or expected study length under the null hypothesis. Huang et al. (2010) modified Case and Morgan’s approach to protect the type I error rate and improve the robustness of the design. Zhao et al. (2011) proposed a simulation-based Bayesian decision-theoretic two-stage design for a phase II clinical trial with a survival endpoint. All these designs are restricted to two stages. Lin et al. (1996) developed a multi-stage sequential procedure for monitoring the survival probabilities, but it was developed for the purpose of testing a specific umbrella hypothesis of the National Wilms Tumor Study.

In phase II drug testing trials, it is also important to know as early as possible if the drug is highly promising. For this purpose and to allow more flexible monitoring of the trial, a sequential conditional probability ratio test (SCPRT) procedure (Xiong, 1995; Xiong et al., 2003) can be used to design a study with multi-stage interim analyses, and the trial can be monitored or stopped early for either futility or efficacy. The maximum sample size required for the SCPRT test is the same as for a fixed sample test, and the SCPRT boundaries have analytical solutions. Therefore, the proposed method can be easily applied to design a group sequential trial for monitoring the survival probability for a single-arm phase II study.

The rest of the paper is organized as follows: Section 2 discusses three non-parametric test statistics based on the Nelson-Aalen estimate. Sample size formulas for a fixed sample test are derived in Section 3. A group sequential design using the SCPRT procedure is discussed in Section 4. The empirical type I error and power of the three test statistics are studied in Section 5 through simulations. An example is given in Section 6 to illustrate trial design using the proposed method. Section 7 is the concluding discussion.

2 Test Statistics

In a single-arm phase II trial with a survival outcome such as time to failure (disease progression, relapse, or death), the survival probability at some clinically meaningful landmark point x is often of primary interest. Let S(x) be the survival probability at x and S₀(x) be the level of survival probability at x at which investigators are no longer interested in the treatment. Then the design of the single-arm phase II trial can be based on testing the following one-sided hypotheses:

H_{0} : S (x) \leq S_{0} (x) vs H_{a} : S (x) > S_{0} (x),

for a given x. This is equivalent to testing hypotheses based on the cumulative hazard function Λ(x) = −log S(x):

H_{0}^{†} : Λ (x) \geq Λ_{0} (x) vs H_{a}^{†} : Λ (x) < Λ_{0} (x),

(1)

where Λ₀(x) = −log S₀(x).

Suppose during the accrual phase of the trial, n subjects are enrolled in the study at calendar times Y₁, ⋯, Y_n which are measured from the start of the study. Let T_i and C_i denote, respectively, the failure time and censoring time of the i^th subject, with both being measured from the time of study entry of this subject. We assume that the failure time T_i is independent of the censoring time C_i and entry time Y_i, and {(Y_i, T_i, C_i); i = 1, ⋯, n} are independent and identically distributed.

When the data are examined at calendar time t ≤ τ, where t is measured from the start of study and τ is the study duration, including the periods of enrollment and follow-up, we observe the time to failure X_i(t) = T_i∧C_i∧(t − Y_i)⁺ and failure indicator Δ_i(t) = I(T_i ≤ C_i ∧ (t − Y_i)⁺), i = 1, ⋯, n. Based on the observed data {X_i(t), Δ_i(t), i = 1, ⋯, n}, the Nelson-Aalen estimate for the cumulative hazard function $Λ (x) = \int_{0}^{x} λ (u) du$ is

\hat{Λ} (x; t) = \sum_{\{i : X_{i} (t) \leq x\}} \frac{Δ_{i} (t)}{R_{i} (t)},

(2)

where $R_{i} (t) = \sum_{j = 1}^{n} I \{X_{j} (t) \geq X_{i} (t)\}$. It has been shown (Lin et al., 1996) that the process

n^{1 / 2} \{Λ_{0} (x) - \hat{Λ} (x; t)\}; x < t

converges weakly to a Gaussian process with time variable t and independent increments and variance

σ^{2} (x; t) = \int_{0}^{x} \frac{λ (u)}{π (u; t)} du,

(3)

as a function of t, where π(u; t) = P(X₁(t) ≥ u), and σ²(x; t) can be estimated by

{\hat{σ}}^{2} (x; t) = \sum_{\{i : X_{i} (t) \leq x\}} \frac{Δ_{i} (t)}{R_{i}^{2} (t) / n}

(4)

(Lin et al., 1996; Breslow and Crowley, 1974; Fleming and Harrington, 1991).
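The estimates (2) and (4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names `observe` and `nelson_aalen` and the toy data are assumptions made here, and the risk set R_i(t) is counted by brute force for readability.

```python
def observe(entry, fail, cens, t):
    """Observed time X_i(t) and indicator Delta_i(t) at calendar time t.

    entry, fail, cens play the roles of Y_i, T_i, C_i in the text.
    """
    follow = max(t - entry, 0.0)              # (t - Y_i)^+
    x_obs = min(fail, cens, follow)
    delta = 1 if fail <= min(cens, follow) else 0
    return x_obs, delta


def nelson_aalen(times, events, x):
    """Return (Lambda-hat(x; t), sigma-hat^2(x; t)) following (2) and (4)."""
    n = len(times)
    lam_hat = 0.0
    var_hat = 0.0
    for x_i, d_i in zip(times, events):
        if d_i and x_i <= x:
            # R_i(t): number of subjects whose observed time is >= X_i(t)
            at_risk = sum(1 for x_j in times if x_j >= x_i)
            lam_hat += 1.0 / at_risk
            var_hat += n / at_risk ** 2
    return lam_hat, var_hat


# Toy data (hypothetical): four subjects, the second one censored.
print(nelson_aalen([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1], x=3.0))
```

For the toy data the risk sets at the two failures by x = 3 contain 4 and 2 subjects, so the estimate is 1/4 + 1/2 and the variance estimate is 4(1/16 + 1/4).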

However, the distribution of Λ̂(x; t) is often skewed, partly because Λ̂(x; t) is restricted to nonnegative values, so large sample sizes are required to maintain the type I error rate. Because the logarithmic transformation log Λ̂(x; t) takes values over the entire real line, its normal approximation is expected to be more accurate. Therefore, Lin et al. (1996) suggested testing the hypothesis (1) using the following test statistic:

Z_{1} (x; t) = \frac{n^{1 / 2} \{log Λ_{0} (x) - log \hat{Λ} (x; t)\} \hat{Λ} (x; t)}{\hat{σ} (x; t)}

(5)

and showed that Z₁(x; t) as a process in t can be approximated by a Gaussian process. Several other transformations have also been discussed in the literature (Klein et al., 2007), and the corresponding test statistics will be introduced as follows. The Nelson-Aalen estimate for the survival function is given as

\hat{S} (x; t) = e^{- \hat{Λ} (x; t)},

(6)

which is asymptotically equivalent to the Kaplan-Meier estimator. The next test statistic is constructed based on an arcsine-square root transformation on Ŝ(x; t), which is given by

Z_{2} (x; t) = \frac{n^{1 / 2} \{arcsin \sqrt{\hat{S} (x; t)} - arcsin \sqrt{S_{0} (x)}\}}{\hat{ν} (x; t)},

(7)

where

{\hat{ν}}^{2} (x; t) = \frac{\hat{S} (x; t) {\hat{σ}}^{2} (x; t)}{4 (1 - \hat{S} (x; t))} .

The third test statistic discussed in the paper is based on the logit transformation, which is given by

Z_{3} (x; t) = \frac{n^{1 / 2} \{logit \hat{S} (x; t) - logit S_{0} (x)\} (1 - \hat{S} (x; t))}{\hat{σ} (x; t)} .

(8)

The small-sample properties of these three test statistics Z₁ – Z₃ will be studied and compared through simulations in Section 5.
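As a minimal sketch, the three statistics (5), (7), and (8) can be computed from the Nelson-Aalen estimate and its variance estimate as follows. The function name `landmark_z_statistics` and the input values in the usage line are hypothetical and chosen only for illustration.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def landmark_z_statistics(lam_hat, var_hat, n, s0):
    """Return (Z1, Z2, Z3) from Lambda-hat, sigma-hat^2, n and S0(x)."""
    if lam_hat <= 0.0:
        # S-hat = 1: the log and logit transformations are undefined
        raise ValueError("Lambda-hat must be positive")
    lam0 = -math.log(s0)
    s_hat = math.exp(-lam_hat)                # equation (6)
    sigma_hat = math.sqrt(var_hat)
    root_n = math.sqrt(n)
    # Z1: log transformation of the cumulative hazard, equation (5)
    z1 = root_n * (math.log(lam0) - math.log(lam_hat)) * lam_hat / sigma_hat
    # Z2: arcsine-square root transformation of S-hat, equation (7)
    nu_hat = math.sqrt(s_hat * var_hat / (4.0 * (1.0 - s_hat)))
    z2 = root_n * (math.asin(math.sqrt(s_hat)) - math.asin(math.sqrt(s0))) / nu_hat
    # Z3: logit transformation of S-hat, equation (8)
    z3 = root_n * (logit(s_hat) - logit(s0)) * (1.0 - s_hat) / sigma_hat
    return z1, z2, z3


# Hypothetical inputs: S-hat = 0.35, sigma-hat^2 = 2.0, n = 65, S0 = 0.2.
print(landmark_z_statistics(-math.log(0.35), 2.0, 65, 0.2))
```

All three statistics are zero when Ŝ(x; t) = S₀(x) and positive when Ŝ(x; t) > S₀(x), so large positive values favor the alternative.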

3 Sample Size for Fixed Sample Test

A key step in designing a clinical trial is to calculate the required minimal sample size with constrained type I error and power. If the study is interested in testing hypotheses (1), then given type I error rate α and power 1 − β at the alternative Λ₁(x)(< Λ₀(x)) for a given x, the sample sizes for fixed sample tests can be derived as follows. The power of a test is the probability of rejecting the null hypothesis when the alternative is true. Hence, for a fixed sample test, the power of the test statistic Z₁ = Z₁(x; τ) under the alternative Λ(x) = Λ₁(x) is determined by

1 - β = P (n^{1 / 2} {\hat{σ}}^{- 1} \{log Λ_{0} (x) - log \hat{Λ} (x; τ)\} \hat{Λ} (x; τ) > z_{1 - α} | Λ (x) = Λ_{1} (x)) ≃ Φ \{n^{1 / 2} σ_{1}^{- 1} (log Λ_{0} (x) - log Λ_{1} (x)) Λ_{1} (x) - z_{1 - α}\},

where Φ(·) is the standard normal distribution function, and $σ_{1}^{2} = σ_{1}^{2} (x; τ)$ is the asymptotic variance of Λ̂(x; τ) for the alternative at the end of study duration t = τ. Thus, the sample size for a fixed sample test for test statistic Z₁ is given by

n = \frac{{(z_{1 - α} + z_{1 - β})}^{2} σ_{1}^{2} (x; τ)}{{(log Λ_{0} (x) - log Λ_{1} (x))}^{2} Λ_{1}^{2} (x)} .

(9)

Similarly, the sample size for Z₂ is given by

n = \frac{{(z_{1 - α} + z_{1 - β})}^{2} ν_{1}^{2} (x; τ)}{\{arcsin \sqrt{S_{1} (x)} - arcsin \sqrt{S_{0} (x)}\}^{2}},

(10)

where

ν_{1}^{2} (x; τ) = \frac{S_{1} (x) σ_{1}^{2} (x; τ)}{4 (1 - S_{1} (x))},

and for Z₃ is given by

n = \frac{{(z_{1 - α} + z_{1 - β})}^{2} σ_{1}^{2} (x; τ)}{\{logit S_{1} (x) - logit S_{0} (x)\}^{2} {(1 - S_{1} (x))}^{2}} .

(11)

To calculate the sample size n given in the above formulas, we have to calculate the asymptotic variance $σ_{1}^{2} (x; τ)$ under the alternative hypothesis, which is given by (3) with t = τ. Assume that subjects are recruited uniformly over the accrual period t_a and followed for an additional period of length t_f, so that the study duration is τ = t_a + t_f. We further assume that the survival function of the censoring time is G₁(u). Then under the alternative, π(u; t) = P(T₁ ∧ C₁ ∧ (t − Y₁)⁺ ≥ u) = P(T₁ ≥ u)P(C₁ ≥ u)P(Y₁ < t − u) = S₁(u)G₁(u)min{(t − u)/t_a, 1}, t > u. Thus from equation (3), we have

σ_{1}^{2} (x; t) = \int_{0}^{x} \frac{λ_{1} (u)}{S_{1} (u) G_{1} (u) min \{(t - u) / t_{a}, 1\}} du, x < t \leq τ,

(12)

where λ₁(·) and S₁(·) are the hazard function and the survival function under the alternative, respectively.
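Equation (12) rarely has a closed form, but it is easy to evaluate numerically. The sketch below is an illustration under assumptions borrowed from Section 5 (Weibull failure times, exponential censoring, uniform accrual); the function name `sigma1_sq` and its defaults are hypothetical. It substitutes v = Λ₁(u), so that λ₁(u)du = dv and 1/S₁(u) = e^v, which keeps the integrand bounded even when the Weibull hazard is unbounded at zero (κ < 1), and then applies a midpoint rule.

```python
import math


def sigma1_sq(x, t, s1_at_x, kappa=1.0, t_a=5.0, cens_rate=0.1, steps=20000):
    """Midpoint-rule approximation of equation (12).

    Assumes S1(u) = exp(-(omega*u)**kappa) with S1(x) = s1_at_x,
    G1(u) = exp(-cens_rate*u), and uniform accrual on [0, t_a].
    """
    if not x < t:
        raise ValueError("equation (12) requires x < t")
    lam1_x = -math.log(s1_at_x)               # Lambda_1(x)
    h = lam1_x / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h                      # v = Lambda_1(u)
        u = x * (v / lam1_x) ** (1.0 / kappa)  # invert Lambda_1 for the Weibull
        accrued = min((t - u) / t_a, 1.0)      # P(Y_1 < t - u)
        total += math.exp(v + cens_rate * u) / accrued * h
    return total


# Final analysis (t = 8) versus an interim look (t = 4) for S1(2) = 0.35.
print(sigma1_sq(2.0, 8.0, 0.35), sigma1_sq(2.0, 4.0, 0.35))
```

The variance at an interim look is larger than at the final analysis because fewer subjects have been accrued and followed to x, which is what makes the information time defined later in Section 4 smaller than 1.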

As a special case, if there is no loss to follow-up, then G₁(u) = P(C₁ ≥ u) = 1 and min{(t_a + t_f − u)/t_a, 1} = 1 for any 0 < u ≤ x ≤ t_f; hence equation (12) can be simplified as

σ_{1}^{2} (x; τ) = \int_{0}^{x} \frac{λ_{1} (u)}{S_{1} (u)} du = S_{1}^{- 1} (x) - 1 .

(13)

This implies that if there is no loss to follow-up and x ≤ t_f, then the sample size for a fixed sample test depends only on the single value S₁(x), but not on the specification of the underlying survival distribution S₁(·).
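The sample size formulas (9), (10), and (11) can be sketched as a single function of S₀(x), S₁(x), and the variance σ₁²(x; τ). The function name `fixed_sample_sizes` is an assumption of this sketch, and rounding up to the next integer is also an assumption, although it agrees with the Table 1 entries checked below.

```python
import math
from statistics import NormalDist


def fixed_sample_sizes(s0, s1, sigma1_sq, alpha=0.05, power=0.80):
    """Return (n for Z1, n for Z2, n for Z3) from equations (9)-(11)."""
    z = NormalDist().inv_cdf
    c = (z(1.0 - alpha) + z(power)) ** 2
    lam0, lam1 = -math.log(s0), -math.log(s1)
    logit = lambda p: math.log(p / (1.0 - p))
    # Equation (9): log-transformed cumulative hazard
    n1 = c * sigma1_sq / ((math.log(lam0) - math.log(lam1)) ** 2 * lam1 ** 2)
    # Equation (10): arcsine-square root transformation
    nu1_sq = s1 * sigma1_sq / (4.0 * (1.0 - s1))
    n2 = c * nu1_sq / (math.asin(math.sqrt(s1)) - math.asin(math.sqrt(s0))) ** 2
    # Equation (11): logit transformation
    n3 = c * sigma1_sq / ((logit(s1) - logit(s0)) ** 2 * (1.0 - s1) ** 2)
    return tuple(math.ceil(n) for n in (n1, n2, n3))


# No loss to follow-up and x <= t_f: equation (13) gives the variance.
print(fixed_sample_sizes(0.2, 0.35, 1.0 / 0.35 - 1.0))

# Exponential failure time (kappa = 1) with exponential censoring at rate 0.1
# and x = 2 <= t_f: equation (12) integrates in closed form.
lam1 = -math.log(0.35) / 2.0
var_cens = lam1 / (lam1 + 0.1) * (math.exp((lam1 + 0.1) * 2.0) - 1.0)
print(fixed_sample_sizes(0.2, 0.35, var_cens))
```

Under the Section 5 assumptions with κ = 1, the second call gives 65, 61, and 52 for Z₁, Z₂, and Z₃, in line with the 0.2 vs 0.35 column of Table 1. Under (13), ν₁² reduces to 1/4, so the Z₂ sample size depends on S₁(x) only through the arcsine difference.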

In traditional randomized oncology trials with survival as the endpoint, the study power is determined by the number of events (see, e.g., Collett, 2003). Similarly here, equation (12) shows that the variance $σ_{1}^{2} (x; τ)$ depends on the survival probability and the censoring distribution. The more events in [0, x], the smaller S(u) is for u < x; the heavier the censoring, the smaller G(u) is for u < x. Either effect increases the variance $σ_{1}^{2} (x; τ)$, which leads to a larger sample size n. Therefore, even though the power of the test statistic is expressed explicitly in terms of the sample size n, it is actually determined by the number of events. There is no explicit formula here that expresses the power in terms of the number of events; in general, however, the number of events is less convenient than the sample size for the design and operation of a planned study with survival data.

4 Group Sequential Procedure

In this section, we will apply an SCPRT procedure (Xiong, 1995) to the test statistic Z₁(x; t) as a sequential test statistic with time variable t based on its Brownian motion property (Lin et al., 1996; Breslow and Crowley, 1974). The SCPRT has some unique features: (1) the maximum sample size of the sequential test is the same as the sample size of the fixed sample test; (2) the probability of discordance, or the probability that the conclusion of the sequential test would be reversed if the experiment were not stopped according to the stopping rule but continued to the planned end, can be controlled to an arbitrarily small level; (3) the power function of the SCPRT is virtually the same as that of the fixed sample test (Xiong et al., 2007; Xiong, 1995). All these features make the SCPRT more practical and attractive to be used for designing a sequential trial.

Letting {B_t : 0 < t ≤ 1} be a Brownian motion with B_t ~ N(θt, t) and B₁ be the value of B_t at the final stage with full information t = 1, the pair (B_t, B₁) has a bivariate normal distribution with mean μ = (θt, θ) and variance matrix Σ = (σ_{ij})_{2×2} with σ₁₁ = σ₁₂ = σ₂₁ = t and σ₂₂ = 1. Therefore, according to multivariate normal conditional distribution theory (see, e.g., Anderson, 1958), the conditional density f(B_t|B₁) is the normal density of N(B₁t, (1 − t)t). Let s₀ = z_{1−α} be the critical value of B₁ to reject the null for the fixed sample test. Then the conditional maximum likelihood ratio for the stochastic process on information time t is (Xiong, 1995; Xiong et al., 2003)

L (t, B_{t} | z_{1 - α}) = \frac{\max_{s > s_{0}} f (B_{t} | B_{1} = s)}{\max_{s \leq s_{0}} f (B_{t} | B_{1} = s)} .

Taking the logarithm, the log likelihood ratio can be simplified as

log (L (t, B_{t} | z_{1 - α})) = \pm \frac{{(B_{t} - z_{1 - α} t)}^{2}}{2 (1 - t) t},

which has a positive sign if B_t > z_{1−α}t and a negative sign if B_t < z_{1−α}t. Setting the absolute value of the log likelihood ratio equal to a constant a at each look leads to lower and upper boundaries for B_{t_k} as

a_{k} = z_{1 - α} t_{k} - \{2 a t_{k} (1 - t_{k})\}^{1 / 2}; b_{k} = z_{1 - α} t_{k} + \{2 a t_{k} (1 - t_{k})\}^{1 / 2},

(14)

for k = 1, …, K, where K is the total number of looks, and t₁, t₂, ⋯, t_K(= 1) are the information times of interim looks and the final look. The a in (14) is the boundary coefficient, and it is crucial to choose an appropriate a for the design such that the probability of conclusion by the sequential test being reversed by the test at the planned end is small but not unnecessarily small. The larger is a, the smaller is the discordance probability, and the wider apart are the upper and lower boundaries, making it harder for the sample path to reach boundaries and stop early and resulting in larger expected sample sizes. An appropriate a can be determined by choosing an appropriate discordance probability (Xiong, 1995; Xiong et al., 2003). As an illustration, the calculations of the operating characteristics of a multi-stage group sequential design are given in the Appendix.
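The boundaries (14) are simple to tabulate once the boundary coefficient is fixed. In the sketch below the function name `scprt_boundaries` is hypothetical, and the value a = 3.0 is an arbitrary illustration, not a coefficient taken from Xiong et al. (2003); in practice a is chosen from the target discordance probability.

```python
import math
from statistics import NormalDist


def scprt_boundaries(info_times, a, alpha=0.05):
    """Lower and upper SCPRT boundaries (a_k, b_k) from equation (14)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    boundaries = []
    for t_k in info_times:
        half_width = math.sqrt(2.0 * a * t_k * (1.0 - t_k))
        boundaries.append((z_alpha * t_k - half_width, z_alpha * t_k + half_width))
    return boundaries


# Three equally spaced looks with an illustrative coefficient a = 3.0.
for t_k, (lower, upper) in zip([1 / 3, 2 / 3, 1.0],
                               scprt_boundaries([1 / 3, 2 / 3, 1.0], a=3.0)):
    print(f"t = {t_k:.3f}: a_k = {lower:.3f}, b_k = {upper:.3f}")
```

The boundaries are symmetric about the line z_{1−α}t and collapse to the single fixed-sample critical value z_{1−α} at t = 1, which is why the final decision of the sequential test coincides with that of the fixed sample test.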

We now apply the SCPRT to the test statistic Z₁(x; t) in (5). Let

U (x; t) = n^{1 / 2} I (x; t) \{log Λ_{0} (x) - log \hat{Λ} (x; t)\} \hat{Λ} (x; t),

(15)

where $I (x; t) = 1 / σ_{1}^{2} (x; t)$. For a fixed x, U(x; t) is asymptotically a Brownian motion in the time scale of the information I(x; t), with mean E{U(x; t)} = δI(x; t) and covariance cov{U(x; t), U(x; s)} = I(x; t) for any t and s such that x < t < s < τ, where δ = n^{1/2}(log Λ₀(x) − log Λ₁(x))Λ₁(x) (Lin et al., 1996; Breslow and Crowley, 1974). Then we can transform this process into a process with time variable t* on [0, 1] by letting

t^{*} = I (x; t) / I (x; τ) = σ_{1}^{2} (x; τ) / σ_{1}^{2} (x; t)

(16)

be the information time, where $σ_{1}^{2} (x; τ)$ is calculated by (3) or (12) or (13), depending on given conditions in the design, and letting

B (x; t^{*}) = U (x; t) / I^{1 / 2} (x; τ) .

(17)

Then the sequential test statistic B(x; t*) ~ N(θt*, t*) is approximately a Brownian motion in information time t*, where the drift parameter $θ = δ I^{1 / 2} (x; τ) = δ / σ_{1} (x; τ)$. Based on the SCPRT procedure presented above, the lower and upper boundaries for $B_{t_{k}^{*}} = B (x; t_{k}^{*})$ at the k^th look are given by

a_{k} = z_{1 - α} t_{k}^{*} - \{2 a t_{k}^{*} (1 - t_{k}^{*})\}^{1 / 2}; b_{k} = z_{1 - α} t_{k}^{*} + \{2 a t_{k}^{*} (1 - t_{k}^{*})\}^{1 / 2},

(18)

for k = 1, …, K, where $t_{k}^{*} = I (x; t_{k}) / I (x; τ)$ is the information time at the k^th look with calendar time t_k. The nominal critical p-values for testing hypotheses (1) are

P_{a_{k}} = 1 - Φ (a_{k} / \sqrt{t_{k}^{*}}); P_{b_{k}} = 1 - Φ (b_{k} / \sqrt{t_{k}^{*}}) .

(19)

The observed p-value at the k^th look is

P_{B_{t_{k}^{*}}} = 1 - Φ (B_{t_{k}^{*}} / \sqrt{t_{k}^{*}}) .

The stopping rule for monitoring the trial can be executed by stopping the trial when, for the first time, $P_{B_{t_{k}^{*}}} \geq P_{a_{k}}$ (accept H₀ and stop for futility) or $P_{B_{t_{k}^{*}}} \leq P_{b_{k}}$ (reject H₀ and stop for efficacy). Since Z₁(x; t_k) and $B_{t_{k}^{*}} / \sqrt{t_{k}^{*}}$ asymptotically have the same standard normal distribution under the null hypothesis, the observed p-value at stage k can be calculated from the test statistic Z₁(x; t_k) using all observations up to stage k. The derived sequential boundaries (18) are also applied to the test statistics Z₂ in (7) and Z₃ in (8) because these two sequential test statistics have the same Brownian motion property, by the arguments of Lin et al. (1996). The maximum sample sizes (n) for the three sequential test statistics Z₁(x; t), Z₂(x; t), and Z₃(x; t) are the same as the sample sizes for the corresponding fixed sample tests and can be obtained by equations (9), (10), and (11), respectively. The operating characteristics of a multi-stage group sequential design can be calculated using the computation methods illustrated in the Appendix.
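As a consistency check on the drift used for the operating characteristics, the following short derivation (a sketch that uses only equations (9), (15), and (17), with the notation θ_a from the footnote of Table 4) shows that the drift at the alternative depends only on α and β when n is the fixed-sample size:

```latex
\theta = \delta\, I^{1/2}(x;\tau)
       = \frac{n^{1/2}\{\log\Lambda_0(x)-\log\Lambda_1(x)\}\,\Lambda_1(x)}{\sigma_1(x;\tau)},
\qquad
n = \frac{(z_{1-\alpha}+z_{1-\beta})^{2}\,\sigma_1^{2}(x;\tau)}
         {\{\log\Lambda_0(x)-\log\Lambda_1(x)\}^{2}\,\Lambda_1^{2}(x)}
\;\Longrightarrow\;
\theta_a = z_{1-\alpha}+z_{1-\beta}.
```

With α = 0.05 and power 80%, θ_a ≈ 1.645 + 0.842 = 2.486, and B(x; 1) ~ N(θ_a, 1) exceeds z_{1−α} with probability Φ(θ_a − z_{1−α}) = Φ(z_{1−β}) = 1 − β, as required.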

For convenience, an SCPRT procedure can be summarized as follows: a) Compute the sample size n for a fixed sample test using the formulas given in Section 3. b) Given the number of looks, K, and the maximum conditional discordance probability ρ, determine the boundary coefficient a (Xiong et al., 2003). c) Given calendar times t₁, …, t_K, calculate information times using formulas (16) and (3) or (12) or (13), depending on the given conditions of the design. d) Calculate SCPRT boundaries using equation (18). e) Calculate the nominal critical p-values using formula (19).
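Steps c) to e), together with the stopping rule, can be sketched as follows. The function names, the variance inputs, and the coefficient a = 3.0 are all hypothetical; the variances σ₁²(x; t_k) would come from (3), (12), or (13), and a from step b).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def information_times(var_at_looks, var_final):
    """Step c), equation (16): t*_k = sigma1^2(x; tau) / sigma1^2(x; t_k)."""
    return [var_final / v for v in var_at_looks]


def nominal_p_values(info_times, a, alpha=0.05):
    """Steps d) and e), equations (18)-(19): (P_a_k, P_b_k) for each look."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    result = []
    for t in info_times:
        half_width = math.sqrt(2.0 * a * t * (1.0 - t))
        lower, upper = z_alpha * t - half_width, z_alpha * t + half_width
        root_t = math.sqrt(t)
        result.append((1.0 - _STD_NORMAL.cdf(lower / root_t),
                       1.0 - _STD_NORMAL.cdf(upper / root_t)))
    return result


def decide(z_obs, p_a, p_b):
    """Stopping rule applied to the observed statistic Z(x; t_k)."""
    p_obs = 1.0 - _STD_NORMAL.cdf(z_obs)
    if p_obs <= p_b:
        return "stop: efficacy"
    if p_obs >= p_a:
        return "stop: futility"
    return "continue"


# Hypothetical two-look design: the interim variance is twice the final one.
looks = information_times([4.0, 2.0], var_final=2.0)
p_values = nominal_p_values(looks, a=3.0)
print(looks, p_values, decide(1.0, *p_values[0]))
```

At the final look both nominal p-values equal α, so the rule always stops there and reproduces the fixed sample test.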

5 Simulation Study

To show the small-sample properties of the three test statistics discussed in Section 2, we conducted simulation studies to compare the type I errors and powers for the three test statistics Z₁, Z₂, and Z₃ under various scenarios. In simulation studies, the survival distribution was taken as S(x) = e^{−(ωx)^κ}, which is the Weibull distribution W(ω, κ) with scale parameter ω and shape parameter κ, where the shape parameter is set to be κ = 0.5, 1, and 2. The physical meaning of κ for the shape of the survival function by the Weibull distribution can be understood by observing plots of survival functions in Figure 1. The scale parameter ω is determined by the given values of S₀(2) and S₁(2) under null and alternative hypotheses, where x = 2 is the clinically meaningful landmark point (e.g., 2 months or 2 years) selected for the simulations. The simulations were performed for a variety of study designs of the fixed sample test and are shown in Table 1. The survival functions under null S₀(x) and alternative S₁(x) of the Weibull distribution with the various settings of simulation parameters in Table 1 are plotted in Figure 1 for illustration and for helping one to choose an appropriate shape parameter κ for sample size calculations in real applications. The null and alternative hypothesis value of survival probabilities S₀(2) and S₁(2) are set up to be in a normal hypothesis range 0.2–0.8 to detect a difference of 0.1–0.2 between S₀(2) and S₁(2).

Figure 1. Survival functions for the Weibull distribution under the null and alternative hypotheses for different design scenarios.

Table 1.

Sample size and simulated empirical type I error (α) and power (1 − β) based on 100,000 simulation runs for Weibull and log-normal distributions with nominal level 0.05 and power of 80% for fixed test statistics Z₁–Z₃

Dist.	Design		0.2 vs 0.35			0.2 vs 0.4			0.3 vs 0.45

W(ω, κ)	κ	Test	n	α	1 − β	n	α	1 − β	n	α	1 − β

	0.5	Z₁	63	.055	.837	38	.056	.849	77	.051	.840
		Z₂	59	.058	.825	35	.060	.834	70	.056	.821
		Z₃	51	.069	.803	30	.078	.817	65	.062	.811

	1	Z₁	65	.054	.837	40	.055	.854	79	.051	.839
		Z₂	61	.058	.825	36	.061	.833	72	.057	.821
		Z₃	52	.071	.802	31	.077	.818	68	.063	.816

	2	Z₁	67	.056	.838	41	.055	.854	82	.051	.841
		Z₂	63	.057	.823	37	.059	.831	74	.057	.820
		Z₃	54	.068	.802	31	.077	.811	70	.062	.816

LN(μ, σ²)	σ	Test	n	α	1 − β	n	α	1 − β	n	α	1 − β

	2	Z₁	63	.056	.837	39	.056	.855	77	.051	.838
		Z₂	59	.056	.821	35	.059	.832	70	.058	.820
		Z₃	51	.071	.804	30	.078	.816	66	.063	.813

	1	Z₁	65	.053	.834	40	.055	.854	80	.051	.839
		Z₂	62	.056	.827	36	.060	.831	73	.057	.822
		Z₃	53	.071	.806	31	.076	.815	68	.062	.813

	0.5	Z₁	67	.054	.836	41	.054	.852	82	.052	.838
		Z₂	63	.057	.823	37	.059	.828	75	.056	.823
		Z₃	54	.071	.798	32	.076	.817	70	.062	.813

Dist.	Design		0.5 vs 0.65			0.6 vs 0.75			0.7 vs 0.8

W(ω, κ)	κ	Test	n	α	1 − β	n	α	1 − β	n	α	1 − β

	0.5	Z₁	86	.048	.856	82	.048	.869	152	.045	.857
		Z₂	72	.058	.817	65	.060	.825	124	.058	.816
		Z₃	77	.055	.833	74	.055	.852	143	.051	.847

	1	Z₁	89	.049	.838	85	.049	.869	157	.046	.857
		Z₂	75	.058	.820	67	.058	.825	128	.059	.815
		Z₃	80	.055	.835	77	.055	.852	148	.050	.847

	2	Z₁	92	.048	.854	87	.048	.866	162	.046	.856
		Z₂	77	.057	.819	69	.058	.823	132	.058	.815
		Z₃	82	.055	.833	79	.055	.851	153	.051	.847

LN(μ, σ²)	σ	Test	n	α	1 − β	n	α	1 − β	n	α	1 − β

	2	Z₁	88	.048	.856	84	.047	.872	156	.047	.858
		Z₂	74	.058	.822	66	.059	.822	127	.058	.815
		Z₃	78	.055	.830	76	.051	.850	147	.051	.847

	1	Z₁	91	.048	.855	87	.047	.869	162	.047	.855
		Z₂	76	.057	.818	68	.061	.825	132	.058	.815
		Z₃	81	.055	.834	79	.052	.852	152	.050	.847

	0.5	Z₁	93	.048	.852	89	.046	.869	166	.047	.857
		Z₂	78	.057	.818	70	.059	.825	135	.057	.813
		Z₃	83	.055	.832	81	.053	.852	157	.050	.847

We assumed that subjects were recruited uniformly over the accrual period t_a = 5 and followed for t_f = 3, giving a study duration of τ = t_a + t_f = 8. By the definition of X_i(t) and equation (2), at the interim look time t, a subject was censored if his/her failure time was longer than x = 2, the censoring time C_i, or t − u_i, where C_i follows an exponential distribution with a hazard rate of 0.1 and u_i is the uniformly distributed entry time on the interval [0, t_a] for the i^th subject. The empirical type I error and power of the test statistics were evaluated by simulation at the end of the study τ = 8 for the fixed sample test and at t₁ = 4 and t₂ = 8 for the two-stage group sequential test. The fixed sample sizes for each design were calculated using formulas (9), (10), and (11) for Z₁(x; t), Z₂(x; t), and Z₃(x; t), respectively; the $σ_{1}^{2} (x; τ)$ in these formulas was calculated by (12) with t = τ. For each parameter configuration of the corresponding design, 100,000 observed samples of censored failure times were generated from the Weibull distribution W(ω, κ) to calculate the test statistics under the null or alternative hypothesis. The nominal type I error and power were set to 5% and 80%, respectively. The proportion of samples rejecting the null under the true null hypothesis is the estimated empirical type I error. The proportion of samples rejecting the null under the alternative hypothesis is the estimated empirical power. The simulated empirical type I errors and powers based on samples (excluding those with Ŝ(2; τ) = 0 and Ŝ(2; τ) = 1) are summarized in Tables 1–4 for Z₁–Z₃ in various scenarios.
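The simulation set-up described above can be sketched for the fixed sample test based on Z₁. This is an illustration under the stated design assumptions, not the authors' code: the function names `simulate_z1` and `rejection_rate`, the seeds, and the much smaller number of replicates (a few thousand rather than 100,000) are choices made here to keep the example fast.

```python
import math
import random
from statistics import NormalDist


def simulate_z1(n, s_true, s0, rng, x=2.0, kappa=1.0, t_a=5.0, t=8.0, cens_rate=0.1):
    """Simulate one trial analysed at calendar time t and return Z1(x; t).

    Returns None when no failure is observed by x (S-hat = 1), mirroring the
    exclusion of degenerate samples described in the text.
    """
    omega = (-math.log(s_true)) ** (1.0 / kappa) / x   # Weibull scale with S(x) = s_true
    observed = []
    for _ in range(n):
        entry = rng.uniform(0.0, t_a)                  # uniform accrual
        fail = (-math.log(1.0 - rng.random())) ** (1.0 / kappa) / omega
        cens = rng.expovariate(cens_rate)              # loss to follow-up
        follow = max(t - entry, 0.0)                   # administrative censoring
        observed.append((min(fail, cens, follow), fail <= min(cens, follow)))
    observed.sort()
    lam_hat = var_hat = 0.0
    for rank, (time, is_event) in enumerate(observed):
        if time > x:
            break
        if is_event:
            at_risk = n - rank                         # R_i(t) for untied event times
            lam_hat += 1.0 / at_risk
            var_hat += n / at_risk ** 2
    if lam_hat == 0.0:
        return None
    lam0 = -math.log(s0)
    return math.sqrt(n) * (math.log(lam0) - math.log(lam_hat)) * lam_hat / math.sqrt(var_hat)


def rejection_rate(n, s_true, s0, reps, seed, alpha=0.05, **design):
    """Proportion of simulated trials in which Z1 exceeds z_{1-alpha}."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha)
    stats = [simulate_z1(n, s_true, s0, rng, **design) for _ in range(reps)]
    kept = [z for z in stats if z is not None]
    return sum(z > crit for z in kept) / len(kept)


# Design 0.2 vs 0.35 with kappa = 1 and n = 65 (the Z1 sample size in Table 1).
print("type I error:", rejection_rate(65, 0.20, 0.20, reps=2000, seed=1))
print("power:       ", rejection_rate(65, 0.35, 0.20, reps=2000, seed=2))
```

With so few replicates the estimates carry visible Monte Carlo error, so they should only be expected to land near the corresponding Table 1 entries, not to match them to three decimals.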

Table 4.

Operating characteristics of the three-stage SCPRT design based on 100,000 simulation runs under Weibull distribution W(ω, κ) with nominal level 0.05 and power of 80% of the test Z₁ for the example in Section 6

At k^th interim look	k = 1	k = 2	k = 3	total
Type I error
Nominal	0.0026	0.0028	0.0452	0.0507
Empirical	0.0019	0.0025	0.0472	0.0516

Power
Nominal	0.1274	0.1554	0.5165	0.7993
Empirical	0.1174	0.1608	0.5297	0.8079

Probability of stopping under null
Nominal	0.2759	0.2585	0.4658	1.0000
Empirical	0.2472	0.2510	0.5017	1.0000

Probability of stopping under alternative
Nominal	0.1393	0.1712	0.6897	1.0000
Empirical	0.1244	0.1689	0.7067	1.0000

Expected stopping time and sample size	ET(0)	ET(θ_a)^*	EN(0)	EN(θ_a)
Nominal	0.7608	0.8659	77	87
Empirical	0.7791	0.8747	78	88

* θ_a = z_{1 − α} + z_{1 − β} is the drift parameter at the alternative hypothesis.

From Table 1, we make the following observations. First, under each hypothetical setting of the Weibull distribution, which is the same for Z₁–Z₃, the calculated sample sizes of the three statistics can be quite different, with the largest for Z₁; the empirical type I errors by simulation for the three statistics are slightly different but all close to the targeted goal (0.05), with the smallest for Z₁; the empirical powers for the three statistics can be quite different, but all are above the targeted goal (0.80), with the largest for Z₁. Second, for each test statistic (e.g., Z₁) under the same hypothetical setting (e.g., 0.2 vs 0.35), the sample sizes under different values of κ are slightly different, increasing with κ and largest for κ = 2; for the same test statistic, the empirical type I errors and powers under the three values of κ are very close to one another. Third, for each test statistic (e.g., Z₁) under the same value of κ (e.g., κ = 1), the sample sizes under different hypothetical settings (e.g., (0.2 vs 0.35) and (0.5 vs 0.65)) can be quite different; the sample size increases as the difference between S₀(2) and S₁(2) gets smaller.

In summary, the simulation results for empirical type I errors and powers showed that the test statistic Z₁ is slightly liberal (e.g., α = 0.055) when the null survival probability S₀(2) is set low (e.g., S₀(2) = 0.2), and slightly conservative (e.g., α = 0.045) when S₀(2) is set high (e.g., S₀(2) = 0.7), whereas the empirical type I errors for Z₂ and Z₃ are liberal in most scenarios. Z₁ requires the largest sample size and is overpowered in all scenarios. Therefore, the sample size calculated based on Z₁ is overestimated. Z₂ and Z₃ require relatively small sample sizes, but both have inflated type I error.

Sample size calculations and simulations were also done under the log-normal distribution LN(μ, σ²) using the same design scenarios as discussed above for the Weibull distribution. The sample sizes and simulation results under the log-normal distribution are almost identical to those under the Weibull distribution when σ = 2, 1, and 0.5 for the log-normal distribution are set to correspond to κ = 0.5, 1, and 2 for the Weibull distribution. To help understand this phenomenon, the Weibull distribution W(ω, κ) and the log-normal distribution LN(μ, σ²) are plotted in Figure 2, with the scale parameters (ω and μ) determined by the survival probability under the null hypothesis (e.g., S₀(2) = 0.4) and the shape parameters (κ and σ) ranging from 0.5 to 2.

Figure 2. Survival functions for the Weibull and log-normal distributions under the null hypothesis for different shape parameters.

To study the properties of the sequential tests with Z₁–Z₃, simulations were done for a two-stage SCPRT design under the Weibull distribution using the same fixed design scenarios as discussed above (Tables 3–4). The simulated empirical type I error of test Z₁ was slightly liberal when the null hypothesis survival probability was set low, and slightly conservative when the null hypothesis survival probability was set high, in particular at the first stage with a small sample size, whereas the empirical type I error for Z₂ and Z₃ was liberal for most scenarios.

Table 3.

Simulated empirical type I error and power of the two-stage SCPRT design based on 100,000 simulation runs for Weibull distribution W(ω, κ) with nominal level 0.05 and power of 80% for fixed test statistics Z₁–Z₃

				Type I error			Power

Design	κ	Test	At k^th interim look	k = 1	k = 2	total	k = 1	k = 2	total
0.2 vs 0.35	0.5	Z₁	Empirical	.0058	.0511	.0569	.2883	.5507	.8384
		Z₂	Empirical	.0082	.0512	.0594	.3118	.5112	.8230
		Z₃	Empirical	.0149	.0603	.0752	.3423	.4659	.8082

			Nominal	.0046	.0462	.0507	.2091	.5901	.7992

	1	Z₁	Empirical	.0050	.0506	.0557	.2434	.5957	.8391
		Z₂	Empirical	.0082	.0518	.0599	.2720	.5542	.8262
		Z₃	Empirical	.0137	.0625	.0762	.2988	.5066	.8054

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

	2	Z₁	Empirical	.0046	.0520	.0566	.2060	.6307	.8367
		Z₂	Empirical	.0067	.0532	.0599	.2402	.5840	.8242
		Z₃	Empirical	.0133	.0622	.0755	.2706	.5376	.8082

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

0.2 vs 0.4	0.5	Z₁	Empirical	.0060	.0519	.0579	.2935	.5556	.8495
		Z₂	Empirical	.0103	.0541	.0643	.3396	.4955	.8352
		Z₃	Empirical	.0181	.0653	.0834	.3675	.4549	.8224

			Nominal	.0046	.0461	.0507	.2120	.5871	.7991

	1	Z₁	Empirical	.0049	.0524	.0573	.2473	.6084	.8557
		Z₂	Empirical	.0105	.0542	.0647	.2960	.5375	.8334
		Z₃	Empirical	.0171	.0668	.0839	.3263	.4961	.8224

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

	2	Z₁	Empirical	.0042	.0530	.0573	.2037	.6507	.8543
		Z₂	Empirical	.0092	.0542	.0634	.2621	.5704	.8325
		Z₃	Empirical	.0149	.0678	.0827	.2838	.5286	.8124

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

0.3 vs 0.45	0.5	Z₁	Empirical	.0052	.0483	.0535	.2712	.5698	.8411
		Z₂	Empirical	.0080	.0502	.0582	.3073	.5173	.8246
		Z₃	Empirical	.0100	.0547	.0647	.3112	.5046	.8158

			Nominal	.0046	.0461	.0507	.2145	.5848	.7992

	1	Z₁	Empirical	.0044	.0490	.0534	.2257	.6137	.8394
		Z₂	Empirical	.0078	.0508	.0586	.2674	.5565	.8239
		Z₃	Empirical	.0096	.0563	.0659	.2744	.5448	.8192

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

	2	Z₁	Empirical	.0034	.0492	.0525	.1904	.6504	.8408
		Z₂	Empirical	.0073	.0523	.0596	.2347	.5880	.8227
		Z₃	Empirical	.0083	.0574	.0656	.2412	.5757	.8169

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

0.5 vs 0.65	0.5	Z₁	Empirical	.0035	.0446	.0481	.2538	.6013	.8551
		Z₂	Empirical	.0085	.0512	.0605	.3110	.5102	.8211
		Z₃	Empirical	.0060	.0504	.0564	.2780	.5566	.8366

			Nominal	.0047	.0460	.0507	.2217	.5756	.7993

	1	Z₁	Empirical	.0032	.0464	.0495	.2008	.6536	.8544
		Z₂	Empirical	.0084	.0530	.0614	.2690	.5571	.8261
		Z₃	Empirical	.0053	.0523	.0576	.2334	.6031	.8365

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

	2	Z₁	Empirical	.0025	.0451	.0477	.1633	.6910	.8543
		Z₂	Empirical	.0080	.0529	.0608	.2359	.5868	.8227
		Z₃	Empirical	.0052	.0516	.0568	.1980	.6349	.8329

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

0.6 vs 0.75	0.5	Z₁	Empirical	.0030	.0451	.0482	.2403	.6283	.8686
		Z₂	Empirical	.0100	.0524	.0623	.3239	.5040	.8280
		Z₃	Empirical	.0049	.0488	.0536	.2645	.5879	.8524

			Nominal	.0047	.0459	.0507	.2275	.5718	.7993

	1	Z₁	Empirical	.0026	.0444	.0470	.1818	.6883	.8701
		Z₂	Empirical	.0100	.0532	.0632	.2789	.5477	.8266
		Z₃	Empirical	.0038	.0495	.0533	.2069	.6452	.8521

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

	2	Z₁	Empirical	.0017	.0461	.0478	.1397	.7285	.8682
		Z₂	Empirical	.0097	.0530	.0627	.2477	.5783	.8260
		Z₃	Empirical	.0037	.0498	.0535	.1671	.6833	.8504

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

0.7 vs 0.8	0.5	Z₁	Empirical	.0033	.0443	.0477	.2396	.6174	.8570
		Z₂	Empirical	.0087	.0509	.0596	.3054	.5128	.8181
		Z₃	Empirical	.0042	.0474	.0516	.2531	.5931	.8462

			Nominal	.0047	.0459	.0506	.2288	.5705	.7993

	1	Z₁	Empirical	.0024	.0457	.0480	.1857	.6702	.8559
		Z₂	Empirical	.0079	.0520	.0600	.2609	.5571	.8180
		Z₃	Empirical	.0032	.0470	.0502	.2006	.6461	.8466

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987

	2	Z₁	Empirical	.0021	.0449	.0470	.1456	.7083	.8539
		Z₂	Empirical	.0085	.0534	.0619	.2311	.5881	.8191
		Z₃	Empirical	.0032	.0480	.0511	.1676	.6782	.8458

			Nominal	.0043	.0469	.0512	.1454	.6534	.7987


Overall, the test statistic Z₁ has a satisfactory type I error rate and is therefore recommended for use in phase II trial design. However, the sample size calculated based on Z₁ is overestimated. A further simulation (Table 2) showed that after reducing the sample size by about 10%, the power of the test Z₁ stays at the nominal 80% level without inflating the type I error. The sample size calculation results also showed that a misspecified κ value for the Weibull distribution at the design stage, or a misspecified underlying distribution (Weibull assumed when the truth is log-normal), had little impact on the study power or sample size.

Table 2.

Simulated empirical type I error (α) and power (1 − β) based on 100,000 simulation runs for the Weibull and log-normal distributions with the reduced sample size, adjusting the empirical power of Z₁ to nearly 80%, for fixed tests with Z₁ to Z₃

Dist.	Design		0.2 vs 0.35			0.2 vs 0.4			0.3 vs 0.45

W(ω, κ)	κ	Test	n	α	1 − β	n	α	1 − β	n	α	1 − β

	0.5	Z₁	56	.055	.800	33	.057	.806	70	.051	.808
		Z₂	56	.057	.807	33	.061	.815	70	.056	.822
		Z₃	56	.070	.830	33	.073	.841	70	.062	.833

	1	Z₁	58	.054	.803	34	.056	.802	72	.052	.807
		Z₂	58	.059	.810	34	.060	.814	72	.057	.821
		Z₃	58	.072	.831	34	.075	.843	72	.063	.831

	2	Z₁	60	.055	.802	35	.055	.806	74	.052	.808
		Z₂	60	.058	.810	35	.059	.814	74	.057	.819
		Z₃	60	.069	.833	35	.074	.842	74	.062	.833

LN(μ, σ²)	κ	Test	n	α	1 − β	n	α	1 − β	n	α	1 − β

	2	Z₁	56	.055	.801	33	.055	.804	70	.052	.806
		Z₂	56	.057	.807	33	.060	.815	70	.058	.820
		Z₃	56	.070	.830	33	.074	.838	70	.063	.832

	1	Z₁	58	.054	.799	34	.056	.802	72	.051	.805
		Z₂	58	.058	.810	34	.059	.812	72	.057	.817
		Z₃	58	.070	.831	34	.075	.841	72	.062	.829

	0.5	Z₁	60	.055	.802	35	.057	.804	74	.051	.805
		Z₂	60	.056	.809	35	.058	.811	74	.057	.817
		Z₃	60	.070	.831	35	.073	.839	74	.062	.831

Dist.	Design		0.5 vs 0.65			0.6 vs 0.75			0.7 vs 0.8

W(ω, κ)	κ	Test	n	α	1 − β	n	α	1 − β	n	α	1 − β

	0.5	Z₁	75	.048	.805	68	.046	.806	132	.047	.807
		Z₂	75	.058	.830	68	.059	.839	132	.058	.837
		Z₃	75	.054	.826	68	.052	.821	132	.050	.819

	1	Z₁	77	.048	.805	70	.047	.804	136	.046	.801
		Z₂	77	.057	.830	70	.059	.837	136	.058	.833
		Z₃	77	.055	.822	70	.052	.821	136	.050	.811

	2	Z₁	80	.048	.808	72	.046	.801	139	.046	.801
		Z₂	80	.058	.830	72	.058	.835	139	.058	.833
		Z₃	80	.055	.826	72	.053	.820	139	.049	.813

LN(μ, σ²)	κ	Test	n	α	1 − β	n	α	1 − β	n	α	1 − β

	2	Z₁	75	.048	.801	68	.046	.796	132	.047	.797
		Z₂	75	.058	.826	68	.062	.832	132	.058	.826
		Z₃	75	.056	.821	68	.053	.814	132	.050	.808

	1	Z₁	77	.048	.799	70	.045	.792	136	.047	.795
		Z₂	77	.058	.823	70	.059	.830	136	.058	.826
		Z₃	77	.055	.814	70	.052	.813	136	.050	.805

	0.5	Z₁	80	.048	.803	72	.046	.792	139	.046	.792
		Z₂	80	.058	.827	72	.060	.828	139	.057	.825
		Z₃	80	.055	.819	72	.053	.812	139	.051	.807


6 An Example

This example designs a single-arm phase II trial for patients with refractory metastatic colorectal cancer. The primary objective of the study is to evaluate overall survival at 6 months from enrollment (Huang et al., 2010). The null overall survival probability at 6 months is 0.45 based on historical data. An alternative survival probability of 0.60 at 6 months is selected as a success rate warranting further development of the new treatment. Assume that overall survival time is exponentially distributed (i.e., Weibull with κ = 1) and that, in addition to administrative censoring, patients may be censored during the trial according to an exponential distribution with a hazard rate of 0.1. Under these assumptions, the sample size for the fixed sample test calculated based on the test Z₁ is 112 patients with a nominal level α = 0.05 and power of 80% at the alternative. However, a simulation of 100,000 runs with a sample size of 112 showed that the empirical type I error and power are 0.05 and 84.6%, respectively, so the study is overpowered. A further simulation showed that a sample size of 100 for the fixed test would have 81% empirical power and a type I error of 0.051. Hence a study with a sample size of 100 maintains the study power and does not inflate the type I error. If an average of 3.7 patients can be accrued per month, then the accrual period is t_a = 27 months for 100 patients. Assuming a follow-up period of t_f = 6 months, the study duration is τ = 33 months. The investigators plan a three-stage group sequential design with two interim looks and a final look at calendar times t₁ = 18, t₂ = 24, and t₃ = 33 months. The corresponding variances at look t_k can then be calculated by

σ_{1}^{2}(6; t_{k}) = \int_{0}^{6} \frac{κ ω^{κ} u^{κ - 1}}{e^{-(ω u)^{κ}} \, e^{-0.1 u} \, \min\{(t_{k} - u)/t_{a}, 1\}} \, du, \quad 6 < t_{k}, \; k = 1, 2, 3,

where κ = 1 and ω is determined by the alternative S₁(6) = 0.6. By numerical integration, $[σ_{1}^{2} (6; 18), σ_{1}^{2} (6; 24), σ_{1}^{2} (6; 33)] = (2.105, 1.404, 0.937)$ and the information times are t* = (0.445, 0.667, 1). Assuming a maximum conditional probability of discordance ρ = 0.02, as recommended by Xiong et al. (2003), we have a = 2.604 for K = 3. The lower and upper boundaries calculated by (6) are (a₁, a₂, a₃) = (−0.402, 0.0216, 1.645) and (b₁, b₂, b₃) = (1.866, 2.173, 1.645), respectively. The nominal critical significance levels for acceptance and rejection are (0.727, 0.489, 0.05) and (0.0026, 0.0039, 0.05), respectively. Therefore, at the first interim look, if the observed p-value ≤ 0.0026, reject the null (declare efficacy and stop); if the observed p-value ≥ 0.727, accept the null (stop for futility); otherwise, the trial continues to the second stage. At the second interim look, if the observed p-value ≤ 0.0039, reject the null (declare efficacy and stop); if the observed p-value ≥ 0.489, accept the null (stop for futility); otherwise, the trial continues to the final stage. We performed 100,000 simulation runs under the exponential model to evaluate the operating characteristics of the proposed group sequential design. The empirical (nominal) type I error and power of the three-stage sequential test were 0.0516 (0.0507) and 0.8079 (0.799), respectively. The empirical (nominal) probabilities of stopping under the null and alternative hypotheses were 0.247 (0.276) and 0.124 (0.139) at the first look, and 0.251 (0.259) and 0.169 (0.171) at the second look, respectively. The details of the operating characteristics for the proposed group sequential design are shown in Table 4.
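The boundary and nominal-level arithmetic in this example can be retraced with a short sketch. It rests on two assumptions that are not spelled out in this section: that boundary (6) has the SCPRT form a_k, b_k = z_{1−α} t_k ∓ {2a t_k(1 − t_k)}^{1/2} of Xiong et al. (2003), and that a nominal critical level is the one-sided p-value 1 − Φ(boundary/√t_k). The variable and function names are illustrative only. The sketch also checks the final-look variance, where 33 − u ≥ t_a = 27 for all u ≤ 6, so the accrual factor equals 1 and the integral has a closed form.

```python
import math
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution

# Design inputs quoted in the example
alpha = 0.05
a_const = 2.604                    # boundary coefficient a for K = 3, rho = 0.02
info_times = [0.445, 0.667, 1.0]   # information times t* reported in the text

# Exponential survival (kappa = 1): S1(6) = exp(-6 * omega) = 0.6
omega = -math.log(0.6) / 6.0

# Final look (t3 = 33 months): the accrual factor min{(33 - u)/27, 1} is 1 on
# [0, 6], so the variance integral reduces to omega * exp((omega + 0.1) * u).
c = omega + 0.1
var_final = omega / c * (math.exp(6.0 * c) - 1.0)

z = norm.inv_cdf(1.0 - alpha)

def scprt_bounds(t):
    """Assumed SCPRT boundary form: z*t -/+ sqrt(2*a*t*(1 - t))."""
    half_width = math.sqrt(2.0 * a_const * t * (1.0 - t))
    return z * t - half_width, z * t + half_width

def nominal_level(bound, t):
    """One-sided p-value that corresponds to a boundary on the B(t) scale."""
    return 1.0 - norm.cdf(bound / math.sqrt(t))

lower, upper = zip(*(scprt_bounds(t) for t in info_times))
accept_p = [nominal_level(a_k, t) for a_k, t in zip(lower, info_times)]
reject_p = [nominal_level(b_k, t) for b_k, t in zip(upper, info_times)]

print("omega          :", round(omega, 4))
print("final variance :", round(var_final, 3))
print("lower bounds   :", [round(v, 4) for v in lower])
print("upper bounds   :", [round(v, 3) for v in upper])
print("accept levels  :", [round(p, 3) for p in accept_p])
print("reject levels  :", [round(p, 4) for p in reject_p])
```

Under these assumptions the computed boundaries and nominal levels round to the values quoted in the paragraph above, and the final-look variance rounds to 0.937.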

7 Conclusion

A single-arm phase II group sequential design is proposed for monitoring the survival probability at a fixed time point. A practical issue in designing such a study is selecting the time point at which the survival probability is to be tested. First, because the trial is monitored for early stopping for futility and/or efficacy, the fixed time point should be chosen so that the survival probability at that point is neither too high nor too low, leaving room to detect the effect size specified in the study design. Second, there is often a landmark time point for the survival probability in the historical data of a particular disease research area, and the survival probability at such a landmark time point can be chosen for the trial design. Third, the landmark time point could be chosen after careful deliberation as a compromise among accrual rate, urgency of the investigation, expected improvement, and protection against the worst-case scenario.

Three non-parametric test statistics for testing survival probability at a fixed time point are discussed in this paper. The sample size formulas for fixed sample tests are also derived. These test statistics are non-parametric and can be applied to test any survival function; however, the sample size calculation may depend on the underlying failure time distribution, censoring distribution, and accrual distribution, as shown by equation (12). Simulation results showed that the underlying failure time distribution has little impact on the study power or sample size. However, the censoring distribution and accrual distribution could have large impacts on the study power. In this paper, we have studied the properties of the three test statistics under the assumption of an exponential censoring distribution and a uniform accrual distribution. It would be worth exploring the properties of the test statistics under other censoring and accrual distribution assumptions.

The common basis of the three test statistics is the cumulative hazard function Λ(x; t), and its estimate Λ̂(x; t) in equation (2) shows that each of the test statistics includes subjects who have not yet been followed for the required time x. This implies that these test statistics should be more efficient than the naive approach of estimating the survival probability as a simple proportion of all subjects surviving the required time at the interim analyses.

Among the three test statistics, Z₁ is recommended. The sample size calculated based on Z₁ is robust against the underlying distribution assumption; however, it is overestimated. A sample size reduction of roughly 10% could be appropriate to keep the study power at the design level, but we recommend conducting a simulation study to determine the reduction that keeps the type I error and power at the nominal levels. The trial can be monitored by a multi-stage group sequential design based on the SCPRT procedure. A practically useful and unique feature of SCPRT is that the investigators are free to ignore any predetermined interim stopping rule during the study without harming the integrity of the study, because skipping an interim stop does not change the overall significance level and power of the test. This feature is useful when a study has reached significance or futility at an interim look but the investigators wish to continue to the planned end, e.g., to collect more data for later analysis or to convince others (or even themselves) that the interim result can be trusted. In this situation, one may continue the study and use the result at the final look; the interim result should be ignored if the two results contradict each other. For an SCPRT design, the probability of such a contradiction is very small, e.g., its maximum is ρ_max = 0.0054 if the conditional discordant probability ρ = 0.02 is used for the design. Therefore, the proposed method can be easily used to design a single-arm phase II group sequential trial for monitoring survival probability.

Acknowledgments

The work was supported in part by National Cancer Institute (NCI) support grant CA21765 and the American Lebanese Syrian Associated Charities (ALSAC).

Appendix

Computation for the sequential test normalized with information time

Assumption

Let B(t) ~ N(θt, t) be a Gaussian process with the time variable t on interval [0, 1] and drift θ. Let 0 < t₁ < ⋯ < t_K = 1 be the information times of the looks for a sequential test with K looks. Let a_k < b_k be lower and upper boundaries for B(t) at time t_k for k = 1, ⋯, K − 1, and a_K = b_K.

Function $l_{t_{k}}^{*} (\cdot)$

Define $l_{t_{k}}^{*} (s)$ as a function of s on interval (a_k, b_k), for k = 1, ⋯, K − 1; this series of functions can be determined recursively as follows. Let $l_{t_{1}}^{*} (s) = 1$ for s on (a₁, b₁); for k = 2, ⋯, K, for s in (a_k, b_k),

l_{t_{k}}^{*}(s) = \frac{1}{\sqrt{t_{k-1} (1 - t_{k-1}/t_{k})}} \int_{a_{k-1}}^{b_{k-1}} l_{t_{k-1}}^{*}(x) \, ϕ\left(\frac{x - s \, t_{k-1}/t_{k}}{\sqrt{t_{k-1} (1 - t_{k-1}/t_{k})}}\right) dx,

(20)

where ϕ(·) in (20) is the density function of the standard normal distribution.

Function l_{t_k}(·)

Define l_{t_k}(s) as a function of s on interval (−∞, a_k) ∪ (b_k, ∞), for k = 1, ⋯, K; this series of functions can be determined using functions $l_{t_{k}}^{*} (\cdot)$ defined in (20) as follows. Let l_{t_1}(s) = 1 for s on (−∞, a₁) ∪ (b₁, ∞); for k = 2, ⋯, K, for s in (−∞, a_k) ∪ (b_k, ∞), let

l_{t_{k}}(s) = \frac{1}{\sqrt{t_{k-1} (1 - t_{k-1}/t_{k})}} \int_{a_{k-1}}^{b_{k-1}} l_{t_{k-1}}^{*}(x) \, ϕ\left(\frac{x - s \, t_{k-1}/t_{k}}{\sqrt{t_{k-1} (1 - t_{k-1}/t_{k})}}\right) dx,

(21)

where ϕ(·) in (21) is the density function of the standard normal distribution.

Power Function

For testing H₀ : θ ≤ 0 vs. H_a : θ > 0, with functions l_{t_k}(·) by (21), the power function P(θ) or the probability to reject H₀ under true mean θ is

P (θ) = \sum_{k = 1}^{K} (1 / \sqrt{t_{k}}) \int_{b_{k}}^{\infty} l_{t_{k}} (x) ϕ ((x - θ t_{k}) / \sqrt{t_{k}}) dx .

(22)

For the sequential test design, the significance level is α = P(0) and the power is 1 − β = P(θ_a), where θ_a is the value of θ under H_a.
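As a sanity check on (22), consider the single-look case K = 1 with t₁ = 1, b₁ = z_{1−α} and l_{t₁} = 1. The power function then reduces to P(θ) = 1 − Φ(z_{1−α} − θ). The sketch below evaluates this closed form, taking θ_a = z_{1−α} + z_{1−β} as stated for the SCPRT design later in this appendix; the function name is illustrative.

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution

def fixed_test_power(theta, alpha=0.05):
    """Equation (22) with K = 1, t_1 = 1, b_1 = z_{1-alpha} and l = 1."""
    return 1.0 - norm.cdf(norm.inv_cdf(1.0 - alpha) - theta)

alpha, beta = 0.05, 0.20
theta_a = norm.inv_cdf(1.0 - alpha) + norm.inv_cdf(1.0 - beta)

size = fixed_test_power(0.0, alpha)       # significance level P(0)
power = fixed_test_power(theta_a, alpha)  # power P(theta_a)
print(round(size, 4), round(power, 4))
```

With this drift the single-look test has level α and power 1 − β exactly, which is why the same θ_a serves as the design alternative for the sequential test.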

Probability of Stopping

The probability of stopping at time t_k is a function of θ as

P_{t_{k}} (θ) = (1 / \sqrt{t_{k}}) (\int_{- \infty}^{a_{k}} + \int_{b_{k}}^{\infty}) l_{t_{k}} (x) ϕ ((x - θ t_{k}) / \sqrt{t_{k}}) dx,

(23)

with which the probability of stopping at t_k is P_{t_k}(0) for the null hypothesis and P_{t_k}(θ_a) for the alternative hypothesis.
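At the first look l_{t₁} = 1, so (23) reduces to two normal tail areas and can be checked without numerical integration. The sketch below does this for the three-stage example of Section 6, using the rounded boundary values and information time quoted there; the function name is illustrative.

```python
import math
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution

def first_look_stop_prob(theta, t1, a1, b1):
    """Equation (23) at k = 1, where l_{t_1}(x) = 1 on both stopping regions."""
    sd = math.sqrt(t1)
    futility = norm.cdf((a1 - theta * t1) / sd)        # B(t1) <= a1
    efficacy = 1.0 - norm.cdf((b1 - theta * t1) / sd)  # B(t1) >= b1
    return futility + efficacy

# Values taken from the example in Section 6
t1, a1, b1 = 0.445, -0.402, 1.866
theta_a = norm.inv_cdf(0.95) + norm.inv_cdf(0.80)

p_null = first_look_stop_prob(0.0, t1, a1, b1)
p_alt = first_look_stop_prob(theta_a, t1, a1, b1)
print(round(p_null, 3), round(p_alt, 3))
```

These two values can be compared with the nominal first-look stopping probabilities of 0.276 (null) and 0.139 (alternative) reported in the example.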

Expected Stopping Time

With the probability of stopping P_{t_k}(θ) by (23), the expected stopping time ET(θ) is a function of θ as

ET (θ) = \sum_{k = 1}^{K} t_{k} P_{t_{k}} (θ),

(24)

with which the expected stopping time is ET(0) for the null hypothesis and ET (θ_a) for the alternative hypothesis.

Expected Sample Size

Suppose the maximum sample size for the sequential test is n. The expected sample size for the sequential test is a function of θ and can be obtained as

EN (θ) = n \times ET (θ),

(25)

with which the expected sample size is EN(0) for the null hypothesis and EN(θ_a) for the alternative hypothesis.
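Equations (24) and (25) are simple weighted sums, illustrated below with the nominal stopping probabilities quoted for the three-stage example in Section 6. Because a_K = b_K, the trial always stops by the final look, so the final-look probability is taken as the remainder; n = 100 is the maximum sample size of that example. The function names are illustrative, and the printed figures follow only from the rounded inputs.

```python
def expected_stopping_time(info_times, stop_probs):
    """Equation (24): sum over looks of t_k * P_{t_k}(theta)."""
    return sum(t * p for t, p in zip(info_times, stop_probs))

def expected_sample_size(n, info_times, stop_probs):
    """Equation (25): maximum sample size times expected stopping time."""
    return n * expected_stopping_time(info_times, stop_probs)

info_times = [0.445, 0.667, 1.0]

# Nominal stopping probabilities at looks 1 and 2 from the example;
# the final-look probability is the remainder.
stop_null = [0.276, 0.259]
stop_null.append(1.0 - sum(stop_null))
stop_alt = [0.139, 0.171]
stop_alt.append(1.0 - sum(stop_alt))

en_null = expected_sample_size(100, info_times, stop_null)
en_alt = expected_sample_size(100, info_times, stop_alt)
print(round(en_null, 1), round(en_alt, 1))
```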

For SCPRT design

To test H₀ : θ ≤ 0 vs. H_a : θ > 0 with significance level α and power 1 − β by an SCPRT design, the cutoff value at the final stage t_K = 1 is a_K = b_K = z_{1−α}, the drift under the null hypothesis is θ₀ = 0, and the drift under the alternative hypothesis is θ_a = z_{1−α} + z_{1−β}; these are the same as for the fixed test at the final stage with information time t = 1. Substituting the cutoff value, θ₀, and θ_a into equations (22), (23), (24), and (25), we can compute the type I error and power, the probability of stopping at a given t_k, the expected stopping time, and the expected sample size under the null and alternative hypotheses for the SCPRT design.
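The recursion (20) and (21) and the sums (22) and (23) can be evaluated numerically along the following lines. This is a minimal sketch rather than the authors' implementation: it uses composite Simpson quadrature on a fixed grid, truncates the infinite tails at eight standard deviations, and all function names and tuning parameters (n, tail) are assumptions made for illustration. The inputs are the rounded information times and boundaries of the three-stage example in Section 6.

```python
import math
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def simpson(lo, hi, n=200):
    """Nodes and weights of composite Simpson's rule on [lo, hi] (n even)."""
    h = (hi - lo) / n
    nodes = [lo + i * h for i in range(n + 1)]
    weights = [h / 3.0 * (1 if i in (0, n) else 4 if i % 2 else 2)
               for i in range(n + 1)]
    return nodes, weights

def evaluate_design(t, a, b, theta, n=200, tail=8.0):
    """Return (rejection probability, stopping probability per look)."""
    reject, stop = 0.0, []
    grid = None  # (nodes, weights, l* values) on the previous interval (a, b)
    for k in range(len(t)):
        sd_k = math.sqrt(t[k])

        def l(s):
            # (20) and (21) share this integral over (a_{k-1}, b_{k-1}).
            if grid is None:
                return 1.0
            xs, ws, ls = grid
            r = t[k - 1] / t[k]
            sd = math.sqrt(t[k - 1] * (1.0 - r))
            return sum(w * lv * phi((x - s * r) / sd)
                       for x, w, lv in zip(xs, ws, ls)) / sd

        def mass(lo, hi):
            # Integral of l(x) * N(theta * t_k, t_k) density over [lo, hi].
            xs, ws = simpson(lo, hi, n)
            return sum(w * l(x) * phi((x - theta * t[k]) / sd_k)
                       for x, w in zip(xs, ws)) / sd_k

        centre = theta * t[k]
        efficacy = mass(b[k], max(b[k], centre) + tail * sd_k)  # term of (22)
        futility = mass(min(a[k], centre) - tail * sd_k, a[k])
        reject += efficacy
        stop.append(efficacy + futility)                        # (23)
        if k < len(t) - 1:
            xs, ws = simpson(a[k], b[k], n)
            grid = (xs, ws, [l(x) for x in xs])                 # l* on (a_k, b_k)
    return reject, stop

z_alpha, z_beta = norm.inv_cdf(0.95), norm.inv_cdf(0.80)
t = [0.445, 0.667, 1.0]
a = [-0.402, 0.0216, z_alpha]
b = [1.866, 2.173, z_alpha]

alpha_hat, stop_null = evaluate_design(t, a, b, 0.0)
power_hat, stop_alt = evaluate_design(t, a, b, z_alpha + z_beta)
print("type I error:", round(alpha_hat, 4), "stopping:", [round(p, 3) for p in stop_null])
print("power       :", round(power_hat, 4), "stopping:", [round(p, 3) for p in stop_alt])
```

The outputs can be compared with the nominal type I error, power, and stopping probabilities reported for the example. A useful internal check is that the stopping probabilities sum to one, since the final look has a_K = b_K.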

For details of the derivation of these computational formulas, please refer to Xiong and Tan (1999, 2001) and Xiong et al. (2002).

References

Anderson TW. An introduction to multivariate statistical analysis. New York: Wiley; 1958.
Breslow N, Crowley J. A large sample study of the life table and product limit estimates under random censorship. The Annals of Statistics. 1974;2:437–453.
Case LD, Morgan TM. Design of phase II cancer trials evaluating survival probabilities. BMC Medical Research Methodology. 2003;3:1–12. doi: 10.1186/1471-2288-3-6.
Collett D. Modeling survival data in medical research. 2. London: Chapman and Hall; 2003.
Fleming TR, Harrington DP. Counting processes and survival analysis. New York: John Wiley and Sons; 1991.
Huang B, Talukder E, Thomas N. Optimal two-stage phase II designs with long-term endpoints. Statistics in Biopharmaceutical Research. 2010;2:51–61.
Jennison C, Turnbull WD. Group sequential methods with applications to clinical trials. New York: Chapman & Hall; 2000.
Klein JP, Logan B, Harhoff M, Andersen PK. Analyzing survival curves at a fixed point in time. Statistics in Medicine. 2007;26:4505–4519. doi: 10.1002/sim.2864.
Lin DY, Shen L, Ying Z, Breslow NE. Group sequential designs for monitoring survival probabilities. Biometrics. 1996;52:1033–1042.
Tan M, Xiong X. A flexible multi-stage design for phase II oncology trials. Pharmaceutical Statistics. 2010;10:369–373. doi: 10.1002/pst.478.
Xiong X. A class of sequential conditional probability ratio tests. Journal of American Statistical Association. 1995;15:1463–1473.
Xiong X, Tan M. Sufficiency-Based methods for evaluating sequential tests and estimations. ASA Proceedings, Biometrics Section. 1999:268–273.
Xiong X, Tan M. Evaluating sequential tests for a class of stochastic processes. Computing Science and Statistics. 2001;33:30–34.
Xiong X, Tan M, Boyett J. Sequential conditional probability ratio tests for normalized test statistic on information time. Biometrics. 2003;59:624–631. doi: 10.1111/1541-0420.00072.
Xiong X, Tan M, Boyett J. SCPRT: a sequential procedure that gives another reason to stop clinical trials early. Advances in Statistical Methods for the Health Sciences Statistics for Industry and Technology. 2007:419–434.
Xiong X, Tan M, Kutner MH. Computation methods for evaluating sequential tests and post-estimations via sufficiency principle. Statistica Sinica. 2002;12:1027–41.
