Author manuscript; available in PMC: 2010 Jul 3.
Published in final edited form as: J Biopharm Stat. 2009;19(3):494–508. doi: 10.1080/10543400902802425

Selecting Promising Treatments in Randomized Phase II Cancer Trials with an Active Control

Ying Kuen Cheung 1
PMCID: PMC2896482  NIHMSID: NIHMS203115  PMID: 19384691

Abstract

The primary objective of phase II cancer trials is to evaluate the potential efficacy of a new regimen in terms of its antitumor activity in a given type of cancer. Due to advances in oncology therapeutics and heterogeneity in the patient population, such evaluation can be interpreted objectively only in the presence of a prospective control group of an active standard treatment. This paper deals with the design problem of phase II selection trials in which several experimental regimens are compared to an active control, with an objective to identify an experimental arm that is more effective than the control, or to declare futility if no such treatment exists. Conducting a multi-arm randomized selection trial is a useful strategy to prioritize experimental treatments for further testing when many candidates are available, but the sample size required in such a trial with an active control could raise feasibility concerns. In this paper, we extend the sequential probability ratio test for normal observations to the multi-arm selection setting. The proposed methods, which allow frequent interim monitoring, offer a high likelihood of early trial termination and as such enhance enrollment feasibility. The termination and selection criteria have closed form solutions, and are easy to compute with respect to any given set of error constraints. The proposed methods are applied to design a selection trial in which combinations of sorafenib and erlotinib are compared to a control group in patients with non-small-cell lung cancer using a continuous endpoint of change in tumor size. The operating characteristics of the proposed methods are compared to those of a single-stage design via simulations: the sample size requirement is reduced substantially and is feasible at an early stage of drug development.

Keywords: Noninferiority test, Probability of correct selection, Sample size re-estimation, Sequential elimination, Sequential probability ratio test, Symmetric boundaries, Type I error

1. Introduction

Phase II cancer clinical trials are “proof of concept” studies in which a new regimen is evaluated for its potential efficacy in terms of antitumor activity in a given type of cancer. The objective has traditionally been achieved with a single-arm design using clinical response as the endpoint. The major drawback of this approach is that the interpretation of the results of a single-arm phase II trial relies on the specification of a reference response rate, which is often based on historical data from the literature or within an institution. As pointed out in several recent articles, assuming a known and constant reference is elusive mainly because of heterogeneity in the entire patient population; see for example Taylor et al. (2006) and Karrison et al. (2007). These authors reconsider and seek to revive the randomized phase II design in which the new regimen is compared prospectively against a standard treatment. Conducting a randomized trial with an active control no doubt requires an increased sample size relative to a single-arm study, and the increase is at times deemed too great to be feasible at the early phase of development. However, Taylor et al. (2006) find that even with a moderate number of patients available, having a concurrent control group is advantageous in terms of the probability of correctly identifying an effective experimental agent. In addition, the sample size requirement can be alleviated in a number of ways due to advances in oncology therapeutics and statistical methodology. First, the Response Evaluation Criteria in Solid Tumors is under criticism for its lack of sensitivity in evaluating the new class of non-cytotoxic agents, and different outcome measurements are advocated in early-phase trials (Michaelis and Ratain, 2006). Consequently, Karrison et al. (2007) propose using the continuous endpoint of tumor size change at 8 weeks for a phase II trial of the combination of sorafenib and erlotinib in non-small-cell lung cancer (NSCLC) patients. This recovers the information lost through dichotomization of the clinical response, and keeps the required sample size to a feasible number. Second, the prospect of enrollment feasibility can be improved by adopting sequential designs with futility interim analyses; see Jung (2007) for example. Finally, a relatively large type I error rate or significance level may be used in phase II. Specifically, most investigators are willing to accept a one-sided test at a level ranging from 10% to 30%.

Motivated by the above considerations, the paper addresses the problem of selecting a promising experimental agent in multi-arm randomized phase II trials with a prospective control. As pointed out in Karrison et al. (2007), oncology has had the advantage of having a large number of agents available for screening. While the large number of candidate agents indicates great potential in developing effective therapies, it poses an immediate challenge for investigators in prioritizing the agents for phase III testing. For this reason, in oncology and other disease areas, conducting a randomized selection trial among several experimental treatments has been endorsed as an efficient way to remove inferior agents from further consideration. For example, Simon et al. (1985) advocated randomized phase II treatment selection without control at a time when there was a clear benchmark for a good response rate for solid tumor (e.g. 25%). Cheung et al. (2006) also considered selection without an active control in a disease area where the patient population was homogeneous and historical control was relatively reliable. These assumptions may not stand in the current therapeutic approaches for cancer patients. This paper investigates the enrollment feasibility of treatment selection with a control group in the context of the NSCLC trial in Karrison et al. (2007) who consider two sorafenib/erlotinib combinations versus the control arm of erlotinib alone.

There are several proposals of two-stage designs that deal with the selection problem; see Thall et al. (1988), Schaid et al. (1990), and Bischoff and Miller (2005) for example. These designs typically choose an empirically superior experimental arm in a selection stage, and randomize additional patients to the chosen arm and the control group in a second stage for final comparison. In addition, these designs have provisions for early stopping after the selection stage if none of the experimental treatments seems promising. In this regard, a two-stage design lessens the sample size requirement when nothing works. On the other hand, as pointed out by Cheung (2008), the sample size advantage of a two-stage design disappears if there is an effective treatment among the experimental arms. In this paper, we study sequential selection boundaries that allow frequent interim looks and early stopping due to either selection or futility.

The paper is organized as follows. Section 2 formulates the hypotheses for the selection problem, presents a single-stage design, and describes two novel sequential selection designs. Calibration of the sequential designs will be further discussed in Section 3. In Section 4, we will apply the proposed methods to design the sorafenib/erlotinib combination trial in NSCLC patients, and illustrate the advantages offered by the sequential designs. Practical aspects of design implementation will be discussed in Section 5.

2. Methods

2.1. Problem formulation and single-stage design

Consider a set of treatments {0, 1, …, K}, where 0 represents the control group. Assume that the 8-week decreases in tumor size (on log scale) are normally distributed with respective means μ0, …, μK and a common variance σ2. For brevity, we focus on decision making under two hypotheses:

H_0: \mu_1 = \cdots = \mu_K = \mu_0 \quad \text{and} \quad H_1: \mu_1 = \cdots = \mu_{K-1} = \mu_0,\ \mu_K = \mu_0 + \delta,

where δ > 0 is a prespecified clinically significant improvement. Note that the treatment labels are unknown; in other words, we do not know in advance which treatment corresponds to arm K. Our design goal is to have a high probability of correct selection (PCS) under the two hypotheses. Let P0 and P1 denote the respective PCS under H0 and H1. When the null H0 prevails, a type I error is committed when an experimental treatment is declared superior to the control; therefore, our goal is to keep P0, the selection probability for the control under H0, at or above 1 − α. Another design constraint is to maintain P1, the selection probability for treatment K under H1, at about 1 − β.

To begin, consider a simple single-stage design that randomizes n subjects to each of the K + 1 arms. Formally, let Xij ∼ N(μi, σ2) denote the outcome of the jth patient in arm i, and X̄i = X̄i(n) be the observed mean response for arm i based on n subjects. A single-step test procedure (Hochberg and Tamhane, 1987) selects the control arm if

\frac{\bar{X}_i - \bar{X}_0}{\sigma\sqrt{2/n}} \le c_\alpha \quad \text{for all } i = 1, \ldots, K,

and selects arm k if

\bar{X}_k > \bar{X}_i \ \text{for all } i \ne k \quad \text{and} \quad \frac{\bar{X}_k - \bar{X}_0}{\sigma\sqrt{2/n}} > c_\alpha.

In data analysis, σ2 in the above expressions is replaced with the pooled sample variance σ̂n2. For design purposes, assuming σ2 is known and equal to σ02, we are to specify cα and n so that P0 ≥ 1 − α and P1 ≥ 1 − β. Using Slepian's (1962) inequality, we can show that cα = z_{(1−α)^{1/K}} satisfies the type I error constraint, and that

P_1 \ge \Phi^{K-1}\left(\frac{\sqrt{n}\,\delta}{\sqrt{2}\,\sigma_0}\right) \Phi\left(\frac{\sqrt{n}\,\delta}{\sqrt{2}\,\sigma_0} - c_\alpha\right),

where zp is the pth quantile of the standard normal distribution, and Φ is its cdf. Consequently, the sample size (per arm) in a single-stage design may be determined as

\min\left\{ j \ge 1 : \Phi^{K-1}\left(\frac{\sqrt{j}\,\delta}{\sqrt{2}\,\sigma_0}\right) \Phi\left(\frac{\sqrt{j}\,\delta}{\sqrt{2}\,\sigma_0} - z_{(1-\alpha)^{1/K}}\right) \ge 1 - \beta \right\}. \tag{2.1}

This design approach achieves the nominal type I error rate α (i.e., P0 ≥ 1 − α) even if the assumed σ02 is not the true variance. To protect P1 against misspecification of the variance, one may re-estimate the sample size n′ by applying (2.1) with σ02 replaced by σ̂n2 after (K + 1)n subjects have been randomized and observed. Continue randomization with an additional (K + 1)(n′ − n) subjects if n′ > n; stop the study otherwise. Sample size re-estimation based on the sample variance will not inflate the type I error rate as long as it is done before the comparison of the sample means takes place.
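The sample size rule (2.1) amounts to a direct search over j. A minimal sketch in Python (the function name is ours; only the standard library is used) reproduces the calculation:

```python
from math import sqrt
from statistics import NormalDist

def single_stage_n(K, alpha, beta, delta, sigma0):
    """Per-arm sample size from (2.1): the smallest j >= 1 such that
    Phi^{K-1}(sqrt(j)*e) * Phi(sqrt(j)*e - c_alpha) >= 1 - beta,
    where e = delta / (sqrt(2)*sigma0) and c_alpha = z_{(1-alpha)^(1/K)}."""
    nd = NormalDist()
    c_alpha = nd.inv_cdf((1 - alpha) ** (1 / K))
    e = delta / (sqrt(2) * sigma0)
    j = 1
    while nd.cdf(sqrt(j) * e) ** (K - 1) * nd.cdf(sqrt(j) * e - c_alpha) < 1 - beta:
        j += 1
    return j, c_alpha

# NSCLC example of Section 4: K = 2, alpha = 0.1, beta = 0.2,
# delta = 0.18, sigma0 = 0.346.
n, c = single_stage_n(K=2, alpha=0.1, beta=0.2, delta=0.18, sigma0=0.346)
```

For the Section 4 inputs this search returns n = 46 per arm with c0.1 ≈ 1.632, matching the values reported there.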

The single-stage design, with or without sample size re-estimation, will serve as the reference design for the proposed methods.

2.2. An extension of the sequential probability ratio test (SPRT)

Define a shifted outcome Yij = Xij + ai for some specified ai, so that Yij ∼ N(μi + ai, σ2), and let Ȳi(ni) = (1/ni) Σ_{j=1}^{ni} Yij denote the mean of the ni shifted outcomes observed in arm i. Consider the sequential probability ratio test (SPRT) statistics (Robbins and Siegmund, 1974) based on the shifted outcomes:

Z_{ki} = \sqrt{\frac{n_k n_i}{n_k + n_i}}\left[\bar{Y}_k(n_k) - \bar{Y}_i(n_i)\right] \quad \text{for } i, k = 0, 1, \ldots, K;\ i \ne k.

The multi-arm SPRT will terminate the trial and select arm k at an interim when Z_{ki} ≥ d for all i ≠ k, where d > 0 is prespecified. For practical reasons, we consider a truncated SPRT. That is, if the termination criterion is not reached when the total sample size reaches a prespecified maximum Nmax, the trial will be closed and the arm with the largest Ȳi(ni) will be selected. Subject allocation does not have to be exactly even among the K + 1 arms at each interim, although the results of Robbins and Siegmund (1974) suggest that the expected total number of observations will be smallest when n0 = n1 = ⋯ = nK. We do not require the group sizes to be identical, because exact balance is usually difficult to achieve in a real trial. However, we recommend using identical enrollment processes for all arms so that the group sizes are roughly equal. This can be achieved, for instance, by performing simple randomization for each new incoming subject.
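At each interim, the termination rule checks whether some arm k has Z_{ki} ≥ d against every other arm. A minimal Python sketch (the function name and the illustrative numbers are ours) of one interim look:

```python
from math import sqrt

def sprt_select(ybar, n, d):
    """One interim look of the multi-arm SPRT (sketch). ybar[i] and n[i]
    are the shifted sample mean and current size of arm i. Returns the
    selected arm k if Z_ki >= d for all i != k, else None (continue)."""
    arms = range(len(ybar))
    for k in arms:
        z_min = min(sqrt(n[k] * n[i] / (n[k] + n[i])) * (ybar[k] - ybar[i])
                    for i in arms if i != k)
        if z_min >= d:
            return k
    return None

# Hypothetical interim data: arm 2 is well ahead after 20 subjects per arm.
print(sprt_select(ybar=[0.07, 0.05, 0.60], n=[20, 20, 20], d=1.0))  # prints 2
```

When no arm dominates all the others by the margin d, the function returns None and randomization continues (subject to the Nmax truncation).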

THEOREM 1. Suppose the shifted outcomes Yij ∼ N(θi, σ2) and θk ≥ θi for all i ≠ k. Then, if the enrollment processes do not depend on the θi's, an open-ended SPRT (i.e., Nmax = ∞) will correctly select arm k with probability bounded below by

\left\{ \sum_{i=0}^{K} \exp\left[\frac{2d}{\sigma^2}(\theta_i - \theta_k)\right] \right\}^{-1}. \tag{2.2}

If we run the SPRT with a0 = ⋯ = aK under H0, then θ0 = ⋯ = θK and the lower bound (2.2) is equal to (1 + K)−1. By symmetry, we can deduce that the probability of selecting arm k, or any arm, by the SPRT is equal to (1 + K)−1. Thus, the lower bound is exact in this case. Furthermore, we observe that the probability of selecting an experimental arm under H0 (i.e., type I error rate) will equal K/(K + 1) which is apparently too large to be considered in practice. Therefore, we need to choose the shifts a0, a1, …, aK differently so as to satisfy conventional error constraints.

COROLLARY 1. If the shifts are chosen such that a0 > ai for i = 1, …, K, then the selection probability for the control under H0 (i.e. P0) is bounded below by

LB_0 = \left\{ \sum_{i=0}^{K} \exp\left[\frac{2d}{\sigma^2}(a_i - a_0)\right] \right\}^{-1}

which increases and converges to 1 as d → ∞.

COROLLARY 2. If the shifts can be chosen such that aK + δ > ai for i = 0, 1, …, K − 1, then the selection probability for treatment K under H1 (i.e. P1) is bounded below by

LB_1 = \left\{ \sum_{i=0}^{K-1} \exp\left[\frac{2d}{\sigma^2}(a_i - a_K - \delta)\right] + 1 \right\}^{-1}

which increases and converges to 1 as d → ∞.

Corollary 1 and Corollary 2 guarantee that we can always find a constant d for any given error constraints, provided the shifts are chosen to satisfy the conditions in the corollaries. Details and applications of these results are provided in Section 3. The two corollaries are easy consequences of Theorem 1, whose proof can be undertaken in the same manner as in Levin and Robbins (1981), who extend the SPRT to the multi-arm selection problem with binomial data and equal sample sizes; the proof is available from the author upon request.
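The bound (2.2) is elementary to evaluate numerically. The following sketch (the function name is ours) computes it for any configuration of shifted means; with equal shifted means it reduces to 1/(K + 1), as noted after Theorem 1:

```python
from math import exp

def pcs_lower_bound(theta, k, d, sigma2):
    """Lower bound (2.2) on the probability that the open-ended SPRT
    correctly selects arm k, where theta[i] = mu_i + a_i is the shifted
    mean of arm i and theta[k] >= theta[i] for all i."""
    return 1.0 / sum(exp(2.0 * d * (t - theta[k]) / sigma2) for t in theta)

# Equal shifted means (K = 2): the bound equals 1/(K + 1) = 1/3.
print(pcs_lower_bound(theta=[0.0, 0.0, 0.0], k=0, d=1.0, sigma2=0.12))
```

When θk is strictly largest, every term with i ≠ k decays in d, so the bound increases to 1 as d → ∞, which is the mechanism behind Corollary 1 and Corollary 2.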

2.3. A sequential elimination procedure (ELIM0)

In this subsection, we consider eliminating inferior arm(s) at an interim based on the SPRT stopping boundaries {Z_{ki} ≥ d} even though not all boundaries have been reached. Specifically, the elimination procedure (denoted as ELIM0) proceeds as follows:

  1. At an interim, close arm i if Z_{ki} ≥ d for some open arm k.

  2. Terminate the trial under one of the following situations:

    (a) select the remaining arm when there is only one open arm;

    (b) select the arm with the largest Ȳi(ni) when the total sample size reaches Nmax;

    (c) select the arm with the largest Ȳi(ni) when the control arm is eliminated.

The motivation of sequential elimination is to remove arms that are unlikely to emerge successful against the other arms given the information accrued thus far, and to allocate the remaining subjects among the more promising contenders. Therefore, a small probability may be expected for the event in which an eliminated arm would eventually have been the best arm according to the SPRT had it remained in the trial. We thus argue heuristically that the lower bound (2.2) and the results in Corollary 1 and Corollary 2 will serve as good approximations of P0 and P1 under ELIM0.

Condition 2(c) in ELIM0 is in place to avoid randomizing additional patients to two equally efficacious arms once the control is eliminated. This is a reasonable option for selection trials in which the objective is to identify a treatment that is comparatively more effective than the control. In situations where ranking the experimental regimens is of interest, the elimination procedure may be applied without condition 2(c). As explained in Cheung (2008), applying condition 2(c) does not inflate the type I error rate, but may decrease P1 by selecting a suboptimal treatment under H1 more often. However, the impact has been found practically negligible via extensive simulations.
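One interim step of ELIM0, covering the elimination rule and the termination conditions 2(a) and 2(c), can be sketched as follows (the function name and numbers are ours; the Nmax truncation 2(b) is applied outside this step):

```python
from math import sqrt

def elim0_step(ybar, n, d, open_arms):
    """One interim look of ELIM0 (sketch). open_arms is the set of arms
    still open (arm 0 is the control); ybar[i], n[i] are the shifted
    sample mean and size of arm i. Returns (selected, open_arms), where
    selected is the chosen arm under 2(a) or 2(c), or None to continue."""
    def z(k, i):
        return sqrt(n[k] * n[i] / (n[k] + n[i])) * (ybar[k] - ybar[i])

    # Step 1: close arm i if Z_ki >= d for some open arm k.
    closed = {i for i in open_arms
              if any(z(k, i) >= d for k in open_arms if k != i)}
    remaining = open_arms - closed
    best = max(remaining, key=lambda i: ybar[i])
    if len(remaining) == 1:   # 2(a): only one open arm left
        return best, remaining
    if 0 not in remaining:    # 2(c): the control has been eliminated
        return best, remaining
    return None, remaining    # continue enrolling the open arms

# Hypothetical interim data: arms 0 and 1 are eliminated, arm 2 selected.
selected, remaining = elim0_step(
    ybar=[0.07, -0.40, 0.60], n=[20, 20, 20], d=1.0, open_arms={0, 1, 2})
```

Note that the arm with the largest shifted mean can never be closed in Step 1 (its Z statistics against it are nonpositive), so `remaining` is never empty.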

The ideas of extending the SPRT to the multi-arm setting and of sequential elimination are not new. Levin and Robbins (1981) propose and study the multi-arm SPRT and a sequential elimination procedure for binomial data. In the sequel, Leu and Levin (1999a, 1999b) provide a rigorous proof of a lower bound formula analogous to (2.2) for sequential elimination. Cheung (2008) applies these previous results to treatment selection with an active control and proposes an analogous version of ELIM0 for binomial outcomes.

3. Design Calibration

3.1. Design parameters

In situations where the experimental regimens are exchangeable a priori, we may set a1 = ⋯ = aK. Then the lower bounds become LB0 = {1 + K exp[2d(a1 − a0)/σ2]}−1 and LB1 = {exp[2d(a0 − a1 − δ)/σ2] + (K − 1) exp(−2dδ/σ2) + 1}−1. We observe that LB0 and LB1 depend on a0 and a1 only through their difference, and therefore will set a1 = 0 without loss of generality. As a result, we need 0 < a0 < δ in order to satisfy the conditions in Corollary 1 and Corollary 2, which then give

LB_0 = \left[1 + K \exp\left(-\frac{2d\,a_0}{\sigma^2}\right)\right]^{-1} \quad \text{and} \quad LB_1 = \left\{ \exp\left[\frac{2d(a_0 - \delta)}{\sigma^2}\right] + (K - 1)\exp\left(-\frac{2d\,\delta}{\sigma^2}\right) + 1 \right\}^{-1}. \tag{3.1}

This constraint on the choice of a0 is intuitive: it needs to be positive so that the control will look favorable under H0 with μ0 + a0 > μi, but smaller than δ so that a treatment with a clinically significant improvement remains superior in terms of the shifted mean under H1.

With the clinician-defined parameters δ, K and the error constraints α, β specified, the lower bounds LB0 and LB1 for the probability of correct selection depend on a0, d, and the true variance σ2. In addition, the lower bound formulae derived from (2.2) are based on the open-ended SPRT. While we expect the theoretical results to hold if the truncation Nmax is sufficiently large, what counts as sufficiently large apparently depends also on the variability of the outcome, i.e., σ2. Assuming σ2 known for design purposes, Section 3.2 will discuss the calibration of a0 and d, and Section 3.3 the determination of Nmax. We then propose using interim data to estimate σ2 so that the design parameters (a0, d, Nmax) may be changed midcourse.

3.2. d-minimal design

For a given a0 that is between 0 and δ, LB0 and LB1 are increasing functions of d. This is expected because a larger value of d invokes trial termination or treatment elimination when more information has been accrued, and hence the decision is less likely to be error-prone. For the same reason, SPRT and ELIM0 with a smaller d are expected to conclude a trial with fewer patients than when a larger d is used. Therefore, we take the design approach in Cheung (2008) whereby the shift a0 is chosen to be d-minimal: a shift a0 is d-minimal when it minimizes the required termination constant d* for given error constraints LB0 ≥ 1 − α and LB1 ≥ 1 − β and a given set of clinician-defined parameters δ and K. In particular, we derive in the Appendix that

a_0^\ast = \delta \cdot \frac{\log K - \operatorname{logit}(\alpha)}{\log(K/\alpha - 1) - \operatorname{logit}(\beta)}. \tag{3.2}

Correspondingly, the required termination constant is

d^\ast = \frac{\sigma^2}{2\delta}\left[\log(K/\alpha - 1) - \operatorname{logit}(\beta)\right]. \tag{3.3}

Since a0* does not depend on the true variance σ2, each observation in the control arm will be shifted by the same constant throughout the trial. Note that 0 < a0* < δ when β < 0.5, or more precisely, when β < (K − α)/{(K − α) + K(1 − α)}. In other words, the d-minimal criterion can be applied if the research team sets its goal to identify the superior experimental arm under H1 with a target probability greater than 0.5.

The termination constant d*, on the other hand, does depend on σ2. When implementing the SPRT or ELIM0, we could repeatedly estimate σ2 with the unbiased pooled sample variance σ̂2 throughout the trial. Therefore, the termination criteria will be slightly different at each interim; empirically, we find that the estimate of d* becomes quite stable when σ2 is estimated with at least 30 degrees of freedom.
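Formulae (3.2) and (3.3) are straightforward to compute. The sketch below (function names are ours) also verifies that the resulting lower bounds (3.1) attain the targets 1 − α and 1 − β exactly:

```python
from math import exp, log

def logit(p):
    return log(p / (1 - p))

def d_minimal(K, alpha, beta, delta):
    """d-minimal shift a0* from (3.2) and termination constant d* from
    (3.3); d* is returned per unit of sigma^2."""
    denom = log(K / alpha - 1) - logit(beta)
    a0 = delta * (log(K) - logit(alpha)) / denom
    return a0, denom / (2 * delta)

K, alpha, beta, delta = 2, 0.1, 0.2, 0.18      # NSCLC example of Section 4
a0, d = d_minimal(K, alpha, beta, delta)       # a0* ~ 0.120, d* ~ 12.03 sigma^2

# Check against the lower bounds (3.1), taking sigma^2 = 1:
LB0 = 1 / (1 + K * exp(-2 * d * a0))
LB1 = 1 / (exp(2 * d * (a0 - delta)) + (K - 1) * exp(-2 * d * delta) + 1)
# LB0 = 0.90 = 1 - alpha and LB1 = 0.80 = 1 - beta, as intended.
```

The returned values reproduce the design parameters a0* = 0.120 and d* = 12.03σ2 reported in Section 4.1.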

3.3. Sample size determination

Since the motivation of this work is to improve the enrollment feasibility over the single-stage design, we may initially truncate the sequential procedures at Nmax, with Nmax/(K + 1) determined according to (2.1) with respect to an assumed σ02. This guarantees that the sequential designs will never enroll more patients than the single-stage design. As will be seen in the next section, the truncation Nmax thus computed keeps the actual error rates at the target level if the true variance is less than or equal to σ02. As in the single-stage design, we could re-estimate the truncation as N′max using (2.1) based on the pooled sample variance if the trial reaches Nmax without meeting the termination criteria. That is, continue randomization with an additional N′max − Nmax subjects if N′max > Nmax; stop the trial and select the arm with the largest observed mean otherwise.

4. Application to the NSCLC Trial

4.1. Trial designs

We now apply the three designs to a selection trial with K = 2 combinations of sorafenib and erlotinib and a control arm of erlotinib alone. These regimens are considered for the second-line setting of NSCLC so that the subjects have been previously treated. As suggested in Karrison et al. (2007), we use the 8-week tumor shrinkage (on log scale) as the endpoint, and assume that it is normally distributed with mean -0.05 and standard deviation (σ0) 0.346 in the subjects treated with the control. Based on two-thirds of the historical effect size of sorafenib in renal cell cancer patients, Karrison et al. (2007) posit a superior combination in the NSCLC population will have a mean tumor shrinkage of 0.13 on the log scale. Hence, δ = 0.13 − (−0.05) = 0.18. The error constraints are to keep the type I error rate at α = 0.1 under H0 and the selection probability for the superior combination P1 ≥ 0.8 under H1, i.e., β = 0.2.

Applying (2.1) gives n = 46 for the single-stage design with c0.1 = 1.632. Thus, the total sample size is 138. This calculation is very close to that in Karrison et al. (2007) who obtained 48 per group based on a Tukey adjustment for multiple comparison. This sample size appears to be larger than what is generally expected in phase II trials, but is arguably feasible. The goal of the sequential designs is to further reduce the enrollment figure.

According to (3.2) and (3.3), the d-minimal design parameters for SPRT and ELIM0 are a0* = 0.120 and d* = 12.03σ2. Since σ2 is generally unknown, we use d* = 12.03σ̂2 as the termination cutoff, where σ̂2 is the pooled sample variance based on all data at an interim. To avoid an unstable estimate σ̂2, the termination criteria will be applied only when there are at least 10 patients in each arm. Thereafter, interim analyses will be conducted regularly after each small cohort of patients, until the termination criteria have been reached or the total sample size has reached Nmax = 138.

4.2. Simulation study

The operating characteristics of all designs were examined via simulations under H0 and H1. For the single-stage design, each of the three arms would receive 46 subjects in each simulated trial. For the SPRT and ELIM0, each cohort of subjects would be allocated to the open arms by simple randomization between interims. We did not enforce exact balance among treatment arms within each cohort, because it would be difficult, if not impossible, to accomplish in practice, and would risk unblinding the treatment allocation when the cohort size was small. We considered cohort sizes ranging from 1 up to 10 in the current application. A small cohort size would represent a frequent monitoring schedule, but would impose a greater administrative burden.

In the simulated trials, patients' outcomes for arm i were generated as normal with mean μi and variance σ2, where μ0 = μ1 = −0.05 under both H0 and H1, and μ2 = −0.05 under H0 and 0.13 under H1. To assess the effects of variance misspecification on the designs' performance, we considered various true σ in the simulation. For all three designs, we also considered the option of re-estimating the required sample size based on the pooled sample variance in the first 138 patients. Each method was run 20,000 times under each test scenario.

Table 1 gives the operating characteristics of SPRT and ELIM0 with cohort size 6, and that of the single-stage design.

Table 1.

Operating characteristics of the single-stage design (SS), the d-minimal SPRT and ELIM0. The designs are calibrated with respect to δ = 0.18 and σ0 = 0.346 for a selection trial with K = 2, α = 0.1, and β = 0.2. For SPRT and ELIM0, a0* = 0.120, d* = 12.03σ2, and Nmax = 138.

                  Re-estimate    Under null                  Under alternative
True σ   Design   Nmax           P0     Median N (IQR)       P1     Median N (IQR)
0.311    SS       No             0.91   138 (138, 138)       0.88   138 (138, 138)
         SS       Yes            0.91   138 (138, 138)       0.88   138 (138, 138)
         SPRT     No             0.92   72 (48, 108)         0.80   78 (54, 126)
         SPRT     Yes            0.92   72 (48, 108)         0.80   78 (54, 126)
         ELIM0    No             0.92   66 (48, 90)          0.81   66 (48, 108)
         ELIM0    Yes            0.92   66 (48, 90)          0.81   66 (48, 108)

0.346    SS       No             0.91   138 (138, 138)       0.81   138 (138, 138)
         SS       Yes            0.91   138 (138, 150)       0.83   138 (138, 150)
         SPRT     No             0.90   84 (54, 132)         0.79   90 (60, 138)
         SPRT     Yes            0.92   84 (54, 132)         0.81   90 (60, 138)
         ELIM0    No             0.91   78 (54, 108)         0.80   78 (54, 120)
         ELIM0    Yes            0.91   78 (54, 108)         0.80   78 (54, 120)

0.415    SS       No             0.91   138 (138, 138)       0.67   138 (138, 138)
         SS       Yes            0.91   198 (183, 216)       0.81   198 (183, 216)
         SPRT     No             0.85   120 (78, 138)        0.75   126 (78, 138)
         SPRT     Yes            0.94   120 (78, 180)        0.83   126 (78, 210)
         ELIM0    No             0.86   102 (66, 138)        0.75   102 (66, 138)
         ELIM0    Yes            0.90   102 (66, 144)        0.79   102 (66, 162)

Under the null, μ0 = μ1 = μ2 = −0.05.

Under the alternative, μ0 = μ1 = −0.05 and μ2 = 0.13.

The SPRT and ELIM0 truncated at Nmax = 138 achieve the target PCS under H0 and H1 when the true variance σ2 ≤ σ02. This suggests that the single-stage sample size (2.1) provides an adequately large truncation for the sequential designs when σ0 is a good estimate of the true variability. When σ = 1.2σ0 = 0.415, the PCS of these truncated sequential designs falls below the target by about 5 percentage points. However, this problem disappears when we re-estimate the sample size after randomizing 138 subjects. As indicated by the theoretical results, the single-stage design maintains the target PCS under H0 (i.e., P0) regardless of the true variance; however, when σ = 0.415, it suffers a great loss of PCS under H1, with P1 falling 13 percentage points below the target. This problem, again, is resolved by re-estimating the sample size after randomizing 138 subjects.

The sequential designs have a clear advantage over the single-stage design in terms of the required sample size, with ELIM0 being superior to SPRT in general. First, the truncated sequential designs (without sample size re-estimation) will never enroll more patients than the single-stage design. Second, the shift in the sample size distribution of the sequential designs is remarkable. Take ELIM0 under the scenario where σ = σ0 = 0.346 as an illustration. The median sample size is reduced by 43% from 138 to 78; the upper quartile is smaller than 138, indicating a high likelihood that the trial will be terminated early; the lower quartile is equal to 54 (39% of the maximum 138), highlighting the occasionally substantial reduction due to a frequent interim schedule. Finally, the shift in the sample size distribution adjusts properly in accord with the true variability: when σ = 0.9σ0 = 0.311, the median number of subjects required by ELIM0 is 66, a 52% reduction from 138. In contrast, a single-stage design does not have this kind of adaptability, and will enroll more patients than necessary if σ is smaller than the assumed σ0.

Performing sample size re-estimation increases the sample size. However, it affects only the extreme upper tail of the sample size distribution of the sequential designs when the original truncation Nmax is adequately large, as we see that the interquartile ranges are unaffected for SPRT and ELIM0 when σ ≤ σ0. When the true variance is larger than σ02 and the original truncation Nmax appears to be inadequate, sample size re-estimation causes an upward shift in the final sample size for the sequential designs. This is expected and indeed desirable: sample size re-estimation is intended to guarantee adequate enrollment for a certain variability in the outcomes. This being said, the sequential designs are quite efficient at utilizing information in the first batch of 138 subjects, as the median sample sizes remain unchanged for the designs with sample size re-estimation. In contrast, the median sample size required by the single-stage analysis after sample size re-estimation increases to 198, a 43% increase from 138.

Figure 1 plots the operating characteristics of the sequential designs against the cohort size between interims for scenarios where the true σ = 0.346. The top panel gives the PCS of SPRT and ELIM0, with and without sample size re-estimation. All four designs achieve the nominal type I error rate, with P0 staying at about 0.90 for all cohort sizes, although the SPRT truncated at Nmax = 138 seems to yield P0 slightly below the target. Likewise, the PCS under H1 (i.e., P1) is close to 0.8 in all four designs.

Figure 1.


Effects of cohort size on the operating characteristics of the sequential designs when σ = 0.346. The designs are specified with a0* = 0.120 and d* = 12.03σ2. The top panel shows the PCS of SPRT (dashed line) and ELIM0 (solid line) with a fixed Nmax = 138 (indicated by “F”) and with sample size re-estimation (indicated by “R”). The bottom panel shows the sample size distribution of ELIM0 with sample size re-estimation. Each box spans the interquartile range, the darker line inside marks the mean, and the lighter line the median; the dashed line in each figure indicates the median sample size of the single-stage design.

Sample size re-estimation improves the PCS of the SPRT and ELIM0. It is interesting to note the pattern that, with a fixed truncation, ELIM0 has higher PCS than SPRT, and that the trend is reversed for the designs with sample size re-estimation. This illustrates the advantage of sequential elimination when resources (patients) are limited to an absolute maximum Nmax: a clearly inferior arm can be removed from consideration early on so that the remaining resources can be allocated to the promising arms for more precise comparison. On the other hand, when more resources can be made available via sample size re-estimation, the SPRT emerges superior to ELIM0 by avoiding early, potentially erroneous, elimination decisions in the trial. The price for such caution is the larger sample size required by the SPRT than by ELIM0 (cf. Table 1).

The bottom panel of Figure 1 gives the sample size distribution of ELIM0 with sample size re-estimation for various cohort sizes. We observe that the sample size distribution gradually shifts upwards as the cohort size increases. Such an increase seems to be the price for administrative ease; however, the effects of cohort size on the designs' operating characteristics are quite small in this range. We have run simulations (not reported here) using random cohort sizes (uniformly from 1 to 10) within a trial, and obtained very similar operating characteristics. This is a desirable feature that enables investigators to choose a cohort size that is administratively convenient; investigators may even opt to set a monitoring schedule in terms of calendar time, instead of scheduling an interim after every fixed number of new enrollments. Above all, with a median sample size below 80 for all cohort sizes, ELIM0 imposes a feasible enrollment requirement at an early stage of drug development.

Figure 2 displays the results for the sequential designs for scenarios with true σ = 0.415. The figure shows patterns similar to those found in Figure 1, although the advantage of sample size re-estimation is clearer in these scenarios. Re-estimating the sample size may lead to an N′max much larger than 138 in these scenarios, especially under H1, where the variability in sample size seems to be large. However, this increase in the final sample size should be viewed in light of the fact that a single-stage design with an assumed σ0 = 0.415 would require n = 67 subjects per arm, thus a total of 201 subjects.

Figure 2.


Effects of cohort size on the operating characteristics of the sequential designs when σ = 0.415. The designs are specified with a0* = 0.120 and d* = 12.03σ2. The top panel shows the PCS of SPRT (dashed line) and ELIM0 (solid line) with a fixed Nmax = 138 (indicated by “F”) and with sample size re-estimation (indicated by “R”). The bottom panel shows the sample size distribution of ELIM0 with sample size re-estimation. Each box spans the interquartile range, the darker line inside marks the mean, and the lighter line the median; the dashed line in each figure indicates the median sample size of the single-stage design.

5. Discussion

This article proposes sequential designs for treatment selection with a prospective control group using a normal endpoint. We have illustrated through simulation the substantial sample size savings achievable with sequential methods. In particular, we recommend using a truncated sequential elimination procedure, ELIM0, augmented with sample size re-estimation based on the variance estimate if the termination criteria cannot be reached within the prespecified Nmax subjects.

The proposed methods are extremely easy to calibrate for practical use: We can specify the mean shift a0 and the termination constant d* using formulae (3.2) and (3.3), respectively, and the truncation point Nmax using (2.1).
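As a minimal sketch of this calibration, the code below solves numerically for the crossing point a0* and the minimal termination constant d* using the relations derived in the Appendix: the closed form for d0(a0) and the implicit equation defining d1(a0). All numerical inputs are illustrative, not taken from the trial in the paper.

```python
import math

def logit(p):
    """Log-odds of p."""
    return math.log(p / (1 - p))

def d0(a0, K, alpha, sigma2):
    # Closed-form termination constant meeting the type I error constraint
    # (from the Appendix): d0(a0) = sigma^2 {log K - logit(alpha)} / (2 a0).
    return sigma2 * (math.log(K) - logit(alpha)) / (2 * a0)

def d1(a0, K, beta, delta, sigma2, tol=1e-12):
    # Smallest d satisfying the power constraint LB1(d) >= 1 - beta;
    # LB1 is increasing in d, so bisection applies.
    def lb1(d):
        return 1.0 / (math.exp(2 * d * (a0 - delta) / sigma2)
                      + (K - 1) * math.exp(-2 * d * delta / sigma2) + 1.0)
    lo, hi = 0.0, 1.0
    while lb1(hi) < 1 - beta:   # expand upper bracket until it satisfies power
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lb1(mid) < 1 - beta else (lo, mid)
    return 0.5 * (lo + hi)

def calibrate(K, alpha, beta, delta, sigma2, tol=1e-12):
    # d0 decreases and d1 increases in a0, so their crossing a0* in (0, delta)
    # minimizes d*(a0) = max{d0(a0), d1(a0)}; locate the crossing by bisection.
    lo, hi = 1e-6, delta - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d0(mid, K, alpha, sigma2) > d1(mid, K, beta, delta, sigma2):
            lo = mid
        else:
            hi = mid
    a0_star = 0.5 * (lo + hi)
    return a0_star, d0(a0_star, K, alpha, sigma2)
```

For instance, `calibrate(K=3, alpha=0.05, beta=0.20, delta=0.25, sigma2=0.25)` returns the crossing point a0* and the corresponding d*; these parameter values are purely hypothetical placeholders.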

Such simplicity arises from the lower bound (2.2) given in Theorem 1, which can be applied to construct PCS lower bounds under a variety of hypothesis configurations. For example, if we are comparing several new cytostatic agents with a standard cytotoxic agent on the basis of tumor shrinkage, it may be appropriate to set up a noninferiority test with the following hypotheses:

H0: μ1 = … = μK = μ0 − δ  and  H1: μ1 = … = μK−1 = μ0 − δ, μK = μ0,

where δ > 0 is the noninferiority margin. Results analogous to Corollaries 1 and 2 can then be derived as a consequence of Theorem 1, and applied to calibrate a0 and d* for noninferiority tests.

The proposed methods are easy to implement. First, we have demonstrated that the prespecified level of PCS can be achieved with a flexible monitoring schedule in terms of the cohort size between interims. Second, the termination criteria for the SPRT and ELIM0 are characterized by decision rules that are symmetric with respect to the shifted outcomes Yij's, and the interim pooled sample variance can be reported without providing information about the primary comparison. Therefore, it is possible to make interim decisions without unblinding the treatment labels, thereby avoiding selection bias. In contrast, when a two-stage design continues to a second stage, it is known that the chosen experimental arm was empirically superior to the control at the interim.

There are practical considerations when implementing the proposed designs in a cancer trial, notably that patients may die prior to the 8-week tumor size evaluation. To get around the missing outcomes due to death, our recommendation is to monitor death and the primary endpoint (e.g., tumor shrinkage) separately in patients who are alive at the follow-up. This approach is advantageous in that the primary comparison will evaluate the regimens purely by their anticancer mechanism via the choice of the endpoint. This being said, it is imperative to compare survival experience among the treatment arms: if an experimental regimen is selected, we may informally compare the survival curve of the selected regimen to that of the control group, and examine whether there is a drastic difference between the two groups. This informal approach seems adequate at the early stage of drug development. A second approach to the missing outcome problem is to assign the worst possible outcome to death and use a rank-based test for statistical inference, as suggested in Karrison et al. (2007). We are currently investigating analogous rank-based sequential elimination procedures. While this approach apparently solves the missing data problem, the interpretation of such a composite outcome is not always clear. Finally, in trials where a non-trivial number of subjects may die within a few months of diagnosis (e.g., patients with advanced pancreatic cancer), it may be appropriate to consider death or short-term progression-free survival as the primary endpoint and monitor the trial based on binary outcomes; see Cheung (2008). This approach is recommended only when there are reasons to believe the experimental treatments will provide a substantial improvement in survival; otherwise, the sample size requirement will prove prohibitive for a phase II trial.

Acknowledgments

This work was supported by NIH grant R01 NS055809-01.

Appendix

This Appendix derives a0 (Equation 3.2) and d* (Equation 3.3) for the d-minimal design. Define d0(a0) = min{d : LB0 ≥ 1 − α} and d1(a0) = min{d : LB1 ≥ 1 − β} as functions of a0, where LB0 and LB1 are given in Equation (3.1). Thus, d*(a0) = max{d0(a0), d1(a0)} is the smallest termination constant that satisfies the error constraints for a given a0.

By continuity and monotonicity of LB0 and LB1 in d, we can replace the inequality sign in the definitions of d0(a0) and d1(a0) with an equality sign. It is then easy to show that d0(a0) is decreasing in a0 whereas d1(a0) is increasing in a0, and that the functions d0(a0) and d1(a0) intersect at some a0* between 0 and δ if β < 0.5 (or, more precisely, if β < (Kα)/{Kα + K(1 − α)}). Consequently, we can deduce that d*(a0) is minimized at the intersection a0*. In other words, d*(a0*) = d0(a0*) = d1(a0*).
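The monotonicity claims can be checked numerically. The sketch below uses purely illustrative values (K = 3, α = 0.05, β = 0.20, δ = 0.25, σ² = 0.25) and solves the defining equation of d1(a0) by bisection; it confirms that d0 decreases and d1 increases over a grid of a0 values, so a unique crossing exists.

```python
import math

# Illustrative values only (not from the trial in the paper)
K, alpha, beta, delta, sigma2 = 3, 0.05, 0.20, 0.25, 0.25
logit = lambda p: math.log(p / (1 - p))

def d0(a0):
    # Closed form from the Appendix
    return sigma2 * (math.log(K) - logit(alpha)) / (2 * a0)

def d1(a0):
    # Smallest d with LB1(d) >= 1 - beta, by bisection (LB1 increases in d)
    def lb1(d):
        return 1.0 / (math.exp(2 * d * (a0 - delta) / sigma2)
                      + (K - 1) * math.exp(-2 * d * delta / sigma2) + 1.0)
    lo, hi = 0.0, 1.0
    while lb1(hi) < 1 - beta:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lb1(mid) < 1 - beta else (lo, mid)
    return 0.5 * (lo + hi)

grid = [0.05 + 0.01 * i for i in range(20)]   # a0 values inside (0, delta)
vals0 = [d0(a) for a in grid]
vals1 = [d1(a) for a in grid]
assert all(x > y for x, y in zip(vals0, vals0[1:]))  # d0 is decreasing in a0
assert all(x < y for x, y in zip(vals1, vals1[1:]))  # d1 is increasing in a0
```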

Now solving

[1 + K exp{−2d0(a0)a0/σ²}]⁻¹ = 1 − α    (A.1)

gives

d0(a0) = σ²{log K − logit(α)}/(2a0).
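As a quick sanity check, substituting this closed form back into its defining equation recovers 1 − α exactly. The sketch below uses hypothetical values (K = 3, α = 0.05, a0 = 0.15, σ² = 0.25) chosen only for illustration.

```python
import math

K, alpha, a0, sigma2 = 3, 0.05, 0.15, 0.25  # illustrative values only

logit_alpha = math.log(alpha / (1 - alpha))
d0 = sigma2 * (math.log(K) - logit_alpha) / (2 * a0)  # closed form for d0(a0)

# Left-hand side of the defining equation, evaluated at the closed-form d0:
lhs = 1.0 / (1.0 + K * math.exp(-2 * d0 * a0 / sigma2))
print(lhs)  # recovers 1 - alpha = 0.95, up to floating point
```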

Then by definition of d1(a0), we have

{exp[2d1(a0)(a0 − δ)/σ²] + (K − 1) exp[−2d1(a0)δ/σ²] + 1}⁻¹ = 1 − β    (A.2)

Since d0(a0*) = d1(a0*), we can solve for a0* by solving (A.1) and (A.2) with a0 replaced by a0* in the equations. The desired results follow after some algebraic manipulation.

References

  1. Bischoff W, Miller F. Adaptive two-stage test procedures to find the best treatment in clinical trials. Biometrika. 2005;92:197–212.
  2. Cheung YK. Simple sequential boundaries for treatment selection in multi-armed randomized clinical trials with a control. Biometrics. 2008. In press. doi:10.1111/j.1541-0420.2007.00929.x.
  3. Cheung YK, Gordon PH, Levin B. Selecting promising ALS therapies in clinical trials. Neurology. 2006;67:1748–1751. doi:10.1212/01.wnl.0000244464.73221.13.
  4. Hochberg Y, Tamhane AC. Multiple Comparison Procedures. Wiley; 1987.
  5. Jung SH. Randomized phase II trials with a prospective control. Statistics in Medicine. 2007. In press. doi:10.1002/sim.2961.
  6. Karrison TG, Maitland ML, Stadler WM, Ratain MJ. Design of phase II cancer trials using a continuous endpoint of change in tumor size: application to a study of sorafenib and erlotinib in non-small-cell lung cancer. J Natl Cancer Inst. 2007;99:1455–1461. doi:10.1093/jnci/djm158.
  7. Leu CS, Levin B. On the probability of correct selection in the Levin-Robbins sequential elimination procedure. Statistica Sinica. 1999a;9:879–891.
  8. Leu CS, Levin B. Proof of a lower bound formula for the expected reward in the Levin-Robbins sequential elimination procedure. Sequential Analysis. 1999b;18:81–105.
  9. Levin B, Robbins H. Selecting the highest probability in binomial or multinomial trials. Proceedings of the National Academy of Sciences of the United States of America. 1981;78:4663–4666. doi:10.1073/pnas.78.8.4663.
  10. Michaelis LC, Ratain MJ. Measuring response in a post-RECIST world: from black and white to shades of grey. Nature Reviews Cancer. 2006;6:409–414. doi:10.1038/nrc1883.
  11. Robbins H, Siegmund DO. Sequential tests involving two populations. Journal of the American Statistical Association. 1974;69:132–139.
  12. Schaid DJ, Wieand S, Therneau TM. Optimal two-stage screening designs for survival comparisons. Biometrika. 1990;77:507–513.
  13. Simon R, Wittes RE, Ellenberg SS. Randomized phase II clinical trials. Cancer Treatment Reports. 1985;69:1375–1381.
  14. Slepian D. The one-sided barrier problem for normal noise. Bell System Tech J. 1962;41:463–501.
  15. Taylor JMG, Braun TM, Li Z. Comparing an experimental agent to a standard agent: relative merits of a one-arm or randomized two-arm phase II design. Clinical Trials. 2006;3:335–348. doi:10.1177/1740774506070654.
  16. Thall PF, Simon R, Ellenberg SS. Two-stage selection and testing designs for comparative clinical trials. Biometrika. 1988;75:303–310.
