Author manuscript; available in PMC: 2023 Apr 7.
Published in final edited form as: J Biopharm Stat. 2022 Jan 25;32(5):671–691. doi: 10.1080/10543406.2021.2009498

A stochastically curtailed single-arm phase II trial design for binary outcomes

Martin Law a,b, Michael J Grayling c, Adrian P Mander d
PMCID: PMC7614398  EMSID: EMS171111  PMID: 35077268

Abstract

Phase II clinical trials are a critical aspect of the drug development process. With drug development costs ever increasing, novel designs that can improve the efficiency of phase II trials are extremely valuable.

Phase II clinical trials for cancer treatments often measure a binary outcome. The final trial decision is generally to continue or cease development. When this decision is based solely on the result of a hypothesis test, the result may be known with certainty before the planned end of the trial. Unfortunately, there is often no opportunity for early stopping when this occurs.

Some existing designs do permit early stopping in this case, accordingly reducing the required sample size and potentially speeding up drug development. However, further improvements can be achieved by stopping early when the final trial decision is very likely, rather than certain, an approach known as stochastic curtailment. While some authors have proposed approaches of this form, these approaches have various limitations.

In this work we address these limitations by proposing new design approaches for single-arm phase II binary outcome trials that use stochastic curtailment. We use exact distributions, avoid simulation, consider a wider range of possible designs and permit early stopping for promising treatments. As a result, we are able to obtain trial designs that have considerably reduced sample sizes on average.

Keywords: Adaptive design, cancer, continuous monitoring, interim analysis, oncology

1. Introduction

Most novel treatments are found to be inefficacious, which makes the average development cost associated with each successful treatment extremely high (Wong et al. 2019). Furthermore, trials themselves are expensive to run (Martin et al. 2017), and because treatment response in oncology trials cannot be evaluated immediately, trials can take substantial time to complete. This makes novel designs that can improve the efficiency of clinical research extremely valuable.

Phase II clinical trials for cancer treatments often have a binary primary outcome, based on change in tumour size as measured by the RECIST criterion (Eisenhauer et al. 2009), and typically contain only a single arm. The aim of such a phase II trial is to gain enough information to decide whether a treatment should be carried forward for further testing (a go decision) or abandoned (a no go decision). In general, if a sufficient number of (positive) responses are observed, a go decision is made and some corresponding null hypothesis is rejected; otherwise a no go decision is made and the corresponding null hypothesis is not rejected. The simplest design to evaluate a treatment with a binary outcome is the single-stage design, described by A’Hern (A’Hern 2001). In a single-stage design, a fixed number of participants are recruited and, once the trial is completed, a go or no go decision is made based on the number of responses.

A number of designs have been proposed that can reduce the expected sample size (ESS) of a single-arm binary outcome trial compared to a single-stage design.

In A’Hern’s single-stage design (A’Hern 2001), N participants are recruited, and the trial is deemed a success if the final number of responses exceeds a specified boundary r.

Extending this approach, Simon’s design (Simon 1989) adds an interim analysis after n1 participants, at which point the trial proceeds to recruit a further N − n1 participants only if the number of responses is greater than a pre-specified boundary r1; otherwise it stops for a no go decision. Mander and Thompson extended this to permit both a go and a no go decision at the interim (Mander and Thompson 2010; Simon 1989). Chi and Chen extended the Simon design by permitting stopping as soon as a go or no go decision is certain, known as non-stochastic curtailment (Chi and Chen 2008). Consequently, we will refer to this design as the “NSC” design.

It is possible to end a trial early not only when a go decision is either certain or no longer possible, as in NSC above, but also when a go decision is either likely or unlikely. This is known as stochastic curtailment (SC). One approach to SC is based on the concept of conditional power (CP). Conditional power (or “assumed conditional power” (Kunzmann et al. 2020)) is the probability of rejecting some null hypothesis (and making a go decision), conditional on an anticipated treatment effect and the current number of participants and responses. The idea of CP can be used in conjunction with SC in the following way: if the CP is below some specified lower threshold, or exceeds some specified upper threshold, then a trial will end for a no go or go decision respectively. Ayanlowo and Redden (Ayanlowo and Redden 2007) and Kunz and Kieser (Kunz and Kieser 2012) proposed designs that allow early stopping for a no go decision if the CP is below some lower threshold, that is, by allowing SC. Denote this lower threshold θF.
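To make the CP-based stopping rule concrete, the following Python sketch assumes a single-stage design with final boundary r and computes CP as the binomial probability that the remaining N − m participants supply enough responses for a go decision. It deliberately ignores any further early stopping between m and N (the simplification discussed in Section 2.2), and the function names and example values (N = 8, r = 4, p1 = 0.5, θF = 0.3, θE = 0.95) are hypothetical.

```python
from math import comb

def conditional_power(s, m, N, r, p1):
    """P(S(N) > r | S(m) = s) when each remaining outcome is Bern(p1).

    Simplification: assumes no further early stopping between m and N.
    """
    needed = r + 1 - s        # further responses required for a go decision
    remaining = N - m         # participants still to be observed
    if needed <= 0:
        return 1.0            # go decision already certain
    if needed > remaining:
        return 0.0            # go decision no longer possible
    return sum(comb(remaining, k) * p1**k * (1 - p1)**(remaining - k)
               for k in range(needed, remaining + 1))

def sc_decision(s, m, N, r, p1, theta_f, theta_e):
    """Stop for no go if CP < theta_f, for go if CP > theta_e, else continue."""
    cp = conditional_power(s, m, N, r, p1)
    if cp == 0.0 or cp < theta_f:
        return "no go"
    if cp == 1.0 or cp > theta_e:
        return "go"
    return "continue"

# Hypothetical example: 3 responses after 6 of N = 8 participants, r = 4.
print(conditional_power(3, 6, 8, 4, 0.5))                        # 0.25
print(sc_decision(3, 6, 8, 4, 0.5, theta_f=0.3, theta_e=0.95))   # no go
```

Here both remaining participants must respond, so CP = 0.5² = 0.25, which falls below θF and triggers a no go decision even though success is still possible.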

Another design of interest is the sequential probability ratio test (SPRT) of Wald (Wald 1947). It contains upper and lower stopping boundaries for every possible number of participants. These upper and lower boundaries do not converge to the same value as the number of participants increases, and consequently the design has no maximum sample size.
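The non-converging boundaries can be seen in a short sketch. The Python below assumes Wald's classical approximate limits log((1 − β)/α) and log(β/(1 − α)) on the log-likelihood ratio, and borrows p0 = 0.2, p1 = 0.4, α = 0.05 and β = 0.1 from Section 3.1 purely for illustration; the calibration of the SPRT realisations compared later in this paper may differ. Because the log-likelihood ratio is linear in S(m) and m, both boundaries are straight lines with the same slope, so the gap between them never closes.

```python
from math import log, ceil, floor

def sprt_boundaries(m, p0, p1, alpha, beta):
    """Wald SPRT for p = p0 vs p = p1 after m binary outcomes.

    Returns (f_m, e_m): stop for no go if S(m) <= f_m, for go if S(m) >= e_m.
    """
    a = log(p1 / p0)                  # log-LR contribution of a response
    b = log((1 - p0) / (1 - p1))      # minus the contribution of a non-response
    upper = log((1 - beta) / alpha)   # Wald's approximate upper limit
    lower = log(beta / (1 - alpha))   # Wald's approximate lower limit
    # log-LR = S(m) * (a + b) - m * b, so each boundary is a line in m
    # with slope b / (a + b); the two lines are parallel.
    e_m = ceil((upper + m * b) / (a + b))
    f_m = floor((lower + m * b) / (a + b))
    return f_m, e_m

for m in (10, 50, 200):
    print(m, sprt_boundaries(m, p0=0.2, p1=0.4, alpha=0.05, beta=0.1))
```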

We present two novel designs that use SC to obtain upper and lower stopping boundaries for a trial. The first design applies SC to what would otherwise be a single-stage design, while the second applies SC to what would otherwise be a Simon-type design. Novel aspects include permitting stopping for a go decision and avoiding simulation by using the exact distribution of a trial’s outcomes. We describe how the proposed designs differ from existing designs and compare them to existing designs in a variety of ways. We summarise the results and make recommendations for investigators.

The remainder of the paper is organised as follows: Section 2 covers the background of curtailed designs and the limitations of existing similar designs. This section also introduces the details of the proposed designs, including the concept of conditional power, technical details of how we search for specific examples of the designs and how to compare such examples. Section 3 compares the proposed designs to existing designs, through both a real data example and other examples. The paper concludes with a discussion in Section 4.

2. Materials and methods

2.1. Background on curtailed designs

NSC and SC are typically described in terms of continuous monitoring, where the data are analysed after each participant’s results become available. This may be considered a special case of sequential monitoring, which describes any trial in which interim results are analysed. Sequential analysis is methodologically well established (Jennison and Turnbull 2000; Whitehead 1997). Continuous monitoring has been proposed not only in the single-arm approaches of Chi and Chen, Ayanlowo and Redden (second stage) and Kunz and Kieser, but also in designs for randomised binary outcome trials (Carsten and Chen 2016; Chen et al. 2018; Law et al. 2020). In terms of practicality, continuous monitoring may be easier when a trial’s recruitment rate is low (Wason et al. 2019), which is often the case in practice: Campbell et al. (Campbell et al. 2007) found that early participant recruitment was slower than expected in 77 (63%) of 122 reviewed trials, and a review of 151 randomised controlled trials by Walters et al. (Walters et al. 2017) reported a median recruitment rate of 0.92 participants per centre per month. Furthermore, Campbell et al. (Campbell et al. 2007) found that only 38 (31%) of 122 trials reached their intended sample size and 66 (54%) requested a trial extension.

Continuous monitoring may be expected to be specified at the trial design stage; see, for example, Todd et al. (Todd et al. 2016) and McCabe et al. (McCabe et al. 2020). However, continuous monitoring and subsequent curtailment may also take place in trials where no such monitoring is specified in advance. In particular, authors may report curtailing a trial (and thus monitoring it continuously) without using such terms in the corresponding manuscript. For example, Simon’s design may be curtailed to make either a no go decision (Mego et al. 2016; Necchi et al. 2014; Santana et al. 0000; Wagner et al. 2015) or a go decision (Moskowitz et al. 2013; Stein et al. 2013; Yoon et al. 2016; Yu et al. 2017). A trial may be curtailed without prior planning, including when the endpoint length is long (Pedersen et al. 2015; Sepulveda-Sanchez et al. 2017). Investigators may even use SC informally, ending a Simon design when trial success is possible but “unlikely” (Odia et al. 2015). We give further details of all these trials in the Appendix.

The examples cited above also suggest that continuous monitoring and subsequent curtailment are more common than citations of the associated methodological literature indicate. An important additional consequence of unplanned monitoring is that any resulting inference may be compromised: point estimates may be biased and confidence intervals may have low coverage (Atkinson and Brown 1985; Whitehead 1986). Accordingly, by not anticipating and accounting for continuous monitoring and curtailment at the design stage, investigators are taking inferential risks. This lack of planning also has costs in terms of the ability to report accurate ESSs and trial durations at the design stage, which may directly affect expected financial costs and/or the ability to act quickly upon making a go or no go decision.

Define a design realisation as a particular instance of a design, for example, a single-stage design where a go decision will be made if more than five responses are observed from a total of 20 participants. A design realisation must be feasible, that is, it must satisfy the chosen constraints on type-I error-rate and power. The feasible design realisation that performs best with respect to some single optimality criterion is known as the optimal design realisation for that criterion.
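As an illustration of checking feasibility, the Python sketch below computes the exact type-I error-rate and power of the single-stage realisation mentioned above (N = 20, go if more than r = 5 responses). The constraints p0 = 0.2, p1 = 0.4, α = 0.05 and β = 0.1 are borrowed from Section 3.1 for illustration only, and the helper names are hypothetical.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(max(k, 0), n + 1))

def single_stage_characteristics(N, r, p0, p1):
    """Type-I error-rate and power of 'go if more than r responses out of N'."""
    return binom_tail(N, r + 1, p0), binom_tail(N, r + 1, p1)

def is_feasible(N, r, p0, p1, alpha, beta):
    type1, power = single_stage_characteristics(N, r, p0, p1)
    return type1 <= alpha and power >= 1 - beta

t1, power = single_stage_characteristics(20, 5, p0=0.2, p1=0.4)
print(round(t1, 3), round(power, 3))                         # approximately 0.196 0.874
print(is_feasible(20, 5, 0.2, 0.4, alpha=0.05, beta=0.1))   # False
```

Under these illustrative constraints the realisation is not feasible: its type-I error-rate of roughly 0.196 is far above 0.05.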

In the setting of multiple optimality criteria, Jung et al. (Jung et al. 2004) determined the “best” design realisations by creating a loss function that was a weighted combination of two optimality criteria: maximum sample size and ESS under some null hypothesis. The authors describe a design realisation as admissible if it has the smallest expected loss of all considered design realisations, that is, it is superior to all other considered design realisations, for some weighted combination of optimality criteria. Our interest generally lies in finding the collection of design realisations that comprise the admissible designs across all combinations of weights. Mander et al. (Mander et al. 2012) extend the concept of the loss function and admissible designs by incorporating a third component to the loss function: ESS under an alternative hypothesis.

2.2. Limitations of existing designs

Some aspects of the existing curtailed designs described above have scope for improvement. Firstly, the operating characteristics of curtailed designs have often been estimated using simulation. However, such estimates are subject to simulation error, and the exact distribution of each trial’s possible outcomes remains unknown. Secondly, the approaches of Ayanlowo and Redden (Ayanlowo and Redden 2007) and Kunz and Kieser (Kunz and Kieser 2012) use fixed or uniformly distributed thresholds for CP, which may reduce the number of meaningful design realisations searched over. Thirdly, in both of these approaches, rather than taking curtailment into account when searching for the optimal design (for some definition of “optimal”), the optimal non-curtailed design is found and SC is then applied to it. This again means that a narrower range of possible designs is examined. A further consequence is that the type-I error-rate and power are decreased compared to the selected optimal non-curtailed design (as SC is only permitted to make a no go decision). This further reduces the number of possible design realisations, as many will not reach the required type-I error-rate and power once curtailment has been applied. Finally, the approaches of Ayanlowo and Redden and of Kunz and Kieser give equations for evaluating CP, but these equations do not fully account for the early stopping caused by SC.

2.3. Proposed designs

The experimental treatment is assumed to have a true response rate p ∈ [0, 1] (i.e., each participant’s outcome is assumed to be distributed as Bern(p)). We test the null hypothesis H0: p ≤ p0 against the alternative hypothesis H1: p > p0. We assume that results from participants are independent and identically distributed. Consequently, the numbers of responses observed at each stage are also independent. For all analyses described in this paper, the test statistic used to undertake the hypothesis test after the first m participants is the current number of responses S(m).

The trial is powered to a level 1 − β under p = p1, and the type-I error-rate is controlled to α when p = p0. Available results (Shan et al. 2017) on the monotonicity of the power function in designs of the type considered here mean that the type-I error-rate is then controlled to α over all of H0 (i.e., for all p ≤ p0) and power is at least 1 − β for all p ≥ p1. Commonly, the value of p0 is chosen to be the greatest response rate that is deemed typical for standard of care, while p1 is chosen to be the smallest response rate that is large enough to warrant further study.

We propose two designs in which the trial may stop not only for a no go decision if the CP is below some lower threshold, that is, CP < θF, but also for a go decision if the CP exceeds some upper threshold denoted θE, that is, CP > θE. The first is a Simon-type design, containing an interim analysis, to which SC for either a go or no go decision is added after each participant’s result. This will be referred to as the “SC” design. The second design we propose can be understood as an otherwise single-stage design that incorporates SC. This will be referred to as the “m-stage” design. The latter design can also be adapted to use analyses that are less frequent than continuous monitoring, while still reducing the ESS. The m-stage design is the single-arm analogue of the two-arm design introduced by Law et al. (Law et al. 2020), which permits SC for go and no go decisions.

Note that it will ultimately be possible for all go and no go decisions to be concatenated into N-length vectors of stopping boundaries e = (e1, e2, …, eN) and f = (f1, f2, …, fN) respectively, using ei = ∞ and fi = −∞ at any points i ∈ [1, N] where stopping is not permitted/possible. Thus it is possible to characterise a realisation of any design type using e and f only. This demonstrates why the comparisons conducted between designs are fair: both previous and our newly proposed designs amount to methodologies for specifying e and f. This also means that for any single-arm design, all possible combinations of number of participants and responses so far can be represented in an easy-to-understand grid. Examples of this are shown in Figure 1, for the following designs: single-stage, Simon, Mander and Thompson, NSC, SC and m-stage. Here, all possible points, that is, all possible participant and response combinations, and whether at each point the trial will continue or stop for a go or no go decision, are shown.
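A minimal sketch of this representation is given below, assuming the convention of the rejection probability R(p) in Section 2.4 (go if S(m) ≥ e_m, no go if S(m) ≤ f_m). Lists are indexed by m = 1, …, N, with infinities marking points where stopping is not permitted. The helper names are hypothetical, and n1 = 4 is an assumed value, since Figure 1 does not state it.

```python
import math

def decision(s, m, e, f):
    """Go if S(m) >= e_m, no go if S(m) <= f_m, otherwise continue."""
    if s >= e[m]:
        return "go"
    if s <= f[m]:
        return "no go"
    return "continue"

def single_stage_boundaries(N, r):
    e = [math.inf] * (N + 1)     # e[m] for m = 1..N; index 0 unused
    f = [-math.inf] * (N + 1)
    e[N], f[N] = r + 1, r        # only the final analysis can stop the trial
    return e, f

def simon_boundaries(n1, r1, N, r):
    e, f = single_stage_boundaries(N, r)
    f[n1] = r1                   # interim analysis: no go if S(n1) <= r1
    return e, f

# Figure 1 uses N = 8, r = 4, r1 = 1; n1 = 4 is an assumed value.
e, f = simon_boundaries(n1=4, r1=1, N=8, r=4)
print([decision(s, 4, e, f) for s in range(5)])
# ['no go', 'no go', 'continue', 'continue', 'continue']
```

Curtailed designs differ only in filling in more finite entries of e and f, which is what makes comparisons across design types direct.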

Figure 1.


Illustrative diagrams of different trial designs, showing potential points where the study would end, known as terminal points. m: Number of participant results so far. S(m): Number of responses so far. All trials have N = 8, r = 4, with r1 = 1 in the two-stage designs and e1 = 3 in Mander and Thompson’s design. We may assume that (θF, θE) in the SC design are such that (e5, e6, e7, e8) = (5, 5, 5, 5), (f2, f4, f6, f7, f8) = (0, 1, 2, 3, 4) and that (θF, θE) in the m-stage design are such that (e4, e5, e7, e8) = (4, 4, 5, 5), (f2, f3, f6, f7, f8) = (0, 1, 2, 3, 4).

An objection to curtailment for a go decision could be that one would wish to obtain more data if a treatment appears promising. However, the current abundance of candidate treatments, relative to the limited number of participants available to test them, makes this argument less compelling than in the past.

The proposed designs use binding stopping rules, as do six designs to which the proposed designs are compared (Ayanlowo and Redden 2007; Chi and Chen 2008; Kunz and Kieser 2012; Mander and Thompson 2010; Simon 1989; Wald 1947), including a number of designs that use continuous monitoring. If binding stopping rules are used, it should be made clear that crossing a stopping boundary necessitates stopping the trial, as such stopping has already been agreed upon in the design stage (Pallmann et al. 2018). If binding stopping rules are deemed in advance to be difficult to implement in conjunction with continuous monitoring, less frequent monitoring can be employed (Section 2.5).

2.4. Obtaining exact distributions

For any design, there are a number of points (S(m), m) at which the trial would stop. We call these points terminal points and denote the set of all such points by 𝒯. The terminal points can be determined using the CP at each point in the trial (see Appendix). The ESS for response rate p can then be obtained by multiplying the number of participants m at each terminal point by the probability of reaching that point, and summing over all terminal points:

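$$\mathrm{ESS}(p) = \sum_{m=1}^{N} \sum_{S(m)=0}^{m} \mathbb{I}\big[\{S(m), m\} \in \mathcal{T}\big]\, m\, U\big(S(m), m \mid p, e, f\big) \qquad (1)$$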

where U(S(m), m | p, e, f) denotes the probability of reaching the point (S(m), m) in a particular trial given true response rate p and vectors of stopping boundaries e and f. The indicator term ensures that only terminal points are considered. Consequently, for any type of design, all that is required to find the ESS is the probability of reaching each terminal point in 𝒯. The sample size of a trial can also be described in terms of quantiles, including the median, as follows: sort the sample sizes of the terminal points, that is, each m for which {S(m), m} ∈ 𝒯, in ascending order. Accumulating the corresponding probabilities U(S(m), m | p, e, f) in this order gives the cumulative distribution function of the sample size, from which quantiles of the required sample size can be calculated.
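Equation (1) and the sample-size quantiles can be computed exactly by propagating probability forwards through the grid of points (S(m), m) and removing mass as soon as a terminal point is reached. The Python sketch below is one such forward recursion, assuming the boundary convention of R(p) given below (go if S(m) ≥ e_m, no go if S(m) ≤ f_m); the function names and the Simon-type boundaries in the example (n1 = 4, r1 = 1, N = 8, r = 4, p = 0.3) are hypothetical.

```python
import math

def terminal_probabilities(e, f, N, p):
    """U(S(m), m | p, e, f) for every terminal point, by forward recursion.

    e, f are lists indexed by m = 1..N (index 0 unused); every point with
    m = N is terminal.
    """
    alive = {0: 1.0}          # P(at (s, m) with the trial still running), m = 0
    terminal = {}
    for m in range(1, N + 1):
        step = {}
        for s, prob in alive.items():       # next participant responds w.p. p
            step[s] = step.get(s, 0.0) + prob * (1 - p)
            step[s + 1] = step.get(s + 1, 0.0) + prob * p
        alive = {}
        for s, prob in step.items():
            if s >= e[m] or s <= f[m] or m == N:
                terminal[(s, m)] = prob
            else:
                alive[s] = prob
    return terminal

def expected_sample_size(terminal):
    """Equation (1): m times the probability of each terminal point, summed."""
    return sum(m * prob for (_, m), prob in terminal.items())

def sample_size_quantile(terminal, q):
    """Smallest m such that P(sample size <= m) >= q."""
    by_m = {}
    for (_, m), prob in terminal.items():
        by_m[m] = by_m.get(m, 0.0) + prob
    cumulative = 0.0
    for m in sorted(by_m):
        cumulative += by_m[m]
        if cumulative >= q - 1e-12:         # small tolerance for rounding
            return m
    return max(by_m)

# Hypothetical Simon-type boundaries: n1 = 4, r1 = 1, N = 8, r = 4.
N = 8
e = [math.inf] * (N + 1)
f = [-math.inf] * (N + 1)
f[4] = 1
e[N], f[N] = 5, 4
t = terminal_probabilities(e, f, N, p=0.3)
print(expected_sample_size(t), sample_size_quantile(t, 0.5))
```

The result can be checked by hand: the trial stops after 4 participants with probability P(S(4) ≤ 1) = 0.6517, so ESS(0.3) = 4 + 4 × 0.3483 = 5.3932 and the median sample size is 4.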

The probability of rejecting H0 is

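$$R(p) = \sum_{m=1}^{N} \sum_{S(m)=0}^{m} \mathbb{I}\big[\{S(m), m\} \in \mathcal{T}\big]\, \mathbb{I}\big[S(m) \geq e_m\big]\, U\big(S(m), m \mid p, e, f\big)$$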

for some response rate p. This equation is similar to Equation (1) in that only terminal points are considered, and here a further indicator term ensures that only terminal points resulting in a go decision, that is, rejecting H0, are considered. R(p) is then the sum of the probabilities of reaching each terminal point that results in a go decision, for some response rate p. The type-I error-rate and power are obtained by finding R(p0) and R(p1) respectively. Being able to obtain this information means that both the exact distribution and the operating characteristics of the trial outcomes are known.
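A sketch of R(p) follows, assuming lists e and f indexed by m = 1, …, N with go if S(m) ≥ e_m and no go if S(m) ≤ f_m; the function name is hypothetical. It propagates probability through the grid and accumulates only the mass of terminal points with a go decision. For a single-stage design it must reproduce the binomial tail probability, which the example checks using the realisation N = 20, r = 5 with p0 = 0.2 and p1 = 0.4 chosen for illustration.

```python
import math
from math import comb

def rejection_probability(e, f, N, p):
    """R(p): total probability of the terminal points with S(m) >= e[m].

    e, f are lists indexed by m = 1..N (index 0 unused); every point with
    m = N is terminal.
    """
    alive = {0: 1.0}
    reject = 0.0
    for m in range(1, N + 1):
        step = {}
        for s, prob in alive.items():
            step[s] = step.get(s, 0.0) + prob * (1 - p)
            step[s + 1] = step.get(s + 1, 0.0) + prob * p
        alive = {}
        for s, prob in step.items():
            if s >= e[m]:
                reject += prob            # terminal point, go decision
            elif s <= f[m] or m == N:
                pass                      # terminal point, no go decision
            else:
                alive[s] = prob           # trial continues
    return reject

# Single-stage check: N = 20, go if more than r = 5 responses.
N, r = 20, 5
e = [math.inf] * (N + 1)
f = [-math.inf] * (N + 1)
e[N], f[N] = r + 1, r
type1 = rejection_probability(e, f, N, 0.2)    # R(p0)
power = rejection_probability(e, f, N, 0.4)    # R(p1)
direct = sum(comb(N, x) * 0.2**x * 0.8**(N - x) for x in range(r + 1, N + 1))
print(type1, power, direct)
```

Adding a no go boundary at an interim point can only remove paths to a go decision, so R(p) can only fall, which is why SC for a no go decision alone reduces both type-I error-rate and power.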

2.5. Conditional power

We define conditional probability CP(p, S(m), m) as the probability of rejecting H0 conditional on being at point (S(m), m), when the true response rate is p. Define the conditional power as the conditional probability when p = p1, that is, CP(p1, S(m), m). From here we refer only to conditional power rather than conditional probability and reiterate that “CP” is used to refer to conditional power. No sample size re-estimation takes place.

For the NSC design we have derived the following equation for calculating CP(p1, S(m), m) exactly:

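$$CP(p_1, S(m), m) = \begin{cases} 0, & \text{if } m - S(m) > N - r - 1 \text{ or } \big(m - S(m) > n_1 - r_1 - 1 \text{ and } m \leq n_1\big),\\ \displaystyle\sum_{j=r-S(m)}^{n_1-m-1} \Big[A(j, r_1) \sum_{i=r-S(m)}^{N-(j+m+1)-1} A(i, r)\Big], & \text{if } m - S(m) \leq n_1 - r_1 - 1 \text{ and } m \leq n_1,\\ \displaystyle\sum_{i=r-S(m)}^{N-m-1} A(i, r), & \text{if } m - S(m) \leq N - r - 1 \text{ and } m > n_1,\\ 1, & \text{if } S(m) > r, \end{cases} \qquad (2)$$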

where

$$A(x, y) = \binom{x}{y - S(m)}\, p_1^{\,y - S(m) + 1} (1 - p_1)^{x - \{y - S(m)\}}.$$

A(x, y) is the probability of observing y − S(m) + 1 responses out of x + 1 results, where the last of these results is a response.

Equation (2) can be understood component by component: if S(m) > r, success is guaranteed, and so CP is equal to 1. If failure is guaranteed, CP is equal to 0. This occurs during the first stage if the number of responses can no longer exceed r1, or equivalently, if the number of non-responses reaches n1 − r1, that is, if m − S(m) > n1 − r1 − 1. Similarly, failure is guaranteed during the second stage if the number of responses can no longer exceed r, or equivalently, if the number of non-responses reaches N − r, that is, if m − S(m) > N − r − 1. If the final decision is to reject H0, the final observed result must be a response. Consequently, at the point (S(m), m), m > n1, rejecting H0 involves observing exactly r − S(m) responses in the subsequent results other than the last, and then observing a response in the last result. The third component of Equation (2) sums the probability of reaching the required number of responses over all valid total numbers of results. The second component of Equation (2) works in the same way, and additionally accounts for the explicit interim analysis at n1.

The exact CP for the NSC design can also be written recursively as

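$$CP(p_1, S(m), m) = \begin{cases} 0, & \text{if } m - S(m) > N - r - 1 \text{ or } \big(m - S(m) > n_1 - r_1 - 1 \text{ and } m \leq n_1\big),\\ D, & \text{if } \big(m - S(m) \leq N - r - 1 \text{ and } m > n_1\big) \text{ or } \big(m - S(m) \leq n_1 - r_1 - 1 \text{ and } m \leq n_1\big),\\ 1, & \text{if } S(m) > r, \end{cases} \qquad (3)$$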

where

$$D = p_1\, CP\big(p_1, S(m) + 1, m + 1\big) + (1 - p_1)\, CP\big(p_1, S(m), m + 1\big).$$

For a single-stage trial incorporating NSC, the CP can be obtained using these equations by omitting the conditions relating to r1 and n1. In its recursive form, it can be seen that the CP at any point in a trial is a function of the CP at points with at least the same number of responses and more participants, among such points that are possible to reach in the trial.
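The recursion in Equation (3) translates directly into memoised code. The Python sketch below assumes the single-stage case (conditions on r1 and n1 omitted, as described above); the function names are hypothetical. Because NSC does not change the final decision, the result should equal the binomial probability of obtaining at least r + 1 − S(m) further responses, which the helper `final_success_probability` computes as a check.

```python
from functools import lru_cache
from math import comb

def make_nsc_cp(N, r, p1):
    """Return CP(p1, s, m) from the recursion in Equation (3), single-stage case."""
    @lru_cache(maxsize=None)
    def cp(s, m):
        if s > r:
            return 1.0            # go decision already certain
        if m - s > N - r - 1:
            return 0.0            # N - r non-responses reached: no go certain
        # D: the next participant responds with probability p1
        return p1 * cp(s + 1, m + 1) + (1 - p1) * cp(s, m + 1)
    return cp

def final_success_probability(s, m, N, r, p1):
    """Direct check: P(at least r + 1 - s responses among the N - m remaining)."""
    rem = N - m
    return sum(comb(rem, k) * p1**k * (1 - p1)**(rem - k)
               for k in range(max(r + 1 - s, 0), rem + 1))

cp = make_nsc_cp(N=8, r=4, p1=0.5)
print(cp(3, 6), final_success_probability(3, 6, 8, 4, 0.5))   # 0.25 0.25
```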

For designs that incorporate NSC, the trial stops and a no go decision is taken if CP = 0. The trial stops and a go decision is taken if CP = 1. The CP at any point can be obtained using Equation (2) directly. For designs that incorporate SC, the trial will additionally end at any point where 0 < CP < θF or θE < CP < 1, for some specified θF, θE ∈ [0, 1], θF < θE. As the CP is a function of later points in the trial, the predetermined decision to end a trial at any point where 0 < CP < θF causes the CP of such points to become zero. Conversely, points where θE < CP < 1 then have a CP of one. This in turn affects the CP of earlier points in the trial. As such, when incorporating SC, it is logical to calculate CP at each point using a recursive equation, one value at a time, starting at the point (S(m) = r, m = N − 1), where CP(p1, r, N − 1) = p1 by definition. All “earlier” points in the trial, i.e., points such that m < N − 1, are either a function of CP(p1, r, N − 1) or are terminal points. For points with more responses or more participants, CP(p1, a, N) = 0 if a ≤ r, a ∈ ℤ0+, and CP(p1, a, b) = 1 for any a > r and any b ≥ a, b ≤ N. Thus for the SC design, the CP at each point can be obtained using the following equation:

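$$CP(p_1, S(m), m) = \begin{cases} 0, & \text{if } D < \theta_F \text{ or } m - S(m) > N - r - 1 \text{ or } \big(m - S(m) > n_1 - r_1 - 1 \text{ and } m \leq n_1\big),\\ D, & \text{if } \theta_F \leq D \leq \theta_E \text{ and } \Big[\big(m - S(m) \leq N - r - 1 \text{ and } m > n_1\big) \text{ or } \big(m - S(m) \leq n_1 - r_1 - 1 \text{ and } m \leq n_1\big)\Big],\\ 1, & \text{if } D > \theta_E \text{ or } S(m) > r, \end{cases} \qquad (4)$$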

Equation (4) can be used to obtain the CP for the m-stage design by omitting the conditions relating to r1 and n1:

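$$CP(p_1, S(m), m) = \begin{cases} 0, & \text{if } D < \theta_F \text{ or } m - S(m) > N - r - 1,\\ D, & \text{if } \theta_F \leq D \leq \theta_E \text{ and } m - S(m) \leq N - r - 1,\\ 1, & \text{if } D > \theta_E \text{ or } S(m) > r. \end{cases} \qquad (5)$$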
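Equation (5) can be evaluated with a memoised recursion that applies the thresholds to D at every point; the go and no go boundaries e and f then follow from which points have CP equal to one or zero. The Python sketch below is an illustration under stated assumptions: 0 ≤ θF < θE ≤ 1, the function name is hypothetical, and boundaries are read off as the smallest go and largest no go response counts, which relies on CP being non-decreasing in S(m) for fixed m. Setting θF = 0 and θE = 1 recovers the NSC boundaries.

```python
import math
from functools import lru_cache

def m_stage_design(N, r, p1, theta_f, theta_e):
    """CP from Equation (5) and the implied boundaries e, f (lists indexed by m).

    Assumes 0 <= theta_f < theta_e <= 1; (0, 1) gives NSC only.
    """
    @lru_cache(maxsize=None)
    def cp(s, m):
        if s > r:
            return 1.0
        if m - s > N - r - 1:
            return 0.0
        d = p1 * cp(s + 1, m + 1) + (1 - p1) * cp(s, m + 1)
        if d < theta_f:
            return 0.0            # stochastic curtailment: no go
        if d > theta_e:
            return 1.0            # stochastic curtailment: go
        return d

    e = [math.inf] * (N + 1)      # index 0 unused
    f = [-math.inf] * (N + 1)
    for m in range(1, N + 1):
        values = [cp(s, m) for s in range(m + 1)]
        go = [s for s, v in enumerate(values) if v == 1.0]
        no_go = [s for s, v in enumerate(values) if v == 0.0]
        if go:
            e[m] = min(go)
        if no_go:
            f[m] = max(no_go)
    return e, f, cp

# Hypothetical example: N = 8, r = 4, p1 = 0.5, thetaF = 0.2, thetaE = 0.9.
e, f, cp = m_stage_design(N=8, r=4, p1=0.5, theta_f=0.2, theta_e=0.9)
print(e[1:], f[1:])
```

Because curtailed points are fixed at zero or one before earlier points are evaluated, the thresholds propagate backwards through the grid, which is the effect the recursive formulation is designed to capture.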

Continuous monitoring may not always be possible in practice. We describe a design approach for SC using sequential monitoring in terms of specified block sizes B, though it is also possible to specify the number of stages instead. An interim analysis is undertaken after every block of B participants, at which point the number of responses is compared to corresponding lower and upper stopping boundaries. The recursive equation used to calculate CP, Equation (4), may still be used, with D now generalised to handle blocks of size B:

$$D(B) = \sum_{i=0}^{B} \binom{B}{i}\, p_1^{\,i} (1 - p_1)^{B - i}\, CP\big(p_1, S(m) + i, m + B\big).$$

With this generalisation, Equation (4) can now be used to obtain CP for every possible number of responses for n ∈ {B, 2B, …, N} participants, from which lower and upper stopping boundaries can be obtained.
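A sketch of the block version, assuming N is a multiple of B and using hypothetical names and values, replaces the single-step D with the binomial sum D(B) over the i responses in the next block. Without curtailment thresholds (θF = 0, θE = 1), the CP at the start of the trial should equal P(S(N) > r) for every block size, which the check exploits.

```python
from functools import lru_cache
from math import comb

def block_cp(N, r, p1, B, theta_f, theta_e):
    """CP at block boundaries m = 0, B, 2B, ..., N using D(B).

    Assumes N is a multiple of B and 0 <= theta_f < theta_e <= 1.
    """
    if N % B != 0:
        raise ValueError("N must be a multiple of B")

    @lru_cache(maxsize=None)
    def cp(s, m):
        if s > r:
            return 1.0
        if m - s > N - r - 1:
            return 0.0
        # D(B): i responses among the next B participants, i ~ Bin(B, p1)
        d = sum(comb(B, i) * p1**i * (1 - p1)**(B - i) * cp(s + i, m + B)
                for i in range(B + 1))
        if d < theta_f:
            return 0.0
        if d > theta_e:
            return 1.0
        return d
    return cp

# Hypothetical example: N = 12, r = 4, analyses after every B = 3 participants.
cp = block_cp(N=12, r=4, p1=0.4, B=3, theta_f=0.1, theta_e=0.95)
print([round(cp(s, 6), 3) for s in range(7)])   # CP after 6 participants
```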

2.6. Design search

We seek a set of values from which ordered pairs of θF and θE will be created and searched over to find optimal or admissible design realisations, for single optimality criteria or weighted multiple-optimality criteria, respectively. One could use a uniformly distributed set of possible thresholds to some specified degree of coarseness. However, in trials that use curtailment, the CP values are not uniformly distributed; instead, most of the mass is close to zero or one. To account for this, we propose searching over a set of thresholds chosen based on the CP at each point in each possible trial. We obtain every possible value of CP(p1, S(m), m), including zero and one, for a given combination of {r, N} (m-stage) or {r1, n1, r, N} (SC design). We allow an upper limit θF^MAX for θF and a lower limit θE^MIN for θE. Then, without loss of generality, for each candidate combination {r, N} (or {r1, n1, r, N}), a trial-specific set of thresholds can be defined as

$$\theta = \big\{ CP(p_1, S(m), m) : S(m) = 0, \ldots, r;\ m = 1, \ldots, N;\ CP \leq \theta_F^{MAX} \text{ or } CP \geq \theta_E^{MIN} \big\}.$$
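The construction of a trial-specific threshold set can be sketched as follows. It assumes that the CP values are the uncurtailed ones from Equation (3) for an m-stage realisation {r, N}, collects them over all points (points with S(m) > r contribute the value one), keeps those at most θF^MAX or at least θE^MIN, and forms ordered pairs with θF < θE. The function name and example values are hypothetical, and no cap on the set size is applied.

```python
from functools import lru_cache

def candidate_thresholds(N, r, p1, theta_f_max, theta_e_min):
    """Distinct uncurtailed CP values for one {r, N} that are <= theta_f_max
    or >= theta_e_min (points with S(m) > r contribute the value one)."""
    @lru_cache(maxsize=None)
    def cp(s, m):
        if s > r:
            return 1.0
        if m - s > N - r - 1:
            return 0.0
        return p1 * cp(s + 1, m + 1) + (1 - p1) * cp(s, m + 1)

    values = {cp(s, m) for m in range(1, N + 1) for s in range(m + 1)}
    return sorted(v for v in values if v <= theta_f_max or v >= theta_e_min)

# Hypothetical example with thetaF^MAX = p1 = 0.5 and thetaE^MIN = 0.95.
theta = candidate_thresholds(N=8, r=4, p1=0.5, theta_f_max=0.5, theta_e_min=0.95)
pairs = [(tf, te) for tf in theta for te in theta
         if tf <= 0.5 and te >= 0.95 and tf < te]
print(len(theta), len(pairs))
```

Every candidate threshold is a CP value that actually occurs in the trial, so no two pairs in the search produce identical stopping boundaries merely because a uniform grid placed two thresholds between the same pair of attainable CP values.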

We suggest setting the upper bound for the lower threshold to θF^MAX = p1 and the lower bound for the upper threshold to θE^MIN = 0.95. An upper limit, Θ, on the number of CP values used in each trial-specific set was set to 10^6. The final rejection boundary r was constrained to the rounded interval [⌊Np0⌋, ⌈Np1⌉] in the design search, using the rationale of A’Hern (A’Hern 2001). As an alternative, constraining r to the boundaries of Wald’s SPRT (Wald 1947) was also examined. We search over all combinations of θF, θE ∈ θ, N and r (and r1 and n1 for the SC design), subject to the above constraints and the desired type-I error-rate and power.
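For the simplest case, a single-stage design without curtailment, the constrained search over N and r can be sketched as below. The sketch returns the smallest feasible N, trying r only within [⌊Np0⌋, ⌈Np1⌉]; it omits the search over θF and θE (and r1, n1) that the proposed designs require, and its names and constraint values (taken from Section 3.1) are illustrative assumptions. A rounding guard is applied before taking floors and ceilings to avoid floating-point artefacts.

```python
from math import comb, floor, ceil

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(max(k, 0), n + 1))

def smallest_single_stage(p0, p1, alpha, beta, n_max=200):
    """Smallest N with a boundary r in [floor(N p0), ceil(N p1)] such that
    P(S(N) > r | p0) <= alpha and P(S(N) > r | p1) >= 1 - beta."""
    for N in range(1, n_max + 1):
        r_low = floor(round(N * p0, 9))    # round to suppress floating-point artefacts
        r_high = ceil(round(N * p1, 9))
        for r in range(r_low, r_high + 1):
            if (binom_tail(N, r + 1, p0) <= alpha
                    and binom_tail(N, r + 1, p1) >= 1 - beta):
                return N, r
    return None

print(smallest_single_stage(p0=0.2, p1=0.4, alpha=0.05, beta=0.1))
```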

The designs that minimise ESS(p0) and ESS(p1) are termed the p0- and p1-optimal designs, respectively. The designs that minimise ESS(p0) and ESS(p1), respectively, among the subset of designs that minimise N are termed the p0- and p1-minimax designs. These terms are analogous to the terms H0- and H1-optimal and H0- and H1-minimax used by Mander and Thompson (Mander and Thompson 2010). The ESSs of the p0- and p1-optimal and p0- and p1-minimax realisations of the proposed designs will be compared to those of Simon’s design, Mander and Thompson’s design and the NSC design, and additionally to those of the designs found using the SPRT of Wald (Wald 1947) in the case of the p0- and p1-optimal criteria. More details regarding the design search are given in the Appendix.

2.7. Comparison of existing and proposed designs

Figure 2 shows one way of understanding the relationships between existing single- and two-stage designs and the two proposed designs. Our proposed designs are denoted by a grey background. The designs CC+AR, CC+KK, Simon+AR and Simon+KK were examined by Kunz and Kieser (Kunz and Kieser 2012), and are described more fully in Section 3.1. This figure shows that the proposed m-stage and SC designs provide a general framework for both stochastic and non-stochastic curtailment in single-stage and two-stage designs, and furthermore, introduce SC for a go decision. The designs achieve this generality through the thresholds θF and θE. Key differences between existing designs (Simon, Mander and Thompson, NSC, Kunz and Kieser and Ayanlowo and Redden) and our proposed designs are shown in an alternative manner in Table A1, and in a taxonomy of possible two-stage designs in Table A2 in the Appendix.

Figure 2.


CC: Chi and Chen. AR: Ayanlowo and Redden. KK: Kunz and Kieser. *The approach of Ayanlowo and Redden uses θF ∈ {0.05, 0.10}. **The approach of Kunz and Kieser uses θF ∈ {0, 0.01, …, 1}.

2.8. The loss function

The expected loss function of Mander et al. (Mander et al. 2012) is

$$L = w_0\, \mathrm{ESS}(p_0) + w_1\, \mathrm{ESS}(p_1) + (1 - w_0 - w_1)\, N,$$

where w0, w1 ∈ [0, 1] and w0 + w1 ≤ 1. In Mander et al., the admissible design was plotted on a grid of all possible combinations of weights. We extend this concept to allow the comparison of design realisations across differing design approaches: for each combination of weights, the design realisation with the lowest loss, L, across all design approaches (Simon, Mander and Thompson, SC, m-stage, etc.) is found, and this design realisation is termed the omni-admissible design realisation for that combination of weights. Across a grid of possible combinations of weights, the design approach to which each omni-admissible design belongs is plotted. In addition, the difference between the expected loss of admissible design realisations at each set of weights is quantified, for certain pairs of design types. From these plots, the number of admissible design realisations, and the range of weights for which each admissible design realisation has the lowest loss, can be seen. There are different ways of choosing a final design realisation using such a plot. One could choose a set of weights in advance; the design realisation with the lowest expected loss would then be shown at the corresponding point on the plot. Another way would be to select the design realisation that covers the largest area on the plot; this is the design realisation that has the lowest expected loss for most possible sets of weights.
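The selection of omni-admissible realisations over a grid of weights can be sketched in a few lines of Python. The operating characteristics below are hypothetical placeholders, not results from this paper, and ties are broken by the order in which designs are listed. The share of grid points won by each design corresponds to the "largest area" selection rule described above.

```python
def expected_loss(design, w0, w1):
    """Mander et al. loss: w0 ESS(p0) + w1 ESS(p1) + (1 - w0 - w1) N."""
    ess0, ess1, n_max = design
    return w0 * ess0 + w1 * ess1 + (1 - w0 - w1) * n_max

def omni_admissible(designs, grid=100):
    """Map each grid point (i, j), with w0 = i/grid, w1 = j/grid and
    w0 + w1 <= 1, to the name of the design with the lowest expected loss."""
    best = {}
    for i in range(grid + 1):
        for j in range(grid + 1 - i):
            w0, w1 = i / grid, j / grid
            best[(i, j)] = min(designs, key=lambda name: expected_loss(designs[name], w0, w1))
    return best

# Hypothetical (ESS(p0), ESS(p1), N) values, for illustration only.
designs = {"A": (10.0, 20.0, 30), "B": (15.0, 15.0, 25)}
best = omni_admissible(designs)
share = {name: sum(v == name for v in best.values()) / len(best) for name in designs}
print(share)   # proportion of weight combinations for which each design is best
```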

2.9. Inference: estimation of response rate

The MLE of the response rate is the observed response rate, $\hat{p}_{naive} = S(m)/m$, where m represents the number of participants after which the trial stopped. This naïve estimator is biased in trials that allow stopping at an interim analysis. However, there is a range of estimators available that aim to reduce this bias. In general, such estimators have previously been presented only for two-stage designs, with the notable exceptions of Girshick et al. (Girshick et al. 1946) and Jung and Kim (Jung and Kim 2004). We examine estimates of the response rate across existing and novel designs for five estimators, extended here to the multi-stage case: the naïve estimator (MLE); the bias-adjusted estimator (Chang et al. 1989); the simplified bias subtraction estimator (Guo and Liu 2005); the median unbiased estimator (MUE) (Koyama and Chen 2008); and the uniformly minimum-variance unbiased estimator (UMVUE) (Jung and Kim 2004). We evaluate the bias and the root-mean-square error (RMSE).

The bias-subtracted and bias-adjusted estimators are described in terms of the expected value of the estimated response rate and its bias. The expected estimate of the response rate, $\hat{p}$, can be obtained by taking the product of the observed response rate for each possible terminal point and its probability given some true p, and summing across all possible terminal points:

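$$\mathbb{E}(\hat{p} \mid p, e, f, n) = \sum_{j=1}^{J} \sum_{S(n_j)=0}^{n_j} \mathbb{I}\big[\{S(n_j), n_j\} \in \mathcal{T}\big]\, \hat{p}\big(S(n_j), n_j\big)\, U\big(S(n_j), n_j \mid p, e, f, n\big),$$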

where $\hat{p}(S(n_j), n_j)$ is the observed response rate at the point (S(nj), nj) and n = (n1, n2, …, nJ) is the vector of sample sizes in each stage. For continuous monitoring, n1 = n2 = … = nJ = 1. The bias, variance and RMSE are as follows:

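$$\mathrm{Bias}(\hat{p} \mid p, e, f, n) = \mathbb{E}(\hat{p} \mid p, e, f, n) - p$$
$$\mathrm{Var}(\hat{p} \mid p, e, f, n) = \mathbb{E}(\hat{p}^2 \mid p, e, f, n) - \mathbb{E}(\hat{p} \mid p, e, f, n)^2$$
$$\mathrm{RMSE}(\hat{p} \mid p, e, f, n) = \sqrt{\mathrm{Bias}(\hat{p} \mid p, e, f, n)^2 + \mathrm{Var}(\hat{p} \mid p, e, f, n)}$$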
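These quantities can be computed exactly from the terminal-point probabilities, without simulation. The Python sketch below assumes continuous monitoring, lists e and f indexed by m with go if S(m) ≥ e_m and no go if S(m) ≤ f_m, and the naïve estimator S(m)/m; the function name and the example boundaries (including a futility stop after 10 participants) are hypothetical. For a single-stage design the naïve estimator is unbiased with variance p(1 − p)/N, which serves as a check.

```python
import math

def naive_estimator_moments(e, f, N, p):
    """Bias, variance and RMSE of p_hat = S(m)/m at the terminal point reached."""
    alive = {0: 1.0}
    mean = mean_sq = 0.0
    for m in range(1, N + 1):
        step = {}
        for s, prob in alive.items():
            step[s] = step.get(s, 0.0) + prob * (1 - p)
            step[s + 1] = step.get(s + 1, 0.0) + prob * p
        alive = {}
        for s, prob in step.items():
            if s >= e[m] or s <= f[m] or m == N:    # terminal point
                mean += prob * s / m
                mean_sq += prob * (s / m) ** 2
            else:
                alive[s] = prob
    bias = mean - p
    var = mean_sq - mean ** 2
    return bias, var, math.sqrt(bias ** 2 + var)

N = 20
e = [math.inf] * (N + 1)
f = [-math.inf] * (N + 1)
e[N], f[N] = 6, 5                   # single stage: no interim stopping
print(naive_estimator_moments(e, f, N, p=0.3))
f_simon = list(f)
f_simon[10] = 1                     # hypothetical futility stop if S(10) <= 1
print(naive_estimator_moments(e, f_simon, N, p=0.3))
```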

The bias-subtracted estimator is

$$\hat{p}_{biassub} = \hat{p}_{naive} - \mathrm{Bias}(\hat{p}_{naive} \mid \hat{p}_{naive}, e, f, n).$$

The bias-adjusted estimator is the numerical solution to

$$\hat{p}_{biasadj} = \hat{p}_{naive} - \mathrm{Bias}(\hat{p}_{naive} \mid \hat{p}_{biasadj}, e, f, n).$$
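Since Bias(p̂ | p) = E(p̂ | p) − p, the bias-adjusted equation is equivalent to finding the p for which the expected naïve estimate equals the observed naïve estimate. The sketch below solves this by bisection on [0, 1], assuming the expected naïve estimate increases with p, and also evaluates the bias-subtracted estimator; the function names and example boundaries are hypothetical.

```python
import math

def expected_naive(e, f, N, p):
    """E(p_hat_naive | p, e, f), with p_hat_naive = S(m)/m at the terminal point."""
    alive, mean = {0: 1.0}, 0.0
    for m in range(1, N + 1):
        step = {}
        for s, prob in alive.items():
            step[s] = step.get(s, 0.0) + prob * (1 - p)
            step[s + 1] = step.get(s + 1, 0.0) + prob * p
        alive = {}
        for s, prob in step.items():
            if s >= e[m] or s <= f[m] or m == N:
                mean += prob * s / m
            else:
                alive[s] = prob
    return mean

def bias_subtracted(p_naive, e, f, N):
    """p_naive - Bias(p_naive | p_naive)."""
    return p_naive - (expected_naive(e, f, N, p_naive) - p_naive)

def bias_adjusted(p_naive, e, f, N, tol=1e-10):
    """Solve E(p_hat_naive | p) = p_naive for p by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_naive(e, f, N, mid) < p_naive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical design: N = 20, r = 5, futility stop if S(10) <= 1.
N = 20
e = [math.inf] * (N + 1)
f = [-math.inf] * (N + 1)
e[N], f[N] = 6, 5
f[10] = 1
p_naive = 1 / 10        # trial stopped after 10 participants with 1 response
print(bias_subtracted(p_naive, e, f, N), bias_adjusted(p_naive, e, f, N))
```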

The median unbiased estimator, $\hat{p}_{MUE}$, is obtained by numerically searching for the value of p that makes the p-value equal to 0.5:

pval(S(m),m|p^MUE)=0.5,

where the p-value is computed as the sum of the probabilities of the possible outcomes with a larger value of the UMVUE. The UMVUE for a single-arm multi-stage binomial outcome trial was derived by Jung and Kim (Jung and Kim 2004). At a terminal point $(S(m), m)$, the UMVUE is

$$\hat{p}_{\text{UMVUE}} = \mathbb{E}\{\hat{p}(m_1)\mid S(m), m\},$$

that is, the expected value of the observed response rate among the first $m_1$ participants, where $m_1$ denotes the first point at which a decision may be made, conditional on the terminal point. The estimates for all estimators are obtained using the R package singlearm (Grayling 2019). This comparison of likelihood-based estimators is similar to that made by Porcher and Desseaux for the Simon design (Porcher and Desseaux 2012). However, Bayesian approaches are available that can give superior inference if one wishes to assume an informative prior. In particular, early phase trials have an opportunity to benefit from either expert opinion or historical information, which can be incorporated dynamically using, for example, the equivalence probability weighted power prior methodology (Bennett et al. 2021).
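Both the UMVUE and the MUE can be computed exactly by enumeration. Given a terminal point (S(m), m), every admissible response sequence leading to it has the same probability p^S(m) (1 − p)^(m − S(m)), so the conditional expectation defining the UMVUE is an average of S(m1)/m1 over admissible paths, which can be obtained by counting paths. The Python sketch below does this and then finds the MUE by bisection on the p-value. It assumes a hypothetical single-stage rule (go if S(N) > r) monitored after every participant with non-stochastic curtailment only, in which case m1 = min(r + 1, N − r); it also assumes that outcomes whose UMVUE equals the observed value count towards the p-value. All function names are illustrative and are not the singlearm API.

```python
from math import comb


def is_terminal(s, m, N, r):
    """Hypothetical rule: go once S(m) > r, no go once S(m) + (N - m) <= r."""
    return s > r or s + (N - m) <= r


def terminal_probs(N, r, p):
    """Exact probability of stopping at each terminal point (s, m)."""
    alive, terminal = {0: 1.0}, {}
    for m in range(1, N + 1):
        nxt = {}
        for s, prob in alive.items():
            nxt[s + 1] = nxt.get(s + 1, 0.0) + prob * p
            nxt[s] = nxt.get(s, 0.0) + prob * (1 - p)
        alive = {}
        for s, prob in nxt.items():
            if is_terminal(s, m, N, r):
                terminal[(s, m)] = prob
            else:
                alive[s] = prob
    return terminal


def umvue_table(N, r):
    """UMVUE E{S(m1)/m1 | S(m), m} at every terminal point.

    All admissible paths to a terminal point are equally likely, so we
    count paths and accumulate S(m1) along them (exact integers).
    """
    m1 = min(r + 1, N - r)  # no decision is possible before m1
    n_paths = {k: comb(m1, k) for k in range(m1 + 1)}
    sum_k = {k: k * comb(m1, k) for k in range(m1 + 1)}
    table, m = {}, m1
    while n_paths:
        for s in [s for s in n_paths if is_terminal(s, m, N, r)]:
            table[(s, m)] = sum_k.pop(s) / (n_paths.pop(s) * m1)
        m += 1
        new_n, new_k = {}, {}
        for s in n_paths:
            for t in (s, s + 1):  # next participant: non-response, response
                new_n[t] = new_n.get(t, 0) + n_paths[s]
                new_k[t] = new_k.get(t, 0) + sum_k[s]
        n_paths, sum_k = new_n, new_k
    return table


def p_value(s_obs, m_obs, N, r, p, table):
    """P(UMVUE >= observed UMVUE | p); ties included (an assumption)."""
    u_obs = table[(s_obs, m_obs)]
    return sum(prob for point, prob in terminal_probs(N, r, p).items()
               if table[point] >= u_obs)


def mue(s_obs, m_obs, N, r, iters=60):
    """Value of p at which the p-value equals 0.5, found by bisection."""
    table = umvue_table(N, r)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if p_value(s_obs, m_obs, N, r, mid, table) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    table = umvue_table(27, 5)
    print("UMVUE at (6, 10):", table[(6, 10)])
    print("MUE at (6, 10):", mue(6, 10, 27, 5))
```

A useful check is unbiasedness: summing the UMVUE over terminal points, weighted by their stopping probabilities, should return the true p for any p.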

3. Results

3.1. Real data example

Kunz and Kieser (Kunz and Kieser 2012) present a real data example from Sharma et al. (Sharma et al. 2012). In this trial, the following design parameters were chosen: α = 0.05, β = 0.1, p0 = 0.2, p1 = 0.4. Kunz and Kieser compare the following combinations of designs to the p0-optimal Simon design, with SC permitted for a no go decision only: NSC for no go only, SC in stage 2 only (Simon + AR); NSC for no go only, SC in both stages (Simon + KK); NSC for go and no go, SC in stage 2 only (CC + AR); and NSC for go and no go, SC in both stages (CC + KK). Here AR is the design of Ayanlowo and Redden, KK is the design of Kunz and Kieser and CC is the design of Chi and Chen. We report the results for threshold θF = 0.4 from Kunz and Kieser (Kunz and Kieser 2012), as the authors report ESS(p0) only for θF = 0.4 and θF = 0.6 and state that trials using θF = 0.6 do not achieve adequate power. Table 1 contains the operating characteristics of these designs, as far as they are available, together with the p0-optimal realisations of the Chi and Chen (CC), SC (SC 2) and m-stage (m-stage 2) designs. Also included are SC and m-stage designs chosen for having a maximum sample size N similar to that of the other design realisations (SC 1, m-stage 1), and Wald’s SPRT (Wald 1947).

Table 1.

Comparison of designs, with design parameters (α, β, p0, p1) = (0.05, 0.10, 0.20, 0.40). CC: Chi and Chen. AR: Ayanlowo and Redden. KK: Kunz and Kieser. Blank cells in α* and ESS(p1) are due to the data not being included in Kunz and Kieser and not being reproducible using the Stata package simontwostage. * Median values, from simulation.

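| Design | r1 | n1 | r | N | α* | 1 − β* | ESS(p0) | ESS(p1) | θF | θE |
|---|---|---|---|---|---|---|---|---|---|---|
| Simon | 4 | 19 | 15 | 54 | 0.048 | 0.904 | 30.4 | 51.6 | | |
| CC | 4 | 19 | 15 | 54 | 0.048 | 0.904 | 28.2 | 37.6 | 0.000 | 1.000 |
| Simon + AR | 4 | 19 | 15 | 54 | | 0.882* | 26.6 | | 0.400 | 1.000 |
| Simon + KK | 4 | 19 | 15 | 54 | 0.038 | 0.857* | 21.2 | | 0.400 | 1.000 |
| CC + AR | 4 | 19 | 15 | 54 | | 0.882* | 25.4 | | 0.400 | 1.000 |
| CC + KK | 4 | 19 | 15 | 54 | | 0.857* | 21.0 | | 0.400 | 1.000 |
| SC 1 | 2 | 14 | 15 | 54 | 0.050 | 0.901 | 23.0 | 26.6 | 0.164 | 0.998 |
| SC 2 | 4 | 21 | 16 | 58 | 0.050 | 0.900 | 22.6 | 25.5 | 0.199 | 0.998 |
| m-stage 1 | | | 15 | 52 | 0.049 | 0.909 | 25.3 | 25.8 | 0.135 | 0.996 |
| m-stage 2 | | | 26 | 94 | 0.049 | 0.902 | 22.1 | 23.3 | 0.228 | 0.998 |
| Wald | | | | | 0.050 | 0.900 | 21.8 | 22.7 | | |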

The operating characteristics of Simon+AR, Simon+KK, CC+AR and CC+KK were obtained from Kunz and Kieser (Kunz and Kieser 2012) and from Stata using the simontwostage package (Kunz and Kieser 2011). The maximum N searched over was N = 58 for the SC design and N = 94 for the m-stage design, owing to limitations of computational power.

Table 1 shows that, with the exception of Wald’s SPRT, the designs with the lowest ESS(p0) are Simon+KK and CC+KK, which use a threshold of θF = 0.4 and allow stopping at any point. However, both designs have power 1 − β* = 0.857 < 1 − β = 0.9. The designs Simon+AR and CC+AR also have power below 1 − β = 0.9.

The four design realisations obtained using the proposed designs achieve a lower ESS(p0) than all other feasible design realisations with the exception of Wald’s SPRT, while achieving the necessary type-I error-rate and power. Furthermore, the first m-stage design has a lower maximum sample size than all other designs.

The study by Sharma et al. (Sharma et al. 2012) ended at the first stage, with zero responses out of 19 participants. Using NSC only, the study would have ended after 15 participants. However, using m-stage 1 or m-stage 2 the study would have ended after 11 or 8 participants respectively.
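The NSC figure can be checked by hand. Stage 1 of this Simon design ends in a no go decision if S(19) ≤ r1 = 4, so with no responses the outcome is certain once the participants still to be observed could no longer lift S(19) above 4, that is, at m = 19 − 4 = 15. The short Python sketch below applies this rule participant by participant. The function name is hypothetical, and the sketch covers only non-stochastic curtailment for a no go decision in stage 1; the m-stage stopping points depend on conditional-power thresholds and are not reproduced here.

```python
def nsc_no_go_point(outcomes, r1, n1):
    """First participant count m at which stage 1 must end in a no go
    decision (S(n1) <= r1 is certain), or None if that never happens.

    outcomes: sequence of 0/1 responses in order of observation.
    """
    s = 0
    for m, y in enumerate(outcomes[:n1], start=1):
        s += y
        # Even if every remaining stage-1 participant responded,
        # S(n1) could not exceed r1.
        if s + (n1 - m) <= r1:
            return m
    return None


if __name__ == "__main__":
    # Sharma et al.: zero responses among the 19 stage-1 participants.
    print(nsc_no_go_point([0] * 19, r1=4, n1=19))  # 15
```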

3.2. Example trials

Three sets of design parameters, or scenarios, were used to compare five design approaches: Simon; Mander and Thompson; NSC; SC; and m-stage. For each scenario and design type, optimal design realisations were obtained that satisfy each of four single optimality criteria, and a set of admissible design realisations was obtained with regard to the loss function specified by Mander et al. (Mander et al. 2012) (Section 2.8). For Simon’s and Mander and Thompson’s designs, the maximum sample size searched over was 20% greater than the maximum sample size of the p0-optimal design, as in Mander et al. (Mander et al. 2012). For the NSC and m-stage designs, the maximum sample size searched over was set to 80, approximately 2–3 times the maximum sample size of the optimal Simon designs under the p0-optimal and p0-minimax criteria. For the SC design, the maximum sample size was 43 to 47 depending on the scenario, owing to computational intensity. ESS(p0) and ESS(p1) from Wald’s SPRT are also reported. For the first scenario, the design parameters are α = 0.05, β = 0.15, p0 = 0.1 and p1 = 0.3, chosen to match Jung et al. and Mander et al. (Jung et al. 2004; Mander et al. 2012).

Table 2 shows the optimal design realisation for each design approach, for four optimality criteria: p0-optimal, p1-optimal, p0-minimax and p1-minimax. For all four optimality criteria, the optimal design realisations of the proposed designs outperform those of the existing designs, and use thresholds of θF < 0.23 and θE > 0.98 in each case. The ESSs using Wald’s SPRT are ESS(p0) = 13.9 and ESS(p1) = 13.9, comparable to those of the m-stage design (ESS(p0) = 14.1 under p0-optimality and 14.3 under p1-optimality; ESS(p1) = 14.4 under both).

Table 2.

Optimal design realisations for each design type, Scenario 1: (α, β, p0, p1) = (0.05, 0.15, 0.10, 0.30). For all designs, the requisite type-I error-rate and power are achieved. Columns %Sp0 and %Sp1 show ESS as a proportion of that of Simon's design under p = p0 and p = p1, respectively. MT: Mander and Thompson.

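| Design | r1 | e1 | n1 | r | N | ESS(p0) | %Sp0 | ESS(p1) | %Sp1 | θF | θE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| p0-optimal | | | | | | | | | | | |
| Simon | 1 | | 11 | 6 | 35 | 18.3 | 1.00 | 32.3 | 1.00 | | |
| MT | 1 | 4 | 11 | 6 | 35 | 18.2 | 1.00 | 27.2 | 0.84 | | |
| NSC | 1 | | 13 | 5 | 28 | 17.6 | 0.97 | 18.5 | 0.57 | 0.000 | 1.000 |
| SC | 4 | | 27 | 7 | 41 | 14.3 | 0.78 | 15.0 | 0.46 | 0.186 | 0.993 |
| m-stage | | | | 13 | 80 | 14.1 | 0.77 | 14.4 | 0.45 | 0.226 | 0.997 |
| p1-optimal | | | | | | | | | | | |
| Simon | 2 | | 18 | 5 | 27 | 20.4 | 1.00 | 26.5 | 1.00 | | |
| MT | 0 | 3 | 13 | 6 | 30 | 25.1 | 1.23 | 20.0 | 0.76 | | |
| NSC | 1 | | 13 | 5 | 28 | 17.6 | 0.87 | 18.5 | 0.70 | 0.000 | 1.000 |
| SC | 4 | | 24 | 8 | 43 | 15.5 | 0.76 | 14.6 | 0.55 | 0.126 | 0.984 |
| m-stage | | | | 12 | 66 | 14.3 | 0.70 | 14.4 | 0.54 | 0.189 | 0.990 |
| Wald’s SPRT | | | | | | 13.9 | 0.68 | 13.9 | 0.52 | | |
| p0-minimax | | | | | | | | | | | |
| Simon | 2 | | 18 | 5 | 27 | 20.4 | 1.00 | 26.5 | 1.00 | | |
| MT | 1 | 4 | 14 | 5 | 27 | 19.3 | 0.95 | 21.0 | 0.79 | | |
| NSC | 2 | | 18 | 5 | 27 | 19.3 | 0.95 | 18.7 | 0.71 | 0.000 | 1.000 |
| SC | 0 | | 10 | 5 | 27 | 17.1 | 0.84 | 16.3 | 0.62 | 0.070 | 0.990 |
| m-stage | | | | 5 | 27 | 18.7 | 0.92 | 16.6 | 0.63 | 0.084 | 0.990 |
| p1-minimax | | | | | | | | | | | |
| Simon | 2 | | 18 | 5 | 27 | 20.4 | 1.00 | 26.5 | 1.00 | | |
| MT | 1 | 4 | 15 | 5 | 27 | 20.3 | 0.99 | 20.8 | 0.78 | | |
| NSC | 2 | | 18 | 5 | 27 | 19.3 | 0.95 | 18.7 | 0.71 | 0.000 | 1.000 |
| SC | 4 | | 24 | 5 | 27 | 18.8 | 0.92 | 15.8 | 0.60 | 0.050 | 0.986 |
| m-stage | | | | 5 | 27 | 18.7 | 0.92 | 16.6 | 0.63 | 0.084 | 0.990 |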

Figure 3 (left) shows the design approach to which the omni-admissible design realisation belongs, for each combination of weights. The omni-admissible design belongs to either the SC design or the m-stage design. The difference in expected loss between the SC and m-stage admissible design realisations for each combination of weights is shown in Figure 3 (right). It shows that the admissible m-stage design realisations have a slightly lower loss score than those of the SC design realisations near the triangle’s hypotenuse, that is, when there is low weight on maximum sample size N. For much of the surface of weight combinations, the difference in loss score is in favour of the SC design but negligible, including where both w0 and w1 are close to zero and the weight of N is close to one. Analogous results for two further scenarios are given in the Appendix.
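To make the weighted comparison concrete, the Python sketch below assumes that the loss takes the weighted-sum form L = w0·ESS(p0) + w1·ESS(p1) + (1 − w0 − w1)·N, with the weight on N equal to 1 − w0 − w1 as in the weight triangle described above. It scores four design realisations taken from Table 2 and returns the one with the lowest loss for a given pair of weights. In the actual search the admissible realisation is chosen from every feasible realisation, not from this hand-picked subset, so the output is illustrative only; the function names are hypothetical.

```python
# (N, ESS(p0), ESS(p1)) for four realisations from Table 2, scenario 1.
CANDIDATES = {
    "SC, p0-optimal": (41, 14.3, 15.0),
    "m-stage, p0-optimal": (80, 14.1, 14.4),
    "SC, p1-minimax": (27, 18.8, 15.8),
    "m-stage, p1-minimax": (27, 18.7, 16.6),
}


def loss(design, w0, w1):
    """Assumed weighted-sum loss: w0*ESS(p0) + w1*ESS(p1) + (1 - w0 - w1)*N."""
    n_max, ess0, ess1 = design
    return w0 * ess0 + w1 * ess1 + (1 - w0 - w1) * n_max


def lowest_loss(candidates, w0, w1):
    """Name of the candidate with the smallest loss for weights (w0, w1)."""
    if w0 < 0 or w1 < 0 or w0 + w1 > 1:
        raise ValueError("weights must satisfy w0, w1 >= 0 and w0 + w1 <= 1")
    return min(candidates, key=lambda name: loss(candidates[name], w0, w1))


if __name__ == "__main__":
    for w0, w1 in [(0.5, 0.5), (0.25, 0.25)]:
        print((w0, w1), lowest_loss(CANDIDATES, w0, w1))
```

With no weight on N the m-stage p0-optimal realisation has the lowest loss of these four, while putting half the weight on N favours one of the N = 27 realisations, mirroring the trade-off between ESS and maximum sample size discussed above.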

Figure 3.


Type of design to which the omni-admissible design realisation belongs and difference in loss scores between the SC and m-stage admissible design realisations (positive favours m-stage), scenario 1 (α, β, p0, p1) = (0.05, 0.15, 0.10, 0.30).

3.2.1. Admissible design realisations by design type

In Figure 4, the scenario 1 admissible design realisations are shown for each design type and combination of weights. The plots of admissible design realisations for the Simon and Mander and Thompson designs, Figure 4 (top), match those obtained by Mander et al. The overall results are similar across all three scenarios (scenarios 2 and 3 are again given in the Appendix): the proposed designs generally yield a greater number of distinct admissible design realisations across the combinations of weights examined than the existing designs. This is expected, as including SC thresholds necessarily increases the number of possible design realisations. For the proposed designs, the admissible design regions often have edges parallel to the hypotenuse, suggesting that the admissible design may depend more on the weight of N than on the weights of ESS(p0) or ESS(p1) separately.

Figure 4.


Admissible design realisations for scenario 1 (α, β, p0, p1) = (0.05, 0.15, 0.10, 0.30). Format of design realisations: Simon, NSC: {r1/n1, r/N}; Mander and Thompson: {(r1 e1)/n1, r/N}; SC: {r1/n1, r/N, θF/θE}; m-stage: {r/N, θF/θE}.

3.3. Effect of reduced monitoring frequency

Table 3 shows optimal design realisations using blocks of size four, that is, permitting SC after every four participants, for the first scenario (α, β, p0, p1) = (0.05, 0.15, 0.10, 0.30). These are shown alongside the optimal design realisations for Simon’s design and the m-stage design; the m-stage design may be considered equivalent to using blocks of size one. The p0- and p1-minimax criteria are combined in the table because the corresponding optimal design realisations are identical in this instance. Under all four criteria, the design realisations with block size four produce considerable savings in ESS(p1) and have ESS(p0) comparable to that of Simon’s design.
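As a hedged illustration of how a block size changes the monitoring schedule, the Python sketch below evaluates a decision only after every `block` participants (and at N), stopping for no go if the conditional power (CP) is at most θF and for go if it is at least θE. Several simplifications are assumed here and are not taken from this paper: the CP is the probability that S(N) > r when the remaining outcomes are Binomial(N − m, p) at a user-supplied p, and it ignores the possibility of further curtailment, which the proposed designs do take into account. With θF = 0 and θE = 1 the rule reduces to non-stochastic curtailment. The values r = 6, N = 32 and p = 0.2 are illustrative only, and the function names are hypothetical.

```python
from math import comb


def conditional_power(s, m, N, r, p):
    """P(S(N) > r | S(m) = s) when the remaining N - m outcomes are
    Binomial(N - m, p); ignores any further curtailment (a simplification)."""
    need = r + 1 - s  # responses still needed for a final go
    remaining = N - m
    if need <= 0:
        return 1.0
    if need > remaining:
        return 0.0
    return sum(comb(remaining, k) * p**k * (1 - p) ** (remaining - k)
               for k in range(need, remaining + 1))


def interim_decision(s, m, N, r, p, theta_f, theta_e, block=4):
    """Decision at an analysis scheduled after every `block` participants."""
    if m % block != 0 and m != N:
        raise ValueError("no analysis is scheduled after this participant")
    cp = conditional_power(s, m, N, r, p)
    if cp <= theta_f:
        return "no go"
    if cp >= theta_e:
        return "go"
    return "continue"


def stopping_boundaries(N, r, p, theta_f, theta_e, block=4):
    """For each analysis point m: (m, largest S(m) giving no go,
    smallest S(m) giving go), with None where no such value exists."""
    rows = []
    looks = sorted(set(range(block, N + 1, block)) | {N})
    for m in looks:
        decisions = [interim_decision(s, m, N, r, p, theta_f, theta_e, block)
                     for s in range(m + 1)]
        no_go = [s for s, d in enumerate(decisions) if d == "no go"]
        go = [s for s, d in enumerate(decisions) if d == "go"]
        rows.append((m, max(no_go) if no_go else None,
                     min(go) if go else None))
    return rows


if __name__ == "__main__":
    for row in stopping_boundaries(32, 6, 0.2, theta_f=0.0, theta_e=1.0):
        print(row)
```

Because the CP is monotone in S(m), raising θF enlarges the no go region and lowering θE enlarges the go region at each analysis point.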

Table 3.

Selection of optimal design realisations, including stochastically curtailed designs with stopping permitted after every four participants, for scenario 1: (α, β, p0, p1) = (0.05, 0.15, 0.10, 0.30). For all design realisations, the requisite type-I error-rate and power are achieved. Columns %Sp0 and %Sp1 show ESS as a proportion of that of Simon's design under p = p0 and p = p1, respectively.

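| Design | r1 | n1 | r | N | ESS(p0) | %Sp0 | ESS(p1) | %Sp1 | θF | θE |
|---|---|---|---|---|---|---|---|---|---|---|
| p0-optimal | | | | | | | | | | |
| Simon | 1 | 11 | 6 | 35 | 18.3 | 1.00 | 32.3 | 1.00 | | |
| m-stage | | | 13 | 80 | 14.1 | 0.77 | 14.4 | 0.45 | 0.226 | 0.997 |
| Block size 4 | | | 6 | 32 | 18.8 | 1.03 | 18.7 | 0.58 | 0.194 | 0.984 |
| p1-optimal | | | | | | | | | | |
| Simon | 2 | 18 | 5 | 27 | 20.4 | 1.00 | 26.5 | 1.00 | | |
| m-stage | | | 12 | 66 | 14.3 | 0.70 | 14.4 | 0.54 | 0.189 | 0.990 |
| Block size 4 | | | 11 | 52 | 18.9 | 0.93 | 18.5 | 0.70 | 0.191 | 0.974 |
| p0/1-minimax | | | | | | | | | | |
| Simon | 2 | 18 | 5 | 27 | 20.4 | 1.00 | 26.5 | 1.00 | | |
| m-stage | | | 5 | 27 | 18.7 | 0.92 | 16.6 | 0.63 | 0.084 | 0.990 |
| Block size 4 | | | 6 | 32 | 18.8 | 0.92 | 18.7 | 0.71 | 0.194 | 0.984 |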

3.4. Estimation (scenario 1, selected)

Bias and RMSE of the response rate estimates are shown in Figures 5 and 6 for the p0-optimal design realisations for scenario 1 (α, β, p0, p1) = (0.05, 0.15, 0.10, 0.30), with the maximum absolute bias and RMSE shown in Table 4. In Simon’s design and the Mander and Thompson design, the bias is close to zero for all estimators (Figure 5, left). For designs that employ curtailment, the bias-adjusted, bias-subtracted, MUE and UMVUE estimators have a bias consistently close to zero, while the naïve estimator gives more biased estimates (Figure 5, bottom left; Figure 6, left). Overall, bias and RMSE are only slightly poorer for the proposed designs than for the existing designs when p < p1. For greater p, the poorer estimates from the proposed designs result from trials being curtailed after fewer participants than under the existing designs. The maximum absolute bias is similar across designs, except for the somewhat greater bias of the naïve estimator under the proposed designs (Table 4).

Figure 5. Bias, RMSE for p0-optimal designs, scenario 1 (α, β, p0, p1) = (0.05, 0.15, 0.10, 0.30).


Figure 6. Bias, RMSE for p0-optimal designs, scenario 1 (α, β, p0, p1) = (0.05, 0.15, 0.10, 0.30), continued.


Table 4.

Maximum absolute bias and RMSE for various point estimators of p0-optimal designs, scenario 1: (α = 0.05, β = 0.15, p0 = 0.1, p1 = 0.3). MT: Mander and Thompson.

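| Design | Bias adj. (bias) | Bias subt. (bias) | Naïve (bias) | MUE (bias) | UMVUE (bias) | Bias adj. (RMSE) | Bias subt. (RMSE) | Naïve (RMSE) | MUE (RMSE) | UMVUE (RMSE) |
|---|---|---|---|---|---|---|---|---|---|---|
| Simon | 0.008 | 0.009 | 0.031 | 0.032 | 4.44 × 10⁻¹⁶ | 0.097 | 0.098 | 0.104 | 0.106 | 0.101 |
| MT | 0.010 | 0.010 | 0.029 | 0.049 | 2.22 × 10⁻¹⁶ | 0.147 | 0.147 | 0.138 | 0.146 | 0.150 |
| NSC | 0.009 | 0.010 | 0.041 | 0.030 | 3.33 × 10⁻¹⁶ | 0.165 | 0.165 | 0.161 | 0.159 | 0.166 |
| SC | 0.022 | 0.025 | 0.090 | 0.030 | 3.05 × 10⁻¹⁶ | 0.232 | 0.235 | 0.223 | 0.224 | 0.236 |
| m-stage | 0.025 | 0.024 | 0.094 | 0.024 | 3.33 × 10⁻¹⁶ | 0.232 | 0.236 | 0.231 | 0.232 | 0.246 |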

The RMSE of the estimates gradually increases as the degree of permitted curtailment increases, with Simon’s design having the lowest RMSE and the proposed designs the greatest (Table 4). RMSE decreases sharply to zero as the response rate approaches one.

4. Discussion

Phase II binary outcome trials are a critical aspect of drug development. However, with high failure rates and high costs, it is valuable to find ways of making correct decisions more quickly. We have presented two single-arm designs created to improve binary outcome trials in this respect.

As part of these proposed designs, this work introduces five approaches to improving the search for an optimal or admissible design realisation that uses SC. Firstly, the exact distribution of the trial outcomes is obtained, allowing the trial’s operating characteristics to be known without simulation error. Secondly, a new approach is proposed for finding relevant CP thresholds when using SC, based on the CP at each point in each possible set {r, N} or {r1, n1, r, N}, allowing more potential design realisations to be evaluated. Thirdly, the CP at each point in each potential design realisation is calculated taking the possibility of SC into account. Fourthly, in the design search, type-I error-rate and power are calculated only after taking curtailment into account. Finally, the design search is undertaken using wide ranges for maximum sample size and final rejection boundary, rather than being restricted to, say, a single realisation of Simon’s design.

Comparing our proposed designs with a number of existing designs, we found that the proposed designs were superior in almost all cases, whether considering a single optimality criterion or a weighted combination of multiple optimality criteria. We also examined the effect of reducing the frequency of monitoring, and found that considerable savings in ESS can still be made with less frequent monitoring.

While all proposed methods performed well in the comparisons made here, there are limitations to their use. The two proposed single-arm designs find design realisations with similar operating characteristics, but the m-stage design completes a design search approximately one order of magnitude more quickly than the SC design. Both proposed single-arm designs use sequential monitoring, which may be seen as a logistical limitation. However, in our proposed designs the frequency of monitoring can be specified at the trial design stage to accommodate the practical needs of the investigators. A separate limitation of sequential monitoring is that, depending on the design realisation, a trial may end with a small number of participants, which may be undesirable in some circumstances. Conservative investigators may prefer to use a large block size.

Arguably, a limitation of this work in general is that it focuses solely on frequentist methods. Bayesian methods can be used for phase II clinical trial design (Dutton et al. 2018; Johnson and Cook 2009; Lee and Liu 2008; Lin and Lee 2020) and are becoming more widely used over time (Lee and Chu 2012; Lin and Lee 2020). However, some Bayesian and frequentist designs are closely related conceptually, for example, the frequentist CP-based approach and the Bayesian predictive power approach (Jennison and Turnbull 2000). Furthermore, in the context of binary outcome trials, Bayesian and frequentist designs can both be described using vectors of lower and upper stopping boundaries f and e. One main advantage of Bayesian trial design is the ability to incorporate prior information, though not all Bayesian designs do so; some instead use one or more uninformative priors. In contrast, frequentist methods are often regarded as discarding such information, or at best using it only in summary form. Bayesian designs may be created with Bayesian operating characteristics in mind, which can be more intuitive to non-statisticians. However, Bayesian designs may be required to satisfy certain frequentist operating characteristics. If a Bayesian design both uses an uninformative prior and must satisfy typical frequentist operating characteristics, the resulting design realisation may confer no advantage over an equivalent frequentist design. Another advantage of some Bayesian designs is that they are more flexible in allowing interim analyses to occur at points that are not fixed in advance, with only minor negative consequences for Bayesian operating characteristics (Lee and Liu 2008), though negative consequences persist for frequentist operating characteristics.
However, allowing such flexibility could result in unconscious bias, with investigators able to undertake an interim analysis after observing a succession of positive results, or to avoid one after observing a succession of negative results. Furthermore, in a two-stage design, it is possible to conduct an interim analysis at a different point from that planned while still controlling the type-I error-rate (Englert and Kieser 2015). Finally, Bayesian methods often use simulation, though conjugate priors may be available in some circumstances, and in the setting of very rare diseases all posterior distributions can be found (Hampson et al. 2014). Using simulation may introduce simulation error and increase computational requirements; our proposed approach avoids simulation entirely.

While the m-stage design performed well in all circumstances examined, other designs performed similarly when sole importance was placed on minimising the maximum sample size N. As such, existing designs should be preferred over the proposed designs when performance is similar for the optimality criterion of prime importance and the existing design uses fewer interim analyses. We acknowledge that stopping early for a go decision is not critical for every phase II study. However, a number of single- and two-arm approaches permit such early stopping (Carsten and Chen 2016; Chen et al. 2018; Chi and Chen 2008; Law et al. 2020; Mander and Thompson 2010). We have also provided examples of trials that stopped early for a go decision even without using a design that permitted such stopping (Mego et al. 2016; Necchi et al. 2014; Santana et al. 2014; Wagner et al. 2015), while an upcoming trial will use the m-stage design, including early stopping for a go decision (Positioning imatinib for pulmonary arterial hypertension (PIPAH) 2020). This suggests that early stopping for a go decision is both well established methodologically and used in practice.

In practical terms, single-arm design realisations can be obtained by calling the appropriate functions in R (R Core Team 2019) after installing and loading the package curtailment (M. Law 2021). For each proposed design approach, a single function undertakes a single-arm design search. Admissible design realisations are returned, if they exist. A second function takes as its input any chosen design realisation and returns the corresponding stopping boundaries in a simple table and diagram. The final output is simply a collection of stopping boundaries.

The proposed single-arm designs could be extended in a number of ways in future work. It would be valuable to investigate the effects of delayed responses on ESS in curtailed designs. If investigators wish to collect a certain amount of information in a trial, the trial could be specified to end only after data are available for some minimum number of participants. With regard to estimation, confidence intervals and p-values could be compared with those from existing design types. While the results of Wald’s SPRT (Wald 1947) are positive, the design has no upper limit on the maximum sample size. This poses obvious problems in terms of approval and logistics. It is also why the SPRT is compared only with the optimal designs and not the minimax designs: its maximum sample size is infinite. Wald states that if a finite maximum sample size is desired, it can simply be chosen, and provides a “reasonable rule” for the final stopping boundary. Future work may compare this design with the proposed designs not only in terms of a weighted combination of ESS(p0), ESS(p1) and N, but also in terms of minimising $\sup_{p\in[0,1]} \mathrm{ESS}(p)$ (Kiefer and Weiss 1957; Shuster 2002).
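The unbounded sample size of Wald’s SPRT, and the sup-ESS criterion, can be explored numerically. The Python sketch below implements the binomial SPRT of p0 against p1 with Wald’s approximate limits log{β/(1 − α)} and log{(1 − β)/α} on the log-likelihood ratio, computes ESS(p) by a forward recursion truncated at an arbitrary m_max, and searches a grid of p for the largest ESS. Because the recursion accounts for overshoot of the limits, its ESS values need not coincide with those from Wald’s approximate formula. The truncation point, the grid and the function names are illustrative assumptions.

```python
import math


def sprt_limits(alpha, beta):
    """Wald's approximate log-likelihood-ratio limits (lower, upper)."""
    return math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)


def sprt_decision(s, m, p0, p1, alpha, beta):
    """Binomial SPRT of p0 against p1 after m participants with s responses."""
    llr = s * math.log(p1 / p0) + (m - s) * math.log((1 - p1) / (1 - p0))
    lower, upper = sprt_limits(alpha, beta)
    if llr >= upper:
        return "go"
    if llr <= lower:
        return "no go"
    return "continue"


def sprt_ess(p, p0, p1, alpha, beta, m_max=1000):
    """ESS(p) by exact forward recursion over (m, s), truncated at m_max.

    Returns (ess, probability still running at m_max). The SPRT has no
    maximum sample size, so the truncation is purely computational.
    """
    alive, ess = {0: 1.0}, 0.0
    for m in range(1, m_max + 1):
        nxt = {}
        for s, prob in alive.items():
            nxt[s + 1] = nxt.get(s + 1, 0.0) + prob * p
            nxt[s] = nxt.get(s, 0.0) + prob * (1 - p)
        alive = {}
        for s, prob in nxt.items():
            if sprt_decision(s, m, p0, p1, alpha, beta) == "continue":
                alive[s] = prob
            else:
                ess += m * prob
    return ess, sum(alive.values())


if __name__ == "__main__":
    design = dict(p0=0.1, p1=0.3, alpha=0.05, beta=0.15)  # scenario 1
    grid = [i / 100 for i in range(1, 100)]
    worst = max(grid, key=lambda p: sprt_ess(p, **design)[0])
    print("ESS(p0):", sprt_ess(0.1, **design)[0])
    print("largest ESS on the grid at p =", worst, sprt_ess(worst, **design)[0])
```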

In summary, the clinical trial designs we propose offer considerably improved operating characteristics compared to existing designs, and widespread use of these designs would speed up drug development.

Supplementary Material

Supplementary File 1

Funding

This work was supported by the Medical Research Council (grant number MC_UU_00002/3 to ML, MJG, and APM).

Footnotes

Disclosure statement

The authors declare no potential conflicts of interest.

References

  1. A’Hern RP. Sample size tables for exact single-stage phase II designs. Statistics in Medicine. 2001;20(6):859–866. doi: 10.1002/sim.721. [DOI] [PubMed] [Google Scholar]
  2. Atkinson EN, Brown BW. Confidence limits for probability of response in multistage phase II clinical trials. Biometrics. 1985;41(3):741–744. doi: 10.2307/2531294. [DOI] [PubMed] [Google Scholar]
  3. Ayanlowo AO, Redden DT. Stochastically curtailed phase II clinical trials. Statistics in Medicine. 2007;26(7):1462–1472. doi: 10.1002/sim.2653. [DOI] [PubMed] [Google Scholar]
  4. Bennett M, White S, Best N, Mander A. A novel equivalence probability weighted power prior for using historical control data in an adaptive clinical trial design: A comparison to standard methods. Pharmaceutical Statistics. 2021;20(3):462–484. doi: 10.1002/pst.2088. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Campbell MK, Snowdon C, Francis D, Elbourne D, Mcdonald AM, Knight R, Entwistle V, Garcia J, Roberts I, Grant A. Recruitment to randomised trials: Strategies for trial enrolment and participation study. Health Technology Assessment. 2007;11(48) doi: 10.3310/hta11480. [DOI] [PubMed] [Google Scholar]
  6. Carsten C, Chen P. Curtailed two-stage matched pairs design in double-arm Phase II clinical trials. Journal of Biopharmaceutical Statistics. 2016;26(5):816–822. doi: 10.1080/10543406.2015.1074921. [DOI] [PubMed] [Google Scholar]
  7. Chang M, Wieand H, Chang V. The bias of the sample proportion following a group sequential phase II clinical trial. Statistics in Medicine. 1989;8(5):563–570. doi: 10.1002/sim.4780080505. [DOI] [PubMed] [Google Scholar]
  8. Chen CM, Chi Y, Chang HM. Curtailed two-stage design for comparing two arms in randomized phase II clinical trials. Journal of Biopharmaceutical Statistics. 2018;28(5):939–950. doi: 10.1080/10543406.2018.1428615. [DOI] [PubMed] [Google Scholar]
  9. Chi Y, Chen C. Curtailed two-stage designs in phase II clinical trials. Statistics in Medicine. 2008;27(29):6175–6189. doi: 10.1002/sim.3424. [DOI] [PubMed] [Google Scholar]
  10. Dutton P, Love SB, Billingham L, Hassan AB. Analysis of phase II methodologies for single-arm clinical trials with multiple endpoints in rare cancers: An example in Ewing’s sarcoma. Statistical Methods in Medical Research. 2018;27(5):1451–1463. doi: 10.1177/0962280216662070. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, Dancey J, Arbuck S, Gwyther S, Mooney M, et al. New response evaluation criteria in solid tumours: Revised recist guideline (version 1.1) European Journal of Cancer. 2009;45(2):228–247. doi: 10.1016/j.ejca.2008.10.026. Response assessment in solid tumours (RECIST): Version 1.1 and supporting papers. [DOI] [PubMed] [Google Scholar]
  12. Englert S, Kieser M. Methods for proper handling of overrunning and underrunning in phase II designs for oncology trials. Statistics in Medicine. 2015;34(13):2128–2137. doi: 10.1002/sim.6479. [DOI] [PubMed] [Google Scholar]
  13. Grayling MJ. singlearm. 2019. https://github.com/mjg211 .
  14. Guo HY, Liu A. A simple and efficient bias-reduced estimator of response probability following a group sequential phase II trial. Journal of Biopharmaceutical Statistics. 2005;15(5):773–781. doi: 10.1081/BIP-200067771. [DOI] [PubMed] [Google Scholar]
  15. Hampson LV, Whitehead J, Eleftheriou D, Brogan P. Bayesian methods for the design and interpretation of clinical trials in very rare diseases. Statistics in Medicine. 2014;33(24):4186–4201. doi: 10.1002/sim.6225. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Jennison C, Turnbull BW. Group sequential methods with applications to clinical trials. Chapman & Hall/ CRC; 2000. [Google Scholar]
  17. Johnson VE, Cook JD. Bayesian design of single-arm phase II clinical trials with continuous monitoring. Clinical Trials. 2009;6(3):217–226. doi: 10.1177/1740774509105221. [DOI] [PubMed] [Google Scholar]
  18. Jung SH, Kim KM. On the estimation of the binomial probability in multistage clinical trials. Statistics in Medicine. 2004;23(6):881–896. doi: 10.1002/sim.1653. [DOI] [PubMed] [Google Scholar]
  19. Jung SH, Lee T, Kim KM, George SL. Admissible two-stage designs for phase II cancer clinical trials. Statistics in Medicine. 2004;23(4):561–569. doi: 10.1002/sim.1600. [DOI] [PubMed] [Google Scholar]
  20. Kiefer J, Weiss L. Some properties of generalized sequential probability ratio tests. The Annals of Mathematical Statistics. 1957;28(1):57–74. doi: 10.1214/aoms/1177707037. [DOI] [Google Scholar]
  21. Koyama T, Chen H. Proper inference from Simon’s two-stage designs. Statistics in Medicine. 2008;27(16):3145–3154. doi: 10.1002/sim.3123. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Kunz CU, Kieser M. Simon’s minimax and optimal and Jung’s admissible two-stage designs with or without curtailment. Stata Journal. 2011;11(2):240–254.:15. doi: 10.1177/1536867X1101100205. [DOI] [Google Scholar]
  23. Kunz CU, Kieser M. Curtailment in single-arm two-stage phase II oncology trials. Biometrical Journal. 2012;54(4):445–456. doi: 10.1002/bimj.201100128. [DOI] [PubMed] [Google Scholar]
  24. Kunzmann K, Grayling MJ, Lee KM, Robertson DS, Rufibach K, Wason JMS. Conditional power and friends: The why and how of (un)planned, unblinded sample size recalculations in confirmatory trials. arXiv preprint. 2020:arXiv:2010.06567. doi: 10.1002/sim.9288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Law M, Grayling MJ, Mander AP. A stochastically curtailed two-arm randomised phase II trial design for binary outcomes. Pharmaceutical Statistics. 2020 doi: 10.1002/pst.2067. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Lee JJ, Chu CT. Bayesian clinical trials in action. Statistics in Medicine. 2012;31(25):2955–2972. doi: 10.1002/sim.5404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Lee JJ, Liu DD. A predictive probability design for phase II cancer clinical trials. Clinical Trials. 2008;5(2):93–106. doi: 10.1177/1740774508089279. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Lin R, Lee JJ. Cancer clinical trials. Cham: Springer International Publishing; 2020. Novel bayesian adaptive designs and their applications; pp. 395–426. [Google Scholar]
  29. Law M. curtailment: Finds binary outcome designs using stochastic curtailment. R package version 0.1.1. 2021.
  30. Mander AP, Wason JMS, Sweeting MJ, Thompson SG. Admissible two-stage designs for phase II cancer clinical trials that incorporate the expected sample size under the alternative hypothesis. Pharmaceutical Statistics. 2012;11(2):91–96. doi: 10.1002/pst.501. [DOI] [PubMed] [Google Scholar]
  31. Mander AP, Thompson SG. Two-stage designs optimal under the alternative hypothesis for phase II cancer clinical trials. Contemporary Clinical Trials. 2010;31(6):572–578. doi: 10.1016/j.cct.2010.07.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Martin L, Hutchens M, Hawkins C, Radnov A. How much do clinical trials cost? Nature Reviews Drug Discovery. 2017;16(6):381–382. doi: 10.1038/nrd.2017.70. [DOI] [PubMed] [Google Scholar]
  33. McCabe L, White IR, Vinh Chau NV, Barnes E, Pett SL, Cooke GS, Walker AS. The design and statistical aspects of VIETNARMS: A strategic post-licensing trial of multiple oral direct-acting antiviral hepatitis C treatment strategies in Vietnam. Trials. 2020;21:1–12. doi: 10.1186/s13063-020-04350-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Mego M, Svetlovska D, Miskovska V, Obertova J, Palacka P, Rajec J, Sycova-Mila Z, Chovanec M, Rejlekova K, Zuzak P, et al. Phase II study of everolimus in refractory testicular germ cell tumors. Urologic Oncology. 2016 March;34(3):17–22. doi: 10.1016/j.urolonc.2015.10.010. [DOI] [PubMed] [Google Scholar]
  35. Moskowitz AJ, Hamlin PA, Perales M, Gerecitano J, Horwitz SM, Matasar MJ, Noy A, Palomba ML, Portlock CS, Straus DJ, et al. Phase II study of bendamustine in relapsed and refractory hodgkin lymphoma. Journal of Clinical Oncology. 2013;31(4):456–460. doi: 10.1200/JCO.2012.45.3308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Necchi A, Giannatempo P, Mariani L, Farè E, Raggi D, Pennati M, Zaffaroni N, Crippa F, Marchianò A, Nicolai N, et al. PF-03446962, a fully-human monoclonal antibody against transforming growth-factor β (TGF β) receptor ALK1, in pre-treated patients with urothelial cancer: An open label, single-group, phase 2 trial. Investigational New Drugs. 2014 June;32(3):555–560. doi: 10.1007/s10637-014-0074-9. [DOI] [PubMed] [Google Scholar]
  37. Odia Y, Kreisl TN, Aregawi D, Innis EK, Fine HA. A phase II trial of tamoxifen and bortezomib in patients with recurrent malignant gliomas. Journal of Neuro-Oncology. 2015 October;125(1):191–195. doi: 10.1007/s11060-015-1894-y. [DOI] [PubMed] [Google Scholar]
  38. Pallmann P, Bedding AW, Choodari-Oskooei B, Dimairo M, Flight L, Hampson LV, Holmes J, Mander AP, Sydes MR, Villar SS, et al. Adaptive designs in clinical trials: Why use them, and how to run and report them. BMC Medicine. 2018;16(1):1–15. doi: 10.1186/s12916-018-1017-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Pedersen KS, Kim GP, Foster NR, Wang-Gillam A, Erlichman C, McWilliams RR. Phase II trial of gemcitabine and tanespimycin (17AAG) in metastatic pancreatic cancer: A mayo clinic phase II consortium study. Investigational New Drugs. 2015 August;33(4):963–968. doi: 10.1007/s10637-015-0246-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Porcher R, Desseaux K. What inference for two-stage phase II trials? BMC Medical Research Methodology. 2012;12(117) doi: 10.1186/1471-2288-12-117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Positioning imatinib for pulmonary arterial hypertension (PIPAH). 2020. https://clinicaltrials.gov/ct2/show/NCT04416750
  42. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2019.
  43. Santana TA, Cruz FM, Trufelli DC, Glasberg J, Del Giglio A. Carbamazepine for prevention of chemotherapy-induced nausea and vomiting: A pilot study. Sao Paulo Medical Journal. 2014;132(3):147–151. doi: 10.1590/1516-3180.2014.1323600.
  44. Sepulveda-Sanchez JM, Vaz MA, Balana C, Gil-Gil M, Reynes G, Gallego O, Martinez-Garcia M, Vicente E, Quindos M, Luque R, et al. Phase II trial of dacomitinib, a pan-human EGFR tyrosine kinase inhibitor, in recurrent glioblastoma patients with EGFR amplification. Neuro-Oncology. 2017 October;19(11):1522–1531. doi: 10.1093/neuonc/nox105.
  45. Shan G, Chen JJ, Ma C. Boundary problem in Simon’s two-stage clinical trial designs. Journal of Biopharmaceutical Statistics. 2017;27(1):25–33. doi: 10.1080/10543406.2016.1148716.
  46. Sharma MR, Wroblewski K, Polite BN, Knost JA, Wallace JA, Modi S, Sleckman BG, Taber D, Vokes EE, Stadler WM, et al. Dasatinib in previously treated metastatic colorectal cancer: A phase II trial of the University of Chicago phase II consortium. Investigational New Drugs. 2012;30(3):1211–1215. doi: 10.1007/s10637-011-9681-x.
  47. Shuster J. Optimal two-stage designs for single arm phase II clinical trials. Journal of Biopharmaceutical Statistics. 2002;12(1):39–51. doi: 10.1081/BIP-120005739.
  48. Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials. 1989;10(1):1–10. doi: 10.1016/0197-2456(89)90015-9.
  49. Stein SM, Tiersten A, Hochster HS, Blank SV, Pothuri B, Curtin J, Shapira I, Levinson B, Ivy P, Joseph B, et al. A phase 2 study of oxaliplatin combined with continuous infusion topotecan for patients with previously treated ovarian cancer. International Journal of Gynecologic Cancer. 2013;23(9):1577–1582. doi: 10.1097/IGC.0b013e3182a809e0.
  50. Todd JA, Evangelou M, Cutler AJ, Pekalski ML, Walker M, Stevens HE, Porter L, Smyth DJ, Rainbow DB, Ferreira RC, et al. Regulatory T cell responses in participants with type 1 diabetes after a single dose of interleukin-2: A non-randomised, open label, adaptive dose-finding trial. PLOS Medicine. 2016:1–33. doi: 10.1371/journal.pmed.1002139.
  51. Wagner LM, Fouladi M, Ahmed A, Krailo MD, Weigel B, DuBois SG, Doyle LA, Chen H, Blaney SM. Phase II study of cixutumumab in combination with temsirolimus in pediatric patients and young adults with recurrent or refractory sarcoma: A report from the Children’s Oncology Group. Pediatric Blood & Cancer. 2015 March;62(3):440–444. doi: 10.1002/pbc.25334.
  52. Wald A. Sequential Analysis. Dover; 1947.
  53. Walters SJ, Bonacho I, Bortolami O, Flight L, Hind D, Jacques RM, Knox C, Nadin B, Rothwell J, Surtees M, et al. Recruitment and retention of participants in randomised controlled trials: A review of trials funded and published by the United Kingdom health technology assessment programme. BMJ Open. 2017:1–10. doi: 10.1136/bmjopen-2016-015276.
  54. Wason JMS, Brocklehurst P, Yap C. When to keep it simple–adaptive designs are not always useful. BMC Medicine. 2019;17(1):1–7. doi: 10.1186/s12916-019-1391-9.
  55. Whitehead J. On the bias of maximum likelihood estimation following a sequential test. Biometrika. 1986;73(3):573–581. doi: 10.1093/biomet/73.3.573.
  56. Whitehead J. The design and analysis of sequential clinical trials. Wiley; 1997.
  57. Wong CH, Siah KW, Lo AW. Estimation of clinical trial success rates and related parameters. Biostatistics. 2019;20(2):273–286. doi: 10.1093/biostatistics/kxx069.
  58. Yoon HH, Foster NR, Meyers JP, Steen PD, Visscher DW, Pillai R, Prow DM, Reynolds CM, Marchello BT, Mowat RB, et al. Gene expression profiling identifies responsive patients with cancer of unknown primary treated with carboplatin, paclitaxel, and everolimus: NCCTG N0871 (Alliance). Annals of Oncology. 2016;27(2):339–344. doi: 10.1093/annonc/mdv543.
  59. Yu SS, Athreya K, Liu SV, Schally AV, Tsao-Wei D, Groshen S, Quinn DI, Dorff TB, Xiong S, Engel J, et al. A phase II trial of AEZS-108 in castration- and taxane-resistant prostate cancer. Clinical Genitourinary Cancer. 2017;15(6):742–749. doi: 10.1016/j.clgc.2017.06.002.

Supplementary Materials

Supplementary File 1
