Author manuscript; available in PMC: 2022 Feb 3.
Published in final edited form as: Stata J. 2018 Jun 1;18(2):416–431. doi: 10.1177/1536867X1801800208

Group sequential clinical trial designs for normally distributed outcome variables

Michael J Grayling, James M S Wason, Adrian P Mander
PMCID: PMC7612318  EMSID: EMS140912  PMID: 35125974

Abstract

In a group sequential clinical trial, accumulated data are analyzed at numerous time points to allow early decisions about a hypothesis of interest. These designs have historically been recommended for their ethical, administrative, and economic benefits. In this article, we first discuss a collection of new commands for computing the stopping boundaries and required group size of various classical group sequential designs, assuming a normally distributed outcome variable. Then, we demonstrate how the performance of several designs can be compared graphically.

Keywords: st0529, doubletriangular, haybittlepeto, innerwedge, powerfamily, triangular, wangtsiatis, clinical trial design, group sequential

1. Introduction

Parallel group randomized controlled trials are typically conducted by recruiting a fixed number of individuals, allocating each to receive one of two treatments, and ultimately testing a prespecified hypothesis. Since Wald (1947) published his work on the sequential probability ratio test, there has been substantial interest in trial designs that allow hypotheses to be tested multiple times during the trial. With this approach, the trial may be stopped early if the data so suggest. This limits patients' exposure to inferior treatments and, by lowering the expected required sample size, can often reduce the cost of a trial.

Armitage (1975) was responsible for much of the early use of such methods in medicine. However, his approach and the other initial approaches were fully sequential, with data analyzed after every patient. While this may seem desirable, it is impractical, and this methodology therefore did not gain general acceptance. The pivotal moment in this field came with the work of Pocock (1977), who provided a clear way of determining group sequential designs with desired type I and type II error rates. In a group sequential design, a hypothesis is tested multiple times during an ongoing trial but, as the name suggests, only after groups of patients of prespecified sizes have been assessed. This retains most of the benefits of a fully sequential approach while also making the design feasible in practice.

Since Pocock (1977), group sequential designs have been researched extensively and used regularly in clinical trials. Today, methodology is well established for designing group sequential trials with normal, binary, and survival endpoints. Approaches are available to design trials with unknown variance, to design trials with multiple arms, or to optimize a design’s features. For a detailed discussion of available methods, see Whitehead (1997) or Jennison and Turnbull (2000).

In this article, we focus on the design of two-treatment group sequential trials with a normally distributed outcome variable, but we note that asymptotically other endpoint types can be treated with the same normal test statistics. We proceed by summarizing the statistical theory behind group sequential methodology. Following this, we detail our new commands and provide several examples of their use.

2. Statistical theory

We consider a randomized two-arm group sequential trial design with up to L planned analyses. We index one arm by 0 and the other by 1. Often, arm 0 will be a control and arm 1 an experimental treatment, but this need not be the case. We assume that the $l$th analysis takes place after $n_{0l} = ln$ and $n_{1l} = rln$ patients have been randomized to arms 0 and 1, respectively. Thus, r is the ratio of patients allocated to arm 1 relative to arm 0, and we refer to n as the group size. Possible extensions to this framework are discussed in section 8. The outcome $Y_{dli}$ from patient i in arm d in stage l is assumed to be distributed as follows:

$$Y_{dli} \sim N(\mu_d, \sigma_d^2)$$

Thus, we are assuming that the response variances $\sigma_0^2$ and $\sigma_1^2$ in the two arms are known.

Our ultimate goal is to make inferences about the difference between the mean responses in arms 0 and 1. To this end, we define $\tau = \mu_1 - \mu_0$, and at each analysis l, we compute the test statistic

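$$Z_l = \left( \frac{1}{n_{1l}} \sum_{j=1}^{l} \sum_{i=1}^{rn} Y_{1ji} - \frac{1}{n_{0l}} \sum_{j=1}^{l} \sum_{i=1}^{n} Y_{0ji} \right) I_l^{1/2}$$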

with

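$$I_l = \left( \frac{\sigma_0^2}{n_{0l}} + \frac{\sigma_1^2}{n_{1l}} \right)^{-1} = \left( \frac{\sigma_0^2}{ln} + \frac{\sigma_1^2}{rln} \right)^{-1} \qquad (1)$$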

being the information for this analysis. It can be shown that $\{Z_1, \dots, Z_L\}$ have, for the parameter of interest τ with information levels $\{I_1, \dots, I_L\}$, what has been referred to as the canonical joint distribution (Jennison and Turnbull 2000). That is,

  • $(Z_1, \dots, Z_L)$ is multivariate normal;

  • $E(Z_l) = \tau I_l^{1/2}$, $l = 1, \dots, L$; and

  • $\operatorname{cov}(Z_{l_1}, Z_{l_2}) = (I_{l_1}/I_{l_2})^{1/2}$, $1 \le l_1 \le l_2 \le L$.

Using this, we can determine the operating characteristics of a group sequential design with any choice of stopping boundaries by using multivariate normal integration as described in Jennison and Turnbull (2000) and Wason (2015a). This allows the use of numerical optimization routines to determine suitable sample sizes and stopping boundaries. The particular type of boundaries to use depends on the chosen hypothesis testing framework. Therefore, in the following sections, we discuss several established methods for two-sided and then one-sided tests.
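To make the canonical joint distribution concrete, the following minimal Python sketch (ours, not part of the Stata commands) computes the information levels from (1) and the covariance matrix they imply, and then checks $E(Z_l) = \tau I_l^{1/2}$ and $\operatorname{cov}(Z_1, Z_2) = (I_1/I_2)^{1/2}$ by simulating trials and computing the test statistic above. The function names, the seed, and the settings (τ = 0.3, σ0 = σ1 = 1, n = 20, r = 1, L = 2) are arbitrary illustrative choices.

```python
import math
import random


def information_levels(sigma0, sigma1, n, r, L):
    """Equation (1): I_l = (sigma0^2 / (l n) + sigma1^2 / (r l n))^(-1) for l = 1, ..., L."""
    return [1.0 / (sigma0**2 / (l * n) + sigma1**2 / (r * l * n)) for l in range(1, L + 1)]


def covariance_matrix(info):
    """Lambda_L with entries cov(Z_i, Z_j) = (I_min / I_max)^(1/2)."""
    return [[math.sqrt(min(a, b) / max(a, b)) for b in info] for a in info]


def simulate_z(tau, sigma0, sigma1, n, r, L, rng):
    """Simulate one trial (mu0 = 0, mu1 = tau) and return Z_1, ..., Z_L.

    Assumes r * n is a whole number, so each stage adds n and r * n responses.
    """
    n1 = round(r * n)
    y0, y1, z = [], [], []
    for _stage in range(L):
        y0 += [rng.gauss(0.0, sigma0) for _ in range(n)]
        y1 += [rng.gauss(tau, sigma1) for _ in range(n1)]
        info = 1.0 / (sigma0**2 / len(y0) + sigma1**2 / len(y1))
        z.append((sum(y1) / len(y1) - sum(y0) / len(y0)) * math.sqrt(info))
    return z


def check_canonical(tau=0.3, sigma=1.0, n=20, reps=4000, seed=2018):
    """Empirical E(Z_1), E(Z_2), and corr(Z_1, Z_2) for L = 2 and r = 1."""
    rng = random.Random(seed)
    sims = [simulate_z(tau, sigma, sigma, n, 1.0, 2, rng) for _ in range(reps)]
    means = [sum(s[k] for s in sims) / reps for k in range(2)]
    var = [sum((s[k] - means[k]) ** 2 for s in sims) / (reps - 1) for k in range(2)]
    cov = sum((s[0] - means[0]) * (s[1] - means[1]) for s in sims) / (reps - 1)
    return means, cov / math.sqrt(var[0] * var[1])


info = information_levels(1.0, 1.0, n=20, r=1.0, L=2)
print("theory:", [0.3 * math.sqrt(x) for x in info], covariance_matrix(info)[0][1])
print("simulation:", check_canonical())
```

With these settings $I_1 = 10$ and $I_2 = 20$, so the simulated means should be close to 0.95 and 1.34 and the correlation close to 0.71, up to Monte Carlo error.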

3. Two-sided tests

3.1. Stopping rules and operating characteristics

In a two-sided test, we assess whether there is significant evidence of a difference in the mean responses of the two treatment arms. That is, we test

$$H_0: \tau = 0, \qquad H_1: \tau \neq 0$$

Here a group sequential trial design is characterized by stopping boundaries $a_1, \dots, a_L$ and $r_1, \dots, r_L$, with $0 \le a_l < r_l$ for $l = 1, \dots, L - 1$ and $a_L = r_L$, and the following stopping rules at analyses $l = 1, \dots, L$:

  • if $|Z_l| \ge r_l$, stop and reject H0;

  • if $|Z_l| < a_l$, stop and do not reject H0; and

  • otherwise, continue to stage l + 1.

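The choice $a_L = r_L$ ensures termination after analysis L while also guaranteeing that a conclusion is reached about H0.

Then, the probability of rejecting H0 for any τ, given n, is

$$P(\text{Reject } H_0 \mid \tau) = \sum_{l=1}^{L} P(\text{Reject } H_0 \text{ at stage } l \mid \tau) = P(|Z_1| \ge r_1 \mid \tau) + \sum_{l=2}^{L} P(a_1 \le |Z_1| < r_1, \dots, a_{l-1} \le |Z_{l-1}| < r_{l-1}, |Z_l| \ge r_l \mid \tau)$$

Similarly, the probability of not rejecting H0 for any τ is

$$P(\text{Accept } H_0 \mid \tau) = \sum_{l=1}^{L} P(\text{Accept } H_0 \text{ at stage } l \mid \tau) = P(|Z_1| < a_1 \mid \tau) + \sum_{l=2}^{L} P(a_1 \le |Z_1| < r_1, \dots, a_{l-1} \le |Z_{l-1}| < r_{l-1}, |Z_l| < a_l \mid \tau)$$

Using the above, we can calculate the expected sample size for any τ as

$$E(N \mid \tau) = \sum_{l=1}^{L} \{P(\text{Reject } H_0 \text{ at stage } l \mid \tau) + P(\text{Accept } H_0 \text{ at stage } l \mid \tau)\}(ln + rln)$$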

As discussed earlier, each of these probabilities can be computed using multivariate normal integration. Explicitly, defining

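$$\Lambda_l = \begin{pmatrix} \operatorname{cov}(Z_1, Z_1) & \cdots & \operatorname{cov}(Z_1, Z_l) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(Z_l, Z_1) & \cdots & \operatorname{cov}(Z_l, Z_l) \end{pmatrix}, \qquad \mathbf{I}_l = (I_1, \dots, I_l)^T$$

then, for example,

$$P(\text{Reject } H_0 \text{ at stage } 2 \mid \tau) = \int_{a_1}^{r_1}\int_{r_2}^{\infty} \phi\{\mathbf{x}, \tau\mathbf{I}_2^{1/2}, \Lambda_2\}\,dx_2\,dx_1 + \int_{-r_1}^{-a_1}\int_{r_2}^{\infty} \phi\{\mathbf{x}, \tau\mathbf{I}_2^{1/2}, \Lambda_2\}\,dx_2\,dx_1 + \int_{a_1}^{r_1}\int_{-\infty}^{-r_2} \phi\{\mathbf{x}, \tau\mathbf{I}_2^{1/2}, \Lambda_2\}\,dx_2\,dx_1 + \int_{-r_1}^{-a_1}\int_{-\infty}^{-r_2} \phi\{\mathbf{x}, \tau\mathbf{I}_2^{1/2}, \Lambda_2\}\,dx_2\,dx_1$$

Here $\mathbf{x} = (x_1, x_2)^T$, the square root of the vector $\mathbf{I}_2$ is taken elementwise, and $\phi\{\mathbf{x}, \boldsymbol{\mu}, \Lambda\}$ is the probability density function of a multivariate normal distribution with mean $\boldsymbol{\mu}$ and covariance matrix Λ, evaluated at vector $\mathbf{x}$. In all the commands presented here, these integrals are evaluated using the Mata function pmvnormal_mata() (Grayling and Mander Forthcoming).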
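For L = 2 the integrals above reduce to one-dimensional integrals, because $Z_2$ given $Z_1 = z_1$ is normal with mean $\tau I_2^{1/2} + \rho(z_1 - \tau I_1^{1/2})$ and variance $1 - \rho^2$, where $\rho = (I_1/I_2)^{1/2}$. The Python sketch below uses this decomposition with Simpson's rule instead of pmvnormal_mata(), so it illustrates the calculation rather than reproducing the commands' implementation; the function names are our own. As inputs it uses the Example 1 double triangular design (section 6), with the boundaries and group size carried to more decimal places than the printed output (our recomputation from the formulas in section 3.3).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simpson(f, lo, hi, m=400):
    """Composite Simpson's rule on [lo, hi] with an even number m of subintervals."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def two_sided_two_stage(a1, r1, r2, I1, I2, tau):
    """Stagewise P(reject H0) and P(accept H0) for L = 2 with a2 = r2.

    Z1 ~ N(tau I1^(1/2), 1) and Z2 | Z1 = z1 ~ N(m2 + rho (z1 - m1), 1 - rho^2),
    with rho = (I1 / I2)^(1/2), i.e., the canonical joint distribution for L = 2.
    """
    m1, m2 = tau * math.sqrt(I1), tau * math.sqrt(I2)
    rho = math.sqrt(I1 / I2)
    sd = math.sqrt(1.0 - rho**2)

    def cdf1(x):  # P(Z1 <= x)
        return STD_NORMAL.cdf(x - m1)

    def integrand(z1):  # density of Z1 times P(|Z2| >= r2 | Z1 = z1)
        mu = m2 + rho * (z1 - m1)
        tail = 1.0 - STD_NORMAL.cdf((r2 - mu) / sd) + STD_NORMAL.cdf((-r2 - mu) / sd)
        return STD_NORMAL.pdf(z1 - m1) * tail

    reject1 = 1.0 - cdf1(r1) + cdf1(-r1)                     # |Z1| >= r1
    accept1 = cdf1(a1) - cdf1(-a1)                           # |Z1| < a1
    cont = (cdf1(r1) - cdf1(a1)) + (cdf1(-a1) - cdf1(-r1))   # a1 <= |Z1| < r1
    reject2 = simpson(integrand, a1, r1) + simpson(integrand, -r1, -a1)
    accept2 = cont - reject2                                 # continue, then |Z2| < a2 = r2
    return (reject1, reject2), (accept1, accept2)


def operating_characteristics(a1, r1, r2, n, ratio, sigma0, sigma1, tau):
    """P(reject H0 | tau) and E(N | tau) for a two-stage design with group size n."""
    I1 = 1.0 / (sigma0**2 / n + sigma1**2 / (ratio * n))    # equation (1) with l = 1
    (rej1, rej2), (acc1, acc2) = two_sided_two_stage(a1, r1, r2, I1, 2.0 * I1, tau)
    stage_size = n * (1.0 + ratio)                          # ln + rln with l = 1
    expected_n = (rej1 + acc1) * stage_size + (rej2 + acc2) * 2.0 * stage_size
    return rej1 + rej2, expected_n


for tau in (0.0, 0.2):
    p, en = operating_characteristics(0.7318, 2.1955, 2.0700, 875.47, 1.0, 2.0, 2.0, tau)
    print(f"tau = {tau}: P(reject H0) = {p:.4f}, E(N) = {en:.1f}")
```

If the sketch is correct, it should approximately reproduce the values printed in Example 1: P(Reject H0 | τ = 0) ≈ 0.0531, P(Reject H0 | τ = 0.2) ≈ 0.8003, E(N | τ = 0) ≈ 2514.6, and E(N | τ = 0.2) ≈ 2550.5.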

With the above specifications, all that remains is a method for determining stopping boundaries and an associated required sample size such that P(Reject H0 | 0) ≤ α and P(Reject H0 | ±δ) ≥ 1 − β for a clinically relevant difference δ > 0 and desired type I and type II error rates α and β. Much of the group sequential clinical trial design literature has focused on this problem. In the following sections, we discuss several options available with our commands.

3.2. Early stopping to reject H0

Much of the early work on group sequential trial design focused on two-sided tests with early stopping only to reject H0, that is, with $a_l = 0$ for $l = 1, \dots, L - 1$. In particular, Haybittle (1971) and Peto et al. (1976) suggested a simple set of boundaries with $r_l = 3$ for $l = 1, \dots, L - 1$. The final critical boundary $r_L$ is then determined to ensure an overall type I error rate of α. Following the determination of $r_L$, a one-dimensional numerical search is used to ascertain the exact required group size n for power of 1 − β when τ = ±δ, treating n as a continuous quantity.
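The two searches just described can be sketched for L = 2 in Python. This is a minimal illustration under our own choices (bisection rather than Brent's method, Simpson's rule rather than pmvnormal_mata(), hypothetical function names), using the Example 1 inputs σ0 = σ1 = 2, δ = 0.2, α = 0.05, and β = 0.2. Because $a_1 = 0$, the continuation region at the interim analysis is $|Z_1| < r_1 = 3$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simpson(f, lo, hi, m=800):
    """Composite Simpson's rule on [lo, hi] with an even number m of subintervals."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def reject_prob(r1, r2, I1, I2, tau):
    """P(reject H0 | tau) for L = 2 with a1 = 0 (early stopping only to reject H0)."""
    m1, m2 = tau * math.sqrt(I1), tau * math.sqrt(I2)
    rho = math.sqrt(I1 / I2)
    sd = math.sqrt(1.0 - rho**2)

    def integrand(z1):  # density of Z1 times P(|Z2| >= r2 | Z1 = z1)
        mu = m2 + rho * (z1 - m1)
        p = 1.0 - STD_NORMAL.cdf((r2 - mu) / sd) + STD_NORMAL.cdf((-r2 - mu) / sd)
        return STD_NORMAL.pdf(z1 - m1) * p

    stage1 = 1.0 - STD_NORMAL.cdf(r1 - m1) + STD_NORMAL.cdf(-r1 - m1)
    return stage1 + simpson(integrand, -r1, r1)


def bisect(f, lo, hi, tol=1e-9, max_iter=200):
    """Root of a continuous f that changes sign on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def haybittle_peto_two_stage(sigma0, sigma1, delta, alpha=0.05, beta=0.2, ratio=1.0):
    r1 = 3.0
    # Under H0 the error rate depends only on I1/I2 = 1/2, so I1 = 1, I2 = 2 suffices.
    r2 = bisect(lambda x: reject_prob(r1, x, 1.0, 2.0, 0.0) - alpha, 1.0, 4.0)

    def power_gap(n):  # power at tau = delta (equal to that at -delta by symmetry)
        I1 = 1.0 / (sigma0**2 / n + sigma1**2 / (ratio * n))  # equation (1), l = 1
        return reject_prob(r1, r2, I1, 2.0 * I1, delta) - (1.0 - beta)

    n = bisect(power_gap, 1.0, 1e5)  # n treated as a continuous quantity
    return r1, r2, n


r1, r2, n = haybittle_peto_two_stage(sigma0=2.0, sigma1=2.0, delta=0.2)
print(f"r = ({r1}, {r2:.4f}), n = {n:.2f}")
```

The final boundary comes out slightly above 1.96, because a small part of the type I error is spent at the interim analysis.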

Haybittle and Peto's procedure has the advantage of simplicity, and its wide boundaries make early stopping unlikely except in the case of extreme differences in treatment performance; this is desirable in some settings, for example, to allow more data to accumulate. However, trialists will often want stopping boundaries that substantially reduce the expected sample size when H0 is not true. For this, Wang and Tsiatis (1987) suggested the following family of stopping boundaries, indexed by a parameter Ω:

$$r_l = C_{WT}(l/L)^{\Omega - 1/2}$$

Their procedure encompasses the popular Pocock (1977) and O'Brien and Fleming (1979) boundaries by taking Ω = 0.5 or Ω = 0, respectively. In this approach, for any chosen Ω, a numerical search determines the value of $C_{WT}$ that gives the correct type I error rate α. Following this, as with Haybittle and Peto's design, a further search ascertains the required sample size for the power constraint. In general, as Ω increases, the maximum sample size increases, but the expected sample size for larger values of |τ| decreases.
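As a sketch of the search for $C_{WT}$, the Python code below (hypothetical names; L = 2 and two-sided α = 0.05 only) finds the constant by bisection. Taking Ω = 0.5 and Ω = 0 should recover the Pocock and O'Brien–Fleming constants for two equally spaced analyses, which are tabulated as approximately 2.178 and 1.977 in Jennison and Turnbull (2000).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simpson(f, lo, hi, m=800):
    """Composite Simpson's rule on [lo, hi] with an even number m of subintervals."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def type1_error_two_stage(r1, r2):
    """Two-sided P(reject H0 | tau = 0) for L = 2 with a1 = 0 and equally spaced analyses."""
    rho = math.sqrt(0.5)            # (I1 / I2)^(1/2) with I2 = 2 I1
    sd = math.sqrt(1.0 - rho**2)

    def integrand(z1):              # phi(z1) * P(|Z2| >= r2 | Z1 = z1)
        mu = rho * z1
        tail = 1.0 - STD_NORMAL.cdf((r2 - mu) / sd) + STD_NORMAL.cdf((-r2 - mu) / sd)
        return STD_NORMAL.pdf(z1) * tail

    return 2.0 * (1.0 - STD_NORMAL.cdf(r1)) + simpson(integrand, -r1, r1)


def wang_tsiatis_constant(omega, alpha=0.05, lo=1.0, hi=5.0, tol=1e-9):
    """C_WT for L = 2, so that r_l = C_WT (l/L)^(omega - 1/2) gives type I error alpha."""
    def excess(c):
        return type1_error_two_stage(c * 0.5 ** (omega - 0.5), c) - alpha

    while hi - lo > tol:            # excess(c) decreases as c increases
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for omega in (0.0, 0.25, 0.5):
    c = wang_tsiatis_constant(omega)
    print(f"Omega = {omega}: C_WT = {c:.3f}, r = ({c * 0.5 ** (omega - 0.5):.3f}, {c:.3f})")
```

For Ω = 0, the interim boundary is $C_{WT}\sqrt{2}$, roughly 2.80, which illustrates why O'Brien–Fleming designs rarely stop at an early analysis.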

Later, we present the commands haybittlepeto and wangtsiatis for determining the stopping boundaries and required sample size of these designs for any choice of L, σ0², σ1², δ, α, β, and Ω.

3.3. Early stopping to reject and not reject H0

The above designs deal well with the ethical issue in two-sided clinical trials, namely, the desire to stop early when the difference between treatments is substantial. However, there are also often sound reasons to want to stop early when it is clear there is no detectable treatment difference; these reasons usually center on reducing the cost of a trial. Such designs have $a_l > 0$ for at least one $l \in \{1, \dots, L - 1\}$. Pampallona and Tsiatis (1994) described a one-parameter family of such trial designs, again indexed by a shape parameter Ω, that has been referred to as the power family of inner wedge designs. Explicitly,

$$r_l = C_r (l/L)^{\Omega - 1/2}, \qquad a_l = \delta I_l^{1/2} - C_a (l/L)^{\Omega - 1/2}$$

The final information level is then

$$I_L = \frac{(C_a + C_r)^2}{\delta^2}$$

to ensure $a_L = r_L$ as desired. A two-dimensional numerical search is used to determine the values of $C_a$ and $C_r$ that provide the desired type I and type II error rates given choices for L, α, β, and Ω. With these values identified, the final required information level $I_L$ is used to determine the exact required group size n through (1). As in the procedure of Wang and Tsiatis (1987) above, the inclusion of the parameter Ω allows a large range of designs to be determined, with varying performance in terms of their expected sample sizes. In section 6, we will see how these performances can be examined graphically.
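Once $C_a$ and $C_r$ are available, the remaining steps are closed form. The Python sketch below (hypothetical function name) takes $C_a$ and $C_r$ as given, rather than performing the two-dimensional search, and uses (1) in the form $I_L = Ln/(\sigma_0^2 + \sigma_1^2/r)$ to obtain n. The constants in the usage line are arbitrary and serve only to show that $a_L = r_L$.

```python
import math


def inner_wedge_from_constants(Ca, Cr, omega, delta, L, sigma0, sigma1, ratio=1.0):
    """Group size n and boundaries a, r of a power family inner wedge design.

    Ca and Cr are inputs here; the two-dimensional search that chooses them to
    meet the type I and type II error constraints is not shown.
    """
    I_L = (Ca + Cr) ** 2 / delta**2
    n = I_L * (sigma0**2 + sigma1**2 / ratio) / L     # invert (1): I_L = L n / (sigma0^2 + sigma1^2 / r)
    a, r = [], []
    for l in range(1, L + 1):
        I_l = l / L * I_L                              # (1) makes I_l proportional to l
        shape = (l / L) ** (omega - 0.5)
        r.append(Cr * shape)
        # If this is negative, |Z_l| < a_l can never hold: no early acceptance at analysis l.
        a.append(delta * math.sqrt(I_l) - Ca * shape)
    return n, a, r


# Arbitrary illustrative constants, not the solution of the search
n, a, r = inner_wedge_from_constants(Ca=1.0, Cr=2.0, omega=0.0, delta=0.2, L=2, sigma0=2.0, sigma1=2.0)
print(round(n, 3), [round(x, 4) for x in a], [round(x, 4) for x in r])
# 900.0 [0.7071, 2.0] [2.8284, 2.0]
```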

Alternatively, Whitehead and Stratton (1983) and Whitehead (1997) proposed an approach for determining a group sequential clinical trial design for a two-sided test with early stopping to not reject H0, termed the double triangular test. Specifically, they demonstrated that a design with

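$$r_l = \left\{ \frac{2}{\tilde\delta}\log\left(\frac{1}{\alpha}\right) - 0.583\left(\frac{I_L}{L}\right)^{1/2} + \frac{\tilde\delta}{4}\frac{l}{L}I_L \right\}\frac{1}{I_l^{1/2}}, \qquad a_l = \max\left[ \left\{ -\frac{2}{\tilde\delta}\log\left(\frac{1}{\alpha}\right) + 0.583\left(\frac{I_L}{L}\right)^{1/2} + \frac{3\tilde\delta}{4}\frac{l}{L}I_L \right\}\frac{1}{I_l^{1/2}}, 0 \right]$$

where

$$\tilde\delta = \frac{2 z_{1-\alpha/2}\,\delta}{z_{1-\alpha/2} + z_{1-\beta}}$$

and

$$I_L = \left[ \left\{ \frac{4(0.583)^2}{L} + 8\log\left(\frac{1}{\alpha}\right) \right\}^{1/2} - \frac{2(0.583)}{L^{1/2}} \right]^2 \frac{1}{\tilde\delta^2}$$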

would approximately attain a type I error rate of α when τ = 0 and a type II error rate of β when τ = ±δ. Here $z_p$ denotes the p quantile of the standard normal distribution.
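Because the double triangular design is given in closed form, it is straightforward to compute. The following Python sketch (the function name is ours) evaluates the formulas above, obtaining $z_p$ from the standard library's NormalDist().inv_cdf and n from (1). With the inputs of Example 1 in section 6, it reproduces the group size and boundaries printed there, up to rounding.

```python
import math
from statistics import NormalDist


def double_triangular(sigma0, sigma1, delta, L=3, alpha=0.05, beta=0.2, ratio=1.0):
    """Group size n and boundaries a, r of the double triangular test (formulas above)."""
    z = NormalDist().inv_cdf                    # z(p) is the p quantile of N(0, 1)
    z_a, z_b = z(1 - alpha / 2), z(1 - beta)
    dt = 2 * z_a * delta / (z_a + z_b)          # delta-tilde
    k = 0.583                                   # constant appearing in the formulas
    log_term = math.log(1 / alpha)
    I_L = (math.sqrt(4 * k**2 / L + 8 * log_term) - 2 * k / math.sqrt(L)) ** 2 / dt**2
    n = I_L * (sigma0**2 + sigma1**2 / ratio) / L   # invert (1) at l = L
    shift = k * math.sqrt(I_L / L)
    a, r = [], []
    for l in range(1, L + 1):
        I_l = l / L * I_L                        # (1) makes I_l proportional to l
        r.append((2 / dt * log_term - shift + dt / 4 * I_l) / math.sqrt(I_l))
        a.append(max((-2 / dt * log_term + shift + 3 * dt / 4 * I_l) / math.sqrt(I_l), 0.0))
    return n, a, r


n, a, r = double_triangular(sigma0=2.0, sigma1=2.0, delta=0.2, L=2)
print(round(n, 1), [round(x, 2) for x in r], [round(x, 2) for x in a])
# 875.5 [2.2, 2.07] [0.73, 2.07]
```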

Later, we discuss our commands innerwedge and doubletriangular for determining these designs.

4. One-sided tests

4.1. Stopping rules and operating characteristics

In a one-sided test, we assess whether, without loss of generality, the mean response on treatment 1 is significantly larger than that on treatment 0. That is, we test

$$H_0: \tau \le 0, \qquad H_1: \tau > 0$$

A group sequential trial design of this type is characterized by stopping boundaries $f_1, \dots, f_L$ and $e_1, \dots, e_L$, with $f_l < e_l$ for $l = 1, \dots, L - 1$ and $f_L = e_L$, and the following stopping rules at analyses $l = 1, \dots, L$:

  • if $Z_l \ge e_l$, stop and reject H0;

  • if $Z_l < f_l$, stop and do not reject H0; and

  • otherwise, continue to stage l + 1.

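Again, the choice $f_L = e_L$ ensures termination after analysis L and guarantees that a conclusion is drawn about H0.

Now, the probability of rejecting H0 for any τ, given n, becomes

$$P(\text{Reject } H_0 \mid \tau) = \sum_{l=1}^{L} P(\text{Reject } H_0 \text{ at stage } l \mid \tau) = P(Z_1 \ge e_1 \mid \tau) + \sum_{l=2}^{L} P(f_1 \le Z_1 < e_1, \dots, f_{l-1} \le Z_{l-1} < e_{l-1}, Z_l \ge e_l \mid \tau)$$

Similarly, the probability of not rejecting H0 for any τ is

$$P(\text{Accept } H_0 \mid \tau) = \sum_{l=1}^{L} P(\text{Accept } H_0 \text{ at stage } l \mid \tau) = P(Z_1 < f_1 \mid \tau) + \sum_{l=2}^{L} P(f_1 \le Z_1 < e_1, \dots, f_{l-1} \le Z_{l-1} < e_{l-1}, Z_l < f_l \mid \tau)$$

As before, the expected sample size for any τ is given by

$$E(N \mid \tau) = \sum_{l=1}^{L} \{P(\text{Reject } H_0 \text{ at stage } l \mid \tau) + P(\text{Accept } H_0 \text{ at stage } l \mid \tau)\}(ln + rln)$$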

Moreover, these probabilities can again be computed using multivariate normal integration. Using our notation from earlier, we have, for example,

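$$P(\text{Reject } H_0 \text{ at stage } 2 \mid \tau) = \int_{f_1}^{e_1}\int_{e_2}^{\infty} \phi\{\mathbf{x}, \tau\mathbf{I}_2^{1/2}, \Lambda_2\}\,dx_2\,dx_1$$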
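As in the two-sided case, for L = 2 the integral above can be computed by conditioning on $Z_1$. The Python sketch below (hypothetical names, Simpson's rule rather than pmvnormal_mata()) returns the stagewise probabilities of rejecting and not rejecting H0; the boundaries and information levels in the usage loop are made up for illustration and do not come from any design in this article.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simpson(f, lo, hi, m=400):
    """Composite Simpson's rule on [lo, hi] with an even number m of subintervals."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def one_sided_two_stage(f1, e1, e2, I1, I2, tau):
    """Stagewise P(reject H0) and P(accept H0) for a one-sided design with L = 2 (f2 = e2)."""
    m1, m2 = tau * math.sqrt(I1), tau * math.sqrt(I2)
    rho = math.sqrt(I1 / I2)
    sd = math.sqrt(1.0 - rho**2)

    def integrand(z1):  # density of Z1 times P(Z2 >= e2 | Z1 = z1)
        mu = m2 + rho * (z1 - m1)
        return STD_NORMAL.pdf(z1 - m1) * (1.0 - STD_NORMAL.cdf((e2 - mu) / sd))

    reject1 = 1.0 - STD_NORMAL.cdf(e1 - m1)                    # Z1 >= e1
    accept1 = STD_NORMAL.cdf(f1 - m1)                          # Z1 < f1
    cont = STD_NORMAL.cdf(e1 - m1) - STD_NORMAL.cdf(f1 - m1)   # f1 <= Z1 < e1
    reject2 = simpson(integrand, f1, e1)                       # the double integral above
    accept2 = cont - reject2                                   # Z2 < f2 = e2
    return (reject1, reject2), (accept1, accept2)


# Hypothetical boundaries and information levels, chosen only for illustration
for tau in (0.0, 0.3):
    (rej1, rej2), (acc1, acc2) = one_sided_two_stage(0.0, 2.5, 1.7, 10.0, 20.0, tau)
    print(f"tau = {tau}: P(reject H0) = {rej1 + rej2:.4f}, P(accept H0) = {acc1 + acc2:.4f}")
```

A simple sanity check is that, with $f_1$ and $e_1$ set far apart so that the interim analysis never stops the trial, the rejection probability collapses to that of a single test at the final analysis, $1 - \Phi(e_2 - \tau I_2^{1/2})$.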

In some situations, a one-sided test will be more appropriate because departures from H 0 in one direction are implausible. Alternatively, it may be the case that we are interested in directly testing the superiority of one treatment over another. Consequently, much research has gone into determining designs that will have desired operating characteristics (a type I error rate of α when τ = 0 and a type II error rate of β when τ = δ) and favorable performance in terms of the expected sample size. Below we discuss two popular methods that are available for implementation with our commands.

4.2. Power family of one-sided designs

In addition to their power family of inner wedge designs, Pampallona and Tsiatis (1994) also detailed a one-parameter family of designs for one-sided tests, with boundaries given by

$$e_l = C_e (l/L)^{\Omega - 1/2}, \qquad f_l = \delta I_l^{1/2} - C_f (l/L)^{\Omega - 1/2}$$

As before, taking a final information level of

$$I_L = \frac{(C_e + C_f)^2}{\delta^2}$$

ensures that $f_L = e_L$ as desired, and a two-dimensional numerical search can be used to determine the appropriate values of $C_e$ and $C_f$. Our command powerfamily is available to perform these computations.
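A short derivation, using only (1), shows why this choice of $I_L$ works. Under (1), $I_l$ is proportional to l, so $I_l = (l/L)I_L$, and substituting $I_L = (C_e + C_f)^2/\delta^2$ into the boundary for $f_l$ gives

$$\delta I_l^{1/2} = \delta\left(\frac{l}{L}\right)^{1/2} I_L^{1/2} = (C_e + C_f)\left(\frac{l}{L}\right)^{1/2}, \qquad f_l = (C_e + C_f)\left(\frac{l}{L}\right)^{1/2} - C_f\left(\frac{l}{L}\right)^{\Omega - 1/2}$$

At $l = L$ this gives

$$f_L = (C_e + C_f) - C_f = C_e = e_L$$

and inverting (1) at $l = L$ gives the group size

$$n = \frac{I_L(\sigma_0^2 + \sigma_1^2/r)}{L} = \frac{(C_e + C_f)^2(\sigma_0^2 + \sigma_1^2/r)}{L\delta^2}$$

On the Z scale, the boundaries therefore depend only on $C_e$, $C_f$, Ω, and L. Since $E(Z_l) = \delta I_l^{1/2} = (C_e + C_f)(l/L)^{1/2}$ when τ = δ, the error rates do not depend on δ, σ0, σ1, or r either; these quantities enter only through n.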

4.3. Triangular test

Whitehead and Stratton (1983) and Whitehead (1997) also proposed a triangular test for one-sided group sequential clinical trial designs. Specifically, they proposed

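$$e_l = \left\{ \frac{2}{\tilde\delta}\log\left(\frac{1}{2\alpha}\right) - 0.583\left(\frac{I_L}{L}\right)^{1/2} + \frac{\tilde\delta}{4}\frac{l}{L}I_L \right\}\frac{1}{I_l^{1/2}}, \qquad f_l = \left\{ -\frac{2}{\tilde\delta}\log\left(\frac{1}{2\alpha}\right) + 0.583\left(\frac{I_L}{L}\right)^{1/2} + \frac{3\tilde\delta}{4}\frac{l}{L}I_L \right\}\frac{1}{I_l^{1/2}}$$

with

$$\tilde\delta = \frac{2 z_{1-\alpha/2}\,\delta}{z_{1-\alpha/2} + z_{1-\beta}}$$

and

$$I_L = \left[ \left\{ \frac{4(0.583)^2}{L} + 8\log\left(\frac{1}{2\alpha}\right) \right\}^{1/2} - \frac{2(0.583)}{L^{1/2}} \right]^2 \frac{1}{\tilde\delta^2}$$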

demonstrating this design would approximately attain the desired operating characteristics.
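The expression for $I_L$ follows from requiring the two boundaries to meet at the final analysis. Write $x = I_L^{1/2}$ and note that $(I_L/L)^{1/2} = x/L^{1/2}$. Multiplying $e_L$ and $f_L$ by x gives

$$e_L x = \frac{2}{\tilde\delta}\log\left(\frac{1}{2\alpha}\right) - \frac{0.583}{L^{1/2}}x + \frac{\tilde\delta}{4}x^2, \qquad f_L x = -\frac{2}{\tilde\delta}\log\left(\frac{1}{2\alpha}\right) + \frac{0.583}{L^{1/2}}x + \frac{3\tilde\delta}{4}x^2$$

Setting $e_L = f_L$ and rearranging yields the quadratic

$$\frac{\tilde\delta}{2}x^2 + \frac{2(0.583)}{L^{1/2}}x - \frac{4}{\tilde\delta}\log\left(\frac{1}{2\alpha}\right) = 0$$

whose positive root is

$$x = \frac{1}{\tilde\delta}\left[ \left\{ \frac{4(0.583)^2}{L} + 8\log\left(\frac{1}{2\alpha}\right) \right\}^{1/2} - \frac{2(0.583)}{L^{1/2}} \right]$$

Squaring gives the expression for $I_L$ above. Replacing $\log\{1/(2\alpha)\}$ by $\log(1/\alpha)$ gives the corresponding expression for the double triangular test.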

This design has proven popular with trialists because of the speed with which it can be calculated and also because of its strong performance in terms of its expected sample sizes (Wason and Jaki 2012). Our command triangular determines this design.

5. The commands

5.1. Syntax

In this section, we detail the syntax of our six discussed commands, which are all declared as r-class:

  • doubletriangular, sigma(numlist) [l(integer) delta(real) alpha(real) beta(real) ratio(real) performance twoway_options]

  • haybittlepeto, sigma(numlist) [l(integer) delta(real) alpha(real) beta(real) ratio(real) performance twoway_options]

  • innerwedge, sigma(numlist) [l(integer) delta(real) alpha(real) beta(real) ratio(real) omega(real) performance twoway_options]

  • powerfamily, sigma(numlist) [l(integer) delta(real) alpha(real) beta(real) ratio(real) omega(real) performance twoway_options]

  • triangular, sigma(numlist) [l(integer) delta(real) alpha(real) beta(real) ratio(real) performance twoway_options]

  • wangtsiatis, sigma(numlist) [l(integer) delta(real) alpha(real) beta(real) ratio(real) omega(real) performance twoway_options]

5.2. Options

Here the prescribed options denote the following:

  • sigma(numlist) specifies the standard deviation of the responses in arms 0 and 1, σ0 and σ1. This can be either of length two, containing the assumed values of these two parameters, or of length one, implying σ0 = σ1. All values must be strictly positive. sigma() is required.

  • l(integer) specifies the maximum number of allowed stages in the design, L. integer must be greater than or equal to 2. The default is l(3).

  • delta(real) specifies the clinically relevant difference at which to power, δ. That is, the power of 1 - β is provided when τ = δ. real must be strictly positive. The default is delta(0.2).

  • alpha(real) specifies the desired overall type I error rate, α. That is, it is the two-sided or one-sided type I error rate according to the chosen command. real must be strictly between 0 and 1. The default is alpha(0.05).

  • beta(real) specifies the desired type II error rate, β. real must be strictly between 0 and 1. The default is beta(0.2).

  • ratio(real) specifies the desired ratio r of the sample sizes between arms 0 and 1. real must be strictly positive. The default is ratio(1).

  • omega(real) specifies the shape parameter of the boundaries of the power family and Wang–Tsiatis designs. real must be less than 0.5. The default is omega(0.25).

  • performance specifies that the performance of the identified design, that is, its expected sample size and power curves, should be determined and plotted.

  • twoway_options specifies additional options to pass to twoway.

5.3. Stored results

For all six commands, return list gives the exact required group size n (r(n)) and the stopping boundaries a, r, e, and f as appropriate (for example, r(a)). In addition, the vector of information levels I (r(I)), the covariance matrix Λ (r(Lambda)), and a vector summarizing the performance of the design (r(performance)),

$$\{P(\text{Reject } H_0 \mid 0), E(N \mid 0), P(\text{Reject } H_0 \mid \delta), E(N \mid \delta), \max_\tau E(N \mid \tau), \max N\}^T$$

are available.

Note that in all of these commands, required one-dimensional numerical searches are performed using a purpose-built implementation of Brent’s (1973) algorithm. In contrast, all two-dimensional numerical searches are carried out with the Nelder–Mead option in optimize().
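For readers unfamiliar with Brent's method, the following Python sketch shows a standard form of the algorithm for finding a zero of a function bracketed by a sign change, combining bisection with secant and inverse quadratic interpolation steps. It follows the widely reproduced zbrent formulation and is not the commands' own Mata implementation; the function name is hypothetical. In the designs above, f would be, for example, the attained type I error minus α as a function of a boundary constant, or the attained power minus 1 − β as a function of n.

```python
import math


def brent_root(f, a, b, tol=1e-12, max_iter=200):
    """Zero of f on [a, b] by Brent's method; f(a) and f(b) must differ in sign."""
    eps = 2.220446049250313e-16  # double-precision machine epsilon
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("root must be bracketed by [a, b]")
    c, fc = b, fb
    d = e = b - a
    for _ in range(max_iter):
        if (fb > 0 and fc > 0) or (fb < 0 and fc < 0):
            c, fc = a, fa          # keep the root bracketed by b and c
            d = e = b - a
        if abs(fc) < abs(fb):      # make b the best estimate so far
            a, b, c = b, c, b
            fa, fb, fc = fb, fc, fb
        tol1 = 2 * eps * abs(b) + 0.5 * tol
        xm = 0.5 * (c - b)
        if abs(xm) <= tol1 or fb == 0:
            return b
        if abs(e) >= tol1 and abs(fa) > abs(fb):
            s = fb / fa
            if a == c:             # secant step
                p, q = 2 * xm * s, 1 - s
            else:                  # inverse quadratic interpolation
                q, r = fa / fc, fb / fc
                p = s * (2 * xm * q * (q - r) - (b - a) * (r - 1))
                q = (q - 1) * (r - 1) * (s - 1)
            if p > 0:
                q = -q
            p = abs(p)
            if 2 * p < min(3 * xm * q - abs(tol1 * q), abs(e * q)):
                e, d = d, p / q    # accept the interpolation step
            else:
                d = e = xm         # fall back to bisection
        else:
            d = e = xm             # bisection
        a, fa = b, fb
        b += d if abs(d) > tol1 else math.copysign(tol1, xm)
        fb = f(b)
    raise RuntimeError("Brent's method did not converge")


print(brent_root(lambda x: x**3 - 2 * x - 5, 2.0, 3.0))
```

The interpolation steps give fast convergence for smooth functions, while the bisection fallback guarantees that the bracket keeps shrinking, which is the property Brent (1973) established.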

6. Example 1: Two-sided tests

For our first example, we consider the case L = 2, α = 0.05, β = 0.2, δ = 0.2, σ0 = σ1 = 2, and r = 1 in two-sided testing.

We begin by demonstrating how doubletriangular can be used to determine the boundaries and sample size required by the double triangular test of Whitehead and Stratton (1983). The following code is used to determine the design:

. doubletriangular, l(2) alpha(0.05) beta(0.2) delta(0.2) sigma(2) ratio(1) 
2-stage Group Sequential Trial Design
The hypotheses to be tested are as follows:
 H0: tau = 0 H1: tau != 0,
with the following error constraints:
 P(Reject H0 | tau = 0) = .05,
 P(Reject H0 | tau = delta = .2) = 1 - .2.
  Double-triangular boundaries selected....................
...now determining design...................................
...design determined. Returning the results.................
...Exact required group sizes for each arm determined to be:
  875.5 and 875.5.
...Rejection boundaries r determined to be:
  (2.2,2.07).
...Acceptance boundaries a determined to be:
  (.73,2.07).
...Operating characteristics of the design are:
P(Reject H0 | tau = 0)   = .0531,
P(Reject H0 | tau = .2)  = .8003,
E(N | tau = 0)           = 2514.6,
E(N | tau = .2)          = 2550.5,
max_tau E(N | tau)       = 2716.4,
max N                    = 3501.9.

As can be seen, by default, the commands print an informative summary of the chosen testing framework, their progress, and the characteristics of the final design. Specifically, the first few lines describe the hypotheses that will be tested based on the chosen command. The input values of alpha and beta are then used to print a summary of the desired operating characteristics. Several lines then follow that describe the progress of the command in completing its required computations. Next, the exact required number of patients in each arm and stage is printed. The rejection and acceptance boundaries then follow, along with a summary of the operating characteristics of the identified design. In this case, the design has a type I error rate of 0.053 and power of 0.800, so the type I error rate slightly exceeds the nominal 0.05. This is a well-known limitation of the double triangular design: the type I and type II error requirements are only approximately achieved. The final four printed results summarize important sample-size characteristics of the design: the expected sample size when τ = 0, that when τ = δ, the maximum expected sample size over all possible values of τ, and the maximum possible sample size. In this case, although the maximum possible value of N is 3,501.9, the expected sample size does not exceed 2,716.4 for any value of τ.
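The sample-size summaries above can be checked with a short calculation. For L = 2, the trial enrolls either n(1 + r) or 2n(1 + r) patients, so E(N | τ) depends only on the probability of stopping at the first analysis, and its maximum over τ can be found by a grid search. The Python sketch below (hypothetical names) does this using the Example 1 boundaries and group size carried to more decimal places than printed (our recomputation from the formulas in section 3.3); it should agree with the printed E(N | τ = 0), max_τ E(N | τ), and max N up to rounding.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_n_two_stage(tau, a1, r1, n, ratio, sigma0, sigma1):
    """E(N | tau) for a two-sided design with L = 2.

    The trial uses n(1 + r) patients if it stops at analysis 1 and 2n(1 + r) otherwise,
    so only the stage-1 stopping probability P(|Z1| < a1) + P(|Z1| >= r1) matters.
    """
    I1 = 1.0 / (sigma0**2 / n + sigma1**2 / (ratio * n))    # equation (1) with l = 1
    m1 = tau * math.sqrt(I1)

    def cdf(x):  # P(Z1 <= x)
        return STD_NORMAL.cdf(x - m1)

    p_stop1 = (cdf(a1) - cdf(-a1)) + (1.0 - cdf(r1) + cdf(-r1))
    stage_size = n * (1.0 + ratio)
    return p_stop1 * stage_size + (1.0 - p_stop1) * 2.0 * stage_size


design = dict(a1=0.7318, r1=2.1955, n=875.47, ratio=1.0, sigma0=2.0, sigma1=2.0)
grid = [k / 10000 for k in range(5001)]      # tau in [0, 0.5]; E(N | tau) is symmetric in tau
values = [expected_n_two_stage(t, **design) for t in grid]
k_max = max(range(len(grid)), key=values.__getitem__)
max_n = 2 * design["n"] * (1 + design["ratio"])  # L (n + rn) with L = 2
print(f"E(N | 0) = {values[0]:.1f}, max_tau E(N | tau) = {values[k_max]:.1f} "
      f"at tau = {grid[k_max]:.3f}, max N = {max_n:.1f}")
```

The maximum of E(N | τ) occurs at a small positive τ, where the first-stage statistic is most likely to fall in the continuation region between $a_1$ and $r_1$.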

Being able to easily determine this design is useful. However, in most situations, it is unlikely that a researcher will have a single design in mind. Consequently, it is important to be able to determine the performance of several designs and compare them graphically. Here we demonstrate this for the power family of inner wedge designs. Using the following code, we find the designs for Ω = −0.5, −0.25, 0, and 0.25, saving their performance. Then, we combine the saved graphs to produce figure 1.

Figure 1. Comparison of the performance of several two-sided power family of inner wedge designs.


. quietly innerwedge, l(2) alpha(0.05) beta(0.2) delta(0.2) sigma(2) omega(-0.5)
  > ratio(1) performance saving(firstDesign) nodraw
  > title({&Omega} = -0.5) scale(0.75) scheme(sj)
. quietly innerwedge, l(2) alpha(0.05) beta(0.2) delta(0.2) sigma(2)
  > omega(-0.25) ratio(1) performance saving(secondDesign) nodraw
  > title({&Omega} = -0.25) scale(0.75) scheme(sj)
. quietly innerwedge, l(2) alpha(0.05) beta(0.2) delta(0.2) sigma(2) omega(0)
  > ratio(1) performance saving(thirdDesign) nodraw title({&Omega} = 0)
  > scale(0.75) scheme(sj)
. quietly innerwedge, l(2) alpha(0.05) beta(0.2) delta(0.2) sigma(2) omega(0.25)
  > ratio(1) performance saving(fourthDesign) nodraw
  > title({&Omega} = 0.25) scale(0.75) scheme(sj)
. graph combine firstDesign.gph secondDesign.gph thirdDesign.gph
  > fourthDesign.gph, ycommon scheme(sj)

We observe that increasing the value of Ω appears to reduce the expected sample size required when |τ| is small. However, this comes at the cost of a larger expected sample size when |τ| is large.

7. Example 2: One-sided tests

For our next example, we consider one-sided testing. We take L = 3, α = 0.1, β = 0.1, δ = 0.25, σ0 = 1, σ1 = 2, and r = 2. Similarly to the above, we demonstrate how powerfamily can be used to determine several designs (Ω = −0.25, Ω = 0, and Ω = 0.25) and also compute the boundaries and sample size of the triangular test. Saving the performance of each, we then compare their performance graphically, creating figure 2 with the following code:

. quietly powerfamily, l(3) alpha(0.1) beta(0.1) delta(0.25) sigma(1, 2)
  > omega(-0.25) ratio(2) performance saving(firstDesign) nodraw
  > title(Power family with {&Omega} = -0.25) scale(0.75) scheme(sj)
. quietly powerfamily, l(3) alpha(0.1) beta(0.1) delta(0.25) sigma(1, 2)
  > omega(0) ratio(2) performance saving(secondDesign) nodraw
  > title(Power family with {&Omega} = 0) scale(0.75) scheme(sj)
. quietly powerfamily, l(3) alpha(0.1) beta(0.1) delta(0.25) sigma(1, 2)
  > omega(0.25) ratio(2) performance saving(thirdDesign) nodraw
  > title(Power family with {&Omega} = 0.25) scale(0.75) scheme(sj)
. quietly triangular, l(3) alpha(0.1) beta(0.1) delta(0.25) sigma(1, 2)
  > ratio(2) performance saving(fourthDesign) nodraw title(Triangular test)
  > scale(0.75) scheme(sj)
. graph combine firstDesign.gph secondDesign.gph thirdDesign.gph
  > fourthDesign.gph, ycommon scheme(sj)

Figure 2. Comparison of the performance of several one-sided power family designs and the triangular test.


As reported previously, the triangular test fares well in comparison with the three power family designs. Explicitly, it has the lowest maximum expected sample size of the four designs. However, this comes at the cost of an increased maximum possible sample size, as evidenced by its performance for large |τ|.

8. Conclusion

It is important that any clinical trial control both its type I and type II error rates accurately. For this task, Stata introduced in version 13 the command power, which can be used for an extremely wide array of trial scenarios. However, as we have discussed, group sequential clinical trial designs are extremely popular with researchers, and to date, few commands are available in Stata for determining such designs. Notable exceptions include nstage (Barthel, Royston, and Parmar 2009; Bratton, Choodari-Oskooei, and Royston 2015) and nstagebin (Bratton 2014) for multiarm multistage trial designs with time-to-event and binary endpoints, respectively. In addition, the command simsam can determine the required sample size of certain group sequential clinical trial designs given stopping boundaries (Hooper 2013). However, there are no established commands for determining the boundaries and group size required by the wide array of group sequential trial designs for normally distributed outcomes discussed here.

Several extensions to our commands are now possible. We have assumed that the variance of the responses on both treatment arms is known prior to trial commencement. While this is a common assumption in the group sequential design literature, it is often a strong one to make. However, Whitehead, Valdés-Márquez, and Lissmats (2009) proposed a simple quantile substitution method for dealing with this problem that has been shown to generally control the type I error rate to the correct level (Wason, Mander, and Thompson 2012). This would no doubt be a useful addition to our commands. Moreover, we have assumed that the interim analyses are equally spaced in terms of the number of patient responses accrued in each arm. Gordon Lan and Demets (1983) proposed an error-spending approach to the design of group sequential trials that allows this assumption to be relaxed. Consequently, a command to use such methodology could prove useful to those seeking more complex designs.

Additionally, our focus has been on two-arm trials. Today, multiarm multistage trials are becoming increasingly popular. Therefore, extending these designs to allow for multiple experimental arms would be advantageous. Finally, there have now been several proposals for the determination of optimal or near-optimal group sequential designs (see, for example, Wason and Jaki [2012]; Wason, Mander, and Thompson [2012]; and Wason [2015b]). To allow trialists to maximize the efficiency gains made by using a group sequential design, the establishment of commands for determining such designs would be highly advantageous.

Regardless of these possible expansions, our commands can be used to determine stopping boundaries and exact required group sizes and also used to compare the performance of a selection of designs. Consequently, they should prove useful to those seeking to exploit the efficiencies of a group sequential design while working in Stata.

9. Acknowledgments

Michael J. Grayling is supported by the Wellcome Trust (grant number 099770/Z/12/Z). James M. S. Wason is supported by the National Institute for Health Research Cambridge Biomedical Research Centre (grant number MC_UP_1302/6). Adrian P. Mander is supported by the Medical Research Council (grant number MC_UP_1302/2).

Contributor Information

Michael J. Grayling, Email: mjg211@cam.ac.uk, Hub for Trials Methodology Research, MRC Biostatistics Unit, Cambridge, UK.

James M. S. Wason, Email: james.wason@mrc-bsu.cam.ac.uk, Hub for Trials Methodology Research, MRC Biostatistics Unit, Cambridge, UK.

Adrian P. Mander, Email: adrian.mander@mrc-bsu.cam.ac.uk, Hub for Trials Methodology Research, MRC Biostatistics Unit, Cambridge, UK.

References

  1. Armitage P. Sequential Medical Trials. 2nd ed. Blackwell; Oxford: 1975.
  2. Barthel FM-S, Royston P, Parmar MKB. A menu-driven facility for sample-size calculation in novel multiarm, multistage randomized controlled trials with a time-to-event outcome. Stata Journal. 2009;9:505–523.
  3. Bratton D. nstagebin: Stata module to perform sample size calculation for multi-arm multi-stage randomised controlled trials with binary outcomes. Statistical Software Components S457911, Department of Economics, Boston College; 2014. https://ideas.repec.org/c/boc/bocode/s457911.html.
  4. Bratton DJ, Choodari-Oskooei B, Royston P. A menu-driven facility for sample-size calculation in multiarm, multistage randomized controlled trials with time-to-event outcomes: Update. Stata Journal. 2015;15:350–368.
  5. Brent RP. An algorithm with guaranteed convergence for finding a zero of a function. In: Algorithms for Minimization without Derivatives, chap 4. Prentice Hall; Englewood Cliffs, NJ: 1973.
  6. Gordon Lan KK, Demets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–663.
  7. Grayling MJ, Mander AP. Calculations involving the multivariate normal and multivariate t distributions. Stata Journal. Forthcoming.
  8. Haybittle JL. Repeated assessment of results in clinical trials of cancer treatment. British Journal of Radiology. 1971;44:793–797. doi: 10.1259/0007-1285-44-526-793.
  9. Hooper R. Versatile sample-size calculation using simulation. Stata Journal. 2013;13:21–38.
  10. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC; Boca Raton, FL: 2000.
  11. O'Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics. 1979;35:549–556.
  12. Pampallona S, Tsiatis AA. Group sequential designs for one-sided and two-sided hypothesis testing with provision for early stopping in favor of the null hypothesis. Journal of Statistical Planning and Inference. 1994;42:19–35.
  13. Peto R, Pike MC, Armitage P, Breslow NE, Cox DR, Howard SV, Mantel N, McPherson K, Peto J, Smith PG. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. British Journal of Cancer. 1976;34:585–612. doi: 10.1038/bjc.1976.220.
  14. Pocock SJ. Group sequential methods in the design and analysis of clinical trials. Biometrika. 1977;64:191–199.
  15. Wald A. Sequential Analysis. Wiley; New York: 1947.
  16. Wang SK, Tsiatis AA. Approximately optimal one-parameter boundaries for group sequential trials. Biometrics. 1987;43:193–199.
  17. Wason J. Multi-arm multi-stage designs for clinical trials with treatment selection. In: Sverdlov O, editor. Modern Adaptive Randomized Clinical Trials: Statistical and Practical Aspects. Chapman & Hall/CRC; Boca Raton, FL: 2015a. pp. 389–410.
  18. Wason JMS. OptGS: An R package for finding near-optimal group-sequential designs. Journal of Statistical Software. 2015b;66(2):1–13.
  19. Wason JMS, Jaki T. Optimal design of multi-arm multi-stage trials. Statistics in Medicine. 2012;31:4269–4279. doi: 10.1002/sim.5513.
  20. Wason JMS, Mander AP, Thompson SG. Optimal multistage designs for randomised clinical trials with continuous outcomes. Statistics in Medicine. 2012;31:301–312. doi: 10.1002/sim.4421.
  21. Whitehead J. The Design and Analysis of Sequential Clinical Trials. Rev. 2nd ed. Wiley; Chichester, UK: 1997.
  22. Whitehead J, Stratton I. Group sequential clinical trials with triangular continuation regions. Biometrics. 1983;39:227–236.
  23. Whitehead J, Valdés-Márquez E, Lissmats A. A simple two-stage design for quantitative responses with application to a study in diabetic neuropathic pain. Pharmaceutical Statistics. 2009;8:125–135. doi: 10.1002/pst.341.
