Author manuscript; available in PMC: 2016 Mar 24.
Published in final edited form as: Biometrika. 2013 Dec 1;100(4):985–996. doi: 10.1093/biomet/ast035

Simultaneous confidence intervals that are compatible with closed testing in adaptive designs

D MAGIRR 1, T JAKI 1, M POSCH 2, F KLINGLMUELLER 2
PMCID: PMC4806862  EMSID: EMS67492  PMID: 27019516

Summary

We describe a general method for finding a confidence region for a parameter vector that is compatible with the decisions of a two-stage closed test procedure in an adaptive experiment. The closed test procedure is characterized by the fact that rejection or nonrejection of a null hypothesis may depend on the decisions for other hypotheses and the compatible confidence region will, in general, have a complex, nonrectangular shape. We find the smallest cross-product of simultaneous confidence intervals containing the region and provide computational shortcuts for calculating the lower bounds on parameters corresponding to the rejected null hypotheses. We illustrate the method with an adaptive phase II/III clinical trial.

Keywords: Closed testing principle, Combination test, Conditional error, Multiple comparisons, Simultaneous inference

1. Introduction

For experiments designed to make inference about a parameter vector θ = (θ1, … , θK), it is common to find confidence intervals for all of the individual θk such that the simultaneous coverage probability is at least 1 − α. Sometimes, though, an experimenter will only attempt to assert that an individual parameter exceeds a specific value, say θk > δk. If this cannot be achieved in such a way that the probability of making at least one incorrect rejection in a family of hypotheses Hk = {θk ≤ δk} (k = 1, … , K) is no greater than α, the experimenter will not assert anything about θk. The latter method of inference is used in so-called closed test procedures (Marcus et al., 1976), and its advantage is often greater power.

For experiments conducted in a single stage, Hayter & Hsu (1994) showed how simultaneous 100(1 − α)% confidence intervals can be constructed to be compatible with some commonly used closed test procedures, in the sense that a null hypothesis Hk is rejected at familywise level α if and only if the confidence interval for θk excludes all values for which Hk is true. Often, these intervals are scarcely more informative than the test decisions. For example, for one-sided problems where larger parameter values are more beneficial, no 100(1 − α)% lower confidence bound for any individual θk can exceed δk unless all hypotheses H1, … , HK can be rejected at familywise level α.

In this article we derive confidence intervals for adaptive experiments. Our motivating example is a seamless phase II/III clinical trial, although the method is not limited to this setting. Such trials consist of a first stage in which K experimental treatments, indexed by T1 = {1, … , K}, are compared with a common control and, after an interim analysis, a second stage in which only a subset of treatments, indexed by T2 ⊆ T1, are compared with the control. The state-of-the-art methodology for this problem (Bauer & Kieser, 1999; Posch et al., 2005; Bretz et al., 2009) is a hybrid of the closure principle of Marcus et al. (1976) and a p-value combination which goes back to Fisher (1932). This methodology allows any subset of treatments to be chosen at interim, based on all trial data and external factors. Other adaptations, such as sample size re-estimation, are also possible. A serious concern, though, is that there is no established method for constructing confidence intervals. As emphasized in the International Conference on Harmonisation’s E9 guideline (ICH E9 Expert Working Group, 1999, p. 1932), ‘Estimates of treatment effect should be accompanied by confidence intervals, whenever possible, and the way in which these will be calculated should be identified.’

Posch et al. (2005) proposed 100(1 − α)% simultaneous confidence intervals following such a trial. Unfortunately, their intervals are not guaranteed to be compatible with the closed test procedure. Here, we construct intervals that are compatible. As in the one-stage case, an inevitable shortcoming of these intervals is that they are not always substantially more informative than the original test decisions. We will show that this problem is mitigated to some extent by the adaptive nature of the experiment.

2. Fundamental Methodology

2·1. Closure principle

The closure principle of Marcus et al. (1976) is a general method for multiple hypothesis testing. A formal description is given in Finner & Strassburger (2002), and we adopt similar notation here. Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a family of probability measures defined on a common sample space $(\Omega, \mathcal{F})$, where $\Theta$ is a multi-dimensional parameter space. Suppose that we wish to test a family of null hypotheses $\mathcal{H} = \{H_i : i \in \mathcal{I}\}$, where $H_i \subset \Theta$ for each $i$ in some index set $\mathcal{I}$. Let $\psi = \{\psi_i : i \in \mathcal{I}\}$ denote a multiple test of $\mathcal{H}$, with each component $\psi_i$ taking value 0 or 1 corresponding to nonrejection or rejection of $H_i$, respectively. It is often desirable to ensure that

$$\sup_{\theta \in \Theta} P_\theta\Bigl(\bigcup_{i \in \mathcal{I}(\theta)} \{\psi_i = 1\}\Bigr) \le \alpha, \tag{1}$$

where $\mathcal{I}(\theta) = \{i \in \mathcal{I} : \theta \in H_i\}$ is the index set of true hypotheses under $\theta$. In other words, the probability of rejecting at least one true null hypothesis is bounded by $\alpha$. This is known as strong control of the familywise error rate. The closure principle can be used to ensure (1). We are required to find, for each $I \subseteq \mathcal{I}$ such that $H_I = \bigcap_{i \in I} H_i$ is nonempty, a local level-$\alpha$ test $\varphi_I$ for the intersection hypothesis $H_I$; that is, we require

$$\sup_{\theta \in H_I} P_\theta(\varphi_I = 1) \le \alpha, \tag{2}$$

where $\varphi_I$ takes values in $\{0, 1\}$ with the usual interpretation. If we define $\psi_i = \min_{I : H_I \neq \emptyset,\, H_I \subseteq H_i} \varphi_I$, then (1) holds. This can be very useful, as in many applications it is easy to find tests satisfying (2), whereas validating (1) directly is hard.
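The construction of ψ from the local tests can be sketched in a few lines. The following is a minimal illustration, assuming that every intersection hypothesis is nonempty and that H_I ⊆ H_i exactly when i ∈ I (as holds for the one-sided hypotheses considered later). The function names, the Bonferroni local test and the p-values are hypothetical choices made only for this sketch.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence


def closed_test(
    indices: Sequence[str],
    local_test: Callable[[FrozenSet[str]], int],
) -> Dict[str, int]:
    """Return psi_i = min of phi_I over all nonempty index sets I containing i."""
    subsets = [
        frozenset(c)
        for size in range(1, len(indices) + 1)
        for c in combinations(indices, size)
    ]
    phi = {I: local_test(I) for I in subsets}  # local level-alpha decisions
    return {i: min(phi[I] for I in subsets if i in I) for i in indices}


def bonferroni_local_test(pvalues: Dict[str, float], alpha: float):
    """Hypothetical local test: reject H_I if |I| * min_{i in I} p_i <= alpha."""
    def phi(I: FrozenSet[str]) -> int:
        return int(len(I) * min(pvalues[i] for i in I) <= alpha)
    return phi


# Hypothetical p-values, for illustration only.
pvalues = {"H1": 0.01, "H2": 0.04, "H3": 0.30}
decisions = closed_test(list(pvalues), bonferroni_local_test(pvalues, alpha=0.05))
print(decisions)  # {'H1': 1, 'H2': 0, 'H3': 0}
```

Here H2 has an elementary p-value below 0.05 but is retained because the intersection of H2 and H3 is not rejected, which is the dependence between decisions that the closure principle introduces.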

2·2. Combination test

Fisher (1932) discussed combining independent p-values to test a single null hypothesis. For convenience and brevity, we will only consider two-stage designs. We define a p-value combination function $Q : [0, 1]^2 \to [0, 1]$ that is left-continuous and nondecreasing in both its arguments and for which $Q(U, V)$ is uniformly distributed on $[0, 1]$ whenever $U$ and $V$ are independent and uniformly distributed on $[0, 1]$. An example is

$$Q(u, v) = 1 - \Phi\bigl[2^{-1/2}\{\Phi^{-1}(1 - u) + \Phi^{-1}(1 - v)\}\bigr], \tag{3}$$

where Φ denotes the standard normal distribution function.
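As a minimal sketch, the inverse normal combination function (3) can be evaluated with Python's standard library. The function name and the handling of the boundary values 0 and 1 are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_normal_combination(u: float, v: float) -> float:
    """Q(u, v) = 1 - Phi[2^(-1/2) {Phi^-1(1 - u) + Phi^-1(1 - v)}]."""
    # Assumed boundary conventions: a p-value of 1 forces Q = 1;
    # otherwise a p-value of 0 forces Q = 0.
    if u >= 1.0 or v >= 1.0:
        return 1.0
    if u <= 0.0 or v <= 0.0:
        return 0.0
    # Since Phi^-1(1 - p) = -Phi^-1(p), formula (3) is equivalent to
    # Q = Phi[{Phi^-1(u) + Phi^-1(v)} / sqrt(2)], which is numerically stabler.
    z = (STD_NORMAL.inv_cdf(u) + STD_NORMAL.inv_cdf(v)) / sqrt(2.0)
    return STD_NORMAL.cdf(z)
```

Two moderately small stage-wise p-values combine into a smaller one: `inverse_normal_combination(0.05, 0.05)` is close to 0.01.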

Such a combination function lends itself to a two-stage adaptive closed test, $\psi$, for a family of null hypotheses, $\mathcal{H}$. An important application, discussed in Bretz et al. (2009), is a seamless phase II/III confirmatory clinical trial. We henceforth restrict attention to a parameter $\theta = (\theta_1, \ldots, \theta_K)$ taking values in the parameter space $\Theta = \mathbb{R}^K$ and a family of null hypotheses $\mathcal{H} = \{H_k : k \in T_1\}$, where $T_1 = \{1, \ldots, K\}$ and $H_k = \{\theta_k \le \delta_k\}$ $(k \in T_1)$ for some constants $\delta_1, \ldots, \delta_K$. The $\theta_k$ $(k \in T_1)$ might correspond to the mean effects of $K$ different treatments, for example. By defining local tests $\varphi_I$ $(I \subseteq T_1)$ via a combination function $Q$, it is possible to make data-dependent modifications to the trial design at an interim analysis (cf. Bauer & Kieser, 1999; Hommel, 2001; Brannath et al., 2002). For instance, attention can be focused on a subset $T_2 \subseteq T_1$ of the initial hypotheses of interest; changes can be made to sample sizes, allocation ratios, etc.

2·3. Two-stage closed test procedure

Assume that the full first-stage trial data are represented by a random vector $X \in \mathbb{R}^n$ with distribution function $G(x; \theta)$. Prior to starting the trial, one must specify a combination function $Q$ and, for each $I \subseteq T_1$, a first-stage test of $H_I = \bigcap_{i \in I} H_i$ with an associated p-value function $p_I^{(1)} : \mathbb{R}^n \to [0, 1]$ that satisfies $\sup_{\theta \in H_I} \int \mathbf{1}\{p_I^{(1)}(x) \le u\}\, dG(x; \theta) \le u$ for all $u \in [0, 1]$. The second-stage design is unspecified.

At the interim analysis, the experimenter defines a second-stage design, $d$, by choosing a subset of the original hypotheses, indexed by $T_2 \subseteq T_1$, to continue studying in the second stage, along with second-stage sample sizes and, for each $I \subseteq T_1$, a second-stage hypothesis test for $H_I$. See below for a proposal for choosing second-stage tests for $H_I$ where $I \nsubseteq T_2$. We assume that the design $d$ is allowed to depend on the unblinded first-stage data $x$ without prespecifying an adaptation rule. Let $Y$ denote the data collected at the second stage, taking values in $\mathbb{R}^m$, and let $p_{I,x,d}^{(2)}(y)$ $(I \subseteq T_1)$ denote the p-value functions of the second-stage tests. Because the tests used in the second stage depend on the first-stage data $x$ and the chosen design $d$, the p-value functions will in general depend on both.

Let $F_{x,d}(y; \theta)$ denote the distribution function of the second-stage data, given the chosen design $d$ and interim data $x$. We assume that for all $x$, $d$ and $I \subseteq T_1$, the second-stage p-values $p_{I,x,d}^{(2)}$ satisfy $\sup_{\theta \in H_I} \int \mathbf{1}\{p_{I,x,d}^{(2)}(y) \le u\}\, dF_{x,d}(y; \theta) \le u$ for all $u \in [0, 1]$. The distribution $F_{x,d}$ is assumed to be known, i.e., not merely specified up to a null set, for all $x$ and $d$, a condition that can be formalized by assuming an appropriate regression model (Brannath et al., 2012). See § 3·2 for a numerical example.

At the final analysis, for each $I \subseteq T_1$, the test decision is $\varphi_I = 1$ if and only if $Q\{p_I^{(1)}, p_{I,x,d}^{(2)}\} \le \alpha$. As shown in Brannath et al. (2012), this combination test for $H_I$ controls the Type I error rate at level $\alpha$.

We assume that only data for the hypotheses indexed by $T_2$ are collected in the second stage and propose setting $p_I^{(2)} = p_{I \cap T_2}^{(2)}$ for $I \nsubseteq T_2$, where we drop the indices $x$ and $d$ for simplicity and set $p_\emptyset^{(2)} = 1$ by convention. Such second-stage p-values have the required distribution under $H_{I \cap T_2}$ and hence also under $H_I$.

We emphasize that while Type I error control is guaranteed even if the second-stage design is initially open-ended, in the design of actual clinical trials it is crucial to perform detailed planning based on likely first-stage outcomes. The added flexibility is necessary because it is impossible to foresee all eventualities in extremely complex areas such as clinical drug development.

3. Confidence regions

3·1. Partitioning the parameter space

A standard approach to deriving a 100(1 − α)% confidence set for θ is to perform a level-α test of each elementary hypothesis {θ = θ*} (θ* ∈ Θ) and include all θ* corresponding to nonrejected hypotheses (see, e.g., Lehmann, 1986, p. 90). To ensure compatibility with closed testing, the key idea (Stefansson et al., 1988; Hayter & Hsu, 1994; Finner & Strassburger, 2002) is to partition the parameter space into disjoint regions

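$$\Theta_I = \{\theta \in \Theta : \theta_i \le \delta_i,\ i \in I;\ \theta_i > \delta_i,\ i \in T_1 \setminus I\} \quad (I \subseteq T_1)$$

and apply different tests in each of the disjoint $\Theta_I$. If, for each $I \subseteq T_1$, we let $\{\varphi_I(\theta^*) : \theta^* \in \Theta\}$ denote a family of tests with

$$\inf_{\theta^* \in \Theta} P_{\theta^*}\{\varphi_I(\theta^*) = 0\} \ge 1 - \alpha, \tag{4}$$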

where φI (θ*) takes values in {0, 1} with the usual interpretation, we can apply the following general result from Hsu (1996, p. 234).

Lemma 1. A level-100(1 − α)% confidence set for θ is

$$C = \bigcup_{I \subseteq T_1} \bigl[\{\theta^* \in \Theta : \varphi_I(\theta^*) = 0\} \cap \Theta_I\bigr]. \tag{5}$$

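Our aim is to find families of tests such that $C$ is compatible with the two-stage closed test procedure. This requires us to augment our specification of $p_{I \cap T_j}^{(j)}$ $(j = 1, 2;\ I \subseteq T_1)$ with a family of p-values $\{p_{I \cap T_j}^{(j)}(\theta^*) : \theta^* \in \Theta\}$ where, under $\{\theta = \theta^*\}$, the distributions of $p_I^{(1)}(\theta^*)$ and $p_{I \cap T_2}^{(2)}(\theta^*)$ meet the conditions outlined for $p_I^{(1)}$ and $p_{I \cap T_2}^{(2)}$ in § 2·3. Additionally, if we treat the data as fixed and view each family as a function $p_{I \cap T_j}^{(j)} : \Theta \to [0, 1]$, then unless $I \cap T_j = \emptyset$, $p_{I \cap T_j}^{(j)}(\theta^*)$ is constant in all arguments $\theta_i^*$ such that $i \notin I \cap T_j$, and is left-continuous and nondecreasing in all arguments $\theta_i^*$ such that $i \in I \cap T_j$, with $p_{I \cap T_j}^{(j)}(\theta^*) = p_{I \cap T_j}^{(j)}$ for any $\theta^*$ such that $\theta_i^* = \delta_i$ for all $i \in I \cap T_j$. Furthermore, we assume that

$$\lim_{\theta_i^* \to \infty,\ i \in T_2} p_\emptyset^{(2)}(\theta^*) = 1. \tag{6}$$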

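Proposition 1. Inserted into (5), the following families of hypothesis tests give rise to a 100(1 − α)% confidence set for $\theta$, denoted by $C$, that is compatible with the two-stage closed test procedure, i.e., $\psi_k = 1$ if and only if $H_k \cap C = \emptyset$: for $\emptyset \neq I \subseteq T_1$ and $\theta^* \in \Theta$,

$$\varphi_I(\theta^*) = \begin{cases} 1, & Q\{p_I^{(1)}(\theta^*), p_{I \cap T_2}^{(2)}(\theta^*)\} \le \alpha, \\ 0, & Q\{p_I^{(1)}(\theta^*), p_{I \cap T_2}^{(2)}(\theta^*)\} > \alpha, \end{cases} \tag{7}$$

and $\{\varphi_\emptyset(\theta^*) : \theta^* \in \Theta\}$ is any family of tests satisfying (4).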

Proof. See the Appendix.

There will be no unique collection of families of p-values satisfying the aforementioned distributional and monotonicity constraints. Rather, the families must be specified in a two-stage procedure in an analogous way to the p-values in § 2·3. As will become clear from the example below, for many commonly encountered scenarios and when $I \cap T_j \neq \emptyset$, the choice of $\{p_{I \cap T_j}^{(j)}(\theta^*) : \theta^* \in \Theta\}$ will be obvious from the choice of $p_{I \cap T_j}^{(j)}$. As a simple example, suppose that $p_{\{k\}}^{(j)}$ is the p-value from a one-sided z-test of the null hypothesis $\{\theta_k \le \delta_k\}$ using the stage-$j$ data only. Then the natural choice for $p_{\{k\}}^{(j)}(\theta^*)$ is the one-sided p-value from a standard z-test of $\{\theta_k \le \theta_k^*\}$ using the same stage-$j$ data.
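The z-test example can be made concrete with a short sketch. The estimate, standard error, δ and α below are hypothetical values, and `z_pvalue` is a name chosen for this illustration. The sketch checks that the family is nondecreasing in the hypothesised value and that inverting it recovers the familiar one-sided lower bound of a single z-test.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_pvalue(estimate: float, std_error: float, theta_star: float) -> float:
    """One-sided p-value for the null {theta_k <= theta_star} from one stage."""
    return STD_NORMAL.cdf(-(estimate - theta_star) / std_error)


# Hypothetical stage-j summary statistics, for illustration only.
estimate, std_error, delta, alpha = 0.09, 0.05, 0.0, 0.025

# At theta_star = delta the family reproduces the original test's p-value.
p_at_delta = z_pvalue(estimate, std_error, delta)

# The family is nondecreasing in theta_star, as the monotonicity condition requires.
grid = [delta + 0.01 * i for i in range(-10, 11)]
values = [z_pvalue(estimate, std_error, t) for t in grid]
is_nondecreasing = all(a <= b for a, b in zip(values, values[1:]))

# {theta_star : p(theta_star) > alpha} is the interval above the usual bound.
lower_bound = estimate - STD_NORMAL.inv_cdf(1.0 - alpha) * std_error
```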

While for $I \cap T_j \neq \emptyset$ there will often be a natural choice for $p_{I \cap T_j}^{(j)}(\theta^*)$, it is unclear how $\varphi_\emptyset(\theta^*)$ and $p_\emptyset^{(2)}(\theta^*)$ should be chosen. A reasonable suggestion is given below.

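Corollary 1. Define $p_\emptyset^{(j)}(\theta^*) = p_{T_j}^{(j)}(\theta^*)$ for $j = 1, 2$. The following is a 100(1 − α)% confidence region for $\theta$ that is compatible with the two-stage closed test procedure:

$$C_1 = \bigcup_{I \subseteq T_1} \bigl[\theta^* \in \Theta_I : Q\{p_I^{(1)}(\theta^*), p_{I \cap T_2}^{(2)}(\theta^*)\} > \alpha\bigr]. \tag{8}$$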

The properties of a region defined by (8) are best illustrated by a specific example.

3·2. Example

Posch et al. (2005) considered a clinical trial where three active treatments, indexed by $T_1 = \{A, B, C\}$, are compared with a placebo using a two-stage adaptive design. The individual null hypotheses of interest are $H_k = \{\theta_k \le 0\}$ $(k \in T_1)$, where $\theta_k = \pi_k - \pi_0$ denotes the difference between the success probabilities of treatment $k$ and placebo. Denote the observed success rate of treatment $k$ in stage $j$ by $\hat\pi_{k,j}$ $(k \in T_1 \cup \{0\};\ j = 1, 2)$, where treatment 0 corresponds to a placebo.

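At the design stage, the inverse normal combination function (3) is specified and $n_1 = 140$ first-stage patients are recruited to each treatment arm. Approximately, the $\hat\theta_{k,1} = \hat\pi_{k,1} - \hat\pi_{0,1}$ $(k \in T_1)$ are multivariate normal with $E(\hat\theta_{k,1}) = \theta_k$, $\mathrm{var}(\hat\theta_{k,1}) = \{\hat\pi_{k,1}(1 - \hat\pi_{k,1}) + \hat\pi_{0,1}(1 - \hat\pi_{0,1})\}/n_1$ and positive correlations. Based on this assumption, Simes (1986) tests are used for each intersection hypothesis; that is, $p_{\{k\}}^{(1)} = 1 - \Phi[\hat\theta_{k,1}\{\mathrm{var}(\hat\theta_{k,1})\}^{-1/2}]$ for $k \in T_1$ and, for $|I| > 1$, $p_I^{(1)} = \min_{k \in I} p_{\{k\}}^{(1)} |I| / R(k, I)$, where $R(k, I)$ denotes the rank of $p_{\{k\}}^{(1)}$ among $\{p_{\{i\}}^{(1)} : i \in I\}$. The natural way of augmenting these p-values is to define $p_{\{k\}}^{(1)}(\theta^*) = 1 - \Phi[(\hat\theta_{k,1} - \theta_k^*)\{\mathrm{var}(\hat\theta_{k,1})\}^{-1/2}]$ for $k \in T_1$ and $p_I^{(1)}(\theta^*) = \min_{k \in I} p_{\{k\}}^{(1)}(\theta^*) |I| / R(k, I, \theta^*)$ for $|I| > 1$, where $R(k, I, \theta^*)$ denotes the rank of $p_{\{k\}}^{(1)}(\theta^*)$ among $\{p_{\{i\}}^{(1)}(\theta^*) : i \in I\}$.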

Suppose that the unblinded first-stage results are $\hat\pi_{0,1} = 0.21$, $\hat\pi_{A,1} = 0.22$, $\hat\pi_{B,1} = 0.3$ and $\hat\pi_{C,1} = 0.36$. The experimenter decides that treatments A and C are not to be considered in the second stage owing to lack of efficacy and safety concerns, respectively. A further $n_2 = 140$ patients are recruited to both treatment B and placebo. A family of p-values with $p_{\{B\}}^{(2)}(\theta^*) = 1 - \Phi[(\hat\theta_{B,2} - \theta_B^*)\{\mathrm{var}(\hat\theta_{B,2})\}^{-1/2}]$ is chosen, where $\hat\theta_{B,2} = \hat\pi_{B,2} - \hat\pi_{0,2}$.

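Now suppose that the second-stage results are $\hat\pi_{0,2} = 0.19$ and $\hat\pi_{B,2} = 0.31$. The p-values from the elementary hypotheses are $p_{\{A\}}^{(1)} = 0.419$, $p_{\{B\}}^{(1)} = 0.0412$, $p_{\{C\}}^{(1)} = 0.00241$ and $p_{\{B\}}^{(2)} = 0.00961$. Therefore $p_{\{A,B,C\}}^{(1)} = 3p_{\{C\}}^{(1)}$, $p_{\{A,B\}}^{(1)} = 2p_{\{B\}}^{(1)}$ and $p_{\{B,C\}}^{(1)} = 2p_{\{C\}}^{(1)}$. As $\max_{I \subseteq T_1,\, B \in I} Q(p_I^{(1)}, p_B^{(2)}) \le 0.025$, that is, every intersection hypothesis involving $H_B$ is rejected, $H_B$ can be rejected at familywise level 0·025. Both $H_A$ and $H_C$ fail to be rejected, as $Q\{p_{\{k\}}^{(1)}, 1\} = 1$ for $k = A, C$. A compatible 97·5% confidence region for $\theta$ is given by

$$\bigcup_{I \subseteq T_1} \bigl\{\theta^* \in \Theta_I : Q\{p_I^{(1)}(\theta^*), p_B^{(2)}(\theta^*)\} > 0.025\bigr\}, \tag{9}$$

where $p_\emptyset^{(1)}(\theta^*)$ is defined as $p_{T_1}^{(1)}(\theta^*)$ for all $\theta^* \in \Theta$.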

The region (9) will have a complicated three-dimensional shape. However, in terms of making inference on $\theta_B$, its crucial features can be seen by taking two cross-sections, as displayed in Fig. 1. As $p_I^{(1)}(\theta^*)$ is nondecreasing in $\theta_C^*$ for all $I \subseteq T_1$, we know that for any $\gamma \in (-\infty, 0)$, the cross-section at $\theta_C^* = \gamma$ is contained in the cross-section at $\theta_C^* = 0$. Similarly, for any $\gamma \in (0, \infty)$, the cross-section at $\theta_C^* = \gamma$ is contained in the limit of the cross-section of the region as $\theta_C^* \to \infty$. One can see immediately from Fig. 1 that for any $\epsilon > 0$, the 97·5% confidence region fails to exclude all parameter vectors $\theta^*$ such that $\theta_B^* \le \epsilon$. In other words, the lower confidence bound on $\theta_B$ provides no more information than the decision of the closed test procedure.
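The numbers in this example can be checked with a short script. The sketch below is an illustration under the normal approximation described above, with variances evaluated at the observed rates; all function and variable names are chosen for this sketch only. It recomputes the elementary p-values, applies the closed test, and tests whether a given θ* lies in a region of the form (9).

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

N = NormalDist()
T1, T2, n = ("A", "B", "C"), ("B",), 140
stage1 = {"0": 0.21, "A": 0.22, "B": 0.30, "C": 0.36}  # observed success rates
stage2 = {"0": 0.19, "B": 0.31}


def p_elem(rates, k, theta_star=0.0):
    """One-sided z-test p-value for {theta_k <= theta_star} from one stage."""
    est = rates[k] - rates["0"]
    var = (rates[k] * (1 - rates[k]) + rates["0"] * (1 - rates["0"])) / n
    return N.cdf(-(est - theta_star) / sqrt(var))


def simes(pvals):
    """Simes p-value: minimum over k of |I| * p_(k) / rank(k)."""
    ordered = sorted(pvals)
    return min(len(ordered) * p / (r + 1) for r, p in enumerate(ordered))


def Q(u, v):
    """Inverse normal combination (3); assumed to equal 1 if either argument is 1."""
    if u >= 1.0 or v >= 1.0:
        return 1.0
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return N.cdf((N.inv_cdf(u) + N.inv_cdf(v)) / sqrt(2.0))


def closed_test(alpha):
    """psi_k = 1 iff Q{p_I^(1), p_{I n T2}^(2)} <= alpha for every I containing k."""
    subsets = [c for r in range(1, len(T1) + 1) for c in combinations(T1, r)]

    def q(I):
        p1 = simes([p_elem(stage1, k) for k in I])
        second = [k for k in I if k in T2]
        p2 = simes([p_elem(stage2, k) for k in second]) if second else 1.0
        return Q(p1, p2)

    return {k: int(all(q(I) <= alpha for I in subsets if k in I)) for k in T1}


def in_region(theta, alpha):
    """Is theta* in region (9)? theta* lies in Theta_I with I = {k: theta*_k <= 0}."""
    I = [k for k in T1 if theta[k] <= 0.0]
    first = I if I else T1  # p_emptyset^(1) is defined as p_{T1}^(1)
    p1 = simes([p_elem(stage1, k, theta[k]) for k in first])
    p2 = p_elem(stage2, "B", theta["B"])  # second-stage term of (9) for every I
    return Q(p1, p2) > alpha


print(closed_test(alpha=0.025))  # {'A': 0, 'B': 1, 'C': 0}
near_zero = {"A": 0.0, "B": 0.001, "C": 10.0}
print(in_region(near_zero, 0.025), in_region(near_zero, 0.05))  # True False
```

The point `near_zero` has a value of θ_B just above zero. It lies inside the 97·5% region but outside the 95% region, in line with the statements about the two panels of Fig. 1.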

Fig. 1. Cross-sections of confidence regions of the form (9) for making inference on the second-stage parameter of interest, θB, in the example of § 3·2: (a) two cross-sections of the 97·5% confidence region; (b) two cross-sections of the 95% confidence region.

For confidence intervals that are compatible with single-stage closed test procedures (Hayter & Hsu, 1994; Strassburger & Bretz, 2008; Guilbaud, 2008), a necessary condition for obtaining informative lower confidence bounds for parameters corresponding to the rejected null hypotheses is that $\psi_k = 1$ for all $k \in T_1$. In the adaptive setting, this is no longer a necessary condition. For example, repeating the above test procedure at level $\alpha = 0.05$, the compatible 95% confidence region analogous to (9) is also summarized in Fig. 1. Here it appears, and indeed can be verified by considering all values of $\theta_A$, that there does exist some $\epsilon > 0$ such that the confidence region excludes all parameter vectors $\theta^*$ for which $\theta_B^* \le \epsilon$. We will show that for the two-stage adaptive setting, a necessary condition for informative lower confidence bounds on parameters corresponding to the rejected null hypotheses is that $\psi_k = 1$ for all $k \in T_2$. However, as can be seen from Fig. 1, this condition is not sufficient.

3·3. A two-stage, single-step confidence region

Posch et al. (2005) proposed the following 100(1 − α)% confidence region:

$$C_2 = \bigl\{\theta^* \in \Theta : Q\{p_{T_1}^{(1)}(\theta^*), p_{T_2}^{(2)}(\theta^*)\} > \alpha\bigr\}. \tag{10}$$

They note that the resulting confidence intervals are not compatible with the closed test procedure described in § 2·3 (Posch et al., 2005, p. 3702). Nevertheless, the region (10) can be used to generate an alternative multiple test. More generally, any $1 - \alpha$ confidence set $C$ generates a multiple test for a family of hypotheses $\mathcal{H}$, whereby $H_k \in \mathcal{H}$ is rejected if and only if $H_k \cap C = \emptyset$. This guarantees strong control of the familywise error rate (1). The multiple test generated by (10) can be thought of as single-step in the sense that rejection or nonrejection of a null hypothesis does not take into account the decision for any other hypothesis. If $H_k$ is rejected, informative lower bounds will be available for $\theta_k$ regardless of the test decisions for all other hypotheses.

4. Computation of confidence intervals

4·1. Least-favourable parameter configurations

In the above example, marginal inference on θB was achieved by considering least-favourable parameter configurations for θk, k ∈ T1 \ {B}. This idea can be generalized to find 100(1 − α)% simultaneous confidence intervals containing (8) or (10).

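Definition 1. For $j = 1, 2$, $k \in T_1$ and $I \subseteq T_j$, the locally least-favourable $j$th-stage p-value function for $H_k$ in $\Theta_I$, $p_{k,I}^{(j)} : \mathbb{R} \to [0, 1]$, is defined for $I \neq \emptyset$ as $p_{k,I}^{(j)}(\vartheta) = p_I^{(j)}(\xi)$, where $\xi = (\xi_1, \ldots, \xi_K)$ with $\xi_i = \delta_i$ for $i \neq k$ and $\xi_k = \vartheta$. Additionally, for $j = 1, 2$,

$$p_{k,\emptyset}^{(j)}(\vartheta) = \lim_{\xi_i \to \infty,\ i \in T_j \setminus \{k\}} p_{T_j}^{(j)}(\xi) \quad (\xi_k = \vartheta). \tag{11}$$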

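Proposition 2. The smallest Cartesian product of intervals, $\times_{k \in T_1} (l_k, \infty)$, that contains the confidence region (8) has $l_k = \min_{I \subseteq T_1} l_{k,I}$, where for $k \in I$,

$$l_{k,I} = \begin{cases} \infty & (\varphi_I = 1), \\ \sup\bigl\{\vartheta : Q\{p_{k,I}^{(1)}(\vartheta), p_{k,I \cap T_2}^{(2)}(\vartheta)\} \le \alpha\bigr\} & (\varphi_I = 0), \end{cases} \tag{12}$$

and for $k \notin I$,

$$l_{k,I} = \max\bigl(\delta_k, \sup\bigl\{\vartheta : Q\{p_{k,I}^{(1)}(\vartheta), p_{k,I \cap T_2}^{(2)}(\vartheta)\} \le \alpha\bigr\}\bigr). \tag{13}$$

Furthermore, these intervals are compatible with the two-stage closed test procedure, i.e., $\psi_k = 1$ if and only if $H_k \cap \times_{k \in T_1} (l_k, \infty) = \emptyset$.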

Proof. See the Appendix.

In general, finding each interval requires one-dimensional root finding for each I ⊆ T1, a calculation that is O(2^K). However, substantial shortcuts are available for reducing the computational burden.

4·2. Efficient computation of confidence bounds

There are two possible scenarios at the end of the closed test procedure: either $\psi_k = 1$ for all $k \in T_2$, or at least one $H_k$ $(k \in T_2)$ fails to be rejected. In the latter case, there exists some $I \subseteq T_1$ with $I \cap T_2 \neq \emptyset$ such that for any $k \in T_2$,

$$\alpha < Q(p_I^{(1)}, p_{I \cap T_2}^{(2)}) = Q\{p_{k,I}^{(1)}(\delta_k), p_{k,I \cap T_2}^{(2)}(\delta_k)\}$$

and therefore $l_k \le l_{k,I} \le \delta_k$. Due to the compatibility of the intervals with the closed test procedure, if $\psi_k = 1$, then $l_k = \delta_k$; if $\psi_k = 0$, then $l_k < \delta_k$.

If $\psi_k = 1$ for all $k \in T_2$, then $l_k \ge \delta_k$ for all $k \in T_2$. Additionally, we can use the fact that for all $k \in T_2$ and $I \subseteq T_1$ with $I \cap T_2 \neq \emptyset$, we know from (12) and (13) that $l_{k,I} = \infty$; so, when finding $l_k = \min_{I \subseteq T_1} l_{k,I}$ in Proposition 2, the minimum can be taken over a much smaller number of $l_{k,I}$. The following algorithm finds the lower bounds for all parameters corresponding to the rejected hypotheses.

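Step 1. Perform the closed test procedure. If $\psi_{k'} = 0$ for some $k' \in T_2$, then $l_k = \delta_k$ for $\psi_k = 1$ and $l_k < \delta_k$ for $\psi_k = 0$. If $\psi_k = 1$ for all $k \in T_2$, go to Step 2.

Step 2. Find $p_M = \max_{I \subseteq T_1 \setminus T_2} p_I^{(1)}$. If $T_1 \setminus T_2 = \emptyset$, then $p_M = 0$.

Step 3. For $k \in T_2$,

$$l_k = \max\bigl[\delta_k, \sup\bigl\{\vartheta : Q[\max\{p_M, p_{k,\emptyset}^{(1)}(\vartheta)\}, p_{k,\emptyset}^{(2)}(\vartheta)] \le \alpha\bigr\}\bigr].$$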

The cost of computing the intervals for $\theta_k$ $(k \in T_2)$ in Step 3 is linear in the number of parameters. Step 1 is $O(2^{|T_1|})$, but a shortcut of $O(|T_1|^2)$ is given in Brannath & Bretz (2010). Step 2 is $O(2^{|T_1 \setminus T_2|})$, but a shortcut of size $|T_1 \setminus T_2|$ is available, provided there exists an ordering $i_1, \ldots, i_k$ of $T_1 \setminus T_2$ such that for each $u \in \{1, \ldots, k\}$, $p_J^{(1)} \le p_L^{(1)}$ for all $J \subseteq L \subseteq \{i_u, \ldots, i_k\}$ with $i_u \in J$. This is because we only have to check $p_{\{i_u, \ldots, i_k\}}^{(1)}$ for $u = 1, \ldots, k$. Many common multiple test procedures, such as those based on Dunnett (1955) tests or weighted Bonferroni tests, satisfy this condition, with the ordering $i_1, \ldots, i_k$ following the ordering of the univariate test statistics or the weighted elementary p-values (Brannath & Bretz, 2010).
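The Step 2 shortcut can be illustrated for unweighted Bonferroni intersection p-values, the equal-weights special case of the weighted Bonferroni tests mentioned above. In this sketch the function names and the p-values for the dropped hypotheses are hypothetical. The brute-force maximum over all nonempty subsets is compared with the maximum over the nested sets {i_u, …, i_k}, with the ordering given by increasing elementary p-values.

```python
from itertools import combinations
from typing import Dict, Iterable


def bonferroni_p(pvals: Dict[str, float], I: Iterable[str]) -> float:
    """Intersection p-value min(1, |I| * min_{i in I} p_i)."""
    members = list(I)
    return min(1.0, len(members) * min(pvals[i] for i in members))


def p_max_brute_force(pvals: Dict[str, float]) -> float:
    """Maximum over all 2^k - 1 nonempty subsets (exponential cost)."""
    keys = list(pvals)
    subsets = [c for r in range(1, len(keys) + 1) for c in combinations(keys, r)]
    return max((bonferroni_p(pvals, I) for I in subsets), default=0.0)


def p_max_shortcut(pvals: Dict[str, float]) -> float:
    """Maximum over the k nested sets {i_u, ..., i_k} only (linear cost)."""
    order = sorted(pvals, key=pvals.get)  # i_1, ..., i_k by increasing p-value
    tails = [order[u:] for u in range(len(order))]
    return max((bonferroni_p(pvals, I) for I in tails), default=0.0)


# Hypothetical first-stage p-values for hypotheses dropped at the interim.
dropped = {"A": 0.419, "C": 0.00241, "D": 0.08, "E": 0.2}
print(p_max_brute_force(dropped), p_max_shortcut(dropped))  # 0.419 0.419
```

Any nonempty subset whose smallest-ordered member is i_u is contained in {i_u, …, i_k}, so under the ordering condition its p-value cannot exceed that of the nested set.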

4·3. Lower bounds for parameters corresponding to retained hypotheses

Consider $k \in T_2$ such that $\psi_k = 0$. We know that $l_k < \delta_k$, and therefore we need only consider $l_{k,I}$ such that $k \in I$. However, since in general $l_{k,I} < \infty$, finding the minimum such lower bound will still have a computational cost that is exponential in the number of parameters.

For $k \in I \subseteq T_1 \setminus T_2$, we have $p_{k,I \cap T_2}^{(2)}(\vartheta) = p_{k,\emptyset}^{(2)}(\vartheta)$ and know from (11) and (6) that this is equal to 1. Many commonly used combination functions, including (3), have the property that $v = 1$ implies $Q(u, v) = 1$. In this case, $l_k = -\infty$ for all $k \in T_1 \setminus T_2$.
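For the inverse normal function (3), the property can be checked directly. The following short derivation is a sketch that assumes $u \in (0, 1]$ and uses the left-continuity of $Q$ in its second argument:

$$Q(u, 1) = \lim_{v \uparrow 1} \Bigl(1 - \Phi\bigl[2^{-1/2}\{\Phi^{-1}(1 - u) + \Phi^{-1}(1 - v)\}\bigr]\Bigr) = 1 - \Phi(-\infty) = 1,$$

because $\Phi^{-1}(1 - v) \to -\infty$ as $v \uparrow 1$ while $\Phi^{-1}(1 - u) < \infty$. Consequently, for $k \in I \subseteq T_1 \setminus T_2$ the set $\{\vartheta : Q\{p_{k,I}^{(1)}(\vartheta), 1\} \le \alpha\}$ is empty for every $\alpha < 1$, its supremum is $-\infty$, and (12) gives $l_{k,I} = -\infty$.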

4·4. Lower bounds for the two-stage single-step procedure

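Posch et al. (2005) showed that the region (10) is contained in a rectangle, $\times_{k \in T_1} (l_k, \infty)$, where

$$l_k = \sup\bigl\{\vartheta : Q\{p_{k,\emptyset}^{(1)}(\vartheta), p_{k,\emptyset}^{(2)}(\vartheta)\} \le \alpha\bigr\}. \tag{14}$$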

The computation of each interval requires only a one-dimensional search for a root, and overall computation will be linear in the number of parameters.
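The one-dimensional search in (14) can be sketched with a plain bisection, shown here for treatment B using the data of § 3·2. Several choices are assumptions of this illustration rather than part of the method: the bracket [−1, 1] (natural for a difference of proportions), the tolerance, and the function names. With Simes first-stage tests and the other two arguments tending to infinity, the least-favourable first-stage p-value reduces to min{1, 3 p_B(ϑ)}.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()
n, K = 140, 3  # patients per arm and stage; number of first-stage treatments
stage1 = {"0": 0.21, "B": 0.30}  # observed success rates, placebo and B
stage2 = {"0": 0.19, "B": 0.31}


def p_elem(rates, theta_star):
    """One-sided z-test p-value for {theta_B <= theta_star} from one stage."""
    est = rates["B"] - rates["0"]
    var = (rates["B"] * (1 - rates["B"]) + rates["0"] * (1 - rates["0"])) / n
    return N.cdf(-(est - theta_star) / sqrt(var))


def Q(u, v):
    """Inverse normal combination (3); assumed to equal 1 if either argument is 1."""
    if u >= 1.0 or v >= 1.0:
        return 1.0
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return N.cdf((N.inv_cdf(u) + N.inv_cdf(v)) / sqrt(2.0))


def g(theta_star):
    """Q{p_{B,emptyset}^(1), p_{B,emptyset}^(2)} as a function of theta_star."""
    p1 = min(1.0, K * p_elem(stage1, theta_star))  # Simes, other p-values -> 1
    p2 = p_elem(stage2, theta_star)
    return Q(p1, p2)


def single_step_bound(alpha, lo=-1.0, hi=1.0, tol=1e-10):
    """sup{t in [lo, hi] : g(t) <= alpha} by bisection; g is nondecreasing."""
    if g(lo) > alpha:
        return float("-inf")  # no value in the bracket satisfies g <= alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo


print(round(single_step_bound(0.025), 3))
```

Each bound needs a number of evaluations of Q that depends only on the tolerance, which is why the overall cost is linear in the number of parameters.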

4·5. Example continued

Recall from § 3·2 that T2 = {B} and ψB = 1. Proceeding to Step 2 of the above algorithm, pM =0·419. In this case we need just one iteration in Step 3, because

$$Q[\max\{0.419, p_{B,\emptyset}^{(1)}(0)\}, p_{B,\emptyset}^{(2)}(0)] = 0.0360 > 0.025,$$

and therefore the 97·5% confidence interval for θB is (0, ∞), consistent with Fig. 1. This example emphasizes that there is a price to pay for the additional power of the closed test as opposed to the single-step procedure of § 3·3 with, by (14),

$$l_B = \sup\bigl\{\vartheta : Q\{p_{B,\emptyset}^{(1)}(\vartheta), p_{B,\emptyset}^{(2)}(\vartheta)\} \le 0.025\bigr\} = 0.0159.$$

While this agrees with the assertion θB > 0 in this specific case, it is invalid to claim it as a 97·5% lower confidence bound if the closed test procedure of § 2·3 had been planned. One can see that for any α > 0·036, the 100(1 − α)% confidence interval for treatment B that is compatible with the closed test procedure has a positive lower bound. For example, the 95% lower confidence bound is lB = 0·0112, consistent with Fig. 1. Again, if the region (10) had been specified pre-trial, the 95% lower confidence bound (14) would have been lB = 0·0252.
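Steps 1 to 3 of § 4·2 can be traced for this example with the following sketch. It assumes the normal approximation and Simes tests of § 3·2; the bisection bracket, the tolerance and all names are choices made only for this illustration, and the case in which Step 1 fails is not developed.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

N = NormalDist()
n, T1, T2 = 140, ("A", "B", "C"), ("B",)
stage1 = {"0": 0.21, "A": 0.22, "B": 0.30, "C": 0.36}
stage2 = {"0": 0.19, "B": 0.31}


def p_elem(rates, k, theta_star=0.0):
    """One-sided z-test p-value for {theta_k <= theta_star} from one stage."""
    est = rates[k] - rates["0"]
    var = (rates[k] * (1 - rates[k]) + rates["0"] * (1 - rates["0"])) / n
    return N.cdf(-(est - theta_star) / sqrt(var))


def simes(pvals):
    ordered = sorted(pvals)
    return min(len(ordered) * p / (r + 1) for r, p in enumerate(ordered))


def Q(u, v):
    """Inverse normal combination (3); assumed to equal 1 if either argument is 1."""
    if u >= 1.0 or v >= 1.0:
        return 1.0
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return N.cdf((N.inv_cdf(u) + N.inv_cdf(v)) / sqrt(2.0))


def nonempty_subsets(keys):
    return [c for r in range(1, len(keys) + 1) for c in combinations(keys, r)]


def compatible_bound_B(alpha, delta=0.0):
    """Lower bound l_B compatible with the closed test, via Steps 1 to 3."""
    # Step 1: psi_B = 1 requires every intersection containing B to be rejected.
    p2 = p_elem(stage2, "B")
    rejected = all(
        Q(simes([p_elem(stage1, k) for k in I]), p2) <= alpha
        for I in nonempty_subsets(T1) if "B" in I
    )
    if not rejected:
        return None  # Step 1 only gives l_B < delta here; not computed in this sketch
    # Step 2: p_M is the largest first-stage p-value over subsets of T1 \ T2.
    dropped = tuple(k for k in T1 if k not in T2)
    p_m = max(simes([p_elem(stage1, k) for k in I]) for I in nonempty_subsets(dropped))

    # Step 3: bisection for sup{t : g(t) <= alpha}, with g nondecreasing in t.
    def g(t):
        p1 = min(1.0, len(T1) * p_elem(stage1, "B", t))  # p_{B,emptyset}^(1)(t)
        return Q(max(p_m, p1), p_elem(stage2, "B", t))

    if g(delta) > alpha:
        return delta
    lo, hi = delta, 1.0
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) <= alpha else (lo, mid)
    return lo


print(compatible_bound_B(0.025), round(compatible_bound_B(0.05), 3))
```

At level 0·025 the function returns δ_B = 0 because the combined p-value at ϑ = 0 already exceeds the level, whereas at level 0·05 the search yields a strictly positive bound.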

5. Confidence bounds for closed tests based on the conditional error rate

Consider again the two-stage closed test procedure of § 2·3. As an alternative to combination tests, Koenig et al. (2008) used the conditional error approach (Proschan & Hunsberger, 1995) to derive local tests $\varphi_I$ $(I \subseteq T_1)$. The only difference is that instead of prespecifying a combination function $Q$ and first-stage p-value $p_I^{(1)}$, one must prespecify a measurable conditional error function $A_I : \mathbb{R}^n \to [0, 1]$ such that

$$\sup_{\theta \in H_I} \int_{\mathbb{R}^n} A_I(x)\, dG(x; \theta) \le \alpha$$

and, at the final analysis, $\varphi_I = 1$ if and only if $p_{I \cap T_2}^{(2)} \le A_I(x)$.

To produce a compatible 100(1 − α)% confidence region for $\theta$, each $A_I$ $(I \subseteq T_1)$ must be augmented with a family of conditional error functions $\{A_I(\theta^*) : \theta^* \in \Theta\}$ such that $\int_{\mathbb{R}^n} A_I(\theta^*)(x)\, dG(x; \theta^*) \le \alpha$ and, for fixed $x \in \mathbb{R}^n$, $A_I(\theta^*)$ is constant in all arguments $\theta_i^*$ with $i \notin I$ and is left-continuous and nonincreasing in all arguments $\theta_i^*$ with $i \in I$. Furthermore, $A_I(\theta^*) = A_I$ for all $\theta^* \in \Theta$ such that $\theta_i^* = \delta_i$ for $i \in I$. The second-stage p-values $p_{I \cap T_2}^{(2)}$ $(I \subseteq T_1)$ must be augmented with a family $\{p_{I \cap T_2}^{(2)}(\theta^*) : \theta^* \in \Theta\}$ as described in § 3·1.

Müller & Schäfer (2004) propose defining $A_I = \sup_{\theta^* \in H_I} E_{\theta^*}(\phi_I \mid X)$, where $\phi_I$ is a pre-planned fixed sample level-$\alpha$ test for $H_I$. In many situations the natural choice for $A_I(\theta^*)$ will be obvious from $A_I$. For example, if $\phi_I$ is the decision function for a Dunnett (1955) test of $H_I = \bigcap_{k \in I} \{\theta_k \le \delta_k\}$, then it is natural to choose $A_I(\theta^*) = E_{\theta^*}(\phi_{I,\theta^*} \mid X)$, where $\phi_{I,\theta^*}$ is the decision function for a Dunnett test of $\bigcap_{k \in I} \{\theta_k \le \theta_k^*\}$, which can be derived via a corresponding translation of the test statistics.
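To make the conditional error idea concrete, the sketch below treats a deliberately simplified setting that is not the Dunnett case discussed above: a single hypothesis tested by a fixed-sample one-sided z-test with known variance, pooling n1 first-stage and n2 second-stage observations. The conditional error given the first-stage statistic z1 is evaluated at the boundary of the null hypothesis, where the supremum is attained. The function names and the numerical integration settings are assumptions of this illustration.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()


def conditional_error(z1: float, n1: int, n2: int, alpha: float) -> float:
    """P(pooled z-test rejects | first-stage z1) at the boundary of the null."""
    c = N.inv_cdf(1.0 - alpha)  # critical value of the fixed-sample test
    # Pooled statistic: Z = (sqrt(n1) Z1 + sqrt(n2) Z2) / sqrt(n1 + n2).
    return N.cdf(-(c * sqrt(n1 + n2) - sqrt(n1) * z1) / sqrt(n2))


def average_conditional_error(n1, n2, alpha, half_width=10.0, steps=4000):
    """Trapezoidal approximation to the integral of A(z1) * phi(z1) over z1."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * conditional_error(z, n1, n2, alpha) * N.pdf(z)
    return total * h


# The conditional error function integrates to alpha under the null boundary.
print(round(average_conditional_error(140, 140, 0.025), 6))
```

Because the average of the conditional error equals α, any second-stage test whose p-value is compared with A(z1) keeps the overall level, whatever second-stage design is chosen.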

Using the arguments of Propositions 1 and 2, it can be shown that, analogously to (8), a compatible 100(1 − α)% confidence region for θ is

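$$\bigcup_{I \subseteq T_1} \bigl\{\theta^* \in \Theta_I : p_{I \cap T_2}^{(2)}(\theta^*) > A_I(\theta^*)\bigr\},$$

where $p_\emptyset^{(2)}(\theta^*)$ and $A_\emptyset(\theta^*)$ are set equal to $p_{T_2}^{(2)}(\theta^*)$ and $A_{T_1}(\theta^*)$ respectively. Also, the largest compatible 100(1 − α)% confidence lower bounds are $l_k = \min_{I \subseteq T_1} l_{k,I}$, where for $k \in I$,

$$l_{k,I} = \begin{cases} \infty & (\varphi_I = 1), \\ \sup\bigl\{\vartheta : p_{k,I \cap T_2}^{(2)}(\vartheta) \le A_{k,I}(\vartheta)\bigr\} & (\varphi_I = 0), \end{cases}$$

and for $k \notin I$, $l_{k,I} = \max[\delta_k, \sup\{\vartheta : p_{k,I \cap T_2}^{(2)}(\vartheta) \le A_{k,I}(\vartheta)\}]$ with $A_{k,I}(\vartheta)$ defined analogously to $p_{k,I}^{(1)}(\vartheta)$ $(k \in T_1;\ I \subseteq T_1)$ in Definition 1.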

6. Concluding remarks

The lower confidence bounds (12)–(13) provide more information about the location of θ than the decisions of the closed test procedure of § 2·3. The utility of this additional information will depend strongly on the context. In practice, the primary concern will often be to find lower bounds for the components of θ corresponding to the rejected null hypotheses. As this can be achieved using an algorithm that is O(K²), application to large-scale simultaneous inference problems is, in principle, feasible. However, these lower bounds will only be informative if all hypotheses considered in the second stage of testing are rejected, and even this may be insufficient. In practice, therefore, the lower bounds (12)–(13) are only likely to be useful in relatively small-scale problems. Furthermore, in situations where informative lower confidence bounds are deemed to be more important than the possibility of rejecting as many individual null hypotheses as possible, it would be sensible to use the intervals (14) instead of applying the closed test procedure. For large-scale simultaneous inference problems, an approach based on controlling the false coverage-statement rate (Benjamini & Yekutieli, 2005) may be more appropriate than aiming for a high simultaneous coverage probability.

Extensions to more than two stages and to allow early rejection of hypotheses are straightforward with an appropriate combination function in place of (3). An open question is how best to choose $\varphi_\emptyset(\theta^*)$ and $p_\emptyset^{(2)}(\theta^*)$. The tests we use in region (8) are a natural choice but may not be the most powerful.

Acknowledgement

This work was supported by the National Institute for Health Research and the Austrian Science Fund. The views expressed in this publication are those of the authors and not necessarily those of the National Health Service, the National Institute for Health Research or the Department of Health.

Appendix

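Proof of Proposition 1. With the assumptions in § 3·1, all tests of the form (7) satisfy condition (4), and therefore $C$ is a 100(1 − α)% confidence set for $\theta$. By the monotonicity conditions imposed on the p-values, we have $p_{I \cap T_j}^{(j)}(\theta^*) \le p_{I \cap T_j}^{(j)}$ for all $\theta^* \in \Theta_I$ $(j = 1, 2;\ I \neq \emptyset;\ I \subseteq T_1)$, so that $\Theta_I \cap C = \emptyset$ if and only if $Q(p_I^{(1)}, p_{I \cap T_2}^{(2)}) \le \alpha$. Therefore, $\psi_k = 1$ if and only if $\max_{I \subseteq T_1,\, k \in I} Q(p_I^{(1)}, p_{I \cap T_2}^{(2)}) \le \alpha$ if and only if $\bigcup_{I \subseteq T_1,\, k \in I} \Theta_I \cap C = \emptyset$. Since $\bigcup_{I \subseteq T_1,\, k \in I} \Theta_I = H_k$, we have compatibility.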

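Proof of Proposition 2. First, note the key property that $p_{k,I \cap T_j}^{(j)}(\vartheta) \ge p_{I \cap T_j}^{(j)}(\theta^*)$ for all $\theta^* \in \Theta_I$ with $\theta_k^* \le \vartheta$ $(I \subseteq T_1;\ k \in T_1;\ j = 1, 2)$.

To show that $C_1 \subseteq \times_{k \in T_1} (l_k, \infty)$, consider any $\theta^* \in \Theta \setminus \times_{k \in T_1} (l_k, \infty)$. We must have $\theta^* \in \Theta_I$ for some $I \subseteq T_1$ and $\theta_k^* \le l_k$ for some $k \in T_1$. If $k \in I$, then $\theta_k^* \le \min(\delta_k, l_{k,I})$, and (12) implies that $\alpha \ge Q\{p_{k,I}^{(1)}(\theta_k^*), p_{k,I \cap T_2}^{(2)}(\theta_k^*)\} \ge Q\{p_I^{(1)}(\theta^*), p_{I \cap T_2}^{(2)}(\theta^*)\}$. The same inequality follows from $l_{k,I} \ge \theta_k^* > \delta_k$ and (13) if $k \notin I$. Therefore, $\theta^* \notin C_1$ and $C_1 \subseteq \times_{k \in T_1} (l_k, \infty)$.

To show that no smaller interval $(l_k + \epsilon, \infty)$ is possible for any $\epsilon > 0$, we must find some $\theta^* \in C_1$ with $\theta_k^* \in (l_k, l_k + \epsilon)$. Consider a subset $I \subseteq T_1$ such that $l_k = l_{k,I}$ and therefore $Q\{p_{k,I}^{(1)}(\vartheta), p_{k,I \cap T_2}^{(2)}(\vartheta)\} > \alpha$ for all $\vartheta > l_k$. If $k \in I$ or, equivalently, $l_k < \delta_k$, take any $\theta_k^* \in (l_k, \min\{\delta_k, l_k + \epsilon\})$. If $k \notin I$ or, equivalently, $l_k \ge \delta_k$, take any $\theta_k^* \in (l_k, l_k + \epsilon)$. Now consider a parameter vector $\xi^{I,k} = (\xi_1^{I,k}, \ldots, \xi_K^{I,k})$, where $\xi_k^{I,k} = \theta_k^*$, $\xi_i^{I,k} = \delta_i$ for $k \neq i \in I$, and $\xi_i^{I,k} > \delta_i$ for $i \notin I \cup \{k\}$. All such parameter vectors $\xi^{I,k}$ are contained in $\Theta_I$, and

$$\alpha < Q\{p_{k,I}^{(1)}(\theta_k^*), p_{k,I \cap T_2}^{(2)}(\theta_k^*)\} = \lim_{\xi_i^{I,k} \to \infty,\ i \notin I \cup \{k\}} Q\{p_I^{(1)}(\xi^{I,k}), p_{I \cap T_2}^{(2)}(\xi^{I,k})\}.$$

Thus there exists some such $\xi^{I,k} \in C_1$ and hence $C_1$ is not contained in this smaller product of intervals.

Finally, $H_k \cap \times_{k \in T_1} (l_k, \infty) = \emptyset$ if and only if $l_{k,I} \ge \delta_k$ for $I \subseteq T_1$, if and only if $Q\{p_{k,I}^{(1)}(\delta_k), p_{k,I \cap T_2}^{(2)}(\delta_k)\} = Q\{p_I^{(1)}, p_{I \cap T_2}^{(2)}\} \le \alpha$ for $I \subseteq T_1$ and $k \in I$, if and only if $\psi_k = 1$.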

References

  1. Bauer P, Kieser M. Combining different phases in the development of medical treatments within a single trial. Statist. Med. 1999;18:1833–48. doi: 10.1002/(sici)1097-0258(19990730)18:14<1833::aid-sim221>3.0.co;2-3.
  2. Benjamini Y, Yekutieli D. False discovery rate controlling confidence intervals for selected parameters. J. Am. Statist. Assoc. 2005;100:71–80.
  3. Brannath W, Bretz F. Shortcuts for locally consonant closed test procedures. J. Am. Statist. Assoc. 2010;105:660–9.
  4. Brannath W, Gutjahr G, Bauer P. Probabilistic foundation of confirmatory adaptive designs. J. Am. Statist. Assoc. 2012;107:824–32.
  5. Brannath W, Posch M, Bauer P. Recursive combination tests. J. Am. Statist. Assoc. 2002;97:236–44.
  6. Bretz F, Koenig F, Brannath W, Glimm E, Posch M. Adaptive designs for confirmatory clinical trials. Statist. Med. 2009;28:1181–217. doi: 10.1002/sim.3538.
  7. Dunnett C. A multiple comparison procedure for comparing several treatments with a control. J. Am. Statist. Assoc. 1955;50:1096–121.
  8. Finner H, Strassburger K. The partitioning principle: a powerful tool in multiple decision theory. Ann. Statist. 2002;30:1194–213.
  9. Fisher RA. Statistical Methods for Research Workers. 4th ed. Oliver and Boyd; London: 1932.
  10. Guilbaud O. Simultaneous confidence regions corresponding to Holm’s stepdown procedure and other closed-testing procedures. Biomet. J. 2008;50:678–92. doi: 10.1002/bimj.200710449.
  11. Hayter AJ, Hsu JC. On the relationship between stepwise decision procedures and confidence sets. J. Am. Statist. Assoc. 1994;89:128–36.
  12. Hommel G. Adaptive modifications of hypotheses after an interim analysis. Biomet. J. 2001;43:581–9.
  13. Hsu JC. Multiple Comparisons: Theory and Methods. Chapman and Hall; London: 1996.
  14. ICH E9 Expert Working Group. Statistical principles for clinical trials: ICH harmonized tripartite guideline. Statist. Med. 1999;18:1905–42.
  15. Koenig F, Brannath W, Bretz F, Posch M. Adaptive Dunnett tests for treatment selection. Statist. Med. 2008;27:1612–25. doi: 10.1002/sim.3048.
  16. Lehmann EL. Testing Statistical Hypotheses. 2nd ed. Wiley; New York: 1986.
  17. Marcus R, Peritz E, Gabriel KR. On closed testing procedures with special reference to ordered analysis of variance. Biometrika. 1976;63:655–60.
  18. Müller HH, Schäfer H. A general statistical principle for changing a design any time during the course of a trial. Statist. Med. 2004;23:2497–508. doi: 10.1002/sim.1852.
  19. Posch M, Koenig F, Branson M, Brannath W, Dunger-Baldauf C, Bauer P. Testing and estimation in flexible group sequential designs with adaptive treatment selection. Statist. Med. 2005;24:3697–714. doi: 10.1002/sim.2389.
  20. Proschan M, Hunsberger S. Designed extension of studies based on conditional power. Biometrics. 1995;51:1315–24.
  21. Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751–4.
  22. Stefansson G, Kim W, Hsu J. On confidence sets in multiple comparisons. In: Gupta SS, Berger JO, editors. Statistical Decision Theory and Related Topics IV. Springer; New York: 1988. pp. 89–104.
  23. Strassburger K, Bretz F. Compatible simultaneous lower confidence bounds for the Holm procedure and other Bonferroni-based closed tests. Statist. Med. 2008;27:4914–27. doi: 10.1002/sim.3338.
