Summary
We describe a general method for finding a confidence region for a parameter vector that is compatible with the decisions of a two-stage closed test procedure in an adaptive experiment. The closed test procedure is characterized by the fact that rejection or nonrejection of a null hypothesis may depend on the decisions for other hypotheses and the compatible confidence region will, in general, have a complex, nonrectangular shape. We find the smallest cross-product of simultaneous confidence intervals containing the region and provide computational shortcuts for calculating the lower bounds on parameters corresponding to the rejected null hypotheses. We illustrate the method with an adaptive phase II/III clinical trial.
Keywords: Closed testing principle, Combination test, Conditional error, Multiple comparisons, Simultaneous inference
1. Introduction
For experiments designed to make inference about a parameter vector θ = (θ1, … , θK), it is common to find confidence intervals for all of the individual θk such that the simultaneous coverage probability is at least 1 − α. Sometimes, though, an experimenter will only attempt to assert that an individual parameter exceeds a specific value, say θk > δk. If this cannot be achieved in such a way that the probability of making at least one incorrect rejection in a family of hypotheses Hk = {θk ⩽ δk} (k = 1, … , K) is no greater than α, the experimenter will not assert anything about θk. The latter method of inference is used in so-called closed test procedures (Marcus et al., 1976), and its advantage is often greater power.
For experiments conducted in a single stage, Hayter & Hsu (1994) showed how simultaneous 100(1 − α)% confidence intervals can be constructed to be compatible with some commonly used closed test procedures, in the sense that a null hypothesis Hk is rejected at familywise level α if and only if the confidence interval for θk excludes all values for which Hk is true. Often, these intervals are scarcely more informative than the test decisions. For example, for one-sided problems where larger parameter values are more beneficial, no 100(1 − α)% lower confidence bound for any individual θk can exceed δk unless all hypotheses H1, … , HK can be rejected at familywise level α.
In this article we derive confidence intervals for adaptive experiments. Our motivating example is a seamless phase II/III clinical trial, although the method is not limited to this setting. Such trials consist of a first stage in which K experimental treatments, indexed by T1= {1, … , K}, are compared with a common control and, after an interim analysis, a second stage in which only a subset of treatments, indexed by T2 ⊆ T1, are compared with the control. The state-of-the-art methodology for this problem (Bauer & Kieser, 1999; Posch et al., 2005; Bretz et al., 2009) is a hybrid of the closure principle of Marcus et al. (1976) and a p-value combination which goes back to Fisher (1932). This methodology allows any subset of treatments to be chosen at interim, based on all trial data and external factors. Other adaptations, such as sample size re-estimation, are also possible. A serious concern, though, is that there is no established method for constructing confidence intervals. As emphasized in the International Conference on Harmonisation’s E9 guideline (ICH E9 Expert Working Group, 1999, p. 1932), ‘Estimates of treatment effect should be accompanied by confidence intervals, whenever possible, and the way in which these will be calculated should be identified.’
Posch et al. (2005) proposed 100(1 − α)% simultaneous confidence intervals following such a trial. Unfortunately, their intervals are not guaranteed to be compatible with the closed test procedure. Here, we construct intervals that are compatible. As in the one-stage case, an inevitable shortcoming of these intervals is that they are not always substantially more informative than the original test decisions. We will show that this problem is mitigated to some extent by the adaptive nature of the experiment.
2. Fundamental methodology
2·1. Closure principle
The closure principle of Marcus et al. (1976) is a general method for multiple hypothesis testing. A formal description is given in Finner & Strassburger (2002), and we adopt similar notation here. Let {Pθ : θ ∈ Θ} be a family of probability measures defined on a common sample space (Ω, 𝒜), where Θ is a multi-dimensional parameter space. Suppose that we wish to test a family of null hypotheses {Hi : i ∈ 𝕀}, where Hi ⊂ Θ for each i in some index set 𝕀. Let ψ = (ψi : i ∈ 𝕀) denote a multiple test of {Hi : i ∈ 𝕀}, with each component ψi taking value 0 or 1 corresponding to nonrejection or rejection of Hi, respectively. It is often desirable to ensure that
prθ*{ψi = 1 for some i ∈ 𝕀(θ*)} ⩽ α for all θ* ∈ Θ, (1)
where 𝕀(θ*) = {i ∈ 𝕀 : θ* ∈ Hi} is the index set of true hypotheses under θ*. In other words, the probability of rejecting at least one true null hypothesis is bounded by α. This is known as strong control of the familywise error rate. The closure principle can be used to ensure (1). We are required to find, for each I ⊆ 𝕀 such that HI = ⋂i∈I Hi is nonempty, a local level-α test φI for the intersection hypothesis HI; that is, we require
prθ*(φI = 1) ⩽ α for all θ* ∈ HI, (2)
where φI takes values in {0, 1} with the usual interpretation. If we define ψi = min{φI : i ∈ I ⊆ 𝕀, HI ≠ ∅}, so that Hi is rejected only when every nonempty intersection hypothesis containing it is rejected by its local test, then (1) holds. This can be very useful, as in many applications it is easy to find tests satisfying (2), whereas validating (1) directly is hard.
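The mechanics of the closure principle are easy to state in code. The following minimal sketch, not part of the original procedure and with entirely hypothetical p-values, rejects an elementary hypothesis only when every intersection hypothesis containing it is rejected by its local level-α test.

```python
def closed_test(intersection_pvals, elementary, alpha=0.025):
    """Closure principle: reject H_k if and only if every nonempty
    intersection hypothesis containing k is rejected at level alpha.

    intersection_pvals: dict mapping frozenset of indices to the p-value
                        of the local test of that intersection hypothesis.
    elementary: iterable of elementary hypothesis indices.
    """
    return {k: all(p <= alpha for I, p in intersection_pvals.items() if k in I)
            for k in elementary}

# toy illustration with three hypotheses and made-up local p-values
pvals = {frozenset(I): p for I, p in [
    ((1, 2, 3), 0.010), ((1, 2), 0.015), ((1, 3), 0.012), ((2, 3), 0.020),
    ((1,), 0.004), ((2,), 0.030), ((3,), 0.018)]}
print(closed_test(pvals, elementary=[1, 2, 3]))  # {1: True, 2: False, 3: True}
```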
2·2. Combination test
Fisher (1932) discussed combining independent p-values to test a single null hypothesis. For convenience and brevity, we will only consider two-stage designs. We define a p-value combination function Q : [0, 1]^2 → [0, 1] that is left-continuous and nondecreasing in both its arguments, and such that Q(p1, p2) is uniformly distributed on [0, 1] provided that both arguments are themselves independent and uniformly distributed. An example is the inverse normal combination
Q(p1, p2) = 1 − Φ[{Φ^{−1}(1 − p1) + Φ^{−1}(1 − p2)}/√2], (3)
where Φ denotes the standard normal distribution function.
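For concreteness, a small sketch of the combination function, assuming the equal-weight inverse normal form displayed in (3); the simulation at the end illustrates the defining property that Q(p1, p2) is uniformly distributed when its arguments are independent uniform variates.

```python
import numpy as np
from scipy.stats import norm

def inverse_normal_Q(p1, p2, w1=1 / np.sqrt(2), w2=1 / np.sqrt(2)):
    """Inverse normal combination of two p-values; the weights are assumed
    to satisfy w1**2 + w2**2 = 1 (equal weights by default)."""
    return 1 - norm.cdf(w1 * norm.ppf(1 - p1) + w2 * norm.ppf(1 - p2))

rng = np.random.default_rng(1)
u1, u2 = rng.uniform(size=(2, 100_000))
q = inverse_normal_Q(u1, u2)
print(np.mean(q <= 0.025))  # approximately 0.025, as expected for a uniform variate
```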
Such a combination function lends itself to a two-stage adaptive closed test, ψ, for a family of null hypotheses {Hi : i ∈ 𝕀}. An important application, discussed in Bretz et al. (2009), is a seamless phase II/III confirmatory clinical trial. We henceforth restrict attention to a parameter θ = (θ1, … , θK) taking values in a parameter space Θ ⊆ ℝ^K and a family of null hypotheses {Hk : k ∈ T1}, where T1 = {1, … , K} and Hk = {θk ⩽ δk} (k ∈ T1) for some constants δ1, … , δK. The θk (k ∈ T1) might correspond to the mean effects of K different treatments, for example. By defining local tests φI (I ⊆ T1) via a combination function Q, it is possible to make data-dependent modifications to the trial design at an interim analysis (cf. Bauer & Kieser, 1999; Hommel, 2001; Brannath et al., 2002). For instance, attention can be focused on a subset T2 ⊆ T1 of the initial hypotheses of interest; changes can be made to sample sizes, allocation ratios, etc.
2·3. Two-stage closed test procedure
Assume that the full first-stage trial data are represented by a random vector X with distribution function G(x; θ). Prior to starting the trial, one must specify a combination function Q and, for each I ⊆ T1, a first-stage test of HI with an associated p-value function pI = pI(X) that satisfies prθ{pI(X) ⩽ u} ⩽ u for all u ∈ [0, 1] and all θ ∈ HI. The second-stage design is unspecified.
At the interim analysis, the experimenter defines a second-stage design, d, by choosing a subset of the original hypotheses, indexed by T2 ⊆ T1, to continue studying in the second stage, along with second-stage sample sizes and, for each I ⊆ T1, a second-stage hypothesis test for HI. See below for a proposal for choosing second-stage tests for HI where I ⊈ T2. We assume that the design d is allowed to depend on the unblinded first-stage data x without prespecifying an adaptation rule. Let Y denote the data collected at the second stage, taking values in a sample space 𝒴, and let qI = qI,x,d (I ⊆ T1) denote the p-value functions of the second-stage tests. Because the tests used in the second stage depend on the first-stage data x and the chosen design d, the p-value functions will in general depend on both.
Let Fx,d(y; θ) denote the distribution function of the second-stage data, given the chosen design d and interim data x. We assume that for all x, d and I ⊆ T1, the second-stage p-values satisfy prθ{qI,x,d(Y) ⩽ u} ⩽ u under Fx,d(· ; θ) for all θ ∈ HI and for all u ∈ [0, 1]. The distribution Fx,d is assumed to be known, i.e., not merely specified up to a null set, for all x and d, a condition that can be formalized by assuming an appropriate regression model (Brannath et al., 2012). See § 3·2 for a numerical example.
At the final analysis, for each I ⊆ T1, the test decision is φI = 1 if and only if Q(pI, qI) ⩽ α. As shown in Brannath et al. (2012), this combination test for HI controls the Type I error rate at level α.
We assume that only data for the hypotheses indexed by T2 are collected in the second stage and propose setting qI = qI∩T2 for I ⊈ T2, where we drop the indices x and d for simplicity and set q∅ = 1 by convention. Such second-stage p-values have the required distribution under HI∩T2 and hence also under HI.
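A minimal sketch of the final analysis just described, assuming the inverse normal combination (3) and the convention qI = qI∩T2 with q∅ = 1; the intersection p-values are hypothetical inputs, and how they are computed depends on the chosen first- and second-stage tests.

```python
from scipy.stats import norm

def combine(p1, p2, alpha=0.025):
    # inverse normal combination (3) with equal weights; returns the decision
    z = (norm.ppf(1 - p1) + norm.ppf(1 - p2)) / 2 ** 0.5
    return 1 - norm.cdf(z) <= alpha

def adaptive_closed_test(p_first, q_second, T1, T2, alpha=0.025):
    """p_first[I]: first-stage p-value of H_I for every nonempty I subset of T1.
    q_second[I]: second-stage p-value, supplied for nonempty subsets of T2 only;
    for other I the convention q_I = q_{I & T2} is used, with q of the empty
    set equal to 1."""
    def q(I):
        J = I & T2
        return q_second[J] if J else 1.0
    reject_I = {I: combine(p_first[I], q(I), alpha) for I in p_first}
    return {k: all(reject_I[I] for I in p_first if k in I) for k in T1}

# hypothetical illustration: three treatments, only B carried into stage two
T1, T2 = frozenset({"A", "B", "C"}), frozenset({"B"})
p_first = {frozenset(I): p for I, p in [
    (("A", "B", "C"), 0.02), (("A", "B"), 0.02), (("A", "C"), 0.03),
    (("B", "C"), 0.01), (("A",), 0.20), (("B",), 0.01), (("C",), 0.15)]}
q_second = {frozenset({"B"}): 0.001}
print(adaptive_closed_test(p_first, q_second, T1, T2))  # only H_B is rejected
```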
We emphasize that while Type I error control is guaranteed even if the second-stage design is initially open-ended, in the design of actual clinical trials it is crucial to perform detailed planning based on likely first-stage outcomes. The added flexibility is necessary because it is impossible to foresee all eventualities in extremely complex areas such as clinical drug development.
3. Confidence regions
3·1. Partitioning the parameter space
A standard approach to deriving a 100(1 − α)% confidence set for θ is to perform a level-α test of each elementary hypothesis {θ = θ*} (θ* ∈ Θ) and include all θ* corresponding to nonrejected hypotheses (see, e.g., Lehmann, 1986, p. 90). To ensure compatibility with closed testing, the key idea (Stefansson et al., 1988; Hayter & Hsu, 1994; Finner & Strassburger, 2002) is to partition the parameter space into disjoint regions

ΘI = {θ* ∈ Θ : θ*i ⩽ δi for i ∈ I, θ*i > δi for i ∉ I} (I ⊆ T1),

and apply different tests in each of the disjoint ΘI. If, for each I ⊆ T1, we let {φI(θ*) : θ* ∈ Θ} denote a family of tests with
prθ*{φI(θ*) = 1} ⩽ α for all θ* ∈ ΘI, (4)
where φI (θ*) takes values in {0, 1} with the usual interpretation, we can apply the following general result from Hsu (1996, p. 234).
Lemma 1. A level-100(1 − α)% confidence set for θ is
C = ⋃I⊆T1 {θ* ∈ ΘI : φI(θ*) = 0}. (5)
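Operationally, Lemma 1 says that a candidate value θ* is retained in C exactly when the test attached to the unique partition element containing θ* fails to reject. A schematic membership check, with a hypothetical interface in which the local tests φI(θ*) are supplied by the user, is:

```python
def in_confidence_set(theta_star, delta, phi):
    """Membership of theta_star in the partition-based confidence set (5).

    theta_star: dict mapping index k to the candidate parameter value.
    delta: dict mapping index k to the boundary delta_k of H_k.
    phi: callable phi(I, theta_star) returning 1 (reject) or 0 (retain); it is
         assumed to be a level-alpha test of {theta = theta_star} valid on
         the partition element Theta_I.
    """
    # the unique partition element Theta_I containing theta_star
    I = frozenset(k for k in delta if theta_star[k] <= delta[k])
    return phi(I, theta_star) == 0
```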
Our aim is to find families of tests such that C is compatible with the two-stage closed test procedure. This requires us to augment our specification of pI and qI with families of p-values {pI(θ*) : θ* ∈ Θ} and {qI(θ*) : θ* ∈ Θ} where, under {θ = θ*}, the distributions of pI(θ*) and qI(θ*) meet conditions as outlined for pI and qI in § 2·3. Additionally, if we treat the data as fixed and view each family as a function of θ*, then, unless I ∩ Tj = ∅, the stage-j p-value, pI(θ*) for j = 1 and qI(θ*) for j = 2, is constant in all arguments θ*i such that i ∉ I ∩ Tj, and is left-continuous and nondecreasing in all arguments θ*i such that i ∈ I ∩ Tj, with pI(θ*) = pI and qI(θ*) = qI for any θ* such that θ*i = δi for all i ∈ I ∩ Tj. Furthermore, we assume that
(6)
Proposition 1. Inserted into (5), the following families of hypothesis tests give rise to a 100(1 − α)% confidence set for θ, denoted by C, that is compatible with the two-stage closed test procedure, i.e., ψk = 1 if and only if Hk ∩ C = ∅: for ∅ ≠ I ⊆ T1 and θ* ∈ Θ,
φI(θ*) = 1 if and only if Q{pI(θ*), qI(θ*)} ⩽ α, (7)
and {φ∅(θ*): θ* ∈ Θ } is any family of tests satisfying (4).
Proof. See the Appendix.
There will be no unique collection of families of p-values satisfying the aforementioned distributional and monotonicity constraints. Rather, the families must be specified in a two-stage procedure in an analogous way to the p-values in § 2·3. As will become clear from the example below, for many commonly encountered scenarios and when I ∩ Tj ≠ ∅, the choice of pI(θ*) and qI(θ*) will be obvious from the choice of pI and qI. As a simple example, suppose that the stage-j p-value for the elementary hypothesis Hk is taken from a one-sided z-test of the null hypothesis {θk ⩽ δk} using the stage-j data only. Then the natural choice for its augmented version is the one-sided p-value from a standard z-test of {θk ⩽ θ*k} using the same stage-j data.
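For the z-test illustration just given, the augmented p-value is obtained simply by shifting the hypothesised value from δk to θ*k. A minimal sketch, assuming the stage-j summary is a normally distributed estimate with known standard error (both hypothetical inputs):

```python
from scipy.stats import norm

def shifted_z_pvalue(estimate, se, theta_star):
    """One-sided p-value of a z-test of {theta_k <= theta_star}, based on a
    normally distributed estimate with standard error se. Setting
    theta_star = delta_k recovers the original stage-j p-value for H_k, and
    the function is nondecreasing in theta_star, as required in Section 3.1."""
    return 1 - norm.cdf((estimate - theta_star) / se)
```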
While for I ∩ Tj ≠ ∅ there will often be a natural choice for the augmented p-values, it is unclear how φ∅(θ*) and the associated p∅(θ*) and q∅(θ*) should be chosen. A reasonable suggestion is given below.
Corollary 1. Define for j = 1,2. The following is a 100(1 − α)% confidence region for θ that is compatible with the two-stage closed test procedure:
(8)
The properties of a region defined by (8) are best illustrated by a specific example.
3·2. Example
Posch et al. (2005) considered a clinical trial where three active treatments, indexed by T1 = {A, B, C}, are compared with a placebo using a two-stage adaptive design. The individual null hypotheses of interest are Hk = {θk ⩽ 0} (k ∈ T1), where θk = πk − π0 denotes the difference between the success probabilities of treatment k and placebo. Denote the observed success rate of treatment k in stage j by π̂k,j, where treatment 0 corresponds to a placebo.
At the design stage, the inverse normal combination function (3) is specified and n1 = 140 first-stage patients are recruited to each treatment arm. Approximately, the first-stage estimates π̂k,1 − π̂0,1 (k ∈ T1) are multivariate normal with means θk and positive correlations. Based on this assumption, Simes (1986) tests are used for each intersection hypothesis; that is, pI is the one-sided p-value for comparing treatment k with placebo when I = {k}, k ∈ T1, and, for |I| > 1, pI = mink∈I {|I| pk / R(k, I)}, where R(k, I) denotes the rank of pk among {pi : i ∈ I}. The natural way of augmenting these p-values is to define pk(θ*) as the corresponding p-value for the shifted hypothesis {θk ⩽ θ*k} for k ∈ T1 and pI(θ*) = mink∈I {|I| pk(θ*) / R(k, I, θ*)} for |I| > 1, where R(k, I, θ*) denotes the rank of pk(θ*) among {pi(θ*) : i ∈ I}.
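The Simes construction and its shifted analogue can be written compactly; in the sketch below the shifted elementary p-values pk(θ*) are assumed to be supplied, for example from shifted tests of the individual treatment–placebo comparisons.

```python
import numpy as np

def simes(pvals):
    """Simes p-value of an intersection hypothesis: the minimum over k of
    |I| * p_(k) / k, where p_(k) is the k-th smallest elementary p-value."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = len(p)
    return float(min(np.min(m * p / np.arange(1, m + 1)), 1.0))

# the shifted intersection p-value p_I(theta*) applies the same formula to the
# shifted elementary p-values p_k(theta*_k)
print(simes([0.04, 0.01, 0.03]))  # min(3*0.01/1, 3*0.03/2, 3*0.04/3) = 0.03
```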
Suppose that the unblinded first-stage results are , and . The experimenter decides that treatments A and C are not to be considered in the second stage owing to lack of efficacy and safety concerns, respectively. A further n2 = 140 patients are recruited to both treatment B and placebo. A family of p-values with is chosen, where .
Now suppose that the second-stage results are and . The p-values from the elementary hypotheses are , , and . Therefore , and . As , HB can be rejected at familywise level 0·025. Both HA and HC fail to be rejected, as for k = A,C. A compatible 97·5% confidence region for θ is given by
(9)
where is defined as for all θ* ∈ Θ.
The region (9) will have a complicated three-dimensional shape. However, in terms of making inference on θB, its crucial features can be seen by taking two cross-sections, as displayed in Fig. 1. As is nondecreasing in for all I ⊆ T1, we know that for any γ ∈ (-∞, 0), the cross-section at is contained in the cross-section at . Similarly, for any γ ∈ (0, ∞), the cross-section at is contained in the limit of the cross-section of the region as . One can see immediately from Fig. 1 that for any ϵ > 0, the 97·5% confidence region fails to exclude all parameter vectors θ* such that . In other words, the lower confidence bound on θB provides no more information than the decision of the closed test procedure.
For confidence intervals that are compatible with single-stage closed test procedures (Hayter & Hsu, 1994; Strassburger & Bretz, 2008; Guilbaud, 2008), a necessary condition for obtaining informative lower confidence bounds for parameters corresponding to the rejected null hypotheses is that ψk =1 for all k ∈ T1. In the adaptive setting, this is no longer a necessary condition. For example, repeating the above test procedure at level α=0·05, the compatible 95% confidence region analogous to (9) is also summarized in Fig. 1. Here it appears, and indeed can be verified by considering all values of , that there does exist some ϵ > 0 such that the confidence region excludes all parameter vectors θ* for which . We will show that for the two-stage adaptive setting, a necessary condition for informative lower confidence bounds on parameters corresponding to the rejected null hypotheses is that ψk =1 for all k ∈ T2. However, as can be seen from Fig. 1, this condition is not sufficient.
3·3. A two-stage, single-step confidence region
Posch et al. (2005) proposed the following 100(1 − α)% confidence region:
(10)
They note that the resulting confidence intervals are not compatible with the closed test procedure described in § 2·3 (Posch et al., 2005, p. 3702). Nevertheless, the region (10) can be used to generate an alternative multiple test. More generally, any 100(1 − α)% confidence set C generates a multiple test for the family of hypotheses {Hk : k ∈ T1}, whereby Hk is rejected if and only if Hk ∩ C = ∅. This guarantees strong control of the familywise error rate (1). The multiple test generated by (10) can be thought of as single-step in the sense that rejection or nonrejection of a null hypothesis does not take into account the decision for any other hypothesis. If Hk is rejected, informative lower bounds will be available for θk regardless of the test decisions for all other hypotheses.
4. Computation of confidence intervals
4·1. Least-favourable parameter configurations
In the above example, marginal inference on θB was achieved by considering least-favourable parameter configurations for θk, k ∈ T1 \ {B}. This idea can be generalized to find 100(1 − α)% simultaneous confidence intervals containing (8) or (10).
Definition 1. For j = 1, 2, k ∈ T1 and I ⊆ Tj, the locally least-favourable jth-stage p-value function for Hk in ΘI is defined for I ≠ ∅ as pk,I(ϑ) = pI(ξ) for j = 1 and qk,I(ϑ) = qI(ξ) for j = 2, where ξ = (ξ1, … , ξK) with ξi = δi for i ≠ k and ξk = ϑ. Additionally, for j = 1, 2,
(11)
Proposition 2. The smallest Cartesian product of intervals, ×k∈T1(lk, ∞), that contains the confidence region (8) has lk = minI⊆T1 lk,I, where for k ∈ I,
(12)
and for k ∉ I,
(13)
Furthermore, these intervals are compatible with the two-stage closed test procedure, i.e., ψk = 1 if and only if Hk ∩ ×k∈T1(lk, ∞)=∅.
Proof. See the Appendix.
In general, to find each interval requires one-dimensional root finding for each I ⊆ T1, a calculation that is O(2^K). However, substantial shortcuts are available for reducing the computational burden.
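Because each shifted combination test is monotone in the scalar ϑ, the root finding can be carried out by simple bisection. The sketch below is generic: it assumes a user-supplied predicate rejects(ϑ) encoding the test for a fixed k and I at the locally least-favourable configuration, with the exact form dictated by (12)–(13).

```python
def boundary(rejects, lo=-10.0, hi=10.0, tol=1e-6):
    """Locate the boundary of a rejection region that is monotone in a scalar
    parameter: rejects(v) is assumed True for all sufficiently small v and
    False for all sufficiently large v within the bracket [lo, hi]."""
    if not rejects(lo):
        return float("-inf")   # never rejected on the bracket: no informative bound
    if rejects(hi):
        return float("inf")    # rejected everywhere on the bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rejects(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```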
4·2. Efficient computation of confidence bounds
There are two possible scenarios at the end of the closed test procedure: either ψk = 1 for all k ∈ T2, or at least one Hk (k ∈ T2) fails to be rejected. In the latter case, there exists some I ⊆ T1 with I ∩ T2 ≠ ∅ such that for any k ∈ T2,
and therefore lk ⩽ lk,I ⩽ δk. Due to the compatibility of the intervals with the closed test procedure, if ψk = 1, then lk = δk; if ψk = 0, then lk < δk.
If ψk = 1 for all k ∈ T2, then lk ⩾ δk for all k ∈ T2. Additionally, for all k ∈ T2 and all I ⊆ T1 with I ∩ T2 ≠ ∅, we know from (12) and (13) that lk,I = ∞; so, when finding lk = minI⊆T1 lk,I in Proposition 2, the minimum need only be taken over a much smaller number of lk,I. The following algorithm finds the lower bounds for all parameters corresponding to the rejected hypotheses.
Step 1. Perform the closed test procedure. If ψk′ = 0 for some k′ ∈ T2, then lk = δk for ψk =1 and lk < δk for ψk =0. If ψk =1 for all k ∈ T2, go to Step 2.
Step 2. Find . If T1 \ T2 = ∅, then pM =0.
Step 3. For k ∈ T2,
The cost of computing the intervals for θk (k ∈ T2) in Step 3 is linear in the number of parameters. Step 1 is O(2^{|T1|}), but a shortcut of O(|T1|^2) is given in Brannath & Bretz (2010). Step 2 is O(2^{|T1\T2|}), but a shortcut of size |T1 \ T2| is available, provided there exists an ordering i1, … , ik of T1 \ T2 such that, for each u ∈ {1, … , k}, pJ ⩽ pL for all J ⊆ L ⊆ {iu, … , ik} with iu ∈ J. This is because we only have to check the first-stage p-values pI for the sets I = {iu, … , ik}, u = 1, … , k. Many common multiple test procedures, such as those based on Dunnett (1955) tests or weighted Bonferroni tests, satisfy this condition, with the ordering i1, … , ik following the ordering of the univariate test statistics or the weighted elementary p-values (Brannath & Bretz, 2010).
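The shortcut in Step 2 can be checked directly for weighted Bonferroni intersection p-values, for which the required ordering is by the weighted elementary p-values. The following sketch, with made-up p-values and weights, verifies that the maximum over all nonempty subsets of T1 \ T2 coincides with the maximum over the tail sets of that ordering.

```python
from itertools import combinations

def weighted_bonferroni(I, p, w):
    """Weighted Bonferroni p-value of the intersection over I, with the
    weights renormalized to sum to one within I."""
    total = sum(w[i] for i in I)
    return min(min(total * p[i] / w[i] for i in I), 1.0)

# hypothetical dropped treatments with made-up first-stage p-values and weights
p = {"A": 0.40, "C": 0.25}
w = {"A": 0.5, "C": 0.5}
dropped = sorted(p, key=lambda i: p[i] / w[i])  # order by weighted elementary p-value

subsets = [S for r in range(1, len(dropped) + 1) for S in combinations(dropped, r)]
brute = max(weighted_bonferroni(S, p, w) for S in subsets)   # all 2^k - 1 subsets
tails = [tuple(dropped[u:]) for u in range(len(dropped))]
short = max(weighted_bonferroni(S, p, w) for S in tails)     # only k tail sets
print(brute, short)  # the two maxima coincide
```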
4·3. Lower bounds for parameters corresponding to retained hypotheses
Consider k ∈ T2 such that ψk = 0. We know that lk < δk, and therefore we need only consider lk,I such that k ∈ I. However, since in general lk,I < ∞, finding the minimum such lower bound will still have a computational cost that is exponential in the number of parameters.
For k ∈ I ⊆ T1 \ T2, we know from (11) and (6) that the second-stage locally least-favourable p-value appearing in (12) is equal to 1. Many commonly used combination functions, including (3), have the property that v = 1 implies Q(u, v) = 1 for all u ∈ [0, 1]. In this case, lk = −∞ for all k ∈ T1 \ T2.
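The property invoked here is immediate for the inverse normal combination: if the second p-value equals 1, the combined p-value equals 1 whatever the first argument. A two-line check under that assumed form of (3):

```python
from scipy.stats import norm

def Q(p1, p2):
    # equal-weight inverse normal combination, as assumed for (3)
    return 1 - norm.cdf((norm.ppf(1 - p1) + norm.ppf(1 - p2)) / 2 ** 0.5)

print([Q(u, 1.0) for u in (1e-6, 0.01, 0.5)])  # [1.0, 1.0, 1.0]
```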
4·4. Lower bounds for the two-stage single-step procedure
Posch et al. (2005) showed that the region (10) is contained in a rectangle, , where
(14)
The computation of each interval requires only a one-dimensional search for a root, and overall computation will be linear in the number of parameters.
4·5. Example continued
Recall from § 3·2 that T2 = {B} and ψB = 1. Proceeding to Step 2 of the above algorithm, pM =0·419. In this case we need just one iteration in Step 3, because
and therefore the 97·5% confidence interval for θB is (0, ∞), consistent with Fig. 1. This example emphasizes that there is a price to pay for the additional power of the closed test as opposed to the single-step procedure of § 3·3 with, by (14),
While this agrees with the assertion θB > 0 in this specific case, it is invalid to claim it as a 97·5% lower confidence bound if the closed test procedure of § 2·3 had been planned. One can see that for any α > 0·036, the 100(1 − α)% confidence interval for treatment B that is compatible with the closed test procedure has a positive lower bound. For example, the 95% lower confidence bound is lB = 0·0112, consistent with Fig. 1. Again, if the region (10) had been specified pre-trial, the 95% lower confidence bound (14) would have been .
5. Confidence bounds for closed tests based on the conditional error rate
Consider again the two-stage closed test procedure of § 2·3. As an alternative to combination tests, Koenig et al. (2008) used the conditional error approach (Proschan & Hunsberger, 1995) to derive local tests φI (I ⊆ T1). The only difference is that instead of prespecifying a combination function Q and first-stage p-value pI, one must prespecify a measurable conditional error function AI = AI(X) taking values in [0, 1] such that Eθ(AI) ⩽ α for all θ ∈ HI, and, at the final analysis, φI = 1 if and only if qI ⩽ AI.
To produce a compatible 100(1 − α)% confidence region for θ, each AI (I ⊆ T1) must be augmented with a family of conditional error functions {AI(θ*) : θ* ∈ Θ} such that Eθ*{AI(θ*)} ⩽ α and, for fixed data x, AI(θ*) is constant in all arguments θ*i with i ∉ I and is left-continuous and nonincreasing in all arguments θ*i with i ∈ I. Furthermore, AI(θ*) = AI for all θ* ∈ Θ such that θ*i = δi for i ∈ I. The second-stage p-values must be augmented with a family {qI(θ*) : θ* ∈ Θ} as described in § 3·1.
Müller & Schäfer (2004) propose defining AI = supθ*∈HI Eθ*(ϕI | X), where ϕI is a pre-planned fixed sample level-α test for HI. In many situations the natural choice for AI(θ*) will be obvious from AI. For example, if ϕI is the decision function for a Dunnett (1955) test of HI = ⋂k∈I {θk ⩽ δk}, then it is natural to choose AI(θ*) = Eθ*(ϕI,θ* | X), where ϕI,θ* is the decision function for a Dunnett test of ⋂k∈I {θk ⩽ θ*k}, which can be derived via a corresponding translation of the test statistics.
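To fix ideas in the simplest case, the sketch below computes the conditional error of a pre-planned one-sided fixed-sample z-test of a single hypothesis, the elementary building block of the Müller & Schäfer (2004) construction; the first-stage z-statistic and the stage-one information fraction are hypothetical inputs. The shifted function AI(θ*) is obtained by recomputing the first-stage z-statistic under the shifted hypothesis {θ ⩽ θ*}; the multivariate Dunnett case is analogous but requires the joint normal distribution of the treatment–control contrasts.

```python
from math import sqrt
from scipy.stats import norm

def conditional_error_z(z1, info_frac1, alpha=0.025):
    """Conditional error of a pre-planned one-sided fixed-sample z-test that
    rejects when the end-of-trial z-statistic exceeds the 1 - alpha quantile.
    z1: observed first-stage z-statistic (a shifted hypothesis simply supplies
        a correspondingly shifted z1);
    info_frac1: fraction of the total statistical information in stage one."""
    w1, w2 = sqrt(info_frac1), sqrt(1 - info_frac1)
    # P(w1*z1 + w2*Z2 >= z_{1-alpha}) with Z2 standard normal at the boundary
    return 1 - norm.cdf((norm.ppf(1 - alpha) - w1 * z1) / w2)

print(conditional_error_z(z1=2.0, info_frac1=0.5))  # about 0.22
```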
Using the arguments of Propositions 1 and 2, it can be shown that, analogously to (8), a compatible 100(1 − α)% confidence region for θ is
where and A∅(θ*) are set equal to and AT1(θ*) respectively. Also, the largest compatible 100(1 − α)% confidence lower bounds are lk =minI⊆T1 lk,I, where for k ∈ I,
and for k ∉ I, with Ak,I(ϑ) defined analogously to the locally least-favourable p-value functions in Definition 1.
6. Concluding remarks
The lower confidence bounds (12)–(13) provide more information about the location of θ than the decisions of the closed test procedure of § 2·3. The utility of this additional information will depend strongly on the context. In practice, the primary concern will often be to find lower bounds for the components of θ corresponding to the rejected null hypotheses. As this can be achieved using an algorithm that is O(K^2), application to large-scale simultaneous inference problems is, in principle, feasible. However, these lower bounds will only be informative if all hypotheses considered in the second stage of testing are rejected, and even this may be insufficient. In practice, therefore, the lower bounds (12)–(13) are only likely to be useful in relatively small-scale problems. Furthermore, in situations where informative lower confidence bounds are deemed to be more important than the possibility of rejecting as many individual null hypotheses as possible, it would be sensible to use the intervals (14) instead of applying the closed test procedure. For large-scale simultaneous inference problems, an approach based on controlling the false coverage-statement rate (Benjamini & Yekutieli, 2005) may be more appropriate than aiming for a high simultaneous coverage probability.
Extensions to more than two stages and to allow early rejection of hypotheses are straightforward with an appropriate combination function in place of (3). An open question is how best to choose φ∅(θ*) and the associated p-value families p∅(θ*) and q∅(θ*). The tests we use in region (8) are a natural choice but may not be the most powerful.
Acknowledgement
This work was supported by the National Institute for Health Research and the Austrian Science Fund. The views expressed in this publication are those of the authors and not necessarily those of the National Health Service, the National Institute for Health Research or the Department of Health.
Appendix
Proof of Proposition 1. With the assumptions in § 3·1, all tests of the form (7) satisfy condition (4), and therefore C is a 100(1 − α)% confidence set for θ. By the monotonicity conditions imposed on the p-values, we have pI(θ*) ⩽ pI and qI(θ*) ⩽ qI for all θ* ∈ ΘI (∅ ≠ I ⊆ T1), so that ΘI ∩ C = ∅ if and only if φI = 1. Therefore, ψk = 1 if and only if minI⊆T1,k∈I φI = 1, if and only if ⋃I⊆T1,k∈I ΘI ∩ C = ∅. Since ⋃I⊆T1,k∈I ΘI = Hk, we have compatibility.
Proof of Proposition 2. First, note the key property that for all θ* ∈ ΘI with .
To show that C1 ⊆ ×k∈T1 (lk, ∞), consider any θ* ∈ Θ \ ×k∈T1 (lk, ∞). We must have θ* ∈ ΘI for some I ⊆ T1 and θ*k ⩽ lk for some k ∈ T1. If k ∈ I, then , and (12) implies that . The same inequality follows from and (13) if k ∉ I. Therefore, θ* ∉ C1 and C1 ⊆ ×k∈T1 (lk, ∞).
To show that no smaller interval (lk + ϵ, ∞) is possible for any ϵ > 0, we must find some θ* ∈ C1 with . Consider a subset I ⊆ T1 such that lk = lk,I and therefore for all ϑ > lk. If k ∈ I or, equivalently, lk < δk, take any . If k ∉ I or, equivalently, lk ⩾ δk, take any . Now consider a parameter vector , where , for k ≠ i ∈ I, and for i ∉ I ∪ {k}. All such parameter vectors ξI,k are contained in ΘI, and
Thus there exists some such ξI,k ∈ C1 and hence C1 is not contained in this smaller product of intervals.
Finally, Hk ∩×k∈T1 (lk, ∞) = ∅ if and only if lk,I ⩾ δk for I ⊆ T1. if and only if , for I ⊆ T1 and k ∈ I, if and only if ψk = 1.
References
- Bauer P, Kieser M. Combining different phases in the development of medical treatments within a single trial. Statist. Med. 1999;18:1833–48. doi: 10.1002/(sici)1097-0258(19990730)18:14<1833::aid-sim221>3.0.co;2-3.
- Benjamini Y, Yekutieli D. False discovery rate adjusted multiple confidence intervals for selected parameters. J. Am. Statist. Assoc. 2005;100:71–81.
- Brannath W, Bretz F. Shortcuts for locally consonant closed test procedures. J. Am. Statist. Assoc. 2010;105:660–9.
- Brannath W, Gutjahr G, Bauer P. Probabilistic foundation of confirmatory adaptive designs. J. Am. Statist. Assoc. 2012;107:824–32.
- Brannath W, Posch M, Bauer P. Recursive combination tests. J. Am. Statist. Assoc. 2002;97:236–44.
- Bretz F, Koenig F, Brannath W, Glimm E, Posch M. Adaptive designs for confirmatory clinical trials. Statist. Med. 2009;28:1181–217. doi: 10.1002/sim.3538.
- Dunnett C. A multiple comparison procedure for comparing several treatments with a control. J. Am. Statist. Assoc. 1955;50:1096–121.
- Finner H, Strassburger K. The partitioning principle: a powerful tool in multiple decision theory. Ann. Statist. 2002;30:1194–213.
- Fisher RA. Statistical Methods for Research Workers. 4th ed. Oliver and Boyd; London: 1932.
- Guilbaud O. Simultaneous confidence regions corresponding to Holm's stepdown procedure and other closed-testing procedures. Biomet. J. 2008;50:678–92. doi: 10.1002/bimj.200710449.
- Hayter AJ, Hsu JC. On the relationship between stepwise decision procedures and confidence sets. J. Am. Statist. Assoc. 1994;89:128–36.
- Hommel G. Adaptive modifications of hypotheses after an interim analysis. Biomet. J. 2001;43:581–9.
- Hsu JC. Multiple Comparisons: Theory and Methods. Chapman and Hall; London: 1996.
- ICH E9 Expert Working Group. Statistical principles for clinical trials: ICH harmonized tripartite guideline. Statist. Med. 1999;18:1905–42.
- Koenig F, Brannath W, Bretz F, Posch M. Adaptive Dunnett tests for treatment selection. Statist. Med. 2008;27:1612–25. doi: 10.1002/sim.3048.
- Lehmann EL. Testing Statistical Hypotheses. 2nd ed. Wiley; New York: 1986.
- Marcus R, Peritz E, Gabriel KR. On closed testing procedures with special reference to ordered analysis of variance. Biometrika. 1976;63:655–60.
- Müller HH, Schäfer H. A general statistical principle for changing a design any time during the course of a trial. Statist. Med. 2004;23:2497–508. doi: 10.1002/sim.1852.
- Posch M, Koenig F, Branson M, Brannath W, Dunger-Baldauf C, Bauer P. Testing and estimation in flexible group sequential designs with adaptive treatment selection. Statist. Med. 2005;24:3697–714. doi: 10.1002/sim.2389.
- Proschan M, Hunsberger S. Designed extension of studies based on conditional power. Biometrics. 1995;51:1315–24.
- Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751–4.
- Stefansson G, Kim W, Hsu J. On confidence sets in multiple comparisons. In: Gupta SS, Berger JO, editors. Statistical Decision Theory and Related Topics IV. Springer; New York: 1988. pp. 89–104.
- Strassburger K, Bretz F. Compatible simultaneous lower confidence bounds for the Holm procedure and other Bonferroni-based closed tests. Statist. Med. 2008;27:4914–27. doi: 10.1002/sim.3338.