Simultaneous confidence intervals that are compatible with closed testing in adaptive designs

D MAGIRR; T JAKI; M POSCH; F KLINGLMUELLER

doi:10.1093/biomet/ast035

Author manuscript; available in PMC: 2016 Mar 24.

Published in final edited form as: Biometrika. 2013 Dec 1;100(4):985–996. doi: 10.1093/biomet/ast035

PMCID: PMC4806862 EMSID: EMS67492 PMID: 27019516

Summary

We describe a general method for finding a confidence region for a parameter vector that is compatible with the decisions of a two-stage closed test procedure in an adaptive experiment. The closed test procedure is characterized by the fact that rejection or nonrejection of a null hypothesis may depend on the decisions for other hypotheses and the compatible confidence region will, in general, have a complex, nonrectangular shape. We find the smallest cross-product of simultaneous confidence intervals containing the region and provide computational shortcuts for calculating the lower bounds on parameters corresponding to the rejected null hypotheses. We illustrate the method with an adaptive phase II/III clinical trial.

Keywords: Closed testing principle, Combination test, Conditional error, Multiple comparisons, Simultaneous inference

1. Introduction

For experiments designed to make inference about a parameter vector θ = (θ₁, … , θ_K), it is common to find confidence intervals for all of the individual θ_k such that the simultaneous coverage probability is at least 1 − α. Sometimes, though, an experimenter will only attempt to assert that an individual parameter exceeds a specific value, say θ_k > δ_k. If this cannot be achieved in such a way that the probability of making at least one incorrect rejection in a family of hypotheses H_k = {θ_k ⩽ δ_k} (k = 1, … , K) is no greater than α, the experimenter will not assert anything about θ_k. The latter method of inference is used in so-called closed test procedures (Marcus et al., 1976), and its advantage is often greater power.

For experiments conducted in a single stage, Hayter & Hsu (1994) showed how simultaneous 100(1 − α)% confidence intervals can be constructed to be compatible with some commonly used closed test procedures, in the sense that a null hypothesis H_k is rejected at familywise level α if and only if the confidence interval for θ_k excludes all values for which H_k is true. Often, these intervals are scarcely more informative than the test decisions. For example, for one-sided problems where larger parameter values are more beneficial, no 100(1 − α)% lower confidence bound for any individual θ_k can exceed δ_k unless all hypotheses H₁, … , H_K can be rejected at familywise level α.

In this article we derive confidence intervals for adaptive experiments. Our motivating example is a seamless phase II/III clinical trial, although the method is not limited to this setting. Such trials consist of a first stage in which K experimental treatments, indexed by T₁= {1, … , K}, are compared with a common control and, after an interim analysis, a second stage in which only a subset of treatments, indexed by T₂ ⊆ T₁, are compared with the control. The state-of-the-art methodology for this problem (Bauer & Kieser, 1999; Posch et al., 2005; Bretz et al., 2009) is a hybrid of the closure principle of Marcus et al. (1976) and a p-value combination which goes back to Fisher (1932). This methodology allows any subset of treatments to be chosen at interim, based on all trial data and external factors. Other adaptations, such as sample size re-estimation, are also possible. A serious concern, though, is that there is no established method for constructing confidence intervals. As emphasized in the International Conference on Harmonisation’s E9 guideline (ICH E9 Expert Working Group, 1999, p. 1932), ‘Estimates of treatment effect should be accompanied by confidence intervals, whenever possible, and the way in which these will be calculated should be identified.’

Posch et al. (2005) proposed 100(1 − α)% simultaneous confidence intervals following such a trial. Unfortunately, their intervals are not guaranteed to be compatible with the closed test procedure. Here, we construct intervals that are compatible. As in the one-stage case, an inevitable shortcoming of these intervals is that they are not always substantially more informative than the original test decisions. We will show that this problem is mitigated to some extent by the adaptive nature of the experiment.

2. Fundamental Methodology

2·1. Closure principle

The closure principle of Marcus et al. (1976) is a general method for multiple hypothesis testing. A formal description is given in Finner & Strassburger (2002), and we adopt similar notation here. Let $\mathcal{P} = \{P_{θ^{*}} : θ^{*} \in Θ\}$ be a family of probability measures defined on a common sample space $(Ω, \mathcal{F})$, where Θ is a multi-dimensional parameter space. Suppose that we wish to test a family of null hypotheses $\mathcal{H} = \{H_{i} : i \in \mathcal{I}\}$, where H_i ⊂ Θ for each i in some index set $\mathcal{I}$. Let $ψ = \{ψ_{i} : i \in \mathcal{I}\}$ denote a multiple test of $\mathcal{H}$, with each component ψ_i taking value 0 or 1 corresponding to nonrejection or rejection of H_i, respectively. It is often desirable to ensure that

$$\sup_{θ^{*} \in Θ} P_{θ^{*}} \Big( ⋃_{i \in \mathcal{I} (θ^{*})} \{ψ_{i} = 1\} \Big) ⩽ α, \tag{1}$$

where $\mathcal{I} (θ^{*}) = \{i \in \mathcal{I} : θ^{*} \in H_{i}\}$ is the index set of true hypotheses under θ*. In other words, the probability of rejecting at least one true null hypothesis is bounded by α. This is known as strong control of the familywise error rate. The closure principle can be used to ensure (1). We are required to find, for each $I \subseteq \mathcal{I}$ such that $H_{I} = ⋂_{i \in I} H_{i}$ is nonempty, a local level-α test φ_I for the intersection hypothesis H_I; that is, we require

$$\sup_{θ^{*} \in H_{I}} P_{θ^{*}} (φ_{I} = 1) ⩽ α, \tag{2}$$

where φ_I takes values in {0, 1} with the usual interpretation. If we define $ψ_{i} = \min_{I : H_{I} \neq \emptyset, H_{I} \subseteq H_{i}} (φ_{I})$ , then (1) holds. This can be very useful, as in many applications it is easy to find tests satisfying (2), whereas validating (1) directly is hard.
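The closure construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `nonempty_subsets` and `closed_test` are hypothetical, the Bonferroni local test in the usage lines is chosen purely for demonstration, and every intersection hypothesis H_I is assumed to be nonempty with H_I ⊆ H_i exactly when i ∈ I, as is the case for the one-sided hypotheses considered later.

```python
from itertools import combinations


def nonempty_subsets(indices):
    """Yield every nonempty subset of `indices` as a frozenset."""
    items = sorted(indices)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def closed_test(indices, local_test):
    """Return psi_i = min over all I containing i of phi_I.

    `local_test(I)` is a hypothetical callable returning 1 (reject H_I)
    or 0 (retain H_I); each local test is assumed to have level alpha.
    """
    phi = {I: int(local_test(I)) for I in nonempty_subsets(indices)}
    return {i: min(v for I, v in phi.items() if i in I) for i in indices}


# Usage: Bonferroni local tests (an illustrative choice only).
alpha = 0.05
pvals = {1: 0.01, 2: 0.04, 3: 0.30}
psi = closed_test(pvals, lambda I: min(pvals[i] for i in I) <= alpha / len(I))
print(psi)
```

The sketch evaluates all 2^K − 1 local tests, which makes the exponential cost of the generic closure principle explicit.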

2·2. Combination test

Fisher (1932) discussed combining independent p-values to test a single null hypothesis. For convenience and brevity, we will only consider two-stage designs. We define a p-value combination function Q: [0, 1]² → [0, 1] that is left-continuous and nondecreasing in both its arguments and such that Q(U, V) is uniformly distributed on [0, 1] whenever U and V are independent and uniformly distributed on [0, 1]. An example is

$$Q (u, v) = 1 - Φ [2^{- 1 / 2} \{Φ^{- 1} (1 - u) + Φ^{- 1} (1 - v)\}], \tag{3}$$

where Φ denotes the standard normal distribution function.
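As an illustration, the equal-weight inverse normal combination function can be coded with the Python standard library alone. In this sketch the sum of the two normal scores is divided by √2 so that it has unit variance under independence. The function name `inverse_normal_combination` and the handling of the boundary values 0 and 1 are assumptions of the sketch, not part of the original text.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def inverse_normal_combination(u: float, v: float) -> float:
    """Combine two stage-wise p-values with equal weights.

    Boundary convention (an assumption of this sketch): a p-value of 1
    forces the combined value to 1; otherwise a p-value of 0 forces 0.
    """
    if u >= 1.0 or v >= 1.0:
        return 1.0
    if u <= 0.0 or v <= 0.0:
        return 0.0
    # Phi^{-1}(1 - u) = -Phi^{-1}(u); this form stays accurate for tiny u.
    z_sum = -(_STD_NORMAL.inv_cdf(u) + _STD_NORMAL.inv_cdf(v))
    return 1.0 - _STD_NORMAL.cdf(z_sum / sqrt(2.0))


# Usage: two moderately small p-values combine to a smaller one.
print(inverse_normal_combination(0.04, 0.03))
```

Writing Φ⁻¹(1 − u) as −Φ⁻¹(u) avoids the loss of precision that occurs when 1 − u rounds to 1 in floating point.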

Such a combination function lends itself to a two-stage adaptive closed test, ψ, for a family of null hypotheses, $\mathcal{H}$. An important application, discussed in Bretz et al. (2009), is a seamless phase II/III confirmatory clinical trial. We henceforth restrict attention to a parameter θ = (θ₁, … , θ_K) taking values in parameter space $Θ = ℝ^{K}$ and a family of null hypotheses $\mathcal{H} = \{H_{k} : k \in T_{1}\}$ where T₁ = {1, … , K} and H_k = {θ_k ⩽ δ_k} (k ∈ T₁) for some constants $δ_{1}, \dots, δ_{K} \in ℝ$. The θ_k (k ∈ T₁) might correspond to the mean effects of K different treatments, for example. By defining local tests φ_I (I ⊆ T₁) via a combination function Q, it is possible to make data-dependent modifications to the trial design at an interim analysis (cf. Bauer & Kieser, 1999; Hommel, 2001; Brannath et al., 2002). For instance, attention can be focused on a subset T₂ ⊆ T₁ of the initial hypotheses of interest, and changes can be made to sample sizes, allocation ratios, and so on.

2·3. Two-stage closed test procedure

Assume that the full first-stage trial data are represented by a random vector $X \in ℝ^{n}$ with distribution function G(x; θ). Prior to starting the trial, one must specify a combination function Q and, for each I ⊆ T₁, a first-stage test of $H_{I} = ⋂_{i \in I} H_{i}$ with an associated p-value function $p_{I}^{(1)} : ℝ^{n} \to [0, 1]$ that satisfies $\sup_{θ^{*} \in H_{I}} \int_{\{x : p_{I}^{(1)} (x) ⩽ u\}} d G (x; θ^{*}) ⩽ u$ for all $u \in [0, 1]$. The second-stage design is left unspecified.

At the interim analysis, the experimenter defines a second-stage design, d, by choosing a subset of the original hypotheses, indexed by T₂ ⊆ T₁, to continue studying in the second stage, along with second-stage sample sizes and, for each I ⊆ T₁, a second-stage hypothesis test for H_I. See below for a proposal for choosing second-stage tests for H_I where I ⊈ T₂. We assume that the design d is allowed to depend on the unblinded first-stage data x without prespecifying an adaptation rule. Let Y denote the data collected at the second stage, taking values in $ℝ^{m}$ , and let $p_{I, x, d}^{(2)} (y)$ (I ⊆ T₁) denote the p-value functions of the second-stage tests. Because the tests used in the second stage depend on the first-stage data x and the chosen design d, the p-value functions will in general depend on both.

Let F_x,d(y; θ) denote the distribution function of the second-stage data, given the chosen design d and interim data x. We assume that for all x, d and I ⊆ T₁, the second-stage p-values $p_{I, x, d}^{(2)}$ satisfy $\sup_{θ^{*} \in H_{I}} \int_{\{y : p_{I, x, d}^{(2)} (y) ⩽ u\}} d F_{x, d} (y; θ^{*}) ⩽ u$ for all u ∈ [0, 1]. The distribution F_x,d is assumed to be known, i.e., not merely specified up to a null set, for all x and d, a condition that can be formalized by assuming an appropriate regression model (Brannath et al., 2012). See § 3·2 for a numerical example.

At the final analysis, for each I ⊆ T₁, the test decision is φ_I = 1 if and only if $Q \{p_{I}^{(1)}, p_{I, x, d}^{(2)}\} ⩽ α$. As shown in Brannath et al. (2012), this combination test for H_I controls the Type I error rate at level α.

We assume that only data for the hypotheses indexed by T₂ are collected in the second stage and propose setting $p_{I}^{(2)} = p_{I \cap T_{2}}^{(2)}$ for I ⊈ T₂, where we drop the indices x and d for simplicity and set $p_{\emptyset}^{(2)} = 1$ by convention. Such second-stage p-values have the required distribution under H_I∩T₂ and hence also under H_I.

We emphasize that while Type I error control is guaranteed even if the second-stage design is initially open-ended, in the design of actual clinical trials it is crucial to perform detailed planning based on likely first-stage outcomes. The added flexibility is necessary because it is impossible to foresee all eventualities in extremely complex areas such as clinical drug development.

3. Confidence regions

3·1. Partitioning the parameter space

A standard approach to deriving a 100(1 − α)% confidence set for θ is to perform a level-α test of each elementary hypothesis {θ = θ*} (θ* ∈ Θ) and include all θ* corresponding to nonrejected hypotheses (see, e.g., Lehmann, 1986, p. 90). To ensure compatibility with closed testing, the key idea (Stefansson et al., 1988; Hayter & Hsu, 1994; Finner & Strassburger, 2002) is to partition the parameter space into disjoint regions

$$Θ_{I} = \{θ^{*} \in Θ : θ_{i}^{*} ⩽ δ_{i}, \, i \in I; \; θ_{i}^{*} > δ_{i}, \, i \in T_{1} \setminus I\} \quad (I \subseteq T_{1})$$

and apply different tests in each of the disjoint Θ_I. If, for each I ⊆ T₁, we let {φ_I (θ*): θ* ∈ Θ} denote a family of tests with

$$\inf_{θ^{*} \in Θ} P_{θ^{*}} \{φ_{I} (θ^{*}) = 0\} ⩾ 1 - α, \tag{4}$$

where φ_I (θ*) takes values in {0, 1} with the usual interpretation, we can apply the following general result from Hsu (1996, p. 234).

Lemma 1. A level-100(1 − α)% confidence set for θ is

$$C = ⋃_{I \subseteq T_{1}} [\{θ^{*} \in Θ : φ_{I} (θ^{*}) = 0\} \cap Θ_{I}] . \tag{5}$$

Our aim is to find families of tests such that C is compatible with the two-stage closed test procedure. This requires us to augment our specification of $p_{I \cap T_{j}}^{(j)}$ (j = 1, 2; I ⊆ T₁) with a family of p-values $\{p_{I \cap T_{j}}^{(j)} (θ^{*}) : θ^{*} \in Θ\}$ where, under {θ = θ*}, the distributions of $p_{I}^{(1)} (θ^{*})$ and $p_{I \cap T_{2}}^{(2)} (θ^{*})$ meet the conditions outlined for $p_{I}^{(1)}$ and $p_{I \cap T_{2}}^{(2)}$ in § 2·3. Additionally, if we treat the data as fixed and view each family as a function $p_{I \cap T_{j}}^{(j)} : Θ \to [0, 1]$, then unless $I \cap T_{j} = \emptyset$, $p_{I \cap T_{j}}^{(j)} (θ^{*})$ is constant in all arguments $θ_{i}^{*}$ such that i ∉ I ∩ T_j, and is left-continuous and nondecreasing in all arguments $θ_{i}^{*}$ such that i ∈ I ∩ T_j, with $p_{I \cap T_{j}}^{(j)} (θ^{*}) = p_{I \cap T_{j}}^{(j)}$ for any θ* such that $θ_{i}^{*} = δ_{i}$ for all i ∈ I ∩ T_j. Furthermore, we assume that

$$\lim_{θ_{i}^{*} \to \infty, \, i \in T_{2}} p_{\emptyset}^{(2)} (θ^{*}) = 1 . \tag{6}$$

Proposition 1. Inserted into (5), the following families of hypothesis tests give rise to a 100(1 − α)% confidence set for θ, denoted by C, that is compatible with the two-stage closed test procedure, i.e., ψ_k = 1 if and only if H_k ∩ C = ∅: for ∅ ≠ I ⊆ T₁ and θ* ∈ Θ,

$$φ_{I} (θ^{*}) = \begin{cases} 1, & Q \{p_{I}^{(1)} (θ^{*}), p_{I \cap T_{2}}^{(2)} (θ^{*})\} ⩽ α, \\ 0, & Q \{p_{I}^{(1)} (θ^{*}), p_{I \cap T_{2}}^{(2)} (θ^{*})\} > α, \end{cases} \tag{7}$$

and {φ_∅(θ*): θ* ∈ Θ } is any family of tests satisfying (4).

Proof. See the Appendix.

There will be no unique collection of families of p-values satisfying the aforementioned distributional and monotonicity constraints. Rather, the families must be specified in a two-stage procedure in an analogous way to the p-values in § 2·3. As will become clear from the example below, for many commonly encountered scenarios and when I ∩ T_j ≠ ∅, the choice of $\{p_{I \cap T_{j}}^{(j)} (θ^{*}) : θ^{*} \in Θ\}$ will be obvious from the choice of $p_{I \cap T_{j}}^{(j)}$. As a simple example, suppose that $p_{\{k\}}^{(j)}$ is the p-value from a one-sided z-test of the null hypothesis {θ_k ⩽ δ_k} using the stage-j data only. Then the natural choice for $p_{\{k\}}^{(j)} (θ^{*})$ is the one-sided p-value from a standard z-test of $\{θ_{k} ⩽ θ_{k}^{*}\}$ using the same stage-j data.
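The z-test example can be made concrete with a short sketch. The function name `shifted_z_pvalue` and the numerical inputs are hypothetical and chosen only for illustration; the point is that the augmented p-value is nondecreasing in the shifted null value and coincides with the original p-value when that value equals δ_k.

```python
from statistics import NormalDist


def shifted_z_pvalue(estimate, std_error, theta_star):
    """One-sided z-test p-value for the shifted null {theta_k <= theta_star}."""
    return 1.0 - NormalDist().cdf((estimate - theta_star) / std_error)


# Illustrative numbers only (not taken from the trial example).
estimate, std_error, delta_k = 0.09, 0.05, 0.0
p_at_delta = shifted_z_pvalue(estimate, std_error, delta_k)
grid = [-0.10, -0.05, 0.0, 0.05, 0.10]
p_grid = [shifted_z_pvalue(estimate, std_error, t) for t in grid]
print(p_at_delta, p_grid)
```

Because the normal distribution function is continuous and increasing, this family satisfies the left-continuity and monotonicity requirements of § 3·1 in the argument θ_k*.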

While for I ∩ T_j ≠ ∅ there will often be a natural choice for $p_{I \cap T_{j}}^{(j)} (θ^{*})$ , it is unclear how φ_∅(θ*) and $p_{\emptyset}^{(2)} (θ^{*})$ should be chosen. A reasonable suggestion is given below.

Corollary 1. Define $p_{\emptyset}^{(j)} (θ^{*}) = p_{T_{j}}^{(j)} (θ^{*})$ for j = 1,2. The following is a 100(1 − α)% confidence region for θ that is compatible with the two-stage closed test procedure:

$$C_{1} = ⋃_{I \subseteq T_{1}} \{θ^{*} \in Θ_{I} : Q \{p_{I}^{(1)} (θ^{*}), p_{I \cap T_{2}}^{(2)} (θ^{*})\} > α\} . \tag{8}$$

The properties of a region defined by (8) are best illustrated by a specific example.

3·2. Example

Posch et al. (2005) considered a clinical trial where three active treatments, indexed by T₁ = {A, B, C}, are compared with a placebo using a two-stage adaptive design. The individual null hypotheses of interest are H_k = {θ_k ⩽ 0} (k ∈ T₁), where θ_k = π_k − π₀ denotes the difference between the success probabilities of treatment k and placebo. Denote the observed success rate of treatment k in stage j by $\hat{π}_{k, j}$ (k ∈ T₁ ∪ {0}; j = 1, 2), where treatment 0 corresponds to placebo.

At the design stage, the inverse normal combination function (3) is specified and n₁ = 140 first-stage patients are recruited to each treatment arm. Approximately, the $\hat{θ}_{k, 1} = \hat{π}_{k, 1} - \hat{π}_{0, 1}$ (k ∈ T₁) are multivariate normal with $E (\hat{θ}_{k, 1}) = θ_{k}$, $\mathrm{var} (\hat{θ}_{k, 1}) = \{\hat{π}_{k, 1} (1 - \hat{π}_{k, 1}) + \hat{π}_{0, 1} (1 - \hat{π}_{0, 1})\} / n_{1}$ and positive correlations. Based on this assumption, Simes (1986) tests are used for each intersection hypothesis; that is, $p_{\{k\}}^{(1)} = 1 - Φ [\hat{θ}_{k, 1} \{\mathrm{var} (\hat{θ}_{k, 1})\}^{- 1 / 2}]$ for k ∈ T₁ and, for |I| > 1, $p_{I}^{(1)} = \min_{k \in I} p_{\{k\}}^{(1)} |I| / R (k, I)$, where R(k, I) denotes the rank of $p_{\{k\}}^{(1)}$ among $\{p_{\{i\}}^{(1)} : i \in I\}$. The natural way of augmenting these p-values is to define $p_{\{k\}}^{(1)} (θ^{*}) = 1 - Φ [(\hat{θ}_{k, 1} - θ_{k}^{*}) \{\mathrm{var} (\hat{θ}_{k, 1})\}^{- 1 / 2}]$ for k ∈ T₁ and $p_{I}^{(1)} (θ^{*}) = \min_{k \in I} p_{\{k\}}^{(1)} (θ^{*}) |I| / R (k, I, θ^{*})$ for |I| > 1, where R(k, I, θ*) denotes the rank of $p_{\{k\}}^{(1)} (θ^{*})$ among $\{p_{\{i\}}^{(1)} (θ^{*}) : i \in I\}$.

Suppose that the unblinded first-stage results are $\hat{π}_{0, 1} = 0{\cdot}21$, $\hat{π}_{A, 1} = 0{\cdot}22$, $\hat{π}_{B, 1} = 0{\cdot}3$ and $\hat{π}_{C, 1} = 0{\cdot}36$. The experimenter decides that treatments A and C are not to be considered in the second stage owing to lack of efficacy and safety concerns, respectively. A further n₂ = 140 patients are recruited to each of treatment B and placebo. A family of p-values with $p_{\{B\}}^{(2)} (θ^{*}) = 1 - Φ [(\hat{θ}_{B, 2} - θ_{B}^{*}) \{\mathrm{var} (\hat{θ}_{B, 2})\}^{- 1 / 2}]$ is chosen, where $\hat{θ}_{B, 2} = \hat{π}_{B, 2} - \hat{π}_{0, 2}$.

Now suppose that the second-stage results are $\hat{π}_{0, 2} = 0{\cdot}19$ and $\hat{π}_{B, 2} = 0{\cdot}31$. The p-values from the elementary hypotheses are $p_{\{A\}}^{(1)} = 0{\cdot}419$, $p_{\{B\}}^{(1)} = 0{\cdot}0412$, $p_{\{C\}}^{(1)} = 0{\cdot}00241$ and $p_{\{B\}}^{(2)} = 0{\cdot}00961$. Therefore $p_{\{A, B, C\}}^{(1)} = 3 p_{\{C\}}^{(1)}$, $p_{\{A, B\}}^{(1)} = 2 p_{\{B\}}^{(1)}$ and $p_{\{B, C\}}^{(1)} = 2 p_{\{C\}}^{(1)}$. As $\max_{I \subseteq T_{1}, B \in I} Q (p_{I}^{(1)}, p_{B}^{(2)}) ⩽ 0{\cdot}025$, every intersection hypothesis involving B is rejected, so H_B can be rejected at familywise level 0·025. Both H_A and H_C fail to be rejected, as $Q \{p_{\{k\}}^{(1)}, 1\} = 1$ for k = A, C. A compatible 97·5% confidence region for θ is given by

$$⋃_{I \subseteq T_{1}} \{θ^{*} \in Θ_{I} : Q \{p_{I}^{(1)} (θ^{*}), p_{B}^{(2)} (θ^{*})\} > 0{\cdot}025\}, \tag{9}$$

where $p_{\emptyset}^{(1)} (θ^{*})$ is defined as $p_{T_{1}}^{(1)} (θ^{*})$ for all θ* ∈ Θ.
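The p-values and test decisions quoted in this example can be checked with a short script. It is a sketch under the normal approximation described above, using only the Python standard library; the helper names `z_pvalue`, `simes` and `combine`, and the boundary handling inside `combine`, are assumptions of the sketch rather than part of the original analysis.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def z_pvalue(rate_trt, rate_ctl, n):
    """One-sided z-test p-value for H_k = {theta_k <= 0} from one stage."""
    estimate = rate_trt - rate_ctl
    variance = (rate_trt * (1 - rate_trt) + rate_ctl * (1 - rate_ctl)) / n
    return 1.0 - NORMAL.cdf(estimate / sqrt(variance))


def simes(pvalues):
    """Simes p-value: min over ranks r of m * p_(r) / r."""
    ordered = sorted(pvalues)
    m = len(ordered)
    return min(m * p / r for r, p in enumerate(ordered, start=1))


def combine(u, v):
    """Equal-weight inverse normal combination; 0 and 1 handled by convention."""
    if u >= 1.0 or v >= 1.0:
        return 1.0
    if u <= 0.0 or v <= 0.0:
        return 0.0
    z = -(NORMAL.inv_cdf(u) + NORMAL.inv_cdf(v)) / sqrt(2.0)
    return 1.0 - NORMAL.cdf(z)


alpha, n1, n2 = 0.025, 140, 140
T1, T2 = ("A", "B", "C"), ("B",)
stage1 = {"A": 0.22, "B": 0.30, "C": 0.36}
p1 = {k: z_pvalue(stage1[k], 0.21, n1) for k in T1}
p2 = {"B": z_pvalue(0.31, 0.19, n2)}

# Local combination tests for every nonempty I; the second-stage p-value
# of I is that of I ∩ T2, with the convention that the empty set gives 1.
phi = {}
for size in range(1, len(T1) + 1):
    for I in combinations(T1, size):
        kept = [k for k in I if k in T2]
        p2_I = simes([p2[k] for k in kept]) if kept else 1.0
        phi[I] = combine(simes([p1[k] for k in I]), p2_I) <= alpha

# Closure: H_k is rejected only if every H_I with k in I is rejected.
psi = {k: all(rej for I, rej in phi.items() if k in I) for k in T1}
print(p1, p2, psi)
```

Only H_B is rejected: each elementary test for A and C uses a second-stage p-value of 1, so its combined p-value equals 1.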

The region (9) will have a complicated three-dimensional shape. However, in terms of making inference on θ_B, its crucial features can be seen by taking two cross-sections, as displayed in Fig. 1. As $p_{I}^{(1)} (θ^{*})$ is nondecreasing in $θ_{C}^{*}$ for all I ⊆ T₁, we know that for any γ ∈ (-∞, 0), the cross-section at $θ_{C}^{*} = γ$ is contained in the cross-section at $θ_{C}^{*} = 0$ . Similarly, for any γ ∈ (0, ∞), the cross-section at $θ_{C}^{*} = γ$ is contained in the limit of the cross-section of the region as $θ_{C}^{*} \to \infty$ . One can see immediately from Fig. 1 that for any ϵ > 0, the 97·5% confidence region fails to exclude all parameter vectors θ* such that $θ_{B}^{*} ⩽ ϵ$ . In other words, the lower confidence bound on θ_B provides no more information than the decision of the closed test procedure.

Fig. 1 — Cross-sections of confidence regions of the form (9) for making inference on the second-stage parameter of interest, *θ_B*, in the example of § 3·2: (a) two cross-sections of the 97·5% confidence region; (b) two cross-sections of the 95% confidence region.

For confidence intervals that are compatible with single-stage closed test procedures (Hayter & Hsu, 1994; Strassburger & Bretz, 2008; Guilbaud, 2008), a necessary condition for obtaining informative lower confidence bounds for parameters corresponding to the rejected null hypotheses is that ψ_k =1 for all k ∈ T₁. In the adaptive setting, this is no longer a necessary condition. For example, repeating the above test procedure at level α=0·05, the compatible 95% confidence region analogous to (9) is also summarized in Fig. 1. Here it appears, and indeed can be verified by considering all values of $θ_{A}^{*}$ , that there does exist some ϵ > 0 such that the confidence region excludes all parameter vectors θ* for which $θ_{B}^{*} ⩽ ϵ$ . We will show that for the two-stage adaptive setting, a necessary condition for informative lower confidence bounds on parameters corresponding to the rejected null hypotheses is that ψ_k =1 for all k ∈ T₂. However, as can be seen from Fig. 1, this condition is not sufficient.

3·3. A two-stage, single-step confidence region

Posch et al. (2005) proposed the following 100(1 − α)% confidence region:

$$C_{2} = \{θ^{*} \in Θ : Q \{p_{T_{1}}^{(1)} (θ^{*}), p_{T_{2}}^{(2)} (θ^{*})\} > α\} . \tag{10}$$

They note that the resulting confidence intervals are not compatible with the closed test procedure described in § 2·3 (Posch et al., 2005, p. 3702). Nevertheless, the region (10) can be used to generate an alternative multiple test. More generally, any 100(1 − α)% confidence set C generates a multiple test for a family of hypotheses $\mathcal{H}$, whereby $H_{k} \in \mathcal{H}$ is rejected if and only if H_k ∩ C = ∅. This guarantees strong control of the familywise error rate (1). The multiple test generated by (10) can be thought of as single-step in the sense that rejection or nonrejection of a null hypothesis does not take into account the decision for any other hypothesis. If H_k is rejected, informative lower bounds will be available for θ_k regardless of the test decisions for all other hypotheses.

4. Computation of confidence intervals

4·1. Least-favourable parameter configurations

In the above example, marginal inference on θ_B was achieved by considering least-favourable parameter configurations for θ_k, k ∈ T₁ \ {B}. This idea can be generalized to find 100(1 − α)% simultaneous confidence intervals containing (8) or (10).

Definition 1. For j = 1, 2, k ∈ T₁ and I ⊆ T_j, the locally least-favourable jth-stage p-value function for H_k in Θ_I, $p_{k, I}^{(j)} : ℝ \to [0, 1]$ , is defined for I ≠ ∅ as $p_{k, I}^{(j)} (ϑ) = p_{I}^{(j)} (ξ)$ , where ξ =(ξ₁, … , ξ_K) with ξ_i =δ_i for i ≠ k and ξ_k = ϑ. Additionally, for j = 1, 2,

$$p_{k, \emptyset}^{(j)} (ϑ) = \lim_{ξ_{i} \to \infty, \, i \in T_{j} \setminus \{k\}} p_{T_{j}}^{(j)} (ξ) \quad (ξ_{k} = ϑ) . \tag{11}$$

Proposition 2. The smallest Cartesian product of intervals, $\times_{k \in T_{1}} (l_{k}, \infty)$, that contains the confidence region (8) has $l_{k} = \min_{I \subseteq T_{1}} l_{k, I}$, where for k ∈ I,

$$l_{k, I} = \begin{cases} \infty & (φ_{I} = 1), \\ \sup \{ϑ : Q \{p_{k, I}^{(1)} (ϑ), p_{k, I \cap T_{2}}^{(2)} (ϑ)\} ⩽ α\} & (φ_{I} = 0), \end{cases} \tag{12}$$

and for k ∉ I,

$$l_{k, I} = \max (δ_{k}, \sup \{ϑ : Q \{p_{k, I}^{(1)} (ϑ), p_{k, I \cap T_{2}}^{(2)} (ϑ)\} ⩽ α\}) . \tag{13}$$

Furthermore, these intervals are compatible with the two-stage closed test procedure, i.e., ψ_k = 1 if and only if H_k ∩ ×_k∈T₁(l_k, ∞)=∅.

Proof. See the Appendix.

In general, to find each interval requires one-dimensional root finding for each I ⊆ T₁, a calculation that is O(2^K). However, substantial shortcuts are available for reducing the computational burden.

4·2. Efficient computation of confidence bounds

There are two possible scenarios at the end of the closed test procedure: either ψ_k = 1 for all k ∈ T₂, or at least one H_k (k ∈ T₂) fails to be rejected. In the latter case, there exists some I ⊆ T₁ with I ∩ T₂ ≠ ∅ such that for any k ∈ T₂,

$$α < Q (p_{I}^{(1)}, p_{I \cap T_{2}}^{(2)}) = Q \{p_{k, I}^{(1)} (δ_{k}), p_{k, I \cap T_{2}}^{(2)} (δ_{k})\}$$

and therefore l_k ⩽ l_k,I ⩽ δ_k. Due to the compatibility of the intervals with the closed test procedure, if ψ_k = 1, then l_k = δ_k; if ψ_k = 0, then l_k < δ_k.

If ψ_k =1 for all k ∈ T₂, then l_k ⩾ δ_k for all k ∈ T₂. Additionally, we can use the fact that for all k ∈ T₂ and I ⊆ T₁ with I ∩ T₂ ≠ ∅, we know from (12) and (13) that l_k,I = ∞; so, when finding l_k =min_I⊆T₁ l_k,I in Proposition 2, the minimum can be taken over a much smaller number of l_k,I. The following algorithm finds the lower bounds for all parameters corresponding to the rejected hypotheses.

Step 1. Perform the closed test procedure. If ψ_k′ = 0 for some k′ ∈ T₂, then l_k = δ_k for ψ_k = 1 and l_k < δ_k for ψ_k = 0. If ψ_k = 1 for all k ∈ T₂, go to Step 2.

Step 2. Find $p_{M} = \max_{\emptyset \neq I \subseteq T_{1} \setminus T_{2}} p_{I}^{(1)}$. If T₁ \ T₂ = ∅, then p_M = 0.

Step 3. For k ∈ T₂,

$$l_{k} = \max [δ_{k}, \sup \{ϑ : Q [\max \{p_{M}, p_{k, \emptyset}^{(1)} (ϑ)\}, p_{k, \emptyset}^{(2)} (ϑ)] ⩽ α\}] .$$

The cost of computing the intervals for θ_k (k ∈ T₂) in Step 3 is linear in the number of parameters. Step 1 is O(2^|T₁|), but a shortcut of O(|T₁|²) is given in Brannath & Bretz (2010). Step 2 is O(2^|T₁\T₂|), but a shortcut of size |T₁ \ T₂| is available, provided there exists an ordering i₁, … , i_k of T₁ \ T₂ such that for each u ∈ {1, … , k}, $p_{J}^{(1)} ⩽ p_{L}^{(1)}$ for all J ⊆ L ⊆ {i_u, … , i_k} with i_u ∈ J. This is because we only have to check $p_{\{i_{u}, \dots, i_{k}\}}^{(1)}$ for u = 1, … , k. Many common multiple test procedures, such as those based on Dunnett (1955) tests or weighted Bonferroni tests, satisfy this condition, with the ordering i₁, … , i_k following the ordering of the univariate test statistics or the weighted elementary p-values (Brannath & Bretz, 2010).
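Step 3 of the algorithm amounts to a one-dimensional search, which can be sketched with bisection. The sketch assumes that Step 1 has already rejected every hypothesis in T₂ and that p_M from Step 2 is supplied; the names `bisect_sup` and `step3_lower_bound`, the search width, and the toy z-test p-value functions in the usage lines are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def combine(u, v):
    """Equal-weight inverse normal combination; 0 and 1 handled by convention."""
    if u >= 1.0 or v >= 1.0:
        return 1.0
    if u <= 0.0 or v <= 0.0:
        return 0.0
    z = -(NORMAL.inv_cdf(u) + NORMAL.inv_cdf(v)) / sqrt(2.0)
    return 1.0 - NORMAL.cdf(z)


def bisect_sup(g, alpha, lo, hi, iterations=200):
    """Approximate sup{t in [lo, hi] : g(t) <= alpha} for nondecreasing g.

    Requires g(lo) <= alpha < g(hi); otherwise the bracket is unsuitable.
    """
    if not (g(lo) <= alpha < g(hi)):
        raise ValueError("bracket does not contain the crossing point")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo


def step3_lower_bound(delta_k, p_max, p1_fn, p2_fn, alpha, width=10.0):
    """l_k = max[delta_k, sup{t : Q[max{p_M, p1(t)}, p2(t)] <= alpha}]."""
    def g(t):
        return combine(max(p_max, p1_fn(t)), p2_fn(t))
    if g(delta_k) > alpha:
        return delta_k  # the supremum cannot exceed delta_k, so the max is delta_k
    return bisect_sup(g, alpha, delta_k, delta_k + width)


# Toy usage with T1 = T2 = {k}, so p_M = 0, and z-tests at both stages.
p1_fn = lambda t: 1.0 - NORMAL.cdf((0.3 - t) / 0.1)
p2_fn = lambda t: 1.0 - NORMAL.cdf((0.5 - t) / 0.1)
l_k = step3_lower_bound(0.0, 0.0, p1_fn, p2_fn, alpha=0.025)
print(l_k)
```

In this toy case the bound has the closed form 0·4 − z₀.₉₇₅ × 0·1/√2, which gives a convenient check on the search.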

4·3. Lower bounds for parameters corresponding to retained hypotheses

Consider k ∈ T₂ such that ψ_k = 0. We know that l_k < δ_k, and therefore we need only consider l_k,I such that k ∈ I. However, since in general l_k,I < ∞, finding the minimum such lower bound will still have a computational cost that is exponential in the number of parameters.

For k ∈ I ⊆ T₁ \ T₂, we have $p_{k, I \cap T_{2}}^{(2)} (ϑ) = p_{k, \emptyset}^{(2)} (ϑ)$ and know from (11) and (6) that this is equal to 1. Many commonly used combination functions, including (3), have the property that v = 1 implies $Q (u, v) = 1$ . In this case, l_k = −∞ for all k ∈ T₁ \ T₂.
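For the inverse normal combination function (3) this property can be checked directly, reading Φ⁻¹(0) = −∞ as a limit (a convention assumed here for illustration):

```latex
\lim_{v \uparrow 1} Q(u, v)
  = 1 - \lim_{v \uparrow 1}
      \Phi\left[ 2^{-1/2} \left\{ \Phi^{-1}(1 - u) + \Phi^{-1}(1 - v) \right\} \right]
  = 1 - \Phi(-\infty)
  = 1
  \qquad (0 < u \leq 1).
```

Under this reading the combined p-value exceeds α for every ϑ, so the supremum in (12) is taken over an empty set.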

4·4. Lower bounds for the two-stage single-step procedure

Posch et al. (2005) showed that the region (10) is contained in a rectangle, $\times_{k \in T_{1}} (\bar{l}_{k}, \infty)$, where

$$\bar{l}_{k} = \sup \{ϑ : Q \{p_{k, \emptyset}^{(1)} (ϑ), p_{k, \emptyset}^{(2)} (ϑ)\} ⩽ α\} . \tag{14}$$

The computation of each interval requires only a one-dimensional search for a root, and overall computation will be linear in the number of parameters.

4·5. Example continued

Recall from § 3·2 that T₂ = {B} and ψ_B = 1. Proceeding to Step 2 of the above algorithm, p_M =0·419. In this case we need just one iteration in Step 3, because

$$Q [\max \{0{\cdot}419, p_{B, \emptyset}^{(1)} (0)\}, p_{B, \emptyset}^{(2)} (0)] = 0{\cdot}0360 > 0{\cdot}025,$$

and therefore the 97·5% confidence interval for θ_B is (0, ∞), consistent with Fig. 1. This example emphasizes that there is a price to pay for the additional power of the closed test as opposed to the single-step procedure of § 3·3 with, by (14),

$$\bar{l}_{B} = \sup \{ϑ : Q \{p_{B, \emptyset}^{(1)} (ϑ), p_{B, \emptyset}^{(2)} (ϑ)\} ⩽ 0{\cdot}025\} = 0{\cdot}0159 .$$

While this agrees with the assertion θ_B > 0 in this specific case, it is invalid to claim it as a 97·5% lower confidence bound if the closed test procedure of § 2·3 had been planned. One can see that for any α > 0·036, the 100(1 − α)% confidence interval for treatment B that is compatible with the closed test procedure has a positive lower bound. For example, the 95% lower confidence bound is l_B = 0·0112, consistent with Fig. 1. Again, if the region (10) had been specified pre-trial, the 95% lower confidence bound (14) would have been $\bar{l}_{B} = 0{\cdot}0252$.
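The bounds quoted in this example can be recomputed numerically. The following is a sketch using the normal approximation and Simes tests of § 3·2; the helper names (`combine`, `shifted_pvalue`, `sup_root`, and so on) and the search bracket [−1, 1] are assumptions of the sketch. With the A and C coordinates sent to infinity, their augmented p-values tend to 1, so the Simes p-value over T₁ reduces to min{3p, 1}, where p is the augmented p-value for B.

```python
from math import sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def combine(u, v):
    """Equal-weight inverse normal combination; 0 and 1 handled by convention."""
    if u >= 1.0 or v >= 1.0:
        return 1.0
    if u <= 0.0 or v <= 0.0:
        return 0.0
    z = -(NORMAL.inv_cdf(u) + NORMAL.inv_cdf(v)) / sqrt(2.0)
    return 1.0 - NORMAL.cdf(z)


def shifted_pvalue(rate_trt, rate_ctl, n, theta_star):
    """One-sided z-test p-value for {theta_k <= theta_star}."""
    estimate = rate_trt - rate_ctl
    variance = (rate_trt * (1 - rate_trt) + rate_ctl * (1 - rate_ctl)) / n
    return 1.0 - NORMAL.cdf((estimate - theta_star) / sqrt(variance))


def p1_B_empty(t):
    """Least-favourable first-stage p-value for B (A and C p-values at 1)."""
    return min(3.0 * shifted_pvalue(0.30, 0.21, 140, t), 1.0)


def p2_B_empty(t):
    """Second-stage p-value for B; T2 = {B}, so no limit is needed."""
    return shifted_pvalue(0.31, 0.19, 140, t)


def sup_root(g, alpha, lo=-1.0, hi=1.0, iterations=100):
    """Bisection for sup{t : g(t) <= alpha}; g nondecreasing, crossing in [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo


# Step 2: maximum first-stage p-value over nonempty subsets of {A, C}.
p_A = shifted_pvalue(0.22, 0.21, 140, 0.0)
p_C = shifted_pvalue(0.36, 0.21, 140, 0.0)
p_M = max(p_A, p_C, min(2.0 * min(p_A, p_C), max(p_A, p_C)))


def closed(t):        # function inside the supremum of Step 3
    return combine(max(p_M, p1_B_empty(t)), p2_B_empty(t))


def single_step(t):   # function inside the supremum of (14)
    return combine(p1_B_empty(t), p2_B_empty(t))


q_at_zero = closed(0.0)
l_B_975 = max(0.0, sup_root(closed, 0.025))
l_B_95 = max(0.0, sup_root(closed, 0.05))
lbar_B_975 = sup_root(single_step, 0.025)
lbar_B_95 = sup_root(single_step, 0.05)
print(q_at_zero, l_B_975, l_B_95, lbar_B_975, lbar_B_95)
```

The only difference between the two procedures in this sketch is the term p_M, which caps how small the first-stage contribution can become for the closed test.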

5. Confidence bounds for closed tests based on the conditional error rate

Consider again the two-stage closed test procedure of § 2·3. As an alternative to combination tests, Koenig et al. (2008) used the conditional error approach (Proschan & Hunsberger, 1995) to derive local tests φ_I (I ⊆ T₁). The only difference is that instead of prespecifying a combination function Q and first-stage p-value $p_{I}^{(1)}$ , one must prespecify a measurable conditional error function $A_{I} : ℝ^{n} \to [0, 1]$ such that

$$\sup_{θ^{*} \in H_{I}} \int_{ℝ^{n}} A_{I} (x) \, d G (x; θ^{*}) ⩽ α$$

and, at the final analysis, φ_I =1 if and only if $p_{I \cap T_{2}}^{(2)} ⩽ A_{I} (x)$ .

To produce a compatible 100(1 − α)% confidence region for θ, each A_I (I ⊆ T₁) must be augmented with a family of conditional error functions {A_I(θ*) : θ* ∈ Θ} such that $\int_{ℝ^{n}} A_{I} (θ^{*}) (x) \, d G (x; θ^{*}) ⩽ α$ and, for fixed $x \in ℝ^{n}$, A_I(θ*) is constant in all arguments $θ_{i}^{*}$ with i ∉ I and is left-continuous and nonincreasing in all arguments $θ_{i}^{*}$ with i ∈ I. Furthermore, A_I(θ*) = A_I for all θ* ∈ Θ such that $θ_{i}^{*} = δ_{i}$ for i ∈ I. The second-stage p-values $p_{I \cap T_{2}}^{(2)}$ (I ⊆ T₁) must be augmented with a family $\{p_{I \cap T_{2}}^{(2)} (θ^{*}) : θ^{*} \in Θ\}$ as described in § 3·1.

Müller & Schäfer (2004) propose defining $A_{I} = \sup_{θ^{*} \in H_{I}} E_{θ^{*}} (ϕ_{I} \mid X)$, where ϕ_I is a pre-planned fixed sample level-α test for H_I. In many situations the natural choice for A_I(θ*) will be obvious from A_I. For example, if ϕ_I is the decision function for a Dunnett (1955) test of $H_{I} = ⋂_{k \in I} \{θ_{k} ⩽ δ_{k}\}$, then it is natural to choose $A_{I} (θ^{*}) = E_{θ^{*}} (ϕ_{I, θ^{*}} \mid X)$, where ϕ_I,θ* is the decision function for a Dunnett test of $⋂_{k \in I} \{θ_{k} ⩽ θ_{k}^{*}\}$, which can be derived via a corresponding translation of the test statistics.

Using the arguments of Propositions 1 and 2, it can be shown that, analogously to (8), a compatible 100(1 − α)% confidence region for θ is

$$⋃_{I \subseteq T_{1}} \{θ^{*} \in Θ_{I} : p_{I \cap T_{2}}^{(2)} (θ^{*}) > A_{I} (θ^{*})\},$$

where $p_{\emptyset}^{(2)} (θ^{*})$ and A_∅(θ*) are set equal to $p_{T_{2}}^{(2)} (θ^{*})$ and A_T₁(θ*) respectively. Also, the largest compatible 100(1 − α)% confidence lower bounds are l_k =min_I⊆T₁ l_k,I, where for k ∈ I,

$$l_{k, I} = \begin{cases} \infty & (φ_{I} = 1), \\ \sup \{ϑ : p_{k, I \cap T_{2}}^{(2)} (ϑ) ⩽ A_{k, I} (ϑ)\} & (φ_{I} = 0), \end{cases}$$

and for k ∉ I, $l_{k, I} = \max [δ_{k}, \sup \{ϑ : p_{k, I \cap T_{2}}^{(2)} (ϑ) ⩽ A_{k, I} (ϑ)\}]$, with A_k,I(ϑ) defined analogously to $p_{k, I}^{(1)} (ϑ)$ (k ∈ T₁; I ⊆ T₁) in Definition 1.

6. Concluding remarks

The lower confidence bounds (12)–(13) provide more information about the location of θ than the decisions of the closed test procedure of § 2·3. The utility of this additional information will depend strongly on the context. In practice, the primary concern will often be to find lower bounds for the components of θ corresponding to the rejected null hypotheses. As this can be achieved using an algorithm that is O(K²), application to large-scale simultaneous inference problems is, in principle, feasible. However, these lower bounds will only be informative if all hypotheses considered in the second stage of testing are rejected, and even this may be insufficient. In practice, therefore, the lower bounds (12)–(13) are only likely to be useful in relatively small-scale problems. Furthermore, in situations where informative lower confidence bounds are deemed to be more important than the possibility of rejecting as many individual null hypotheses as possible, it would be sensible to use the intervals (14) instead of applying the closed test procedure. For large-scale simultaneous inference problems, an approach based on controlling the false coverage-statement rate (Benjamini & Yekutieli, 2005) may be more appropriate than aiming for a high simultaneous coverage probability.

Extensions to more than two stages and to allow early rejection of hypotheses are straightforward with an appropriate combination function in place of (3). An open question is how best to choose φ_∅(θ*) and $p_{\emptyset}^{(2)} (θ^{*})$ . The tests we use in region (8) are a natural choice but may not be the most powerful.

Acknowledgement

This work was supported by the National Institute for Health Research and the Austrian Science Fund. The views expressed in this publication are those of the authors and not necessarily those of the National Health Service, the National Institute for Health Research or the Department of Health.

Appendix

Proof of Proposition 1. With the assumptions in § 3·1, all tests of the form (7) satisfy condition (4), and therefore C is a 100(1 − α)% confidence set for θ. By the monotonicity conditions imposed on the p-values, we have $p_{I \cap T_{j}}^{(j)} (θ^{*}) ⩽ p_{I \cap T_{j}}^{(j)}$ for all θ* ∈ Θ_I (j = 1, 2; I ≠ ∅; I ⊆ T₁), so that Θ_I ∩ C = ∅ if and only if $Q (p_{I}^{(1)}, p_{I \cap T_{2}}^{(2)}) ⩽ α$. Therefore, ψ_k = 1 if and only if $\max_{I \subseteq T_{1}, k \in I} Q (p_{I}^{(1)}, p_{I \cap T_{2}}^{(2)}) ⩽ α$, if and only if $⋃_{I \subseteq T_{1}, k \in I} Θ_{I} \cap C = \emptyset$. Since $⋃_{I \subseteq T_{1}, k \in I} Θ_{I} = H_{k}$, we have compatibility.

Proof of Proposition 2. First, note the key property that $p_{k, I \cap T_{j}}^{(j)}(ϑ) ⩾ p_{I \cap T_{j}}^{(j)}(θ^{*})$ for all θ* ∈ Θ_I with $θ_{k}^{*} ⩽ ϑ$ (I ⊆ T₁; k ∈ T₁; j = 1, 2).

To show that C₁ ⊆ ×_{k∈T₁} (l_k, ∞), consider any θ* ∈ Θ \ ×_{k∈T₁} (l_k, ∞). We must have θ* ∈ Θ_I for some I ⊆ T₁ and $θ_{k}^{*} ⩽ l_{k}$ for some k ∈ T₁. If k ∈ I, then $θ_{k}^{*} ⩽ \min(δ_{k}, l_{k, I})$, and (12) implies that $α ⩾ Q\{p_{k, I}^{(1)}(θ_{k}^{*}), p_{k, I \cap T_{2}}^{(2)}(θ_{k}^{*})\} ⩾ Q\{p_{I}^{(1)}(θ^{*}), p_{I \cap T_{2}}^{(2)}(θ^{*})\}$. The same inequality follows from $l_{k, I} ⩾ θ_{k}^{*} > δ_{k}$ and (13) if k ∉ I. Therefore, θ* ∉ C₁ and C₁ ⊆ ×_{k∈T₁} (l_k, ∞).

To show that no smaller interval (l_k + ϵ, ∞) is possible for any ϵ > 0, we must find some θ* ∈ C₁ with $θ_{k}^{*} \in (l_{k}, l_{k} + ϵ)$. Consider a subset I ⊆ T₁ such that l_k = l_{k,I} and therefore $Q\{p_{k, I}^{(1)}(ϑ), p_{k, I \cap T_{2}}^{(2)}(ϑ)\} > α$ for all ϑ > l_k. If k ∈ I or, equivalently, l_k < δ_k, take any $θ_{k}^{*} \in (l_{k}, \min\{δ_{k}, l_{k} + ϵ\})$. If k ∉ I or, equivalently, l_k ⩾ δ_k, take any $θ_{k}^{*} \in (l_{k}, l_{k} + ϵ)$. Now consider a parameter vector $ξ^{I, k} = (ξ_{1}^{I, k}, \dots, ξ_{K}^{I, k})$, where $ξ_{k}^{I, k} = θ_{k}^{*}$, $ξ_{i}^{I, k} = δ_{i}$ for k ≠ i ∈ I, and $ξ_{i}^{I, k} > δ_{i}$ for i ∉ I ∪ {k}. All such parameter vectors ξ^{I,k} are contained in Θ_I, and

$$α < Q\{p_{k, I}^{(1)}(θ_{k}^{*}), p_{k, I \cap T_{2}}^{(2)}(θ_{k}^{*})\} = \lim_{ξ_{i}^{I, k} \to \infty,\ i \notin I \cup \{k\}} Q\{p_{I}^{(1)}(ξ^{I, k}), p_{I \cap T_{2}}^{(2)}(ξ^{I, k})\}.$$

Thus there exists some such ξ^{I,k} ∈ C₁, and hence C₁ is not contained in this smaller product of intervals.

Finally, H_k ∩ ×_{k∈T₁} (l_k, ∞) = ∅ if and only if l_{k,I} ⩾ δ_k for all I ⊆ T₁, if and only if $Q\{p_{k, I}^{(1)}(δ_{k}), p_{k, I \cap T_{2}}^{(2)}(δ_{k})\} = Q\{p_{I}^{(1)}, p_{I \cap T_{2}}^{(2)}\} ⩽ α$ for all I ⊆ T₁ with k ∈ I, if and only if ψ_k = 1.

References

Bauer P, Kieser M. Combining different phases in the development of medical treatments within a single trial. Statist. Med. 1999;18:1833–48. doi: 10.1002/(sici)1097-0258(19990730)18:14<1833::aid-sim221>3.0.co;2-3.
Benjamini Y, Yekutieli Y. False discovery rate controlling confidence intervals for selected parameters. J. Am. Statist. Assoc. 2005;100:71–80.
Brannath W, Bretz F. Shortcuts for locally consonant closed test procedures. J. Am. Statist. Assoc. 2010;105:660–9.
Brannath W, Gutjahr G, Bauer P. Probabilistic foundation of confirmatory adaptive designs. J. Am. Statist. Assoc. 2012;107:824–32.
Brannath W, Posch M, Bauer P. Recursive combination tests. J. Am. Statist. Assoc. 2002;97:236–44.
Bretz F, Koenig F, Brannath W, Glimm E, Posch M. Adaptive designs for confirmatory clinical trials. Statist. Med. 2009;28:1181–217. doi: 10.1002/sim.3538.
Dunnett C. A multiple comparison procedure for comparing several treatments with a control. J. Am. Statist. Assoc. 1955;50:1096–121.
Finner H, Strassburger K. The partitioning principle: a powerful tool in multiple decision theory. Ann. Statist. 2002;30:1194–213.
Fisher RA. Statistical Methods for Research Workers. 4th ed. Oliver and Boyd; London: 1932.
Guilbaud O. Simultaneous confidence regions corresponding to Holm’s stepdown procedure and other closed-testing procedures. Biomet. J. 2008;50:678–92. doi: 10.1002/bimj.200710449.
Hayter AJ, Hsu JC. On the relationship between stepwise decision procedures and confidence sets. J. Am. Statist. Assoc. 1994;89:128–36.
Hommel G. Adaptive modifications of hypotheses after an interim analysis. Biomet. J. 2001;43:581–9.
Hsu JC. Multiple Comparisons: Theory and Methods. Chapman and Hall; London: 1996.
ICH E9 Expert Working Group. Statistical principles for clinical trials: ICH harmonized tripartite guideline. Statist. Med. 1999;18:1905–42.
Koenig F, Brannath W, Bretz F, Posch M. Adaptive Dunnett tests for treatment selection. Statist. Med. 2008;27:1612–25. doi: 10.1002/sim.3048.
Lehmann EL. Testing Statistical Hypotheses. 2nd ed. Wiley; New York: 1986.
Marcus R, Peritz E, Gabriel KR. On closed testing procedures with special reference to ordered analysis of variance. Biometrika. 1976;63:655–60.
Müller HH, Schäfer H. A general statistical principle for changing a design any time during the course of a trial. Statist. Med. 2004;23:2497–508. doi: 10.1002/sim.1852.
Posch M, Koenig F, Branson M, Brannath W, Dunger-Baldauf C, Bauer P. Testing and estimation in flexible group sequential designs with adaptive treatment selection. Statist. Med. 2005;24:3697–714. doi: 10.1002/sim.2389.
Proschan M, Hunsberger S. Designed extension of studies based on conditional power. Biometrics. 1995;51:1315–24.
Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751–4.
Stefansson G, Kim W, Hsu J. On confidence sets in multiple comparisons. In: Gupta SS, Berger JO, editors. Statistical Decision Theory and Related Topics IV. Springer; New York: 1988. pp. 89–104.
Strassburger K, Bretz F. Compatible simultaneous lower confidence bounds for the Holm procedure and other Bonferroni-based closed tests. Statist. Med. 2008;27:4914–27. doi: 10.1002/sim.3338.
