Exact confidence limits for the probability of response in two-stage designs

Guogen Shan

doi:10.1080/02331888.2018.1469023

Author manuscript; available in PMC: 2019 May 8.

Published in final edited form as: Statistics (Ber). 2018 May 8;52(5):1086–1095. doi: 10.1080/02331888.2018.1469023

PMCID: PMC6426334 NIHMSID: NIHMS1504616 PMID: 30906095

Abstract

In addition to a point estimate for the probability of response in a two-stage design (e.g., Simon’s two-stage design for a Phase II clinical trial with binary endpoints), confidence limits should be computed and reported. The traditional approach of inverting the p-value function to compute the confidence interval does not guarantee coverage probability in a two-stage setting. The existing exact approach to calculate one-sided limits is based on the overall number of responses to order the sample space. This approach could be conservative because many sample points have the same limits. We propose a new exact one-sided interval based on the p-value for the sample space ordering. Exact intervals are computed by using binomial distributions directly, instead of a normal approximation. Both exact intervals preserve the nominal confidence level. The proposed exact interval based on the p-value generally performs better than the other exact interval with regard to expected length and simple average length of confidence intervals. Therefore, the new interval calculation based on the p-value is recommended for use in practice.

Keywords: Clinical Trials, Confidence interval, Coverage Probability, Simon’s two-stage design

1. Introduction

For a phase II clinical trial with a binary endpoint as the primary outcome (e.g., response vs non-response to a drug), Simon’s optimal or minimax two-stage designs [1] have been widely used to assess the activity of a new treatment. With this design, a study can be stopped early in the first stage for futility when very few responses are observed in that stage. It should be noted that Simon’s two-stage designs only allow early stopping due to futility in the first stage. Later, Mander and Thompson [2] proposed a modified Simon’s design that allows a study to be stopped in the first stage for either futility or efficacy. An additional critical response value was introduced for the first stage termination due to efficacy, and they developed a statistical software package to compute these designs [2]. Recently, other modified Simon’s designs, such as adaptive two-stage designs [3–7], have been developed to improve the flexibility and efficiency of a trial by allowing the second stage sample size to depend on the observed first stage responses.

When a new experimental treatment under investigation in a study yields very few responses, the treatment is not considered promising. To test the activity of a new treatment, the hypotheses are presented as

H_{0} : π \leq π_{0},

against the alternative

H_{a} : π > π_{0},

but it is powered at π₁, where π₀ and π₁ are the unacceptable and acceptable probabilities of response, respectively. Upon completion of a study, the probability of response is often reported. In addition to point estimation, a confidence interval should also be calculated and presented. For a one-sided hypothesis, a one-sided confidence interval is statistically desirable to make sure that the hypothesis testing and the confidence interval construction are consistent with each other [8]. For a study to be tested at the significance level of α, a 1 − α one-sided interval, (L,1] should be computed for statistical inference, where L is the 1 − α lower limit.

Traditionally, a one-sided interval can be computed by inverting the rejection probability function at every possible π value. Recently, Zeng et al. [9] conducted a simulation study and demonstrated that this confidence interval does not guarantee the coverage probability. Thus, this confidence interval is not exact [10,11]. In addition, the simulated confidence intervals were shown to be conservative at the sample points’ values.

To respect the coverage property, exact one-sided confidence intervals using the Clopper-Pearson approach [12] were developed in a two-stage or multi-stage design setting by Jennison and Turnbull [13], Duffy and Santner [14], and Jung and Kim [15]. In a multi-stage design, a study can be stopped before the final stage when the futility threshold or the efficacy threshold is met, and a reasonable sample ordering is needed to define the tail area in order to compute the p-value and the associated confidence interval [13,14,16]. The sample space ordering should consider the nature of the multistage design. One commonly used sample space ordering in a two-stage design is based on the total number of responses, and this ordering is the same as that based on the unbiased estimate proposed by Girshick, Mosteller and Savage [17]. Later, Jung and Kim [15] derived the uniformly minimum variance unbiased estimator (UMVUE) for the probability of response in a two-stage design setting. The sample space ordering based on the UMVUE is the same as the ordering by the total number of responses [18]. This sample space ordering may be associated with a significant number of ties. To have a finer ordering, we propose using the p-value to order the sample space and develop a new exact one-sided interval. Both exact intervals guarantee the nominal coverage.

In Section 2, we first introduce the construction of exact one-sided lower limits, and then conduct extensive numerical studies to compare the performance of the two new exact intervals with regards to expected length and simple average length. Exact intervals are computed by using the exact discrete probability distribution and respect the nominal coverage probability. The results indicate that the exact interval based on p-value has a shorter expected length for the majority of the samples and is always associated with a shorter average length, as compared to the exact interval based on the total number of responses. The construction of the exact upper limit can be readily obtained by using the approach proposed in this section. In Section 3, we first use an example to compare the performance of the two exact intervals and the traditional interval by the inverting approach, then we compare the two exact intervals, looking at expected length and simple average length of intervals. Finally, we provide some comments in Section 4.

2. Methods

In Simon’s two-stage design, for given type I and II error rates, π₀ and π₁ as the response rates in the hypothesis testing, four design parameters (r₁,r_t,n₁,n₂) need to be determined, where r₁ and r_t are the critical values for the number of responses in the first stage and both stages combined, and n₁ and n₂ are the sample sizes for the first stage and the second stage, respectively. A study will be terminated early if the first stage number of responses, X₁, is less than or equal to r₁. In cases where X₁ > r₁, the trial goes to the second stage and additional n₂ participants will be enrolled in the study. The second stage number of responses, X₂, is added to X₁ to compute the total number of responses, X_t = X₁ + X₂. A final decision is made by comparing X_t to r_t. The null hypothesis is rejected at the end of a study if X₁ > r₁ and X_t > r_t. Let (x₁, x₂) be the observed numbers of responses. The rejection region is presented as

Ω (x_{1}, x_{2}) = {(X_{1}, X_{2}) : X_{1} \geq x_{1}, X_{1} + X_{2} \geq x_{1} + x_{2}, 0 \leq X_{1} \leq n_{1}, 0 \leq X_{2} \leq n_{2}}

and the p-value is then calculated as

P (x_{1}, x_{2} | π_{0}) = P (Ω (x_{1}, x_{2}) | π_{0}) .

(1)
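As a minimal sketch of Equation (1), the following Python functions sum binomial probabilities over the region Ω(x₁,x₂). The function names (`binom_pmf`, `binom_tail`, `p_value`) are illustrative and do not come from the paper; the design values are those of the example in Section 3.1. For a point stopped in the first stage (x₂ = 0), the sum reduces to the binomial tail P(X₁ ≥ x₁).

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass b(k, n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); 1 when k <= 0 and 0 when k > n."""
    return sum(binom_pmf(j, n, p) for j in range(max(k, 0), n + 1))


def p_value(x1: int, x2: int, n1: int, n2: int, pi: float) -> float:
    """P(Omega(x1, x2) | pi) with Omega = {X1 >= x1, X1 + X2 >= x1 + x2}."""
    total = x1 + x2
    return sum(
        binom_pmf(k1, n1, pi) * binom_tail(total - k1, n2, pi)
        for k1 in range(x1, n1 + 1)
    )


# Stage sizes and pi0 from the example in Section 3.1.
n1, n2, pi0 = 45, 33, 0.1
print(p_value(8, 5, n1, n2, pi0))
print(p_value(8, 4, n1, n2, pi0))
```

Because Ω(8,5) is contained in Ω(8,4), the first printed value cannot exceed the second.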

With a one-sided hypothesis, a one-sided confidence interval for the probability of response in the form (L(x₁,x₂),1] is suitable for statistical inference in a two-stage design with x₁ and x₂ responders observed out of n₁ and n₂ participants from the first stage and the second stage. The lower limit is traditionally computed by inverting the rejection probability function (referred to as the I approach). The one-sided interval (L_I(x₁,x₂),1] for π based on the I approach is given as

{π : P (Ω_{I} (x_{1}, x_{2}) | π) > α},

(2)

where Ω_I(x₁,x₂) = Ω(x₁,x₂).
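One way to obtain the lower limit in Equation (2) numerically is bisection on π, sketched below. The sketch assumes that P(Ω_I(x₁,x₂)|π) is non-decreasing in π, which holds because the region is an upper set in (X₁,X₂). The names `region_prob` and `lower_limit_I` and the tolerance are hypothetical choices, not part of the paper.

```python
from math import comb


def region_prob(x1, x2, n1, n2, pi):
    """P(X1 >= x1, X1 + X2 >= x1 + x2 | pi), i.e. P(Omega_I(x1, x2) | pi)."""

    def pmf(k, n):
        return comb(n, k) * pi**k * (1.0 - pi) ** (n - k)

    def tail(k, n):
        return sum(pmf(j, n) for j in range(max(k, 0), n + 1))

    return sum(pmf(k1, n1) * tail(x1 + x2 - k1, n2) for k1 in range(x1, n1 + 1))


def lower_limit_I(x1, x2, n1, n2, alpha=0.05, tol=1e-10):
    """Smallest pi with P(Omega_I | pi) > alpha, located by bisection."""
    if region_prob(x1, x2, n1, n2, 0.0) > alpha:
        return 0.0  # the region already has probability above alpha at pi = 0
    lo, hi = 0.0, 1.0  # invariant: prob(lo) <= alpha < prob(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if region_prob(x1, x2, n1, n2, mid) > alpha:
            hi = mid
        else:
            lo = mid
    return lo


# Stage sizes from the example in Section 3.1.
n1, n2 = 45, 33
for point in [(8, 5), (8, 4)]:
    print(point, round(lower_limit_I(*point, n1, n2), 3))
```

For the outcome (x₁,x₂) = (n₁,n₂) the region probability is π^(n₁+n₂), so the limit should equal α^{1/(n₁+n₂)}; this gives a quick check of the routine.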

In Simon’s two-stage design with design parameters (r₁,r_t,n₁,n₂), the associated sample space is presented as

Ω = {0, 1, \dots, r_{1}, (r_{1} + 1, 0), (r_{1} + 1, 1), \dots, (r_{1} + 1, n_{2}), \dots, (n_{1}, 0), (n_{1}, 1), \dots, (n_{1}, n_{2})} .

The study is terminated early in the first stage when X₁ ≤ r₁, thus, the first r₁ + 1 points in the sample space are the possible number of responses in the first stage from 0 to r₁. We use set A = {0,1,··· ,r₁} to represent this set. When X₁ > r₁, the study proceeds to the second stage with an additional n₂ participants. For each X₁ from r₁ + 1 to n₁, the value of X₂ ranges from 0 to n₂. This set is referred to as set G = {(X₁,X₂) : X₁ > r₁,0 ≤ X₂ ≤ n₂}. The size of the sample space Ω is (r₁ +1)+(n₁ −r₁)(n₂ +1). For each sample point (X₁,X₂) in set G, the total number of responses is (X₁ + X₂), and all these sample points in set G can be ordered by the total number of responses. For sample points in set A where the study is stopped in the first stage, they can be ordered by their responses. The ordering of the lower limits in the proposed approach based on the number of responses is defined as

L (X_{1}, X_{2}) \geq L (X_{1}^{'}, X_{2}^{'}) when X_{1} + X_{2} \geq X_{1}^{'} + X_{2}^{'} .

This approach is referred to as the R approach. The lower limit based on the R approach is computed as

{π : P (Ω_{R} (x_{1}, x_{2}) | π) > α},

(3)

where Ω_R(x₁,x₂) = {(X₁,X₂) : X₁ + X₂ ≥ x₁ + x₂}.
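The R approach in Equation (3) can be sketched in the same way. The code below is an illustration under the assumption that the total equals X₁ for a trial stopped in the first stage and X₁ + X₂ otherwise; the names `prob_omega_R` and `lower_limit_R` are hypothetical, and the design values are taken from the example in Section 3.1.

```python
from math import comb


def pmf(k, n, p):
    """Binomial probability mass b(k, n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(pmf(j, n, p) for j in range(max(k, 0), n + 1))


def prob_omega_R(t, r1, n1, n2, pi):
    """P(Omega_R | pi): probability that the total number of responses is >= t.

    A trial stopped at stage one (X1 <= r1) contributes X1 alone;
    a trial that continues contributes X1 + X2.
    """
    stopped = sum(pmf(k1, n1, pi) for k1 in range(t, r1 + 1))
    continued = sum(
        pmf(k1, n1, pi) * tail(t - k1, n2, pi) for k1 in range(r1 + 1, n1 + 1)
    )
    return stopped + continued


def lower_limit_R(x1, x2, r1, n1, n2, alpha=0.05, tol=1e-10):
    """Bisection for the smallest pi with P(Omega_R(x1, x2) | pi) > alpha."""
    t = x1 + x2
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_omega_R(t, r1, n1, n2, mid) > alpha:
            hi = mid
        else:
            lo = mid
    return lo


r1, n1, n2 = 4, 45, 33
# Points with the same total share one limit, e.g. (8, 4) and (9, 3).
print(lower_limit_R(8, 4, r1, n1, n2), lower_limit_R(9, 3, r1, n1, n2))
```

The two printed limits coincide because the routine depends on the sample point only through x₁ + x₂, which is the source of the ties in this ordering.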

For any sample point $(x_{1}^{*}, x_{2}^{*})$ in set G, and any sample point $x_{1}^{@}$ in set A, since $x_{1}^{*} > x_{1}^{@}$ , it follows that $x_{1}^{*} + x_{2}^{*} > x_{1}^{@}$ , and $L (x_{1}^{*}, x_{2}^{*}) > L (x_{1}^{@}, 0)$ . Therefore, the lower limits for sample points in set G are always larger than those for sample points in set A. The lower limit for sample points in set A can also be calculated from Equation (3), resulting in the exact lower Clopper-Pearson limit [12] for one binomial proportion.

This stochastic ordering based on the total number of responses was proposed by Jennison and Turnbull [13], Duffy and Santner [14], Jung and Kim [15], and Jovic and Whitehead [19]. This ordering is the same as that based on the naive point estimate for the probability of response (x₁ + x₂)/(n₁ + n₂), which was shown to be a biased estimate [15]. Recently, Jung [18] reviewed the statistical issues in analyzing data from multi-stage designs with an emphasis on two-stage designs. He pointed out that the stochastic ordering based on the UMVUE [15] is identical to that based on the overall number of responses in the R approach.

When two sample points have the same total number of responses, their lower limits using the R approach are the same. To create a finer sample space ordering, we propose using the p-value to order the sample space, and thus refer to it as the PV approach. For each sample point (x₁,x₂), its p-value is calculated as P(x₁,x₂|π₀) as in Equation (1). A sample point with a small p-value is more likely to result in rejection of the null hypothesis, so it should have a large lower confidence limit. Therefore, the sample space can be sorted by p-value from the smallest to the largest as a new sample space ordering. The exact one-sided lower interval based on the PV approach is computed as

{π : P (Ω_{P V} (x_{1}, x_{2}) | π) > α},

(4)

where Ω_PV(x₁,x₂) = {(X₁,X₂) : P(X₁,X₂|π₀) ≤ P(x₁,x₂|π₀)}.
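The PV approach in Equation (4) can be sketched as follows. This is an illustration under stated assumptions rather than the author's implementation: p-values are compared with a small relative slack so that floating-point ties stay tied, and the limit is found by scanning a grid of π values for the first crossing of α and then bisecting inside that grid cell, which assumes a single crossing within the cell. All function names and the grid size are hypothetical.

```python
from math import comb


def pmf_table(n, p):
    """Binomial(n, p) probabilities for k = 0..n."""
    return [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def p_values(r1, n1, n2, pi0):
    """Equation (1) p-value at pi0 for every sample point (x2 = 0 in set A)."""
    b1, b2 = pmf_table(n1, pi0), pmf_table(n2, pi0)
    tail2 = [sum(b2[k:]) for k in range(n2 + 2)]  # tail2[k] = P(X2 >= k)

    def pv(x1, x2):
        t = x1 + x2
        return sum(
            b1[k] * tail2[min(max(t - k, 0), n2 + 1)] for k in range(x1, n1 + 1)
        )

    points = [(x1, 0) for x1 in range(r1 + 1)]
    points += [(x1, x2) for x1 in range(r1 + 1, n1 + 1) for x2 in range(n2 + 1)]
    return {pt: pv(*pt) for pt in points}


def lower_limit_PV(x1, x2, r1, n1, n2, pi0, alpha=0.05, grid=1000, tol=1e-9):
    """Smallest pi (up to grid resolution) with P(Omega_PV(x1, x2) | pi) > alpha."""
    pv = p_values(r1, n1, n2, pi0)
    cutoff = pv[(x1, x2)] * (1.0 + 1e-9)  # slack so floating-point ties stay tied
    region = [pt for pt, p in pv.items() if p <= cutoff]

    def region_prob(pi):
        b1, b2 = pmf_table(n1, pi), pmf_table(n2, pi)
        # Points in set A carry the stage-one probability only.
        return sum(b1[a] * (b2[b] if a > r1 else 1.0) for a, b in region)

    if region_prob(0.0) > alpha:
        return 0.0
    for i in range(1, grid + 1):
        if region_prob(i / grid) > alpha:
            lo, hi = (i - 1) / grid, i / grid
            while hi - lo > tol:  # refine inside the first bracketing cell
                mid = 0.5 * (lo + hi)
                if region_prob(mid) > alpha:
                    hi = mid
                else:
                    lo = mid
            return lo
    return 1.0


# Design (r1, n1, n2) and pi0 from the example in Section 3.1.
print(round(lower_limit_PV(8, 5, 4, 45, 33, 0.1), 3))
print(round(lower_limit_PV(8, 4, 4, 45, 33, 0.1), 3))
```

The grid scan is used because, unlike the regions of the I and R approaches, Ω_PV is not guaranteed by construction to have a probability that is monotone in π.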

For any sample point $(x_{1}^{*}, x_{2}^{*})$ in set G and any sample point $x_{1}^{@}$ in set A, it is easy to show that

\begin{array}{l} P (x_{1}^{@} | π_{0}) \\ = P (Ω (x_{1}^{@}, 0) | π_{0}), \quad Ω (x_{1}^{@}, 0) = {(X_{1}, X_{2}) : X_{1} \geq x_{1}^{@}, X_{1} + X_{2} \geq x_{1}^{@}} \\ > P (Ω (x_{1}^{*}, x_{2}^{*}) | π_{0}), \quad Ω (x_{1}^{*}, x_{2}^{*}) = {(X_{1}, X_{2}) : X_{1} \geq x_{1}^{*}, X_{1} + X_{2} \geq x_{1}^{*} + x_{2}^{*}}, \quad x_{1}^{*} > x_{1}^{@}, \quad 0 \leq x_{2}^{*} \leq n_{2} \\ = P (x_{1}^{*}, x_{2}^{*} | π_{0}), \end{array}

where the inequality is satisfied because $x_{1}^{*}$ from the sample point in set G is always larger than $x_{1}^{@}$ in set A. We then have $P (x_{1}^{*}, x_{2}^{*} | π_{0}) < P (x_{1}^{@} | π_{0})$ . According to the definition of Ω_PV (x₁,x₂), we have $Ω_{P V} (x_{1}^{*}, x_{2}^{*}) \subset Ω_{P V} (x_{1}^{@}, x_{2} = 0)$ , and $P (Ω_{P V} (x_{1}^{*}, x_{2}^{*}) | π) < P (Ω_{P V} (x_{1}^{@}, x_{2} = 0) | π)$ for any π. Therefore, the lower limit for sample points in set A is always smaller than that in set G. For sample points in set A, their exact lower limit is again the Clopper-Pearson exact lower limit [12] for one binomial proportion.

Coverage probability is defined as

P_{π} (π \in (L (X_{1}, X_{2}), 1]) = P ((X_{1}, X_{2}) : L (X_{1}, X_{2}) < π | π) .

(5)

A confidence interval should respect the nominal confidence level, that is, $P_{π} (π \in (L (X_{1}, X_{2}), 1]) \geq 1 - α$ should hold for any π ∈ [0,1].
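The coverage probability in Equation (5) can be evaluated exactly by summing the probabilities of the sample points whose lower limit falls below π. The sketch below does this for the R approach, where the limit depends only on the total number of responses T; the function names and the bisection tolerance are assumptions made for illustration, and the design is the one from the example in Section 3.1.

```python
from math import comb


def pmf_table(n, p):
    """Binomial(n, p) probabilities for k = 0..n."""
    return [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def prob_total_ge(t, r1, n1, n2, pi):
    """P(T >= t | pi), with T = X1 if the trial stops early, else X1 + X2."""
    b1, b2 = pmf_table(n1, pi), pmf_table(n2, pi)
    tail2 = [sum(b2[k:]) for k in range(n2 + 2)]  # tail2[k] = P(X2 >= k)
    early = sum(b1[t : r1 + 1])
    late = sum(
        b1[k1] * tail2[min(max(t - k1, 0), n2 + 1)] for k1 in range(r1 + 1, n1 + 1)
    )
    return early + late


def lower_limit_R(t, r1, n1, n2, alpha=0.05, tol=1e-8):
    """Bisection for the smallest pi with P(T >= t | pi) > alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_total_ge(t, r1, n1, n2, mid) > alpha:
            hi = mid
        else:
            lo = mid
    return lo


def coverage_R(pi, limits, r1, n1, n2):
    """Equation (5) for the R approach: P(L(T) < pi | pi)."""
    ge = [prob_total_ge(t, r1, n1, n2, pi) for t in range(n1 + n2 + 2)]
    # P(T = t) = P(T >= t) - P(T >= t + 1)
    return sum(ge[t] - ge[t + 1] for t in range(n1 + n2 + 1) if limits[t] < pi)


r1, n1, n2 = 4, 45, 33
limits = [lower_limit_R(t, r1, n1, n2) for t in range(n1 + n2 + 1)]
for pi in (0.05, 0.1, 0.2, 0.3):
    print(pi, coverage_R(pi, limits, r1, n1, n2))
```

The same `coverage_R` pattern applies to any other interval once its lower limits are tabulated per sample point rather than per total.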

Theorem 2.1. The two proposed intervals based on the R approach and the PV approach preserve the nominal confidence level.

Proof. From the definition of the lower limits based on the two proposed approaches in Equation (3) and Equation (4), coverage probabilities for the two proposed intervals are guaranteed. The coverage probability is exactly 1−α at realized π values, and these two intervals are conservative at other π values.

We present the following theorem to show the relationship between the two exact limits and the limit based on the I approach.

Theorem 2.2. The two proposed exact lower limits are always less than or equal to that based on the I approach.

Proof of this theorem is given in Appendix A. This theorem shows that the exact lower limits, L_R(x₁,x₂) and L_PV (x₁,x₂), are always less than or equal to L_I(x₁,x₂). We further compare the performance of the exact intervals based on the R approach and the PV approach in the next section.

3. Results

We first use a two-stage design to illustrate the application of three approaches to construct lower limits, then compare the proposed limits with regards to expected length (EL) and simple average length (AL) of the intervals from a range of two-stage designs.

3.1. An example

Suppose a study is designed by using Simon’s two-stage minimax design with π₀ = 0.1, π₁ = 0.2, to attain 80% power at the significance level of 0.05. The minimax two-stage design parameters are (r₁,r_t,n₁,n₂) = (4,12,45,33), and the trial is stopped for futility at the first stage if 4 or fewer responses are observed out of n₁ = 45 participants in that stage. As an example, if the observed data is (x₁,x₂) = (8,5), the 95% one-sided lower intervals based on the I approach, the R approach, and the PV approach are (0.110,1], (0.102,1], and (0.103,1], and they all reject the null hypothesis as π₀ = 0.1 is not included in these intervals. If the observed data is (x₁,x₂) = (8,4), with one less response observed from the second stage, the 95% one-sided lower intervals are (0.103,1], (0.092,1], and (0.096,1] for the I approach, the R approach, and the PV approach, respectively. The two exact approaches fail to reject the null hypothesis, while the I approach results in the rejection of the null hypothesis since the lower limit is greater than π₀. Moreover, the total number of responses in this case is 12, which falls in the non-rejection region. Therefore, the traditional interval based on the I approach is anti-conservative for this case.

In addition to the comparison between three confidence intervals for two particular sample points, we also compare the coverage probability for all possible π values from 0 to 1. The coverage probability for each interval can be computed by using Equation (5). Figure 1 displays the coverage probability for the three intervals for the minimax two-stage design (r₁,r_t,n₁,n₂) = (4,12,45,33). It can be seen that the two proposed approaches, the R approach and the PV approach, guarantee the coverage probability, while the traditionally used I approach does not. In fact, with the I approach, the coverage probability for the majority of sample points is lower than the nominal coverage. The minimum coverage for the I approach can be as low as 91.4% which is much less than the nominal level of 95%. It should be noted that all three approaches have the same lower limits for sample points in set A, and their lower limits are smaller than those in set G. For that reason, the coverage probability plots near the left corner are the same for the three approaches.

Figure 1. Coverage probability comparison among the three intervals for the minimax Simon two-stage design (r₁,r_t,n₁,n₂) = (4,12,45,33) given π₀ = 0.1, π₁ = 0.2, α = 0.05, and β = 0.2.

3.2. Numerical study

The I approach is not going to be included in the numerical comparison due to the unsatisfactory coverage property, and the two exact approaches (the R approach and the PV approach) will be compared with regards to EL and AL of the intervals. As a general rule, an interval with the shortest length among all exact intervals is preferable. The EL is often used to compare confidence interval performance [20,21], and it is calculated as

E L (π) = \sum [1 - L (X_{1}, X_{2})] b (X_{1}, n_{1}, π) b (X_{2}, n_{2}, π),

where b(.) represents the probability mass function of a binomial distribution. The length for each sample point is weighted by its probability at π, and the sum of these weighted lengths is the EL at π. The EL is therefore a function of π, where π ∈ [0,1]. The exact intervals are evaluated over 2000 uniformly distributed π values between 0 and 1. For each π, we first compute EL_R(π) and EL_PV (π) based on the R approach and the PV approach, respectively. Subsequently, their ratio is calculated as Ratio(π) = EL_PV (π)/EL_R(π). For the 2000 ratio values, we compute the proportion of {π : Ratio(π) < 1} as ∑ I(Ratio(π) < 1)/2000, where the sum is over the 2000 π values and I(K) is an indicator function with I(K) = 1 if K is true and 0 otherwise. This is the proportion where the PV approach has a shorter expected length than the R approach. The higher the proportion is, the better the performance of the PV approach.
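The expected length can be computed by direct summation over the sample space. The sketch below evaluates EL(π) for the R approach only, as an illustration: points stopped in the first stage are weighted by b(X₁, n₁, π) alone, which is an assumption about how the sum treats set A. The function names are hypothetical and the design is the one from the example in Section 3.1.

```python
from math import comb


def pmf_table(n, p):
    """Binomial(n, p) probabilities for k = 0..n."""
    return [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def prob_total_ge(t, r1, n1, n2, pi):
    """P(T >= t | pi), with T = X1 if the trial stops early, else X1 + X2."""
    b1, b2 = pmf_table(n1, pi), pmf_table(n2, pi)
    tail2 = [sum(b2[k:]) for k in range(n2 + 2)]  # tail2[k] = P(X2 >= k)
    early = sum(b1[t : r1 + 1])
    late = sum(
        b1[k1] * tail2[min(max(t - k1, 0), n2 + 1)] for k1 in range(r1 + 1, n1 + 1)
    )
    return early + late


def lower_limit_R(t, r1, n1, n2, alpha=0.05, tol=1e-8):
    """Bisection for the smallest pi with P(T >= t | pi) > alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_total_ge(t, r1, n1, n2, mid) > alpha:
            hi = mid
        else:
            lo = mid
    return lo


def expected_length(pi, limits, r1, n1, n2):
    """EL(pi): sum of (1 - L) weighted by each sample point's probability."""
    b1, b2 = pmf_table(n1, pi), pmf_table(n2, pi)
    early = sum((1.0 - limits[x1]) * b1[x1] for x1 in range(r1 + 1))
    late = sum(
        (1.0 - limits[x1 + x2]) * b1[x1] * b2[x2]
        for x1 in range(r1 + 1, n1 + 1)
        for x2 in range(n2 + 1)
    )
    return early + late


r1, n1, n2 = 4, 45, 33
limits = [lower_limit_R(t, r1, n1, n2) for t in range(n1 + n2 + 1)]
for pi in (0.1, 0.2, 0.5):
    print(pi, expected_length(pi, limits, r1, n1, n2))
```

Given a second table of limits for the PV approach, Ratio(π) would be the quotient of two such calls, and the reported proportion is the mean of the indicator Ratio(π) < 1 over the grid of π values.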

We compare the R approach and the PV approach under various design parameters for the optimal design in Table 1 and the minimax design in Table 2. The designs are calculated to ensure 80% or 90% power, to detect π₁ − π₀ = 0.2 when π₀ = 0.05, 0.1, 0.2, 0.3, ··· , and 0.7 at α = 0.05. For the 16 total cases in Table 1, the proportion of {π : Ratio(π) < 1} is at least 77%, and in 10 out of the 16 total cases, the proportion is over 99%. We present some typical plots for the ratio when π₀ = 0.4, 0.5, 0.6, and 0.7 given α = 0.05 and β = 0.2 in Figure 2. Their associated proportions are 99%, 92%, 83%, and 77%. In the case of π₀ = 0.4, the PV approach always has a shorter expected length than the R approach at all π values from 0 to 1. The performance of the R approach increases as π₀ goes up. The R approach has better performance than the PV approach in 23% of the π values in the case of π₀ = 0.7. These plots have a similar pattern: the R approach has better performance than the PV approach at the values that are close to π₀. For the 16 minimax designs in Table 2, the minimum proportion is 92% which occurs in only one case. For all the remaining 15 cases, the proportion is over 99%, demonstrating better performance of the PV approach.

Table 1.

Average length comparison between the R approach and the PV approach for Simon’s optimal two-stage designs at α = 0.05 when π₁ − π₀ = 0.2.

π₀	π₁	Power	AL_R	AL_PV
0.05	0.25	0.8	0.670	0.641
		0.9	0.628	0.604
0.1	0.3	0.8	0.615	0.582
		0.9	0.596	0.564
0.2	0.4	0.8	0.571	0.527
		0.9	0.560	0.516
0.3	0.5	0.8	0.542	0.486
		0.9	0.522	0.465
0.4	0.6	0.8	0.510	0.443
		0.9	0.489	0.420
0.5	0.7	0.8	0.476	0.406
		0.9	0.450	0.375
0.6	0.8	0.8	0.438	0.375
		0.9	0.416	0.342
0.7	0.9	0.8	0.453	0.408
		0.9	0.366	0.302

Table 2.

Average length comparison between the R approach and the PV approach for Simon’s minimax two-stage designs at α = 0.05 when π₁ − π₀ = 0.2.

π₀	π₁	Power	AL_R	AL_PV
0.05	0.25	0.8	0.672	0.646
		0.9	0.642	0.617
0.1	0.3	0.8	0.625	0.595
		0.9	0.598	0.570
0.2	0.4	0.8	0.569	0.527
		0.9	0.556	0.515
0.3	0.5	0.8	0.539	0.485
		0.9	0.534	0.479
0.4	0.6	0.8	0.401	0.369
		0.9	0.485	0.421
0.5	0.7	0.8	0.451	0.384
		0.9	0.452	0.377
0.6	0.8	0.8	0.446	0.375
		0.9	0.424	0.346
0.7	0.9	0.8	0.250	0.210
		0.9	0.363	0.295

Figure 2. Expected length ratio between the PV approach and the R approach, EL_PV/EL_R, for the optimal designs to attain 80% power at α = 0.05 when π₀ = 0.4, 0.5, 0.6, and 0.7, and π₁ = π₀ + 0.2. The PV approach has a shorter expected length than the R approach when the ratio is less than 1.

The other measurement to compare confidence interval methods is the simple average length which assigns an equal weight to all sample points in its calculation. The AL is defined as

A L = \frac{\sum_{(X_{1}, X_{2}) \in G} [1 - L (X_{1}, X_{2})]}{N},

where N is the number of sample points (X₁,X₂) in set G. As noted above, the confidence interval is the same for both approaches for any sample point in set A in Simon’s design. Therefore, these sample points are excluded in our calculations of interval lengths. The optimal design in Table 1 and the minimax design in Table 2 are also used in the AL comparison between the two exact approaches. As can be seen from Table 1 for the optimal designs, the exact interval based on the PV approach always has a shorter average length than that based on the R approach, and the length saving for the PV approach, (AL_R − AL_PV )/AL_R, ranges from 4% to 22%. Similar results are observed for the comparison between the two exact approaches for designs obtained under the minimax criteria in Table 2. The PV approach performs better than the R approach with regards to simple average length.
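The simple average length is an unweighted mean of 1 − L over set G. The sketch below computes it for any lower-limit function passed in as `limit` and demonstrates it with R-approach limits for the design of Section 3.1; that design is not one of the designs in Tables 1 and 2, so the printed value is not expected to match a table entry. All names are hypothetical.

```python
from math import comb


def pmf_table(n, p):
    """Binomial(n, p) probabilities for k = 0..n."""
    return [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def prob_total_ge(t, r1, n1, n2, pi):
    """P(T >= t | pi), with T = X1 if the trial stops early, else X1 + X2."""
    b1, b2 = pmf_table(n1, pi), pmf_table(n2, pi)
    tail2 = [sum(b2[k:]) for k in range(n2 + 2)]  # tail2[k] = P(X2 >= k)
    early = sum(b1[t : r1 + 1])
    late = sum(
        b1[k1] * tail2[min(max(t - k1, 0), n2 + 1)] for k1 in range(r1 + 1, n1 + 1)
    )
    return early + late


def lower_limit_R(t, r1, n1, n2, alpha=0.05, tol=1e-8):
    """Bisection for the smallest pi with P(T >= t | pi) > alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_total_ge(t, r1, n1, n2, mid) > alpha:
            hi = mid
        else:
            lo = mid
    return lo


def average_length(limit, r1, n1, n2):
    """Mean of 1 - L(x1, x2) over the (n1 - r1)(n2 + 1) points of set G."""
    lengths = [
        1.0 - limit(x1, x2)
        for x1 in range(r1 + 1, n1 + 1)
        for x2 in range(n2 + 1)
    ]
    return sum(lengths) / len(lengths)


r1, n1, n2 = 4, 45, 33
limits = [lower_limit_R(t, r1, n1, n2) for t in range(n1 + n2 + 1)]
al_R = average_length(lambda x1, x2: limits[x1 + x2], r1, n1, n2)
print(al_R)
```

Passing a PV-based `limit` function to the same `average_length` would give AL_PV, from which the relative saving (AL_R − AL_PV)/AL_R follows directly.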

4. Discussion

We propose a new exact one-sided confidence limit for the probability of response based on p-value ordering of the sample space in a two-stage design. We compare the proposed PV approach with the existing I approach that computes the confidence interval by inverting the p-value function, and the R approach that uses the total number of responses for the stochastic ordering. The I approach does not guarantee the coverage probability as seen in Figure 1. Therefore, it is not recommended for use. Under the simple AL criterion, the PV approach performs better than the R approach since the PV approach always has a shorter average length. Under the expected length criterion, while the R approach performs slightly better than the PV approach when the probability of response is close to π₀, the PV approach generally performs better in other cases.

Chen and Shan [22] proposed an optimal three-stage design that allows a study to be stopped early for futility or efficacy during the first two stages. However, the ordering of the sample space becomes more complicated in a three-stage design setting. For example, it is not easy to order two sample points where one is stopped in the first stage; the other is stopped in the second; but they have a similar probability of response. One possible solution to sort sample points is to adopt an inductive order as proposed by Wang [23]. It should be noted that this approach could be very computationally intensive in a multi-stage design setting.

When an adaptive two-stage method is used for a clinical trial design [3,5,6,24,25], the second stage sample size depends on the number of responses observed from the first stage. At the end of the study, the number of responses (X₁,X₂) and sample sizes (n₁,n₂) are recorded, and the associated exact confidence limits for the probability of response for this study can be computed by using the proposed approach. We consider this as future work to develop exact one-sided intervals for adaptive two-stage designs.

The two-stage design discussed in the article only allows early stopping due to futility in the first stage, and the hypotheses are one-sided. For these reasons, the lower limit can be used for statistical inference. In the case that a two-stage design allows early stopping due to either futility or efficacy (e.g., sequential designs [14,16,26]), one may need to compute both the lower limit and the upper limit. The upper limit can be computed by using an approach similar to the one presented in this article for the lower limit. It should be noted that the two-sided interval could be conservative when both the lower and the upper intervals guarantee their nominal confidence levels.

Acknowledgment

The author is very grateful to the Editor, the Associate Editor and two reviewers for their insightful comments that improved the manuscript.

Funding

Shan’s research is partially supported by grants from the National Institute of General Medical Sciences from the National Institutes of Health: P20GM109025.

Appendix A: Proof of Theorem 2.2

Proof. For any sample point (x₁,x₂) in set G, it is easy to show that Ω_R(x₁,x₂) = {(X₁,X₂) : X₁+X₂ ≥ x₁+x₂} ⊃ {(X₁,X₂) : X₁ ≥ x₁,X₁+X₂ ≥ x₁+x₂} = Ω_I(x₁,x₂).

Because the larger region has the larger probability at every π, P(Ω_R(x₁,x₂)|π) exceeds α at a smaller π than P(Ω_I(x₁,x₂)|π) does. Thus, we have L_R(x₁,x₂) < L_I(x₁,x₂). For any sample point in set A, the R approach and the I approach yield the same lower limits. Thus, L_R(x₁,x₂) ≤ L_I(x₁,x₂) is true for any sample point in the sample space.

Now we are going to prove the relationship between L_PV (x₁,x₂) and L_I(x₁,x₂). For any sample point $(x_{1}^{'}, x_{2}^{'})$ in the rejection region based on the I approach Ω_I(x₁,x₂), we have

x_{1}^{'} \geq x_{1}, x_{1}^{'} + x_{2}^{'} \geq x_{1} + x_{2} .

It follows that

Ω_{I} (x_{1}^{'}, x_{2}^{'}) \subseteq Ω_{I} (x_{1}, x_{2}) .

Then, by the definition of the p-value from Equation (1), we have

P (x_{1}^{'}, x_{2}^{'} | π_{0}) \leq P (x_{1}, x_{2} | π_{0}),

and

(x_{1}^{'}, x_{2}^{'}) \in Ω_{P V} (x_{1}, x_{2}) .

Therefore, the rejection region based on the I approach is a subset of that based on the PV approach,

Ω_{I} (x_{1}, x_{2}) \subseteq Ω_{P V} (x_{1}, x_{2}) .

Similar to the first part of this proof, it is easy to show that L_PV (x₁,x₂) ≤ L_I(x₁,x₂) is true for any sample point in the sample space. □

References

[1]. Simon R. Optimal two-stage designs for phase II clinical trials. Controlled clinical trials. 1989. Mar;10(1):1–10. Available from: http://view.ncbi.nlm.nih.gov/pubmed/2702835.
[2]. Mander AP, Thompson SG. Two-stage designs optimal under the alternative hypothesis for phase II cancer clinical trials. Contemporary clinical trials. 2010. November;31(6):572–578. Available from: http://view.ncbi.nlm.nih.gov/pubmed/20678585.
[3]. Banerjee A, Tsiatis AA. Adaptive two-stage designs in phase II clinical trials. Statistics in medicine. 2006. Oct;25(19):3382–3395. Available from: http://view.ncbi.nlm.nih.gov/pubmed/16479547.
[4]. Shan G, Ma C, Hutson AD, et al. Randomized Two-Stage Phase II Clinical Trial Designs Based on Barnard’s Exact Test. Journal of Biopharmaceutical Statistics. 2013. Aug;23(5):1081–1090. Available from: 10.1080/10543406.2013.813525.
[5]. Shan G, Wilding GE, Hutson AD, et al. Optimal adaptive two-stage designs for early phase II clinical trials. Statistics in Medicine. 2016. Apr;35(8):1257–1266. Available from: 10.1002/sim.6794.
[6]. Shan G, Wilding GE, Hutson AD. Computationally Intensive Two-Stage Designs for Clinical Trials. 2017;:1–7. Available from: 10.1002/9781118445112.stat07986.
[7]. Shan G, Chen JJ. Optimal inference for Simon’s two-stage design with over or under enrollment at the second stage. Communications in Statistics - Simulation and Computation. 2017. Mar;:1–11. Available from: 10.1080/03610918.2017.1307398.
[8]. Koyama T, Chen H. Proper inference from Simon’s two-stage designs. Statistics in medicine. 2008. Jul;27(16):3145–3154. Available from: http://view.ncbi.nlm.nih.gov/pubmed/17960777.
[9]. Zeng D, Gao F, Hu K, et al. Hypothesis testing for two-stage designs with over or under enrollment. Statist Med. 2015. Jul;34(16):2417–2426. Available from: 10.1002/sim.6490.
[10]. Wang W, Shan G. Exact confidence intervals for the relative risk and the odds ratio. Biometrics. 2015. December;71(4):985–995. Available from: 10.1111/biom.12360.
[11]. Shan G, Wang W. ExactCIdiff: An R Package for Computing Exact Confidence Intervals for the Difference of Two Proportions. The R Journal. 2013;5(2):62–71.
[12]. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934. December;26(4):404–413. Available from: 10.1093/biomet/26.4.404.
[13]. Jennison C, Turnbull BW. Confidence Intervals for a Binomial Parameter Following a Multistage Test With Application to MIL STD 105D and Medical Trials. Technometrics. 1983. Feb;25(1):49–58. Available from: 10.1080/00401706.1983.10487819.
[14]. Duffy DE, Santner TJ. Confidence Intervals for a Binomial Parameter Based on Multistage Tests. Biometrics. 1987;43(1):81–93.
[15]. Jung SHH, Kim KMM. On the estimation of the binomial probability in multistage clinical trials. Statistics in medicine. 2004. Mar;23(6):881–896. Available from: 10.1002/sim.1653.
[16]. Jennison C, Turnbull BW. Group Sequential Methods (Chapman & Hall/CRC Interdisciplinary Statistics). 1st ed. Chapman and Hall/CRC; 1999. Available from: http://www.worldcat.org/isbn/0849303168.
[17]. Girshick MA, Mosteller F, Savage LJ. Unbiased Estimates for Certain Binomial Sampling Problems with Applications. The Annals of Mathematical Statistics. 1946. March;17(1):13–23. Available from: 10.1214/aoms/1177731018.
[18]. Jung SHH. Statistical issues for design and analysis of single-arm multi-stage phase II cancer clinical trials. Contemporary clinical trials. 2015. May;42:9–17. Available from: http://view.ncbi.nlm.nih.gov/pubmed/25749311.
[19]. Jovic G, Whitehead J. An exact method for analysis following a two-stage phase II cancer clinical trial. Statistics in medicine. 2010. December;29(30):3118–3125. Available from: http://view.ncbi.nlm.nih.gov/pubmed/21170906.
[20]. Newcombe RG. Confidence Intervals for Proportions and Related Measures of Effect Size (Chapman & Hall/CRC Biostatistics Series). CRC Press; 2012. Available from: http://www.worldcat.org/isbn/1439812780.
[21]. Fagerland M, Lydersen S, Laake P. Statistical Analysis of Contingency Tables. Boca Raton, FL: Chapman and Hall/CRC; 2017.
[22]. Chen TT. Optimal three-stage designs for phase II cancer clinical trials. Statistics in medicine. 1997. December;16(23):2701–2711. Available from: http://view.ncbi.nlm.nih.gov/pubmed/9421870.
[23]. Wang W. On construction of the smallest one-sided confidence interval for the difference of two proportions. The Annals of Statistics. 2010. April;38(2):1227–1243. Available from: 10.1214/09-aos744.
[24]. Shan G, Zhang H, Jiang T. Minimax and admissible adaptive two-stage designs in phase II clinical trials. BMC Medical Research Methodology. 2016. Aug;16(1):90+. Available from: 10.1186/s12874-016-0194-3.
[25]. Shan G, Ma C. Unconditional tests for comparing two ordered multinomials. Statistical methods in medical research. 2016. Feb;25(1):241–254. Available from: 10.1177/0962280212450957.
[26]. Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics. 1982;38(1):143–151. Available from: 10.2307/2530297.

