A flexible futility monitoring method with time-varying conditional power boundary

Ying Zhang; William R Clarke

doi:10.1177/1740774510369686

Author manuscript; available in PMC: 2022 Apr 25.

Published in final edited form as: Clin Trials. 2010 Apr 27;7(3):209–218. doi: 10.1177/1740774510369686

PMCID: PMC9036670 NIHMSID: NIHMS1622707 PMID: 20423927

Abstract

Background

In an ongoing multi-center randomized controlled clinical trial, the Carotid Occlusion Surgery Study (COSS), the study protocol specifies multiple interim analyses whose results will be reviewed by an independent Data and Safety Monitoring Board (DSMB) to determine whether the trial should be stopped early for either efficacy or futility. Conditional power is used as the decision rule for the DSMB to recommend stopping the trial for futility. An aggressive rule for futility stopping sets a relatively high threshold for the conditional power, which may result in a substantial loss of overall power. A conservative rule using a lower threshold may fail to stop the trial early when there is indeed no treatment efficacy.

Purpose

The goal of this article is to develop a flexible futility monitoring plan with a time-varying conditional power boundary that maintains the overall power of the study well, but has a better chance to stop the trial earlier for futility compared to a futility stopping rule with a fixed value for the minimum conditional power to continue.

Methods

The conditional power boundary for futility is developed using the β-spending function method for sequential test statistics and assuming no interim analysis for efficacy. It is then modified to account for the repeated interim analyses for efficacy.

Results

Simulation studies that mirror the design of the COSS trial show that the proposed method with sample size calculated without considering interim analyses will maintain the designed size and power well when the designed effect size holds, but will have a better chance to exit the trial earlier if the true effect size is smaller than the designed size such that it is not clinically meaningful to conduct the trial.

Limitations

The method is valid for sequential test statistics that constitute a stochastic process approximating Brownian motion. It is not applicable when the monitored process behaves quite differently from Brownian motion.

Conclusions

The proposed conditional power method facilitates a flexible futility monitoring plan that can be easily implemented in long-term clinical trials where multiple interim analyses are required. It provides the DSMB an objective guideline to use in considering early stopping for futility.

Introduction

This article is motivated by the ongoing Carotid Occlusion Surgery Study (COSS), a multi-center randomized controlled clinical trial for which the University of Iowa serves as the data coordinating center. The primary goal of this study is to provide strong scientific evidence that the procedure of extracranial/intracranial (ECIC) bypass by surgical anastomosis of the superficial temporal artery to the middle cerebral artery (STA-MCA) when added to best medical therapy significantly reduces subsequent ipsilateral ischemic stroke (fatal and nonfatal) at 2 years in patients with recently symptomatic internal carotid artery occlusion and Stage II hemodynamic failure. Study participants are randomly assigned to treatment (ECIC bypass surgery plus best medical therapy) and control (best medical therapy only). The primary endpoint of this trial is the 2-year ipsilateral ischemic stroke rate since randomization. The study is designed to detect a difference in the primary endpoint of 40% in the control group and 24% in the surgical group. This represents a clinically meaningful absolute risk reduction of 16% and a relative risk reduction of 40%. In the study design, 372 participants (including possibly 5% drop-out) need to be randomized for the two-sided 5% level test of proportions to have 90% power to detect the designed difference. The sample size calculation did not adjust for the effect of interim efficacy and futility analyses. However, the study protocol specifies that the study will be monitored by an independent DSMB at least yearly and that both efficacy and futility will be assessed. Considering the great cost associated with the ECIC bypass surgery and potential complications after the surgery, it is desirable to have a futility stopping rule, which maintains the overall power well when the designed effect size is true, but is also able to recognize the futility signal as early as possible.

Because the DSMB monitoring times are not fixed, the study proposes to monitor using the flexible error spending method [1] with the O’Brien–Fleming type spending function. This method is particularly useful in this COSS study in two ways. (1) The O’Brien–Fleming type spending function has minimal effect on the overall power of the study so it is not necessary to increase the sample size to account for the multiple interim analyses; (2) the crossing boundary can be determined using readily available packages with information accumulated over time. Although both the efficacy and futility boundaries can be similarly established using group sequential designs [2–5], an example [6] showed that the interim monitoring plans with the group sequential designs would result in a good chance of rejecting the null hypothesis at the final analysis even if the futility boundary is crossed at an interim analysis.

Conditional power (CP) is the probability that the final analysis will result in rejection of the null hypothesis, given the data accrued at an interim analysis and an assumed effect size. It is widely accepted as a tool for futility monitoring because it is easily computed from the B-value using Brownian motion techniques [7]. More importantly, it has a natural interpretation as a projection to the end of the study. For these reasons, conditional power is adopted in the COSS protocol as the tool for futility monitoring.

Let R denote the rejection region, and Z_t the test statistic at information time t, $0 < t \leq 1$, where Z₁ is the test statistic at the end of study. The trial is stopped for futility (accepting H₀) if $CP_{t} = P \{Z_{1} \in R \mid Z_{t}, H_{a}\} < γ$ (H_a is the alternative hypothesis with the designed effect size) for a pre-specified $0 < γ < 0.5$ [8].

If interim results are monitored continuously, the fixed CP futility stopping rule $CP_{t} < γ$ has overall type II error bounded by $β / (1 - γ)$ [9], where β is the type II error without interim futility monitoring. There is numerical evidence [10] that both type I and type II errors can deviate substantially from their nominal levels when an aggressive stopping boundary (larger γ) for futility is adopted.

Selection of a futility stopping boundary based on the conditional power is frequently a subjective matter depending on the opinion of the members of the DSMB and/or the study investigators. For example, investigators often decide to stop a trial for futility if the conditional power with the designed effect size becomes small, such as $CP_{t} < 0.1$. This rule is attractive because it is simple and has little influence on the overall type I and type II errors. However, the rule is conservative: it is highly unlikely to recommend stopping for futility early in the trial, even if the true effect size is so much smaller than the designed size that it is not clinically meaningful to continue the trial. To increase the chance of early stopping for futility, one may increase the value of γ, with the trade-off of inflating the type II error.

In this article, we propose a flexible CP method for futility monitoring in which the CP boundary is chosen to be a time-varying parameter instead of a fixed value, that is, $C P_{t} < γ_{t}$ with γ_t depending on information time t. This design allows the trial to correctly stop early for futility with larger probability than the conservative stopping rule based on the fixed CP method. It also has little effect on the overall type I error and maintains the overall power well. This futility monitoring method has been implemented in the COSS study. The proposed method will be assessed using simulation studies that mimic various possible outcomes in the COSS study. The advantage of this method will be established by comparing the results to those from the fixed CP method.

Methods

As the primary analysis in the COSS study is to compare the rate of the primary endpoint between the two groups, we illustrate the proposed method using the one-sided test of proportions: $H_{0} : p_{1} = p_{2}$ vs. $H_{a} : p_{1} > p_{2}$. The method can be trivially extended to the two-sided test. Assume that a total of 2N subjects are randomly assigned to either control or study treatment with N subjects in each group. Let ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$ be the sample proportions for the control and treatment groups, respectively. A Z-test statistic given by

Z_{1} = \frac{\sqrt{N} ({\hat{p}}_{1} - {\hat{p}}_{2})}{\sqrt{{\hat{p}}_{1} (1 - {\hat{p}}_{1}) + {\hat{p}}_{2} (1 - {\hat{p}}_{2})}},

(1)

is asymptotically distributed as normal with mean zero and variance one under the null hypothesis. In order for the one-sided level-α test based on Z₁ to achieve 1 − β power to reject the null hypothesis at the designed alternative $H_{a} : p_{1} = p_{2} + δ$, the sample size N for each group is given by

N = \frac{{(z_{1 - α} + z_{1 - β})}^{2} \{p_{1} (1 - p_{1}) + p_{2} (1 - p_{2})\}}{δ^{2}},

where z_p is the p-th percentile of the standard normal distribution. The drift parameter defined by $θ = E \{Z_{1} \mid H_{a}\}$ [7] can be shown to be approximately equal to $z_{1 - α} + z_{1 - β}$.
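As a quick numerical check of the sample size formula, the following minimal Python sketch evaluates it for the design used in the simulation studies of this article (p₁ = 40%, p₂ = 24%, one-sided α = 0.05, power 90%). The function names and the choice to round up to the next integer are illustrative assumptions, not part of the original method description.

```python
import math
from statistics import NormalDist


def per_group_sample_size(p1, p2, alpha, beta):
    """Per-group N for the one-sided level-alpha Z-test of two proportions."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    delta = p1 - p2
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n_exact = (z(1 - alpha) + z(1 - beta)) ** 2 * variance_sum / delta ** 2
    return math.ceil(n_exact)  # assumption: round up to a whole subject


def drift(alpha, beta):
    """Approximate drift parameter theta = z_{1-alpha} + z_{1-beta}."""
    z = NormalDist().inv_cdf
    return z(1 - alpha) + z(1 - beta)


n = per_group_sample_size(p1=0.40, p2=0.24, alpha=0.05, beta=0.10)
theta = drift(alpha=0.05, beta=0.10)
print(n, round(theta, 3))
```

Under these inputs the sketch gives 142 subjects per group, the value used in the simulation design, and a drift of about 2.93.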

Suppose an interim analysis is conducted at information time $t = n / N$, where n is the number of subjects in each group who are available for assessing the primary endpoint and ${\hat{p}}_{1, n}$ and ${\hat{p}}_{2, n}$ are the sample proportions for the control and treatment groups, respectively. When n is large enough, the difference of the sample proportions ${\hat{p}}_{1, n} - {\hat{p}}_{2, n}$ is asymptotically normal with variance $n^{- 1} \{p_{1} (1 - p_{1}) + p_{2} (1 - p_{2})\}$. The inverse of the asymptotic variance, $I_{t} = n \{p_{1} (1 - p_{1}) + p_{2} (1 - p_{2})\}^{- 1}$, is referred to as the information at the interim analysis [8], and it can be consistently estimated by ${\hat{I}}_{t} = n \{{\hat{p}}_{1, n} (1 - {\hat{p}}_{1, n}) + {\hat{p}}_{2, n} (1 - {\hat{p}}_{2, n})\}^{- 1}$. Hence the Z-test statistic at information time t can be written as

Z_{t} = ({\hat{p}}_{1, n} - {\hat{p}}_{2, n}) {\hat{I}}_{t}^{1 / 2} .

(2)
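Equation (2) can be sketched in a few lines of Python. The helper name `interim_z` and the event counts in the example are hypothetical and serve only to show how the estimated information and the interim Z-statistic are obtained from the observed data.

```python
import math


def interim_z(events_control, events_treated, n):
    """Return (Z_t, estimated information) from event counts, n subjects per group."""
    p1_hat = events_control / n
    p2_hat = events_treated / n
    variance_sum = p1_hat * (1 - p1_hat) + p2_hat * (1 - p2_hat)
    if variance_sum == 0:
        raise ValueError("information is undefined when both proportions are 0 or 1")
    info_hat = n / variance_sum                    # estimated information I_t
    z_t = (p1_hat - p2_hat) * math.sqrt(info_hat)  # equation (2)
    return z_t, info_hat


# Hypothetical interim look: 28 subjects per group, 11 control and 7 treated events.
z_t, info_hat = interim_z(events_control=11, events_treated=7, n=28)
print(round(z_t, 3), round(info_hat, 1))
```

With n = N the same function reproduces the final statistic in equation (1), since the two expressions differ only in how the square root is written.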

If k sequential analyses are scheduled at discrete information times $t_{i} = n_{i} / N$ for $n_{1} < n_{2} < \dots < n_{k}$, it can be demonstrated that the sequential test statistics $\{Z_{t_{1}}, Z_{t_{2}}, \dots, Z_{t_{k}}\}$ asymptotically have the canonical joint distribution with information levels $\{{\hat{I}}_{t_{1}}, {\hat{I}}_{t_{2}}, \dots, {\hat{I}}_{t_{k}}\}$ [8].

In practice, the B-value [7] is often used as a data monitoring tool. The B-value at information time t is given by $B (t) = Z_{t} t^{1 / 2}$, which has asymptotic expectation $E \{B (t) \mid H_{a}\} \approx θ t$ under the designed effect size. In fact, $B (t) - θ t$ can be shown to be asymptotically a standard Brownian motion process for $0 \leq t \leq 1$ using the same arguments as given in [11]. If no interim analysis for efficacy is designed, the conditional power given the accrued data at information time t and the drift parameter θ can be approximated by

CP_{t} (θ) = P \{B (1) = Z_{1} > z_{1 - α} \mid B (t), θ\} \approx Φ \left(\frac{Z_{t} \sqrt{t} + θ (1 - t) - z_{1 - α}}{\sqrt{1 - t}}\right),

(3)

where $Φ (\cdot)$ is the cumulative distribution function of the standard normal random variable [7].
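The approximation in equation (3) is straightforward to evaluate. The sketch below assumes the drift is taken as θ = z_{1−α} + z_{1−β} and uses illustrative default values α = 0.05 and β = 0.10; the function name is not from the article.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_power(z_t, t, alpha=0.05, beta=0.10):
    """Approximate CP_t(theta) of equation (3) at the designed drift."""
    if not 0 < t < 1:
        raise ValueError("information time t must lie strictly between 0 and 1")
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    theta = z_alpha + STD_NORMAL.inv_cdf(1 - beta)  # assumed drift approximation
    b_value = z_t * math.sqrt(t)                    # B(t) = Z_t * sqrt(t)
    return STD_NORMAL.cdf((b_value + theta * (1 - t) - z_alpha) / math.sqrt(1 - t))


for z_t in (0.0, 1.0, 2.0):
    print(z_t, round(conditional_power(z_t, t=0.5), 3))
```

For instance, an interim statistic of exactly zero at t = 0.5 still leaves a conditional power of about 0.40 under the designed effect size, which illustrates why a fixed threshold such as 0.1 is rarely crossed early.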

It is a common practice to calculate the conditional power $C P_{t} (θ)$ under the designed effect size and decide to stop the trial by declaring futility if $C P_{t} (θ) < γ$ for a fixed $0 < γ \leq 0.5$ . Including stopping for futility can inflate type II error. As shown in [9], if data are monitored continuously using the futility stopping rule defined as above, the overall type II error β* will be bounded as $β^{*} \leq β / (1 - γ)$ . For example, if γ is chosen to be 0.1, then the inflation of the overall type II error will be less than 11.1% of the designed error β used for determining the sample size. While this rule does not inflate type II error much, it is, nevertheless, unlikely to declare futility early when there is indeed no treatment efficacy. The rationale is that at an early time, say t < 0.5, the value of conditional power implicitly weights the future data (assumed to follow the designed effect size) more than the data accumulated up to time t and hence the conditional power would not be expected to drop from 0.9 to below 0.1 quickly.

We propose a futility stopping rule with a time-varying CP boundary. Instead of setting a fixed threshold γ, we allow the threshold γ_t to be a function of information time, which is larger when t is small and we declare for futility at an interim analysis with information time t if $C P_{t} (θ) < γ_{t}$ . The proposed method is motivated by the flexible error spending function monitoring method [1].

Suppose the trial will be monitored for futility k times at information times $\underline{t} = (t_{1}, t_{2}, \dots, t_{k}) : 0 < t_{1} < t_{2} < \dots < t_{k} < 1$ before the final analysis. With $t_{0} = 0$, we determine $γ_{t_{i}}$ sequentially for $i = 1, 2, \dots, k$ such that

P (C P_{t_{1}} (θ) \geq γ_{t_{1}}, \dots, C P_{t_{l - 1}} (θ) \geq γ_{t_{l - 1}}, C P_{t_{l}} (θ) < γ_{t_{l}}) = f (t_{l}) - f (t_{l - 1}),

(4)

for $l = 1, 2, \dots, k$, where $f (\cdot)$ is an increasing function with $f (0) = 0$ and $f (1) = β^{*}$. In view of Equation (3), if we let $γ_{t} = Φ (η_{t})$, the conditional power boundary at time t, $CP_{t} (θ) < γ_{t}$, can be directly related to an inequality for the interim Z-test statistic,

\frac{Z_{t} \sqrt{t} + θ (1 - t) - z_{1 - α}}{\sqrt{1 - t}} < η_{t} .

Hence the stopping condition can be written as $Z_{t} - θ \sqrt{t} < c_{t}$ for the asymptotically standard normal variate $Z_{t} - θ \sqrt{t}$, with critical value

c_{t} = η_{t} \sqrt{\frac{1 - t}{t}} - \frac{z_{1 - β}}{\sqrt{t}},

due to the fact that $θ \approx z_{1 - α} + z_{1 - β}$. Therefore the value of the conditional power boundary $γ_{t_{i}}$ at times $0 < t_{1} < t_{2} < \dots < t_{k} < 1$ can be computed by

γ_{t_{i}} = Φ \left(c_{t_{i}} \sqrt{\frac{t_{i}}{1 - t_{i}}} + \frac{z_{1 - β}}{\sqrt{1 - t_{i}}}\right), i = 1, 2, \dots, k .

(5)
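For readers who want the intermediate algebra, the chain from the conditional power inequality to the critical value and back to equation (5) can be written out as follows. This is only a worked restatement of the steps in the text, using $θ \approx z_{1 - α} + z_{1 - β}$.

```latex
\begin{align*}
CP_t(\theta) < \gamma_t = \Phi(\eta_t)
  &\iff \frac{Z_t\sqrt{t} + \theta(1-t) - z_{1-\alpha}}{\sqrt{1-t}} < \eta_t \\
  &\iff Z_t\sqrt{t} - \theta t < \eta_t\sqrt{1-t} - (\theta - z_{1-\alpha}) \\
  &\iff Z_t - \theta\sqrt{t} < \eta_t\sqrt{\frac{1-t}{t}} - \frac{z_{1-\beta}}{\sqrt{t}} = c_t , \\
\text{so that}\quad
\eta_t &= c_t\sqrt{\frac{t}{1-t}} + \frac{z_{1-\beta}}{\sqrt{1-t}},
\qquad \gamma_t = \Phi(\eta_t).
\end{align*}
```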

Because the β-spending equations for the conditional powers (4) can be equivalently written for the Z-test statistics as

P (Z_{t_{1}} - θ \sqrt{t_{1}} \geq c_{t_{1}}, \dots, Z_{t_{l - 1}} - θ \sqrt{t_{l - 1}} \geq c_{t_{l - 1}}, Z_{t_{l}} - θ \sqrt{t_{l}} < c_{t_{l}}) = f (t_{l}) - f (t_{l - 1}),

(6)

for $l = 1, 2, \dots, k$ by the properties of the Brownian motion process $\{Z_{t} \sqrt{t} - θ t : 0 \leq t \leq 1\}$, the values of $\underline{c} = (c_{t_{1}}, c_{t_{2}}, \dots, c_{t_{k}})$ can be computed using the Lan and DeMets method implemented in the online software [12] (http://www.biostat.wisc.edu/landemets/).

The conditional power boundary that is easily determined by (6) and (5) assumes that no interim analysis for efficacy is conducted, and hence the decision to declare futility in the final analysis is given by $Z_{1} \leq z_{1 - α}$. However, when interim analyses for efficacy (usually the primary purpose for monitoring the trial) are implemented, the critical value for the final Z-test statistic Z₁ to reject the null hypothesis is greater than $z_{1 - α}$. Following the same lines as above, the conditional power boundary should be given by $Φ (c_{t_{i}} {[t_{i} / (1 - t_{i})]}^{1 / 2} + ε (\underline{t}) z_{1 - β} / {(1 - t_{i})}^{1 / 2})$, $i = 1, 2, \dots, k$, for some unknown $0 < ε (\underline{t}) < 1$. Therefore, using the futility stopping boundary given by (5) inflates the overall type II error. To reduce the overall type II error, we propose an ad hoc method by setting the conditional power stopping boundary as

CP_{t_{i}} (θ) \leq γ_{t_{i}}^{*} = Φ \left(c_{t_{i}} \sqrt{\frac{t_{i}}{1 - t_{i}}} + z_{1 - β}\right), i = 1, 2, \dots, k

(7)

to offset the impact of the unknown $0 < ε (\underline{t}) < 1$ . This futility monitoring plan is very flexible as the conditional power boundary ${\underline{γ}}^{*} = (γ_{t_{1}}^{*}, γ_{t_{2}}^{*}, \dots, γ_{t_{k}}^{*})$ does not depend on when and how many future interim analyses would be conducted after t_k. This feature would be very attractive for monitoring a long-term clinical trial such as the COSS study.

As an illustration, we consider an artificial monitoring plan with four interim analyses (for both efficacy and futility) scheduled at information times t = 0.25, 0.45, 0.65, and 0.80. Sample size is determined for the one-sided standard Z-test at level 0.05 with power 0.90 to detect the drift parameter θ, without adjusting for multiple looks at the data during the study. We compute the efficacy and futility stopping boundaries separately. For efficacy, the O’Brien–Fleming one-sided boundary with a total type I error of 0.05 is adopted. The online program [12] produces the boundary for the Z-test statistic $\underline{e} = (3.7496, 2.7016, 2.1982, 1.9815, 1.7419)$, where the last entry applies to the final analysis. For futility, the O’Brien–Fleming type β-spending function with a total type II error of 0.111 $(β^{*} = β / (1 - 0.1) = 0.111)$ is adopted. The same program produces the boundary for the Z-test statistic centered at the designed effect size, $\underline{c} = (- 2.9812, - 2.1190, - 1.7195, - 1.5564)$, which maps to the conditional power boundary ${\underline{γ}}^{*} = (γ_{t_{1}}^{*}, γ_{t_{2}}^{*}, γ_{t_{3}}^{*}, γ_{t_{4}}^{*}) = (0.3301, 0.2627, 0.1442, 0.0335)$ by (7). The decision rule with interim monitoring is as follows. At the i-th interim analysis, for $i = 1, \dots, 4$:

1. if $Z_{t_{i}} \geq e_{i}$, stop for efficacy;
2. else compute the conditional power $CP_{t_{i}} (θ)$ given the accumulated data at information time t_i and the designed effect size; if $CP_{t_{i}} (θ) < γ_{t_{i}}^{*}$, stop for futility;
3. else continue the trial and repeat steps 1 and 2 at information time $t_{i + 1}$;
4. at the final analysis, if $Z_{1} \geq e_{5} = 1.7419$, declare efficacy; otherwise accept the null hypothesis.
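The mapping (7) and the interim decision rule above can be sketched in Python as follows. The boundary values are those quoted in the illustration; the function names, the zero-based look index, and the use of θ = z_{1−α} + z_{1−β} as the drift are assumptions made for this sketch.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA, BETA = 0.05, 0.10
Z_ALPHA = STD_NORMAL.inv_cdf(1 - ALPHA)
Z_BETA = STD_NORMAL.inv_cdf(1 - BETA)
THETA = Z_ALPHA + Z_BETA  # assumed drift approximation

TIMES = (0.25, 0.45, 0.65, 0.80)
EFFICACY = (3.7496, 2.7016, 2.1982, 1.9815, 1.7419)  # last entry: final analysis
C_BOUNDARY = (-2.9812, -2.1190, -1.7195, -1.5564)    # centered Z boundary


def cp_boundary(c_t, t):
    """Conditional power boundary gamma*_t of equation (7)."""
    return STD_NORMAL.cdf(c_t * math.sqrt(t / (1 - t)) + Z_BETA)


GAMMA_STAR = tuple(cp_boundary(c, t) for c, t in zip(C_BOUNDARY, TIMES))


def conditional_power(z_t, t):
    """Approximate conditional power of equation (3) at the designed drift."""
    return STD_NORMAL.cdf(
        (z_t * math.sqrt(t) + THETA * (1 - t) - Z_ALPHA) / math.sqrt(1 - t)
    )


def interim_decision(i, z_t):
    """Decision at interim look i (zero-based index into TIMES)."""
    if z_t >= EFFICACY[i]:
        return "stop for efficacy"
    if conditional_power(z_t, TIMES[i]) < GAMMA_STAR[i]:
        return "stop for futility"
    return "continue"


print([round(g, 4) for g in GAMMA_STAR])
print(interim_decision(1, 0.3))
```

The computed boundary agrees with the quoted values (0.3301, 0.2627, 0.1442, 0.0335) to the reported precision, up to rounding of the inputs.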

The larger conditional power boundary of the proposed time-varying CP method at early times (t ≤ 0.5) intuitively allows the trial to pick up the futility signal more quickly than the fixed CP method of $CP_{t} (θ) \leq 0.1$. This method is easily implemented using the existing online software, but it lacks rigorous statistical justification in terms of controlling the overall size and power. The performance of the proposed method in maintaining adequate size and power is evaluated by the simulation studies described below.

The simulation studies are conducted to evaluate the proposed monitoring plan that is implemented in the ongoing COSS study. The data are generated by mimicking the COSS trial with various possible parameters that could occur in the study. For the simulation studies, we suppose the trial is designed to demonstrate treatment efficacy with a one-sided two-sample test of proportions at the 0.05 level, powered at 90% to reject the null hypothesis under the designed alternative, p₁ = 40% and p₂ = 24%, a 40% relative reduction from the control. For a balanced trial as designed in the COSS study, this requires N = 142 subjects in each group. We illustrate the simulations with two candidate monitoring plans: (i) four interim analyses at information times t = (0.2, 0.4, 0.6, 0.8) and (ii) nine interim analyses at information times t = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9). The Lan–DeMets error spending method with the α-spending function $f (t) = 1 - Φ (z_{0.95} / t^{1 / 2})$ is used to compute the efficacy stopping boundary for the interim Z-test statistic, which produces a boundary similar to the O’Brien–Fleming method. For Plan (i), e = (4.229, 2.888, 2.298, 1.962, 1.740); for Plan (ii), e = (6.088, 4.229, 3.396, 2.906, 2.579, 2.342, 2.160, 2.015, 1.895, 1.795). In each case the last entry is the critical value for the final analysis. For the futility boundary, in addition to the fixed CP method of $CP_{t} (θ) < γ = 0.1$, we explore the proposed conditional power boundary (7) with four different β-spending functions: (i) $f_{1} (t) = 1 - Φ (z_{1 - β^{*}} / t^{1 / 2})$; (ii) $f_{2} (t) = β^{*} t$; (iii) $f_{3} (t) = β^{*} t^{3 / 2}$; and (iv) $f_{4} (t) = β^{*} t^{2}$. We set $β^{*} = 0.1 / (1 - γ) = 0.111$ to make the overall type II error comparable to the fixed CP boundary.
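The four β-spending functions are simple to write down and compare. The sketch below evaluates the type II error spent at each look of Plan (i); the helper `spending_increments` is a hypothetical name, and the sketch does not compute the boundaries themselves, which in the article come from the Lan–DeMets software.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
BETA_STAR = 0.1 / (1 - 0.1)  # total type II error to spend, about 0.111


def f1(t):
    """O'Brien-Fleming type beta-spending function."""
    return 1 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - BETA_STAR) / math.sqrt(t))


def f2(t):
    return BETA_STAR * t


def f3(t):
    return BETA_STAR * t ** 1.5


def f4(t):
    return BETA_STAR * t ** 2


def spending_increments(f, times):
    """Error spent at each look: f(t_l) - f(t_{l-1}), starting from f(0) = 0."""
    spent, previous = [], 0.0
    for t in times:
        current = f(t)
        spent.append(current - previous)
        previous = current
    return spent


plan_i = (0.2, 0.4, 0.6, 0.8)
for name, f in (("f1", f1), ("f2", f2), ("f3", f3), ("f4", f4)):
    print(name, [round(x, 4) for x in spending_increments(f, plan_i)])
```

All four functions reach β* at t = 1, but the power-family functions spend more of it at early looks than the O’Brien–Fleming type function, which is why they lead to more aggressive early futility boundaries.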

Table 1 presents the boundaries of the proposed time-varying CP method (7) with the four spending functions for the two plans. For each of these β-spending functions, the value of the stopping boundary decreases as information increases. The futility stopping rules based on the power family of β-spending functions tend to be more aggressive than the rule based on the O’Brien–Fleming type β-spending function, because the power family makes it easier to accept the null hypothesis at earlier times. This can also be seen graphically.

Table 1.

The conditional power boundaries for the two interim futility monitoring plans with β* = 0.111

Error function	k	t = 0.1	t = 0.2	t = 0.3	t = 0.4	t = 0.5	t = 0.6	t = 0.7	t = 0.8	t = 0.9
f₁(t)	4	–	0.342	–	0.284	–	0.179	–	0.037	–
f₁(t)	9	0.362	0.342	0.314	0.274	0.223	0.161	0.091	0.028	0.001
f₂(t)	4	–	0.609	–	0.405	–	0.200	–	0.025	–
f₂(t)	9	0.698	0.577	0.470	0.368	0.267	0.168	0.079	0.017	0.0001
f₃(t)	4	–	0.547	–	0.361	–	0.186	–	0.030	–
f₃(t)	9	0.649	0.527	0.425	0.333	0.246	0.161	0.082	0.021	0.0003
f₄(t)	4	–	0.489	–	0.314	–	0.163	–	0.029	–
f₄(t)	9	0.603	0.477	0.378	0.293	0.216	0.143	0.076	0.022	0.001

Equating (7) and (3), the futility stopping boundary of the conditional power (7) can be equivalently expressed as

Z_{t} < c_{t} - (\sqrt{1 / t} - \sqrt{(1 - t) / t}) z_{1 - β} + θ \sqrt{t} .

Similarly, the fixed CP futility stopping boundary of $C P_{t} (θ) < 0.1$ can be equivalently expressed as

Z_{t} < Φ^{- 1} (0.1) \sqrt{(1 - t) / t} - z_{1 - β} / \sqrt{t} + θ \sqrt{t} .
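To see how the two kinds of boundary compare on the Z scale, the following sketch inverts equation (3) for an arbitrary conditional power threshold and also evaluates the closed form for the fixed rule. The time-varying thresholds are the k = 9 values for f₁(t) from Table 1; the function names and the drift approximation θ = z_{1−α} + z_{1−β} are assumptions of this sketch.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA, BETA = 0.05, 0.10
Z_ALPHA = STD_NORMAL.inv_cdf(1 - ALPHA)
Z_BETA = STD_NORMAL.inv_cdf(1 - BETA)
THETA = Z_ALPHA + Z_BETA  # assumed drift approximation


def z_boundary_from_cp(gamma, t):
    """Z threshold b such that CP_t(theta) < gamma is equivalent to Z_t < b."""
    eta = STD_NORMAL.inv_cdf(gamma)
    return (eta * math.sqrt(1 - t) - THETA * (1 - t) + Z_ALPHA) / math.sqrt(t)


def fixed_cp_z_boundary(t, gamma=0.1):
    """Closed form given in the text for the fixed CP rule."""
    eta = STD_NORMAL.inv_cdf(gamma)
    return eta * math.sqrt((1 - t) / t) - Z_BETA / math.sqrt(t) + THETA * math.sqrt(t)


TIMES = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
GAMMA_F1_K9 = (0.362, 0.342, 0.314, 0.274, 0.223, 0.161, 0.091, 0.028, 0.001)

for t, gamma in zip(TIMES, GAMMA_F1_K9):
    varying = z_boundary_from_cp(gamma, t)
    fixed = fixed_cp_z_boundary(t)
    print(t, round(varying, 2), round(fixed, 2))
```

Wherever the time-varying threshold exceeds 0.1 (here for t ≤ 0.6), its Z boundary lies above the fixed CP boundary, so a weaker negative signal is enough to trigger a futility stop.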

Figure 1 presents the monitoring boundaries of the Z-test statistic for nine interim looks at t = (0.1, 0.2, ..., 0.9) with α = 0.05 and β* = 0.111. It shows that the fixed CP futility stopping boundary is substantially lower than the proposed time-varying CP futility monitoring boundaries early in the trial. Therefore, it is anticipated that the proposed time-varying CP futility monitoring has a better chance to stop the trial for futility earlier than the fixed CP monitoring.

Figure 1. Comparison of the futility boundaries between the proposed time-varying CP method with the four different β-spending functions, (i) $f_{1} (t) = 1 - Φ (z_{1 - β^{*}} / t^{1 / 2})$, (ii) $f_{2} (t) = β^{*} t$, (iii) $f_{3} (t) = β^{*} t^{3 / 2}$, (iv) $f_{4} (t) = β^{*} t^{2}$, and the fixed CP method for $β^{*} = 0.111$.

In the simulation studies, the 2-year rate of the primary endpoint (ipsilateral ischemic stroke) for the control group is always set at p₁ = 40%. The rate for the surgical group p₂ is set at 24% (the designed effect size) and at 35% (a reduced effect size) for comparing the exit probabilities at interim analyses and the overall power. The value of p₂ is also set at 40% for comparing the exit probabilities and the overall size. The Monte Carlo method is used to estimate the exit probabilities due to either efficacy or futility at interim analyses.

For example, in a simulation study for the case with the designed effect size (p₁ = 40% and p₂ = 24%) under monitoring plan (i) with the O’Brien and Fleming boundary, the simulation proceeds as follows.

1. For the first interim analysis, we generate a sample of $142 \times 0.2 \approx 28$ dichotomous random observations from Bernoulli(0.4) and Bernoulli(0.24) for the control and study treatment groups, respectively. The Z-statistic at information time t₁ = 0.2, Z_0.2, is calculated using (2) and compared to 4.229. If Z_0.2 > 4.229, exit the trial for efficacy.
2. Otherwise, compute the conditional power CP_0.2 given the samples and the designed effect size using (3) and compare it to 0.342. If CP_0.2 < 0.342, exit the trial for futility.
3. If no stopping of the trial is advised, we further generate a new sample of 28 dichotomous random observations from Bernoulli(0.4) and Bernoulli(0.24) for the control and study treatment groups, respectively. Combine the data and recalculate the corresponding Z-statistic and conditional power at information time t₂ = 0.4, Z_0.4 and CP_0.4, and compare them to the thresholds 2.888 and 0.284, respectively, in the same way as described in Steps 1 and 2. The remaining interim analyses proceed in the same manner.
4. If no stopping is recommended at any interim analysis, we come to the final analysis. The remaining data are generated from Bernoulli(0.4) and Bernoulli(0.24) for the control and study treatment groups, respectively. Calculate the Z-statistic with all the data, Z₁. If Z₁ > 1.740, declare efficacy; otherwise declare futility.

We repeat this procedure 100,000 times. The exit probabilities due to efficacy or futility at each interim analysis and final analysis are estimated by the proportions of corresponding exits. The overall power is estimated by the sum of empirical exit probabilities due to efficacy.
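The procedure can be sketched as a small Monte Carlo program. This is a simplified re-implementation under stated assumptions, not the authors' code: cumulative per-group sample sizes are taken as round(142·t), a degenerate variance estimate is treated as Z = 0, the futility thresholds are the k = 4 values for f₁(t) from Table 1, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_ALPHA = STD_NORMAL.inv_cdf(0.95)
THETA = Z_ALPHA + STD_NORMAL.inv_cdf(0.90)   # assumed drift approximation
N_PER_GROUP = 142
LOOKS = (0.2, 0.4, 0.6, 0.8)
EFFICACY = (4.229, 2.888, 2.298, 1.962)      # interim boundaries, Plan (i)
FINAL_CRITICAL = 1.740
FUTILITY_CP = (0.342, 0.284, 0.179, 0.037)   # Table 1, f1, k = 4


def z_statistic(x1, x2, n):
    """Equation (2) from event counts; returns 0 if the variance estimate is 0."""
    p1, p2 = x1 / n, x2 / n
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return 0.0 if var == 0 else math.sqrt(n) * (p1 - p2) / math.sqrt(var)


def conditional_power(z_t, t):
    """Equation (3) at the designed drift."""
    return STD_NORMAL.cdf(
        (z_t * math.sqrt(t) + THETA * (1 - t) - Z_ALPHA) / math.sqrt(1 - t)
    )


def run_trial(p1, p2, rng):
    """Simulate one trial; return (exit reason, information time of exit)."""
    x1 = x2 = n = 0
    for t, e, gamma in zip(LOOKS, EFFICACY, FUTILITY_CP):
        target = round(N_PER_GROUP * t)
        x1 += sum(rng.random() < p1 for _ in range(target - n))
        x2 += sum(rng.random() < p2 for _ in range(target - n))
        n = target
        z_t = z_statistic(x1, x2, n)
        if z_t > e:
            return "efficacy", t
        if conditional_power(z_t, t) < gamma:
            return "futility", t
    x1 += sum(rng.random() < p1 for _ in range(N_PER_GROUP - n))
    x2 += sum(rng.random() < p2 for _ in range(N_PER_GROUP - n))
    z_1 = z_statistic(x1, x2, N_PER_GROUP)
    return ("efficacy" if z_1 > FINAL_CRITICAL else "futility"), 1.0


def simulate(p1, p2, reps, seed=0):
    """Estimate exit probabilities, keyed by (reason, information time)."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(reps):
        key = run_trial(p1, p2, rng)
        counts[key] = counts.get(key, 0) + 1
    return {key: c / reps for key, c in counts.items()}


def efficacy_probability(exits):
    """Overall probability of declaring efficacy (power or size)."""
    return sum(p for (reason, _), p in exits.items() if reason == "efficacy")


print(round(efficacy_probability(simulate(0.40, 0.24, reps=2000, seed=1)), 3))
```

With only a few thousand replications the estimates are noisy; the article uses 100,000 replications, which would require a correspondingly larger `reps`.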

Results

Table 2 presents the Monte Carlo simulation results comparing the overall power and exit probabilities between the proposed time-varying CP method and the fixed CP method of $C P_{t} (θ) < γ = 0.1$ for data generated from the designed effect size. The results show that the loss of power is negligible. Increasing the number of interim analyses may increase type II error, but our results indicate that the inflation of type II error is not substantial. Using the proposed monitoring method, the chance of early stopping for futility, though larger than the fixed CP method, is again negligible. For example, if nine interim analyses are scheduled at the information times described in Table 2, the cumulative early exit probabilities due to futility for the monitoring plan (ii) at the middle of the trial (t=0.5) are only 0.007, 0.025, 0.015, and 0.01, respectively, for the proposed time-varying CP method with the four β–spending functions; while this probability is 0.001 for the fixed CP method. Inflations of type II error at early time can be virtually ignored, particularly for the proposed method with the O’Brien–Fleming type β-spending function and the fixed CP method.

Table 2.

Comparison of the exit probabilities between the proposed time-varying CP method with the four different β-spending functions and the fixed CP method at each interim analysis and the overall power based on 100,000 Monte Carlo samples for data generated with p₁ = 40% and p₂ = 24%

Information time (t)	f₁(t)		f₂(t)		f₃(t)		f₄(t)		Fixed CP method with γ = 0.1

	k = 4	k = 9	k = 4	k = 9	k = 4	k = 9	k = 4	k = 9	k = 4	k = 9
0.1
Efficacy	–	0.000	–	0.000	–	0.000	–	0.000	–	0.000
Futility	–	0.000	–	0.010	–	0.003	–	0.001	–	0.000
0.2
Efficacy	0.006	0.005	0.006	0.005	0.006	0.005	0.006	0.005	0.006	0.005
Futility	0.000	0.000	0.011	0.005	0.005	0.004	0.002	0.002	0.000	0.000
0.3
Efficacy	–	0.046	–	0.046	–	0.046	–	0.046	–	0.046
Futility	–	0.001	–	0.004	–	0.002	–	0.002	–	0.000
0.4
Efficacy	0.163	0.112	0.163	0.112	0.163	0.112	0.163	0.112	0.163	0.112
Futility	0.003	0.002	0.007	0.004	0.006	0.003	0.005	0.002	0.000	0.000
0.5
Efficacy	–	0.159	–	0.159	–	0.159	–	0.159	–	0.159
Futility	–	0.004	–	0.002	–	0.003	–	0.003	–	0.001
0.6
Efficacy	0.330	0.171	0.329	0.170	0.329	0.170	0.330	0.170	0.330	0.171
Futility	0.009	0.003	0.005	0.002	0.006	0.002	0.004	0.003	0.004	0.003
0.7
Efficacy	–	0.140	–	0.139	–	0.140	–	0.140	–	0.140
Futility	–	0.004	–	0.002	–	0.004	–	0.002	–	0.006
0.8
Efficacy	0.252	0.113	0.250	0.111	0.251	0.112	0.252	0.113	0.252	0.113
Futility	0.006	0.003	0.004	0.001	0.004	0.002	0.005	0.003	0.015	0.010
0.9
Efficacy	–	0.082	–	0.080	–	0.081	–	0.082	–	0.083
Futility	–	0.002	–	0.001	–	0.002	–	0.002	–	0.023
1.0
Efficacy	0.142	0.059	0.138	0.056	0.140	0.058	0.142	0.058	0.143	0.059
Futility	0.090	0.093	0.087	0.090	0.090	0.091	0.092	0.093	0.088	0.068
Power	0.893	0.887	0.886	0.878	0.889	0.883	0.893	0.885	0.894	0.888

Table 3 presents the results of the same Monte Carlo simulation study but with data in the surgical group generated from Bernoulli(0.35), that is, p₂ = 35%. This represents the case in which the surgical treatment in COSS is much less effective than hypothesized and may not be clinically significant enough to lead to adoption of the treatment in practice, considering the large cost and complications associated with the surgery. The results show that the proposed time-varying CP method has almost the same overall power as the fixed CP method. However, the proposed method has a clear advantage in being able to stop the trial earlier for futility. The cumulative exit probabilities due to futility for monitoring plan (ii) at information time t = 0.5 are 15.9%, 22.5%, 19.4%, and 14.9%, respectively, for the proposed time-varying CP method with the four β-spending functions, all much larger than the 5.7% obtained with the fixed CP method.

Table 3.

Information time (t)	f₁(t)		f₂(t)		f₃(t)		f₄(t)		Fixed CP method with γ = 0.1

	k = 4	k = 9	k = 4	k = 9	k = 4	k = 9	k = 4	k = 9	k = 4	k = 9
0.1
Efficacy	–	0.000	–	0.000	–	0.000	–	0.000	–	0.000
Futility	–	0.000	–	0.047	–	0.022	–	0.010	–	0.000
0.2
Efficacy	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
Futility	0.004	0.004	0.088	0.036	0.052	0.027	0.028	0.017	0.000	0.000
0.3
Efficacy	–	0.003	–	0.003	–	0.003	–	0.003	–	0.003
Futility	–	0.023	–	0.056	–	0.041	–	0.029	–	0.001
0.4
Efficacy	0.012	0.009	0.012	0.009	0.012	0.009	0.012	0.009	0.012	0.009
Futility	0.075	0.054	0.096	0.044	0.081	0.051	0.083	0.041	0.012	0.011
0.5
Efficacy	–	0.017	–	0.017	–	0.017	–	0.017	–	0.017
Futility	–	0.078	–	0.042	–	0.053	–	0.052	–	0.045
0.6
Efficacy	0.045	0.027	0.045	0.027	0.045	0.027	0.045	0.027	0.045	0.027
Futility	0.158	0.057	0.099	0.044	0.125	0.049	0.100	0.068	0.132	0.089
0.7
Efficacy	–	0.034	–	0.034	–	0.034	–	0.034	–	0.034
Futility	–	0.080	–	0.043	–	0.073	–	0.051	–	0.122
0.8
Efficacy	0.071	0.038	0.070	0.038	0.071	0.038	0.071	0.038	0.071	0.038
Futility	0.132	0.051	0.084	0.038	0.090	0.048	0.115	0.070	0.254	0.136
0.9
Efficacy	–	0.041	–	0.039	–	0.040	–	0.041	–	0.041
Futility	–	0.061	–	0.035	–	0.041	–	0.065	–	0.167
1.0
Efficacy	0.085	0.046	0.082	0.044	0.084	0.045	0.085	0.046	0.086	0.046
Futility	0.418	0.376	0.424	0.403	0.440	0.380	0.460	0.381	0.387	0.211
Power	0.213	0.215	0.209	0.211	0.212	0.213	0.213	0.215	0.214	0.215
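The cumulative futility exit probabilities quoted for monitoring plan (ii) at t = 0.5 are sums of the per-look futility entries in the k = 9 columns of Table 3. The short sketch below reproduces that arithmetic; the dictionary layout and key names are illustrative only.

```python
# Per-look futility exit probabilities at t = 0.1, ..., 0.5 (Table 3, k = 9).
FUTILITY_EXITS_K9 = {
    "f1": (0.000, 0.004, 0.023, 0.054, 0.078),
    "f2": (0.047, 0.036, 0.056, 0.044, 0.042),
    "f3": (0.022, 0.027, 0.041, 0.051, 0.053),
    "f4": (0.010, 0.017, 0.029, 0.041, 0.052),
    "fixed": (0.000, 0.000, 0.001, 0.011, 0.045),
}

# Cumulative probability of having stopped for futility by t = 0.5.
cumulative = {name: round(sum(values), 3) for name, values in FUTILITY_EXITS_K9.items()}

for name, total in cumulative.items():
    print(f"{name}: {total:.1%}")
```

The totals match the percentages reported in the text (15.9%, 22.5%, 19.4%, 14.9% for the four spending functions, and 5.7% for the fixed CP method).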

Table 4 summarizes the Monte Carlo simulation study for the case where the surgical treatment is no better than the control (p₁ = p₂ = 40%). The results show that the overall sizes are all close to the nominal value of 0.05, especially when only four interim analyses are conducted. The size increases to at most 0.056 when the number of interim analyses increases to nine. When the study treatment is not different from the control at all, the fixed CP method does not detect the futile trial quickly: the cumulative probability of stopping the trial by the time the study is half completed is only 16.1% under monitoring plan (ii). The proposed time-varying CP method boosts this probability substantially, to 34.3%, 42.0%, 38.8%, and 31.9%, respectively, for the four selected β-spending functions.

Table 4.

Comparison of the exit probabilities between the proposed time-varying CP method with the four different β-spending functions and the fixed CP method at each interim analysis and the overall size based on 100,000 Monte Carlo samples for data generated with p₁ = p₂ = 40%

Information time (t)	f₁(t)		f₂(t)		f₃(t)		f₄(t)		Fixed CP method with γ = 0.1

	k = 4	k = 9	k = 4	k = 9	k = 4	k = 9	k = 4	k = 9	k = 4	k = 9
0.1
Efficacy	–	0.000	–	0.000	–	0.000	–	0.000	–	0.000
Futility	–	0.001	–	0.079	–	0.040	–	0.018	–	0.000
0.2
Efficacy	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
Futility	0.012	0.011	0.171	0.070	0.111	0.055	0.067	0.038	0.000	0.000
0.3
Efficacy	–	0.001	–	0.001	–	0.001	–	0.001	–	0.001
Futility	–	0.064	–	0.113	–	0.091	–	0.072	–	0.005
0.4
Efficacy	0.002	0.002	0.002	0.002	0.002	0.002	0.002	0.002	0.002	0.002
Futility	0.184	0.126	0.184	0.078	0.168	0.105	0.171	0.094	0.044	0.038
0.5
Efficacy	–	0.004	–	0.004	–	0.004	–	0.004	–	0.004
Futility	–	0.141	–	0.080	–	0.097	–	0.097	–	0.118
0.6
Efficacy	0.010	0.007	0.010	0.007	0.010	0.007	0.010	0.007	0.010	0.007
Futility	0.285	0.104	0.171	0.077	0.222	0.086	0.200	0.126	0.305	0.192
0.7
Efficacy	–	0.009	–	0.009	–	0.009	–	0.009	–	0.009
Futility	–	0.119	–	0.069	–	0.108	–	0.084	–	0.187
0.8
Efficacy	0.017	0.010	0.016	0.010	0.017	0.010	0.017	0.010	0.017	0.010
Futility	0.183	0.073	0.125	0.060	0.136	0.068	0.180	0.100	0.352	0.170
0.9
Efficacy	–	0.010	–	0.010	–	0.010	–	0.010	–	0.010
Futility	–	0.080	–	0.050	–	0.053	–	0.085	–	0.129
1.0
Efficacy	0.021	0.013	0.021	0.012	0.021	0.012	0.021	0.013	0.022	0.013
Futility	0.287	0.227	0.300	0.269	0.313	0.242	0.332	0.232	0.248	0.106
Size	0.050	0.056	0.049	0.055	0.050	0.055	0.050	0.056	0.051	0.056


Simulations with other values of p₂ were also conducted (results not shown here) and similar patterns were observed.
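
The cumulative futility exit probabilities quoted for Table 4 can be checked by summing the per-look futility entries up to information time t = 0.5. The short Python sketch below does this for the k = 9 columns; the numbers are transcribed from Table 4, while the dictionary and function names are illustrative choices, not part of the original analysis.

```python
# Futility exit probabilities for k = 9 at t = 0.1, 0.2, 0.3, 0.4, 0.5,
# transcribed from the "Futility" rows of Table 4 (p1 = p2 = 40%).
futility_k9 = {
    "f1": [0.001, 0.011, 0.064, 0.126, 0.141],
    "f2": [0.079, 0.070, 0.113, 0.078, 0.080],
    "f3": [0.040, 0.055, 0.091, 0.105, 0.097],
    "f4": [0.018, 0.038, 0.072, 0.094, 0.097],
    "fixed CP (gamma = 0.1)": [0.000, 0.000, 0.005, 0.038, 0.118],
}


def cumulative_exit(per_look_probabilities):
    """Sum per-look exit probabilities into one cumulative probability."""
    return round(sum(per_look_probabilities), 3)


# Cumulative probability of stopping for futility by the halfway point.
cumulative_by_half = {
    name: cumulative_exit(values) for name, values in futility_k9.items()
}

for name, value in cumulative_by_half.items():
    print(f"{name}: {value:.1%}")
```

Only the futility rows are summed here; the efficacy exits by t = 0.5 are at most a few tenths of a percent under the null and are left out, which matches the percentages reported in the text.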

Discussion

For a long-term randomized clinical trial, the DSMB may require a sequence of interim analyses for both efficacy and futility. A flexible CP futility monitoring plan with a time-varying boundary is developed that can be easily combined with any sequential interim monitoring for efficacy. In practice, a commonly used monitoring plan derives the efficacy boundary from the Lan–DeMets α-spending method and sets the futility boundary at a fixed value of the conditional power calculated with the designed effect size. Compared to that plan, the proposed plan substantially increases the chance of early stopping for futility. This feature is very attractive for a trial designed around only the minimal clinically meaningful effect size, since it allows investigators to stop the trial earlier when this effect size is unlikely to be realized, saving considerable resources for other promising trials. However, when an effect size smaller than the designed one is still acceptable, or when investigators would like to accumulate as much data as possible for secondary analyses, the fixed CP method with a small threshold for the conditional power may be more desirable.

The proposed method does not require pre-specifying the number and times of the interim analyses. Given the times of the interim analyses, the stopping boundaries for efficacy and futility can be obtained sequentially but independently of each other, following the principle of the error spending method. The efficacy boundary is determined from an α-spending function for the type I error, the futility boundary is obtained similarly from a β-spending function for the type II error, and the two boundaries can be computed separately using existing online software [12]. Because the number and times of interim analyses may be difficult to fix before the trial, or may be modified frequently as in the ongoing COSS trial, this approach is appealing to investigators. Although the proposed method is not guaranteed to control the overall size and power at the designed levels, our extensive simulation studies show empirically that it maintains the designed overall size and power quite well for the two-sample proportion test.

In summary, compared to the fixed CP method, the proposed time-varying CP method substantially increases the chance of early stopping for futility without substantially reducing the overall power. Among the four selected β-spending functions, $f(t) = β^{*} t$ or $β^{*} t^{1.5}$ leads to a relatively aggressive plan in terms of stopping earlier for futility and appears to inflate the type II error more than the other β-spending functions. The proposed method with the O’Brien–Fleming type β-spending function agrees closely with the fixed CP futility monitoring method in overall size and power, but has the advantage of a reasonable chance of picking up the futility signal early in the trial; it is therefore recommended for lengthy and costly randomized clinical trials.
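
To make the contrast between aggressive and conservative spending concrete, the following Python sketch evaluates three β-spending functions at a set of look times. The linear and power forms are the ones named above. The O’Brien–Fleming type form is written in the usual Lan–DeMets way, which is an assumption here and may differ in detail from the form used in the simulations. The value `beta_star = 0.1`, the look times, and all function names are illustrative only.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def linear_spending(t, beta_star):
    """f(t) = beta* t."""
    return beta_star * t


def power_spending(t, beta_star, rho=1.5):
    """f(t) = beta* t^rho, with rho = 1.5 as in the text."""
    return beta_star * t ** rho


def obf_type_spending(t, beta_star):
    """O'Brien-Fleming type spending in the Lan-DeMets form (assumed):
    f(t) = 2 * (1 - Phi(z_{1 - beta*/2} / sqrt(t)))."""
    z = _STD_NORMAL.inv_cdf(1 - beta_star / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z / t ** 0.5))


def incremental_spend(spending, times, beta_star):
    """Type II error newly spent at each look (differences of f)."""
    cumulative = [spending(t, beta_star) for t in times]
    previous = [0.0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]


beta_star = 0.1                      # illustrative value, not from the paper
looks = [0.2, 0.4, 0.6, 0.8, 1.0]    # illustrative look times

for spending in (linear_spending, power_spending, obf_type_spending):
    spent = incremental_spend(spending, looks, beta_star)
    print(spending.__name__, [round(s, 4) for s in spent])
```

All three functions spend the full β* by t = 1, but the O’Brien–Fleming type form spends almost nothing at the first look, which is consistent with its more conservative early behavior described above.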

If the number and times of the interim analyses are specified exactly in the design, one can design a monitoring method that controls the overall size and power exactly. For example, if the trial is scheduled to be monitored k times, including the final analysis, at times $0 < t_{1} < t_{2} < \dots < t_{k} = 1$ with corresponding efficacy boundaries $e_{1}, e_{2}, \dots, e_{k}$, then the conditional power should be computed as $CP_{t}(θ) = P\{B(1) > e_{k} \mid B(t), θ\}$, and the futility boundary will therefore depend on the efficacy boundary. Both boundaries can be computed jointly by the iterative integration method [8], but the computation can be quite involved and no software is currently available for the task. Moreover, there is no guarantee that this approach will yield a decreasing conditional power boundary capable of exiting the trial for futility earlier than the fixed CP method.
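
For a Brownian motion B with drift θ and unit variance, the increment B(1) − B(t) is normal with mean θ(1 − t) and variance 1 − t, so the conditional power above has a closed form. The Python sketch below evaluates it. It considers only the final boundary $e_{k}$, as in the expression above, and ignores intermediate efficacy looks. The function name and the numerical values in the usage lines are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_power(b_t, t, theta, final_boundary):
    """P{B(1) > final_boundary | B(t) = b_t} for Brownian motion
    with drift theta and unit variance, at information time 0 <= t < 1."""
    if not 0 <= t < 1:
        raise ValueError("t must satisfy 0 <= t < 1")
    expected_final = b_t + theta * (1 - t)   # conditional mean of B(1)
    return 1 - _STD_NORMAL.cdf((final_boundary - expected_final) / sqrt(1 - t))


# Illustrative design: one-sided alpha = 0.05 and beta = 0.10 (assumed values).
z_alpha = _STD_NORMAL.inv_cdf(0.95)
z_beta = _STD_NORMAL.inv_cdf(0.90)
theta_design = z_alpha + z_beta

# Before any data are seen, the conditional power equals the design power.
cp_start = conditional_power(0.0, 0.0, theta_design, z_alpha)

# Halfway through with an interim Z of 0.5, so B(0.5) = 0.5 * sqrt(0.5).
cp_half = conditional_power(0.5 * sqrt(0.5), 0.5, theta_design, z_alpha)

print(round(cp_start, 3), round(cp_half, 3))
```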

For this proposed method, the Z-test statistic is defined as

$$Z_{t} = \sqrt{n} ({\hat{p}}_{1, n} - {\hat{p}}_{2, n}) / \sqrt{{\hat{p}}_{1, n} (1 - {\hat{p}}_{1, n}) + {\hat{p}}_{2, n} (1 - {\hat{p}}_{2, n})} .$$

This is different from the standard Z-test statistic for the two-sample proportion test. The reason for doing so is to make the test statistic asymptotically normally distributed with variance one for both the null and alternative hypotheses. If the standard Z-test statistic $Z_{t} = \sqrt{n} ({\hat{p}}_{1, n} - {\hat{p}}_{2, n}) / \sqrt{2 {\hat{p}}_{n} (1 - {\hat{p}}_{n})}$ for ${\hat{p}}_{n} = ({\hat{p}}_{1, n} + {\hat{p}}_{2, n}) / 2$ is adopted, the sample size for the fixed design should be modified to

$$N = \left\{\frac{z_{1 - α} \sqrt{2 \bar{p} (1 - \bar{p})} + z_{1 - β} \sqrt{p_{1} (1 - p_{1}) + p_{2} (1 - p_{2})}}{δ}\right\}^{2},$$

with $\bar{p} = (p_{1} + p_{2}) / 2$; the drift parameter θ is approximately equal to

$$θ = z_{1 - α} + z_{1 - β} \sqrt{\frac{p_{1} (1 - p_{1}) + p_{2} (1 - p_{2})}{2 \bar{p} (1 - \bar{p})}},$$

and the conditional power given the designed effect size can be computed by

$$CP_{t}(θ) = Φ\left\{\frac{Z_{t} \sqrt{t} + θ (1 - t) - z_{1 - α}}{\sqrt{1 - t}\, R(α, β, θ)}\right\},$$

with $R(α, β, θ) = (θ - z_{1 - α}) / z_{1 - β}$. Despite the slight difference in the formula for computing the conditional power, the futility stopping boundary turns out to be the same and our simulation experiments (not shown here) indicate that the two versions of the Z-test statistic yield very similar results unless the difference between p₁ and p₂ is very large.
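
The two versions of the Z-test statistic and the modified conditional power can be sketched in Python as follows. The formulas follow the expressions above, with n taken as the per-group sample size. The function names and the design values used in the usage lines (p₁ = 0.40, p₂ = 0.24, one-sided α = 0.05, β = 0.10, and the interim estimates) are assumptions chosen for illustration, not the COSS design.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def z_unpooled(p1_hat, p2_hat, n):
    """Z-statistic of the proposed method (separate variance estimates)."""
    variance = p1_hat * (1 - p1_hat) + p2_hat * (1 - p2_hat)
    return sqrt(n) * (p1_hat - p2_hat) / sqrt(variance)


def z_pooled(p1_hat, p2_hat, n):
    """Standard two-sample Z-statistic (pooled variance estimate)."""
    p_hat = (p1_hat + p2_hat) / 2
    return sqrt(n) * (p1_hat - p2_hat) / sqrt(2 * p_hat * (1 - p_hat))


def pooled_drift(p1, p2, alpha, beta):
    """Return (theta, R) for the pooled statistic at the designed rates."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(1 - beta)
    p_bar = (p1 + p2) / 2
    ratio = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / (2 * p_bar * (1 - p_bar)))
    theta = z_alpha + z_beta * ratio
    return theta, (theta - z_alpha) / z_beta


def conditional_power_pooled(z_t, t, theta, r, alpha):
    """CP_t(theta) for the pooled statistic, with 0 <= t < 1."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    numerator = z_t * sqrt(t) + theta * (1 - t) - z_alpha
    return _STD_NORMAL.cdf(numerator / (sqrt(1 - t) * r))


# Illustrative values only (assumed, not taken from the COSS protocol).
alpha, beta = 0.05, 0.10
theta, r = pooled_drift(0.40, 0.24, alpha, beta)

z_a = z_unpooled(0.38, 0.30, 100)   # hypothetical interim estimates
z_b = z_pooled(0.38, 0.30, 100)
cp = conditional_power_pooled(z_b, 0.5, theta, r, alpha)
print(round(z_a, 3), round(z_b, 3), round(cp, 3))
```

Because p(1 − p) is concave, the pooled variance estimate is never smaller than the unpooled one, so the pooled statistic is slightly smaller in absolute value; the gap grows with the difference between the two estimated proportions, in line with the remark above.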

Finally, although the proposed time-varying CP futility monitoring method is developed and illustrated for the two-sample test of proportions motivated by the ongoing COSS study, it can be applied to other tests as long as the sequential test statistics behave asymptotically as a Brownian motion process. However, the method may not be applicable to a monitored process that behaves quite differently from Brownian motion.

Acknowledgments

This research is partially supported by grant NINDS-5U01NS041895. The authors thank the editor, Steven Goodman, an associate editor, and the two anonymous reviewers, whose comments and suggestions greatly helped improve this manuscript from an earlier version.

References

1. Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika 1983; 70: 659–63.
2. DeMets DL, Ware JH. Group sequential methods for clinical trials with one-sided hypothesis. Biometrika 1980; 67: 651–60.
3. DeMets DL, Ware JH. Asymmetric group sequential boundaries for monitoring clinical trial. Biometrika 1982; 69: 661–3.
4. Emerson SS, Fleming TR. Symmetric group sequential test designs. Biometrics 1989; 45: 905–23.
5. Pampallona SK, Tsiatis AA. Group sequential designs for one-sided and two-sided hypothesis testing with provision for early stopping in favor of the null hypothesis. J Stat Plan Infer 1994; 42: 19–35.
6. Freidlin B, Korn EL. A comment on futility monitoring. Control Clin Trials 2002; 23: 355–66.
7. Lan KKG, Wittes J. The B-value: a tool for monitoring data. Biometrics 1988; 44: 579–85.
8. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC, Boca Raton, 2000.
9. Lan KKG, Simon R, Halperin M. Stochastically curtailed tests in long-term clinical trials. Commun Stat 1982; 11: 207–19.
10. Chang WH, Chuang-Stein C. Type I error and power in trials with one interim futility analysis. Pharm Stat 2004; 3: 51–9.
11. Lan KKG, Zucker DM. Sequential monitoring of clinical trials: the role of information and Brownian motion. Stat Med 1993; 12: 753–65.
12. Reboussin DM, DeMets DL, Kim KM, et al. Computations for group sequential boundaries using the Lan-DeMets spending function method. Control Clin Trials 2000; 21: 190–207.

