Abstract
In phase III clinical trials, some adverse events may not be rare or unexpected and can be considered as a primary measure for safety, particularly in trials of life-threatening conditions, such as stroke or traumatic brain injury. In some clinical areas, efficacy endpoints may be highly correlated with safety endpoints, yet the interim efficacy analyses under group sequential designs usually do not consider safety measures formally in the analyses. Furthermore, safety is often statistically monitored more frequently than efficacy measures. Since early termination of a trial in this situation can be triggered by either efficacy or safety, the impact of safety monitoring on the error probabilities of efficacy analyses may be non-trivial if the original design does not take the multiplicity effect into account. We estimate the actual error probabilities for a bivariate binary efficacy-safety response in large confirmatory group sequential trials. The estimated probabilities are verified by Monte Carlo simulation. Our findings suggest that type I error for efficacy analyses decreases as efficacy-safety correlation or between-group difference in the safety event rate increases. In addition, while power for efficacy is robust to misspecification of the efficacy-safety correlation, it decreases dramatically as between-group difference in the safety event rate increases.
Keywords: bivariate binary response, efficacy, group sequential test, phase III clinical trial, type I error, type II error, safety
1. Introduction
In some large phase III trials, particularly for life-threatening conditions, safety and efficacy endpoints may be highly correlated. For example, in acute neurological trials (e.g., stroke and traumatic brain injury), mortality is often included in the primary efficacy measure, such as the modified Rankin Scale (mRS) [1, 2] or the Glasgow Outcome Scale (GOS) [3]. Furthermore, serious adverse events may not be rare or unexpected in these studies (e.g., 15% early death rate and 13% congestive heart failure rate in acute stroke patients receiving 25% human serum albumin [4]). In such trials, interim efficacy analyses under the group sequential (GS) design often are not considered frequent enough to sufficiently monitor safety as well. In the currently ongoing large (maximum sample size is 1,100) multi-center randomized controlled trial, the Albumin in Acute Stroke (ALIAS) Part II Trial [4], three interim efficacy analyses are planned at equally-spaced information intervals. The primary efficacy outcome is the binary “good” or “bad” outcome using scores on the mRS and the NIH (National Institutes of Health) Stroke Scale [5] assessed at 3 months from randomization. These relatively infrequent efficacy analyses are insufficient for monitoring safety, particularly if a difference in mortality between two groups exists. Therefore, statistical monitoring for safety is generally proposed more frequently than for efficacy.
GS designs are currently the most popular statistical approach to monitor efficacy in phase III clinical trials [6–8]. For safety monitoring, descriptive statistics (e.g., mean, proportion, risk ratio) with or without formal statistical guidelines are implemented in practice. Some statistical safety monitoring guidelines are often pre-specified for life-threatening conditions with an adverse event of specific interest [9]. In the ALIAS Part II Trial, in addition to the 3 interim efficacy analyses, 11 safety analyses using repeated 99% confidence intervals (CIs) for the risk ratio (RR) of early (within 30 days of randomization) death are incorporated. These safety assessments are performed after every 100 subjects, with an expectation of observing up to 15 deaths. However, since the efficacy and safety parameters are correlated but monitoring boundaries for efficacy and safety are separately constructed, it is unknown how much impact safety monitoring would have on the error probabilities of the efficacy analyses. Also unclear is whether error probability estimation for the efficacy analyses is robust to any misspecification of the safety profiles, such as the efficacy-safety correlation and the safety event rates.
For exploratory phase II trials, a variety of designs have been adopted to account for the multiplicity effect between efficacy and safety evaluations [10–12]. For large confirmatory phase III randomized concurrently-controlled GS trials involving multivariate responses, a majority of the proposed methods are based on a global statistic [13, 14] or based on controlling for a study-wise error rate [15, 16]. One problem with the global method is that it cannot provide the exact marginal error probability for each outcome, making the study results sometimes difficult to interpret clinically, especially when the primary interest is to separately ascertain the efficacy and safety profiles of the test treatment. To address this problem, Cook and Farewell developed a method to calculate the marginal and joint error probabilities for a bivariate continuous efficacy-toxicity response for GS designs [17, 18], but comparable research for a bivariate binary response is sparse. In this paper, we develop a method to calculate the marginal and joint error probabilities for a bivariate binary efficacy-safety response in large GS trials based on multivariate normal approximation. The estimated probabilities are verified by Monte Carlo simulation.
In Section 2, we present a method to calculate the marginal and joint error probabilities for a bivariate binary efficacy-safety response in large phase III trials where the safety event rates are non-trivial and correlated to the efficacy measure. In Section 3, we demonstrate the computation procedure based on a hypothetical example developed from the ALIAS Part II Trial. In Section 4, Monte Carlo simulations are carried out to verify the proposed method. The relationships between the error probabilities of efficacy analyses and safety profiles are shown in Section 5. Limitations and extension of the proposed method are discussed in Section 6.
2. Multivariate normal approximation applied to the estimate of a bivariate binary efficacy-safety response at interim analysis in group sequential designs
2.1 Bivariate normal approximation to estimate a bivariate binary efficacy-safety response
Consider a GS trial comparing a new treatment (A) to a control (B). K − 1 interim analyses (K > 1) and one final analysis are planned to monitor a bivariate binary efficacy-safety response, where the efficacy outcome is success or failure and the safety outcome is the occurrence of a specific safety event. Assume subjects are accrued equally at each interim stage. Let ni denote the size of the ith sequential group (i = 1, 2, …, K), namely the number of subjects enrolled in each arm between the (i−1)th and ith time points (time 0 represents the beginning of the study). Let pjE and pjS denote the true efficacy rate and safety event rate in treatment j, j = A, B. In addition, ni is selected so that the expected counts nipjE and nipjS are greater than 5.
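Let YijE and YijS denote the number of successes and the number of safety events among the ni subjects in treatment j. By the normal approximation to the binomial distribution,

$$Y_{ijE} \sim N\big(n_i p_{jE},\; n_i p_{jE}(1 - p_{jE})\big), \qquad Y_{ijS} \sim N\big(n_i p_{jS},\; n_i p_{jS}(1 - p_{jS})\big).$$

Denote by ΔiE and ΔiS the differences in the numbers of efficacy and safety events between the two treatments in the ith sequential group,

$$\Delta_{iE} = Y_{iAE} - Y_{iBE}, \qquad \Delta_{iS} = Y_{iAS} - Y_{iBS}. \tag{1}$$

ΔiE and ΔiS can be considered as the estimates of the treatment effect on efficacy and safety from ni subjects. Due to the independence of the two treatment groups, it follows that

$$\Delta_{iE} \sim N\big(n_i (p_{AE} - p_{BE}),\; n_i (p_{AE} q_{AE} + p_{BE} q_{BE})\big), \qquad \Delta_{iS} \sim N\big(n_i (p_{AS} - p_{BS}),\; n_i (p_{AS} q_{AS} + p_{BS} q_{BS})\big),$$

where qjE = 1 − pjE, qjS = 1 − pjS. The covariance between ΔiE and ΔiS is given by:

$$\operatorname{Cov}(\Delta_{iE}, \Delta_{iS}) = \operatorname{Cov}(Y_{iAE}, Y_{iAS}) + \operatorname{Cov}(Y_{iBE}, Y_{iBS}). \tag{2}$$

Since YiAE and YiAS (also, YiBE and YiBS) are correlated binomial random variables, let ρA and ρB be the phi correlation coefficients between efficacy and safety in treatments A and B. Their covariances are:

$$\operatorname{Cov}(Y_{iAE}, Y_{iAS}) = n_i \rho_A \sqrt{p_{AE} q_{AE} p_{AS} q_{AS}}, \qquad \operatorname{Cov}(Y_{iBE}, Y_{iBS}) = n_i \rho_B \sqrt{p_{BE} q_{BE} p_{BS} q_{BS}}. \tag{3}$$

From Equations 2 and 3, we get:

$$\operatorname{Cov}(\Delta_{iE}, \Delta_{iS}) = n_i \Big(\rho_A \sqrt{p_{AE} q_{AE} p_{AS} q_{AS}} + \rho_B \sqrt{p_{BE} q_{BE} p_{BS} q_{BS}}\Big).$$

Therefore, the estimates of the treatment effects on efficacy and safety from ni subjects compose a random vector (ΔiE, ΔiS)', which is distributed as bivariate normal, BVN(μi, Σi), with

$$\mu_i = n_i \begin{pmatrix} p_{AE} - p_{BE} \\ p_{AS} - p_{BS} \end{pmatrix}, \tag{4}$$

$$\Sigma_i = n_i \begin{pmatrix} p_{AE} q_{AE} + p_{BE} q_{BE} & \rho_A \sqrt{p_{AE} q_{AE} p_{AS} q_{AS}} + \rho_B \sqrt{p_{BE} q_{BE} p_{BS} q_{BS}} \\ \rho_A \sqrt{p_{AE} q_{AE} p_{AS} q_{AS}} + \rho_B \sqrt{p_{BE} q_{BE} p_{BS} q_{BS}} & p_{AS} q_{AS} + p_{BS} q_{BS} \end{pmatrix}. \tag{5}$$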
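As a rough illustration of Section 2.1, the sketch below computes the mean vector and covariance matrix of (ΔiE, ΔiS) for one sequential group from the binomial moments and the phi correlations. The function name `group_moments` and the choice of 50 subjects per arm per look (100 subjects per look under an assumed 1:1 allocation) are illustrative assumptions, not part of the original method description.

```python
import math

def group_moments(n_i, p_AE, p_AS, p_BE, p_BS, rho_A, rho_B):
    """Mean and covariance of (Delta_iE, Delta_iS) for one sequential group.

    n_i is the number of subjects per arm in the group; Delta = (count in A) - (count in B).
    Hypothetical helper written for illustration only.
    """
    def q(p):
        return 1.0 - p

    # Means: differences in expected counts between arms A and B.
    mean = (n_i * (p_AE - p_BE), n_i * (p_AS - p_BS))

    # Variances: sum of the two independent binomial variances.
    var_E = n_i * (p_AE * q(p_AE) + p_BE * q(p_BE))
    var_S = n_i * (p_AS * q(p_AS) + p_BS * q(p_BS))

    # Covariance: within-arm efficacy-safety covariances, added across arms.
    cov_ES = n_i * (
        rho_A * math.sqrt(p_AE * q(p_AE) * p_AS * q(p_AS))
        + rho_B * math.sqrt(p_BE * q(p_BE) * p_BS * q(p_BS))
    )
    return mean, ((var_E, cov_ES), (cov_ES, var_S))


# Example: treatment (0.50, 0.18), control (0.40, 0.15), rho_A = rho_B = 0.3,
# with an assumed 50 subjects per arm in the group.
mean, cov = group_moments(50, 0.50, 0.18, 0.40, 0.15, 0.3, 0.3)
print(mean)
print(cov)
```

The covariance term vanishes when ρA = ρB = 0, in which case the efficacy and safety estimates from a group are treated as independent.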
2.2 Joint distribution of efficacy and safety estimates from interim analyses
For a GS design with K analyses for both efficacy and safety, the estimates of the treatment effect on efficacy and safety form a vector Δ̃ of length 2K:

$$\tilde{\Delta} = (\Delta_{1E}, \Delta_{1S}, \Delta_{2E}, \Delta_{2S}, \ldots, \Delta_{KE}, \Delta_{KS})'. \tag{6}$$

Since Δ1E, Δ2E, ⋯, ΔKE are mutually independent, as are Δ1S, Δ2S, ⋯, ΔKS, using matrix algebra, the vector in Equation 6 has a multivariate normal (MVN) distribution Δ̃ ~ MVN(μ̃, Σ̃) with

$$\tilde{\mu} = (\mu_1', \mu_2', \ldots, \mu_K')', \tag{7}$$

$$\tilde{\Sigma} = \operatorname{diag}(\Sigma_1, \Sigma_2, \ldots, \Sigma_K), \tag{8}$$

where μi and Σi are defined in Equations 4 and 5, and Σ̃ is block diagonal. From Δ̃, we can further develop a standardized vector Z = (Z1E, Z1S, Z2E, Z2S, ⋯, ZKE, ZKS), where ZkE and ZkS represent the standardized test statistics for efficacy and safety at time k (k = 1, 2, …, K).
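The text does not spell out how Z is obtained from Δ̃. One common construction, assumed here for illustration, standardizes the cumulative sums of the group-wise differences, so that the covariance between looks k and l is the sum of the group covariances up to min(k, l). The sketch below computes the implied correlations; the helper names are hypothetical.

```python
import math

def cumulative_blocks(group_covs):
    """Running sums V_k = Sigma_1 + ... + Sigma_k of the 2x2 group covariances."""
    blocks = []
    total = [[0.0, 0.0], [0.0, 0.0]]
    for cov in group_covs:
        total = [[total[r][c] + cov[r][c] for c in range(2)] for r in range(2)]
        blocks.append(total)
    return blocks

def z_correlation(blocks, k, a, l, b):
    """Correlation between standardized cumulative statistics.

    k and l are 1-based look indices; a and b are 0 for efficacy, 1 for safety.
    Assumes Z_k = (cumulative difference - its mean) / (its standard deviation).
    """
    shared = blocks[min(k, l) - 1]  # only groups common to both looks contribute
    return shared[a][b] / math.sqrt(blocks[k - 1][a][a] * blocks[l - 1][b][b])


# Example with 12 equally sized groups sharing one (illustrative) covariance matrix.
sigma = [[24.5, 5.5053], [5.5053, 13.755]]
blocks = cumulative_blocks([sigma] * 12)
print(z_correlation(blocks, 3, 0, 12, 0))  # efficacy at look 3 vs efficacy at look 12
print(z_correlation(blocks, 6, 0, 6, 1))   # efficacy vs safety at the same look
```

With equal group sizes, the correlation between the same endpoint at looks k and l reduces to the square root of min(k, l)/max(k, l), the familiar group sequential structure.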
2.3 Calculation of the stopping probabilities for a GS trial
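Assume a trial uses boundary cE = (c1E, c2E, …, cKE) to monitor efficacy and boundary cS = (c1S, c2S, …, cKS) to monitor safety, where ckE and ckS are the critical values for the standardized test statistics ZkE and ZkS. Then, for a one-sided test, the marginal stopping probability for efficacy at time k is

$$\Pr(\text{stop for efficacy at time } k) = \Pr\Big(\bigcap_{i=0}^{k-1}\big\{[Z_{iE} < c_{iE}] \cap [Z_{iS} < c_{iS}]\big\} \cap [Z_{kE} \ge c_{kE}]\Big), \tag{9}$$

and the marginal stopping probability for safety at time k is

$$\Pr(\text{stop for safety at time } k) = \Pr\Big(\bigcap_{i=0}^{k-1}\big\{[Z_{iE} < c_{iE}] \cap [Z_{iS} < c_{iS}]\big\} \cap [Z_{kS} \ge c_{kS}]\Big). \tag{10}$$

Furthermore, the probability of stopping for both efficacy and safety at time k is:

$$\Pr(\text{stop for efficacy and safety at time } k) = \Pr\Big(\bigcap_{i=0}^{k-1}\big\{[Z_{iE} < c_{iE}] \cap [Z_{iS} < c_{iS}]\big\} \cap [Z_{kE} \ge c_{kE}] \cap [Z_{kS} \ge c_{kS}]\Big). \tag{11}$$

In the equations above and below, we let [Z0E < c0E] = [Z0S < c0S] = Ω. The overall marginal stopping probabilities for efficacy and safety are then given by:

$$\Pr(\text{stop for efficacy}) = \sum_{k=1}^{K} \Pr(\text{stop for efficacy at time } k), \tag{12}$$

$$\Pr(\text{stop for safety}) = \sum_{k=1}^{K} \Pr(\text{stop for safety at time } k), \tag{13}$$

and the overall probability of stopping for both efficacy and safety reasons is:

$$\Pr(\text{stop for efficacy and safety}) = \sum_{k=1}^{K} \Pr(\text{stop for efficacy and safety at time } k). \tag{14}$$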
In many phase III trials of life-threatening conditions, the primary efficacy outcome may not be assessed for several months or even years after treatments are administered. Meanwhile, efficacy assessments can be suspended by the primary safety outcome – early death. Therefore, the true type I error for efficacy in this circumstance can be expressed as a joint probability that a study stops for efficacy but not for safety concerns, which is:
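$$\text{True type I error} = \sum_{k=1}^{K} \Pr\Big(\bigcap_{i=0}^{k-1}\big\{[Z_{iE} < c_{iE}] \cap [Z_{iS} < c_{iS}]\big\} \cap [Z_{kE} \ge c_{kE}] \cap [Z_{kS} < c_{kS}];\; H_0\Big). \tag{15}$$

Likewise, the true power is:

$$\text{True power} = \sum_{k=1}^{K} \Pr\Big(\bigcap_{i=0}^{k-1}\big\{[Z_{iE} < c_{iE}] \cap [Z_{iS} < c_{iS}]\big\} \cap [Z_{kE} \ge c_{kE}] \cap [Z_{kS} < c_{kS}];\; H_a\Big). \tag{16}$$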
Furthermore, we can use Equations 12 and 14 to derive the joint probabilities in Equations 15 and 16 with the relationships that:
True type I error = Pr(stop for efficacy; H0)−Pr(stop for efficacy and safety; H0),
True power = Pr(stop for efficacy; Ha)−Pr(stop for efficacy and safety; Ha).
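As a worked instance of these two relationships, the numbers reported for Scenario 3 (δS = 0.03, ρA = ρB = 0.3) in Tables 2 and 3 can be plugged in directly. The small gap in the last digit of the power is presumably due to rounding of the tabulated inputs.

```latex
\begin{align*}
\text{True type I error} &= \Pr(\text{stop for efficacy}; H_0) - \Pr(\text{stop for efficacy and safety}; H_0) \\
                         &= 0.0137 - 0.0012 = 0.0125, \\
\text{True power}        &= \Pr(\text{stop for efficacy}; H_a) - \Pr(\text{stop for efficacy and safety}; H_a) \\
                         &= 0.7601 - 0.0273 = 0.7328 \approx 0.7329 \text{ (as tabulated)}.
\end{align*}
```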
3. An illustrative example and computation
We demonstrate our method using an example derived from a currently ongoing phase III trial, the Albumin in Acute Stroke (ALIAS) Part II Trial [4]. For ease of exposition, the original study design has been modified to some extent. For instance, for the hypothetical ALIAS trial, the sample size has been changed from 1,100 to 1,200, and the number of safety looks has been changed from 11 to 12. The phase III trial aims to investigate a neuroprotective drug in acute ischemic stroke. Eligible patients are randomized to either the albumin treatment or saline treatment (control). The primary efficacy endpoint is binary, i.e., treatment success or failure at 3 months from randomization. Assuming the success rate in the control group is 40%, a sample size of 1,200 is computed to detect a 10% absolute difference with 93% power in the primary efficacy analyses under a one-sided O’Brien-Fleming type GS boundary [19], with three equally spaced interim analyses (overall type I error of 0.025). Death within 30 days from randomization is considered the primary measure for safety. Although it does not govern the sample size, the safety outcome is sequentially monitored during the study. Suppose the Data and Safety Monitoring Board (DSMB) requests 12 safety analyses to examine excess deaths, one after every 100 subjects are assessed, using repeated one-sided tests at a nominal significance level of 0.01. The structure of the GS design is illustrated in Figure 1:
Figure 1.

Interim analysis plan
The critical values for the test statistics at the kth analysis can be calculated by:

$$c_{kE} = \Phi^{-1}\big(1 - \alpha_E(k)\big), \qquad c_{kS} = \Phi^{-1}\big(1 - \alpha_S(k)\big),$$

where αE(k) and αS(k) represent the nominal type I errors for the efficacy and safety analyses at time k under the Lan-DeMets alpha spending guideline [20], and Φ−1(x) is the x quantile of the standard normal distribution. The nominal type I errors for this study at each stage are presented in Table 1.
Table 1.
Nominal type I error at each interim analysis
| Time k | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Overall type I error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| αE(k)* | 0 | 0 | 0.00001 | 0 | 0 | 0.0015 | 0 | 0 | 0.0092 | 0 | 0 | 0.0220 | 0.025 |
| αS(k)‡ | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.0475 |

* 3 interim analyses for efficacy using the one-sided O’Brien-Fleming group sequential guideline;
‡ 12 interim looks for safety using p-value < 0.01.
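The nominal levels in Table 1 can be turned into one-sided critical values with a standard normal quantile function. The sketch below assumes the upper-tail convention c = Φ−1(1 − α) and treats a nominal level of 0 (no efficacy analysis at that look) as an infinite boundary; both conventions, and the names used, are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

def critical_value(alpha):
    """One-sided critical value c = Phi^{-1}(1 - alpha).

    A nominal level of 0 is read as "no analysis at this look", so the
    boundary is set to +infinity and can never be crossed (assumption).
    """
    if alpha <= 0.0:
        return math.inf
    return NormalDist().inv_cdf(1.0 - alpha)

# Nominal type I errors from Table 1 (looks k = 1, ..., 12).
ALPHA_E = [0, 0, 0.00001, 0, 0, 0.0015, 0, 0, 0.0092, 0, 0, 0.0220]
ALPHA_S = [0.01] * 12

c_E = [critical_value(a) for a in ALPHA_E]
c_S = [critical_value(a) for a in ALPHA_S]

for k, (ce, cs) in enumerate(zip(c_E, c_S), start=1):
    print(f"look {k:2d}: c_E = {ce:.4f}, c_S = {cs:.4f}")
```

Under this convention every safety look uses the same boundary, about 2.33, whereas the efficacy boundary is finite only at looks 3, 6, 9 and 12 and decreases over time, as expected for an O’Brien-Fleming type design.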
Let δE denote the true differences in efficacy rates between two treatments. Let δS denote the true differences in mortality rates. Let ρj be their correlation coefficient in treatment j, j = A, B. In addition, assume the true efficacy and mortality rates in the control are 40% and 15%. Using the proposed method, we calculate the true type I errors (i.e., when δE = 0) and true power (i.e., when δE = 0.10) for efficacy analyses in the following scenarios:
Scenario 1: δS = 0, ρA=ρB=0,
Scenario 2: δS = 0, ρA=ρB=0.3,
Scenario 3: δS = 0.03, ρA=ρB=0.3.
The stopping probabilities under null and alternative hypotheses for these scenarios are presented in Tables 2 and 3.
Table 2.
Stopping probabilities under H0 *
| Scenario | Parameter (δS, ρ) | Pr(stop for efficacy) | Pr(stop for safety) | Pr(stop for safety and efficacy) | True type I error |
|---|---|---|---|---|---|
| 1 | 0, 0 | 0.0239 | 0.0475 | 0.0001 | 0.0239 |
| 2 | 0, 0.3 | 0.0224 | 0.0473 | 0.0002 | 0.0221 |
| 3 | 0.03, 0.3 | 0.0137 | 0.2915 | 0.0012 | 0.0125 |

* H0 : pAE = pBE; δE = 0
Table 3.
Stopping probabilities under Ha*
| Scenario | Parameter (δS, ρ) | Pr(stop for efficacy) | Pr(stop for safety) | Pr(stop for safety and efficacy) | True power |
|---|---|---|---|---|---|
| 1 | 0, 0 | 0.8990 | 0.0408 | 0.0023 | 0.8966 |
| 2 | 0, 0.3 | 0.9006 | 0.0383 | 0.0033 | 0.8973 |
| 3 | 0.03, 0.3 | 0.7601 | 0.2106 | 0.0273 | 0.7329 |

* Ha : pAE > pBE; δE = 0.1
As shown in Tables 2 and 3, the true type I errors for the efficacy analyses are all lower than the nominal level (α = 0.025), and the true powers are also lower than the planned power (1 − β = 0.93).
Furthermore, to assess the relationship between the correlation coefficient ρj and the marginal power for efficacy and safety, we relax the value of ρj in Scenario 3 and examine the marginal stopping probabilities as ρj changes from −0.3 to 0.3. The range of ρj, according to Prentice [21], is determined by the marginal probabilities of the two responses. For this study,

$$\max\left\{-\sqrt{\frac{p_E\, p_S}{q_E\, q_S}},\; -\sqrt{\frac{q_E\, q_S}{p_E\, p_S}}\right\} \le \rho \le \min\left\{\sqrt{\frac{p_E\, q_S}{q_E\, p_S}},\; \sqrt{\frac{q_E\, p_S}{p_E\, q_S}}\right\},$$

where qE = 1 − pE and qS = 1 − pS. Thus, when pE = 0.40, pS = 0.15, ρ ∈ [−0.343, 0.514]. As shown in Table 4, as ρj increases, the marginal stopping probability for efficacy increases, while the marginal stopping probability for safety decreases. This result differs from the conclusion in Cook and Farewell’s paper [17] that the smallest plausible ρ will guarantee the minimum power requirement for both efficacy and safety analyses.
Table 4.
Correlation and marginal stopping probability
| ρ | −0.3 | −0.2 | −0.1 | 0 | 0.1 | 0.2 | 0.3 |
|---|---|---|---|---|---|---|---|
| Marginal stopping probability for efficacy | 0.752 | 0.752 | 0.753 | 0.754 | 0.756 | 0.758 | 0.760 |
| Marginal stopping probability for safety | 0.231 | 0.228 | 0.224 | 0.221 | 0.218 | 0.214 | 0.211 |
ρA = ρB = ρ
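The admissible range of ρ quoted above for pE = 0.40 and pS = 0.15 can be checked numerically. The sketch below uses the standard bounds on the correlation between two binary variables with given margins; the function name `phi_bounds` is an illustrative choice.

```python
import math

def phi_bounds(p_E, p_S):
    """Lower and upper bounds on the phi correlation of two binary variables
    with marginal probabilities p_E and p_S (both strictly between 0 and 1)."""
    q_E, q_S = 1.0 - p_E, 1.0 - p_S
    lower = max(-math.sqrt(p_E * p_S / (q_E * q_S)),
                -math.sqrt(q_E * q_S / (p_E * p_S)))
    upper = min(math.sqrt(p_E * q_S / (q_E * p_S)),
                math.sqrt(q_E * p_S / (p_E * q_S)))
    return lower, upper

lower, upper = phi_bounds(0.40, 0.15)
print(round(lower, 3), round(upper, 3))  # -0.343 0.514
```

The interval [−0.3, 0.3] scanned in Table 4 therefore lies inside the admissible range for the control-arm margins.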
In addition, we explore the relationship between the sample size and marginal power. We let δE = 0.10, δS = 0.03, ρA = ρB = 0, and allow the total sample size to vary from 1,200 to 2,400. As shown in Table 5, as the sample size increases, the marginal stopping probability for efficacy first increases and then decreases, while the marginal stopping probability for safety keeps increasing. This also differs from the recommendation of Cook and Farewell [17] that choosing a larger sample size will satisfy the minimum power requirement for both efficacy and safety analyses.
Table 5.
Sample size and marginal stopping probability
| Sample size | 1200 | 1440 | 1680 | 1920 | 2160 | 2400 |
|---|---|---|---|---|---|---|
| Marginal stopping probability for efficacy | 0.754 | 0.766 | 0.767 | 0.762 | 0.755 | 0.746 |
| Marginal stopping probability for safety | 0.221 | 0.238 | 0.253 | 0.268 | 0.281 | 0.294 |
ρA = ρB = 0
4. Simulation study
To assess the validity of the proposed method, we perform Monte Carlo simulations in 5 scenarios with various efficacy and safety parameters (i.e., pjE, pjS and ρj), and apply 100,000 runs for each simulation. We use the same study design as those in Section 3. In the simulation, crossing of the boundary is judged according to the comparison between the p-value from the interim analysis and the nominal alpha in Table 1. The study will be stopped as soon as either the efficacy or safety boundary is crossed. The stopping probabilities from the simulation are then compared with those probabilities resulting from the proposed method in Table 6.
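A minimal sketch of such a simulation is given below. It is not the authors' code: it assumes 1:1 allocation with 50 subjects per arm per look, a pooled two-sample z-test at each look, and that both outcomes are available immediately for every enrolled subject. The correlated binary pair is drawn from the four joint cell probabilities implied by the margins and the phi correlation. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

# Nominal levels from Table 1; 0 means no efficacy analysis at that look.
ALPHA_E = [0, 0, 0.00001, 0, 0, 0.0015, 0, 0, 0.0092, 0, 0, 0.0220]
ALPHA_S = [0.01] * 12

def cell_probs(p_E, p_S, rho):
    """Joint probabilities ordered as (success, event), (success, no event),
    (failure, event), (failure, no event)."""
    p11 = p_E * p_S + rho * math.sqrt(p_E * (1 - p_E) * p_S * (1 - p_S))
    cells = [p11, p_E - p11, p_S - p11, 1 - p_E - p_S + p11]
    if min(cells) < -1e-12:
        raise ValueError("rho is outside the admissible range for these margins")
    return [max(c, 0.0) for c in cells]

def z_stat(x_A, x_B, n):
    """Pooled z statistic for p_A - p_B with n subjects per arm (assumed test)."""
    pooled = (x_A + x_B) / (2 * n)
    if pooled <= 0.0 or pooled >= 1.0:
        return 0.0
    return (x_A - x_B) / math.sqrt(2 * n * pooled * (1 - pooled))

def simulate(arm_A, arm_B, alpha_E, alpha_S, n_per_look=50, runs=2000, seed=1):
    """Estimate stopping probabilities; arm_A and arm_B are (p_E, p_S, rho)."""
    rng = random.Random(seed)

    def crit(a):
        return NormalDist().inv_cdf(1 - a) if a > 0 else math.inf

    c_E = [crit(a) for a in alpha_E]
    c_S = [crit(a) for a in alpha_S]
    probs = {"A": cell_probs(*arm_A), "B": cell_probs(*arm_B)}
    tally = {"efficacy_without_safety": 0, "safety": 0, "both": 0}
    for _ in range(runs):
        succ = {"A": 0, "B": 0}   # cumulative successes per arm
        dead = {"A": 0, "B": 0}   # cumulative safety events per arm
        for k in range(len(alpha_E)):
            for arm in ("A", "B"):
                for cell in rng.choices(range(4), weights=probs[arm], k=n_per_look):
                    succ[arm] += cell in (0, 1)
                    dead[arm] += cell in (0, 2)
            n = (k + 1) * n_per_look
            hit_E = z_stat(succ["A"], succ["B"], n) >= c_E[k]
            hit_S = z_stat(dead["A"], dead["B"], n) >= c_S[k]
            if hit_E or hit_S:  # stop at the first boundary crossing
                tally["safety"] += hit_S
                tally["both"] += hit_E and hit_S
                tally["efficacy_without_safety"] += hit_E and not hit_S
                break
    return {name: count / runs for name, count in tally.items()}

if __name__ == "__main__":
    # Treatment (0.50, 0.18, 0) versus control (0.40, 0.15, 0).
    print(simulate((0.50, 0.18, 0.0), (0.40, 0.15, 0.0), ALPHA_E, ALPHA_S))
```

The three returned proportions correspond to the "True power", "Stop for safety" and "Stop for safety and efficacy" quantities, so they can be compared with the tabulated values up to Monte Carlo error and the simplifying assumptions listed above. A run count far below 100,000 is used here only to keep the pure-Python example fast.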
Table 6.
Comparison of estimates from the MVN approximation method and the simulation study
| Scenario | Group: (pE, pS, ρ) | Probability | MVN approximation | Simulation (100,000 times) |
|---|---|---|---|---|
| 1 | A: (0.50, 0.18, 0) | True power | 0.73250 | 0.73440 |
| | B: (0.40, 0.15, 0) | Stop for safety | 0.22100 | 0.21872 |
| | | Stop for safety and efficacy | 0.02151 | 0.02155 |
| 2 | A: (0.50, 0.18, 0.3) | True power | 0.73217 | 0.73129 |
| | B: (0.40, 0.15, 0) | Stop for safety | 0.21569 | 0.21660 |
| | | Stop for safety and efficacy | 0.02449 | 0.02486 |
| 3 | A: (0.48, 0.19, 0) | True power | 0.51330 | 0.51348 |
| | B: (0.40, 0.15, 0) | Stop for safety | 0.37027 | 0.36981 |
| | | Stop for safety and efficacy | 0.02737 | 0.02751 |
| 4 | A: (0.60, 0.25, 0.3) | True power | 0.22893 | 0.22709 |
| | B: (0.40, 0.15, 0.3) | Stop for safety | 0.77102 | 0.77291 |
| | | Stop for safety and efficacy | 0.11499 | 0.11423 |
| 5 | A: (0.70, 0.25, 0.3) | True power | 0.76350 | 0.76104 |
| | B: (0.50, 0.20, 0.5) | Stop for safety | 0.23631 | 0.23896 |
| | | Stop for safety and efficacy | 0.05753 | 0.05795 |
As shown in Table 6, the approximation results agree with the simulation results to within 0.003 in every scenario, indicating that the proposed method is valid for power calculation for bivariate binary responses in GS designs.
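One way to judge whether such differences are compatible with simulation noise is to compare them with the Monte Carlo standard error of a proportion estimated from 100,000 runs. The sketch below does this for the Scenario 1 true power in Table 6; the helper names are illustrative, and the binomial standard error is an assumption about how the simulation uncertainty is best summarized.

```python
import math

def mc_standard_error(p_hat, runs):
    """Binomial standard error of a simulated proportion."""
    return math.sqrt(p_hat * (1.0 - p_hat) / runs)

def gap_in_standard_errors(approx, simulated, runs):
    """Difference between approximation and simulation, in standard-error units."""
    return (approx - simulated) / mc_standard_error(simulated, runs)

# Scenario 1, true power: MVN approximation 0.73250, simulation 0.73440.
se = mc_standard_error(0.73440, 100_000)
gap = gap_in_standard_errors(0.73250, 0.73440, 100_000)
print(f"standard error = {se:.5f}, gap = {gap:.2f} standard errors")
```

For this entry the standard error is about 0.0014, so the observed difference of 0.0019 is less than two standard errors.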
5. Relationship between error probabilities for efficacy analyses and safety profiles
For trials with an efficacy endpoint and a safety endpoint, it is always desired that the true error probabilities for the efficacy analyses be robust to modest misspecification of the safety profiles. Here, the safety profiles refer to the safety-efficacy correlation (ρj) and the between-treatment differences in safety rates (δS), both of which are unknown in advance. We evaluate the robustness of these relationships in this section.
5.1 Relationship between type I errors for efficacy, ρj and δS
We set the total sample size to 1,200, assume the efficacy and safety rates in the control group are 0.40 and 0.15, and use the same statistical boundaries as in Table 1. To obtain the type I error for efficacy, we set δE to 0. By applying different ρj (ρj ∈ [−0.3, 0.3]) and δS (δS ∈ [0, 0.1]), we obtain the true type I errors under different scenarios and plot them in Figure 2. As shown in Figure 2, for all ρj ∈ [−0.3, 0.3], the true type I errors are bounded by the curves for ρA = ρB = −0.3 and ρA = ρB = 0.3, indicating that the true type I error for the efficacy analyses decreases as ρj increases. Meanwhile, the type I error is driven by δS: as δS increases, the type I error decreases rapidly. An increase of 0.05 (5 percentage points) in δS results in a 50% decrease in the true type I error for the efficacy analysis.
Figure 2.
True Type I error, ρj ∈ [−0.3, 0.3]
5.2. Relationship between true power for efficacy, ρj and δS
Using the same ranges of ρj and δS, we estimate the true power for efficacy under five effect sizes for efficacy, i.e., δE = {0, 0.05, 0.10, 0.15, 0.20}. The results suggest that the true power for the efficacy analyses increases as δE increases, while it decreases as δS increases. Figure 3 presents 5 shaded bands representing the true power under the 5 values of δE. The width of each band represents the range of variation in power as ρj varies from −0.3 to 0.3 (the mid-line within each band represents the points where ρA = ρB = 0). As shown in Figure 3, most parts of these bands are narrow, especially when δE ≥ 0.10 and δS ≤ 0.05, indicating that power is robust to changes in ρj if the true effect sizes do not deviate dramatically from the original assumptions (i.e., δE = 0.10, δS = 0.0). However, Figure 3 also indicates that power for efficacy drops rapidly as δS increases. For instance, when δE is 0.20, an increase of 0.05 in δS leads to more than a 20% loss of power.
Figure 3.
True power bands, ρj ∈ [−0.3, 0.3]
In addition to the stopping probability for efficacy, we also plot the stopping probability for safety in Figure 4. It shows that, no matter how favorable the treatment effect is for efficacy, the probability of stopping for safety always rises steeply as δS increases. These results can explain the findings in Figure 3: because the stopping probability for safety increases, the trial is less likely to be stopped for efficacy. In other words, the safety endpoint and the efficacy endpoint behave like two competing risks for termination of the study during sequential monitoring. In addition, in Figure 5, we plot the conditional probability of stopping for efficacy given not stopping for safety. From the figure, we can see that the conditional probability always increases as δE increases. Moreover, the conditional probability is less robust to variations in ρj when δE is small (0 or 0.05) or δS is large (0.05 to 0.10).
Figure 4.
Probability of study stopping for safety, ρj ∈ [−0.3, 0.3]
Figure 5.
Probability of study stopping for efficacy given not stopping for safety, ρj ∈ [−0.3, 0.3]
6. Discussion
In this paper, we use a multivariate normal approximation approach to estimate the marginal and joint stopping probabilities for a bivariate binary efficacy-safety response in large confirmatory GS trials. Since normal approximation is valid when the product of the sample size and event rate is greater than 5, the proposed method requires the expected number of events for the first interim analyses to be greater than 5. This requirement is quite reasonable for large confirmatory trials where a specific safety outcome that is being monitored is not rare. When the sample size is small or the adverse event is rare, the proposed method might not be suitable. But in those situations, safety monitoring using descriptive statistics is generally preferred over formal statistical guidelines, and the impact of safety monitoring on the error probabilities of efficacy analyses might be less. Further exploration in the exact joint distribution of bivariate binary responses will provide a more complete answer for small trials with rare safety events.
For ease of exposition, we have presented a simple GS design with equal increments in information for each interim look and use one-sided tests for both efficacy and safety analyses. In practice, this method can be applied in those designs with arbitrary timing of interim analyses and a mixture of one-sided and two-sided tests for safety and efficacy endpoints. The true power for efficacy in this paper is defined as the probability of having a favorable efficacy outcome without a safety concern (Equation 16). We use this joint probability because, in many phase III trials with early death as the primary safety outcome, when a subject reaches the safety endpoint, efficacy observations have to be suspended even if the study drug is truly efficacious for that subject. We recognize this definition is not always applicable and could vary depending on study purposes and diseases of interest. But as shown in Equations 12 and 13, marginal power for efficacy and safety can be easily attained from the proposed method if they are of more interest.
Although the joint distribution of test statistics for bivariate normal responses in GS designs has been studied by Cook and Farewell [17], we believe it is still informative and important to examine bivariate binary responses. Since the variance-covariance matrix is arbitrarily assumed for bivariate continuous responses, the method and relevant conclusions in Cook and Farewell’s paper [17] might not be applicable to bivariate binary responses. For example, Cook and Farewell state that, when the correlation of efficacy and safety, ρ, is unknown, using the smallest acceptable ρ will guarantee the greater power for both efficacy and safety analyses. However, our study suggests that marginal power for a bivariate binary response can sometimes decrease as ρ increases, as shown in Table 4. In addition, Cook and Farewell [17] recommend that, “if there are power requirements for both the efficacy and toxicity analyses there will be two group sizes, g1 and g2, determined. Taking max(g1, g2) will satisfy the more stringent requirement and provide a more powerful analysis for the remaining outcome”. From Table 5, we can see that this assumption also may not hold for binary data. These discrepancies might be explained by the fact that the covariance matrix of the test statistics for a bivariate binary response is constrained by its means (event rates), while the covariance matrix for a bivariate continuous response is often determined from external information or is arbitrarily assumed [17, 18, 22].
Like a majority of the GS designs for superiority tests, the method introduced in this paper assumes that the DSMB complies with the statistical guidelines throughout the trial, which may not always be the case in practice. In fact, decision making by the DSMB, especially when it is considering multiple endpoints simultaneously, is based on many facets other than statistical evidence. An extended approach incorporating random effects in the decision making processes of DSMBs is therefore needed to make power estimations more practical and realistic for GS designs.
Finally, for a trial with a bivariate efficacy-safety response, it is always desirable that the estimated error probabilities for the efficacy analyses be robust to modest misspecification of the safety profiles because the exact safety profiles are usually unavailable in advance. However, our findings suggest that joint power as well as marginal power for efficacy analyses are very sensitive to variations in the safety profiles. For instance, in the example, when the effect size for efficacy is 10%, marginal power can decrease by 7% if the between-group difference in safety rates changes from 0 to 2%; when the effect size for efficacy is 20%, a 5% difference in safety event rates could lead to 20% loss of marginal power for the efficacy analyses. Such a decline in power has been shown to be related to the dramatic increase in the stopping probability for safety in Section 5.2. Consequently, if a new treatment has optimal efficacy while accompanied by a slightly increased adverse event rate, which can be common in life-threatening conditions, the use of two univariate GS boundaries for a bivariate response might be problematic. This is because applying two separate boundaries ignores the multiplicity effect between efficacy and safety, leading to an underpowered study. Because of these relationships, it is imperative that investigators use the bivariate GS design to plan their trials if both efficacy and safety endpoints are of interest, and consider the various potential scenarios for efficacy and safety parameters before the study.
Acknowledgements
We would like to thank Dr. Stacia DeSantis for helpful discussions. We also thank the Editor and two referees for their insightful comments, which greatly improved our manuscript. This work was supported by a National Institute of Neurological Disorders and Stroke (NINDS) grant, U01 NS054630.
References
- 1. Rankin J. Cerebral vascular accidents in patients over the age of 60. II. Prognosis. Scottish medical journal. 1957;2:200–215. doi: 10.1177/003693305700200504.
- 2. Bonita R, Beaglehole R. Recovery of motor function after stroke. Stroke; a journal of cerebral circulation. 1988;19:1497–1500. doi: 10.1161/01.str.19.12.1497.
- 3. Jennett B, Bond M. Assessment of outcome after severe brain damage. Lancet. 1975;1:480–484. doi: 10.1016/s0140-6736(75)92830-5.
- 4. Ginsberg MD, Palesch YY, Martin RH, Hill MD, Moy CS, Waldman BD, et al. The albumin in acute stroke (ALIAS) multicenter clinical trial: safety analysis of part 1 and rationale and design of part 2. Stroke; a journal of cerebral circulation. 2011;42:119–127. doi: 10.1161/STROKEAHA.110.596072.
- 5. Brott T, Adams HP, Jr, Olinger CP, Marler JR, Barsan WG, Biller J, et al. Measurements of acute cerebral infarction: a clinical examination scale. Stroke; a journal of cerebral circulation. 1989;20:864–870. doi: 10.1161/01.str.20.7.864.
- 6. Food and Drug Administration. Guidance for Clinical Trial Sponsors: Establishment and Operation of Clinical Trial Data Monitoring Committee. 2006.
- 7. Friedman LM, Furberg C, DeMets DL. Fundamentals of clinical trials. 3rd ed. New York: Springer; 1998.
- 8. Jennison C, Turnbull BW. Group sequential methods with applications to clinical trials. Boca Raton: Chapman & Hall/CRC; 2000.
- 9. Whitehead J. On being the statistician on a Data and Safety Monitoring Board. Statistics in medicine. 1999;18:3425–3434. doi: 10.1002/(sici)1097-0258(19991230)18:24<3425::aid-sim369>3.0.co;2-d.
- 10. Bryant J, Day R. Incorporating toxicity considerations into the design of two-stage phase II clinical trials. Biometrics. 1995;51:1372–1383.
- 11. Ivanova A, Qaqish BF, Schell MJ. Continuous toxicity monitoring in phase II trials in oncology. Biometrics. 2005;61:540–545. doi: 10.1111/j.1541-0420.2005.00311.x.
- 12. Ray HE, Rai SN. An evaluation of a Simon 2-Stage phase II clinical trial design incorporating toxicity monitoring. Contemporary clinical trials. 2011;32:428–436. doi: 10.1016/j.cct.2011.01.006.
- 13. Jennison C, Turnbull BW. Exact Calculations for Sequential t, chi-square and F tests. Biometrika. 1991;78:133–141.
- 14. Tang DI, Geller NL, Pocock SJ. On the design and analysis of randomized clinical trials with multiple endpoints. Biometrics. 1993;49:23–30.
- 15. Jennison C, Turnbull BW. Group sequential tests for bivariate response: interim analyses of clinical trials with both efficacy and safety endpoints. Biometrics. 1993;49:741–752.
- 16. Kosorok MR, Yuanjun S, DeMets DL. Design and analysis of group sequential clinical trials with multiple primary endpoints. Biometrics. 2004;60:134–145. doi: 10.1111/j.0006-341X.2004.00146.x.
- 17. Cook RJ, Farewell VT. Guidelines for monitoring efficacy and toxicity responses in clinical trials. Biometrics. 1994;50:1146–1152.
- 18. Cook RJ. Coupled error spending functions for parallel bivariate sequential tests. Biometrics. 1996;52:442–450.
- 19. O'Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics. 1979;35:549–556.
- 20. Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–663.
- 21. Prentice RL. Correlated binary regression with covariates specific to each binary observation. Biometrics. 1988;44:1033–1048.
- 22. Todd S. An adaptive approach to implementing bivariate group sequential clinical trial designs. Journal of biopharmaceutical statistics. 2003;13:605–619. doi: 10.1081/BIP-120024197.




