Abstract
Traditional two-arm randomized trial designs have played a pivotal role in establishing the efficacy of medical interventions. However, their efficiency is often compromised when confronted with multiple experimental treatments or limited resources. In response to these challenges, multi-arm multi-stage designs have emerged, enabling the simultaneous evaluation of multiple treatments within a single trial. In such an approach, if an arm meets an efficacy success criterion at an interim stage, the whole trial stops and the arm is selected for further study. However, when multiple treatment arms are active, stopping the trial at the moment one arm achieves success diminishes the probability of selecting the best arm. To address this issue, we have developed a group sequential multi-arm multi-stage survival trial design with an arm-specific stopping rule. The proposed method controls the familywise type I error rate in the strong sense and selects the most promising treatment arm with high probability.
Keywords: Multi-arm multi-stage, group sequential design, log-rank test, sequential conditional probability ratio test, time-to-event
1. Introduction
Traditional clinical trial designs, exemplified by the two-arm parallel design, have played a pivotal role in establishing the efficacy of medical interventions. However, their efficiency is often compromised when confronted with multiple experimental treatments or limited resources. In response to these challenges, the multi-arm multi-stage (MAMS) designs have emerged, enabling the simultaneous evaluation of multiple treatments within a single trial. This innovation significantly reduces the overall time, cost, and patient burden required to bring new therapies to market.
Several group sequential MAMS designs have been proposed, including Stallard and Friede (2008); Magirr, Jaki and Whitehead (2012) (referred to as MJW hereafter); Wason and Jaki (2012); Magirr et al. (2014); and Wu et al. (2023). Notably, Jaki and Magirr (2013) and Wu and Li (2023) introduced group sequential MAMS trial designs with time-to-event endpoints. In their approach, all treatment arms commence simultaneously. If an arm meets predefined futility criteria at an interim stage, it is discontinued from the trial, and the future subjects allocated to that arm are not reassigned to the remaining arms. Conversely, if an arm meets efficacy success criteria at an interim stage, the entire trial is halted, and the arm is selected and declared the successful arm.
However, this design is inadequate when several arms could meet the efficacy success criteria if given the opportunity to continue. Stopping the trial at the moment one arm achieves success diminishes the probability of selecting the best arm. To address this issue, we propose a new group sequential MAMS survival trial design that discontinues an arm once it demonstrates efficacy at an interim analysis but continues the evaluation of all remaining arms; we call this an arm-specific stopping rule design. In such a MAMS trial design, arm selection is made after the final analysis. Either the most effective arm or multiple arms may be selected. Arm selection is therefore more flexible and can also take other clinical factors into account, e.g., treatment cost or toxicity information.
Our proposed method is based on the sequential conditional probability ratio test (SCPRT) procedure (Xiong, 1995) and employs an event-driven approach. The SCPRT procedure offers analytical solutions for the futility and efficacy boundaries and is adaptable to trials with any number of stages and arms. Importantly, it controls the familywise error rate (FWER) by applying the Dunnett correction under the global null hypothesis. Furthermore, because the approach is event-driven, the proposed group sequential MAMS designs are robust to variations in the accrual rate, to censoring, and to misspecification of the survival distributions, all of which are difficult to specify accurately at the design stage.
2. Log-rank test
We consider a trial that compares $K$ treatment arms to a common control arm and label the arms $k = 0, 1, \ldots, K$, where 0 represents the control arm. Let $\lambda_k(t)$ and $S_k(t)$ be the hazard and survival functions of the $k$th arm, respectively. Assume proportional hazards between each treatment arm and the control, $\lambda_k(t) = \delta_k \lambda_0(t)$ or, equivalently, $S_k(t) = [S_0(t)]^{\delta_k}$, where $\delta_k$ is the hazard ratio of the $k$th treatment arm to the control arm. Assume that a total of $n$ patients are randomized to the $K + 1$ arms with allocation ratio $\omega_k = n_k/n$ for the $k$th arm ($k = 0, 1, \ldots, K$), where $n_k$ is the sample size for the $k$th arm and $n$ is the total sample size. Let $Z_k$ be the standardized two-sample log-rank test statistic for comparing the $k$th treatment arm to the control arm. It has been shown (Schaid et al., 1990; or Appendix 1 of the supplemental material) that the asymptotic joint distribution of $(Z_1, \ldots, Z_K)$ is multivariate normal with mean $\mu = (\mu_1, \ldots, \mu_K)$, where
| (1) |
and variance-covariance matrix $\Sigma = (\sigma_{jk})$, where
| (2) |
with $d$ as the total number of events and the overall failure probability given by equations (3) and (4) below. Assuming a given accrual distribution with accrual duration $t_a$ and follow-up time $t_f$, and a given loss to follow-up distribution, the failure probability for arm $k$ can be calculated as follows:
| (3) |
where $t_a + t_f$ is the study duration and $\lambda_k(t)$ is the hazard function for the $k$th arm. The overall failure probability is then given as follows:
| (4) |
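As a hedged illustration of equations (3) and (4), the sketch below evaluates the failure probability in closed form for the special case used later in the simulations: exponential survival, uniform accrual, and administrative censoring only. It assumes that the overall failure probability is the allocation-weighted average of the arm-specific probabilities and that the total sample size is the number of events divided by that probability. The function names `fail_prob_exp` and `overall_fail_prob` are hypothetical and are not the supplemental R functions; the inputs mirror the `DisSize` example call.

```r
# Failure probability of one arm under exponential survival (hazard `lambda`),
# uniform accrual over [0, ta], additional follow-up tf, and administrative
# censoring only, so the censoring time is uniform on [tf, ta + tf].
fail_prob_exp <- function(lambda, ta, tf) {
  1 - (exp(-lambda * tf) - exp(-lambda * (ta + tf))) / (lambda * ta)
}

# ASSUMPTION: overall failure probability = allocation-weighted average.
overall_fail_prob <- function(lambda, omega, ta, tf) {
  sum(omega * fail_prob_exp(lambda, ta, tf))
}

m0      <- 7.3                    # median survival of the control (months)
lambda0 <- log(2) / m0            # control hazard under the exponential model
delta   <- 1 / 1.4                # hazard ratio of each treatment arm
lambda  <- c(lambda0, delta * lambda0, delta * lambda0)
omega   <- rep(1 / 3, 3)          # equal allocation, two treatment arms

P <- overall_fail_prob(lambda, omega, ta = 24, tf = 6)
d <- 383                          # total number of events (Table 3)
n <- d / P                        # implied total sample size before rounding
print(round(c(P = P, n = n), 3))
```

Under these assumptions the implied sample size is about 542, in line with the first row of Table 3; small differences can arise from how the number of events is rounded.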
In this article, we assume that the sample size allocation to each treatment arm is equal and that the sample size allocation ratio between the treatment and control arms is $r$. It is then easy to verify from equations (1) and (2) that under the global null hypothesis $H_0: \delta_1 = \cdots = \delta_K = 1$, the joint distribution of $(Z_1, \ldots, Z_K)$ is a $K$-dimensional multivariate normal $N(0, \Sigma)$, where the variance-covariance matrix is $\Sigma = (\sigma_{jk})$ with
| (5) |
3. Familywise error rate
For a multi-arm trial, to control the overall type I error due to multiple comparisons, it is common to consider the familywise error rate (FWER), which is the probability of rejecting at least one true null hypothesis across the set of null hypotheses $H_{0k}: \delta_k = 1$, $k = 1, \ldots, K$. Strong control of the FWER at level $\alpha$ means that the FWER does not exceed $\alpha$ for any configuration of the hazard ratios $(\delta_1, \ldots, \delta_K)$, whichever of the null hypotheses are true. Magirr et al. (2012) have shown that the FWER is maximized under the global null for the simultaneous stopping rule (stopping for efficacy results in the trial terminating). Therefore, controlling the FWER under the global null hypothesis provides strong control of the FWER. A similar result applies to the arm-specific stopping rule design (Halabi and Michiels, 2019). In this paper, we consider controlling the FWER under the global null hypothesis at level $\alpha$. We can use the Dunnett correction (Dunnett, 1955) by choosing the critical value $c$ satisfying
| $\int_{-\infty}^{c} \cdots \int_{-\infty}^{c} \phi(z_1, \ldots, z_K)\, dz_1 \cdots dz_K = 1 - \alpha$ | (6) |
where $\phi$ is the multivariate normal density function with mean 0 and variance-covariance matrix given by equation (5). The critical value $c$ can be solved numerically from equation (6) using numerical integration, for example the method of Genz and Bretz (2009), which is implemented in the R package mvtnorm.
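A minimal sketch of the Dunnett-type critical value calculation follows. Instead of the Genz and Bretz multivariate integration, it swaps in a one-dimensional integral that is valid only when all test statistics share a common correlation. The common correlation of 0.5 is an assumption corresponding to equal allocation between every treatment arm and the control; the names `pmax_equicor` and `dunnett_crit` are hypothetical and are not the supplemental `Dunnett` function.

```r
# P(max_k Z_k <= cv) for K equicorrelated standard normal statistics, using
# Z_k = sqrt(rho) * X0 + sqrt(1 - rho) * X_k with independent N(0, 1) terms
# and integrating over the shared component X0.
pmax_equicor <- function(cv, K, rho) {
  integrand <- function(x) {
    dnorm(x) * pnorm((cv - sqrt(rho) * x) / sqrt(1 - rho))^K
  }
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

# Solve P(max_k Z_k <= cv) = 1 - alpha for the one-sided critical value.
# ASSUMPTION: rho = 0.5, i.e. equal allocation to treatment and control arms.
dunnett_crit <- function(alpha, K, rho = 0.5) {
  uniroot(function(cv) pmax_equicor(cv, K, rho) - (1 - alpha),
          interval = c(0, 5), tol = 1e-8)$root
}

crit <- sapply(2:4, function(K) dunnett_crit(alpha = 0.05, K = K))
print(round(crit, 3))
```

With a one-sided FWER of 0.05, the results should agree to about three decimals with the final-stage critical values 1.916, 2.062 and 2.161 listed in Table 2 for two, three and four treatment arms.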
4. Power for multi-arm trial
The power of a multi-arm trial is more complex and has several definitions. Let the hazard ratio $\delta$ be the minimal clinically relevant treatment effect that we want to detect. We define the alternative hypothesis for a particular treatment arm as $H_{1k}: \delta_k = \delta$ and the global alternative hypothesis for the $K$-treatment arm trial as the intersection of these hypotheses, that is, $\delta_1 = \cdots = \delta_K = \delta$. In this paper, we consider two types of power for multi-arm trial designs, as outlined in the following subsections.
4.1. Disjunctive power
The disjunctive power (Wason and Jaki, 2012) is the probability of rejecting at least one null hypothesis under the global alternative hypothesis. Let $n$ be the total sample size and $c$ be the critical value for rejecting the null hypothesis based on the Dunnett correction. Then the disjunctive power is given as follows:
| (7) |
where $\phi$ is the multivariate normal density function with mean $\mu$ and variance-covariance matrix $\Sigma$ under the global alternative hypothesis, and $\mu$ and $\Sigma$ are given by equations (1) and (5).
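The disjunctive power calculation can be sketched numerically as below. This is an illustration under stated assumptions rather than the supplemental `DisSize` function: the treatment statistics are taken to be equicorrelated with correlation 0.5 (equal allocation), and each log-rank mean is approximated by $-\log(\delta)\sqrt{d/\{2(K+1)\}}$, which treats the $d$ events as split equally across the $K + 1$ arms. The names `disj_power` and `events_disj` are hypothetical.

```r
# Disjunctive power 1 - P(all Z_k < cv) when every Z_k has mean mu and the
# statistics are equicorrelated with correlation rho (one-dimensional integral).
disj_power <- function(mu, cv, K, rho = 0.5) {
  integrand <- function(x) {
    dnorm(x) * pnorm((cv - mu - sqrt(rho) * x) / sqrt(1 - rho))^K
  }
  1 - integrate(integrand, lower = -Inf, upper = Inf)$value
}

# ASSUMPTION: with equal allocation each treatment-control pair holds about
# 2 * d / (K + 1) events, giving mean -log(delta) * sqrt(d / (2 * (K + 1))).
events_disj <- function(delta, cv, K, power = 0.9) {
  gap <- function(d) {
    disj_power(-log(delta) * sqrt(d / (2 * (K + 1))), cv, K) - power
  }
  uniroot(gap, interval = c(10, 5000))$root
}

d_req <- events_disj(delta = 1 / 1.4, cv = 1.916, K = 2)
print(d_req)
```

For two treatment arms, a hazard ratio of 1/1.4 and 90% disjunctive power, the solution should land close to the 383 events reported by the `DisSize` example.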
4.2. Power under least favorable configuration
The second definition considers the alternative hypothesis $\delta_1 = \delta$ and $\delta_2 = \cdots = \delta_K = \delta_0$, where $\delta$ represents a clinically relevant improvement and $\delta_0$ is an uninteresting effect size, that is, one at which we would prefer not to investigate the treatment any further. This is known as the least favorable configuration (LFC) (Dunnett, 1984). Let $n$ be the total sample size and $c$ be the critical value for rejecting the null hypothesis based on the Dunnett correction. Then the power under the LFC is given as follows:
| (8) |
where $\phi$ is the multivariate normal density function with mean $\mu$, calculated using equation (1) under the LFC alternative hypothesis, and variance-covariance matrix $\Sigma$ given by equation (5).
5. Group sequential MAMS design
5.1. SCPRT group sequential procedure
We now consider a group sequential MAMS trial with $J$ stages and $K$ treatment arms. Let $Z_{kj}$ be the two-sample log-rank test statistic using cumulative data up to the $j$th look for comparing the $k$th treatment arm to the common control, and let $t_j$ be the information time at the $j$th look, where $j = 1, \ldots, J$ (including the final analysis). Based on the SCPRT procedure (Xiong, 1995), the sequential futility and efficacy boundaries at the $j$th look are given as follows:
| $l_j = c\sqrt{t_j} - \sqrt{2a(1 - t_j)}$ | (9) |
| $u_j = c\sqrt{t_j} + \sqrt{2a(1 - t_j)}$ | (10) |
where $c$ is the critical value at the final stage with one-sided FWER $\alpha$, and $a$ is a boundary coefficient determined by the maximum conditional probability of discordance (Xiong et al., 2003), that is, the probability that the conclusion reached by the sequential test at an interim look is reversed by the test at the end of the study. At the $j$th look, if the log-rank statistic satisfies $Z_{kj} \le l_j$ or $Z_{kj} \ge u_j$, the $k$th treatment arm is dropped (stopped for futility) or graduated (stopped for efficacy), respectively. Otherwise, the treatment arm continues to the next stage. At the final analysis (with $t_J = 1$ and $l_J = u_J = c$), if $Z_{kJ} \ge c$, the $k$th treatment arm is declared active; otherwise, it is declared futile. The nominal significance levels at the $j$th look for testing hypothesis $H_{0k}$ are given by $\alpha_{l_j} = 1 - \Phi(l_j)$ and $\alpha_{u_j} = 1 - \Phi(u_j)$,
where $\Phi$ is the cumulative distribution function of the standard normal distribution. We accept or reject the null hypothesis at the $j$th interim analysis for treatment arm $k$ if the observed $p$-value of the test is greater than $\alpha_{l_j}$ or less than $\alpha_{u_j}$, respectively; otherwise the arm continues to the next stage.
It is crucial to choose an appropriate boundary coefficient $a$ for the design, so that the probability of the conclusion from the sequential test being reversed by the test at the planned end is small, but not unnecessarily small. Specifically, the test statistic is modeled as a Brownian motion with a drift parameter, and we consider the event that the conclusion at an interim look is reversed at the final analysis. The probability of this event given the final-stage observation is the conditional probability of discordance, and its maximum is the maximum conditional probability of discordance, denoted $\rho$. The boundary coefficient $a$ in equations (9) and (10) is determined by choosing an appropriate $\rho$. A smaller $\rho$ results in a larger $a$, so that the upper and lower boundaries are further apart, which leads to a larger expected sample size. We recommend $\rho = 0.02$, as it leads to a maximum discordance probability of 0.0056 and results in an SCPRT boundary that is efficient while preserving the agreement of conclusions between the test at early stopping and the test at the planned end. For balanced information times, the boundary coefficient $a$ for $\rho = 0.02$ is given in Table 1 for $J = 2, \ldots, 10$ stages. For unbalanced information times, we can still use the value of $a$ obtained for balanced information times; this results in only slight changes in the discordance probability (Xiong et al., 2003).
Table 1:
The maximum conditional probability of discordance $\rho$ and the SCPRT boundary coefficient $a$ for a $J$-stage (including the final analysis) group sequential SCPRT procedure with balanced information times.
| $\rho$ | $J=2$ | $J=3$ | $J=4$ | $J=5$ | $J=6$ | $J=7$ | $J=8$ | $J=9$ | $J=10$ |
|---|---|---|---|---|---|---|---|---|---|
| 0.02 | 2.109 | 2.645 | 2.953 | 3.166 | 3.327 | 3.456 | 3.562 | 3.652 | 3.729 |
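The boundary calculation can be sketched as follows, assuming the standard SCPRT form for a standardized statistic, lower and upper boundaries equal to $c\sqrt{t} \mp \sqrt{2a(1 - t)}$ at information time $t$, with nominal one-sided significance levels obtained as upper-tail normal probabilities. The function name `scprt_bounds` is hypothetical and is not the supplemental `SCPRT` function; the inputs are the Dunnett critical value for two treatment arms and the three-stage boundary coefficient from Table 1.

```r
# SCPRT futility (lower) and efficacy (upper) boundaries on the Z scale,
# with the nominal one-sided significance level attached to each boundary.
scprt_bounds <- function(cv, a, t) {
  half_width <- sqrt(2 * a * (1 - t))   # shrinks to 0 at the final look
  lower <- cv * sqrt(t) - half_width
  upper <- cv * sqrt(t) + half_width
  data.frame(t = t, lower = lower, upper = upper,
             alpha_lower = 1 - pnorm(lower),
             alpha_upper = 1 - pnorm(upper))
}

# Two treatment arms (critical value 1.916), three balanced looks, a = 2.645
b <- scprt_bounds(cv = 1.916, a = 2.645, t = c(1/3, 2/3, 1))
print(round(b, 3))
```

The lower and upper columns should match the three-stage, two-arm entries of Table 2 (−0.772, 0.237, 1.916 and 2.984, 2.892, 1.916) up to rounding, which is a useful check that the assumed boundary form agrees with the tabulated values.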
5.2. Interim analysis
The timing of each interim analysis is determined by the information time. For a trial with a time-to-event endpoint, the information is determined by the number of events rather than the sample size. Let $d = \sum_{k=0}^{K} d_k$ be the total number of events required for the trial, where $d_k$ is the number of events for the $k$th arm, $k = 0, 1, \ldots, K$. Once a combined number of events $(d_0 + d_k)t_j$ has been observed between the $k$th treatment arm and the control at the $j$th look, we conduct an interim analysis for that treatment arm (vs the control), where $t_j$ is the pre-specified information time at the $j$th look, e.g., $t_j = j/J$ for an equal number of events per arm per stage. This procedure is performed for all remaining treatment arms at the $j$th stage (these interim analyses are not conducted at the same calendar time). Because the approach depends only on the combined number of events between a treatment arm and the control at each interim analysis, dropping arms for futility or graduating arms for efficacy has no impact on the information time. Thus, the proposed procedure preserves the power of the group sequential MAMS trial when futile arms are dropped and/or efficacious arms are graduated during the trial. Since the interim analysis occurs at different time points for different experimental treatment arms, each look corresponds to a time window (interim window) rather than a single calendar time.
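To make the event-driven timing concrete, the sketch below converts information times into the combined treatment-plus-control event counts that would trigger each look. It assumes, purely for illustration, that the total number of events is split equally across arms, so each treatment-control pair contributes a fraction $2/(K+1)$ of the total; the function name `interim_events` is hypothetical.

```r
# Combined (treatment + control) event counts that trigger each look,
# rounded up to whole events.
interim_events <- function(d_pair, t) {
  ceiling(d_pair * t)
}

d_total <- 383                      # total events for two treatment arms (Table 3)
K       <- 2
# ASSUMPTION: events are split equally across the K + 1 arms.
d_pair  <- 2 * d_total / (K + 1)    # events in one treatment-control pair

looks <- interim_events(d_pair, t = c(1/3, 2/3, 1))
print(looks)   # 86 171 256
```

Each treatment arm is analyzed when its own pair reaches these counts, which is why the looks for different arms can fall at different calendar times.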
5.3. Group sequential FWER and power
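To assess the FWER and power for a group sequential MAMS trial with $K$ treatment arms and $J$ stages, we define, for treatment arm $k$ at the $j$th look, the event that the arm continues to the next stage and the event that its null hypothesis is rejected. The event of rejecting the null hypothesis within $J$ stages for treatment arm $k$ is then given by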
| (11) |
where represents (full set). Thus, one-sided FWER under the global null can be calculated as follows:
| (12) |
The disjunctive power under global alternative can be calculated as follows:
| (13) |
The power under the LFC alternative can be calculated as follows:
| (14) |
Using these formulae combined with the joint multivariate normal distribution of the log-rank statistics across arms and looks, the FWER and power in the group sequential setting can be calculated through multivariate integration. It can be shown that the power function based on the SCPRT procedure is approximately the same as that for the fixed sample design. Specifically, regard the power functions of the fixed sample test and of the $J$-stage group sequential SCPRT test as functions of the drift parameter of the Brownian motion. By Theorem 4.1 in Xiong (1995), for any value of the drift parameter, the difference between the two power functions is less than the maximum discordance probability of the group sequential SCPRT procedure. Thus, with a small maximum discordance probability, the power of a fixed sample design is approximately the power of the group sequential trial based on the SCPRT procedure. With the recommended maximum conditional probability of discordance of 0.02, the maximum discordance probability is 0.0056. More details on the computation of the maximum discordance probability can be found in Xiong et al. (2002). We also conduct simulation studies to demonstrate that the proposed group sequential MAMS design preserves the nominal FWER and power (see Section 6).
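A rough Monte Carlo check of the group sequential FWER under the arm-specific stopping rule can be sketched as follows. This is an idealized illustration on the Brownian-motion scale, not the supplemental simulation code: it assumes equal allocation (correlation 0.5 between arms through the shared control), common information times for all arms, and SCPRT boundaries of the form $c\sqrt{t} \mp \sqrt{2a(1 - t)}$. The function name `sim_fwer` is hypothetical.

```r
# Simulate the FWER under the global null with arm-specific stopping.
sim_fwer <- function(n_sim, K, t, cv, a) {
  J <- length(t)
  lower <- cv * sqrt(t) - sqrt(2 * a * (1 - t))
  upper <- cv * sqrt(t) + sqrt(2 * a * (1 - t))
  dt <- diff(c(0, t))
  reject_any <- logical(n_sim)
  for (s in seq_len(n_sim)) {
    # Independent Brownian increments: row 1 = control, rows 2..K+1 = treatments
    incr <- matrix(rnorm((K + 1) * J, sd = rep(sqrt(dt), each = K + 1)),
                   nrow = K + 1)
    B <- t(apply(incr, 1, cumsum))
    # Standardized statistics: (B_k(t) - B_0(t)) / sqrt(2 t), variance 1
    Z <- sweep(B[-1, , drop = FALSE], 2, B[1, ], "-")
    Z <- sweep(Z, 2, sqrt(2 * t), "/")
    for (k in seq_len(K)) {
      # First look at which arm k leaves the continuation region; at the
      # final look lower == upper, so every arm is decided by then.
      j <- which(Z[k, ] >= upper | Z[k, ] <= lower)[1]
      if (Z[k, j] >= upper[j]) reject_any[s] <- TRUE
    }
  }
  mean(reject_any)
}

set.seed(2024)
fwer <- sim_fwer(n_sim = 20000, K = 2, t = c(1/3, 2/3, 1),
                 cv = 1.916, a = 2.645)
print(fwer)
```

Under these assumptions the estimate should be close to the nominal 0.05, consistent with the small bound on the difference between the sequential and fixed sample tests discussed above.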
5.4. Implementation of trial design
The proposed group sequential MAMS design has been implemented in R code (see supplemental materials). The SCPRT procedure has been implemented in the user-friendly software SCPRTinfWin, which can be downloaded at https://www.stjude.org/research/departments/biostatistics/software/scprt (Xiong, 2017). The futility and efficacy boundaries of the SCPRT procedure with Dunnett correction for designs with up to $K = 4$ treatment arms and $J = 2$, 3 and 4 stages are given in Table 2. For example, for a MAMS trial with three arms (two treatment arms and a control arm, $K = 2$) and three looks ($J = 3$) with balanced information times and maximum conditional probability of discordance $\rho = 0.02$, we can calculate the boundary coefficient $a = 2.645$ using the SCPRTinfWin software (see also Table 1). Then, given an FWER of $\alpha = 0.05$, we use the R function ‘Dunnett’ to calculate the critical value $c = 1.916$ with the Dunnett correction, and the R function ‘SCPRT’ to calculate the futility and efficacy boundaries. The required total sample size ($n$) and total number of events ($d$) for the study designs given in Table 3 can be calculated using the R functions ‘DisSize’ and ‘LFCSize’ for disjunctive power and LFC power, respectively.
Table 2:
The SCPRT futility (lower) and efficacy (upper) boundaries for group sequential $J$-stage designs with $K$ treatment arms, with Dunnett-type adjustment for one-sided FWER $\alpha = 0.05$.
| Stages $J$ | Information time | $K=2$ lower | $K=2$ upper | $K=3$ lower | $K=3$ upper | $K=4$ lower | $K=4$ upper |
|---|---|---|---|---|---|---|---|
| 2 | 0.5 | −0.097 | 2.807 | 0.006 | 2.910 | 0.076 | 2.980 |
| | 1 | 1.916 | 1.916 | 2.062 | 2.062 | 2.161 | 2.161 |
| 3 | 1/3 | −0.772 | 2.984 | −0.687 | 3.068 | −0.630 | 3.126 |
| | 2/3 | 0.237 | 2.892 | 0.356 | 3.012 | 0.437 | 3.092 |
| | 1 | 1.916 | 1.916 | 2.062 | 2.062 | 2.161 | 2.161 |
| 4 | 0.25 | −1.147 | 3.063 | −1.074 | 3.136 | −1.024 | 3.185 |
| | 0.5 | −0.364 | 3.073 | −0.260 | 3.176 | −0.190 | 3.246 |
| | 0.75 | 0.444 | 2.874 | 0.571 | 3.001 | 0.656 | 3.087 |
| | 1 | 1.916 | 1.916 | 2.062 | 2.062 | 2.161 | 2.161 |
Table 3:
Total sample size ($n$) and number of events ($d$) calculated for multi-arm trials with one-sided FWER of 5% and 90% disjunctive and LFC powers under exponential distributions for various design scenarios: median survival of the control $m_0 = 7.3$ (months), accrual duration $t_a = 24$ (months), follow-up time $t_f$ (months), various values of the inverse hazard ratio $1/\delta$, and number of treatment arms $K = 2$, with $\delta_1 = \delta_2 = \delta$ for disjunctive power and $\delta_1 = \delta$, $\delta_2 = 1$ for LFC power. Simulations are conducted to estimate the empirical power (EP) and FWER based on 10,000 simulation runs.
| $1/\delta$ | $t_f$ | Disjunctive $n$ | Disjunctive $d$ | Disjunctive FWER | Disjunctive EP | LFC $n$ | LFC $d$ | LFC FWER | LFC EP |
|---|---|---|---|---|---|---|---|---|---|
| 1.4 | 6 | 542 | 383 | 0.049 | 0.905 | 731 | 542 | 0.046 | 0.893 |
| 1.5 | 6 | 382 | 264 | 0.050 | 0.905 | 509 | 374 | 0.052 | 0.889 |
| 1.6 | 6 | 290 | 197 | 0.052 | 0.908 | 382 | 278 | 0.050 | 0.882 |
| 1.4 | 12 | 472 | 383 | 0.050 | 0.905 | 643 | 542 | 0.051 | 0.893 |
| 1.5 | 12 | 331 | 264 | 0.048 | 0.908 | 447 | 374 | 0.054 | 0.897 |
| 1.6 | 12 | 251 | 197 | 0.052 | 0.911 | 335 | 278 | 0.052 | 0.893 |
| 1.4 | 16 | 446 | 383 | 0.054 | 0.908 | 611 | 542 | 0.054 | 0.893 |
| 1.5 | 16 | 312 | 264 | 0.051 | 0.906 | 424 | 374 | 0.053 | 0.892 |
| 1.6 | 16 | 236 | 197 | 0.049 | 0.909 | 318 | 278 | 0.051 | 0.892 |
# The functions Dunnett, SCPRT, DisSize and LFCSize are defined in Appendix 2
# of the supplemental material and must be sourced before running these calls.
# R function ‘Dunnett’ calculates the critical value with Dunnett correction
library(mvtnorm)
Dunnett(alpha=0.05, r=1, K=2)
$critical.value
[1] 1.916
# R function ‘SCPRT’ calculates the SCPRT lower and upper boundaries
SCPRT(alpha=0.05, r=1, K=2, frac=c(1/3,2/3,1))
$critical.value
[1] 1.916
$lshape
[1] -0.772, 0.237, 1.916
$ushape
[1] 2.984, 2.892, 1.916
# R function ‘DisSize’ calculates the total number of events and sample size
# for disjunctive power
DisSize(kappa=1, m0=7.3, delta=1/1.4, ta=24, tf=6, beta=0.1, r=1, eta=0, K=2)
$total.number.events
[1] 383
$total.sample.size
[1] 542
# R function ‘LFCSize’ calculates the total number of events and sample size
# for LFC power
LFCSize(kappa=1, m0=7.3, delta=1/1.4, delta0=1, ta=24, tf=6, beta=0.1, r=1, eta=0, K=2)
$total.number.events
[1] 542
$total.sample.size
[1] 731
All R functions, including those for group sequential MAMS trial designs and for simulating operating characteristics, are available in Appendix 2 of the supplemental material.
6. Simulation
In this section, simulation studies were conducted to determine the performance of the proposed fixed sample designs and operating characteristics of the proposed group sequential MAMS designs. We will also compare the selection probabilities of the most effective arm(s) between the two designs: utilizing a simultaneous stopping rule and an arm-specific stopping rule.
6.1. Performance of fixed sample design
The first simulation study verifies the accuracy of the proposed sample size (or number of events) calculations for multi-arm fixed sample designs. For the sample size calculation, we assume equal sample size allocation ($r = 1$). Sample sizes are calculated under exponential survival distributions. The design parameter configurations are as follows: number of treatment arms $K = 2$; uniform accrual with accrual duration $t_a = 24$ (months) and follow-up time $t_f = 6$, 12 and 16 (months); the hazard rate of the control is selected to reflect a median survival time of $m_0 = 7.3$ (months), and the inverse hazard ratio $1/\delta$ is set to 1.4, 1.5 and 1.6. We further assume no loss to follow-up (administrative censoring only). Thus, the censoring time due to the patient’s staggered entry follows a uniform distribution on the interval $[t_f, t_a + t_f]$. Table 3 presents the total number of events and sample size calculated under the various scenarios with one-sided FWER of 5% and disjunctive power and LFC power of 90%. The results in Table 3 show that designs based on disjunctive power require smaller total sample sizes (numbers of events) than designs based on the LFC. Increasing the duration of follow-up substantially reduces the total sample size but does not change the total number of events. Simulation results (based on 10,000 simulated trials) show that the empirical FWERs and powers are all close to their nominal levels of 5% and 90%, respectively. Therefore, the proposed sample size formula provides an accurate sample size (number of events) estimate for fixed sample designs.
6.2. Operating characteristics of MAMS design
The second simulation studies the operating characteristics of the proposed group sequential MAMS design. We considered trials with $K = 2$, 3 treatment arms and $J = 2$, 3 stages. Sample sizes were calculated for fixed sample designs under the exponential distribution with one-sided FWER of 5% and power of 90%, assuming a median survival time of 7.3 months for the control group and an inverse hazard ratio $1/\delta = 1.4$; uniform accrual with accrual duration $t_a = 24$ (months), follow-up time $t_f = 12$ (months) and no loss to follow-up. The empirical FWER and power for the corresponding group sequential MAMS design were obtained from 10,000 simulation runs. The results in Table 4 show that all simulated empirical FWERs and powers were close to their nominal values. Thus, we have empirically verified that the proposed sample size calculation for the fixed sample design preserves the FWER and power for the group sequential MAMS design. Table 4 also provides the operating characteristics of the proposed group sequential MAMS designs, such as the expected total sample size, number of events and study duration. As expected, the two-stage MAMS designs reduced the expected sample size, number of events, and study duration compared to the fixed sample design, and the three-stage designs showed only marginal further reductions in these quantities compared to the two-stage designs.
Table 4:
The operating characteristics of the proposed group sequential MAMS designs with number of treatment arms $K = 2$, 3, number of stages $J = 2$, 3, accrual duration $t_a = 24$ (months), follow-up time $t_f = 12$ (months) and total study duration of 36 (months). Total sample size ($n$) and number of events ($d$) are calculated with one-sided FWER of 5% and power of 90% under the exponential model with median survival time 7.3 months for the control group to detect a hazard ratio $\delta = 1/1.4$. The corresponding empirical FWER, disjunctive (Dis) and LFC powers, expected sample size, number of events and study duration are estimated based on 10,000 simulated trials. The two unlabeled columns after each of $n$, $d$ and the study duration are the corresponding expected values.
| $K$ | $J$ | Type | $n$ | | | $d$ | | | Study duration | | | FWER | Power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | Dis | 474 | 429 | 460 | 384 | 325 | 352 | 36 | 31.1 | 34.8 | 0.054 | 0.907 |
| | | LFC | 645 | 590 | 610 | 543 | 447 | 466 | 36 | 31.1 | 34.7 | 0.054 | 0.894 |
| 2 | 3 | Dis | 474 | 427 | 460 | 384 | 305 | 344 | 36 | 29.2 | 34.2 | 0.055 | 0.910 |
| | | LFC | 645 | 589 | 611 | 543 | 423 | 448 | 36 | 29.5 | 33.7 | 0.054 | 0.894 |
| 3 | 2 | Dis | 592 | 531 | 577 | 476 | 396 | 441 | 36 | 31.7 | 35.6 | 0.051 | 0.910 |
| | | LFC | 932 | 847 | 870 | 792 | 634 | 657 | 36 | 31.9 | 35.3 | 0.052 | 0.893 |
| 3 | 3 | Dis | 592 | 525 | 578 | 476 | 369 | 432 | 36 | 30.0 | 35.4 | 0.053 | 0.913 |
| | | LFC | 932 | 843 | 868 | 792 | 595 | 628 | 36 | 30.3 | 34.5 | 0.051 | 0.888 |
6.3. Comparison of selection probabilities
The goal of a phase II MAMS trial is to select one or more arms for advancement to phase III trial. Thus, the probability of selecting the most effective arm(s) is an important operating characteristic of the trial design. However, theoretical calculation of the selection probability for a MAMS trial is difficult, therefore we conducted a third simulation to compare the selection probabilities between two designs: one utilizing a simultaneous stopping rule (MJW method) which selects the most effective arm only and the other employing an arm-specific stopping rule (proposed method) which can select one or more effective arm(s). The treatment arm(s) selection is based on the observed values of the two-sample log-rank test statistics for both methods.
For the simulation, we consider trials with three experimental arms ($K = 3$), each undergoing three interim analyses and compared to a common control. We evaluated the performance of the established MJW method using the O’Brien and Fleming boundary (with a fixed lower bound of zero, Magirr et al., 2012) versus the proposed method across four distinct LFC design settings outlined in Table 5. For each design setting, multiple trials were generated under varying ground truth hazard ratios, which could either align with the design setting or deviate from it to different degrees. Survival times were generated using an exponential distribution with a median survival of 7.3 (months) in the control arm, uniform accrual over a fixed accrual duration, a fixed length of follow-up, and no loss to follow-up. The targeted one-sided FWER and power were set at 5% and 90%, respectively. Simulations were conducted with 10,000 simulation runs. The simulation results in Table 5 show that within each design setting, the MJW method had its highest probability of selecting the most efficacious arm (arm 1) when the data truly matched the design setting, but its performance declined as the second and third arms became more efficacious. In contrast, the proposed method consistently selected the most efficacious arm 1 and chose multiple arms, including arm 1, when the efficacies of the arms were more similar. For instance, in the first design setting with hazard ratios of (0.65, 1, 1), when the ground truth hazard ratios were (0.65, 0.7, 0.75), the MJW method identified arm 1 as the most efficacious arm with only a 57.6% chance. In contrast, the proposed method selected arm 1 as one of the efficacious arms with an 88.57% chance. This discrepancy is attributed to the inherent limitation of the MJW method, which hinders its ability to differentiate between the most efficacious arm and competing arms when their efficacies are close.
In contrast, our proposed method, free from such restrictions, demonstrated superior performance across all scenarios, even in the presence of other competing arms. Additionally, the results indicated that the MJW method was underpowered, particularly for a relatively small hazard ratio (large effect size).
Table 5:
Simulations to compare the selection probability of the most effective arm(s) between the MAMS designs using a simultaneous stopping rule (MJW, select arm 1 only) or an arm-specific stopping rule (proposed, select all effective arms) with three treatment arms ($K = 3$) and three interim analyses.
| Design parameter | Ground truth | MJW: Arm 1 | Proposed: Arm 1 | Proposed: Arm 1 & 2 | Proposed: All arms |
|---|---|---|---|---|---|
| 0.65,1,1 | 0.65,1,1 | 0.891 | 0.8857 | 0.0200 | 0.0037 |
| | 0.65,0.8,1 | 0.836 | 0.8857 | 0.3609 | 0.0166 |
| | 0.65,0.8,0.9 | 0.831 | 0.8857 | 0.3609 | 0.0780 |
| | 0.65,0.7,1 | 0.639 | 0.8857 | 0.6960 | 0.0192 |
| | 0.65,0.7,0.75 | 0.576 | 0.8857 | 0.6960 | 0.4688 |
| 0.65,0.8,0.8 | 0.65,0.8,0.8 | 0.876 | 0.8826 | 0.3552 | 0.2048 |
| | 0.65,0.7,1 | 0.695 | 0.8826 | 0.6913 | 0.0183 |
| | 0.65,0.7,0.75 | 0.635 | 0.8826 | 0.6913 | 0.4593 |
| 0.5,1,1 | 0.5,1,1 | 0.875 | 0.8769 | 0.0197 | 0.0058 |
| (51, 47) | 0.5,0.7,1 | 0.812 | 0.8769 | 0.3560 | 0.0194 |
| | 0.5,0.7,0.8 | 0.802 | 0.8769 | 0.3560 | 0.1166 |
| | 0.5,0.6,1 | 0.703 | 0.8769 | 0.6035 | 0.0221 |
| | 0.5,0.6,0.8 | 0.696 | 0.8769 | 0.6035 | 0.150 |
| 0.5,0.8,0.8 | 0.5,0.8,0.8 | 0.865 | 0.8844 | 0.1659 | 0.0643 |
| (54, 46) | 0.5,0.6,1 | 0.717 | 0.8844 | 0.6223 | 0.0171 |
| | 0.5,0.6,0.7 | 0.682 | 0.8844 | 0.6223 | 0.3065 |
Note: The pair in parentheses below a design parameter gives the number of events per group using the MJW method and the number of events per group using the proposed method, respectively, both calculated under the corresponding design parameter. The computing time (sec) for the corresponding design was also recorded for the MJW method with a simultaneous stopping rule and for the proposed method with an arm-specific stopping rule.
With the proposed method, the probability of selecting both Arm 1 and Arm 2 increased in tandem with the efficacy of the second arm. Notably, this probability remained constant even when the third arm exhibited comparatively higher efficacy. This flexibility is attributed to the ability of the proposed algorithm to include all efficacious arms, regardless of their relative efficacy. Furthermore, the probability of selecting all arms increased as the efficacy of all three arms improved, highlighting the robustness and adaptability of the proposed method in the presence of competing arms.
7. Mixed treatment effects
In this article, the power of the trial is defined under either a global alternative (disjunctive) or the LFC. However, in a real trial it is unlikely that only one experimental arm is effective or that all experimental arms are equally effective. It is more likely that there are mixed effects among the experimental arms. Considering a general mixed treatment effect, without loss of generality, we assume that the first few treatments are effective and the others are ineffective. We conducted simulations under various mixed treatment effect scenarios with $K = 2$, 3 treatment arms and several numbers of stages. Survival times were generated using an exponential distribution for the control arm with uniform accrual and a fixed length of follow-up, with all other design parameters remaining the same as in the previous section. The simulation results in Table 6 show that the proposed method controls the FWER well and provides sufficient power under various scenarios of mixed treatment effects. Therefore, when multiple treatment arms are effective, a group sequential MAMS trial design with an arm-specific efficacy stopping rule is more suitable for finding the most effective arm and selecting one or multiple effective arms.
Table 6:
Simulations conducted to study the FWER and empirical power (EP) under various mixed treatment effects for different numbers of stages and numbers of treatment arms, where each effect is the log hazard ratio (HR) of the treatment arm vs the control. Each (FWER, EP) pair of columns corresponds to one number of stages. The FWER and power are estimated based on 10,000 simulation runs.
| # of arms | log hazard ratio | sample size | # of events | FWER | EP | FWER | EP | FWER | EP |
|---|---|---|---|---|---|---|---|---|---|
| 2 | (0.3,0.1) | 762 | 678 | 0.0494 | 0.8975 | 0.0482 | 0.8981 | 0.0492 | 0.8972 |
| 2 | (0.3,0.2) | 708 | 624 | 0.0530 | 0.9067 | 0.0540 | 0.9062 | 0.0535 | 0.9070 |
| 2 | (0.3,0.25) | 642 | 561 | 0.0498 | 0.9114 | 0.0494 | 0.9122 | 0.0500 | 0.9124 |
| 2 | (0.3,0.3) | 555 | 483 | 0.0532 | 0.9042 | 0.0536 | 0.9043 | 0.0530 | 0.9044 |
| 3 | (0.3,0.1,0.1) | 1104 | 984 | 0.0497 | 0.8954 | 0.0497 | 0.8953 | 0.0505 | 0.8953 |
| 3 | (0.3,0.2,0.1) | 1036 | 916 | 0.0490 | 0.9039 | 0.0499 | 0.9041 | 0.0498 | 0.9037 |
| 3 | (0.3,0.25,0.1) | 940 | 828 | 0.0498 | 0.9030 | 0.0504 | 0.9031 | 0.0508 | 0.9032 |
| 3 | (0.3,0.2,0.2) | 984 | 864 | 0.0510 | 0.8994 | 0.0501 | 0.9003 | 0.0517 | 0.8992 |
| 3 | (0.3,0.25,0.2) | 904 | 792 | 0.0494 | 0.9014 | 0.0503 | 0.9005 | 0.0496 | 0.9022 |
| 3 | (0.3,0.25,0.25) | 844 | 736 | 0.0488 | 0.9071 | 0.0490 | 0.9078 | 0.0503 | 0.9065 |
| 3 | (0.3,0.3,0.3) | 692 | 600 | 0.0541 | 0.9044 | 0.0540 | 0.9047 | 0.0545 | 0.9033 |
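
When reading the FWER estimates in Table 6, it helps to keep the Monte Carlo error of a proportion estimated from 10,000 runs in mind. The short calculation below is a sketch that assumes a nominal FWER of 0.05 (an assumption for the example) and uses the usual binomial standard error; the helper name `mc_standard_error` is hypothetical.

```python
import math

def mc_standard_error(p, n_runs):
    """Binomial standard error of a proportion estimated from n_runs simulations."""
    return math.sqrt(p * (1.0 - p) / n_runs)

n_runs = 10_000   # number of simulation runs reported for Table 6
nominal = 0.05    # assumed nominal FWER

se = mc_standard_error(nominal, n_runs)
upper = nominal + 1.96 * se  # upper limit of an approximate 95% band around the nominal level

print(round(se, 5), round(upper, 4))  # prints: 0.00218 0.0543
```

Under this assumption, estimated FWERs up to about 0.054 are compatible with a true level of 0.05. All tabulated FWER values fall at or below this limit except the largest one (0.0545), which lies only marginally above it.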
8. Conclusion
In this paper, we address the need for more efficient and adaptable clinical trial designs, particularly for the evaluation of multiple experimental treatments with time-to-event endpoints. The existing literature employs a simultaneous stopping rule, which halts the trial as soon as any arm meets the predefined efficacy criteria. However, this may terminate the trial prematurely and fail to capture the true efficacy of individual treatments, particularly when multiple arms show potential for success.
To address this issue, we propose a novel group sequential MAMS survival trial design that incorporates an arm-specific stopping rule. Our approach discontinues an arm once it demonstrates efficacy at an interim analysis but continues the evaluation of all remaining arms. This allows for a more nuanced assessment of treatment efficacy and enhances the probability of selecting the best-performing arm.
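The contrast between the two stopping rules can be illustrated with a small sketch. It is not the proposed design itself: the function `run_stopping_rule`, the test statistics, and the efficacy and futility boundaries below are all hypothetical values chosen for illustration, and the boundary calculation of the actual method is not reproduced.

```python
import numpy as np

def run_stopping_rule(z, upper, lower, arm_specific=True):
    """Apply stage-wise boundaries to test statistics z of shape (n_arms, n_stages).

    Returns the arms declared efficacious, in the order they crossed `upper`.
    arm_specific=True : an arm stops when it succeeds; the other arms continue.
    arm_specific=False: the whole trial stops at the first success (simultaneous rule).
    """
    n_arms, n_stages = z.shape
    active = set(range(n_arms))
    selected = []
    for j in range(n_stages):
        # Arms crossing the efficacy boundary at this stage.
        crossed = [k for k in sorted(active) if z[k, j] >= upper[j]]
        # Arms falling at or below the futility boundary at this stage.
        dropped = [k for k in sorted(active) if z[k, j] <= lower[j]]
        selected.extend(crossed)
        active -= set(crossed) | set(dropped)
        if crossed and not arm_specific:
            break  # simultaneous rule: the entire trial is halted
        if not active:
            break  # no arms left to evaluate
    return selected

# Hypothetical statistics for two arms over three stages, and hypothetical boundaries.
z = np.array([[1.8, 2.5, 2.8],
              [3.1, 3.3, 3.5]])
upper = [3.0, 2.4, 2.1]
lower = [0.0, 0.5, 2.1]

print(run_stopping_rule(z, upper, lower, arm_specific=False))  # prints: [1]
print(run_stopping_rule(z, upper, lower, arm_specific=True))   # prints: [1, 0]
```

In this toy example the simultaneous rule reports only the arm that crosses first, whereas the arm-specific rule lets the other arm continue to the next analysis, where it also crosses the efficacy boundary.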
We conducted extensive simulation studies to assess the operating characteristics of our proposed MAMS design with an arm-specific stopping rule and to compare it with trial designs employing a simultaneous stopping rule, such as the MJW method. Our simulations demonstrated that the proposed method controlled the FWER and maintained the designed power. Moreover, while the MJW method excelled in selecting the most efficacious arm under ideal conditions, its performance diminished as the efficacy of the other arms increased. In contrast, our proposed method consistently identified the most efficacious arm or accurately selected multiple arms, especially when their efficacies were closely matched. This pattern held both in scenarios where a single experimental arm was effective and in scenarios where multiple arms were effective, as is more likely in practice.
Finally, adaptive seamless phase II/III designs, which are conducted in two stages with treatment selection at the first stage, have become increasingly popular (Stallard and Friede, 2008; Jenkins et al., 2011). The proposed MAMS design provides a high probability of correctly selecting the most effective arm(s), making it suitable for the first stage of treatment selection in an adaptive seamless phase II/III trial and increasing the likelihood of trial success. This will be a topic of future research.
Acknowledgments
Dr. Wu’s research was supported by the University of New Mexico Comprehensive Cancer Center Support Grant National Cancer Institute (NCI) P30CA118100 and Dr. Li’s research was supported by the Comprehensive Cancer Center at St. Jude Children’s Research Hospital and American Lebanese Syrian Associated Charities (ALSAC).
Footnotes
CONFLICT OF INTEREST
The authors have declared no conflict of interest.
DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
References
- Dunnett C. A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association 1955; 50:1096–1121.
- Genz A, Bretz F. Computation of Multivariate Normal and t Probabilities. Heidelberg: Springer; 2009.
- Halabi S, Michiels S. Textbook of Clinical Trials in Oncology. Chapter 9. New York: CRC Press; 2019.
- Jaki T, Magirr D. Considerations on covariates and endpoints in multi-arm multi-stage clinical trials selecting all promising treatments. Statistics in Medicine 2013; 32:1150–1163.
- Jaki T, Magirr D, Pallmann P. MAMS: Designing Multi-Arm Multi-Stage Studies. R package version 1.2, 2017. URL http://CRAN.R-project.org/package=MAMS
- Jenkins M, Stone A, Jennison C. An adaptive seamless phase II/III design for oncology trials with subpopulation selection using correlated survival endpoints. Pharmaceutical Statistics 2011; 10:347–356.
- Magirr D, Jaki T, Whitehead J. A generalised Dunnett test for multi-arm, multi-stage clinical studies with treatment selection. Biometrika 2012; 99:494–501.
- Magirr D, Stallard N, Jaki T. Flexible sequential designs for multi-arm clinical trials. Statistics in Medicine 2014; 33:3269–3279.
- Schaid DJ, Wieand S, Therneau TM. Optimal two-stage screening designs for survival comparisons. Biometrika 1990; 77:659–663.
- Stallard N, Friede T. A group-sequential design for clinical trials with treatment selection. Statistics in Medicine 2008; 27:6209–6227.
- Stallard N, Todd S. Sequential designs for Phase III clinical trials incorporating treatment selection. Statistics in Medicine 2003; 22:689–703.
- Wason JMS, Jaki T. Optimal design of multi-arm multi-stage trials. Statistics in Medicine 2012; 31:4269–4279.
- Wu J, Li Y. Group sequential multi-arm multi-stage survival trial design with treatment selection. Journal of Biopharmaceutical Statistics 2023. DOI: 10.1080/10543406.2023.2235409.
- Wu J, Li Y, Zhu L. Group sequential multi-arm multi-stage trial design with treatment selection. Statistics in Medicine 2023; 42:1480–1491.
- Xiong X. A class of sequential conditional probability ratio tests. Journal of the American Statistical Association 1995; 90:1463–1473.
- Xiong X. A computer program for SCPRT on information time. Version 1.0, 2017.
- Xiong X, Tan M, Boyett J. Sequential conditional probability ratio tests for normalized test statistic on information time. Biometrics 2003; 59:624–631.
- Xiong X, Tan M, Kutner MH. Computational methods for evaluating sequential tests and post-test estimations via sufficiency principle. Statistica Sinica 2002; 12(4):1027–1041.
