Abstract
Group sequential designs are well understood and widely applied in late-phase clinical trials to allow early stopping for efficacy or futility. The information fraction (IF) is a key element in determining the decision boundaries at interim analyses. Control of the family-wise error rate (FWER) is critical for clinical trials in which multiple endpoints are tested. In this article, we illustrate, through a numerical evaluation and a case study, why the information fraction must be properly defined for each individual endpoint in order to control the FWER.
Keywords: Information fraction, Group sequential design, Family-wise error rate, Clinical trial
1. Introduction
In late-phase clinical trials, group sequential designs [1] are widely used to allow early stopping for efficacy or futility, so that promising drugs reach patients sooner and ineffective drugs are terminated earlier, which saves development cost and lets patients move to other promising treatments. Alpha spending approaches [2,3], such as the Lan-DeMets approach with Pocock or O'Brien-Fleming type spending functions [4,5], are commonly used in group sequential designs to determine decision boundaries at the planned interim and final analyses such that the overall type I error is strongly controlled at a pre-specified level (e.g., one-sided 0.025). The information fraction (IF), defined as the fraction of the total information expected at the scheduled end of the trial [3], is an important parameter of the alpha spending functions. In addition, it is generally assumed [6] that the correlation of the standardized test statistics at two interim analyses equals $\sqrt{t_1/t_2}$, where $t_1 \le t_2$ are the corresponding IFs. Calendar time and information time are two common ways to define the information fraction. Lan and DeMets [7] compared the performance of calendar time and information time with respect to type I error and power, and pointed out that the type I error rate would be inflated when the information fraction is overestimated. They showed that if an O'Brien-Fleming alpha spending function with a one-sided significance level of 0.025 is used in a trial with 4 interim analyses, the one-sided type I error rate is inflated to 0.034 when the interim analyses are conducted at estimated IFs that overstate the actual IFs.
Here we conduct a simple numerical evaluation for better illustration. Assume that for an endpoint E, one interim analysis is planned to be conducted at the estimated IF $t^*$, where $0 < t^* < 1$. Let the standardized test statistics at the interim and final analyses be $Z_1$ and $Z_2$, and let $t$ be the actual IF at the interim analysis. Denote the corresponding efficacy boundaries calculated based on the IF $t^*$ as $c_1(t^*)$ and $c_2(t^*)$. Then the probability of crossing an efficacy boundary at any point in the trial is expressed as
$$P(t, t^*) = 1 - \Pr\{Z_1 < c_1(t^*),\; Z_2 < c_2(t^*)\}.$$
Under the null hypothesis, the joint distribution of $(Z_1, Z_2)$ is known to asymptotically follow a bivariate normal distribution [8] with zero means, unit variances, and correlation $\sqrt{t}$.
It was shown in Ref. [9] that $\Pr\{Z_1 < c_1(t^*),\; Z_2 < c_2(t^*)\}$ is increasing w.r.t. the correlation $\sqrt{t}$, so $P(t, t^*)$ is decreasing w.r.t. $t$. Under the null hypothesis, $P(t, t^*) = \alpha$ if $t = t^*$, where α is the pre-specified significance level. Therefore, the type I error rate would be inflated if $t < t^*$ under the null hypothesis, that is, if the IF is overestimated.
We further examine the magnitude of the type I error inflation in two ways. Fig. 1 (A) depicts the type I error rate as a function of the actual IF, given that the efficacy boundary is calculated using the O'Brien-Fleming boundary at a fixed estimated IF and a one-sided significance level of 0.025. Fig. 1 (B) depicts the type I error rate as a function of the estimated IF, given a fixed actual IF and an efficacy boundary calculated using the O'Brien-Fleming boundary at the estimated IF and a one-sided significance level of 0.025. Fig. 1 (B) shows that the type I error rate could be inflated to as much as 0.03 if the IF is substantially overestimated.
Fig. 1.
Plot of type I error rate.
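The numerical evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a Lan-DeMets O'Brien-Fleming-type spending function, one interim analysis, and a final analysis at full information. The function names (`obf_spend`, `joint_below`, `boundaries`, `type1_error`) and the argument names `t_est` (estimated IF) and `t_actual` (actual IF) are hypothetical and chosen only for this example.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm


def obf_spend(t, alpha=0.025):
    """Lan-DeMets O'Brien-Fleming-type cumulative alpha spent at IF t (one-sided)."""
    return 2.0 * (1.0 - norm.cdf(norm.ppf(1.0 - alpha / 2.0) / np.sqrt(t)))


def joint_below(c1, c2, rho):
    """P(Z1 < c1, Z2 < c2) for standard bivariate normal with correlation rho < 1."""
    s = np.sqrt(1.0 - rho ** 2)
    # Condition on Z1 = z: Z2 | Z1 = z is N(rho * z, 1 - rho^2).
    value, _ = quad(lambda z: norm.pdf(z) * norm.cdf((c2 - rho * z) / s),
                    -np.inf, c1)
    return value


def boundaries(t_est, alpha=0.025):
    """Efficacy boundaries (interim, final) computed from the estimated IF."""
    c1 = norm.ppf(1.0 - obf_spend(t_est, alpha))
    rho = np.sqrt(t_est)  # correlation implied by the estimated IF
    # Final boundary: total crossing probability equals alpha when IF = t_est.
    c2 = brentq(lambda c: 1.0 - joint_below(c1, c, rho) - alpha, 0.0, 10.0)
    return c1, c2


def type1_error(t_actual, t_est, alpha=0.025):
    """Crossing probability under H0 when the true interim IF is t_actual."""
    c1, c2 = boundaries(t_est, alpha)
    return 1.0 - joint_below(c1, c2, np.sqrt(t_actual))


if __name__ == "__main__":
    # Boundaries planned for an estimated IF of 0.5 (an assumed value).
    for t_actual in (0.2, 0.3, 0.4, 0.5, 0.6):
        print(t_actual, round(type1_error(t_actual, 0.5), 5))
```

When `t_actual` equals `t_est` the function returns the nominal level; when `t_actual` is smaller than `t_est` (the IF is overestimated) the returned value exceeds it, in line with the argument above. The one-dimensional integral is used in place of a bivariate normal routine only to keep the sketch dependency-light.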
2. A case study
In Phase 3 confirmatory oncology clinical trials, the hierarchical testing procedure is frequently used for testing multiple endpoints. Strong control of the family-wise error rate (FWER) is required for the statistical testing of multiple endpoints to be considered valid, where the FWER is defined as the probability of falsely rejecting at least one null hypothesis. Glimm et al. [8] and Tamhane et al. [10] showed that the type I error rate for each ranked endpoint must be strongly controlled, and thus the alpha spending function needs to be pre-specified for each ranked endpoint under the group sequential setting. However, there was no further discussion of the type I error rate inflation for each ranked endpoint that could be caused by the overestimation of IFs.
In practice, for operational convenience in regulatory submissions of Phase 3 data, it is generally preferred that the interim analyses for all key endpoints be conducted at the same time, with the timing driven by the data maturity of the primary endpoint(s). For the same reason, the alpha spending function applied to the primary endpoint(s), including the specification of IFs, is often used for the secondary endpoints as well. However, because data mature differently across endpoints such as overall survival (OS), progression-free survival (PFS), and overall response rate (ORR), applying the same IFs to all endpoints could cause substantial underestimation or overestimation of IFs. Overestimation could in turn inflate the FWER, as discussed in Section 1. A commonly used alternative is to allocate a conservative α to all secondary endpoints at interim analyses [7]. Although this strategy remedies the concern of type I error inflation, it sacrifices the opportunity of early stopping for all key secondary endpoints and thus could waste development costs.
To overcome these drawbacks, we propose to pre-specify the definition of the IF for each secondary endpoint by using preliminary results from Phase 1/2 clinical trials. For illustration, we consider a case study on developing the specification of IFs for each endpoint.
In a Phase 3 oncology trial, the primary endpoint is OS, and the key secondary endpoints are 1) overall response rate (ORR), 2) complete response (CR) rate, and 3) ORR by the initiation of Cycle 2. One interim analysis is planned to occur when 50% of the total OS events have accrued. To strongly control the FWER, the primary endpoint and all key secondary endpoints are to be tested in a hierarchical testing procedure as illustrated in Fig. 2. The total sample size is set as 400.
Fig. 2.
Hierarchical testing procedure of primary and key secondary endpoints.
It is known that the data maturity of response rate endpoints (CR rate and ORR) [11], measured as the proportion of subjects with matured ORR or CR data, differs from that of time-to-event endpoints (OS). Therefore, we specify the IFs for CR rate and ORR at the interim analysis by considering the preliminary results of time to CR and time to ORR from Phase 2 trials, as summarized in Table 1.
Table 1.
The summary of time to CR and ORR.
| | 25th Pctl | Median | 75th Pctl | 90th Pctl | Min | Max |
|---|---|---|---|---|---|---|
| Time to ORR (months) | 1.5 | 2.0 | 3.5 | 6.0 | 1.0 | 12.0 |
| Time to CR (months) | 2.5 | 4.0 | 6.0 | 9.0 | 1.0 | 15.0 |
To better understand the data maturity of CR rate and ORR, the cross-functional study team reviewed the preliminary results above. The 90th percentile was considered a clinically reasonable cutoff. Therefore, the definitions of the IFs for CR rate and ORR are formed as outlined in Table 2.
Table 2.
The definition of information fraction.
| Endpoint | IF Definition | Estimated IF |
|---|---|---|
| OS | Fraction of total OS events accrued | 50% |
| ORR | Fraction of total patients with at least 6 months follow up since enrollment | 250/400 = 62.5% |
| CR rate | Fraction of total patients with at least 9 months follow up since enrollment | 150/400 = 37.5% |
| ORR by the initiation of Cycle 2 | Fraction of total patients who started Cycle 2 Day 1 treatment or discontinued by the end of Cycle 1 | 100% |
For instance, suppose that at the time of the interim analysis there are 250 subjects with at least 6 months of follow-up, 150 subjects with at least 9 months of follow-up, and all 400 subjects have either started Cycle 2 treatment or discontinued treatment by the end of Cycle 1. The resulting estimates of the IFs are presented in Table 2.
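The follow-up based IF definitions in Table 2 amount to counting subjects whose follow-up has reached the endpoint-specific cutoff. The sketch below illustrates that arithmetic under stated assumptions: the function name `information_fraction` and the follow-up durations are hypothetical, constructed only so that the counts match the example (250 subjects with at least 6 months and 150 with at least 9 months out of 400).

```python
def information_fraction(followup_months, min_followup, total_n):
    """Fraction of the planned sample with at least `min_followup` months of follow-up."""
    if total_n <= 0:
        raise ValueError("total_n must be positive")
    matured = sum(1 for months in followup_months if months >= min_followup)
    return matured / total_n


# Hypothetical follow-up durations (months) at the interim analysis.
followup = [10.0] * 150 + [7.0] * 100 + [3.0] * 150
total_planned = 400

if_orr = information_fraction(followup, 6.0, total_planned)  # 6-month cutoff for ORR
if_cr = information_fraction(followup, 9.0, total_planned)   # 9-month cutoff for CR rate

if __name__ == "__main__":
    print(f"ORR IF: {if_orr:.1%}, CR rate IF: {if_cr:.1%}")
```

The denominator is the planned total sample size rather than the number enrolled so far, which matches the "fraction of total patients" wording in Table 2.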
As demonstrated in Table 2, our proposed framework provides more objective estimates of the IFs for each endpoint according to its corresponding data maturity mechanism. If the same IF as OS (50%) is applied to all endpoints, there is a high risk that the IF of CR rate would be overestimated and the IFs of ORR and ORR by the initiation of Cycle 2 would be underestimated. Correspondingly, the FWER could be inflated by testing CR rate, and power could be lost if ORR or ORR by the initiation of Cycle 2 is tested at the interim analysis.
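To make the consequence concrete, the cumulative α available at the interim analysis can be compared across endpoint-specific IFs. The sketch below is illustrative only: it assumes that a Lan-DeMets O'Brien-Fleming-type spending function with one-sided α = 0.025 is used for every endpoint, which the case study does not state, and the function name `obf_cumulative_alpha` is hypothetical. It also ignores the hierarchical gating, under which a secondary endpoint is tested only after the preceding hypotheses are rejected.

```python
from statistics import NormalDist


def obf_cumulative_alpha(info_fraction, alpha=0.025):
    """Cumulative one-sided alpha spent at a given IF (O'Brien-Fleming-type spending)."""
    if not 0.0 < info_fraction <= 1.0:
        raise ValueError("info_fraction must be in (0, 1]")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - std_normal.cdf(z / info_fraction ** 0.5))


# Endpoint-specific IFs taken from the counts in the example (assumed inputs).
endpoint_ifs = {
    "OS": 0.50,
    "ORR": 250 / 400,
    "CR rate": 150 / 400,
    "ORR by the initiation of Cycle 2": 1.00,
}

if __name__ == "__main__":
    for endpoint, info_fraction in endpoint_ifs.items():
        print(f"{endpoint}: IF = {info_fraction:.3f}, "
              f"interim alpha = {obf_cumulative_alpha(info_fraction):.5f}")
```

Under these assumptions, an endpoint whose IF is overstated receives more interim α than its data maturity supports, while an endpoint whose data are fully mature at the interim analysis can be tested at the full α.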
3. Discussion
Although the operating characteristics of group sequential designs have been well understood for each individual endpoint, attention still needs to be paid to the specification of IFs for group sequential designs with multiple endpoints. The numerical evaluation illustrates that the type I error could be inflated if the IFs are overestimated. On the other hand, allocating a conservative α to secondary endpoints at the interim analysis would waste development cost in cases where the trial could have been stopped early had adequate α been allocated. We propose a framework in which the IF of each secondary endpoint is defined based on outcomes from early-phase trials. The framework provides a more objective way to estimate IFs, so that the risk of severely overestimating or underestimating the IFs is mitigated. It thus helps preserve the FWER and allocates adequate α to each individual endpoint at the interim analysis. In particular, for endpoints that are fully matured at the interim analysis, the full α could be used for testing.
Disclosure
This manuscript was sponsored by AbbVie, Inc. AbbVie contributed to the design, research, and interpretation of data, writing, reviewing, and approving the content. Tu Xu, Qin Qin, and Xin Wang are employees of AbbVie, Inc. The authors thank the Associate Editor and referees for their constructive comments and suggestions.
Footnotes
Supplementary data related to this article can be found at https://doi.org/10.1016/j.conctc.2018.03.005.
References
- 1. Emerson S., Fleming T. Symmetric group sequential test designs. Biometrics. 1989;45:905–923.
- 2. Wang S., Tsiatis A. Approximately optimal one-parameter boundaries for group sequential trials. Biometrics. 1987;43:193–199.
- 3. DeMets D. Interim analysis: the alpha spending function approach. Stat. Med. 1994;13:1341–1352. doi: 10.1002/sim.4780131308.
- 4. Lan G., DeMets D. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–663.
- 5. Lan G. Information and information fractions for design and sequential monitoring of clinical trials. Commun. Stat. Theor. Meth. 1994;23:403–420.
- 6. Lan G., Wittes J. The B-value: a tool for monitoring data. Biometrics. 1988;44:579–585.
- 7. Lan G., DeMets D. Group sequential procedures: calendar versus information time. Stat. Med. 1989;8:1191–1198. doi: 10.1002/sim.4780081003.
- 8. Glimm E., Maurer W., Bretz F. Hierarchical testing of multiple endpoints in group-sequential trials. Stat. Med. 2010;29:219–228. doi: 10.1002/sim.3748.
- 9. Kotz S., Balakrishnan N., Johnson N. Wiley; 2000. Continuous multivariate distributions, volume 1: models and applications.
- 10. Tamhane A., Mehta C., Liu L. Testing a primary and a secondary endpoint in a group sequential design. Biometrics. 2010;66:1174–1184. doi: 10.1111/j.1541-0420.2010.01402.x.
- 11. Kelly K., Halabi S. Demos Medical Publishing; 2009. Oncology Clinical Trials: Successful Design, Conduct and Analysis.


