Abstract
Many oncology phase II trials are single arm studies designed to screen novel treatments based on efficacy outcome. Efficacy is often assessed as an ordinal variable based on a level of response of solid tumors with four categories: complete response, partial response, stable disease and progression. We describe a two-stage design for a single-arm phase II trial where the primary objective is to test the rate of tumor response defined as complete plus partial response, and the secondary objective is to estimate the rate of disease control defined as tumor response plus stable disease. Since the goal is to estimate the disease control rate, the trial is not stopped for futility after the first stage if the disease control rate is promising. The new design can be generated using easy-to-use software that is available at http://cancer.unc.edu/biostatistics/program/ivanova/.
Keywords and phrases: Simon’s design, Two-stage design, Ordinal outcome, Disease control, Tumor response
1. INTRODUCTION
Phase two trials in oncology are usually single arm studies to screen oncology treatments. According to the International Harmonization Project for Response Criteria [1], responses to treatment fall into one of four categories: complete response (CR), partial response (PR), stable disease (SD) and progressive disease. In most trials, these categories are combined into a dichotomous primary outcome, tumor response (TR), which is defined as complete or partial response. Disease control (DC), defined as complete or partial response or stable disease, is often of interest as well. Alternatively, clinical benefit, defined as disease control observed for at least a certain period of time, can be used. Often, one outcome (such as tumor response) is assessed as the primary endpoint, and another (such as disease control) is assessed as a secondary endpoint. These types of trial designs have been used in various disease groups. Abrey et al. [2] described a phase II trial in patients with recurrent or progressive brain metastases with tumor response as a primary outcome and disease control as a secondary outcome. Similarly, in a large phase II trial in non-small cell lung cancer [3], both response rate and disease control were reported as important endpoints of the study. Studies of advanced breast cancer often include clinical benefit as a secondary endpoint [4].
Our motivating example is a University of North Carolina Lineberger Comprehensive Cancer Center single-arm phase II trial investigating the efficacy of daily use of a novel mTOR inhibitor in combination with other agents in the setting of HER2-positive breast cancer brain metastases. The primary endpoint was tumor response, defined as complete or partial response, but the clinical benefit endpoint was also of particular interest. Clinical benefit was defined as complete or partial response or stable disease lasting for at least 3 months. The tumor response rate of the standard-of-care regimen without the novel mTOR inhibitor was thought to be approximately 0.05, and the TR rate for the alternative hypothesis was set to 0.20. Initially the investigators were going to test the TR rate by using Simon’s optimal two-stage design [5]. A total of n = 29 patients provides a power of 80% with a type I error rate of at most α = 0.05. The trial is stopped for futility if no TRs are observed in the first 10 patients. The null hypothesis is rejected if 4 or more TRs are observed among 29 patients. This design yields an actual type I error rate of 0.0468. Since the clinical benefit rate was also of interest, it was decided to stop the trial for futility after the first stage only if none of the patients in stage 1 had clinical benefit; that is, to apply the stopping rule for futility to the outcome of clinical benefit, defined as TR or lasting SD, rather than to TR alone. If the probability of lasting SD is high, the likelihood of stopping for futility is low and the type I error rate of the trial will, in fact, be higher than 0.0468 and possibly higher than the nominal level of 0.05. If the probability of stopping the trial for futility is close to 0, the type I error rate is close to 0.0548, the type I error rate of the corresponding single-stage design. Therefore, one cannot simply replace TR with clinical benefit in the futility stopping rule of a two-stage design. Instead, our new two-stage design, which yields the desired type I and II error rates under certain assumptions on the lasting SD rate, should be used.
In Section 2 of this paper we describe how to construct two-stage designs with relaxed stopping for futility that yield the specified type I error rate. In Section 3 we describe how to estimate rates of interest after the trial. We give examples in Section 4.
2. TWO-STAGE DESIGN WITH RELAXED STOPPING FOR FUTILITY
Let pT, pS and pD denote the true unknown tumor response, stable disease and disease control rates, respectively, with pD = pT + pS. Without loss of generality we will use these three outcomes to describe our method, with DC defined as TR or SD. Other outcomes can be used as well, for example, clinical benefit defined as TR or lasting SD, or tumor response defined as complete response or partial response. Let p0T be an uninteresting TR rate, and pAT be a rate which would be considered promising for future study. Consider the one-sided hypothesis about the TR rate

$$H_0: p_T \le p_{0T} \quad \text{versus} \quad H_1: p_T \ge p_{AT}.$$
Consider a two-stage design, where n is the total number of patients in the trial and n1 is the number of patients in stage 1, n1 < n. Let $X_{T1}$ and $X_{S1}$ be random variables describing the number of patients with tumor response and stable disease in stage 1, out of n1 patients, and $X_T$ and $X_S$ the number of patients with TR and SD at the end of the trial, out of n patients. The corresponding numbers of patients with disease control are $X_{T1} + X_{S1}$ and $X_T + X_S$. Let $x_{T1}$, $x_{S1}$, $x_T$ and $x_S$ be the corresponding observed quantities. Simon’s design, or another two-stage design testing H0, stops for futility if the observed number of tumor responses in stage 1 is less than or equal to r1 ($x_{T1} \le r_1$); otherwise n − n1 more patients are enrolled. H0 is rejected if $x_T > r_2$. For a given type I error rate α and type II error rate β, the parameters r1 and r2 of a two-stage design are such that

$$\Pr(X_{T1} > r_1,\ X_T > r_2 \mid p_T = p_{0T}) \le \alpha \quad \text{and} \quad \Pr(X_{T1} > r_1,\ X_T > r_2 \mid p_T = p_{AT}) \ge 1 - \beta.$$
In our situation, the primary endpoint is TR, but the investigators are also interested in estimating the DC rate, if the DC rate is promising. Therefore, if a low TR rate is observed in stage 1, but the DC rate is promising (because many patients had stable disease), we do not want to stop for futility at the interim. Consider the following two-stage design: stop after stage 1 and declare the treatment uninteresting only if the number of TRs plus the number of patients with SD in stage 1 is low, $x_{T1} + x_{S1} \le r_1$, where $x_{T1}$ and $x_{S1}$ are the observed stage 1 counts of TR and SD. We will also stop after stage 1 if the number of patients with TR is so low that it is impossible to reject H0 at the end of the trial even if all n − n1 patients in stage 2 have TR, that is, we stop if $x_{T1} < r_2 - (n - n_1)$. If pS = 0, the two designs are the same since the stage 1 number of SDs is 0 with probability 1. However, when pS > 0, changing the futility stopping rule from $x_{T1} \le r_1$ to $x_{T1} + x_{S1} \le r_1$ will affect the type I and type II error rates of a two-stage design. The probability of stopping for futility will decrease, and, therefore, both the type I error rate and the power (1 minus the type II error rate) will increase.
To compute type I and type II error rates in the modified design, we need to make assumptions about the rate of stable disease. Note that both the type I and type II error rates should be computed under the same assumptions on pS. If assumptions on pS are not the same under the null and alternative hypotheses, the type I and II error rates are computed for two different designs. For example, if we assume that, under H0, the SD rate is equal to the smallest possible value, pS = 0, the type I error rate is computed for the two-stage design that stops for futility when the stage 1 number of TRs satisfies $x_{T1} \le r_1$. If we assume that, under H1, the SD rate is equal to the largest possible value, pS = 1 − pAT, the type II error rate is computed for a single-stage design since the probability of stopping for futility is 0. Since we need to have the same design under both H0 and H1, the assumptions on pS should be the same under both hypotheses.
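We assume that $0 \le p_S \le p_S^{\max}$, where $p_S^{\max}$ is a prespecified upper bound on the SD rate. Let $R(p_T, p_S)$ denote the probability of rejecting H0 when the TR and SD rates are pT and pS. We are interested in finding two-stage designs such that

$$R(p_{0T}, p_S) \le \alpha \quad \text{and} \quad R(p_{AT}, p_S) \ge 1 - \beta \quad \text{for all } 0 \le p_S \le p_S^{\max}. \tag{1}$$

When the treatment response rate is fixed at pT, the probability of rejecting H0, $R(p_T, p_S)$, is increasing in pS. Therefore, the power is minimized when $p_S = 0$ and the type I error rate is maximized when $p_S = p_S^{\max}$. Hence conditions (1) can be written as

$$R(p_{0T}, p_S^{\max}) \le \alpha \quad \text{and} \quad R(p_{AT}, 0) \ge 1 - \beta. \tag{2}$$

That is, to control the type I error rate for all possible values of pS we need to control it for $p_S = p_S^{\max}$, and to ensure the required power for all values of pS we need to ensure it for $p_S = 0$.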
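Conditions (2) can be checked numerically for any candidate design. The following is a minimal Python sketch, not the authors' software. It assumes that H0 is rejected when the final number of TRs exceeds r2, that stage 1 outcomes are trinomial (TR, SD, neither), and that the trial continues only if the stage 1 number of TRs plus SDs exceeds r1 and the number of TRs is at least r2 − (n − n1). The function names `reject_prob` and `satisfies_condition_2` are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_sf(k, n, p):
    """Pr(X > k) for X ~ Bin(n, p); equals 1 when k < 0 and 0 when k >= n."""
    return sum(binom_pmf(j, n, p) for j in range(max(k + 1, 0), n + 1))


def reject_prob(n, n1, r1, r2, p_t, p_s):
    """Probability of rejecting H0 under the relaxed futility rule (assumes p_t < 1)."""
    n2 = n - n1
    # Given x_t1 responders, each remaining stage 1 patient has SD
    # with conditional probability p_s / (1 - p_t).
    q = p_s / (1 - p_t)
    total = 0.0
    for x_t1 in range(n1 + 1):
        if x_t1 < r2 - n2:
            continue  # additional TR futility rule: too few TRs in stage 1
        p_continue = binom_sf(r1 - x_t1, n1 - x_t1, q)  # Pr(x_t1 + X_S1 > r1)
        p_reject = binom_sf(r2 - x_t1, n2, p_t)         # Pr(x_t1 + X_T2 > r2)
        total += binom_pmf(x_t1, n1, p_t) * p_continue * p_reject
    return total


def satisfies_condition_2(n, n1, r1, r2, p0t, pat, ps_max, alpha=0.05, beta=0.20):
    """Type I error at the largest SD rate, power at the smallest (pS = 0)."""
    type1 = reject_prob(n, n1, r1, r2, p0t, ps_max)
    power = reject_prob(n, n1, r1, r2, pat, 0.0)
    return type1 <= alpha and power >= 1 - beta


if __name__ == "__main__":
    # Simon's optimal design from the Introduction, without and with SD.
    print(reject_prob(29, 10, 0, 3, 0.05, 0.0))
    print(reject_prob(29, 10, 0, 3, 0.05, 0.2))
    print(satisfies_condition_2(28, 11, 0, 3, 0.05, 0.20, ps_max=0.2))
```

Under these assumptions, Simon’s optimal design from the Introduction (n = 29, n1 = 10, r1 = 0, r2 = 3) gives a type I error rate of about 0.0468 when pS = 0, while the same design with the relaxed rule and an SD rate of 0.2 no longer meets the 0.05 bound.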
As there are many designs satisfying criteria (2), we follow Simon [5] and consider the minimax and the optimal designs. The minimax design minimizes the maximum total sample size n, while the optimal design minimizes the expected sample size under the null hypothesis, EN0. When there is more than one minimax or optimal design we choose the design with the smallest value of EN0 + n. The minimax and optimal designs defined in this way are unique. To compute the expected sample size under H0 for our design, we need to know the distribution of pS. Assume that the SD rate is a random variable PS with density function f(x) defined for $0 \le x \le p_S^{\max}$, where $p_S^{\max}$ is the assumed upper bound on the SD rate; then the probability of disease control under H0 is the random variable PD = p0T + PS. The probability of early stopping under H0, PES0, is computed as PES0 = E_PD[Pr(Y ≤ r1 | PD)], where Y is the number of patients with disease control out of n1 patients. The expected sample size under the null hypothesis is EN0 = PES0 × n1 + (1 − PES0) × n.
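The expected sample size formula can be evaluated with a short numerical integration. The sketch below is an illustration under stated assumptions: PS is taken to be uniform on [0, ps_max], the integral is approximated with a midpoint rule, and, following the formula in the text, only the disease control stopping rule Y ≤ r1 is counted as early stopping. The names `prob_early_stop` and `expected_n_null` are hypothetical. Because the integration method used by the software is not described, results for a nonzero range may differ slightly in the last digit from published values.

```python
from math import comb


def prob_early_stop(n1, r1, p_d):
    """Pr(Y <= r1) for Y ~ Bin(n1, p_d), the stage 1 count with disease control."""
    return sum(comb(n1, y) * p_d**y * (1 - p_d) ** (n1 - y) for y in range(r1 + 1))


def expected_n_null(n, n1, r1, p0t, ps_max, steps=2000):
    """Return (EN0, PES0) assuming the SD rate is Uniform(0, ps_max)."""
    if ps_max == 0:
        # Degenerate case: no stable disease, so PD = p0T exactly.
        pes0 = prob_early_stop(n1, r1, p0t)
    else:
        h = ps_max / steps
        # Midpoint rule for the average of Pr(Y <= r1 | PD) over PS.
        pes0 = sum(
            prob_early_stop(n1, r1, p0t + (i + 0.5) * h) for i in range(steps)
        ) / steps
    return pes0 * n1 + (1 - pes0) * n, pes0


if __name__ == "__main__":
    for ps_max in (0.0, 0.1, 0.2):
        en0, pes0 = expected_n_null(28, 11, 0, 0.05, ps_max)
        print(ps_max, round(en0, 2), round(pes0, 3))
```

With ps_max = 0 the calculation reduces to the usual expected sample size of a two-stage design with futility stopping on a single binomial outcome.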
Apart from the minimax and optimal designs, it is helpful to calculate other admissible designs, that is, designs that minimize wn + (1 − w)EN0 for some w such that 0 ≤ w ≤ 1 [6], since admissible designs often provide different sample size breakdowns over the two stages. With this weighting, w = 1 corresponds to the minimax design and w = 0 to the optimal design. These other admissible designs are often preferred by investigators because they allow a design with more balanced first and second stage sample sizes to be chosen. We developed easy-to-use software that generates all admissible designs with relaxed stopping for futility; it can be found at http://cancer.unc.edu/biostatistics/program/ivanova/. We also give an example write-up that can be used in a clinical trial protocol.
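To illustrate how a weight range is attached to each admissible design, the sketch below computes, for a list of candidate designs given as (n, EN0) pairs, the interval of w on which each one minimizes the weighted criterion. It assumes the criterion w × n + (1 − w) × EN0, so that w = 1 gives the minimax design; the function name `weight_intervals` is hypothetical, and the example input uses rounded EN0 values, so the breakpoints are approximate.

```python
def weight_intervals(designs):
    """For each (n, en0) pair, return the interval (lo, hi) of w in [0, 1] on which
    the design minimizes w * n + (1 - w) * en0, or None if it never does."""
    result = []
    for i, (n_i, e_i) in enumerate(designs):
        lo, hi = 0.0, 1.0
        for j, (n_j, e_j) in enumerate(designs):
            if i == j:
                continue
            # Design i is at least as good as design j exactly when w * a <= b.
            a = (n_i - e_i) - (n_j - e_j)
            b = e_j - e_i
            if a > 0:
                hi = min(hi, b / a)
            elif a < 0:
                lo = max(lo, b / a)
            elif b < 0:
                lo, hi = 1.0, 0.0  # design j is strictly better for every w
        result.append((lo, hi) if lo <= hi else None)
    return result


if __name__ == "__main__":
    # Rounded (n, EN0) values for three candidate designs, plus a dominated one.
    candidates = [(27, 19.8), (28, 18.3), (29, 17.6), (30, 25.0)]
    for design, interval in zip(candidates, weight_intervals(candidates)):
        print(design, interval)
```

A design that receives `None` is not admissible: some other candidate has a smaller weighted criterion for every w between 0 and 1.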
Table 1 presents several examples of input parameters and all admissible designs for these parameters. The designs are generated by our online software. The software assumes that PS has a uniform distribution. Since the power of the test is minimized when $p_S = 0$ and the type I error rate is maximized when pS equals its assumed upper bound, $p_S = p_S^{\max}$, the power and the type I error rate depend only on the range of PS, $[0, p_S^{\max}]$, and do not depend on the distribution of PS. Thus, all designs presented in Table 1 guarantee (1). The only quantity that depends on the distribution of PS is EN0. When there is more than one design that minimizes wn + (1 − w)EN0 for a given w, we choose the design with the smallest value of EN0 + n; hence our designs are unique.
Table 1.

| Upper bound on pS | n | n1 | r1 | r2 | EN0 | PES | w | Design |
|---|---|---|---|---|---|---|---|---|
| p0T = 0.05, pAT = 0.2 | | | | | | | | |
| 0 | 27 | 13 | 0 | 3 | 19.8 | 0.51 | [0.598,1] | Minimax |
| 0 | 28 | 11 | 0 | 3 | 18.3 | 0.57 | [0.414,0.597] | Admissible |
| 0 | 29 | 10 | 0 | 3 | 17.6 | 0.60 | [0,0.413] | Optimal |
| 0.1 | 27 | 13 | 0 | 3 | 23.1 | 0.28 | [0.443,1] | Minimax |
| 0.1 | 28 | 11 | 0 | 3 | 22.3 | 0.34 | [0,0.442] | Optimal |
| 0.2 | 27 | 13 | 0 | 3 | 24.6 | 0.17 | [0.208,1] | Minimax |
| 0.2 | 28 | 11 | 0 | 3 | 24.3 | 0.22 | [0,0.209] | Optimal |
| p0T = 0.5, pAT = 0.7 | | | | | | | | |
| 0 | 37 | 23 | 12 | 23 | 27.7 | 0.66 | [0.556,1] | Minimax |
| 0 | 39 | 16 | 8 | 24 | 25.2 | 0.60 | [0.304,0.555] | Admissible |
| 0 | 43 | 15 | 8 | 26 | 23.5 | 0.70 | [0,0.303] | Optimal |
| 0.1 | 37 | 11 | 4 | 23 | 32.3 | 0.18 | [0.267,1] | Minimax |
| 0.1 | 46 | 15 | 8 | 28 | 29.1 | 0.55 | [0,0.266] | Optimal |
| 0.2 | 37 | 29 | 15* | 23 | 32.9 | 0.51 | [0,1] | Minimax, Optimal |
| p0T = 0.4, pAT = 0.6 | | | | | | | | |
| 0 | 39 | 34 | 17 | 20 | 34.4 | 0.91 | [0.815,1] | Minimax |
| 0 | 41 | 17 | 7 | 21 | 25.6 | 0.64 | [0.182,0.814] | Admissible |
| 0 | 46 | 16 | 7 | 23 | 24.5 | 0.72 | [0,0.181] | Optimal |
| 0.1 | 42 | 20 | 0 | 22 | 36.3 | 0.26 | [0.857,1] | Minimax |
| 0.1 | 43 | 15 | 6 | 22 | 30.3 | 0.45 | [0,0.856] | Optimal |
| 0.2 | 42 | 33 | 15* | 22 | 37.7 | 0.48 | [0.455,1] | Minimax |
| 0.2 | 45 | 13 | 5 | 23 | 35.2 | 0.31 | [0,0.454] | Optimal |
| 0.3 | 42 | 35 | 16 | 22 | 37.8 | 0.60 | [0,1] | Minimax, Optimal |

* Additional stopping for TR: the trial is stopped for futility if the number of TRs in stage 1 is such that it is impossible to reject H0 after stage 2, that is, the number of TRs is less than r2 − (n − n1).
3. ESTIMATION OF TUMOR RESPONSE AND STABLE DISEASE RATES AFTER THE TRIAL
The goals of a trial with relaxed stopping for futility are to test the null hypothesis based on the primary endpoint and to provide point and interval estimates for the primary and secondary endpoints at the end of the trial. The testing and estimation procedures should take into account the interim analysis after stage 1; otherwise the estimates can be biased. Jovic and Whitehead [7] adopted the methodology of Fairbanks and Madsen [8] to estimate the rate of the primary outcome after Fleming’s two-stage design [9]. In this section we show how to obtain p-values and estimates with confidence intervals for the primary and secondary endpoints in a two-stage trial with relaxed futility stopping, taking the design into account. For that, we enumerate all of the outcomes that are as extreme as or more extreme than the one we observed. Let . Define two functions
and
The one-sided p-value for testing H0 is computed as P (p0T), the 95% confidence interval for pT is given by ( ) where and , and the median unbiased estimate of pT is given by , where .
To estimate the rate of the secondary endpoint, disease control, pD, define
and
As before, the 95% confidence interval for pD is given by ( ) where and , and the median unbiased estimate of pD is given by , where .
4. EXAMPLES
In the introduction we described the Lineberger Comprehensive Cancer Center Phase II trial in HER2-positive breast cancer brain metastases with the null TR rate of p0T = 0.05 and alternative rate of pAT = 0.20. The alternative TR rate of 0.2 was set based on previous studies [10] that have shown a tumor response rate of about 0.2 in a similar patient population. The desired power was 0.8 and the type I error rate was 0.05. The initial plan was to use Simon’s optimal design with n = 29, n1 = 10, r1 = 0, r2 = 3. However, the investigators were also interested in the rate of clinical benefit and wanted to have full enrollment if the rate of clinical benefit was promising even though the rate of tumor response was small. Therefore, a stopping rule for futility after stage 1 of the form $x_{T1} + x_{S1} \le r_1$ was used, where $x_{T1}$ is the stage 1 number of TRs and $x_{S1}$ is the stage 1 number of patients with lasting SD. Calculations show that for any stable disease rate higher than 0.048, the type I error rate of the design defined above exceeds 0.05. Therefore, taking Simon’s optimal design and modifying its stopping rule from $x_{T1} \le r_1$ to $x_{T1} + x_{S1} \le r_1$ is not going to yield a design with a type I error rate at or below the nominal level of 0.05. It was assumed that pS ~ Unif(0, 0.2) in the HER2-positive breast cancer brain metastases trial, and the optimal design with n = 28, n1 = 11, r1 = 0, r2 = 3 was chosen for the study. In fact, in stage 1 there was at least one patient with lasting stable disease, and therefore the trial continued to stage 2 without waiting for responses from the remaining stage 1 patients still in follow-up. Table 1 presents all admissible designs satisfying condition (2) for this trial under various assumptions on pS.
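The claim that Simon’s optimal design with the relaxed stopping rule exceeds the nominal level for quite small SD rates can be checked with a few lines of arithmetic. The sketch below is an illustration only and assumes r1 = 0, so that the trial stops exactly when no stage 1 patient has TR or lasting SD, and that H0 is rejected when the number of TRs exceeds r2. The function names are hypothetical. Under these assumptions the threshold found by bisection is roughly 0.047 to 0.048, in line with the value quoted above.

```python
from math import comb


def upper_tail(k, n, p):
    """Pr(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def type1_relaxed_r1_zero(p_s, n=29, n1=10, r2=3, p0t=0.05):
    """Type I error when the trial stops only if no stage 1 patient has TR or SD.

    With r1 = 0 the trial stops with probability (1 - p0t - p_s) ** n1, so the
    rejection probability is the single-stage one minus the part lost to stopping.
    The additional TR rule never applies here because r2 - (n - n1) < 0.
    """
    single_stage = upper_tail(r2 + 1, n, p0t)
    lost = (1 - p0t - p_s) ** n1 * upper_tail(r2 + 1, n - n1, p0t)
    return single_stage - lost


def sd_rate_threshold(alpha=0.05, tol=1e-6):
    """Smallest SD rate at which the type I error exceeds alpha (bisection)."""
    lo, hi = 0.0, 0.95
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if type1_relaxed_r1_zero(mid) > alpha:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    print(type1_relaxed_r1_zero(0.0))  # Simon's design without SD
    print(sd_rate_threshold())
```

The bisection relies on the type I error rate being increasing in the SD rate, which holds here because a larger SD rate only reduces the probability of stopping.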
As for estimation, suppose, for example, that the following counts were observed in a HER2-positive breast cancer trial . The trial is not stopped for futility since . The maximum likelihood estimate of the disease control rate is (1+6)/29 = 0.24 with a corresponding 95% naïve exact confidence interval of (0.10, 0.44). The adjusted median unbiased estimate with the corresponding 95% exact confidence interval using the methodology from Section 3 is 0.23 (0.10, 0.40). The adjusted estimate is slightly smaller because the trial did not stop for futility after stage 1.
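The naïve exact interval in this example can be reproduced approximately with a Clopper-Pearson calculation. The sketch below is a minimal illustration: it treats the 7 patients with disease control out of 29 as a single binomial sample, ignores the interim analysis (so it is the unadjusted interval, not the method of Section 3), and finds the limits by bisection. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def clopper_pearson(x, n, level=0.95, tol=1e-8):
    """Exact two-sided confidence interval for a binomial proportion."""
    a = (1 - level) / 2

    def root(f):
        # f is increasing in p with f(0) < 0 < f(1); plain bisection.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit solves Pr(X >= x | p) = a; upper limit solves Pr(X <= x | p) = a.
    lower = 0.0 if x == 0 else root(lambda p: (1 - binom_cdf(x - 1, n, p)) - a)
    upper = 1.0 if x == n else root(lambda p: a - binom_cdf(x, n, p))
    return lower, upper


if __name__ == "__main__":
    lower, upper = clopper_pearson(1 + 6, 29)
    print(round((1 + 6) / 29, 2), lower, upper)
```

The interval obtained this way does not account for the two-stage design, which is why the adjusted interval in the text differs from it.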
Our second example is a Lineberger Comprehensive Cancer Center Phase II trial in older patients with previously untreated diffuse large B-cell lymphoma. The primary endpoint in this trial was complete response. The null hypothesis that the CR rate is 0.5 was tested against the one-sided alternative. The required power was 80% when the CR rate is 0.7, and the type I error rate was 0.05. Partial response was also of interest; therefore, stopping after stage 1 was based on the number of CRs plus PRs. With the partial response rate pP ~ Unif(0, 0.2), the only design that minimizes wn + (1 − w)EN0 is the design with n = 37, n1 = 29, r1 = 15, r2 = 23 and the additional futility rule to stop after stage 1 if the number of CRs in stage 1 is less than or equal to 14. This design was used for the study. The study was stopped early for futility because the number of CRs observed in stage 1 was not high enough to reject H0 at the end of the trial. The number of PRs was actually higher than expected.
Two-stage designs without and with relaxed stopping for futility for these examples are displayed in Table 1. The original Simon’s minimax and optimal designs [5], that is, designs without relaxed stopping for futility, are presented in the lines where the upper bound on the SD rate is $p_S^{\max} = 0$. Relaxing the stopping rule for futility often leads to less efficient designs compared to the designs in [5]. For example, in the case of testing p0T = 0.4 versus pAT = 0.6 (last example in Table 1), there are three admissible designs without relaxed futility stopping: the minimax design with n = 39, an admissible design with n = 41 and the optimal design with n = 46. If we would like to relax stopping for futility and assume an upper bound of $p_S^{\max} = 0.3$, there is only one available design, and it requires n = 42. Its maximum total sample size, n = 42, is larger than that of the minimax design in [5], n = 39. If we are interested in minimizing the expected sample size under the null hypothesis, it yields EN0 = 37.8, compared to EN0 = 24.5 for the best design with $p_S^{\max} = 0$, the optimal design.
5. DISCUSSION
This work was motivated separately by the two investigators leading the trials described in Section 4. They were each unsatisfied with the lack of flexibility in a standard two-stage design, which does not allow a trial to continue if an important secondary endpoint shows promise in stage 1. We have developed easy-to-use software to generate the designs we describe here. From the user’s input values of α, 1 − β, p0T, pAT and the upper bound on the SD rate, $p_S^{\max}$, the software calculates the sample sizes (n and n1), the decision rules (r1 and r2), the probability of early stopping under the null hypothesis, EN0 and the weight w for each admissible design. The website also provides a recommended write-up to help with interpretation and to be used in the protocol.
Another solution to designing a trial with two outcomes CR and TR is to test the intersection hypothesis about CR and TR [11, 12] instead of testing CR alone. In this case the trial is stopped for futility if both CR and TR are low and the drug is considered promising if either CR or TR is good. This approach is also a part of our software together with various Phase II methods that are frequently used at the Lineberger Comprehensive Cancer Center. Among methods available at http://cancer.unc.edu/biostatistics/program/ivanova/ are Simon’s and Fleming’s two-stage designs and the method to generate stopping boundary for continuous toxicity monitoring in a Phase II trial [13].
Acknowledgments
Software development and Ivanova’s work were supported in part by the National Institutes of Health R01 CA120082-01A1. The authors thank Vladimir Khvostov for his expertise in developing software, Dominic Moore for helpful comments, and the PIs of the studies used as examples: Steven Park and Carey Anders.
Contributor Information
Anastasia Ivanova, Department of Biostatistics, CB #7420, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7420, USA. Lineberger Cancer Center at the University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7295, USA.
Allison M. Deal, Biostatistics Core, Lineberger Comprehensive Cancer Center at the University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7295, USA
References
1. Cheson BD, Pfistner B, Juweid ME, Gascoyne RD, Specht L, Horning SJ, Coiffier B, Fisher RI, Hagenbeek A, Zucca E, Rosen ST, Stroobants S, Lister TA, Hoppe RT, Dreyling M, Tobinai K, Vose JM, Connors JM, Federico M, Diehl V. International harmonization project on lymphoma. Revised response criteria for malignant lymphoma. Journal of Clinical Oncology. 2007;25(5):579–86. doi: 10.1200/JCO.2006.09.2403.
2. Abrey LE, Olson JD, Raizer JJ, Mack M, Rodavitch A, Boutros DY, Malkin MG. A phase II trial of temozolomide for patients with recurrent or progressive brain metastases. Journal of Neuro-Oncology. 2001;53(3):259–65. doi: 10.1023/a:1012226718323.
3. Fukuoka M, Yano S, Giaccone G, Tamura T, Nakagawa K, Douillard JY, Nishiwaki Y, Vansteenkiste J, Kudoh S, Rischin D, Eek R, Horai T, Noda K, Takata I, Smit E, Averbuch S, Macleod A, Feyereislova A, Dong RP, Baselga J. Multi-institutional randomized phase II trial of gefitinib for previously treated patients with advanced non-small-cell lung cancer (The IDEAL 1 Trial) [corrected]. Journal of Clinical Oncology. 2003;21(12):2237–46. doi: 10.1200/JCO.2003.10.038. Erratum in: Journal of Clinical Oncology, 2004, 22 (23) 4863.
4. Robertson JF, Llombart-Cussac A, Rolski J, Feltl D, Dewar J, Macpherson E, Lindemann J, Ellis MJ. Activity of fulvestrant 500 mg versus anastrozole 1 mg as first-line treatment for advanced breast cancer: results from the FIRST study. Journal of Clinical Oncology. 2009;27(27):4530–5. doi: 10.1200/JCO.2008.21.1136.
5. Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials. 1989;10(1):1–10. doi: 10.1016/0197-2456(89)90015-9.
6. Jung SH, Lee T, Kim K, George SL. Admissible two-stage designs for phase II cancer clinical trials. Statistics in Medicine. 2004;23(4):561–9. doi: 10.1002/sim.1600.
7. Jovic G, Whitehead J. An exact method for analysis following a two-stage phase II cancer clinical trial. Statistics in Medicine. 2010;29(30):3118–25. doi: 10.1002/sim.3837.
8. Fairbanks K, Madsen R. P-values for tests using a repeated significance test design. Biometrika. 1982;69(1):69–74.
9. Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics. 1982;38(1):143–51.
10. Cardoso F, Dieras V, Campone M, Massacesi C, Manlius C, Thevenaz P, Zhang Y, Jerusalem G, Sahmoud T, Andre F, Gianni L. Everolimus, trastuzumab and paclitaxel or vinorelbine in the treatment of HER2+ metastatic breast cancer pre-treated with lapatinib-containing therapy: pooled analysis of 2 phase 1 studies. Cancer Research. 2009;69(24 supplement):5064.
11. Lu Y, Jin H, Lamborn KR. A design of phase II cancer trials using total and complete response endpoints. Statistics in Medicine. 2005;24:3155–70. doi: 10.1002/sim.2188.
12. Ivanova A, Monaco J, Stinchcombe T. Efficient designs for phase II oncology trials with ordinal outcome. Statistics and Its Interface. 2012;5(4):463–469.
13. Ivanova A, Qaqish BF, Schell MJ. Continuous toxicity monitoring in phase II trials in oncology. Biometrics. 2005;61(2):540–5. doi: 10.1111/j.1541-0420.2005.00311.x.