Journal of Clinical Oncology. 2018 Apr 26;36(19):1902–1904. doi: 10.1200/JCO.2017.77.0479

Bias, Operational Bias, and Generalizability in Phase II/III Trials

Boris Freidlin, Edward L. Korn, Jeffrey S. Abrams
PMCID: PMC6075858  PMID: 29698104

Background

Adaptive clinical trial designs improve the efficiency of the drug development process by allowing treatment decisions to be made earlier, while protecting the interests of the patients in the trials. In settings in which a randomized phase II efficacy signal is required to justify starting a large phase III trial, a phase II/III (seamless) design can sometimes be used to adaptively combine randomized phase II and phase III trial components.1-4 After the phase II part of the trial is enrolled and the phase II outcome is sufficiently mature, a go/no-go decision is made regarding whether to continue accrual to the phase III part of the trial. The phase II patients are used in the final phase III analysis, and a separate phase III protocol need not be developed. This design offers efficiency gains relative to performing an independent phase III trial after a positive phase II trial. In general, the phase II analysis is performed confidentially solely to make a go/no-go decision, and the phase II results are not released to the public.

When the phase II outcome events (eg, progression-free survival [PFS]) are occurring slowly, or when the same end point (eg, overall survival [OS]) is used for the phase II and the phase III analyses, an accrual suspension while waiting for the phase II data to mature may be needed to minimize patient exposure to an ineffective therapy and reap the efficiency benefits of the design.2 Although suspending/restarting accrual in a phase III trial represents a logistical challenge, this approach is being used in the National Cancer Institute National Clinical Trials Network trials (eg, NCT02502266: Cediranib Maleate and Olaparib or Standard Chemotherapy in Treating Patients With Recurrent Platinum-Resistant or -Refractory Ovarian, Fallopian Tube, or Primary Peritoneal Cancer; NCT02339571: Nivolumab and Ipilimumab With or Without Sargramostim in Treating Patients With Stage III-IV Melanoma That Cannot Be Removed by Surgery; NCT02152982: Temozolomide With or Without Veliparib in Treating Patients With Newly Diagnosed Glioblastoma Multiforme; and NCT01810913: Radiation Therapy With Cisplatin, Docetaxel, or Cetuximab After Surgery in Treating Patients With Stage III-IV Squamous Cell Head and Neck Cancer). However, the US Food and Drug Administration (FDA) recently raised a new concern regarding the use of phase II/III designs when it responded to a question about whether phase II patients could be used in the final phase III analysis of a planned trial that had an accrual suspension (End of Phase 2 Meeting Minutes, Reference ID 4096437):

“No, as currently designed this [an accrual suspension] would not be appropriate. In general, an adaptive design may be acceptable if the trial is well designed. In the currently proposed design, the potential 18-month delay from the time of completion of enrollment in the ‘Phase II’ portion of the trial and initiation of the ‘Phase III’ portion may introduce bias. FDA recommends that DCTD [Division of Cancer Treatment and Diagnosis] consider modifying the trial design to a single randomized trial with an interim analysis of PFS [progression-free survival] for futility or for sample size re-estimation with proper alpha adjustment.”

In this commentary, we dissect and evaluate the FDA’s concern and find that it does not sufficiently justify a general prohibition of using phase II/III trial designs with accrual suspension.

The Proposed Trial and Alternative Trial Designs

The phase II/III trial under consideration specified the maximum sample size of 468 patients (randomly assigned at a ratio of 2:1 between the experimental and control arms) on the basis of a phase III design targeting an OS hazard ratio (HR) of 0.73 (80% power with a 0.025 one-sided significance level), with the final analysis scheduled to occur after 364 deaths occurred. The phase II design was based on randomly assigning 171 patients targeting a PFS HR of 0.65 (85% power with a 0.15 one-sided significance level), with the phase II go/no-go decision scheduled for when 98 PFS events have been observed. To allow the phase II data to mature, the design specified an accrual suspension after the first 171 patients had been randomly assigned until the go/no-go decision was made, which was expected to take approximately 18 months. This was deemed advisable because the PFS events were expected to occur slowly, with the median PFS being 18 months.
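As a rough cross-check on the design parameters above, the number of events needed by a log-rank test can be approximated with Schoenfeld's formula. The sketch below is only an illustration: the text does not state how the trial's event counts were derived, the function names are hypothetical, and the 2:1 allocation enters as the fraction 2/3 assigned to the experimental arm.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for quantiles and tail areas


def schoenfeld_events(hazard_ratio, alpha_one_sided, power, frac_experimental):
    """Approximate events needed for a one-sided log-rank test (Schoenfeld)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha_one_sided)
    z_power = _STD_NORMAL.inv_cdf(power)
    p = frac_experimental
    return (z_alpha + z_power) ** 2 / (p * (1 - p) * log(hazard_ratio) ** 2)


def schoenfeld_power(events, hazard_ratio, alpha_one_sided, frac_experimental):
    """Approximate power of a one-sided log-rank test for a given event count."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha_one_sided)
    p = frac_experimental
    drift = sqrt(events * p * (1 - p)) * abs(log(hazard_ratio))
    return _STD_NORMAL.cdf(drift - z_alpha)


if __name__ == "__main__":
    two_to_one = 2 / 3  # fraction randomized to the experimental arm
    # Phase III component: OS, HR 0.73, one-sided 0.025, 80% power
    print(schoenfeld_events(0.73, 0.025, 0.80, two_to_one))
    print(schoenfeld_power(364, 0.73, 0.025, two_to_one))
    # Phase II component: PFS, HR 0.65, one-sided 0.15, 85% power
    print(schoenfeld_events(0.65, 0.15, 0.85, two_to_one))
    print(schoenfeld_power(98, 0.65, 0.15, two_to_one))
```

Under this approximation, 364 deaths and 98 PFS events correspond to roughly 81% and 83% power, respectively, which is near the stated 80% and 85%. The remaining gap presumably reflects details of the actual design calculation (for example, interim monitoring or a different method) that are not given here.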

The first and second rows of Table 1 describe the operating characteristics of the proposed design and a phase II/III trial design with no accrual suspension. To address the FDA suggestion of using a design that incorporates PFS-driven futility analyses, the third row presents an OS-driven phase III design that includes PFS futility analyses (as described in Goldman et al5). (Presumably, this is what was meant by the FDA’s suggestion of a single trial with futility monitoring for PFS.) The difference between the designs in rows 1 and 2 versus the design in row 3 is that the phase II bar for PFS in a phase II/III design will be higher than a futility monitoring bar for PFS, the former requiring some designated benefit of the experimental treatment over the control treatment. The three designs were simulated under the null hypothesis and the targeted alternative hypothesis (see Table 1 footnotes for details), with average sample sizes and average durations tabulated for each design.

Table 1.

Comparison of a Phase II/III Trial Design (with and without accrual suspension) and a Phase III Design With PFS/OS Interim Monitoring


When the experimental treatment is not beneficial (the null hypothesis), the FDA’s proposed design has poor characteristics compared with the phase II/III designs: a much larger average sample size and a longer average duration (Table 1). A comparison of the two phase II/III designs demonstrates that the suspension results, on average, in a longer trial (by 6 months) but with 25% fewer patients exposed to an ineffective therapy, which is the raison d’être for the suspension. Under the alternative hypothesis, the average sample sizes are about the same, with the suspension leading to a longer trial when compared with a design without a suspension.

Bias

In terms of protecting patients from ineffective therapies, the phase II/III design with a suspension is the best choice. However, if the trial design introduced a bias (a systematic deviation between the estimated treatment effect and its true value) that made the results of such a design less interpretable, then its use would be unjustified. For example, repeated analyses of interim data introduce bias (unless a proper multiplicity adjustment is used). A more subtle example of a design-induced bias is given by trials that use outcome-adaptive randomization (changing the randomization ratio during the study to favor the arm that appears to be doing better). If the prognosis of the patient population is improving over the course of the trial, a standard comparison of the treatment arms can make the experimental-treatment arm appear better than it truly is.6
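The first example above, repeated analyses without a multiplicity adjustment, can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not from the trial discussed here: the test statistic is treated as a standardized sum of equally sized, normally distributed blocks of information, there is no true treatment effect, and every look uses the same unadjusted one-sided critical value of 1.96. The function name and defaults are hypothetical.

```python
import random
from math import sqrt


def false_positive_rate(n_looks, z_crit=1.96, n_trials=20000, seed=1):
    """Estimate the chance of a false-positive result when the same
    unadjusted critical value is applied at every equally spaced look."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            # Add one more block of information; the true effect is zero.
            cumulative += rng.gauss(0.0, 1.0)
            # Standardized statistic based on all data accumulated so far.
            if cumulative / sqrt(look) > z_crit:
                rejections += 1
                break  # the trial stops and declares benefit
    return rejections / n_trials


if __name__ == "__main__":
    print("1 look: ", false_positive_rate(1))
    print("5 looks:", false_positive_rate(5))
```

With a single analysis the simulated error rate stays near the nominal 0.025, whereas five unadjusted looks push it well above that level. Properly designed monitoring boundaries are constructed to remove this inflation.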

Interim monitoring in standard phase III and phase II/III designs is known to result in a slight inflation of the estimated treatment effect. With a properly designed interim monitoring plan, the inflation is quite small7,8 and is typically ignored. (Note that this inflation is unrelated to whether the design has an accrual suspension.) Furthermore, if the experimental and control treatments are equally effective for all patients, then a randomized phase II/III trial will preserve the type I error rate. Interestingly, the sample size re-estimation designs suggested by the FDA in the above comment can lead to bias.9-11

Operational Bias and Generalizability

Perhaps the bias the FDA had in mind was operational bias, which is defined in “Guidance for Industry: Adaptive Design Clinical Trials for Drugs and Biologics”12:

“Operational Bias: Many study adaptations call for un-blinding of the analysts charged with implementing the planned design revisions. Access by these analysts to the interim unblinded results raises concern about the possibility that the analysts might influence investigators in how they manage the trial, manage individual study patients, or make study assessments, bringing into question whether trial personnel have remained unequivocally objective.”

First, consider the concern that operational deviations resulting from knowledge of interim results can lead to error in estimating the true treatment effect in the study population. The relevant issue is not whether there is potential for this type of operational bias in phase II/III trials, but rather whether phase II/III designs increase the potential for this bias relative to a standard phase III trial design. In both designs, the trial statisticians will have confidential access to unblinded interim results. In phase II/III designs (with or without accrual suspension), continuing to the phase III stage makes it more generally known that the phase II bar has been passed. (Note that there is no access to the actual phase II results.) Similarly, in a standard phase III trial, continuing accrual makes it more generally known that the futility boundary has been passed. More importantly, in the settings in which phase II/III designs are conducted, the appropriate comparator for the phase II/III design is a randomized phase II trial followed by a phase III trial. In this situation, the phase III trial is conducted with full knowledge of the phase II results, so all the concerns about potential for operational bias resulting from the investigators’ and patients’ knowledge of the phase II results would be applicable to the phase III trial. In fact, the concerns would be greater for a phase II trial followed by a separate phase III trial because, unlike in a phase II/III design, the detailed results of the phase II analysis would be known.

Another FDA concern with the proposed design was the effect of temporal population trends on the results because of the suspension (End of Phase 2 Meeting Minutes, Reference ID 4096437):

“FDA stated that the primary concern is that as science and/or standard of care evolve over time, an 18-month pause in enrollment may result in a shift in patient characteristics between patients enrolled before the pause in accrual and patients enrolled after enrollment is re-initiated after analysis of PFS for the ‘Phase II’ portion of the study.”

This concern is about the generalizability of the trial results. Although it is a legitimate consideration in interpreting trial results, this concern would similarly apply to a standard phase III trial that took an extra 18 months to complete its enrollment because of slow accrual. In any event, differences in the patient population accrued over 65 months instead of 47 months are probably swamped by the difference between patients who participate in clinical trials and the general patient population, a difference investigators acknowledge and reluctantly accept.

Conclusion

When there is not enough existing evidence to warrant conducting a phase III trial, a phase II/III trial design can lead to finding improved treatments more quickly than performing a phase II trial followed by an independent phase III trial. In some situations, however, an accrual suspension will be necessary to avoid accruing most, or all, of the phase III patients while waiting for the phase II data to mature. Having an accrual suspension in a phase II/III study does not increase the potential for a biased estimate of the treatment effect (relative to standard trial designs). Theoretical concerns that the suspension will lead to additional generalizability difficulties do not appear to be large enough to matter in practice. There are well-known drawbacks to using phase II/III designs,2,3,12-14 and they are not appropriate for all situations, but whether they have an accrual suspension should not be a deciding factor.

AUTHOR CONTRIBUTIONS

Manuscript writing: All authors

Final approval of manuscript: All authors

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST


The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/jco/site/ifc.

Boris Freidlin

No relationship to disclose

Edward L. Korn

No relationship to disclose

Jeffrey S. Abrams

No relationship to disclose

REFERENCES

  • 1. Bretz F, Schmidli H, König F, et al. Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: General concepts. Biom J. 2006;48:623–634. doi: 10.1002/bimj.200510232.
  • 2. Korn EL, Freidlin B, Abrams JS, et al. Design issues in randomized phase II/III trials. J Clin Oncol. 2012;30:667–671. doi: 10.1200/JCO.2011.38.5732.
  • 3. Wang M, Dignam JJ, Zhang QE, et al. Integrated phase II/III clinical trials in oncology: A case study. Clin Trials. 2012;9:741–747. doi: 10.1177/1740774512464724.
  • 4. Redman MW, Goldman BH, LeBlanc M, et al. Modeling the relationship between progression-free survival and overall survival: The phase II/III trial. Clin Cancer Res. 2013;19:2646–2656. doi: 10.1158/1078-0432.CCR-12-2939.
  • 5. Goldman B, LeBlanc M, Crowley J. Interim futility analysis with intermediate endpoints. Clin Trials. 2008;5:14–22. doi: 10.1177/1740774507086648.
  • 6. Korn EL, Freidlin B. Outcome-adaptive randomization: Is it useful? J Clin Oncol. 2011;29:771–776. doi: 10.1200/JCO.2010.31.1423.
  • 7. Goodman SN. Stopping at nothing? Some dilemmas of data monitoring in clinical trials. Ann Intern Med. 2007;146:882–887. doi: 10.7326/0003-4819-146-12-200706190-00010.
  • 8. Freidlin B, Korn EL. Stopping clinical trials early for benefit: Impact on estimation. Clin Trials. 2009;6:119–125. doi: 10.1177/1740774509102310.
  • 9. Bauer P, Posch M. Modification of the sample size and the schedule of interim analyses in survival trials based on data inspections, by H. Schäfer and H.-H. Müller. Stat Med. 2004;23:1333–1334. doi: 10.1002/sim.1759.
  • 10. Magirr D, Jaki T, Koenig F, et al. Sample size reassessment and hypothesis testing in adaptive survival trials. PLoS One. 2016;11:e0146465. doi: 10.1371/journal.pone.0146465.
  • 11. Freidlin B, Korn EL. Sample size adjustment designs with time-to-event outcomes: A caution. Clin Trials. 2017;14:597–604. doi: 10.1177/1740774517724746.
  • 12. U.S. Department of Health and Human Services, Food and Drug Administration. Guidance for industry: Adaptive design clinical trials for drugs and biologics [draft for comment]. Rockville, MD: 2010. https://www.fda.gov/downloads/drugs/guidances/ucm201790.pdf
  • 13. Emerson SS, Fleming TR. Adaptive methods: Telling “the rest of the story”. J Biopharm Stat. 2010;20:1150–1165. doi: 10.1080/10543406.2010.514457.
  • 14. Cuffe RL, Lawrence D, Stone A, et al. When is a seamless study desirable? Case studies from different pharmaceutical sponsors. Pharm Stat. 2014;13:229–237. doi: 10.1002/pst.1622.

Articles from Journal of Clinical Oncology are provided here courtesy of American Society of Clinical Oncology
