JNCI Journal of the National Cancer Institute. 2022 Sep 26;115(1):14–20. doi: 10.1093/jnci/djac185

Augmenting randomized clinical trial data with historical control data: Precision medicine applications

Boris Freidlin, Edward L Korn
PMCID: PMC10089586  PMID: 36161487

Abstract

As precision medicine becomes more precise, the molecularly targeted subpopulations become increasingly small. This can make it challenging to conduct randomized clinical trials of the targeted therapies in a timely manner. To help with this problem of a small patient subpopulation, a frequently proposed study design is to conduct a small randomized clinical trial (RCT) with the intent of augmenting the RCT control arm data with historical data from a set of patients who have received the control treatment outside the RCT (historical control data). In particular, strategies have been developed that compare the treatment outcomes across the cohorts of patients treated with the standard (control) treatment to guide the use of the historical data in the analysis; this can lessen the well-known potential biases of using historical controls without any randomization. Using some simple examples and completed studies, we demonstrate in this commentary that these strategies are unlikely to be useful in precision medicine applications.


Advancements in cancer biology and genomics have allowed the development of therapies that are specific for molecularly driven cancer subtypes (1). These precision medicine strategies hold considerable promise; however, clinical testing of new treatments in small subpopulations can be challenging and time consuming, and it will become more so as the relevant clinical populations become increasingly refined into smaller and smaller subgroups (2). One clinical trial approach frequently proposed to accelerate drug development for molecularly selected and/or rare subpopulations is to utilize patients who have been previously treated with a standard control treatment (3-7).

The desire to use nonrandomized control data in evaluating new therapies, and debates about the hazards of using such control data to draw reliable conclusions, are not new (8–10). In the ensuing years, there have been many examples demonstrating that nonrandomized studies can yield biased results, sometimes suggesting a treatment is better when it is worse (11,12). Well-known examples of bad recommendations from nonrandomized studies include the use of high-dose chemotherapy with autologous stem cell rescue for metastatic breast cancer (13) and the use of third-generation chemotherapy regimens for advanced non-Hodgkin lymphoma (14). [Note that the use of real-world evidence has not resolved this issue (15).] When a dramatic benefit is seen with a new treatment, there is no need for a randomized clinical trial (RCT) or even a control group [eg, the use of imatinib in patients with chronic myeloid leukemia (16,17)]. Unfortunately, most cancer treatment advances are not dramatic and thus require comparison with a relevant control group to assess the potential benefit as well as the relative toxicities.

To mitigate concerns associated with nonrandomized studies that rely solely on historical controls, Pocock (18) suggested a design in which data from the control arm of an RCT are augmented with data from an “acceptable” historical control cohort. (“Historical control cohort” is used throughout as shorthand for a nonrandomized control cohort.) The criteria given for when the historical data would be acceptable were quite stringent (Box 1), implicitly assuming that known baseline prognostic variables capture most of the variability in patient outcomes. Although this assumption is impossible to completely validate in practice, the Pocock conditions for the acceptability of using historical control data can sometimes appear to be approximately satisfied. For example, sequential one-armed trials of new anticancer therapies conducted by the US National Cancer Institute’s Children’s Oncology Group, which enrolls a very high percentage of eligible US pediatric cancer patients, ensure that the trial population is not too different from trial to trial (19). In general, however, the Pocock criteria would rarely be satisfied. In particular, any time trends in the patient population, ancillary care, or diagnostic/response methodology can lead to a violation of the Pocock criteria and misleading results (20).

Box 1.

Conditions for the acceptability of using data from a historical control cohort with control-arm data from a randomized clinical trial, according to Pocock (18).

  1. Such a cohort must have received a precisely defined standard treatment, which must be the same as the treatment for the randomized controls.

  2. The cohort must have been part of a recent clinical study, which contained the same requirements for patient eligibility.

  3. The methods of treatment evaluation must be the same.

  4. The distributions of important patient characteristics in the cohort should be comparable with those in the new trial.

  5. The previous study must have been performed in the same organization with largely the same clinical investigators.

  6. There must be no other indications leading one to expect differing results between the randomized and historical controls. For instance, more rapid accrual on the new study might lead one to suspect less enthusiastic participation of investigators in the previous study so that the process of patient selection may have been different.

The statistical rationale behind the augmentation approach is that by borrowing information from the historical controls, the design can increase the precision of the estimated control treatment outcomes for use in estimating the true experimental vs control treatment effect (δ). The simplest version of this approach pools data from the RCT control and historical control patients without consideration of whether the outcome data for the historical controls look similar to those from the RCT. In this simplest version of so-called static borrowing, the potentially large amount of historical control data can swamp the relatively small amount of RCT control data, leading to essentially a historically controlled study [with all the ensuing biases of such a nonrandomized study (21)]. This has led to the development of many modeling approaches that are designed to lessen the potential bias by using the outcomes of the historical control and RCT control patients to help decide how much of the historical information to “borrow” (ie, how much weight, if any, to give the historical control data in the analysis). One approach is to downweight the historical control data as compared with the RCT control data, with more downweighting if the outcomes of these 2 control datasets are more disparate [dynamic borrowing (21); see Figure 1]. A simple version of dynamic borrowing uses a binary decision when comparing the outcomes of the RCT control and historical control cohorts: if they are too disparate by some predetermined criteria even after controlling for baseline variables, then the analysis proceeds using only the RCT data; otherwise, the historical control data is pooled with the RCT control data to estimate the experimental vs control treatment effect [test-then-pool (21)].

Figure 1.

Schematic diagram of differential weighting of historical control data using dynamic weighting augmentation for estimating a treatment effect δ. Large ovals represent trial cohorts. Outcomes of patients treated in the historical control cohort are represented by empty circles; outcomes of the randomized clinical trial control and experimental arm patients are represented by empty triangles and filled triangles, respectively. RCT = randomized clinical trial.

A different modeling approach (meta-analytic approach) using historical control outcome data has been developed for settings where multiple historical control trial cohorts are available, representing all the clinical trials in the relevant patient populations (22). Similar to meta-analysis, this approach is based on isolating between-trial and within-trial sources of variability. The approach then adjusts for the between-trial variability: the more disparate the trial-level outcomes, the less weight is given to the historical control data in the analysis (Figure 2). Intuitively, high between-trial variability increases the variability of the treatment effect estimated using these data, suggesting that the data should be downweighted. Furthermore, if there are systematic differences between the RCT control data and the historical control cohorts (Figure 3), then one needs to use more sophisticated models that downweight the historical control data even further if they appear to be divergent from the RCT control data (as is done with dynamic borrowing).

Figure 2.

Schematic diagram of differential weighting of historical control data using a meta-analytic approach for estimating a treatment effect δ. Large ovals represent trial cohorts. Outcomes of patients treated in the historical control cohorts are represented by empty circles; outcomes of the randomized clinical trial control and experimental arm patients are represented by empty triangles and filled triangles, respectively. RCT = randomized clinical trial.

Figure 3.

Schematic diagram of differential weighting of historical control data using a meta-analytic approach when accounting for possible nonexchangeability. Large ovals represent trial cohorts. Outcomes of patients treated in the historical control cohorts are represented by empty circles; outcomes of the randomized clinical trial control and experimental arm patients are represented by empty triangles and filled triangles, respectively. RCT = randomized clinical trial.

With increasing interest in evaluating treatments in small biomarker-defined subsets, augmenting RCT data with historical control data using dynamic borrowing or meta-analytic approaches is ostensibly attractive: the historical control data are used to improve the precision of estimated treatment effects only to the extent that it appears safe to do so. In this commentary, we evaluate the applicability of these strategies for precision medicine applications.

Operating characteristics of augmentation

There are numerous simulation studies and theoretical calculations of the operating characteristics of various augmentation strategies in the literature, with excellent discussions available (21,23–25). To summarize, unless one knows or is willing to assume a priori how close the outcomes are for the patients in the control arm of the RCT and the patients in the historical data, inclusion of the historical data in the analysis (in some manner) can 1) inflate the type 1 error for testing the null hypothesis that the new treatment is no better than the control treatment (false-positive rate), 2) reduce the power of rejecting the null hypothesis when the new treatment is truly better than the control treatment, and/or 3) introduce bias and/or reduce the precision of the estimated treatment effect. We present 2 simple calculations to demonstrate intuitively the limitations of these methods in the precision-medicine context.

Dynamic borrowing

Consider an RCT with 20 patients in each arm that has 60% power to detect an improvement in a mean outcome of 10 units (eg, lowering mean systolic blood pressure by 10 mmHg) using a one-sided type 1 error of 0.025. A 60% power is lower than what would usually be used (eg, 80% or 90%), but we are assuming we are in the situation where there are at most 40 patients available to participate in a reasonable timeframe (eg, because the targeted subpopulation is rare). (Note that if a larger number of patients were available, a standard RCT without augmentation could be used.) To improve the power, consider an augmentation strategy that will potentially utilize data from 200 historical control patients. To assess whether to include these patients, consider a 95% confidence interval for the parameter (β) representing the bias in using the historical control data (ie, the difference in expected mean outcomes between the RCT control and historical control patients). With these sample sizes, the half-width of this confidence interval is 6.6 (see the Supplementary Methods, available online). This implies that even if we observe RCT control and historical control means that are identical, the 95% confidence interval of their difference will be -6.6 to 6.6, which is large in relation to the targeted trial difference of 10. Note that further increasing the number of historical control patients does not help here, as the half-width of the 95% confidence interval can never get smaller than 6.3. (This is because the precision is limited by the relatively small sample size of the RCT control data.) The underlying problem is that there is not enough precision to rule out relevant differences between the RCT control and the historical control outcomes to allow one to decide how, and even whether, to augment the RCT control data with the historical control data.
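The 6.6 and 6.3 half-widths can be reproduced from the stated design parameters. The sketch below is a known-variance normal-approximation calculation (mirroring, but not taken from, the Supplementary Methods): it first backs out the outcome standard deviation implied by the design (20 per arm, 60% power for a 10-unit effect at one-sided 0.025) and then computes the confidence interval half-width for the bias β.

```python
from statistics import NormalDist

nd = NormalDist()
z_alpha, z_beta = nd.inv_cdf(0.975), nd.inv_cdf(0.60)

# Outcome SD implied by the design: 20/arm, 60% power for a 10-unit effect
# at one-sided alpha = 0.025 (known-variance z-test approximation).
delta_target, n_arm = 10.0, 20
sigma = delta_target / ((z_alpha + z_beta) * (2 / n_arm) ** 0.5)  # about 14.3

def bias_half_width(n_rct_ctrl, n_hist):
    """Half-width of the 95% CI for beta, the difference in expected mean
    outcomes between the RCT control and historical control patients."""
    return z_alpha * sigma * (1 / n_rct_ctrl + 1 / n_hist) ** 0.5

print(round(bias_half_width(20, 200), 1))      # 6.6, as in the text
print(round(bias_half_width(20, 10 ** 9), 1))  # 6.3: the limit as n_hist grows
```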

It is sometimes suggested that an unbalanced randomization favoring the experimental arm (eg, 2:1 or 3:1) be used in the RCT so that, with inclusion of the historical control data, the amount of information will be more equal for the standard and experimental treatments (18,26,27). This approach can be problematic. Continuing with the simple example above, suppose a 3:1 randomization was used so that there were 30 patients treated with the experimental therapy and 10 patients with the control treatment. When the observed RCT control mean (from 10 patients) and the observed historical control mean (from 200 patients) are identical, the 95% confidence interval for the difference in control means would be -9.1 to 9.1, worse than when a 1:1 randomization is used. Furthermore, if the outcome data suggest the historical control data should not be used, then one is left with an unbalanced RCT, which will have less power than a trial with equal sample sizes in the trial arms (48% instead of 60% in this example). Several authors have proposed addressing this problem by adjusting the randomization ratio according to accumulating information (on the amount of borrowing from the historical data) during the trial in order to balance the final experimental vs pooled historical and RCT control sample sizes (3,22). Unfortunately, interim changes in the RCT randomization ratio lead to statistical inefficiency and thus are not recommended (28).
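The 9.1 half-width and the 48% power under a 3:1 randomization follow from the same known-variance normal approximation (an assumption mirroring the design calculation, not necessarily the authors' exact method):

```python
from statistics import NormalDist

nd = NormalDist()
z_alpha = nd.inv_cdf(0.975)
# Outcome SD implied by the original balanced design (20/arm, 60% power, 10 units).
sigma = 10.0 / ((z_alpha + nd.inv_cdf(0.60)) * (2 / 20) ** 0.5)

def power_one_sided(n_exp, n_ctrl, delta=10.0):
    """Power of a one-sided 0.025-level z-test (known variance) for a true effect delta."""
    se = sigma * (1 / n_exp + 1 / n_ctrl) ** 0.5
    return nd.cdf(delta / se - z_alpha)

def bias_half_width(n_rct_ctrl, n_hist):
    """Half-width of the 95% CI for the RCT-control vs historical-control mean difference."""
    return z_alpha * sigma * (1 / n_rct_ctrl + 1 / n_hist) ** 0.5

print(round(bias_half_width(10, 200), 1))  # 9.1: wider than under 1:1 randomization
print(round(power_one_sided(20, 20), 2))   # 0.60 for the balanced design
print(round(power_one_sided(30, 10), 2))   # 0.48 for 3:1 if the historical data prove unusable
```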

If one knows or is willing to assume something about the range of the bias β before seeing the outcome data, then this could potentially help with the incorporation of the historical data. For example, one could ask a group of experts what they believed was the likely range of β. With this prior distribution, one could perform a Bayesian analysis that combines the historical data with the RCT data, perhaps downweighting the influence of the historical data (29). Unfortunately, eliciting the prior distribution of β is a daunting task and, to the extent the prior does not match reality, the resulting inference for the treatment effect may be biased. To avoid this difficulty, it is frequently suggested that a “noninformative” prior be used, which in this case would specify that all values of β are equally likely. However, this yields a Bayesian 95% posterior interval for β that is identical to the frequentist 95% confidence interval (30), resulting in the same issues regarding the ability to reliably determine the degree of borrowing as described earlier in this section. Therefore, unless one is willing to assume something about the range and distribution of the bias β (before seeing the outcome data), dynamic borrowing is not useful (24).
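How an informative prior on the bias β translates into downweighting can be illustrated with a conjugate-normal sketch. This is an illustrative model, not the method of any specific cited paper: the historical mean is treated as estimating μc + β with β ~ N(0, s²), so the prior variance on the bias simply inflates the historical data's effective variance; a vaguer prior means less borrowing, and an effectively flat prior means essentially none.

```python
def borrow_with_bias_prior(m_rct, v_rct, m_hist, v_hist, prior_sd_beta):
    """Precision-weighted estimate of the control mean when the historical mean
    is modeled as estimating mu_c + beta with prior beta ~ N(0, prior_sd_beta^2).
    Illustrative conjugate-normal sketch: the prior variance on the bias inflates
    the historical data's effective variance, reducing its weight."""
    w_rct = 1.0 / v_rct
    w_hist = 1.0 / (v_hist + prior_sd_beta ** 2)
    m = (w_rct * m_rct + w_hist * m_hist) / (w_rct + w_hist)
    v = 1.0 / (w_rct + w_hist)
    return m, v

s2 = 14.3 ** 2  # per-patient outcome variance from the running example
# Prior certain that beta = 0: the 200 historical patients dominate the 20 RCT controls.
m_pool, v_pool = borrow_with_bias_prior(0.0, s2 / 20, 2.0, s2 / 200, prior_sd_beta=0.0)
# Effectively flat ("noninformative") prior on beta: essentially no borrowing,
# matching the point that a flat prior reproduces the frequentist analysis.
m_flat, v_flat = borrow_with_bias_prior(0.0, s2 / 20, 2.0, s2 / 200, prior_sd_beta=1e6)
print(m_pool, m_flat)  # pooled estimate pulled toward the historical mean vs about 0.0
```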

Meta-analytic approaches

These approaches rely on estimating the between-trial variability of the outcomes of the historical control cohorts. Unfortunately, in precision medicine applications, it may be impossible to estimate this variability reliably enough to help with decision making, because there are unlikely to be many historical trial cohorts available in which the patients were evaluated with the relevant biomarker that defines the targeted population and received the relevant control treatment. Moreover, there are unlikely to be many biomarker-positive patients in each of these cohorts. [Note that one cannot simply use historical control patients regardless of biomarker status because biomarker positivity may be prognostic (31).] Consider the situation as described in the previous section (RCT restricted to biomarker-positive patients with 20 patients in each arm, targeting a treatment effect of 10 units with 60% power). Suppose there are 3 historical trial cohorts with 10 biomarker-positive patients in each cohort. Furthermore, suppose the standard deviation of the between-trial results is 5 units, a large quantity (as compared with the targeted treatment difference of 10) implying that the historical control data are not useful. However, it can be shown via simulation that 36% of the time the estimated between-trial standard deviation will be 0 (see the Supplementary Methods, available online), suggesting that one could use the historical control data by simply pooling it with the RCT control data. If one did this, the actual type 1 error would be 7.1% instead of the nominal 2.5% for cases where the estimated between-trial standard deviation was 0 (see the Supplementary Methods, available online).
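The roughly 36% figure can be reproduced with a short simulation. The sketch below uses the DerSimonian-Laird moment estimator of the between-trial variance τ² (an assumption; the paper's Supplementary Methods may use a different estimator, although moment and likelihood estimators behave similarly here) with 3 cohorts of 10 patients, true between-trial SD of 5, and the outcome SD implied by the running example.

```python
import random

random.seed(1)

SIGMA2_W = 14.3 ** 2 / 10  # within-cohort variance of a mean: 10 biomarker-positive patients
TAU = 5.0                  # true between-trial standard deviation
K = 3                      # number of historical control cohorts

def dl_tau2(cohort_means, v_within):
    """DerSimonian-Laird moment estimate of the between-trial variance tau^2
    (equal within-trial variances, so all cohorts get equal weight)."""
    w = 1.0 / v_within
    ybar = sum(cohort_means) / len(cohort_means)
    q = sum(w * (y - ybar) ** 2 for y in cohort_means)
    c = w * (len(cohort_means) - 1)  # Sum(w) - Sum(w^2)/Sum(w) in the equal-weight case
    return max(0.0, (q - (len(cohort_means) - 1)) / c)

n_sim = 20000
n_zero = sum(
    dl_tau2([random.gauss(0.0, (SIGMA2_W + TAU ** 2) ** 0.5) for _ in range(K)], SIGMA2_W) == 0.0
    for _ in range(n_sim)
)
frac_zero = n_zero / n_sim
print(frac_zero)  # roughly 0.36: the between-trial SD is estimated as 0 about a third of the time
```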

This variability provides an upper bound on the utility of using the historical data as we have assumed the ideal situation where the current trial cohort can be viewed as coming from the same population of cohorts as these historical trials [exchangeability (32)]. If exchangeability is too strong an assumption, one can attempt to downweight the historical control data as is done with dynamic borrowing using the distance between the trial means for the RCT control data and the historical control cohorts (Figure 3). However, with a small number of cohorts as is likely in precision-medicine applications, this will not be possible unless one makes strong a priori assumptions about the range of trial effects.

Applications using augmentation strategies

We consider examples of published studies where the analysis strategy for including the historical control data with the RCT data was prespecified before the RCT was concluded. The examples were identified by first considering 2 papers: a seminal 1976 paper by Pocock (18) that originated this methodology and Viele et al. (21), a comprehensive review authored by experts in these designs. We then systematically examined the 258 papers that cited the Pocock paper (18) and the 210 papers that cited the Viele et al. paper (21) for relevant trials, and we also examined whether these citing papers referenced relevant trials. Seven studies were identified: 3 using dynamic borrowing with a single historical control cohort and 4 using meta-analytic approaches with multiple historical control cohorts; 1 of these studies (33) was stopped after only 6 patients were enrolled because of the results of another study and is not considered further here. Although these studies did not involve precision medicine, many involved rare disease settings, and thus they can shed light on operational issues in using augmentation strategies with small relevant populations. For studies that end up augmenting with historical data, the “effective sample size” of these data represents how many RCT control patients their data are equivalent to (22). This is an intuitive way to quantify the degree to which the historical data were downweighted when combined with the RCT control data. For example, an effective sample size of 20 for 200 historical control patients means that one could have obtained the same precision in estimating the treatment effect if one had randomly assigned another 20 patients to the control arm instead of using the historical control data.
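For a normal model, the effective sample size concept can be sketched as a precision ratio. This precision-based definition is one common convention (an assumption; effective sample size is defined in several ways in the literature), and the 10-fold downweighting example below is hypothetical.

```python
def effective_sample_size(var_contribution, sigma2):
    """Effective sample size of a historical-data contribution in a normal model:
    the number of randomized control patients that would give the same precision.
    `var_contribution` is the variance with which the historical component
    enters the analysis; sigma2 is the per-patient outcome variance."""
    return sigma2 / var_contribution

sigma2 = 14.3 ** 2  # per-patient outcome variance from the running example
# 200 historical patients taken at face value contribute an ESS of 200 ...
ess_full = effective_sample_size(sigma2 / 200, sigma2)
# ... but if borrowing inflates their variance 10-fold, the ESS drops to 20.
ess_down = effective_sample_size(10 * sigma2 / 200, sigma2)
print(ess_full, ess_down)
```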

Examples using a single historical cohort and dynamic borrowing

Bodey et al. (34) conducted an RCT of 3 different antibiotic regimens (ticarcillin-vancomycin; ticarcillin-vancomycin-ceftazidime; ceftazidime-vancomycin [CV]) for adult cancer patients with neutropenic fever using a 2:2:1 randomization (number of evaluable patients = 170, 189, and 97, with special interest in those with gram-negative infections = 43, 20, and 14, respectively). The study plan was to augment the CV RCT data with historical control data on 25 patients with gram-negative infections treated on the CV arm of a previous RCT (35). However, the investigators noted a large difference between the 2 CV cohorts in the activity of the CV regimen against gram-negative infections and (wisely) decided not to use the historical data in the analysis, noting, “It was unfortunate that we chose to use a 2:1 randomization in this study, so that fewer patients were assigned to CV” (34).

Yamaue et al. (36) conducted an overall survival noninferiority RCT of alternate-day vs standard daily S-1 treatment for unresectable advanced pancreatic cancer using a 2:1 randomization (number of evaluable patients = 121 and 64, respectively). The study plan was to use a test-then-pool strategy to augment the daily S-1 RCT data with historical control data from 280 patients on the daily S-1 arm of a previous RCT (37). The overall survival difference between the current RCT daily S-1 treatment arm and the previous S-1 patient data was too large to allow use of the historical control data. Noninferiority was not demonstrated (perhaps because of the wide confidence interval for the estimated hazard ratio), and the investigators noted, “historical S-1 GEST results were not pooled, resulting in the smaller sample size in the daily S-1 treatment group. This might have affected several aspects of the study, including the assessment of the primary endpoint and its subgroup analyses” (36).

Salloway et al. (38) conducted an RCT of gantenerumab, solanezumab, and placebo (numbers of individuals = 52, 50, and 40, respectively) for dominantly inherited Alzheimer disease. The prespecified analysis plan included dynamic-borrowing augmentation of the RCT placebo data with data from 69 individuals from an observational study conducted by the same investigators. The (negative) results reported (38) appear to be restricted to the randomly assigned patients and do not use the historical control data.

Examples using multiple historical cohorts with a meta-analytic approach

Hueber et al. (39) conducted an RCT of secukinumab (an anti-interleukin 17A monoclonal antibody) vs placebo for patients with moderate to severe Crohn disease using a 2:1 randomization (number of patients = 39 vs 20, respectively). The RCT placebo data were augmented with placebo data from 6 prior studies (number of patients = 671, effective sample size = 20). The results of the study with the augmented data were reported as negative. However, it appears that the analysis of the primary endpoint would have been the same if the augmented historical control data had not been used.

Baeten et al. (40) conducted an RCT of secukinumab vs placebo for patients with active ankylosing spondylitis using a 4:1 randomization (number of patients = 24 vs 6, respectively). The RCT placebo data were augmented with placebo data from 8 prior studies (number of patients = 533, effective sample size = 43). The results of the study using the augmented data were reported as positive. However, it appears that the analysis of the primary endpoint would have been the same if the augmented historical control data had not been used.

Holmes et al. (41) conducted an RCT to assess noninferiority or superiority of a sirolimus-eluting stent vs intracoronary vascular brachytherapy (control treatment) for treatment of restenosis following bare-metal stent implantation using a 2:1 randomization (number of evaluable patients = 259 and 125, respectively). The RCT control data were augmented with data from 2 prior trials where patients were treated with vascular brachytherapy. Neither the sample size of the historical data nor its effective sample size is provided in the publication, although it is stated that it was assumed for planning that there would be 256 historical control patients (41). The results of the trial demonstrated the superiority of the sirolimus-eluting stent and were reported to remain the same regardless of whether the historical data were used in the analysis.

Discussion

Theoretically, using outcome data from the RCT control and historical control patients to decide how to incorporate the historical control data into the analysis is an attractive idea. However, the operating characteristics of this strategy suggest that for precision-medicine applications, the sample sizes will typically be too small to effectively guide dynamic borrowing, and the limited number of historical trial cohorts will render meta-analytic approaches unreliable. Moreover, in precision medicine applications, an additional complication is that molecular classifications involving novel biomarkers tend to be regularly updated, which makes it difficult to match current and historical control populations (42). This would argue for the importance of more systematic banking of specimens for future biomarker evaluation to identify targeted subpopulations. Although this may ameliorate some of the concerns, considerable challenges remain for accurate molecular matching of the subgroups as well as accurate matching of the background diagnostic, clinical care, and follow-up standards.

There are some lessons that can be drawn from the limited number of published applications with augmented RCTs. First, if an unbalanced randomization is used in the RCT (as frequently suggested) and the historical control information proves to be unusable, then one is left with the analysis of an inefficient unbalanced RCT, which is particularly suboptimal with a small number of patients. Second, the number of historical control patients may be an order of magnitude larger than the effective sample size of these patients [this can also be seen in retrospective studies of historical control cohorts (22,43)]. This implies high heterogeneity in the control cohorts, raising serious concern about the utility of including them in analyses.

For a phase II trial, one could argue that the risk of bias due to incorporating historical controls is acceptable (because the trial would be followed by a definitive phase III trial). However, with the use of a relaxed type I error (eg, α = 10%) and intermediate endpoints, randomized phase II trials targeting robust treatment effects can be relatively small (eg, 60 patients), lessening the need to augment with historical controls (44). Alternatively, in a setting where the clinical outcomes across multiple trials are consistently poor (eg, metastatic pancreatic cancer, diffuse intrinsic pontine glioma, or metastatic melanoma in the precheckpoint-inhibitor era), the historical data from these trials can be used to reliably set an activity benchmark for a future single-arm phase II trial (45–47).

In settings where a disease is uncommon or a rare molecularly defined subgroup is of interest, it can be challenging to conduct a timely well-powered RCT, and alternative approaches become more attractive. One approach to reducing required sample size is relaxing the evidentiary threshold [ie, using a relaxed type I error (3,19,48,49); see reference (50) for an example of a precision-medicine trial using a one-sided 0.15 type I error]. However, it should be appreciated that in the precision oncology setting, it will often be impossible to reliably demonstrate the benefit of a new treatment that is only modestly better than the current standard of care (48). On the other hand, if one is targeting a large treatment effect (which would be justified in precision oncology studies evaluating targeted agents in rare molecularly defined subgroups), a small RCT can be performed [eg, (51)], and the average sample size and duration of the RCT can be further reduced with the use of interim monitoring (48); if randomization is impossible and Pocock conditions are approximately satisfied, then a single-arm trial with no randomization can be considered. In summary, the strategy that augments RCT data with historical control data does not appear useful for precision-medicine applications.

Funding

None.

Notes

Role of the funder: Not applicable.

Disclosures: None.

Author contributions: Conceptualization, all authors; Writing-original draft, all authors; Writing-review and editing, all authors.

Supplementary Material

djac185_Supplementary_Data

Contributor Information

Boris Freidlin, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, USA.

Edward L Korn, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, USA.

Data availability

No primary data was used in this manuscript. All figures/tables cited are from published sources.

References

  • 1. Jameson JL, Longo DL. Precision medicine–personalized, problematic, and promising. N Engl J Med. 2015;372(23):2229-2234.
  • 2. Nass SJ, Rothenberg ML, Pentz R, et al. Accelerating anticancer drug development–opportunities and trade-offs. Nat Rev Clin Oncol. 2018;15(12):777-786.
  • 3. US Food and Drug Administration. Demonstrating Substantial Evidence of Effectiveness for Human Drug and Biological Products: Guidance for Industry; 2019. https://www.fda.gov/media/133660/download. Accessed September 3, 2022.
  • 4. Beaulieu-Jones BK, Finlayson SG, Yuan W, et al. Examining the use of real-world evidence in the regulatory process. Clin Pharmacol Ther. 2020;107(4):843-852.
  • 5. Feinberg BA, Gajra A, Zettler ME, et al. Use of real-world evidence to support FDA approval of oncology drugs. Value Health. 2020;23(10):1358-1365.
  • 6. Rahman R, Ventz S, McDunn J, et al. Leveraging external data in the design and analysis of clinical trials in neuro-oncology. Lancet Oncol. 2021;22(10):e456-e465.
  • 7. Yap TA, Jacobs I, Andre EE, et al. Application of real-world data to external control groups in oncology clinical trial drug development. Front Oncol. 2021;11:695936.
  • 8. Chalmers TC, Block JB, Lee S. Controlled studies in clinical cancer research. N Engl J Med. 1972;287(2):75-78.
  • 9. Gehan EA, Freireich EJ. Non-randomized controls in cancer clinical trials. N Engl J Med. 1974;290(4):198-203.
  • 10. Byar DP, Simon RM, Friedewald WT, et al. Randomized clinical trials. Perspectives on some recent ideas. N Engl J Med. 1976;295(2):74-80.
  • 11. Freidlin B, Korn EL. Assessing causal relationships between treatments and clinical outcomes: always read the fine print. Bone Marrow Transplant. 2012;47(5):626-632.
  • 12. Prasad V, Vandross A, Toomey C, et al. A decade of reversal: an analysis of 146 contradicted medical practices. Mayo Clin Proc. 2013;88(8):790-798.
  • 13. Mello MM, Brennan TA. The controversy over high-dose chemotherapy with autologous bone marrow transplant for breast cancer. Health Aff (Millwood). 2001;20(5):101-117.
  • 14. Fisher RI, Gaynor ER, Dahlberg S, et al. Comparison of a standard regimen (CHOP) with three intensive chemotherapy regimens for advanced non-Hodgkin’s lymphoma. N Engl J Med. 1993;328(14):1002-1006.
  • 15. Boyle JM, Hegarty G, Frampton C, et al. Real-world outcomes associated with new cancer medicines approved by the Food and Drug Administration and European Medicines Agency: a retrospective cohort study. Eur J Cancer. 2021;155:136-144.
  • 16. Johnson JR, Bross P, Cohen M, et al. Approval summary: imatinib mesylate capsules for treatment of adult patients with newly diagnosed Philadelphia chromosome-positive chronic myelogenous leukemia in chronic phase. Clin Cancer Res. 2003;9(6):1972-1979.
  • 17. Cohen MH, Williams G, Johnson JR, et al. Approval summary for imatinib mesylate capsules in the treatment of chronic myelogenous leukemia. Clin Cancer Res. 2002;8(5):935-942.
  • 18. Pocock SJ. The combination of randomized and historical controls in clinical trials. J Chronic Dis. 1976;29(3):175-188.
  • 19. Renfro LA, Ji L, Piao J, Onar-Thomas A, et al. Trial design challenges and approaches for precision oncology in rare tumors: experiences of the Children’s Oncology Group. J Clin Oncol Precis Oncol. 2019;3:PO.19.00060.
  • 20. Dodd LE, Freidlin B, Korn EL. Platform trials–beware the noncomparable control group. N Engl J Med. 2021;384(16):1572-1573.
  • 21. Viele K, Berry S, Neuenschwander B, et al. Use of historical control data for assessing treatment effects in clinical trials. Pharm Stat. 2014;13(1):41-54.
  • 22. Neuenschwander B, Capkun-Niggli G, Branson M, Spiegelhalter DJ. Summarizing historical information on controls in clinical trials. Clin Trials. 2010;7(1):5-18.
  • 23. Cuffe RL. The inclusion of historical control data may reduce the power of a confirmatory study. Stat Med. 2011;30(12):1329-1338.
  • 24. Galwey NW. Supplementation of a clinical trial by historical control data: is the prospect of dynamic borrowing an illusion? Stat Med. 2017;36(6):899-916.
  • 25. Kopp-Schneider A, Calderazzo C, Wiesenfarth M.. Power gains by using external information in clinical trials are typically not possible when requiring strict type I error control. Biom J . 2020;62(2):361-374. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Hobbs BP, Carlin BP, Sargent DJ.. Adaptive adjustment of the randomization ratio using historical control data. Clin Trials. 2013;10(3):430-440. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Mishra-Kalyani PS, Amiri Kordestani L, Rivera DR, et al. External control arms in oncology: current use and future directions. Ann Oncol. 2022;33(4):376-383. [DOI] [PubMed] [Google Scholar]
  • 28. Korn EL, Freidlin B.. Time trends with response-adaptive randomization: the inevitability of inefficiency. Clin Trials. 2022;19(2):158-161. [DOI] [PubMed] [Google Scholar]
  • 29. Ibrahim JG, Chen MH, Gwon Y, Chen F.. The power prior: theory and applications. Statist Med. 2015;34(28):3724-3749. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Spiegelhalter DJ, Abrams KR, Myles JP.. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. New York: Wiley, 2003. [Google Scholar]
  • 31. McShane LM, Hunsberger S, Adjei AA.. Effective incorporation of biomarkers into phase II trials. Clin Cancer Res. 2009;15(6):1898-1905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Gray CM, Grimson F, Layton D, et al. A framework for methodological choice and evidence assessment for studies using external comparators from real- world data. Drug Saf. 2020;43(7):623-633. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Lake SL, Quintana MA, Broglio K, et al. Bayesian adaptive design for clinical trials in Duchenne muscular dystrophy. Stat Med. 2021;40(19):4167-4184. [DOI] [PubMed] [Google Scholar]
  • 34. Bodey GP, Fainstein V, Elting LS, et al. Beta-lactam regimens for the febrile neutropenic patient. Cancer. 1990;65(1):9-16. [DOI] [PubMed] [Google Scholar]
  • 35. Anaissie EJ, Fainstein V, Bodey GP, et al. Randomized trial of beta-lactam regimens in febrile neutropenic cancer patients. Am J Med. 1988;84(3 pt 2):581-589. [DOI] [PubMed] [Google Scholar]
  • 36. Yamaue H, Shimizu A, Hagiwara Y, et al. Multicenter, randomized, open-label phase II study comparing S-1 alternate-day oral therapy with the standard daily regimen as a first-line treatment in patients with unresectable advanced pancreatic cancer. Cancer Chemother Pharmacol. 2017;79(4):813-823. [DOI] [PubMed] [Google Scholar]
  • 37. Ueno H, Ioka T, Ikeda M, et al. Randomized phase III study of gemcitabine plus S-1, S-1 alone, or gemcitabine alone in patients with locally advanced and metastatic pancreatic cancer in Japan and Taiwan: GEST study. J Clin Oncol. 2013;31(13):1640-1648. [DOI] [PubMed] [Google Scholar]
  • 38. Salloway S, Farlow M, McDade  E, et al. ; for the Dominantly Inherited Alzheimer Network-Trials Unit. A trial of gantenerumab or solanezumab in dominantly inherited Alzheimer’s disease. Nat Med. 2021;27(7):1187-1196. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Hueber W, Sands BE, Lewitzky S, et al. ; for the Secukinumab in Crohn’s Disease Study Group. Secukinumab, a human anti-IL-17A monoclonal antibody, for moderate to severe Crohn’s disease: unexpected results of a randomised, double-blind placebo-controlled trial. Gut. 2012;61(12):1693-1700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Baeten D, Baraliakos X, Braun J, et al. Anti-interleukin-17A monoclonal antibody secukinumab in treatment of ankylosing spondylitis: a randomised, double-blind, placebo-controlled trial. Lancet. 2013;382(9906):1705-1713. [DOI] [PubMed] [Google Scholar]
  • 41. Holmes DR, Teirstein P, Satler L, et al. Sirolimus-eluting stents vs vascular brachytherapy for in-stent restenosis within bare-metal stents: the SISR randomized trial. JAMA. 2006;295(11):1264-1273.
  • 42. Irwin MS, Naranjo A, Zhang FF, et al. Revised neuroblastoma risk classification system: a report from the Children’s Oncology Group. J Clin Oncol. 2021;39(29):3229-3241.
  • 43. Gsteiger S, Neuenschwander B, Mercier F, Schmidli H. Using historical control information for the design and analysis of clinical trials with overdispersed count data. Stat Med. 2013;32(21):3609-3622.
  • 44. Rubinstein LV, Korn EL, Freidlin B, Hunsberger S, Ivy SP, Smith MA. Design issues of randomized phase II trials and a proposal for phase II screening trials. J Clin Oncol. 2005;23(28):7199-7206.
  • 45. Korn EL, Liu PY, Lee SJ, et al. Meta-analysis of phase II cooperative group trials in metastatic stage IV melanoma to determine progression-free and overall survival benchmarks for future phase II trials. J Clin Oncol. 2008;26(4):527-534.
  • 46. Baxter PA, Su JM, Onar-Thomas A, Billups CA, et al. A phase I/II study of veliparib (ABT-888) with radiation and temozolomide in newly diagnosed diffuse pontine glioma: a Pediatric Brain Tumor Consortium study. Neuro Oncol. 2020;22(6):875-885.
  • 47. Philip PA, Chansky K, LeBlanc M, et al. Historical controls for metastatic pancreatic cancer: benchmarks for planning and analyzing single-arm phase II trials. Clin Cancer Res. 2014;20(16):4176-4185.
  • 48. Korn EL, McShane LM, Freidlin B. Statistical challenges in the evaluation of treatments for small patient populations. Sci Transl Med. 2013;5(178):178sr3.
  • 49. Parmar MK, Sydes MR, Morris TP. How do you design randomised trials for smaller populations? A framework. BMC Med. 2016;14(1):183.
  • 50. Brown PA, Kairalla JA, Hilden JM, et al. FLT3 inhibitor lestaurtinib plus chemotherapy for newly diagnosed KMT2A-rearranged infant acute lymphoblastic leukemia: Children’s Oncology Group trial AALL0631. Leukemia. 2021;35(5):1279-1290.
  • 51. Gounder MM, Mahoney MR, Van Tine BA, et al. Sorafenib for advanced and refractory desmoid tumors. N Engl J Med. 2018;379(25):2417-2428.
  • 52. Searle SR, Casella G, McCulloch CE. Variance Components. New York: Wiley; 1992.

Associated Data


Supplementary Materials

djac185_Supplementary_Data

Data Availability Statement

No primary data were used in this manuscript. All figures and tables cited are from published sources.


Articles from JNCI Journal of the National Cancer Institute are provided here courtesy of Oxford University Press