Neuro-Oncology. 2024 Jan 22;26(5):796–810. doi: 10.1093/neuonc/noae005

Leveraging external control data in the design and analysis of neuro-oncology trials: Pearls and perils

Mei-Yin C Polley 1,2,, Daniel Schwartz 3, Theodore Karrison 4,5, James J Dignam 6,7
PMCID: PMC11066907  PMID: 38254183

Abstract

Background

Randomized controlled trials have been the gold standard for evaluating medical treatments for many decades, but they are often criticized for requiring large sample sizes. Given the urgent need for better therapies for glioblastoma, it has been argued that data collected from patients treated with the standard regimen can provide high-quality external control data to supplement or replace the concurrent control arm in future glioblastoma trials.

Methods

In this article, we provide an in-depth appraisal of the use of external control data in the context of neuro-oncology trials. We describe several clinical trial designs with particular attention to how external information is utilized and address common fallacies that may lead to inappropriate adoption of external control data.

Results

Using 2 completed glioblastoma trials, we illustrate the use of an assessment tool that lays out a blueprint for assembling a high-quality external control data set. Using statistical simulations, we highlight scenarios in which these approaches can fail to control the type I error rate.

Conclusions

While this approach may hold promise in generating informative data in certain settings, this sense of optimism should be tempered by a healthy dose of skepticism owing to the myriad design and analysis challenges articulated in this review. Importantly, careful planning is key to its successful implementation.

Keywords: biases, concurrent control, external control, glioblastoma, type I error


Randomized controlled trials (RCTs) have been the gold standard for evaluating the efficacy and safety of medical treatments for many decades. Randomization, a central tenet of RCTs, ensures that treatment assignment is independent of baseline characteristics, thereby reducing the risk of systematic differences in expected prognosis between treatment arms. In turn, an observed improvement in clinical outcomes can be attributed to the experimental intervention under study. The probability that a difference at least as large as that observed could have arisen by chance is captured by the P-value. Despite this methodological appeal, RCTs are often criticized for requiring substantially larger sample sizes and taking longer to complete than their single-arm counterparts. Further, randomization may be difficult, infeasible, or unethical in some clinical scenarios. For example, patients with life-threatening illnesses such as brain cancer may seek clinical trials where their chance of receiving the experimental therapy is high (or certain) and thus may be reluctant to participate in trials that offer an equal chance of receiving the standard therapy or the investigational therapy. Relatedly, trial sponsors and investigators, particularly those engaged in neuro-oncology trials, may wish to maximize the number of patients randomized to the investigational product to better characterize its safety profile. In these scenarios, incorporating external control (EC) data may be compelling, as it represents an opportunity to obviate the need for a randomized control (RC) arm completely or to reduce the number of patients allocated to the standard therapy.

The concept of leveraging EC data in a clinical trial is not new,1 but with the wider availability of data sources to draw upon for ECs, recent years have seen growing interest in utilizing data from outside of RCTs in the design and analysis of clinical trials, especially trials in neuro-oncology.2–10 These data sources may include registry studies, prior clinical trials, and electronic health records (EHRs). In principle, incorporating existing information can increase the efficiency of the trial design. However, adoption of this approach is not yet common, owing to concerns about introducing bias and the limited practical experience available. For valid statistical inference and sound clinical and regulatory decision-making, the clinical trial community needs a deeper understanding of these methods in order to leverage these available resources responsibly.

The purpose of this article is to provide an in-depth appraisal of the use of EC data in the context of randomized clinical trials, with a focus on neuro-oncology trials. In Section Clinical Trial Designs Utilizing External Information, we describe several clinical trial designs, including the single-arm design (SAD), the RCT design, the externally controlled SAD, and a hybrid design that uses ECs to augment the RC arm, with particular attention to how external information is utilized in the design and/or analysis of each approach. In Section The Fallacies of Using External Control Data, we address several common fallacies that may lead to inappropriate adoption of EC data. In Section Use of External Control Data in GBM Trials, using 2 completed trials in glioblastoma conducted by NRG Oncology, a member of the National Cancer Institute (NCI)-sponsored clinical trial network groups, we illustrate the use of an assessment tool that lays out a blueprint for assembling a high-quality EC dataset. In Section Comparison of RCD and EAD via Statistical Simulations, we use statistical simulations to elucidate the potential gain in design efficiency when using EC data and highlight scenarios where these approaches can fail to control the type I error. In Section Other Practical and Statistical Considerations, we address several practical and statistical challenges associated with implementing EC designs in practice.

Clinical Trial Designs Utilizing External Information

In this section, we review several clinical trial designs and delineate the manner in which external information is used in the design and/or analysis of each approach. Table 1 summarizes the salient features of each design, and Figure 1 gives the schema for each. The literature abounds with discussions of the pros and cons of single-arm versus randomized trial designs. Statistical evaluations of the likelihood of false-positive results in GBM single-arm trials, as well as the risks and benefits of integrating external data for trial result predictions and interim decision-making in GBM trials, have also been discussed. Recent reviews provide an excellent resource for gaining familiarity specifically in the neuro-oncology and glioblastoma space.2,11–16

Table 1.

Clinical trial designs

1) Randomized controlled design (RCD)
  Description:
  • Concurrent randomization between the experimental therapy (ET) arm and randomized control (RC) arm
  • Uses some historical benchmark (eg, 1-year survival) for the efficacy estimate of the RC arm in the design
  • Analysis and statistical inference of the treatment effect are based on patient-level data from the ET arm and RC arm only.
  Use of external information: Little
  Advantage(s):
  • Randomization ensures independence between treatment assignment and confounders.
  Limitation(s):
  • Requires a large sample size → feasibility may be called into question in certain settings

2) Single-arm design (SAD)
  Description:
  • No randomization—all trial patients receive the ET
  • Uses some historical benchmark (eg, objective response rate or 1-year survival) for efficacy evaluation (under the null)
  • Analysis is based on data from trial patients only.
  • Statistical inference of the treatment effect is based on comparison with the historical benchmark.
  Use of external information: To determine the benchmark only
  Advantage(s):
  • Small sample size (because the design treats the historical benchmark as a fixed constant, ignoring the statistical uncertainty around the estimate)
  Limitation(s):
  • Susceptible to biases due to confounders, data quality issues, differences in outcome definition, follow-up frequency/method, drift in patient prognosis, changes in supportive care, etc.
  • If the historical benchmark is inadequate, may lead to inaccurate evaluation of treatment efficacy.

3) Externally controlled single-arm design (EC-SAD)
  Description:
  • No randomization—all trial patients receive the ET
  • Depends on the availability of an external cohort of patients who received the standard therapy (EC cohort)
  • The design, analysis, and statistical inference of the treatment effect are based on patient-level data from the ET arm and EC arm.
  Use of external information: All
  Advantage(s):
  • Smaller “in-trial” sample size since the trial enrolls patients to the ET arm only
  • Possible to adjust for differences in measured confounders between the ET arm and EC arm
  • Less susceptible to biases than the SAD because of the possibility of adjusting for differences in baseline factors
  Limitation(s):
  • Biases may still persist due to unmeasured confounders, data quality issues, differences in outcome definition, follow-up frequency/method, drift in patient prognosis, changes in supportive care, etc.

4) Externally augmented design (EAD)
  Description:
  • Includes a randomization component (to ET or RC) plus an external control (EC) cohort
  • The design, analysis, and statistical inference of the treatment effect are based on patient-level data from the randomization component and the EC cohort.
  Use of external information: Some/all
  Advantage(s):
  • Potentially smaller “in-trial” sample size than its RCD counterpart
  • Possible to adjust for differences in measured confounders between the RC arm and EC arm
  • Allows for an interim comparison of clinical outcome between RC and EC—the result of which informs the use of the EC at the second stage of the trial
  • Less susceptible to biases than the SAD because of the possibility of adjusting for differences in baseline factors
  Limitation(s):
  • Biases may still persist due to unmeasured confounders, data quality issues, differences in outcome definition, follow-up frequency/method, drift in patient prognosis, changes in supportive care, etc.
  • Larger “in-trial” sample size than the EC-SAD (due to the need for an RC arm)

Figure 1.

Schemas of various clinical trial designs: (A) randomized control design (RCD); (B) single-arm design (SAD); (C) externally controlled single-arm design (EC-SAD); (D) externally augmented design (EAD). R = randomization between standard therapy and experimental therapy; R1 = first randomization (balanced randomization between standard therapy and experimental therapy); R2 = second randomization (2:1 randomization in favor of the experimental therapy).

Design 1: Randomized Controlled Design

The randomized controlled design (RCD) refers to a trial design that concurrently randomizes patients to either the standard treatment or the experimental treatment (ET). The study arm in which patients receive the standard treatment is referred to as the RC arm. The process of randomization minimizes the possibility of investigator bias in patient selection and limits the extent to which patients differ in characteristics that may confound the outcome comparison.

In an RCD, external information is typically used at the design stage of the trial but not for analyzing the final data. To determine the sample size, investigators often base the treatment efficacy of the RC arm upon some historical benchmark from past studies of patients with a similar disease condition who received the standard treatment. The trial is then powered to detect some clinically meaningful improvement associated with the experimental therapy (eg, to increase the 1-year survival rate from a historical benchmark of 30% to 50%). No patient-level data from historical studies are used in the design or the analysis of an RCD. In the final analysis after the trial is completed, the estimate of the treatment effect is based on a direct comparison of patient-level data between ET and RC.

From a design efficiency perspective, RCDs tend to require large sample sizes compared to their SAD alternatives because the design accounts for statistical uncertainty around the efficacy estimates in both treatment arms. In oncology, for example, trials of solid tumors with cytotoxic therapies may seek to estimate the tumor response rate, defined as the proportion of patients with sufficient tumor shrinkage (these patients are referred to as the “responders”). Suppose that the response rate based on historical studies of the standard therapy is believed to be 20%; an RCD that randomizes patients to either the standard or the experimental therapy in a 1:1 ratio will require 112 patients (56 per arm) to achieve 80% statistical power to rule out the null (undesirable) benchmark of 20% in favor of a superior response rate of 40% (the alternative hypothesis), using a 1-sided test with 10% type I error. As new experimental therapies continue to emerge at an accelerated pace, the cost and resources involved in conducting RCDs will likely limit the number of clinical questions that can be answered. The precision medicine paradigm further complicates this issue by selecting patient subgroups for trial eligibility using molecular biomarkers. These biomarker-defined subgroups are often small, resulting in substantial challenges in completing randomized trials in a timely fashion.
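The sample size arithmetic can be checked directly. The article does not state which formula underlies the 112-patient figure; the Python sketch below, our illustration, uses the normal approximation for two proportions with the Fleiss continuity correction, which reproduces 56 patients per arm.

```python
from math import sqrt, ceil
from scipy.stats import norm

def two_arm_n_per_group(p0, p1, alpha=0.10, power=0.80):
    """Per-arm sample size for a 1-sided comparison of two proportions
    (normal approximation with the Fleiss continuity correction)."""
    za, zb = norm.ppf(1 - alpha), norm.ppf(power)
    pbar = (p0 + p1) / 2
    n = (za * sqrt(2 * pbar * (1 - pbar))
         + zb * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / (p1 - p0) ** 2
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p0)))) ** 2  # continuity correction
    return ceil(n_cc)

print(two_arm_n_per_group(0.20, 0.40))  # 56 per arm, ie, 112 patients in total
```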

Design 2: Single-Arm Design

SADs are common in phase II oncology trials where the primary objective is to obtain a preliminary estimate of the treatment efficacy of the experimental therapy in order to inform its future therapeutic development pathway. These studies are typically designed to test the superiority of a new treatment against a historical benchmark based on the standard treatment. In this design, the historical benchmark for the standard therapy is treated as a fixed constant. No actual patient-level data from historical studies are used in the design or the analysis of an SAD. Because this design ignores the statistical uncertainty around the historical benchmark estimate, it requires a smaller sample size than its RCD counterpart. To contrast its design efficiency with an RCD, suppose again that the response rate based on historical studies of the standard therapy is believed to be 20%; an SAD will require only 24 patients to have 80% power against the null response rate of 20% in favor of a 40% response rate using a 1-sided 10% type I error—less than a quarter of the sample size of an RCD.
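As a check on these operating characteristics, the snippet below evaluates an exact single-arm binomial design with n = 24; the success cutoff of 8 or more responses is our illustrative choice (not stated in the article) and yields a type I error just under 10% with power just above 80%.

```python
from scipy.stats import binom

n, r = 24, 8  # enroll 24 patients; call the therapy promising if >= 8 respond
alpha = binom.sf(r - 1, n, 0.20)  # type I error under the null rate of 20%
power = binom.sf(r - 1, n, 0.40)  # power under the alternative rate of 40%
print(round(alpha, 3), round(power, 3))  # approximately 0.089 and 0.808
```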

Despite its advantage in sample size requirement, major challenges with single-arm trials arise from the risk of biases due to differences between patients in the current trial and those in historical studies. When substantial differences exist, a comparison of treatment efficacy from the trial to the historical benchmark can lead to an inaccurate evaluation of the ET. Without randomization, it is rarely possible to achieve satisfactory comparability between trial and historical patients, even with the most meticulous specification of inclusion criteria to match the historical patients. Table 2 describes several types of bias in clinical trials, together with descriptive examples in neuro-oncology.17,18

Table 2.

Potential biases in clinical trials

Type of bias Explanation Example in neuro-oncology
Selection bias Trial patients are not representative of the targeted patient population of interest The trial may be “enriched” to select patients for trial eligibility using some molecular biomarker. If the enrichment biomarker confers prognostic value, the trial patients may not be representative of the overall patient population of interest (eg, a GBM trial may be enriched for patients with unmethylated MGMT status).17
Time-trend bias Prognosis of patients changes over time In GBM, patient survival may have improved over time as clinicians become more inclined to re-treat tumors with radiation therapy, perform multiple surgeries, or offer better supportive care.18
Performance bias Systematic differences in the care provided, differential follow-up schedules, differences in how endpoints are assessed, etc. Patients in disease registries may be followed less frequently for disease status than clinical trial patients. Modern GBM trials may use more advanced imaging techniques to assess disease progression than historical studies.
Confounding bias Systematic differences in patients’ characteristics that may impact prognosis The external control GBM patients may on average have worse performance status than patients enrolled in GBM trials.

Design 3: Externally Controlled Single-Arm Design

The externally controlled single-arm design (EC-SAD) refers to a trial design that consists of a contemporaneous single-arm study combined with an external cohort of patients who were treated with the standard therapy (the EC arm) and for whom patient-level data are available. Patients enrolled on the single-arm study receive the ET. The EC cohort is used as a comparator to evaluate the efficacy of the ET. As such, this design relies on a set of well-curated pretreatment covariates and clinical outcomes being available from the EC. In contrast to the SAD, where the evaluation of the experimental therapy is based on a single historical benchmark, the data analysis and treatment effect estimate in an EC-SAD include patient-level data from the EC cohort. The treatment effect estimate may be obtained using statistical adjustment methods to account for differences in pretreatment patient characteristics between the trial patients and the EC cohort that may influence clinical outcomes (ie, potential confounders). Such methods include stratification, multiple regression, Cox proportional hazards regression (for time-to-event endpoints), and propensity score matching, among others.19–26 However, when patients in the 2 study arms differ in some unknown or unmeasured factors, the treatment effect estimate may still be distorted by unrecognized biases that cannot be corrected with statistical adjustments. Furthermore, if the size of the EC cohort is substantially larger than that of the experimental cohort, the former may “swamp” the latter, leading to a highly precise but biased estimate of the treatment effect. One potential solution is to limit the size of the EC cohort (eg, by basing its sample size on what would have been required in a randomized study), but again this does not address the issue of potential biases.
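To make the adjustment step concrete, here is a minimal sketch of one of the methods named above—propensity score weighting followed by a weighted Cox regression—applied to synthetic data; the single covariate (KPS), cohort sizes, and hazard model are all invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)

# Synthetic data: 150 trial (ET) patients and 400 external controls (EC)
# whose prognosis differs at baseline (EC patients have worse KPS on average)
n_et, n_ec = 150, 400
kps = np.concatenate([rng.normal(90, 10, n_et), rng.normal(80, 10, n_ec)])
treat = np.concatenate([np.ones(n_et), np.zeros(n_ec)])

# Survival depends on KPS but not on treatment (true HR = 1)
hazard = 0.05 * np.exp(-0.03 * (kps - 85))
t = rng.exponential(1 / hazard)
df = pd.DataFrame({"treat": treat, "kps": kps,
                   "time": np.minimum(t, 24),        # censor at 24 months
                   "event": (t <= 24).astype(int)})

# Propensity score: probability of belonging to the trial cohort given KPS
ps = LogisticRegression().fit(df[["kps"]], df["treat"]).predict_proba(df[["kps"]])[:, 1]
# ATT-style weights: trial patients keep weight 1; ECs are re-weighted to resemble them
df["w"] = np.where(df["treat"] == 1, 1.0, ps / (1 - ps))

cph = CoxPHFitter().fit(df, "time", "event", weights_col="w", robust=True)
print(cph.hazard_ratios_["treat"])  # near 1 once the KPS imbalance is weighted away
```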

Design 4: Externally Augmented Design

The externally augmented design (EAD) refers to a trial design that consists of an RCD component, which concurrently randomizes patients to the ET arm or the RC arm, coupled with an EC cohort comprising patients previously treated with the standard therapy and for whom patient-level data are again available. In the first stage, patients are randomized to ET or RC with some fixed randomization ratio (eg, 1:1). An interim analysis is conducted to assess the comparability of outcomes between RC and EC. If there is evidence of differences (eg, the clinical outcome of interest differs between RC and EC after adjusting for observed confounders), the EC cohort is abandoned and the second stage of the trial continues with the same randomization ratio between ET and RC. The estimate of the treatment effect is then based on a direct comparison of the ET arm and RC arm. On the other hand, if no notable difference is identified between the control groups at the interim analysis, a different randomization ratio is used in the second stage of the study to favor the ET arm (eg, 2:1 randomization). At the end of the second stage, the efficacy of the ET arm is compared against a pooled control arm that combines the RC and EC cohorts. Of course, if an EAD is to be used, details should be prespecified in the study protocol: the timing of the interim analysis and the stage sample sizes, the source of the ECs (previously completed clinical trials, EHRs, disease registries, etc.), which outcome(s) are to be compared, the statistical methods for performing the comparison, the alpha level for testing whether the EC and RC groups are comparable, and so on. A minimal sketch of the interim decision rule is given below.
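This sketch expresses the interim decision rule as a small helper function; the data-frame column names (time, event) and the survival-endpoint log-rank comparison are our assumptions, chosen to match the worked example later in this article.

```python
import pandas as pd
from lifelines.statistics import logrank_test

def ead_interim_keeps_ec(rc: pd.DataFrame, ec: pd.DataFrame, alpha_interim=0.30):
    """Compare survival of the stage 1 randomized controls (rc) with the
    external controls (ec); return True (retain the EC cohort) only if the
    log-rank test does NOT reject comparability at the liberal interim alpha."""
    res = logrank_test(rc["time"], ec["time"],
                       event_observed_A=rc["event"],
                       event_observed_B=ec["event"])
    return res.p_value >= alpha_interim

# True  -> stage 2 randomizes 2:1 in favor of ET; final analysis pools RC + EC
# False -> the EC cohort is abandoned; stage 2 keeps the 1:1 ratio
```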

Distinct from the EC-SAD, which utilizes the EC cohort in its entirety when estimating the treatment effect, the interim analysis in an EAD offers an early decision point for retaining or foregoing the EC cohort. The potential to increase the randomization probability to the experimental therapy is an attractive feature of the EAD in that fewer patients may be assigned to the standard therapy at the second stage of the trial. Despite its practical appeal, the EAD suffers the same design limitations as the EC-SAD in that differences in pretreatment characteristics between RC and EC can introduce biases into the treatment comparison, thereby undermining the validity of the eventual treatment effect estimate. Similarly, if the EC cohort is much larger than the RC cohort, the RC may be “swamped” by the EC, leading essentially to a nonrandomized EC-SAD.

Another example of a hybrid design is the platform trial.27 In platform trials, multiple experimental arms are compared against a single, common control group. ETs may be dropped if the data indicate a lack of efficacy, and new ETs may be added. At any given point in time, patients are randomized to the control arm or to the existing set of experimental arms. In this case, the design is equivalent to an RCD if the comparison between experimental and control groups is limited to those control patients who were randomized concurrently while the given experimental arm was in play. If, however, all control patients are used, the design is akin to an EAD, where some controls are contemporaneously randomized (RC) and some are “external” (EC). Methods for dealing with temporal drift in the controls over time are discussed in Saville et al.28

The Fallacies of Using External Control Data

In this section, we describe several fallacies that may lead to inappropriate adoption of EC data in a clinical trial. We first provide the statement, followed by an explanation as to why the statement is problematic.

Fallacy 1: The Large Sample Size Fallacy

Statement: “If I have a large external control cohort, I could effectively replace or supplement the randomized control arm in a randomized trial.”

Explanation: A sufficiently large sample size is important as it increases the statistical power of the trial (ie, the likelihood that a trial will detect the targeted treatment effect when it indeed exists) and the precision of the treatment effect estimate. However, if the EC data are not free of biases, incorporating them can skew the treatment effect estimate, leading to an inaccurate assessment of treatment efficacy. Indeed, when systematic biases are present, the likelihood of making an incorrect claim regarding treatment efficacy increases with the size of the EC cohort. On the other hand, a large external dataset does provide more opportunity for improved covariate adjustment by allowing the incorporation of a greater number of predictors in regression modeling, better propensity score or other types of matching, greater ability to handle missing data, and so forth.

Fallacy 2: The Statistics Magic Fallacy

Statement: “If I deploy state-of-the-art statistical techniques, I can adjust away the biases.”

Explanation: Differences between the EC and RC can arise from imbalances in the baseline characteristics of patients, leading to disparities in clinical outcomes. To some extent, similarity in baseline characteristics can be achieved by careful selection of EC data through alignment of the trial eligibility criteria with those of the RC. However, imbalances may still persist; these may be further mitigated by employing statistical adjustment techniques. Many methods developed for observational studies can be readily applied in this context. For example, as mentioned above, propensity score matching matches patients in the EC and RC cohorts with respect to the probability that a patient belongs to a specific cohort given a set of measured confounders (ie, the propensity score). Other popular methods to reduce imbalances in measured prognostic factors include exact matching, propensity score weighting, and outcome regression models adjusting for baseline variables.19–26 It is important to note that no statistical adjustment method can guarantee that all confounders are accounted for. For example, suppose that the trial patients have a higher prevalence of a molecular biomarker that confers a better disease prognosis. If this biomarker was not collected in the EC cohort, it would not be possible to account for the difference in patient prognosis due to this biomarker.

The concern for differences in the distribution of clinical outcomes has led to the recent development of statistical techniques intended to lessen these differences. Generally, these data-dependent approaches entail statistical modeling to determine the degree to which EC data are utilized based on the degree of outcome disparity between RC and EC.5,28–31 Some approaches propose to give less weight to EC if its (covariate-adjusted) outcome appears to be inconsistent with that of the RC. This is known as “dynamic borrowing.” Of note, the interim analysis in the EAD is a special case of dynamic borrowing in that an interim decision is made to determine whether any data in the EC cohort should be carried forward to the second stage (ie, the weight is either 0 or 1). More specifically, if the outcome difference exceeds some prespecified statistical criterion, the EC cohort is abandoned completely at the interim (ie, weight = 0) and the trial proceeds to the second stage with RC only; otherwise, the entire EC cohort is combined with the RC cohort (ie, weight = 1) to form the “new” control arm, which is compared with the ET arm for the final estimation of treatment efficacy. This specific form of dynamic borrowing is referred to as the “test-then-pool” method in the literature.5 Other dynamic borrowing methods include power priors, commensurate priors, and meta-analytic predictive priors29–32; a small worked example of a power prior is sketched below. No single adjustment method is universally preferred. Of critical importance is the recognition that, in the absence of randomization, no statistical adjustment method can ensure that treatment assignment is independent of confounders. Moreover, the validity of various techniques depends on assumptions that cannot be fully verified from the data. A good strategy is to employ multiple statistical approaches for covariate adjustment and check whether the treatment effect estimates are consistent.
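As one concrete instance of these borrowing methods, the snippet below sketches a power prior with a fixed discount factor in a conjugate beta-binomial setting; the counts and the discount a0 are hypothetical, and in practice a0 may itself be estimated or assigned a prior.

```python
from scipy.stats import beta

x0, n0 = 30, 100  # hypothetical external controls: 30 responders out of 100
x, n = 8, 25      # hypothetical concurrent (randomized) controls: 8 out of 25
a0 = 0.5          # discount factor: 0 ignores the EC data, 1 pools them fully

# With a Beta(1, 1) initial prior, raising the EC likelihood to the power a0
# preserves conjugacy, so the posterior for the control response rate is:
post = beta(1 + x + a0 * x0, 1 + (n - x) + a0 * (n0 - x0))
print(post.mean())          # posterior mean of the control response rate
print(post.interval(0.95))  # 95% credible interval
```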

Fallacy 3: The External Control for Confirmatory Evidence Fallacy

Statement: “If I have a high-quality external control cohort, I can do away with the need for a randomized control arm in a confirmatory RCT.”

Explanation: In phase III confirmatory studies, strict control of the type I error rate is of prime importance, as such trials aim to validate a prior finding of a potentially successful new therapy. Often these new findings garner much attention but prove difficult to replicate with respect to the magnitude of benefit. An increase in type I error represents a heightened chance of a trial erroneously declaring a treatment effect when none exists. Unfortunately, none of the statistical adjustment methods mentioned above guarantees strict type I error control. While trial decision rules may be fine-tuned at the study design stage to keep the type I error under some desired level (eg, via statistical simulations), it is not possible to perform exhaustive simulations to account for all true unknown trial scenarios. Further, guarding the type I error does not necessarily protect against bias in the treatment effect estimate. For these reasons, the inclusion of ECs is not likely to be accepted as the primary efficacy analysis in phase III confirmatory trials at the current time. It may, however, serve as a useful complementary analysis to supplement an analysis based on only the RC arm in certain circumstances. For example, Liau et al. recently published a phase III, externally controlled nonrandomized study comparing overall survival (OS) in patients with newly diagnosed glioblastoma (nGBM) and recurrent glioblastoma (rGBM) treated with autologous tumor lysate-loaded dendritic cell vaccine (DCVax-L) plus standard of care (SOC) versus externally matched control patient cohorts treated with SOC.10 The original phase III trial focused on newly diagnosed GBM and was designed as a randomized placebo-controlled trial with a 2:1 randomization ratio favoring the DCVax-L plus SOC arm. The trial was activated in 2007, but accrual was terminated early for financial reasons. The trial’s primary endpoint was progression-free survival (PFS), with OS as a secondary endpoint. Patients on the SOC arm were allowed to cross over to receive DCVax-L upon disease progression. The efficacy data of the original trial appeared in the Journal of Translational Medicine.33 Curiously, this primary report published data only on the secondary endpoint (OS), the comparison of OS was not made between treatment groups, and the primary endpoint of PFS has not been evaluated or reported to date. The motivation to preview immature OS data, a practice not consistent with the reporting standards for phase III trials, is entirely unclear.34,35 Therefore, while the subsequent externally controlled study claimed that “adding DCVax-L to SOC resulted in clinically meaningful and statistically significant extension of survival for patients,” we believe that the results from this unplanned ad hoc analysis (based on a secondary endpoint of the original trial) using externally controlled data should be viewed as exploratory at best. In contrast, Medicenna recently announced its intent to conduct a prospective phase III hybrid registration trial that incorporates matched EC data to definitively evaluate the efficacy of MDNA55, an interleukin-4 (IL-4)-guided toxin targeting recurrent glioblastoma. Per a press release, this decision was made after consultation with the FDA.36 The proposed trial will include a concurrent 3:1 randomized portion in favor of MDNA55 and an additional matched control arm. The FDA also expressed a willingness to consider an interim analysis (presumably, to assess the comparability between the concurrent control and EC arms). At face value, the proposed study design resembles the EAD described above. This trial will represent the first of its kind in GBM to incorporate EC data for registration purposes. With rigorous study design, execution, and analysis, and a healthy dose of caution in data interpretation, this approach may have the potential to generate informative data in a timely manner.

Use of External Control Data in GBM Trials: A Case Study

Two Radiation Therapy Oncology Group Trials in GBM

For newly diagnosed GBM, the clinical trial landscape has seen little progress since the establishment of the SOC, which consists of concurrent radiation therapy with temozolomide (TMZ) followed by adjuvant TMZ.37 Many subsequent resource-intensive phase III trials were conducted with little success in identifying new effective therapies. Given the urgent need for better therapies for these patients, it has been argued that data collected from patients treated with the SOC in past GBM trials can provide high-quality EC data to supplement future GBM trials, thereby reducing the number of patients treated with the SOC and increasing the efficiency of drug development. This concept is particularly appealing to patients with life-threatening illnesses such as GBM, who may seek clinical trials where their chance of receiving investigational treatments is high.

Here we use 2 randomized phase III trials conducted by the Radiation Therapy Oncology Group (now part of NRG Oncology) to illustrate the use of EC data in GBM. RTOG 0525 was a randomized phase III trial to determine whether dose-dense adjuvant TMZ treatment would prolong patient survival compared with standard adjuvant TMZ treatment in patients with nGBM.38 RTOG 0825 was a randomized phase III trial to study whether the addition of adjuvant bevacizumab (BEV) would improve survival compared with the SOC in nGBM patients.39 Since RTOG 0525 and RTOG 0825 were conducted by the same US cooperative group in the modern era, and patients in the control arms of both trials received the same standard (known as “Stupp”) regimen with similar clinical follow-up, we set out to determine whether the control arm of RTOG 0525 could have served as an adequate EC cohort for the subsequent RTOG 0825 trial.

An Externally Augmented Design

The original RTOG 0825 trial randomized 621 nGBM patients between the BEV arm and the standard TMZ arm in a 1:1 randomization ratio. OS was the primary efficacy endpoint of the trial. Here we consider an EAD using the control arm of RTOG 0525 as the EC cohort. The schema corresponds to Figure 1D. A total of 411 patients were treated in the control arm (TMZ) of RTOG 0525 (NEC = 411). The EAD consists of 2 stages. Suppose that at the first stage, half of the sample size from the original design of RTOG 0825 is randomized equally between BEV and TMZ (ie, N1 = 310, or 155 to each arm). At the end of the first stage, a prespecified interim comparability analysis entails the comparison of survival outcome between the RTOG 0825 TMZ arm and the external TMZ cohort from RTOG 0525. If the test for survival equivalence between the 2 TMZ cohorts is rejected at a 2-sided 30% alpha level using a log-rank test (indicating incomparable survival outcomes), the RTOG 0525 TMZ cohort is abandoned and the trial proceeds to the second stage, which continues to randomize the remaining sample size (N2 = 311) between the BEV arm and TMZ arm in a 1:1 ratio. Of note, in the context of the interim comparability analysis, the type I error represents the probability that an EC cohort that is equivalent to the RC is erroneously abandoned. In this context, a more liberal type I error is reasonable to ensure that the test has sufficient power to reject an incomparable external cohort (ie, to minimize the type II error). If the comparability test is rejected, the estimate of the treatment effect is based on a direct survival comparison of the BEV arm and TMZ arm. Otherwise, if the test for survival equivalence is not rejected, the randomization ratio is adapted to 2:1 favoring the BEV arm. In this case, at the end of the second stage, the treatment effect estimate is based on comparing the BEV arm with a hybrid TMZ cohort that combines the concurrently randomized TMZ arm with the external TMZ cohort from RTOG 0525. Note that due to concerns about prognosis drift between the EC arm and the “late enrollees” (eg, stage 2 patients), one could additionally build in a repeated test of comparability between the EC and the entire RC cohort (stage 1 + stage 2) at the end of stage 2, so that the decision to include the EC may still be reversed if its outcome is not comparable with that of patients in the concurrent control arm.

Qualitative and Quantitative Evaluations

Although the interim analysis in an EAD offers an opportunity to evaluate the similarity in clinical outcome between the EC and RC, differences may persist due to imbalances in baseline patient/disease characteristics. As such, at the design stage of the study, it is imperative to identify any possible source of differences that may lead to outcome differences and to mitigate them to the extent possible. Similarity in baseline characteristics can in part be achieved by close alignment of trial eligibility criteria. Careful examination of the distributions of baseline variables can shed further light on possible dissimilarities between the control cohorts, thereby facilitating appropriate statistical adjustment for important prognostic factors. Pocock proposed a set of conditions to be met for an acceptable historical control.1 More recently, Hatswell et al. extended Pocock’s criteria to a framework for comparing differences based on both the study design and the observed data.40 While Hatswell et al.’s tool overlaps with Pocock’s criteria, it additionally requires presentation of the study results; specifically, the domains of evaluation include (1) study and patient characteristics, (2) disease process and intervention effects, (3) outcome measurements, and (4) patient selection into the study. Distinct from Pocock’s criteria, the Hatswell tool thus calls for a direct comparison of data between control cohorts, leading to a less subjective, data-driven approach.

In the context of the GBM trials, we use Hatswell’s tool to evaluate the comparability between the RTOG 0525 EC and the RTOG 0825 RC in the first stage of the EAD described above. The results of the comparison are given in Table 3. Overall, the 2 trials closely resemble each other in many aspects of study design (inclusion/exclusion criteria, intervention administered, definition of efficacy endpoints, assessment methods for efficacy endpoints, frequency of disease follow-up, etc.). The baseline characteristics whose distributions differ significantly between the 2 control cohorts include race, KPS, MGMT methylation status, RPA class, and extent of surgery. For example, a notably higher percentage of patients in the RTOG 0825 TMZ cohort were White (95% vs. 78% in RTOG 0525) or underwent biopsy only (35.5% vs. 3.4% in RTOG 0525) (Table 3). In terms of outcome comparison, the interim log-rank test for survival equivalence between the TMZ cohorts yields a P-value of .80, which exceeds the prespecified .30 significance level (the null hypothesis of survival equivalence is not rejected). Figure 2 gives the Kaplan–Meier survival curves by TMZ cohort. Based on the interim comparison, the external RTOG 0525 TMZ cohort proceeds to the second stage of the trial and is combined with the RTOG 0825 TMZ cohort to form the hybrid control arm. The remaining sample size is allocated in a 2:1 ratio in favor of BEV (157 patients to BEV vs. 79 patients to TMZ) at the second stage. At the final analysis, the EAD yields an estimated treatment hazard ratio (HR) of 1.12 (95% CI: 0.95–1.32). The estimated HR in the original RTOG 0825 trial was 1.13 (95% CI: 0.93–1.37)—in close agreement with the EAD. Figure 3 gives the Kaplan–Meier survival curves by treatment arm from the original RTOG 0825 trial (Figure 3A) versus the EAD (Figure 3B).

Table 3.

Comparison of RTOG 0525 and RTOG 0825 control arms using the Hatswell et al. tool

Study title and year RTOG 0525 RTOG 0825
What dates were patients enrolled? January 2006–June 2008 April 2009–May 2011
What was the design of the study? Randomized phase III trial Randomized phase III trial
What were the location and setting of the study? North America and Europe (RTOG/EORTC) North America and Europe (RTOG/NCCTG/ECOG)
What were the patient selection and inclusion/exclusion criteria*? WHO grade 4 astrocytoma WHO grade 4 astrocytoma
Inclusion criteria: Inclusion criteria:
• Age ≥ 18 • Age ≥ 18
• KPS ≥ 60 • KPS ≥ 70
• Tumor must have a supratentorial component • Tumor must have a supratentorial component
• Stable/decreased steroid doses for 5 days before study registration • Stable/decreased steroid doses for 5 days before study registration
• Protocol treatment must begin ≤5 days • Protocol treatment must begin ≤5 days
• Adequate bone marrow, renal, and hepatic functions • Adequate bone marrow, renal, and hepatic functions
Exclusion criteria: Exclusion criteria:
• Prior invasive malignancy (must be disease free for ≥3 years) • Prior invasive malignancy (must be disease free for ≥3 years)
• Recurrent/multifocal malignant gliomas • Recurrent/multifocal malignant gliomas
• Metastases below the tentorium or beyond the cranial vault • Metastases below the tentorium or beyond the cranial vault
• Prior chemotherapy or radiosensitizers for cancers of the head and neck • Prior chemotherapy or radiosensitizers for cancers of the head and neck
• Prior radiotherapy to the head or neck • Prior radiotherapy to the head or neck
• Severe, active comorbidity • Severe, active comorbidity
• Treated on any other therapeutic protocols ≤30 days of study entry • Treated on any other therapeutic protocols ≤30 days of study entry
What was the intervention used (in the control arm)? Radiation therapy: 2 Gy × 5 days × 6 weeks Radiation therapy: 2 Gy × 5 days × 6 weeks
Temozolomide (concurrent): 75 mg/m2 daily  Temozolomide (concurrent): 75 mg/m2 daily
Temozolomide (adjuvant): 75–100 mg/m2 days 1–21 (28-day cycle) for a maximum of 12 cycles  Temozolomide (adjuvant): 75–100 mg/m2 days 1–21 (28-day cycle) for a maximum of 12 cycles
What endpoints were reported and how were they measured?
 RTOG 0525—Primary: overall survival (OS), defined as time from randomization until death due to any cause. Secondary: progression-free survival (PFS), defined as time from randomization until death or disease progression (per Macdonald criteria), with imaging every 3 months.
 RTOG 0825—Co-primary: (1) OS and (2) PFS, defined identically and with the same imaging frequency.
How many patients were enrolled in the study? N = 831 (411 in the control SOC arm) N = 621 (309 in the control SOC arm)
Present a tabulation of patient disease characteristics and background characteristics
Characteristic RTOG 0525 SOC Arm (N = 411) RTOG 0825 SOC Arm—Stage 1 (N = 155) P-value
Age: median (range) 57 (22–85) 57 (19–79) .34
Male gender 239 (58%) 94 (61%) .63
Race
 White 319 (78%) 148 (95%) <.05
 Black 6 (1.5%) 5 (3.2%)
 Other/unknown 86 (21%) 2 (1.3%)
KPS
 60 14 (3.4%) .02
 70 49 (11.9%) 17 (11%)
 80 75 (18.2%) 41 (26.5%)
 90 183 (44.5%) 71 (45.8%)
 100 90 (21.9%) 26 (16.8%)
Surgery type
 Biopsy 14 (3.4%) 55 (35.5%) <.05
 Partial resection 167 (40.6%) 94 (60.6%)
 Total resection 90 (21.9%) 6 (3.9%)
Neurologic function
 0 (no symptom) 140 (34.1%) 51 (32.9%)
 1 (minor symptom) 185 (45%) 69 (44.5%) .10
 2 (moderate symptom: fully active) 49 (11.9%) 29 (18.7%)
 3 (moderate symptoms: less fully) 35 (8.5%) 6 (3.9%)
 4 (severe neurologic symptoms) 2 (0.5%)
MGMT methylation status
 Methylated 122 (29.7%) 35 (22.6%)
 Unmethylated 254 (61.8%) 117 (75.5%) <.05
 Unknown 35 (8.5%) 3 (1.9%)
Recursive partitioning analysis (RPA)
 III 85 (20.7%) 25 (16.1%) .01
 IV 251 (61.1%) 97 (62.6%)
 V 75 (18.2%) 28 (18.1%)
 Unknown 5 (3.2%)
Present a tabulation of study outcomes
PFS (RTOG 0525): median 5.49 months (95% CI: 4.8, 6.2); 1-year 26.5% (95% CI: 22.6%, 31.2%); 2-year 12.8% (95% CI: 9.9%, 16.6%)
PFS (RTOG 0825, stage 1): median 7.13 months (95% CI: 5.4, 8.7); 1-year 36.8% (95% CI: 29.9%, 45.4%); 2-year 17.4% (95% CI: 12.2%, 24.8%)
OS (RTOG 0525): median 16.6 months (95% CI: 15, 18); 1-year 63.7% (95% CI: 59.2%, 68.6%); 2-year 31.2% (95% CI: 26.9%, 36%)
OS (RTOG 0825, stage 1): median 15.6 months (95% CI: 14, 19); 1-year 65.6% (95% CI: 58.4%, 73.7%); 2-year 33.8% (95% CI: 26.9%, 42.5%)

*Selected inclusion/exclusion criteria: see trial protocols for an exhaustive listing.

Figure 2.

Interim comparison of overall survival (RTOG 0525 TMZ cohort vs. Stage 1 RTOG 0825 TMZ cohort).

Figure 3.

Overall survival analysis comparing bevacizumab and temozolomide. (A) Original RTOG 0825 randomized control design; treatment HR (BEV vs. TMZ) = 1.13 (95% CI: 0.93–1.37). (B) Externally augmented design (using the RTOG 0525 TMZ arm as the external control); treatment HR (BEV vs. TMZ) = 1.12 (95% CI: 0.95–1.32). In the “Temozolomide” arm, patients received TMZ + RT + placebo during the concurrent phase, followed by TMZ + placebo during the maintenance phase. In the “Bevacizumab” arm, patients received TMZ + RT + BEV during the concurrent phase, followed by TMZ + BEV during the maintenance phase. Protocol treatment began during week 4 of radiotherapy and was continued for up to 12 cycles of maintenance chemotherapy.

This example illustrates that when an adequate EC dataset exists, it is possible to incorporate it into the design and analysis of a prospective randomized trial while obtaining a treatment effect estimate close to that based on a fully randomized comparison. However, the key to successful implementation of this strategy lies in the careful selection of a comparable EC, as demonstrated in the example of the NRG GBM trials. The Hatswell et al. tool can serve as a useful framework for a rigorous evaluation of EC data. Note that this case study does not guarantee overall type I error control (alpha = 0.05); identifying design parameters that maintain type I error control in designs using ECs is an area of ongoing methodological research. On a further note, in the case of PFS, the interim comparison of the RTOG 0825 and RTOG 0525 TMZ cohorts does not indicate comparability (log-rank P = .04 < .30). Consequently, the ECs would not be utilized for this endpoint. This raises the question of whether the ECs are truly comparable and therefore whether they should be used for either endpoint. In this case, one should rule out alternative causes for the lack of comparability, such as differences in how PFS was assessed in the 2 studies. Table 3 indicates that the imaging scan intervals (every 3 months) and the disease assessment criteria (the Macdonald criteria) were identical in both studies. Therefore, it is not clear what may have contributed to the discordant PFS results. However, based on existing literature, it is possible that imaging looked “better” after treatment with vascular endothelial growth factor-targeted therapies such as bevacizumab. In fact, this was corroborated by the RTOG 0825 results in that BEV prolonged stability on MRI scans (PFS) but this benefit did not translate to a survival advantage.

Comparison of RCD and EAD via Statistical Simulations

In this section, we consider designing a hypothetical trial with a survival endpoint. We compare 2 competing designs: (1) a standard RCD and (2) an EAD that incorporates an EC (more details below). We use statistical simulations to demonstrate that while the use of external data in an EAD has the potential to decrease the in-trial sample size, it does not guarantee type I error control, especially when outcome differences exist between the external and randomized controls. Statistical simulations are useful for comparing the characteristics of different study designs because the true clinical scenarios that the trials attempt to estimate can be specified exactly.

The Randomized Control Design

In this design, patients are concurrently randomized (with equal chance) to either the standard or ET. In all simulations, survival times are generated under the exponential distribution. Under the null hypothesis, the median survival is assumed to be 14.5 months in both treatment arms. Under the alternative hypothesis, the median survival in the standard arm and the experimental arm is assumed to be 14.5 and 19 months, respectively. Note that this degree of improvement translates to a treatment HR of 0.76. Assuming a monthly accrual rate of 30 patients, an RCD that enrolls a total of 720 patients over a period of 24 months with an additional follow-up of 29 months after the last patient accrual will have 90% power to detect the targeted improvement with a 2-sided 5% type I error. At the end of the study, a total of 558 death events are expected.
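The 558-event target is consistent with Schoenfeld's formula for the number of deaths required by a log-rank test, D = (z_{1−α/2} + z_{1−β})^2 / [p(1−p)(log HR)^2], with equal allocation p = 0.5; a minimal check:

```python
from math import log
from scipy.stats import norm

def schoenfeld_events(hr, alpha=0.05, power=0.90, p_alloc=0.5):
    """Total deaths required by a 2-sided log-rank test (Schoenfeld's formula)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z**2 / (p_alloc * (1 - p_alloc) * log(hr) ** 2)

print(round(schoenfeld_events(0.76)))  # 558, matching the expected deaths above
```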

The Externally Augmented Design

We consider an alternative design that incorporates an EC. The first stage of the trial randomizes half of the sample size of its RCD counterpart (ie, N1 = 360). Suppose that at the time of designing the EAD, an EC with 200 patients is available (NEC = 200). These patients were accrued at a constant rate of 20 patients per month, followed by 12 months of additional follow-up, at which point the study was closed. Of note, the number of death events in the EC is considered fixed by design (ie, there is no plan to continue following these patients to gather more events). The interim comparability analysis between EC and RC entails rejecting the null hypothesis of survival equivalence using a 2-sided 30% level log-rank test. The stage 1 duration of the EAD (ie, accrual time and follow-up time) is determined based on Equation (3) in Dixon and Simon41 such that the interim test will have 80% power to detect HR* = 0.76 while maintaining a 2-sided 30% type I error, where HR* denotes the true survival HR between RC and EC (corresponding to median survival of 11 vs. 14.5 months in EC and RC, respectively). This represents a scenario where patient prognosis has improved over time such that patients in the RC cohort have longer survival than those in the EC cohort. If the test for survival equivalence is rejected, the EC is abandoned and the trial proceeds to the second stage, which randomizes the remaining sample size (N2 = 360) in a 1:1 ratio; otherwise, the randomization ratio is adapted to 2:1 in favor of the experimental arm. In this design, the final efficacy analysis occurs when a total of 558 death events (across all study arms) have been observed. The final analysis is based on either ET versus RC or ET versus (RC + EC), depending on the result of the interim test.
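A deliberately simplified re-implementation of this design is sketched below: it replaces staggered accrual and event-driven analysis timing with fixed administrative censoring, so its operating characteristics approximate, but will not exactly reproduce, those reported in Table 4; every implementation detail beyond those stated in the text is our assumption.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(2024)

def exp_arm(n, median, horizon=53.0):
    """Exponential survival times with administrative censoring at `horizon` months."""
    t = rng.exponential(median / np.log(2), n)
    return pd.DataFrame({"time": np.minimum(t, horizon),
                         "event": (t <= horizon).astype(int)})

def one_ead_trial(median_ec, median_rc=14.5, median_et=14.5, n1=180, n2=360, n_ec=200):
    ec, rc, et = exp_arm(n_ec, median_ec), exp_arm(n1, median_rc), exp_arm(n1, median_et)
    keep_ec = logrank_test(rc["time"], ec["time"],
                           event_observed_A=rc["event"],
                           event_observed_B=ec["event"]).p_value >= 0.30
    if keep_ec:  # EC retained: stage 2 randomizes 2:1 in favor of ET; controls pooled
        et2, rc2 = exp_arm(2 * n2 // 3, median_et), exp_arm(n2 // 3, median_rc)
        ctrl = pd.concat([rc, rc2, ec])
    else:        # EC abandoned: stage 2 keeps the 1:1 randomization ratio
        et2, rc2 = exp_arm(n2 // 2, median_et), exp_arm(n2 // 2, median_rc)
        ctrl = pd.concat([rc, rc2])
    df = pd.concat([pd.concat([et, et2]).assign(treat=1),
                    ctrl.assign(treat=0)]).reset_index(drop=True)
    cph = CoxPHFitter().fit(df, "time", "event")
    return keep_ec, cph.summary.loc["treat", "p"] < 0.05

# Null scenario with a drifted EC (median 11 vs. 14.5 months, ie, HR* ~ 0.76):
sims = [one_ead_trial(median_ec=11.0) for _ in range(200)]
print("EC abandoned:", np.mean([not keep for keep, _ in sims]))
print("empirical type I error:", np.mean([rej for _, rej in sims]))
```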

Simulation Results

For the standard RCD, we simulated 1000 trials under the null scenario where patients in the ET arm and RC arm have the same survival distribution (HR = 1). As expected, the RCD erroneously rejects the null hypothesis in about 5% of the simulated trials (Table 4). For the EAD, we simulated 1000 trials under each of the following 2 scenarios: (1) patients in the experimental arm and RC arm have the same survival distribution and the EC arm is comparable with the RC arm (HR = HR* = 1), and (2) patients in the experimental arm and RC arm have the same survival distribution, but the EC arm is not comparable with the RC arm (HR = 1 and HR* = 0.76). Table 4 shows that under scenario (1), the interim comparability rule erroneously abandons the external cohort 30% of the time, as stipulated by design. On average, incorporation of the EC is associated with ~19% (= [720 − 580]/720 × 100%) savings in in-trial sample size compared to the RCD. However, the overall type I error (ie, the probability that the trial erroneously declares superiority of the experimental therapy) is 5.7%, modestly greater than the desired 5% level. It is not clear why this inflation of the type I error occurs; it may be that the 2-step procedure introduces an additional element of variability that is not accounted for in the analysis. In any case, one might be willing to accept the modest increase in type I error in exchange for the reduced sample size. Under scenario (2), when the EC arm is incomparable with the RC arm, the interim comparability rule correctly leads to abandonment of the external cohort about 78% of the time. On average, incorporation of the EC leads to only about 6% (= [720 − 676]/720 × 100%) savings in in-trial sample size compared to the RCD, and at the cost of an even greater type I error (8%).

Table 4.

Comparison of RCD and EAD via statistical simulations

Average in-trial sample size % of trials where external control is abandoned at stage I Type I error (under H0: HR [ET vs. RC] = 1)
Randomized control design (RCD) 720 N/A 5%
Externally augmented design (EAD)
i. EC = RC (HR* = 1) 580 30% 5.7%
ii. EC ≠ RC (HR* = 0.76) 676 78% 8%

EC, external control arm; ET, experimental arm; RC, randomized control arm. HR = true survival hazard ratio between ET and RC (HR = 1 signifies equivalence). HR* = true survival hazard ratio between EC and RC (HR* = 1 signifies equivalence).

Other Practical and Statistical Considerations

In the era of precision medicine, modern clinical trials frequently incorporate baseline biomarkers into the design and/or analysis of the study. Non-contemporaneous studies may not collect information on pertinent molecular biomarkers or may have a large percentage of patients with missing biomarker data. If biomarkers have prognostic implications, missing data in the EC will hinder one’s ability to adjust for the prognostic effect of biomarkers. Even if the biomarker measurements are available from the EC, assay techniques used to determine biomarker values may have evolved over time, casting doubt on the comparability of biomarker data between the EC and current trial.

In designing a prospective clinical trial, it is customary to plan for the collection of additional data to address other relevant scientific questions. These ancillary questions are typically framed as secondary or exploratory objectives of the trial. In brain tumor trials, for example, common secondary objectives may include studying the toxicity profile of the ET, evaluating patient-reported quality of life, and assessing patients’ neurocognitive function. The data collection forms are designed to adequately capture data to address these questions. If the source of external data does not stem from a study with similar scientific objectives, it is unlikely that the data necessary to address these ancillary questions will be available. Therefore, the scientific questions that can be answered with externally controlled studies are limited by the type and quality of data collected from the ECs. Of course, ECs cannot be incorporated at all if no data are available. Recent requirements for data sharing should lead to the availability of more well-curated datasets containing patient-level data that could serve as sources of ECs.

In an EAD, accrual should be suspended at the time of interim comparability analysis in order to fully capitalize on the design feature of an imbalanced randomization in favor of the experimental therapy (ie, when the result of the interim comparability analysis does not indicate a significant outcome difference between EC and RC). Undoubtedly, interruption of accrual adds a logistic burden to the trial. Hence, the choice to use EC should be additionally weighed against such practical challenges. Relatedly, since the adaptation of the randomization ratio at the second stage of EAD depends on the availability of the interim comparability analysis, the choice of efficacy endpoint should be one that can be rapidly ascertained to make the trial feasible. Another consideration with the EAD is whether the comparison of external and RCs should be repeated at the end of the study when more data are available. We would recommend that this take place only if the interim analysis indicated that the ECs could be utilized. It would then provide further evidence that the 2 control groups are comparable, or lead to a reversal of the decision to incorporate the ECs.

Concluding Remarks

In this article, we discuss design and practical challenges associated with using EC data in clinical trials. We present an example in neuro-oncology where this approach may have the potential to reduce the in-trial sample size through adaptation of the randomization ratio without compromising the validity of the trial conclusions. To date, the field of neuro-oncology has seen a paucity of examples utilizing this approach. While it is our collective feeling that this approach may hold promise in generating robust and informative data in certain settings, this sense of optimism should be tempered by a healthy dose of skepticism due to the myriad design and analysis challenges articulated in this review. At the same time, we eagerly await a report from the upcoming Medicenna registration trial in recurrent GBM. At the current time, these methods are perhaps better suited for early-phase trials, where the objective is to determine, in a preliminary fashion, whether an ET is of sufficient promise for further study, or for rare disease conditions where concurrent randomization may be difficult or infeasible. In general, the decision to include external data is highly context dependent. Most importantly, careful planning is key to the successful implementation of these approaches. This work represents a critical appraisal of a renewed concept that has drawn much attention in the neuro-oncology clinical trial community.

Contributor Information

Mei-Yin C Polley, Department of Public Health Sciences, University of Chicago, Chicago, Illinois, USA; NRG Oncology Statistics and Data Management Center, Philadelphia, Pennsylvania, USA.

Daniel Schwartz, Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts, USA.

Theodore Karrison, Department of Public Health Sciences, University of Chicago, Chicago, Illinois, USA; NRG Oncology Statistics and Data Management Center, Philadelphia, Pennsylvania, USA.

James J Dignam, Department of Public Health Sciences, University of Chicago, Chicago, Illinois, USA; NRG Oncology Statistics and Data Management Center, Philadelphia, Pennsylvania, USA.

Conflict of interest statement

M.Y.P. declared a consulting role with NeuroTrials, LLC.

Funding

M.Y.P., T.K., and J.J.D. were supported by National Cancer Institute grants to the NRG Oncology Statistics and Data Management Center (U10 CA180822). D.S. was supported by an NIH training grant (T32 CA009337).
