Biometrika. 2024 Dec 17;112(2):asae069. doi: 10.1093/biomet/asae069

Improving randomized controlled trial analysis via data-adaptive borrowing

Chenyin Gao 1, Shu Yang 2, Mingyang Shan 3, Wenyu Ye 4, Ilya Lipkovich 5, Douglas Faries 6
PMCID: PMC11972012  PMID: 40191435

Summary

In recent years, real-world external controls have grown in popularity as a tool to empower randomized placebo-controlled trials, particularly in rare diseases or cases where balanced randomization is unethical or impractical. However, as external controls are not always comparable to the trials, direct borrowing without scrutiny may heavily bias the treatment effect estimator. Our paper proposes a data-adaptive integrative framework capable of preventing unknown biases of the external controls. The adaptive nature is achieved by dynamically sorting out a comparable subset of external controls via bias penalization. Our proposed method can simultaneously achieve (a) the semiparametric efficiency bound when the external controls are comparable and (b) selective borrowing that mitigates the impact of the existence of incomparable external controls. Furthermore, we establish statistical guarantees, including consistency, asymptotic distribution and inference, providing Type-I error control and good power. Extensive simulations and two real-data applications show that the proposed method leads to improved performance over the trial-only estimator across various bias-generating scenarios.

Keywords: Adaptive lasso, Calibration weighting, Dynamic borrowing, Study heterogeneity

1. Introduction

Randomized controlled trials have been considered the gold standard of clinical research to provide confirmatory evidence on the safety and efficacy of treatments. However, randomized placebo-controlled trials are expensive, require lengthy recruitment periods and may not always be ethical, feasible or practical in rare or life-threatening diseases. In response, quality patient-level real-world data from disease registries and electronic health records have become increasingly available and can generate fit-for-purpose real-world evidence to facilitate healthcare and regulatory decision-making (FDA, 2021). Studies using real-world data may have advantages over randomized placebo-controlled trials, including longer observation windows, larger and more heterogeneous patient populations, and reduced burden on investigators and patients (Visvanathan et al., 2017; Colnet et al., 2020). There is interest in novel clinical trial designs that leverage external controls from real-world data to improve the efficiency of randomized placebo-controlled trials while maintaining robust evidence on the safety and efficacy of treatments (Silverman, 2018; FDA, 2019; Ghadessi et al., 2020). The focus of this paper is on hybrid control arm designs using real-world data, where the concurrent control arm is augmented with real-world external controls to form a hybrid comparator group.

The concept of hybrid controls dates back to Pocock (1976), who combined the trial data and historical controls by adjusting for data-source-level differences. Since then, numerous methods for using external controls have been developed. However, regulatory approvals of external control arm designs as confirmatory trials are rare and limited to ultra-rare diseases, pediatric trials or oncology trials (FDA, 2014, 2016; Odogwu et al., 2018). Concerns regarding the validity and comparability of the external controls have limited their use in a broader context. Guidance documents from regulatory agencies, including the recent FDA draft guidance (FDA, 2023), note several potential issues with the external controls, including selection bias, lack of concurrency, differences in the definitions of covariates, treatments or outcomes, and unmeasured confounding (FDA, 2001, 2019, 2023). Without proper scrutiny, each of these concerns may lead to biased treatment effect estimates and misleading conclusions.

Selection bias is a type of data heterogeneity often encountered in nonrandomized studies. In the context of external control augmentation, it arises when the real-world baseline subjects’ characteristics differ from those in the trial data. Multiple methods are available to adjust for selection bias by balancing the baseline covariates’ distributions across the different data sources. For example, matching and subclassification approaches select a subset of comparable external controls to construct the hybrid control arm (Stuart, 2010). Matching on the propensity score or the probability of trial inclusion can balance numerous baseline covariates simultaneously (Rosenbaum & Rubin, 1983). Weighting approaches that reweight external controls using the probability of trial inclusion or other balancing scores have also been proposed, e.g., empirical likelihood (Qin et al., 2015), entropy balancing (Lee et al., 2022b; Wu & Yang, 2022b; Chu et al., 2023), constrained maximum likelihood (Chatterjee et al., 2016; Zhang et al., 2020) and Bayesian power priors (Neuenschwander et al., 2010; van Rosmalen et al., 2018). Furthermore, matching or weighting can be combined with outcome modelling to enhance robustness against model misspecification in addressing selection bias of external controls (Li et al., 2023).

Differences in the outcomes may still exist between the concurrent controls and the external controls after matching or weighting, owing to differences in study settings, time frames, data quality or the definitions of covariates or outcomes (Phelan et al., 2017). Methods have been proposed to adaptively select the degree of borrowing or to adjust the outcomes of external controls based on observed outcome differences with the concurrent controls. Some researchers suggested first testing for heterogeneity in control outcomes before deciding whether to incorporate external subjects into the hybrid control arm (Viele et al., 2014; Li et al., 2023). More dynamic borrowing approaches have also been proposed, including matching and bias adjustment (Stuart & Rubin, 2008), power priors (Ibrahim & Chen, 2000; Neuenschwander et al., 2009), Bayesian hierarchical models including meta-analytic predictive priors (Neuenschwander et al., 2010; Schoenfeld et al., 2019) and commensurate priors (Hobbs et al., 2011). While these existing methods seem appealing, simulation studies could not identify a single approach that performs well across all scenarios where hidden biases exist (Shan et al., 2022). The surveyed Bayesian methods often have inflated Type-I errors, while frequentist methods suffer from lower power when hidden biases exist. Nearly all methods performed poorly in the presence of unmeasured confounding and could not simultaneously minimize bias and gain power. Furthermore, many existing methods rely on parametric assumptions that are sensitive to model misspecification and cannot capture the complex relationships that are prevalent in practice.

In this paper, we propose an approach to achieve an efficient estimation of treatment effects that is robust to various potential discrepancies that may arise in the external controls. When handling the selection bias of external controls, our proposal is based on calibration weighting (Lee et al., 2022b) so that the covariate distribution of external controls matches with that of the trial subjects. Furthermore, leveraging semiparametric theory, we develop an integrative augmented calibration weighting estimator, motivated by the efficient influence function (Bickel et al., 1998; Tsiatis, 2006), which is semiparametrically efficient and doubly robust against model misspecification. Despite the potential to view the selection bias problem as a generalizability or transportability issue (Lee et al., 2022b), our framework fundamentally diverges from theirs as our context encompasses the outcomes from both the trial data and external controls, while Lee et al. (2022b) solely considered the trial outcomes.

To deal with potential outcome heterogeneity, we develop a selective borrowing framework to determine an optimal subset of the external controls for integration. Specifically, we introduce a bias parameter for each external subject that encodes his or her comparability with the concurrent controls. To prevent bias in the integrative estimator, the goal is to select the comparable external controls with zero bias and exclude any others with nonzero bias. Thus, this formulation recasts the selective borrowing strategy as a model selection problem, which can be solved by penalized estimation (e.g., with the adaptive lasso penalty; Zou, 2006). After the selection process, the comparable external controls are used to construct the integrative estimator. Prior works, such as those by Chen et al. (2021), Liu et al. (2021) and Zhai & Han (2022), although able to identify biases, exclude the entire external sample when confronted with incomparability. Moreover, compared to these existing selective borrowing approaches, our method leverages off-the-shelf machine learning models to achieve semiparametric efficiency and does not require stringent parametric assumptions on the distribution of outcomes.

2. Methodology

2.1. Notation, assumptions and objectives

Let Inline graphic represent a randomized placebo-controlled trial and Inline graphic represent an external control source, which contain Inline graphic and Inline graphic subjects, respectively. The total sample size is Inline graphic. An extension to multiple external control groups is discussed in the Supplementary Material. A total of Inline graphic and Inline graphic subjects receive the active treatment and control treatment in Inline graphic, while we assume that all Inline graphic subjects in Inline graphic receive the control. Each observation Inline graphic comprises the outcomes Inline graphic, the treatment assignment Inline graphic and a set of baseline covariates Inline graphic. Similarly, each observation Inline graphic comprises Inline graphic, Inline graphic and Inline graphic. Let Inline graphic represent a data source indicator, which is 1 for all subjects Inline graphic and 0 for all subjects Inline graphic. To sum up, an independent and identically distributed sample Inline graphic is observed, where Inline graphic. Let Inline graphic denote the potential outcomes under treatment Inline graphic (Rubin, 1974). The causal estimand of interest is defined as the average treatment effect among the trial population, Inline graphic, where Inline graphic for Inline graphic. The clinical trials for treatment effect estimation satisfy the following assumption.

Assumption 1

(Consistency, randomization and positivity).

Suppose that

  • i.

    $Y = A Y(1) + (1 - A) Y(0)$,

  • ii.

    $Y(a) \perp A \mid (X, \delta = 1)$ for $a = 0, 1$ and

  • iii.
    the known treatment propensity score satisfies
    $0 < \mathrm{pr}(A = 1 \mid X = x, \delta = 1) < 1$

    for all $x$ such that $\mathrm{pr}(\delta = 1 \mid X = x) > 0$, where $\delta$ is the data source indicator.

Assumption 1 is standard in the causal inference literature (Rosenbaum & Rubin, 1983; Imbens, 2004) and holds for well-controlled clinical trials, where it is guaranteed by the randomization mechanism. Under Assumption 1, Inline graphic is identifiable from the trial data.

Moreover, the external controls should ideally be comparable with the concurrent controls.

Assumption 2

(External control compatibility).

Suppose that

  • i.

    $E\{Y(0) \mid X, \delta = 1\} = E\{Y(0) \mid X, \delta = 0\}$ and

  • ii.

    $\mathrm{pr}(\delta = 1 \mid X = x) > 0$ for all $x$ such that $f(x \mid \delta = 0) > 0$, where $f(\cdot \mid \delta = 0)$ denotes the covariate density of the external controls.

Assumption 2 states that the conditional mean of Inline graphic is the same for the trial data and external controls. This assumption holds if Inline graphic captures all the outcome predictors that are correlated with Inline graphic. The FDA (2023) guidance on drug development in rare diseases identifies five main concerns regarding the use of external controls: (i) selection bias, (ii) unmeasured confounding, (iii) lack of concurrency, (iv) data quality and (v) outcome validity. Assumption 2 does not require the covariate distribution of the external controls to be the same as that of the trial data, a discrepancy referred to as selection bias in the guidance. Under Assumption 2, borrowing external controls to improve treatment effect estimation is similar to a transportability or covariate shift problem. However, the presence of concerns (ii)–(v) can result in violations of Assumption 2. Our paper has two main objectives: (i) under Assumption 2, similarly to the work of Li et al. (2023), we develop a semiparametrically efficient and robust strategy to borrow external controls to improve estimation while correcting for selection bias (§ 2.2); (ii) considering that Assumption 2 can be violated, we incorporate a selective borrowing procedure that detects the biases and retains only a subset of comparable external controls for integration (§ 2.3).

2.2. Semiparametric efficient estimation under the ideal situation

Based on semiparametric theory (Bickel et al., 1998), we derive efficient and robust estimators for Inline graphic under Assumptions 1 and 2. The derivation arrives at the same estimator as Li et al. (2023) and will serve as the basis for our selective borrowing strategy. The semiparametric model is attractive because it exploits the observed data without making assumptions about the nuisance parts of the data-generating process that are not of substantive interest. We derive the efficient influence function of Inline graphic in Theorem 1 below, which serves as the foundational component of our proposed framework.

Theorem 1.

Under Assumptions 1 and 2, the efficient influence function of Inline graphic is  

Theorem 1.

where  

Theorem 1.

Based on Theorem 1, the semiparametric efficiency bound for Inline graphic is Inline graphic. Hence, a principled estimator can be motivated by solving the empirical analogue of Inline graphic for Inline graphic.

Let the estimators of Inline graphic be Inline graphic, and define Inline graphic  Inline graphic. Then, by solving the empirical version of the efficient influence function for Inline graphic, we have

2.2. (1)

We now discuss the estimators for the nuisance functions Inline graphic. To estimate Inline graphic, Inline graphic and Inline graphic, one can follow the standard approach by fitting parametric models based on the trial data.
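The doubly robust structure underlying (1) can be illustrated with a trial-only augmented inverse probability weighting estimator. The sketch below is hedged: the data-generating model, the deliberately crude (zero) outcome model and the 1:1 randomization ratio are illustrative assumptions, and the paper's integrative estimator additionally borrows the calibrated external controls.

```python
import numpy as np

def aipw_trial(Y, A, mu1, mu0, pi):
    """Trial-only augmented inverse probability weighting estimate of the
    average treatment effect with a known randomization probability pi.
    mu1, mu0 are per-subject outcome-model predictions under each arm."""
    scores = (A * (Y - mu1) / pi
              - (1 - A) * (Y - mu0) / (1 - pi)
              + mu1 - mu0)
    return scores.mean()

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=n)
A = rng.binomial(1, 0.5, size=n)                 # 1:1 randomization
Y = 1.0 * A + X + rng.normal(scale=0.5, size=n)  # true effect = 1
# Even with a deliberately crude (zero) outcome model, randomization
# keeps the inverse probability weighting part consistent.
tau = aipw_trial(Y, A, mu1=np.zeros(n), mu0=np.zeros(n), pi=0.5)
print(tau)
```

Plugging in fitted outcome means instead of zeros shrinks the variance without changing consistency, which is the double robustness discussed below.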

For estimating the weight Inline graphic, a direct approach is to predict Inline graphic, which, however, is unstable because it inverts estimated probabilities. To stabilize the weighting, the key insight is the central role of Inline graphic in balancing the covariate distribution between the two groups: Inline graphic for any Inline graphic, which is a Inline graphic-dimensional function of Inline graphic. Thus, we estimate Inline graphic by calibrating the covariate balance between the trial data and external controls. In particular, we assign a weight Inline graphic to each subject Inline graphic and then solve the following optimization problem for Inline graphic:

2.2.

subject to (i) Inline graphic and (ii) Inline graphic. First, Inline graphic is the entropy of the weights; minimizing this criterion ensures that the calibration weights are not too far from uniform, which limits the variability due to heterogeneous weights. Constraint (i) is a standard normalization condition for the weights. Constraint (ii) forces the empirical moments of the covariates to match after calibration, leading to better-matched distributions between the trial data and the external controls.

The optimization problem can be solved by constrained convex optimization. The estimated calibration weight is Inline graphic, and Inline graphic solves Inline graphic, which is the Lagrangian dual of the optimization problem. The dual problem also shows that the calibration weighting approach implicitly posits a log-linear regression model for Inline graphic. We refer to Inline graphic with calibration weights as the augmented calibration weighting estimator Inline graphic.
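The calibration step can be sketched as an entropy-balancing problem solved through its Lagrangian dual, under which the weights take the exponential-tilting (log-linear) form noted above. The Newton iteration and the toy covariate shift below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def entropy_balance(X_ext, target_moments, n_iter=100, tol=1e-10):
    """Entropy-balancing calibration weights for the external controls:
    minimize the entropy criterion subject to matching the trial-arm
    covariate moments, solved through the Lagrangian dual by Newton's
    method (the weights are exponential in the dual variables)."""
    n, p = X_ext.shape
    lam = np.zeros(p)                        # dual variables
    for _ in range(n_iter):
        eta = X_ext @ lam
        w = np.exp(eta - eta.max())          # shift for numerical stability
        w /= w.sum()                         # normalized weights
        grad = X_ext.T @ w - target_moments  # dual gradient
        if np.abs(grad).max() < tol:
            break
        Xc = X_ext - (w[:, None] * X_ext).sum(axis=0)
        hess = (w[:, None] * Xc).T @ Xc      # dual Hessian: weighted covariance
        lam -= np.linalg.solve(hess + 1e-10 * np.eye(p), grad)
    return w

rng = np.random.default_rng(0)
X_trial = rng.normal(0.5, 1.0, size=(300, 2))   # trial covariates
X_ext = rng.normal(0.0, 1.0, size=(500, 2))     # shifted external controls
w = entropy_balance(X_ext, X_trial.mean(axis=0))
# The weighted external covariate means now match the trial means.
print(np.abs(X_ext.T @ w - X_trial.mean(axis=0)).max())
```

The dual problem is strictly convex, so Newton's method converges quickly whenever the trial moments lie inside the convex hull of the external covariates.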

Remark 1.

The variance ratio Inline graphic quantifies the relative residual variability of Inline graphic given Inline graphic between the trial data and external controls. In general, estimating the conditional variance ratio involves nonparametric regression, which can be challenging; see Shen et al. (2020) and the references therein. Fortunately, the consistency of Inline graphic does not rely on the correct specification of Inline graphic. For example, if Inline graphic is set to zero, Inline graphic reduces to the trial-only estimator without borrowing any external information, which is always consistent. To leverage external information and estimate Inline graphic in practice, we can make the simplifying homoscedasticity assumption that the residual variances of Inline graphic after adjusting for Inline graphic are constant across studies. In this case, Inline graphic can be estimated by Inline graphic.
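Under the homoscedasticity simplification in Remark 1, the ratio reduces to a ratio of pooled residual variances. The orientation of the ratio and the synthetic residuals below are illustrative assumptions.

```python
import numpy as np

def variance_ratio(resid_trial, resid_ext):
    """Homoscedastic estimate of the variance ratio in Remark 1 as a
    ratio of pooled residual variances, computed from the residuals of
    outcome regressions fitted separately in each data source. The
    orientation of the ratio here is an illustrative assumption."""
    return resid_trial.var(ddof=1) / resid_ext.var(ddof=1)

rng = np.random.default_rng(3)
# Synthetic residuals: the external source is noisier (sd 2 vs sd 1).
r_hat = variance_ratio(rng.normal(0, 1.0, 500), rng.normal(0, 2.0, 800))
print(r_hat)  # close to (1/2)^2 = 0.25
```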

We show that Inline graphic has the following desirable properties. (i) Local efficiency: Inline graphic achieves the semiparametric efficiency bound if the nuisance functions are correctly specified. (ii) Double robustness: Inline graphic is consistent for Inline graphic if either the model for Inline graphic or that for Inline graphic is correct; see the proof in the Supplementary Material.

Doubly robust estimators were initially developed to gain robustness to parametric misspecification, but are now known to also be robust to approximation errors from machine learning methods (e.g., Chernozhukov et al., 2018). We investigate this doubly robust feature for the proposed estimator Inline graphic, and use flexible semiparametric or nonparametric methods to estimate Inline graphic (Inline graphic), Inline graphic and Inline graphic in (1). First, we consider the method of sieves (Chen, 2007) for Inline graphic. Compared with other nonparametric methods such as kernels, the method of sieves is particularly well suited to calibration weighting. We consider general sieve basis functions such as power series, Fourier series, splines, wavelets and artificial neural networks; see Chen (2007) for a comprehensive review. The number of basis functions can be selected by cross-validation. Second, we consider flexible outcome models, e.g., generalized additive models, kernel regression and the method of sieves, for Inline graphic (Inline graphic). Using flexible methods alleviates the bias arising from misspecified parametric models. The following regularity conditions are required for the nuisance function estimators.
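A minimal sketch of sieve nuisance estimation with the basis dimension chosen by cross-validation, assuming a one-dimensional covariate and a power-series basis; splines or wavelets would slot in the same way.

```python
import numpy as np

def poly_design(x, d):
    # Power-series sieve basis: 1, x, ..., x^d.
    return np.vander(x, d + 1, increasing=True)

def sieve_fit(x, y, max_degree=8, n_folds=5, seed=0):
    """Sieve regression with the number of basis functions chosen by
    K-fold cross-validation (a minimal sketch of the flexible nuisance
    estimation; the basis family and tuning grid are illustrative)."""
    folds = np.random.default_rng(seed).permutation(len(x)) % n_folds
    best_d, best_mse = 1, np.inf
    for d in range(1, max_degree + 1):
        mse = 0.0
        for k in range(n_folds):
            tr, te = folds != k, folds == k
            beta, *_ = np.linalg.lstsq(poly_design(x[tr], d), y[tr], rcond=None)
            mse += np.mean((poly_design(x[te], d) @ beta - y[te]) ** 2)
        if mse < best_mse:
            best_d, best_mse = d, mse
    beta, *_ = np.linalg.lstsq(poly_design(x, best_d), y, rcond=None)
    return best_d, beta

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 400)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=400)
d, beta = sieve_fit(x, y)
print(d)  # cross-validation picks a nonlinear basis (degree > 1)
```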

Assumption 3.

For a function Inline graphic with a generic random variable Inline graphic, define its Inline graphic norm as Inline graphic. Assume that

Assumption 3 is a set of typical regularity conditions for Inline graphic-estimation to achieve rate double robustness (Van der Vaart, 2000). Under these regularity conditions, our proposed framework can incorporate flexible methods for estimating the nuisance functions, while maintaining parametric rate consistency for Inline graphic.

Theorem 2.

Under Assumptions 1–3, we have Inline graphic, where Inline graphic. If Inline graphic, Inline graphic achieves semiparametric efficiency.

Theorem 2 motivates variance estimation by Inline graphic, which is consistent for Inline graphic under Assumptions 1–3.

2.3. Bias detection and selective borrowing

In practical situations, Assumption 2 may not hold, and the augmentation in (1) can be biased. We develop a selective borrowing framework to select external subjects that are comparable with the concurrent controls for integration. To account for potential violations, we introduce a vector of bias parameters Inline graphic for all Inline graphic, where Inline graphic. When Assumption 2 holds, we have Inline graphic. Otherwise, there exists at least one Inline graphic such that Inline graphic. To prevent bias in Inline graphic from incomparable external controls, the goal is to select the comparable subset with Inline graphic and exclude any others with Inline graphic.

Let Inline graphic be a consistent estimator for Inline graphic where Inline graphic is a consistent estimator for Inline graphic. Let Inline graphic be an initial estimator for Inline graphic. We propose a refined estimator of Inline graphic by penalized estimation:

2.3. (2)

Here Inline graphic is the estimated variance of Inline graphic, Inline graphic is the adaptive lasso penalty term and Inline graphic are two tuning parameters. Intuitively, if Inline graphic is close to zero, the associated penalty will be large, which shrinks the estimate Inline graphic towards zero. According to Zou (2006), Huang et al. (2008) and Lin et al. (2009), the adaptive lasso penalty leads to desirable selection properties under the following regularity conditions.
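When the penalized criterion separates across external subjects, each coordinate admits a closed-form adaptive-lasso solution by soft-thresholding. The exact objective written in the docstring, the tuning values and the toy biases below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def adaptive_lasso_bias(b_init, var_b, lam, nu=2.0):
    """Coordinate-wise solution of an adaptive-lasso bias penalization in
    the spirit of (2): each term (b - b_init)^2 / (2 var_b)
    + lam |b| / |b_init|^nu separates across external subjects and is
    solved by soft-thresholding b_init. Small initial biases receive a
    large penalty weight, so they are set exactly to zero."""
    thresh = lam * var_b / np.maximum(np.abs(b_init), 1e-12) ** nu
    b_hat = np.sign(b_init) * np.maximum(np.abs(b_init) - thresh, 0.0)
    return b_hat, b_hat == 0.0   # zero estimated bias => borrowed subject

# Eight roughly comparable external subjects and two clearly biased ones.
b_init = np.array([0.05, -0.02, 0.04, 0.01, -0.03, 0.02, 0.0, -0.05,
                   2.1, -1.8])
b_hat, keep = adaptive_lasso_bias(b_init, var_b=0.04, lam=0.1)
print(keep)
```

The two large initial biases survive thresholding and their subjects are excluded, while the small noisy biases are shrunk exactly to zero.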

Assumption 4.

Suppose that

  • i.

    Inline graphic and Inline graphic for all Inline graphic ,

  • ii.

    there exist constants Inline graphic and Inline graphic such that Inline graphic , where Inline graphic and Inline graphic are the smallest and largest eigenvalues of Inline graphic ,

  • iii.

    Inline graphic , where Inline graphic , and

  • iv.

    Inline graphic and Inline graphic .

Lemma 1.

Suppose that the assumptions in Theorem 2 and Assumption 4 hold except that Assumption 2 may be violated. We have Inline graphic.

Lemma 1 shows that the adaptive lasso penalty can consistently select the zero-valued bias parameters when an Inline graphic-consistent initial estimator Inline graphic and proper choices of Inline graphic are used, provided that the minimum of the nonzero biases Inline graphic does not diminish too fast and the initial estimator Inline graphic is sufficiently accurate. In practice, the initial estimator Inline graphic can be obtained by leveraging off-the-shelf machine learning models with a guaranteed convergence rate, and Inline graphic are selected by minimizing the mean squared error using cross-validation. Given Inline graphic, the selected set of comparable external controls is Inline graphic. The modified integrative estimator is

2.3. (3)

where Inline graphic is the estimated function of Inline graphic, which is used to adjust for changes in the covariate distribution from all external controls in Inline graphic to Inline graphic.

Following the suggestions of Ho et al. (2007) to improve finite-sample performance, nearest-neighbour matching based on the estimated probability of trial inclusion Inline graphic is performed after selecting the comparable subset Inline graphic, which ensures a more balanced allocation ratio between the treated group and the hybrid control arm; see Algorithm 1 below for an overview of our selective borrowing framework.
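The matching refinement can be sketched as follows; the greedy nearest-neighbour rule on estimated trial-inclusion probabilities is an illustrative simplification, not necessarily the paper's exact matching rule.

```python
import numpy as np

def nn_select(p_ext, p_trial, n_needed):
    """Pick the n_needed external controls whose estimated trial-inclusion
    probabilities lie nearest to those of the trial subjects: a greedy
    sketch of the nearest-neighbour refinement of Ho et al. (2007)."""
    dist = np.abs(p_ext[:, None] - p_trial[None, :]).min(axis=1)
    return np.argsort(dist)[:n_needed]

p_ext = np.array([0.05, 0.42, 0.38, 0.90, 0.41])   # external controls
p_trial = np.array([0.40, 0.45, 0.35])             # concurrent controls
idx = nn_select(p_ext, p_trial, n_needed=3)
print(sorted(idx.tolist()))  # the three nearest externals: [1, 2, 4]
```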

Algorithm 1.

Proposed selective integrative estimator.

   Input: a randomized controlled trial with size Inline graphic and external controls.

  • Step 1. Fit the models for the outcome means Inline graphic and weights Inline graphic.

  • Step 2. Construct the initial estimator Inline graphic for the bias parameter Inline graphic.

  • Step 3. Select the comparable subset Inline graphic via the bias penalization (2).

  • Step 4. If Inline graphic, then perform nearest-neighbour matching to select Inline graphic external controls as the final Inline graphic; otherwise, jump to Step 5.

  • Step 5. Compute Inline graphic in (3) using the selected external controls in Inline graphic.
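A compact end-to-end toy run of Algorithm 1 illustrates the flow. It is a hedged sketch: there are no covariates, so Step 1 reduces to arm means and the matching in Step 4 is skipped, and the thresholding rule and every number below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
y_trt = rng.normal(1.0, 1.0, 100)          # treated arm, true effect 1
y_ctl = rng.normal(0.0, 1.0, 30)           # small concurrent control arm
y_ext = np.r_[rng.normal(0.0, 1.0, 150),   # comparable externals
              rng.normal(5.0, 1.0, 50)]    # incomparable externals

# Step 2: initial per-subject bias estimates against the concurrent mean.
b_init = y_ext - y_ctl.mean()
# Step 3: adaptive-lasso-style soft-threshold; zero bias => borrow.
lam, nu = 1.0, 2.0
thresh = lam / np.maximum(np.abs(b_init), 1e-12) ** nu
b_hat = np.sign(b_init) * np.maximum(np.abs(b_init) - thresh, 0.0)
keep = b_hat == 0.0

# Step 5: difference in means against the hybrid control arm.
tau_sel = y_trt.mean() - np.r_[y_ctl, y_ext[keep]].mean()
tau_full = y_trt.mean() - np.r_[y_ctl, y_ext].mean()
print(keep.sum(), round(tau_sel, 2), round(tau_full, 2))
```

Full borrowing is dragged away from the true effect by the 50 incomparable externals, while selective borrowing stays close to it and still enlarges the control arm.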

We show the efficiency gain of the proposed estimator compared to the trial-only estimator.

Theorem 3.

Suppose that the assumptions in Theorem 2 and Assumption 4 hold except that Assumption 2 may be violated. Let Inline graphic. The reduction of the asymptotic variance of Inline graphic compared to the trial-only estimator is  

Theorem 3. (4)

which is strictly positive unless Inline graphic or Inline graphic or Inline graphic for all Inline graphic such that Inline graphic.

We derive (4) using the orthogonality of the efficient influence function of Inline graphic to the nuisance tangent space, and relegate the details to the Supplementary Material. Theorem 3 showcases the advantage of including external controls in a data-adaptive manner: the asymptotic variance of Inline graphic is strictly smaller than that of the trial-only estimator unless the external controls are excessively noisy, i.e., Inline graphic, or the compatible subset Inline graphic of the external controls is empty, i.e., Inline graphic, or the covariate Inline graphic captures all the variability of Inline graphic in the trial data, i.e., Inline graphic. Below, we establish the asymptotic properties and provide a valid inferential framework for the proposed integrative estimator; more details are provided in the Supplementary Material.

Theorem 4.

Suppose that the assumptions in Theorem 2 and Assumption 4 hold except that Assumption 2 may be violated. We have Inline graphic. Furthermore, the Inline graphic confidence interval Inline graphic for Inline graphic can be constructed as  

Theorem 4.

where Inline graphic is a variance estimator of Inline graphic, Inline graphic is the Inline graphic quantile for the standard normal distribution and Inline graphic satisfies Inline graphic as Inline graphic.
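The interval in Theorem 4 is a standard Wald construction from the stated normal limit; a minimal sketch follows, with an arbitrary point estimate and standard error.

```python
from statistics import NormalDist

def wald_ci(tau_hat, se_hat, alpha=0.05):
    """Wald-type (1 - alpha) confidence interval from the normal limit in
    Theorem 4; se_hat is the estimated standard error of the estimator."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal quantile
    return tau_hat - z * se_hat, tau_hat + z * se_hat

lo, hi = wald_ci(1.2, 0.3)   # illustrative point estimate and s.e.
print(round(lo, 3), round(hi, 3))
```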

3. Simulation

In this section, we evaluate the finite-sample performance of the proposed framework to estimate treatment effects under potential bias scenarios via plasmode simulations. First, a set of Inline graphic baseline covariates Inline graphic is generated by mimicking the correlation structure and the moments (up to the sixth) of variables from an oncology randomized placebo-controlled trial (i.e., the trial data) and the Flatiron Health Spotlight Phase 2 cohort (© 2020 Flatiron Health, all rights reserved; external controls).

Next, we generate the data source indicator Inline graphic as Inline graphic given the sample sizes Inline graphic, where Inline graphic represents an unmeasured confounder. The treatment assignment for the trial data is completely at random (i.e., Inline graphic), while all external subjects receive the control (i.e., Inline graphic). The outcomes Inline graphic are generated as

3.

We consider three data-generating scenarios in Table 1(a), where Inline graphic is chosen adaptively to ensure the desired sample sizes Inline graphic, and Inline graphic are chosen empirically based on model fits to the observed oncology clinical trial data. In all scenarios, we use the linear predictor of Inline graphic to fit Inline graphic; thus, the models are correctly specified under the model choices Inline graphic, where the linear predictor of Inline graphic governs the true data generation, but are misspecified under the choices Inline graphic, where the data generation depends on a new set of covariates Inline graphic, which includes the quadratic and cubic terms of the Inline graphicth and Inline graphicth covariates (i.e., Inline graphic) in addition to the baseline covariates Inline graphic. Moreover, we use the cross-fitting procedure to select tuning parameters for the gradient boosting model.

Table 1:

Simulation settings: (a) model choices (C and W), where Inline graphic  Inline graphic, and (b) descriptions of the five estimators

(a) Model choices
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic
(b) Estimators
Inline graphic The augmented inverse probability weighting estimator without borrowing (Cao et al., 2009)
Inline graphic The integrative augmented calibration weighting estimator with full borrowing (Li et al., 2023)
Inline graphic The data-adaptive integrative estimator using the linear regressions for Inline graphic
Inline graphic The data-adaptive integrative estimator using the tree-based gradient boosting for Inline graphic
Inline graphic The Bayesian predictive Inline graphic-value power prior estimator (Kwiatkowski et al., 2023)

The proposed framework is evaluated on imbalanced trial data, where Inline graphic and Inline graphic with an external control group of size Inline graphic. We investigate the performance of our proposed estimator under two levels of unmeasured confounding (Inline graphic and 0.3) by comparing with other estimators in Table 1(b). The trial-only augmented inverse probability weighting estimator Inline graphic (Cao et al., 2009) and the augmented calibration weighting estimator Inline graphic with full borrowing (Li et al., 2023) are used as benchmarks. Two data-adaptive integrative estimators, Inline graphic and Inline graphic, are considered, where linear regressions and tree-based gradient boosting are used to estimate the nuisance models. Other machine learning algorithms that satisfy pointwise consistency, such as the generalized additive model, can also be utilized to select a comparable subset of external controls consistently. The Bayesian predictive Inline graphic-value power prior estimator, Inline graphic, is an extension of the power prior, which discounts each external control according to its outcome compatibility using Box’s Inline graphic-value (Kwiatkowski et al., 2023).

Figure 1 displays the average bias, variance, mean squared error and Type-I error when Inline graphic, and power for testing Inline graphic when Inline graphic based on 1000 sets of data replications. Over the three model scenarios, the trial-only estimator Inline graphic is always consistent, but lacks efficiency as it only utilizes the concurrent controls for estimation, especially when Inline graphic is small. When the conditional mean exchangeability in Assumption 2 holds (i.e., Inline graphic), the full-borrowing estimator Inline graphic is most efficient, shown by its low mean squared error and high power for detecting a significant treatment effect. Our proposed selective integrative estimators, Inline graphic and Inline graphic, may be less efficient than Inline graphic due to finite-sample selection error. However, they maintain smaller variance and improved power compared to Inline graphic, regardless of whether the nuisance models are misspecified. When Assumption 2 is violated (i.e., Inline graphic), Inline graphic becomes biased, leading to an inflated Type-I error and low power. The Bayesian estimator Inline graphic requires correct parametric specification of the outcome model and performs poorly when the model omits a key confounder that is imbalanced between data sources. In our simulations, high weights were assigned to the external control subjects, which led to some bias in the treatment effect estimates when Inline graphic was small. However, both Inline graphic and Inline graphic achieve smaller mean squared errors than the trial-only estimator by incorporating external control subjects. In cases where the outcome model is incorrectly specified and Inline graphic, the benefit of using machine learning methods becomes apparent. Specifically, the flexibility of the gradient boosting model ensures the convergence rate assumption for Inline graphic, i.e., Inline graphic for a certain sequence Inline graphic (Zhang & Yu, 2005). 
By incorporating compatible external controls more accurately, Inline graphic better controls bias and achieves comparable power levels to Inline graphic. However, the adaptive lasso estimation based on the misspecified linear model lacks such properties and may not provide gains in power. One notable trade-off of our proposed estimators is the slight Type-I error inflation when Inline graphic is small and Assumption 2 is violated, which can be attributed to finite-sample selection error and was also observed by Viele et al. (2014).

Figure 1:

Simulation results under various levels of Inline graphic, and different model choices of Inline graphic and Inline graphic.

4. Real-data application

In this section, we present an application of the proposed methodology to investigate the effectiveness of basal insulin lispro against regular insulin glargine in patients with Type-I diabetes. Basal insulin lispro and insulin glargine are two long-acting insulin formulations used, in combination with preprandial insulin lispro, for patients with Type-I diabetes mellitus. We analyse the IMAGINE-1 study, a randomized controlled trial in which participants were unevenly assigned to either basal insulin lispro (treatment group) or insulin glargine (control group); external control subjects were drawn from the IMAGINE-3 trial. In the Supplementary Material, we also explore the effectiveness of solanezumab versus placebo in slowing Alzheimer’s disease progression using external observational data.

Our primary objective is to test whether basal insulin lispro is superior to regular insulin glargine in glycemic control for patients with Type-I diabetes mellitus. This is assessed by comparing the change in hemoglobin A1c level from baseline after 52 weeks of treatment. Both studies contain a rich set of baseline covariates Inline graphic, such as age, gender, baseline hemoglobin A1c (%), baseline fasting serum glucose (mmol/L), baseline triglycerides (mmol/L), baseline low-density lipoprotein cholesterol (mmol/L) and baseline alanine transaminase (U/L). The primary analysis population in IMAGINE-1 comprised the randomized patients who received at least one treatment dose. To mimic the full-analysis population from IMAGINE-1, external control subjects with missing baseline assessments are discarded from IMAGINE-3, and last observation carried forward is used to impute missing postbaseline outcomes. The IMAGINE-1 study consists of Inline graphic subjects, with 286 in the treated group and 153 in the control group, while the IMAGINE-3 study includes Inline graphic patients in the control arm. In our statistical analysis, we first use the baseline covariates Inline graphic to model the trial inclusion probability by calibration weighting under the entropy loss function. Next, we assume a linear heterogeneous treatment effect function for the outcomes with Inline graphic as the treatment modifier, and compare the same set of estimators as in the simulation study.
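The calibration-weighting step described above can be sketched in code. The following Python snippet is a minimal illustration with simulated covariates (the sample sizes, dimensions and variable names are hypothetical, not from the IMAGINE studies): it solves the dual of the entropy-loss calibration problem so that the weighted covariate means of the external controls match the trial means.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X_trial = rng.normal(0.5, 1.0, size=(200, 3))  # hypothetical trial covariates
X_ext = rng.normal(0.0, 1.0, size=(300, 3))    # hypothetical external controls

target = X_trial.mean(axis=0)  # trial covariate moments to be matched

def dual(lam):
    # dual objective of the entropy-loss calibration problem:
    # minimizing log-sum-exp minus the moment constraint term
    return np.log(np.exp(X_ext @ lam).sum()) - target @ lam

lam_hat = minimize(dual, np.zeros(3), method="BFGS").x
w = np.exp(X_ext @ lam_hat)
w /= w.sum()  # calibration weights, normalized to sum to one

# the weighted external-control covariate means now match the trial means
print((w[:, None] * X_ext).sum(axis=0) - target)
```

At the dual optimum, the gradient condition forces the weighted external-control means onto the trial means, which is exactly the moment-matching property that calibration weighting exploits.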

Table 2 reports the estimation results. The trial-only estimator Inline graphic shows that basal insulin lispro has a significant treatment effect on reducing the glucose level based solely on the IMAGINE-1 study. The naive integrative estimators Inline graphic and Inline graphic, albeit significant, differ slightly from Inline graphic, as they may be subject to population biases of the external controls. After filtering out the incompatible patients from the external controls via our adaptive lasso selection, the final integrative estimates Inline graphic and Inline graphic are closer to the benchmark and have narrower confidence intervals. According to our adaptive analysis, basal insulin lispro is significantly more effective than regular insulin glargine in glycemic control for patients with Type-I diabetes mellitus.

Table 2:

Point estimates, standard errors and 95% confidence intervals of the treatment effect of BIL against regular GL based on the IMAGINE-1 and IMAGINE-3 studies

Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Est. (SE) Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
CI Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic

Est., estimate; SE, standard error; CI, confidence interval; BIL, basal insulin lispro; GL, regular insulin glargine.

Next, we compare the performance of Inline graphic with that of our data-adaptive integrative estimators to highlight the advantages of our dynamic borrowing framework. To this end, we retain the size of the treatment group, but create 100 subsamples by randomly selecting Inline graphic patients from its control group, where Inline graphic. Then, the patients treated with regular insulin glargine in the IMAGINE-3 study are added to each selected subsample, and the treatment effect is evaluated under the hybrid control arm design. Figure 2 presents the average probability of successfully detecting Inline graphic, the so-called probability of success, against the size of the subsamples. When solely utilizing patients from the IMAGINE-1 study, Inline graphic produces a probability of success greater than 0.8 only if the size of the control group exceeds 25. Combined with the IMAGINE-3 study, Inline graphic and Inline graphic refine the treatment effect estimation, and only 15 patients are needed in the concurrent control group to attain a probability of success above 0.8. Therefore, by properly leveraging the external controls, we may accelerate drug development by decreasing the number of patients on the concurrent control arm, thereby reducing the duration and cost of the clinical trial.
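The subsampling exercise above can be illustrated schematically. The Python sketch below uses synthetic outcomes and a plain two-sample t-test in place of the paper's estimators; the effect size, arm sizes and significance level are all hypothetical stand-ins chosen only to show the mechanics of estimating the probability of success as a function of the retained control-group size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# hypothetical change-from-baseline outcomes (not the IMAGINE data)
y_treat = rng.normal(-0.6, 1.0, 286)    # treated arm, kept at full size
y_control = rng.normal(-0.3, 1.0, 153)  # concurrent control arm

def prob_success(m, n_rep=100, alpha=0.05):
    """Fraction of random subsamples in which a significant treatment
    effect is detected when only m concurrent controls are retained."""
    hits = 0
    for _ in range(n_rep):
        sub = rng.choice(y_control, size=m, replace=False)
        hits += stats.ttest_ind(y_treat, sub).pvalue < alpha
    return hits / n_rep

# probability of success rises with the retained control-group size
print(prob_success(10), prob_success(150))
```

In the paper's analysis the test is based on the integrative estimators and the external IMAGINE-3 controls are appended to each subsample; the loop structure, however, is the same.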

Figure 2: Probability of success for detecting Inline graphic by Inline graphic, Inline graphic and Inline graphic with varying control group sizes in the IMAGINE-1 study.

5. Discussion

Interest in the use of external control arms for drug development is growing. However, concerns regarding their quality and validity have so far limited their use in healthcare decision-making, necessitating careful and appropriate assessment. To adjust for potential selection bias, our proposed method calibrates the covariate moments across the two data sources, ensuring that the covariate distributions in the two sources match each other. Alternative predictive model-based strategies are applicable when only a subset of covariates is shared (Stuart et al., 2011; Tipton, 2014). To address differences in outcomes, we select comparable external subsets based on the adaptive lasso penalty. Alternative penalties, such as the smoothly clipped absolute deviation penalty (Fan & Li, 2001), can be considered provided the selection consistency property is attained. Moreover, our framework can easily be extended to augment observational studies with external data, which may require additional modelling and assumptions to achieve double robustness. Slight Type-I error inflation is observed in our simulations when the concurrent control group is small, attributable to selection error in finite samples. One future direction is to rigorously construct a data-adaptive confidence interval that accounts for finite-sample selection uncertainty without being overly conservative (Lee et al., 2016; Tibshirani et al., 2016). Other future directions include extending the proposed integrative inferential framework to survival outcomes (Lee et al., 2022a), estimating heterogeneous treatment effects (Wu & Yang, 2022a; Yang et al., 2022) and combining probability and nonprobability samples (Yang et al., 2020; Gao & Yang, 2023).
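The selective-borrowing idea behind the adaptive lasso penalty can be conveyed with a toy sketch. The Python snippet below is a deliberate simplification: it assumes per-subject bias estimates for the external controls are already available (in the paper these arise from the estimating equations; here they are simulated), in which case the separable adaptive-lasso problem reduces to soft-thresholding with a per-subject penalty inversely proportional to the initial bias estimate. All names and tuning values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical initial bias estimates for 50 external controls:
# most are exchangeable (mean-zero noise); the first few carry real bias
b_init = rng.normal(0, 0.1, 50)
b_init[:5] += 1.5  # incomparable external-control subjects

# adaptive lasso: per-subject penalty lambda / |b_init|^gamma, so subjects
# with small initial bias are penalized heavily (shrunk exactly to zero)
# while clearly biased subjects are barely shrunk at all
lam, gamma = 0.05, 2.0
penalty = lam / np.abs(b_init) ** gamma

# closed-form soft-thresholding solution of this separable problem
b_hat = np.sign(b_init) * np.maximum(np.abs(b_init) - penalty, 0.0)

# subjects whose estimated bias is shrunk to zero are deemed comparable
borrowed = b_hat == 0
print(borrowed.sum(), "external controls selected for borrowing")
```

The oracle property of the adaptive lasso (Zou, 2006) is what lets this selection recover, asymptotically, exactly the comparable subset, which underlies the Type-I error control and power gains reported above.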

Supplementary Material

asae069_Supplementary_Data

Acknowledgement

This project was supported by the Food and Drug Administration (FDA) of the U.S. Department of Health and Human Services (HHS) (U01FD007934) and the National Institute on Aging of the National Institutes of Health (R01AG06688). The views and opinions expressed herein are those of the authors and do not necessarily represent those of, nor endorsement by, FDA/HHS, the National Institutes of Health or the U.S. Government.

Contributor Information

Chenyin Gao, Department of Statistics, North Carolina State University, 2311 Stinson Drive, Raleigh, North Carolina 27695, USA.

Shu Yang, Department of Statistics, North Carolina State University, 2311 Stinson Drive, Raleigh, North Carolina 27695, USA.

Mingyang Shan, Eli Lilly & Company, Lilly Corporate Center, 893 Delaware Street, Indianapolis, Indiana 46285, USA.

Wenyu Ye, Eli Lilly & Company, Lilly Corporate Center, 893 Delaware Street, Indianapolis, Indiana 46285, USA.

Ilya Lipkovich, Eli Lilly & Company, Lilly Corporate Center, 893 Delaware Street, Indianapolis, Indiana 46285, USA.

Douglas Faries, Eli Lilly & Company, Lilly Corporate Center, 893 Delaware Street, Indianapolis, Indiana 46285, USA.

Supplementary material

The Supplementary Material includes all technical proofs, additional simulation results and other real-data applications. An open-source software R package (R Development Core Team, 2025) is available for implementing our proposed methodology at https://github.com/IntegrativeStats/SelectiveIntegrative.

References

  1. Bickel P. J., Klaassen C., Ritov Y. & Wellner J. (1998). Efficient and Adaptive Inference in Semiparametric Models, vol. 50. Baltimore, MD: Johns Hopkins University Press.
  2. Cao W., Tsiatis A. A. & Davidian M. (2009). Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika 96, 723–34.
  3. Chatterjee N., Chen Y.-H., Maas P. & Carroll R. J. (2016). Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. J. Am. Statist. Assoc. 111, 107–17.
  4. Chen X. (2007). Large sample sieve estimation of semi-nonparametric models. In Handbook of Econometrics, vol. 6, Ed. Heckman J. J. & Leamer E. E., pp. 5549–5632. Amsterdam: Elsevier.
  5. Chen Z., Ning J., Shen Y. & Qin J. (2021). Combining primary cohort data with external aggregate information without assuming comparability. Biometrics 77, 1024–36.
  6. Chernozhukov V., Chetverikov D., Demirer M., Duflo E., Hansen C., Newey W. & Robins J. (2018). Double/debiased machine learning for treatment and structural parameters. Econom. J. 21, 1–68.
  7. Chu J., Lu W. & Yang S. (2023). Targeted optimal treatment regime learning using summary statistics. Biometrika 110, 913–31.
  8. Colnet B., Mayer I., Chen G., Dieng A., Li R., Varoquaux G., Vert J.-P., Josse J. & Yang S. (2020). Causal inference methods for combining randomized trials and observational studies: a review. Statist. Sci. 39, 165–91.
  9. Fan J. & Li R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Statist. Assoc. 96, 1348–60.
  10. FDA (2001). E10 Choice of Control Group and Related Issues in Clinical Trials. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/e10-choice-control-group-and-related-issues-clinical-trials
  11. FDA (2014). Blinatumomab Drug Approval Package. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2014/125557Orig1s000TOC.cfm
  12. FDA (2016). Avelumab Drug Approval Package. https://www.fda.gov/drugs/resources-information-approved-drugs/avelumab-bavencio
  13. FDA (2019). Rare Diseases: Natural History Studies for Drug Development. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/rare-diseases-natural-history-studies-drug-development
  14. FDA (2021). Real-World Data: Assessing Registries to Support Regulatory Decision-Making for Drug and Biological Products Guidance for Industry. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/real-world-data-assessing-registries-support-regulatory-decision-making-drug-and-biological-products
  15. FDA (2023). Considerations for the Design and Conduct of Externally Controlled Trials for Drug and Biological Products Guidance for Industry. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/considerations-design-and-conduct-externally-controlled-trials-drug-and-biological-products
  16. Gao C. & Yang S. (2023). Pretest estimation in combining probability and non-probability samples. Electron. J. Statist. 17, 1492–546.
  17. Ghadessi M., Tang R., Zhou J., Liu R., Wang C., Toyoizumi K., Mei C., Zhang L., Deng C. & Beckman R. A. (2020). A roadmap to using historical controls in clinical trials – by Drug Information Association Adaptive Design Scientific Working Group (DIA-ADSWG). Orphanet J. Rare Dis. 15, 1–19.
  18. Ho D. E., Imai K., King G. & Stuart E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit. Anal. 15, 199–236.
  19. Hobbs B. P., Carlin B. P., Mandrekar S. J. & Sargent D. J. (2011). Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics 67, 1047–56.
  20. Huang J., Ma S. & Zhang C.-H. (2008). Adaptive lasso for sparse high-dimensional regression models. Statist. Sinica 18, 1603–18.
  21. Ibrahim J. G. & Chen M.-H. (2000). Power prior distributions for regression models. Statist. Sci. 15, 46–60.
  22. Imbens G. W. (2004). Nonparametric estimation of average treatment effects under exogeneity: a review. Rev. Econ. Statist. 86, 4–29.
  23. Kwiatkowski E., Zhu J., Li X., Pang H., Lieberman G. & Psioda M. A. (2023). Case weighted adaptive power priors for hybrid control analyses with time-to-event data. arXiv: 2305.05913v1.
  24. Lee D., Yang S., Dong L., Wang X., Zeng D. & Cai J. (2022b). Improving trial generalizability using observational studies. Biometrics 79, 1213–25.
  25. Lee D., Yang S. & Wang X. (2022a). Doubly robust estimators for generalizing treatment effects on survival outcomes from randomized controlled trials to a target population. J. Causal Infer. 10, 415–40.
  26. Lee J. D., Sun D. L., Sun Y. & Taylor J. E. (2016). Exact post-selection inference, with application to the lasso. Ann. Statist. 44, 907–27.
  27. Li X., Miao W., Lu F. & Zhou X.-H. (2023). Improving efficiency of inference in clinical trials with external control data. Biometrics 79, 394–403.
  28. Lin Z., Xiang Y. & Zhang C. (2009). Adaptive lasso in high-dimensional settings. J. Nonparam. Statist. 21, 683–96.
  29. Liu M., Bunn V., Hupf B., Lin J. & Lin J. (2021). Propensity-score-based meta-analytic predictive prior for incorporating real-world and historical data. Statist. Med. 40, 4794–808.
  30. Neuenschwander B., Branson M. & Spiegelhalter D. J. (2009). A note on the power prior. Statist. Med. 28, 3562–6.
  31. Neuenschwander B., Capkun-Niggli G., Branson M. & Spiegelhalter D. J. (2010). Summarizing historical information on controls in clinical trials. Clin. Trials 7, 5–18.
  32. Odogwu L., Mathieu L., Blumenthal G., Larkins E., Goldberg K. B., Griffin N., Bijwaard K., Lee E. Y., Philip R., Jiang X. et al. (2018). FDA approval summary: dabrafenib and trametinib for the treatment of metastatic non-small cell lung cancers harboring BRAF V600E mutations. The Oncologist 23, 740–5.
  33. Phelan M., Bhavsar N. A. & Goldstein B. A. (2017). Illustrating informed presence bias in electronic health records data: how patient interactions with a health system can impact inference. J. Electron. Health Data Meth. 5, 22–36.
  34. Pocock S. J. (1976). The combination of randomized and historical controls in clinical trials. J. Chronic Dis. 29, 175–88.
  35. Qin J., Zhang H., Li P., Albanes D. & Yu K. (2015). Using covariate-specific disease prevalence information to increase the power of case-control studies. Biometrika 102, 169–80.
  36. R Development Core Team (2025). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. http://www.R-project.org
  37. Rosenbaum P. R. & Rubin D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.
  38. Rubin D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66, 688–701.
  39. Schoenfeld D. A., Finkelstein D. M., Macklin E., Zach N., Ennist D. L., Taylor A. A., Atassi N. & Pooled Resource Open-Access ALS Clinical Trials Consortium (2019). Design and analysis of a clinical trial using previous trials as historical control. Clin. Trials 16, 531–8.
  40. Shan M., Faries D., Dang A., Zhang X., Cui Z. & Sheffield K. M. (2022). A simulation-based evaluation of statistical methods for hybrid real-world control arms in clinical trials. Statist. Biosci. 14, 259–84.
  41. Shen Y., Gao C., Witten D. & Han F. (2020). Optimal estimation of variance in nonparametric regression with random design. Ann. Statist. 48, 3589–618.
  42. Silverman B. (2018). A baker’s dozen of US FDA efficacy approvals using real world evidence. Pharma Intelligence Pink Sheet, 7 August.
  43. Stuart E. A. (2010). Matching methods for causal inference: a review and a look forward. Statist. Sci. 25, 1–21.
  44. Stuart E. A. & Rubin D. B. (2008). Matching with multiple control groups with adjustment for group differences. J. Educ. Behav. Statist. 33, 279–306.
  45. Stuart E. A., Cole S. R., Bradshaw C. P. & Leaf P. J. (2011). The use of propensity scores to assess the generalizability of results from randomized trials. J. R. Statist. Soc. A 174, 369–86.
  46. Tibshirani R. J., Taylor J., Lockhart R. & Tibshirani R. (2016). Exact post-selection inference for sequential regression procedures. J. Am. Statist. Assoc. 111, 600–20.
  47. Tipton E. (2014). How generalizable is your experiment? An index for comparing experimental samples and populations. J. Educ. Behav. Statist. 39, 478–501.
  48. Tsiatis A. (2006). Semiparametric Theory and Missing Data. New York: Springer.
  49. Van der Vaart A. W. (2000). Asymptotic Statistics, vol. 3. Cambridge: Cambridge University Press.
  50. van Rosmalen J., Dejardin D., van Norden Y., Löwenberg B. & Lesaffre E. (2018). Including historical data in the analysis of clinical trials: is it worth the effort? Statist. Meth. Med. Res. 27, 3167–82.
  51. Viele K., Berry S., Neuenschwander B., Amzal B., Chen F., Enas N., Hobbs B., Ibrahim J. G., Kinnersley N., Lindborg S. et al. (2014). Use of historical control data for assessing treatment effects in clinical trials. Pharm. Statist. 13, 41–54.
  52. Visvanathan K., Levit L. A., Raghavan D., Hudis C. A., Wong S., Dueck A. & Lyman G. H. (2017). Untapped potential of observational research to inform clinical decision making: American Society of Clinical Oncology research statement. J. Clin. Oncol. 35, 1845–54.
  53. Wu L. & Yang S. (2022a). Integrative Inline graphic-learner of heterogeneous treatment effects combining experimental and observational studies. In Proc. 1st Conf. Causal Learn. Reason., pp. 904–26. PMLR.
  54. Wu L. & Yang S. (2022b). Transfer learning of individualized treatment rules from experimental to real-world data. J. Comp. Graph. Statist. 32, 1036–45.
  55. Yang S., Kim J. K. & Song R. (2020). Doubly robust inference when combining probability and non-probability samples with high dimensional data. J. R. Statist. Soc. B 82, 445–65.
  56. Yang S., Zeng D. & Wang X. (2022). Elastic integrative analysis of randomized trial and real-world data for treatment heterogeneity estimation. arXiv: 2005.10579v3.
  57. Zhai Y. & Han P. (2022). Data integration with oracle use of external information from heterogeneous populations. J. Comp. Graph. Statist. 31, 1001–12.
  58. Zhang H., Deng L., Schiffman M., Qin J. & Yu K. (2020). Generalized integration model for improved statistical inference by leveraging external summary data. Biometrika 107, 689–703.
  59. Zhang T. & Yu B. (2005). Boosting with early stopping: convergence and consistency. Ann. Statist. 33, 1538–79.
  60. Zou H. (2006). The adaptive LASSO and its oracle properties. J. Am. Statist. Assoc. 101, 1418–29.


Articles from Biometrika are provided here courtesy of Oxford University Press
