Abstract
The availability of electronic health records (EHR) has opened opportunities to supplement increasingly expensive and difficult-to-conduct randomized controlled trials (RCTs) with evidence from readily available real-world data. In this article, we use EHR data to construct synthetic control arms for single-arm treatment-only trials. We propose a novel nonparametric Bayesian common atoms mixture model that allows us to find equivalent population strata in the EHR and the treatment arm, and to then resample the EHR data so that the resampled patients match the covariate distribution of the single-arm trial. Resampling is implemented via a density-free importance sampling scheme. Using the synthetic control arm, inference for the treatment effect can then be carried out with any method available for RCTs. Alternatively, the proposed nonparametric Bayesian model allows straightforward model-based inference. In simulation experiments, the proposed method exhibits higher power than alternative methods in detecting treatment effects, especially for nonlinear response functions. We apply the method to supplement single-arm treatment-only glioblastoma studies with a synthetic control arm based on historical trials. Supplementary materials for this article are available online.
Keywords: Common atoms mixture, Glioblastoma, Importance sampling, Mixtures, Real world data, Single-arm trials
1. Introduction
We introduce a novel Bayesian nonparametric regression model to construct synthetic control arms from external real world data (RWD) to supplement single arm treatment-only trials. The use of common atoms across multiple random probability measures is a critical feature of the proposed construction. Models with similar features have been used before in the literature, including Denti et al. (2021), Camerlenghi et al. (2019), Rodríguez, Dunson, and Gelfand (2008), and Teh et al. (2006).
Randomized controlled trials (RCT) are the gold standard in evidence-based evaluation of new treatments. RCTs are, however, increasingly associated with bottlenecks involving volunteer recruitment, patient truancy and adverse events (Nichol, Bailey, and Cooper 2010), and hence are often very time consuming, expensive and laborious. This is of particular concern for rare diseases, such as glioblastoma (GBM). With the digitization of health records and other advances in medical informatics, new data sources are becoming available that can supplement RCTs. For example, relevant information on a control treatment is often available from completed RCTs, electronic health record data, insurance claims data or patient registries from hospitals (Franklin et al. 2019). Such external data, also referred to as RWD, can augment or substitute for the control group in the target clinical trial (Davi et al. 2020). This has led researchers to consider the creation of synthetic control arms from RWD (see Schmidli et al. 2020 for a review). However, the heterogeneity of RWD prohibits the direct use of patient level data as a control arm, since differences from the actual treatment population with respect to patient profiles would bias inference on treatment effects (Burcu et al. 2020). Many existing methods adjust for the lack of randomization in treatment assignments by correcting the bias in the response model, and hence can be sensitive to the specification of the treatment assignment model as well as the response model, as we discuss below. In this article, we take a fundamentally different approach by resampling the RWD to construct a cohort equivalent to the treatment arm in terms of covariate profiles, which can then serve as the (synthetic) control arm.
There is a fast growing literature on the problem of incorporating RWD in clinical trials. Traditional meta-analytic approaches aim to combine information across studies to construct comparisons of treatments (Sutton and Abrams 2001). Power prior (Prevost, Abrams, and Jones 2000; Chen and Ibrahim 2000), commensurate prior (Hobbs et al. 2011) and elastic prior (Jiang, Nie, and Yuan 2023) constructions try to incorporate information from historical data by way of informative prior models. However, these approaches may be inadequate when the RWD population is considerably more heterogeneous than the experimental arm; see Müller, Chandra, and Sarkar (2023) for a review.
Many methods to incorporate RWD in trial design and data analysis are based on propensity scores (PSs), defined as the conditional probabilities of treatment assignment given covariates. In the context of incorporating external data, investigators often use PSs for a patient being selected into the current trial versus the external data; in the case of supplementing a single-arm treatment-only trial, these selection probabilities coincide with the PSs for treatment assignment. Rosenbaum and Rubin (1983) showed that an unbiased estimate of the average treatment effect can be obtained by PS adjustments. Most PS-based methods can be broadly classified as based on matching, stratification, weighting, or regression. Matching is used to achieve covariate balance across different arms. However, matching PSs does not generally imply matching covariates (King and Nielsen 2019). Stratification splits the data into strata with respect to PSs and calculates an average treatment effect as a weighted average of within-stratum estimates (Wang et al. 2019; Chen et al. 2020; Lu et al. 2022). PS-stratification may be sensitive to the definition of the strata, and weight-based estimators may be sensitive to misspecification of the PS model (Zhao 2004). Regression adjustments, which use the PS as a regressor for the outcome, address these issues (Rosenbaum and Rubin 1983), but the estimates may again be biased if the regression model is misspecified (Vansteelandt and Daniel 2014). Bayesian nonparametric models that avoid a particular parametric family or structure, such as linearity, of the regression relationship have thus also been proposed (Wang and Rosner 2019). Nevertheless, consolidated unidimensional PSs can be inadequate in matching multivariate covariates from multiple studies (Stuart 2010; King and Nielsen 2019). Additionally, these methods often do not efficiently use all available data, dropping unmatched records. Finally, some other methods (Hasegawa et al. 2017; Li and Song 2020), although not specifically designed to create synthetic controls, also integrate multiple studies using the covariate distributions.
In this article, we develop an alternative approach based on Bayesian nonparametric (BNP) mixture models. Mixture models imply a random partition of experimental units linked to different atoms in the mixture (Dahl 2006). We exploit this property to propose a BNP common atoms mixture (CAM) model to introduce matched clusters of patients in a treatment-only trial dataset and a (typically much larger) RWD. We show how such matched clusters allow a density free importance resampling scheme to generate a subpopulation of the RWD such that the distribution of covariates in the subpopulation can be considered to be equivalent to the single-arm trial. That is, the patients in a matching RWD cluster can be considered digital clones of patients in a matching cluster in the single-arm trial.
The proposed CAM model allows, among other things, the following two alternatives for inference on treatment effects. Having established equivalent patient populations, inference can in principle proceed as if treatment had been assigned at random, using inference for RCTs. Alternatively, we propose model-based inference using an extension of the CAM model with a sampling model for the outcome. While both alternatives are based on the same underlying CAM model, we prefer the model-based inference on treatment effect as a more explicit and principled approach.
The proposed CAM model builds on related BNP models in the literature, including the hierarchical Dirichlet process (DP) (Teh et al. 2006) which allows for information sharing across multiple groups through common atoms, the nested DP (Rodríguez, Dunson, and Gelfand 2008) which can identify distributional clusters, and Camerlenghi et al. (2019) who proposed a latent mixture of shared and idiosyncratic processes across the sub-models. Denti et al. (2021) proposed a CAM model for the analysis of nested datasets where the distributions of the units differ only over a small fraction of the observations sampled from each unit. In contrast to these constructions, the CAM model proposed here introduces more structure as needed in our application by setting up two nonparametric Bayesian mixture models with shared atoms and constraints on the implied clusters.
The rest of this article is organized as follows. Section 2 describes the glioblastoma study that motivated this work. Section 3.1 introduces the proposed common atoms mixture model on the covariates and shows how it can handle variable dimensional covariates of different data types; Section 3.2 introduces a novel density-free importance resampling scheme to achieve equivalent populations; and Section 3.3 discusses the general common atoms regression model, a flexible mixture of lognormals for censored survival outcomes, and an easy to use graphical tool for model validation. In Section 4, we discuss two alternative strategies for inference on treatment effects. Section 5 outlines posterior computation. Section 6 presents simulation studies. Section 7 shows results for the motivating GBM data. Section 8 concludes with final remarks. Below, in Table 1, we list the many acronyms used in the article for easy reference.
Table 1.
List of acronyms.
| Acronym | Full forms | Acronym | Full forms |
|---|---|---|---|
| AUC | Area under the receiver operating characteristic curve | DP | Dirichlet process |
| BART | Bayesian additive regression tree | GBM | Glioblastoma |
| BNP | Bayesian nonparametric | IS | Importance sampling |
| CAM | Common atoms mixture | PPMx | Product partition model with regression on covariates |
| CA-PPMx | Common atoms PPMx | PS | Propensity score |
| RCT | Randomized controlled trial | RWD | Real-world data |
2. Motivating Application in Glioblastoma
Our motivating application arises from a GBM data science project at MD Anderson Cancer Center. GBM is a devastating disease with an average life expectancy of less than 12 months in the general population (Ostrom et al. 2016). Despite decades of intensive clinical research, progress in developing an effective treatment for GBM lags behind that of other cancers (Aldape et al. 2019). In the last 30 years, only two drugs (carmustine wafers and temozolomide) have been approved by the Food and Drug Administration (FDA) for patients with newly diagnosed GBM (Fisher and Adamson 2021). These drugs extend median survival by less than three months and neither offers a potential for cure. One major cause of the high failure rate of drug development for GBM is suboptimal design of phase II trials, in particular, the lack of a control arm in many studies (Grossman et al. 2017). A review of phase I/II GBM trials from 1980 to 2013 found that only 20 (5%) were randomized, compared to 365 (95%) that were single-arm trials (Grossman and Ellsworth 2016). Reasons for the dominance of single-arm trials include the small number of GBM patients available for clinical trials and investigators' desire to speed up drug development and reduce trial costs. GBM is a rare disease by the definition of the Orphan Drug Act (FDA 2020). Unfortunately, the high heterogeneity of GBM patients makes single-arm trials highly susceptible to bias, contributing to the fact that almost all phase II trials showing promising treatment effects failed in phase III RCTs (Mandel et al. 2017). The objective of the GBM data science project is to address this pressing issue by leveraging historical data collected at the MD Anderson Cancer Center. The overarching goal is to develop a platform for future single-arm clinical trials in GBM, with synthetic controls constructed from the historical database to enhance the evaluation and screening of new drugs.
Working toward this goal, we describe here a method to create synthetic controls, as the engine of the platform, for future trials.
We work with a database comprising records from 339 GBM patients, richly annotated both clinically and molecularly, treated at MD Anderson over more than 10 years. Once the system is set up, the database is expected to be continuously updated with new patient data collected at MD Anderson Cancer Center and potentially also to be combined with data from other institutions.
After discarding variables with minimal variability across patients and relying on clinical judgment, we identified 11 clinically important categorical covariates. These covariates are commonly considered as prognostic factors in GBM treatments (Nam and de Groot 2017; Alexander et al. 2019) and are briefly described in Table 2.
Table 2.
Description of the covariates in the GBM data.
| Covariate | Description |
|---|---|
| Age | Dichotomized at 55 years |
| KPS | Karnofsky performance score, categorized into three classes: “≤ 60”, “(60,80]” and “> 80” |
| RT Dose | Radiation therapy dose: dichotomized at 50 Gray |
| SOC | Received standard-of-care (concurrent radiation therapy and temozolomide): Yes/No |
| CT | Participation in a therapeutic trial: Yes/No |
| MGMT | Status of MGMT (O6-methylguanine-DNA methyltransferase) gene: methylated (M), unmethylated (UM) or uninterpretable (UI) |
| ATRX | Loss of the ATRX chromatin remodeler gene: Yes/No |
| Gender | Gender |
| EOR | Extent of tumor resection: “total”, “subtotal” or “laser interstitial thermal therapy” (LITT, Patel and Kim 2020) |
| Histologic grade | Grade of astrocytoma: IV (GBM) (most cases), or I–III (low-grade or anaplastic) (few) |
| Surgery reason | “therapeutic” or “other” (relapse) |
Figure 1 shows the categorical covariates in the historical database and a future treatment-only study which we elaborate in Section 7. Figure S.1 in the supplementary materials highlights the lack of randomization in the two populations.
Figure 1.

Glioblastoma dataset of 11 baseline categorical covariates with missing entries in the two treatment arms. The left block shows the historical patients. The (smaller) right block shows a hypothetical future trial.
3. Common Atoms Mixture Model
We first introduce a model for matching patients with respect to their covariate profiles across different treatment arms and then an extension of the model to also include outcomes. Later we will introduce two alternative methods for inference on treatment effects that build on this model.
3.1. Common Atoms Mixture Model on the Covariates
Suppose we have datasets $\mathcal{D}_s = \{(\mathbf{x}_{s,i}, y_{s,i}):\, i = 1, \ldots, n_s\}$, $s = 0, 1, \ldots, S$, comprising $p$-dimensional covariate vectors $\mathbf{x}_{s,i}$ and corresponding responses $y_{s,i}$ associated with patients $i = 1, \ldots, n_s$. In this article, we assume the responses to be univariate. Let $s = 0$ refer to the arm for the (new) experimental therapy, and $s = 1, \ldots, S$ denote the RWD datasets. Focusing on the motivating GBM application, we elaborate the model for $S = 1$, with a single RWD set. When we have multiple historical datasets, that is, when $S > 1$, we would simply merge them and consider the merged dataset to be a single RWD with increased heterogeneity, as illustrated in Section S.9.6 of the supplementary materials. For a valid evaluation of treatment effects, it is then important to verify equivalent patient populations, that is, matching the distributions of $\mathbf{x}_{s,i}$ under $s = 0$ versus $s = 1$, or to otherwise adjust for any detected differences (Burcu et al. 2020). As the RWD population can be from a variety of sources, such data are typically more heterogeneous than the patient population in the ongoing trial. We develop a novel BNP CAM model with this specific feature to model the two distributions. The proposed CAM model gives rise to a random partition of similar $\mathbf{x}_{0,i}$ and a matching partition of the $\mathbf{x}_{1,i}$. Clusters under the latter partition can be considered digital clones of the matching clusters under the earlier partition.
We first construct the model for covariates in the RWD. Let $\boldsymbol{\theta}_k$ and $w_k$ denote cluster-specific parameters and weights, respectively. We let
$$ \mathbf{x}_{1,i} \mid \{\boldsymbol{\theta}_k\}, \{w_k\} \;\overset{\text{iid}}{\sim}\; f_1(\mathbf{x}) = \sum_{k=1}^{\infty} w_k\, K(\mathbf{x} \mid \boldsymbol{\theta}_k), \qquad \boldsymbol{\theta}_k \overset{\text{iid}}{\sim} H, \qquad (w_1, w_2, \ldots) \sim \mathrm{SB}(\alpha_1). \qquad (1) $$
Here $K(\cdot \mid \boldsymbol{\theta})$ is a suitably chosen kernel with parameter $\boldsymbol{\theta}$, $H$ is a prior distribution for the $\boldsymbol{\theta}_k$'s, and $\mathrm{SB}(\alpha_1)$ is a stick-breaking prior on the mixture weights corresponding to a DP with mass parameter $\alpha_1$ (Sethuraman 1994). Let $G = \sum_{k} w_k\, \delta_{\boldsymbol{\theta}_k}$ denote a discrete probability measure with atoms at the $\boldsymbol{\theta}_k$'s. An equivalent hierarchical model representation of (1) is
$$ \mathbf{x}_{1,i} \mid \tilde{\boldsymbol{\theta}}_{1,i} \overset{\text{ind}}{\sim} K(\mathbf{x} \mid \tilde{\boldsymbol{\theta}}_{1,i}), \qquad \tilde{\boldsymbol{\theta}}_{1,i} \mid G \overset{\text{iid}}{\sim} G, \qquad G \sim \mathrm{DP}(\alpha_1, H), \qquad (2) $$
where $\mathrm{DP}(\alpha_1, H)$ is a DP with base measure $H$ and concentration parameter $\alpha_1$ (Ferguson 1973). The discrete nature of the DP random measure gives rise to possible ties between the $\tilde{\boldsymbol{\theta}}_{1,i}$'s, which define the desired clusters. For later reference we define notation for these ties and clusters. Let $\boldsymbol{\theta}^*_1, \ldots, \boldsymbol{\theta}^*_J$ denote the distinct values in $\{\tilde{\boldsymbol{\theta}}_{1,i}\}$, and let $z_{1,i} = k$ if $\tilde{\boldsymbol{\theta}}_{1,i} = \boldsymbol{\theta}^*_k$ denote cluster membership indicators defining clusters $C_k = \{i:\, z_{1,i} = k\}$. We assume the distribution of the $\mathbf{x}_{0,i}$ to be a mixture with the same kernel and the same atoms $\boldsymbol{\theta}^*_k$,
$$ \mathbf{x}_{0,i} \mid \{\boldsymbol{\theta}^*_k\}, \mathbf{u} \;\overset{\text{iid}}{\sim}\; f_0(\mathbf{x}) = \sum_{k=1}^{J} u_k\, K(\mathbf{x} \mid \boldsymbol{\theta}^*_k), \qquad (3) $$
where $\mathbf{u} = (u_1, \ldots, u_J) \sim \mathrm{Dir}(\alpha_0/J, \ldots, \alpha_0/J)$, $\mathrm{Dir}$ indicates a $J$-dimensional Dirichlet distribution with parameters $\alpha_0/J$, and $\alpha_0$ is a concentration parameter. Note that model (3) is defined conditionally on (1) and such that $f_0$ and $f_1$ share the same set of atoms. Importantly, the construction avoids the imputation of clusters (strata) with only $\mathbf{x}_{0,i}$'s. There is always a corresponding (non-empty) cluster of $\mathbf{x}_{1,i}$'s from the RWD. This is important for the upcoming constructions. The motivation here is that, owing to the bigger size of the RWD compared to the trial arm, $f_1$ can be expected to exhibit greater heterogeneity than $f_0$ (see, e.g., the right panel in Figure 2).
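To make the construction concrete, the following sketch simulates from a truncated version of (1) and (3) with hypothetical univariate normal kernels; all dimensions, kernel choices and hyper-parameter values below are illustrative stand-ins, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, k_max=50):
    """Truncated stick-breaking weights for a DP(alpha) prior."""
    v = rng.beta(1.0, alpha, size=k_max)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum()  # renormalize the truncated weights

# RWD arm: DP mixture of (here) univariate normal kernels.
alpha1, n1 = 1.0, 500
w = stick_breaking(alpha1)
theta = rng.normal(0.0, 3.0, size=len(w))   # atoms theta_k drawn iid from H
z1 = rng.choice(len(w), size=n1, p=w)       # latent cluster labels
x1 = rng.normal(theta[z1], 1.0)             # x_{1,i} | z ~ K(. | theta_k)

# Treatment arm: same atoms, but only those occupied by the RWD,
# with finite Dirichlet weights (the common atoms constraint).
occupied = np.unique(z1)                    # the J occupied atoms
alpha0, n0 = 1.0, 100
u = rng.dirichlet(np.full(len(occupied), alpha0 / len(occupied)))
z0 = rng.choice(occupied, size=n0, p=u)
x0 = rng.normal(theta[z0], 1.0)
```

By construction, every treatment-arm cluster has a non-empty RWD counterpart, which is exactly the constraint the text describes.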
Figure 2.

An illustration of the CAM model: In the generative model, there are a total of four atoms shared between RWD and the treatment arm (left panel). Despite having positive weight, the atom is not associated with any sample from the RWD (right panel) and hence the density of the treatment arm is also allowed to be supported only on the remaining non-empty clusters of the RWD. The atom is linked with only the RWD (right panel). A cluster for the treatment arm alone is however not permissible.
In summary, we define $f_1$ and $f_0$ as in (1) and (3), with the prior on atoms and weights as discussed. Figure 2 shows a stylized representation of the generative process of the proposed CAM model. Notice that in the illustration one atom, despite having positive weight, is not linked with any observation from the RWD, and hence $f_0$ is a mixture of the three remaining components. Finally, no observation from the treatment arm is linked to one further atom that appears only in the RWD. The $\mathbf{x}_{1,i}$'s linked to the shared atoms can be regarded as digital clones of the $\mathbf{x}_{0,i}$'s linked to the same atoms.
The described CAM model is different from existing BNP mixture models. In (1)–(3), the atoms linked to $f_0$ are always a subset of those atoms that are linked to $f_1$, which is not naturally the case for the hierarchical DP model (Teh et al. 2006). Also, unlike the nested DP (Rodríguez, Dunson, and Gelfand 2008) and the common atoms nested DP (Denti et al. 2021) models, there is no notion of clustering distributions. That is, $\Pr(f_0 = f_1) = 0$ a priori. Instead, the intention here is to cluster similar covariate values across the datasets.
Regarding the concentration parameters $\alpha_0$ and $\alpha_1$, we assume gamma hyper-priors. Ascolani et al. (2022) showed that a hyper-prior on the concentration parameters can solve the problem of inconsistency of DP mixtures (Miller and Harrison 2013).
3.1.1. Handling Mixed Data Types and Missing Values
An appealing feature of the proposed CAM model over existing approaches is the easy use of covariates of different data-types and missing values. Covariates in RCTs often comprise different data-types including continuous, discrete and categorical variables. Missing values are also quite common. For example, in Figure 1, there are a large number of missing values for the ATRX gene which has only recently been identified as a therapeutic target for glioma (Haase et al. 2018) and was therefore not commonly recorded before.
Many existing methods for handling missing data rely on imputation (Choi, Dekkers, and le Cessie 2019), possibly at the expense of an additional layer of prediction errors. Alternatively, data records with missing variables may be dropped altogether, resulting in a reduced sample size.
Assuming missingness completely at random, the proposed CAM model avoids these issues by accommodating variable dimensional covariates in a principled manner, considering a separate univariate kernel for each covariate. Note that a mixture with independent kernels can still accommodate marginal dependence between the covariates (Ghosal and van der Vaart 2017, sec. 7.2.2, pp. 175). Specifically, let $\mathbf{o}_{s,i} \subseteq \{1, \ldots, p\}$ denote the set of observed covariates for patient $i$ in dataset $s$. We use independent kernels
$$ K(\mathbf{x}_{s,i} \mid \boldsymbol{\theta}_k) = \prod_{j=1}^{p} K_j(x_{s,i,j} \mid \theta_{k,j}), \qquad \theta_{k,j} \overset{\text{ind}}{\sim} H_j, \qquad (4) $$
where $K_j$ is a univariate kernel corresponding to the $j$th covariate with parameters $\theta_{k,j}$, and $H_j$ is a prior on $\theta_{k,j}$ with hyper-parameters $\lambda_j$. The likelihood function of $\mathbf{x}_{s,i}$ is then computed on the basis of only the observed values, that is, with the product in (4) restricted to $j \in \mathbf{o}_{s,i}$. The kernel $K_j$ is chosen to accommodate the data-type of the $j$th covariate. The model allows co-clustering of an $\mathbf{x}_{s,i}$ with some missing variables and another fully observed $\mathbf{x}_{s,i'}$; see Section S.3 of the supplementary materials for additional details. Missingness patterns other than completely at random can be handled by introducing additional hierarchy in the model, see, for example, Linero and Daniels (2018) for a review.
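The observed-values-only likelihood is straightforward to implement: with independent per-covariate kernels, missing entries simply drop out of the product. A minimal sketch for categorical covariates, with hypothetical category probabilities:

```python
import numpy as np

def row_loglik(x_row, probs_k):
    """Log-likelihood of one patient's covariate row under cluster k,
    using only the observed (non-missing) covariates.
    x_row:   array of covariate codes, np.nan marking missing entries.
    probs_k: list of per-covariate category-probability vectors."""
    ll = 0.0
    for j, xj in enumerate(x_row):
        if np.isnan(xj):
            continue  # missing covariates drop out of the product
        ll += np.log(probs_k[j][int(xj)])
    return ll

# Two binary covariates; the second one is missing for this patient.
probs_k = [np.array([0.3, 0.7]), np.array([0.8, 0.2])]
x = np.array([1.0, np.nan])
print(row_loglik(x, probs_k))   # log(0.7), from the observed covariate only
```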
3.2. Density-Free Importance Resampling of RWD
Building on the fitted CAM for covariates, we propose an importance resampling method to create a subpopulation of $\{\mathbf{x}_{1,i}\}$ that can be considered to be equivalent to $\{\mathbf{x}_{0,i}\}$ (see below for the definition of equivalence that is being used here). Under the assumption of no unmeasured confounders, the $\mathbf{x}_{1,i}$'s in the sampled (or weighted) subpopulation can be assumed to follow the same distribution as the $\mathbf{x}_{0,i}$'s, and be considered digital clones of the $\mathbf{x}_{0,i}$. With such equivalent populations, in principle, any desired method for randomized clinical trials can subsequently be used to carry out inference on treatment effects. Such focus on equivalent populations follows recent recommendations by the FDA (FDA 2021).
Recall that $f_1$ and $f_0$ denote the mixture models for $\mathbf{x}_{1,i}$ and $\mathbf{x}_{0,i}$, under (1) and (3), respectively. We define equivalent populations as a subset (possibly all) of the $\mathbf{x}_{1,i}$ together with a set of weights such that the expectation of any function of interest under $f_0$ can be evaluated as a (weighted) Monte Carlo average using these $\mathbf{x}_{1,i}$ (and the weights). Here we assume that all stated expectations exist and that the order of taking expectations and limits can be switched.
Recall that $f_1(\mathbf{x}) = \sum_{k} w_k\, K(\mathbf{x} \mid \boldsymbol{\theta}_k)$. Alternatively, the joint model of $(z_{1,i}, \mathbf{x}_{1,i})$ can be expressed as $p(z_{1,i} = k) = w_k$ and $\mathbf{x}_{1,i} \mid z_{1,i} = k \sim K(\mathbf{x} \mid \boldsymbol{\theta}_k)$. For easier housekeeping, we assume $z_{1,i} \leq J$ for $i = 1, \ldots, n_1$, that is, the first $J$ atoms are linked with the $\mathbf{x}_{1,i}$'s. Accordingly, we let $f_0(\mathbf{x}) = \sum_{k=1}^{J} u_k\, K(\mathbf{x} \mid \boldsymbol{\theta}_k)$ using the same first $J$ atoms observed in the $\mathbf{x}_{1,i}$ population. This is the exact construction of (1) and (3). For an equivalent population, we require weights $\omega_i$ attached to the $\mathbf{x}_{1,i}$ (using $\omega_i = 0$ to drop samples) such that:
$$ E_{f_0}\{h(\mathbf{x})\} = E\Big\{ \sum_{i=1}^{n_1} \omega_i\, h(\mathbf{x}_{1,i}) \Big\} $$
for any function of interest $h$.
The weights are functions of $\mathbf{u}$ and the cluster sizes, as follows. Define $n_k = |C_k|$, the cardinality of the earlier introduced clusters $C_k$. Then $n_k/n_1$ is an unbiased estimator of $w_k$ and
$$ \sum_{i=1}^{n_1} \frac{u_{z_{1,i}}}{n_{z_{1,i}}}\, h(\mathbf{x}_{1,i}) \qquad (5) $$
is an unbiased estimator of $E_{f_0}\{h(\mathbf{x})\}$. We then recognize $\omega_i = u_{z_{1,i}}/n_{z_{1,i}}$ as the ideal weights. Since we only observe $\mathbf{x}_{1,i}$, but not $z_{1,i}$ and $\mathbf{u}$, we replace $\omega_i$ by a Monte Carlo average under posterior MCMC simulation to get the desired equality simulation-exact (i.e., in the limit as $n_1$ and the number of MCMC simulations increase). Let $m = 1, \ldots, M$ index the posterior samples and use $z^{(m)}_{1,i}$, $u^{(m)}_k$, etc. to indicate parameter values in the $m$th sample. We use
$$ \hat{\omega}_i = \frac{1}{M} \sum_{m=1}^{M} \frac{u^{(m)}_{z^{(m)}_{1,i}}}{n^{(m)}_{z^{(m)}_{1,i}}} \qquad (6) $$
with $\hat{\omega}_i$ being the importance sampling weight for $\mathbf{x}_{1,i}$. The $\mathbf{x}_{1,i}$'s can be resampled with these weights to obtain the desired subpopulation with distribution $f_0$ (Skare, Bølviken, and Holden 2003). This resampled subpopulation of the RWD can then be regarded as equivalent in distribution to $\{\mathbf{x}_{0,i}\}$. Algorithm 1 summarizes the procedure.
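A minimal sketch of the weight computation in (6), assuming the MCMC output (cluster labels for the RWD and treatment-arm cluster weights) has already been collected; array names and shapes are illustrative:

```python
import numpy as np

def is_weights(z1_samples, u_samples):
    """Average over M MCMC draws of u_{z_i} / n_{z_i} for each RWD patient i.
    z1_samples: (M, n1) integer cluster labels for the RWD.
    u_samples:  list of M treatment-arm cluster weight vectors."""
    M, n1 = z1_samples.shape
    w = np.zeros(n1)
    for m in range(M):
        z = z1_samples[m]
        n_k = np.bincount(z)          # cluster sizes; >= 1 for occupied clusters
        w += u_samples[m][z] / n_k[z]
    return w / M

# Toy example with M = 1 draw, two clusters of sizes 3 and 1.
z = np.array([[0, 0, 0, 1]])
u = [np.array([0.5, 0.5])]
print(is_weights(z, u))   # [0.5/3, 0.5/3, 0.5/3, 0.5]
```

Because cluster sizes are at least one, the weights stay bounded, in line with the discussion of (6).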
To test the equivalence of the two populations, we use a Bayesian additive regression tree (BART, Chipman, George, and McCulloch 2010) classifier in Step 5 of Algorithm 1. In extensive simulation studies in Section 6, we find that declaring equivalence when the AUC (area under the receiver operating characteristic curve) is less than 0.6 yields excellent empirical performance. Once equivalence is achieved, in principle any existing approach for inference on treatment effects can be used (see Section 4 and later).
Note that even if the RWD population is not a heterogeneous superset of the current trial, one can still fit the CAM model.
In case the RWD is not comparable, Step 5 of Algorithm 1 can discriminate the two populations and the AUC can quantify the degree of incongruence. In general, importance sampling schemes need the ratio of the target density (in our case, $f_0$) and the importance sampling density (in our case, $f_1$). For our problem, this would require high-dimensional density estimation. Even if the densities were known, importance sampling could be plagued by unbounded weights (Au and Beck 2003). Exploiting the common atoms structure, our proposed scheme avoids evaluation of the marginal multivariate densities. We therefore refer to it as a density-free importance resampling scheme, and for brevity often simply as an IS scheme. In the denominator of $\hat{\omega}_i$, the use of the cluster sizes (which by definition are $\geq 1$) avoids complications arising from unbounded weights. Conventional importance sampling schemes are asymptotically consistent. This is seen to hold in numerical experiments with our algorithm as well. Additional discussion of Algorithm 1 is in Section S.4 of the supplementary materials.
Algorithm 1:
Density-free importance resampling of RWD and validation
| 1 Input two datasets $\{\mathbf{x}_{0,i}\}_{i=1}^{n_0}$ and $\{\mathbf{x}_{1,i}\}_{i=1}^{n_1}$. |
| 2 Fit the CAM model to the data using MCMC simulation. Let $u^{(m)}_k$ and $z^{(m)}_{1,i}$ be the $m$th MCMC sample of $u_k$ and $z_{1,i}$, respectively, and $n^{(m)}_k$ be the size of cluster $C_k$ in the $m$th MCMC iteration, for $m = 1, \ldots, M$. |
| 3 Calculate importance sampling weights $\hat{\omega}_i = \frac{1}{M} \sum_{m=1}^{M} u^{(m)}_{z^{(m)}_{1,i}} \big/ n^{(m)}_{z^{(m)}_{1,i}}$. |
| 4 Resample a subpopulation of size $n_0$ from $\{\mathbf{x}_{1,i}\}$ with importance resampling weights $\hat{\omega}_i$, with replacement. |
| 5 Test for equivalence of $\{\mathbf{x}_{0,i}\}$ and the resampled subpopulation of $\{\mathbf{x}_{1,i}\}$ using a supervised classification algorithm (e.g., BART as described in the text). |
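Step 5 can be sketched with any off-the-shelf classifier. The paper uses BART; below, a simple split-sample linear discriminant (mean-difference direction) stands in, purely to illustrate the AUC-based equivalence check:

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC: probability a class-1 score exceeds a class-0 score."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n1 = labels.sum()
    n0 = len(labels) - n1
    return (ranks[labels == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def equivalence_auc(x_trial, x_clone, rng):
    """Fit a mean-difference discriminant on one half of the pooled data,
    score the other half, and report the AUC. Values near 0.5 suggest
    the two covariate populations are equivalent."""
    X = np.vstack([x_trial, x_clone])
    y = np.concatenate([np.ones(len(x_trial)), np.zeros(len(x_clone))])
    idx = rng.permutation(len(y))
    tr, te = idx[::2], idx[1::2]
    direction = X[tr][y[tr] == 1].mean(0) - X[tr][y[tr] == 0].mean(0)
    return auc(X[te] @ direction, y[te])

rng = np.random.default_rng(2)
a = rng.normal(size=(200, 3))
b = rng.normal(size=(200, 3))   # same distribution, so AUC should be near 0.5
auc_val = equivalence_auc(a, b, rng)
print(auc_val)
```

With the 0.6 threshold from the text, this example would be declared equivalent; replacing `b` with a shifted sample drives the AUC toward 1.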
3.3. Regression with CAM Model on Covariates
Note that up to here we only concerned ourselves with the covariates, without any reference to the outcomes . In preparation for one of the strategies in the upcoming discussion of treatment comparison (Section 4), we now augment the CAM model to include a sampling model for the outcomes. That is, we add a response model on top of the CAM model on covariates.
The extended model defines a regression of $y_{s,i}$ on covariates by first grouping patients with similar covariate profiles into clusters and then adding a cluster-specific sampling model for the outcome. That is, the overall model specifies a regression of $y_{s,i}$ on $\mathbf{x}_{s,i}$ via a random partition. A major advantage of this approach is that it allows a variable-dimension covariate vector, a feature that is not straightforward to include in a regression otherwise. Similar product partition models with regression on covariates (PPMx, see also Section S.2 in the supplementary materials) were considered by Müller, Quintana, and Rosner (2011) and Page, Quintana, and Müller (2022), albeit without any notion of common atoms. We will therefore refer to the model proposed below as the common atoms PPMx (CA-PPMx). Formally, we introduce cluster-specific parameters $\boldsymbol{\eta}^*_{s,k}$, and assume
$$ y_{s,i} \mid z_{s,i} = k \;\overset{\text{ind}}{\sim}\; p(y_{s,i} \mid \boldsymbol{\eta}^*_{s,k}) \qquad (7) $$
for a suitable choice of $p(\cdot \mid \boldsymbol{\eta})$. For example, for an event-time response, $p$ could be a lognormal, exponential or Weibull model. The response model (7) depends on the covariates indirectly via the $z_{s,i}$'s, that is, the partition induced by the covariates. Within stratum $k$, the response model allows for a treatment comparison based on $(\boldsymbol{\eta}^*_{0,k}, \boldsymbol{\eta}^*_{1,k})$, which can then be averaged with respect to the assumed distribution of $\mathbf{x}_{0,i}$ to define an average treatment effect.
For the implementation in the motivating case study, we let $y_{s,i}$ denote the log OS (overall survival) times and assume $p(y \mid \boldsymbol{\eta}^*_{s,k})$ to be a normal kernel with $\boldsymbol{\eta}^*_{s,k} = (\mu_{s,k}, \sigma^2_{s,k})$. Such mixtures are highly flexible (Ghosal, Ghosh, and Ramamoorthi 1999), making them an attractive choice for many applications. We complete the model with conjugate normal-inverse-gamma (NIG) priors on the $\boldsymbol{\eta}^*_{s,k}$'s. In summary, we have
$$ y_{s,i} \mid z_{s,i} = k \sim \mathrm{N}(\mu_{s,k}, \sigma^2_{s,k}), \qquad \mu_{s,k} \mid \sigma^2_{s,k} \sim \mathrm{N}(\mu_0, \sigma^2_{s,k}/\kappa_0), \qquad \sigma^{-2}_{s,k} \sim \mathrm{Ga}(a_0, b_0), \qquad (8) $$
where $\mathrm{Ga}(a, b)$ is a gamma distribution with mean $a/b$. We add hyper-priors on the main location-scale controlling hyper-parameters $\mu_0$ and $b_0$, while fixing the precision hyper-parameters $\kappa_0$ and $a_0$. Choices of these hyper-parameters are discussed in Section S.7 of the supplementary materials. Finally, for a goodness-of-fit test under the proposed model, we use the approach of Johnson (2007) to build a graphical tool based on quantile plots. Such visual tools are often quite effective for detecting departures from model assumptions (Meloun and Militký 2011, chap. 2). See Section S.5 in the supplement for more details.
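For intuition, the conjugate normal-inverse-gamma update for a single cluster's normal model on log survival times can be sketched as follows; the hyper-parameter names and values are illustrative stand-ins, not the paper's choices:

```python
import numpy as np

def nig_posterior(y, mu0=0.0, kappa0=1.0, a0=2.0, b0=1.0):
    """Conjugate NIG update for one cluster's (mu, sigma^2), given the
    log survival times y assigned to that cluster. Returns the posterior
    NIG parameters (mu_n, kappa_n, a_n, b_n)."""
    n, ybar = len(y), np.mean(y)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * ybar) / kappa_n          # shrunken cluster mean
    a_n = a0 + n / 2
    b_n = (b0 + 0.5 * np.sum((y - ybar) ** 2)
           + 0.5 * kappa0 * n * (ybar - mu0) ** 2 / kappa_n)
    return mu_n, kappa_n, a_n, b_n

y = np.log(np.array([10.0, 12.0, 14.0]))   # hypothetical OS times in months
print(nig_posterior(y))
```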
4. Inference on Treatment Effects
4.1. Two-Step Importance Sampling (IS) Approach
We already described the use of the importance sampling weights under the CAM model to achieve equivalent patient populations. This allows a straightforward approach to treatment comparison. Using the adjusted (resampled) subpopulation of the RWD, one can proceed with inference on the treatment effect using any method that relies on equivalent patient populations across the two arms. We refer to this approach as the "two-step IS" and use it in the simulation studies and applications in Sections 6 and 7, respectively. This approach does not make use of the outcome model of Section 3.3.
4.2. Model-Based Inference for Treatment Effects
Alternatively, we implement inference using the response model of Section 3.3, that is, the full CA-PPMx. We refer to this approach as "model-based inference". We assume that the desired inference on treatment effects takes the form of inference for some notion of difference between the marginal distributions of the outcome under the two treatment arms, $g_0$ and $g_1$. However, since the covariate populations in the two treatment arms can be substantially different, comparison between the marginal (with respect to the covariates) outcome models $g_0$ and $g_1$ can be biased. We need to appropriately adjust for the differences in the two populations. We do this by replacing $g_1$ as follows. Exploiting the common atoms structure of the proposed CA-PPMx, there is an operationally simple method to carry out this adjustment and infer treatment effects. Since within each cluster the covariate populations can be considered equivalent, the adjustment for the lack of randomization amounts to adjusting the corresponding cluster weights. We define
$$ \tilde{g}_1(y) = \sum_{k=1}^{J} u_k\, p(y \mid \boldsymbol{\eta}^*_{1,k}), $$
where the mixture components of the response model in the RWD are weighted by $u_k$, that is, the cluster weights associated with $f_0$ (rather than $f_1$). Thus, $\tilde{g}_1$ is the distribution of outcomes under control in the treatment population or, in other words, the response of an average individual from the trial arm potentially treated with the control therapy. With these notions, we define the population adjusted treatment effect as
$$ \Delta = \phi(g_0) - \phi(\tilde{g}_1) \qquad (9) $$
for a suitable functional $\phi$. For example, when $y$ is a univariate response variable and $\phi(g)$ is the mean under $g$, (9) simplifies to $\Delta = \sum_{k=1}^{J} u_k\, (\mu_{0,k} - \mu_{1,k})$, which further reduces to a common effect $\tau$ when $\mu_{0,k} - \mu_{1,k} = \tau$ for all $k$.
In general, each cluster of covariates in the CAM model can be interpreted as a homogeneous sub-population of patients. For the $k$th sub-population, the average treatment effect is $\mu_{0,k} - \mu_{1,k}$ and its proportion in the target population is $u_k$. The reported treatment effect (9) includes the adjustment with the sub-population proportions $u_k$. On a related point, the proposed model-based inference on treatment effects in the CA-PPMx model can be interpreted as a stochastic propensity score stratification approach. See Section S.6 in the supplementary materials for the details.
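The population-adjusted effect is thus a reweighted contrast of within-cluster means. A small sketch with hypothetical cluster summaries:

```python
import numpy as np

def adjusted_effect(mu_trt, mu_rwd, u):
    """Population-adjusted treatment effect: within-cluster mean differences
    averaged with the treatment-arm cluster weights u."""
    return float(np.sum(np.asarray(u) * (np.asarray(mu_trt) - np.asarray(mu_rwd))))

# Three matched clusters. A naive pooled contrast would instead weight the
# RWD clusters by their own (different) proportions, biasing the comparison.
u = np.array([0.5, 0.3, 0.2])      # treatment-arm cluster weights
mu_trt = [2.0, 1.0, 0.5]           # cluster means, treatment arm (log OS)
mu_rwd = [1.5, 0.8, 0.5]           # cluster means, RWD control
print(adjusted_effect(mu_trt, mu_rwd, u))   # 0.5*0.5 + 0.3*0.2 + 0.2*0.0 ~ 0.31
```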
We prefer the Bayesian model-based approach to avoid discarding unmatched patient records from the RWD from the analysis. The two-step IS can be useful to validate the results obtained by the model-based approach.
5. Posterior Computation
We develop an efficient Gibbs sampler for posterior inference in the proposed CAM model with the nonconjugate mixture of lognormals for survival outcomes. One potential complication arises from the varying dimension of the treatment-arm weight vector $\mathbf{u}$, which depends on the number of occupied atoms in the RWD. Posterior simulation with variable dimensional parameters generally involves complicated trans-dimensional Markov chain Monte Carlo (Green 1995), often resulting in poor mixing and computational inefficiencies. Our posterior sampling algorithm avoids such complications while rigorously maintaining the architecture of the CAM model. See Section S.8 in the supplementary materials for more details.
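While the full sampler is given in Section S.8, the flavor of one conditional update can be sketched for fixed atoms and weights: cluster labels are drawn with probability proportional to weight times kernel. The unit-variance normal kernel and all values below are illustrative only:

```python
import numpy as np

def sample_allocations(x, theta, log_w, rng):
    """One Gibbs sweep over cluster labels given fixed atoms and weights:
    p(z_i = k) proportional to w_k * K(x_i | theta_k), here with a
    unit-variance normal kernel. A sketch of one conditional update,
    not the paper's full sampler."""
    ll = -0.5 * (x[:, None] - theta[None, :]) ** 2   # log kernel, all pairs
    logp = log_w[None, :] + ll
    logp -= logp.max(axis=1, keepdims=True)          # numerical stabilization
    p = np.exp(logp)
    p /= p.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(theta), p=pi) for pi in p])

rng = np.random.default_rng(3)
x = np.array([-5.0, -4.8, 5.0, 5.2])
theta = np.array([-5.0, 5.0])
z = sample_allocations(x, theta, np.log([0.5, 0.5]), rng)
print(z)   # well-separated atoms, so labels are [0, 0, 1, 1]
```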
6. Simulation Study
We first describe the simulation scenarios.
CAM scenario:
We first consider a scenario where the covariates are generated from a CAM model. In this scenario, the first block of covariates is continuous and the remaining 3 are binary. The trial arm and the RWD arm are generated from mixtures sharing the same atoms but with different weights, chosen so that the RWD population is substantially different from the trial population and exhibits more heterogeneity.
MIX scenario:
In this scenario, we generate the covariates from finite mixtures whose weights agree for most atoms, but the atoms in the treatment arm are not exactly a subset of those in the RWD. Given the typically larger heterogeneity of the RWD, this is not a realistic scenario; we include it to evaluate the approach under model misspecification. Different weights attached to the atoms in the two populations result in significantly different marginal densities.
Interaction scenario:
In this scenario, we resample from the historical GBM database of 339 patients to create a future single-arm trial population. Let the (unknown) distribution of the covariates in the database be the reference distribution, and define an indicator of selection into the trial arm. That is, we sample the control covariates iid from the database distribution and the trial-arm covariates from the same distribution tilted by the PS of assignment to the treatment arm. We set the PS to be a logistic regression with pairwise interactions between some covariates. We can sample the trial arm by simple weighted resampling of the historical database, without explicitly knowing the database distribution.
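The weighted-resampling step can be sketched as follows. The covariates, the logistic coefficients, and the sample size are hypothetical stand-ins for the actual database and PS model used in the paper.

```python
# Sketch of PS-weighted resampling from a historical database
# (hypothetical covariates and coefficients throughout).
import math
import random

random.seed(1)

# Hypothetical database: 339 rows of (x1, x2) covariates.
db = [(random.gauss(0, 1), random.random()) for _ in range(339)]

def ps(x):
    # Logistic PS with a pairwise interaction term (illustrative coefficients).
    x1, x2 = x
    eta = 0.3 * x1 - 0.5 * x2 + 0.8 * x1 * x2
    return 1.0 / (1.0 + math.exp(-eta))

# Resampling the database with weights proportional to the PS draws a
# trial-arm covariate population without knowing the database distribution.
weights = [ps(x) for x in db]
trial_arm = random.choices(db, weights=weights, k=70)
```

Because `random.choices` only needs weights up to proportionality, the normalizing constant of the tilted distribution never has to be computed, which is what makes the resampling density-free.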
Oracle scenario:
In this fourth and final scenario, we proceed as in the Interaction scenario but now with the PS defined as a logistic regression with main effects of the true predictors only, that is, as if an oracle had revealed the right predictors.
Outcome model:
Under the CAM and MIX scenarios, we generate the outcomes from a nonlinear response function of the covariates; in the Interaction and Oracle scenarios, we generate the outcomes from a linear model, with a constant shift added in the treatment arm that defines the true treatment effect. We repeat the experiments over a grid of treatment-effect values.
We repeat the simulations in the CAM and MIX scenarios for several sample sizes, keeping the ratio of the population sizes consistent with the GBM application. For each combination of settings in the CAM and MIX scenarios, we perform 500 independent replications. Under the Interaction and Oracle scenarios, the number of covariates and the sample sizes are fixed by the GBM database. To avoid reporting summaries that might just hinge on a lucky choice of the logistic regression coefficients in the PS, and to remove one source of randomness unrelated to the methods under comparison, we independently sample different sets of regression coefficients (from a discrete mixture distribution) for each of the 500 repeat simulations. Further details are provided in Section S.9.2 of the supplementary materials.
6.1. Analyses
We compare the CA-PPMx model with the PS-integrated power prior and composite likelihood approaches (Wang et al. 2019, 2020; Chen et al. 2020) as implemented in the psrwe R package, and a two-step population matching approach. We perform seven different analyses for each of the four scenarios to estimate the treatment effect, which we define here as the difference in mean outcomes. The analyses are (i) CA-PPMx: The proposed CA-PPMx model of Section 4.2; (ii) IS-LM: The two-step IS approach introduced in Section 3.2. We first sample a subpopulation from the RWD following the importance resampling scheme proposed in Section 3.2 and subsequently estimate the treatment effect between the subpopulation and the treatment arm by fitting a linear model; (iii) and (iv) PP-Logistic and PP-RF: Two PS-based power prior approaches using logistic regression and random forest (Breiman 2001), respectively; (v) and (vi) CL-Logistic and CL-RF: Two composite likelihood based approaches with logistic and random forest classifier based PSs, respectively; and finally, (vii) Matching: A distance based bipartite matching method designed to match treatment and control groups in observational studies (Hansen and Klopfer 2006), subsequently using a linear model for detecting treatment effects, as implemented in the optmatch R package.
6.2. Equivalence of Populations
In preparation for inference under the two-step IS approach, we generate equivalent populations using the density-free importance resampling scheme discussed in Section 3.2 based on the fitted CAM model. To formally test for equivalence of the adjusted datasets, we implement Step 5 in Algorithm 1. We first merge the datasets and then try to classify patients in the merged sample as originally RWD or single-arm treatment cohort. For classification, we use BART and report the boxplots of the area under the receiver operating characteristic curve (AUC) of the classification accuracy across the repeat experiments for all simulation settings in Figure 3. For comparison, we also subsample randomly (instead of using the IS weights) and report the AUCs in the same figure. We refer to the two sampling strategies as IS and Random, respectively.
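The AUC criterion itself reduces to the Mann-Whitney statistic applied to the classifier scores. A minimal sketch, using raw scores in place of BART and made-up numbers, is:

```python
# Sketch of the equivalence check via classification AUC (illustrative;
# the paper uses BART as the classifier, here we use the scores directly).

def auc(scores_pos, scores_neg):
    """Mann-Whitney AUC: probability a positive case outscores a negative one."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5   # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical classifier scores for trial-arm (positive) and resampled
# RWD (negative) patients. Well-matched populations give AUC near 0.5,
# while easily separable populations give AUC near 1.
well_matched = auc([0.4, 0.5, 0.6], [0.45, 0.5, 0.55])
separated = auc([0.8, 0.9, 0.95], [0.1, 0.2, 0.3])
```

An AUC near 0.5 means the classifier cannot tell which arm a patient came from, which is precisely the operational meaning of "equivalent populations" used in the figure.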
Figure 3.

Boxplots of the area under the receiver operating characteristic curve (AUC) of the classification accuracy for a merged dataset consisting of the treatment arm and the subsampled RWD, classifying patients into their original cohort using BART. Two subsampling schemes are used: the importance sampling (IS) strategy of Section 3.2 and simple random resampling. Here AUCs close to 0.5 imply near equivalence between the populations. Panel (a) shows the AUCs in the CAM and MIX scenarios across different sample sizes and numbers of covariates; panel (b) shows the AUCs under the Interaction and Oracle scenarios.
In Figure 3(a), the Random resampling strategy yields high AUC, indicating that the two populations are substantially different and that adjustment of the RWD population is necessary before using it as a synthetic control. For both the CAM and MIX scenarios, the performance of the IS scheme improves with increasing sample size. This is expected, as for small sample sizes the RWD lacks enough data to produce a subsample equivalent to the trial arm. AUC values close to 1 under the CAM scenario imply that the true populations are indeed very different in this case. In contrast, the AUC values close to 0.5 under the IS scheme indicate near equivalence after adjustment. In both scenarios, the AUC is substantially reduced under the IS resampling scheme, implying that the proposed CAM model indeed adjusts for the lack of randomization.
Results under the last two scenarios are shown in Figure 3(b). Recall that in both scenarios the simulation truth is not based on the CAM model. Still, the fit under the proposed CAM model achieves near perfect adjustment as shown in the figure.
6.3. Inference on Treatment Effects
In each simulation setup, we test the null hypothesis of no treatment effect at the 5% level of significance. We elaborate the testing procedure in Section S.9.1 of the supplementary materials. We report power in Figure 4, with detailed numerical results appearing in Tables S.1–S.3 in the supplementary materials. Under the PS-based approaches, the power remains below 15% across all scenarios (not shown in the figure). The fully model-based nonparametric CA-PPMx has higher power than IS-LM and Matching when the true response models are nonlinear. In contrast, IS-LM and Matching perform comparably and have higher power than the CA-PPMx approach in the Interaction and Oracle scenarios, where the true response model is linear, but are susceptible to model misspecification, as reflected in the CAM and MIX scenarios. This is because IS-LM and Matching assume a linear model for the outcome, which happens to match the simulation truth in the Interaction and Oracle scenarios. Except under the PS-based approaches, power increases with increasing sample size, indicating that PS-based methods may require a much larger population size in the RWD to adjust for the lack of randomization.
Figure 4.

Power of detecting treatment effects in different simulation setups for a 5% level of significance test: Seven methods are used to estimate the effects where IS-LM and CA-PPMx are based on the proposed CAM model. Panel (a) corresponds to the CAM (top) and MIX (bottom) scenarios. Panel (b) shows results under the Interaction (left side) and the Oracle (right side) scenarios.
7. Application in Glioblastoma
We return to the motivating case study of creating a synthetic control for a hypothetical upcoming single-arm GBM trial. The sample size of the trial is similar to past trials (Vanderbeek et al. 2018). The endpoint of interest is overall survival (OS). We evaluate the operating characteristics of the proposed design by simulating repeat trial replicates. See Berry et al. (2010, sec. 2.5.4) for a discussion of the role of frequentist operating characteristics in Bayesian inference. To create treatment arm data, we first select covariates by sampling patients from the historical database. To generate a realistic nonequivalent patient population, we select patients not uniformly but with probabilities given by a logistic regression on the covariates (as described in the Interaction scenario in Section 6). The treatment effect is quantified by the hazard ratio (HR) between the treatment arm and the (synthetic) control arm, with the null and alternative hypotheses H0: HR = 1 versus H1: HR ≤ 0.6 at 50 weeks. The HR of 0.6 was suggested by clinical collaborators as a meaningful clinical target.
We show results under two scenarios: (a) the null scenario, no treatment effect (i.e., HR = 1), created by keeping the OS for the patients in the treatment arm as originally observed in the historical database (since these patients received treatments with similar efficacy); and (b) the alternative scenario, a clinically meaningful treatment effect, created by increasing the OS of patients in the treatment arm with an increment that corresponds to a HR of 0.6 under an exponential model.
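The construction of the alternative scenario relies on a standard fact about the exponential model: scaling survival times by 1/HR multiplies the hazard by HR. A small sketch, with a hypothetical baseline hazard:

```python
# Sketch: scaling exponential survival times by 1/HR yields hazard ratio HR
# (here HR = 0.6). The baseline rate and sample size are hypothetical.
import random

random.seed(7)
HR = 0.6
base_rate = 0.02                       # hypothetical weekly hazard
control = [random.expovariate(base_rate) for _ in range(10000)]
treated = [t / HR for t in control]    # increased OS for the treatment arm

# For exponentials, hazard = 1 / mean, so the implied hazard ratio
# (treated over control) recovers HR exactly.
mean_c = sum(control) / len(control)
mean_t = sum(treated) / len(treated)
hr_hat = (1 / mean_t) / (1 / mean_c)
```

Since every treated time is the corresponding control time divided by 0.6, the ratio of means (and hence the hazard ratio) is 0.6 by construction, independent of the random draws.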
We apply three methods for inference on the treatment effect: (i) IS-based two-step procedure: we first create equivalent patient populations using Algorithm 1 and then proceed with inference on the treatment effect as if patients were randomly assigned to treatment and control; (ii) Matching-based two-step procedure: operationally similar to (i), but now the Matching method discussed in Section 6 is used to create equivalent patient populations; and (iii) Model-based inference: the extension of the CAM model to include the outcomes, as described in Section 4.2.
7.1. IS-based Two-Step Procedure
In preparation for inference, we start with a test for equivalence of the subsampled population in each of the repeat simulations. Figure 5 plots the relative frequencies for each covariate in the treatment arm (red) and in the synthetic control arm constructed from the RWD using (a) the IS sampling following Algorithm 1 (green) and (b) random sampling (blue). Very different frequencies in the two arms under random resampling indicate significant differences in the covariate distributions between the treatment and control arms. For most covariates, however, the differences are greatly reduced by the IS scheme.
Figure 5.

Covariate distributions before and after adjustment. The red bars show the distributions of the covariates in the treatment arm. The green and blue bars show the distributions of the covariates in the synthetic control arms formed using the IS and random resampling schemes, respectively.
Once we establish equivalence of the patient populations, we proceed with inference for the treatment effect. We use a Cox proportional hazards (PH) model (Cox 1972) and the logrank test (Peto and Peto 1972) to compare the survival functions. The top panel of Figure 6 shows inference summaries over the repeat simulations. The figure shows the histograms of p-values under the null scenario (blue) and the alternative scenario (red). Under the null scenario, the p-values are almost uniformly spread out over [0, 1]. In contrast, under the alternative scenario, the histogram of p-values over repeat simulations is peaked close to zero.
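For readers who want the mechanics of the comparison, a bare-bones version of the two-sample logrank statistic is sketched below. Unlike the real OS analysis, it ignores censoring, and the survival times are made up.

```python
# Minimal sketch of the two-sample logrank statistic. Ties are handled
# via the usual hypergeometric moments; censoring is omitted for brevity.
import math

def logrank_z(times1, times2):
    """Standardized logrank statistic comparing two uncensored samples."""
    event_times = sorted(set(times1) | set(times2))
    O1 = E1 = V = 0.0
    for t in event_times:
        d1 = sum(1 for x in times1 if x == t)   # events in group 1 at t
        d2 = sum(1 for x in times2 if x == t)
        r1 = sum(1 for x in times1 if x >= t)   # at risk in group 1 at t
        r2 = sum(1 for x in times2 if x >= t)
        d, r = d1 + d2, r1 + r2
        O1 += d1                                # observed events, group 1
        E1 += d * r1 / r                        # expected under the null
        if r > 1:
            V += d * (r1 / r) * (r2 / r) * (r - d) / (r - 1)
    return (O1 - E1) / math.sqrt(V)

# Hypothetical survival times (weeks): clearly longer survival in group 2,
# so the standardized statistic is large and positive.
z = logrank_z([5, 8, 12, 15, 20], [25, 30, 40, 55, 60])
```

In practice one would compare `z**2` to a chi-squared distribution with one degree of freedom; packaged implementations (e.g., `survdiff` in R's survival package) also accommodate censoring.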
Figure 6.

Inference on treatment effects under the two-step procedures: Panel (a) shows histograms of the p-values corresponding to a logrank test under the Cox PH model comparing the survival curves between the treatment arms; panel (b) shows the Kaplan-Meier curves and pointwise confidence intervals for the treatment (blue) and control (red) arms under the null (left) and alternative (right) scenarios, respectively. The top and bottom panels of (a) and (b) show the results corresponding to the IS and Matching based approaches, respectively.
Finally, we identify a representative simulation under each of the two scenarios by finding the instance with p-value closest to the median of the respective histogram. For these two representatives, we show the Kaplan-Meier (KM) survival curves in the top panels of Figure 6(b). We observe that the survival curves in the two arms are quite alike, with wide confidence intervals, under the null scenario, whereas significant improvements in the survival times can be observed for the treatment arm for the first 80 weeks under the alternative scenario.
Figure 7.

Inference on treatment effects under the model-based approach: Panel (a) shows quantile-quantile plots to assess the model fit following Section S.5 in the supplementary materials; the left plot of panel (b) shows posterior probabilities of a treatment effect across repeat simulations, and on the right the posterior estimated hazard ratios for OS with pointwise 95% credible regions are shown under the null and alternative scenarios.
7.2. Matching-based Two-Step Procedure
We use the Matching procedure to create a synthetic control and then follow the same routine as in (i) for inference on treatment effects. The results are shown in the bottom panels of Figures 6(a) and (b). The distribution of the p-values under the alternative scenario is less peaked around 0 compared to the IS-based procedure. This is also reflected in the representative KM plot under the alternative scenario, which has a much wider confidence interval around the survival curve, possibly indicating that the IS-based approach does better than Matching in creating equivalent populations.
7.3. Model-Based Inference
As it is not straightforward to account for the uncertainty in creating the synthetic control in the aforementioned two-step procedures, we consider a fully model-based approach. For inference on treatment effects, we first assess goodness of fit of the CA-PPMx model (see Section S.5 in the supplementary materials for details). Quantile-quantile plots for the two scenarios are shown in Figure 7(a). Near diagonal lines indicate no evidence for a lack of fit. We then evaluate, in each repeat simulation, the posterior probability of HR ≤ 0.6 at 50 weeks under the proposed model. The left panel of Figure 7(b) shows histograms of these posterior probabilities under the null scenario (in blue) and the alternative scenario (in red). As desired, the posterior probabilities are clustered near 0 under the null scenario, but are peaked near 1 under the alternative.
Finally, we again identify a representative simulation by selecting the repeat simulation with posterior probability closest to the median of the respective histogram under each of the two scenarios. For each of the two scenarios, we plot the posterior estimated hazard ratios (blue and red for simulations under the null and alternative scenarios, respectively), together with pointwise 95% posterior credible intervals, in the right panel of Figure 7(b). Under the null scenario (blue), the HR is almost equal to 1 with wide credible intervals, whereas under the alternative (red), the HR is significantly below 1 with high posterior probability. The median (over the simulations) posterior probabilities are 0.08 and 0.98 under the null and alternative scenarios, respectively.
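Operationally, the reported posterior probability is just the fraction of posterior draws of the 50-week hazard ratio that fall at or below the clinical target of 0.6. A sketch with hypothetical MCMC draws:

```python
# Sketch: Monte Carlo estimate of Pr(HR <= 0.6 | data) from hypothetical
# posterior draws of the 50-week hazard ratio.
hr_draws = [0.45, 0.52, 0.58, 0.61, 0.55, 0.48, 0.72, 0.50, 0.59, 0.63]
post_prob = sum(hr <= 0.6 for hr in hr_draws) / len(hr_draws)
```

With a real MCMC output of several thousand draws, the same one-liner gives the per-simulation probabilities summarized in the left panel of Figure 7(b).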
8. Discussion
With the long-term goal of setting up a platform for future single-arm early-phase clinical trials in GBM, in which new patients receive only experimental therapies, in this article we developed a Bayesian nonparametric approach for creating synthetic controls from RWD. We introduced a Bayesian CAM model that clusters patients with similar covariate values across different treatment arms.
The flexibility of the CAM model makes it easily generalizable to other problems, for example, to create two synthetic treatment arms to compare two treatments based on RWD from electronic health records.
Another direction for extensions could build on extracting propensity scores as inference summaries under the CA-PPMx model. This is briefly discussed in Section S.6 of the supplementary materials.
A limitation of the current model is scalability to high-dimensional covariates. In the GBM application, we rely on 11 clinically important categorical covariates that are commonly considered as prognostic factors in GBM treatments. However, in many applications candidate covariates can be high-dimensional. Implicit in the current construction is the assumption that the recorded covariates are clinically relevant for the disease or condition under consideration, and the approach may not be appropriate when large numbers of unscreened candidate covariates are used. Recent advances in Bayesian model-based clustering by Chandra, Canale, and Dunson (2023) could be useful to construct high-dimensional generalizations.
Supplementary Material
Supplementary materials include additional discussion of the motivating dataset, a brief review on the PPMx, detailed discussion of the graphical goodness-of-fit test for the regression model, an alternative interpretation of our model-based inference approach, choices of hyperparameters, details of the posterior simulation scheme, additional simulation studies and associated details, and MCMC convergence diagnostics. C++ and R programs implementing the methods developed in this article and R Markdown files with instructions are provided in a separately attached Codes.zip folder.
Acknowledgments
We thank the Editor, Dr. Michael Stein, an anonymous Associate Editor, and two anonymous referees for comments that led to significant improvements in the clarity and presentation of the paper.
Footnotes
Disclosure Statement
No potential conflict of interest was reported by the author(s).
References
- Aldape K, Brindle KM, Chesler L, Chopra R, Gajjar A, and Gilbert MR (2019), “Challenges to Curing Primary Brain Tumours,” Nature Reviews Clinical Oncology, 16, 509–520.
- Alexander BM, Trippa L, Gaffey S, Arrillaga-Romany IC, Lee EQ, Rinne ML, et al. (2019), “Individualized Screening Trial of Innovative Glioblastoma Therapy (INSIGhT): A Bayesian Adaptive Platform Trial to Develop Precision Medicines for Patients with Glioblastoma,” JCO Precision Oncology, 3, 1–13.
- Ascolani F, Lijoi A, Rebaudo G, and Zanella G (2022), “Clustering Consistency with Dirichlet Process Mixtures,” Biometrika (to appear).
- Au S, and Beck J (2003), “Important Sampling in High Dimensions,” Structural Safety, 25, 139–163.
- Berry SM, Carlin BP, Lee JJ, and Müller P (2010), Bayesian Adaptive Methods for Clinical Trials, Boca Raton, FL: CRC Press.
- Breiman L (2001), “Random Forests,” Machine Learning, 45, 5–32.
- Burcu M, Dreyer NA, Franklin JM, Blum MD, Critchlow CW, Perfetto EM, and Zhou W (2020), “Real-World Evidence to Support Regulatory Decision-Making for Medicines: Considerations for External Control Arms,” Pharmacoepidemiology and Drug Safety, 29, 1228–1235.
- Camerlenghi F, Dunson DB, Lijoi A, Prünster I, and Rodríguez A (2019), “Latent Nested Nonparametric Priors” (with discussion), Bayesian Analysis, 14, 1303–1356.
- Chandra NK, Canale A, and Dunson DB (2023), “Escaping the Curse of Dimensionality in Bayesian Model-based Clustering,” Journal of Machine Learning Research, 24, 1–42.
- Chen M-H, and Ibrahim JG (2000), “Power Prior Distributions for Regression Models,” Statistical Science, 15, 46–60.
- Chen W-C, Wang C, Li H, Lu N, Tiwari R, Xu Y, and Yue LQ (2020), “Propensity Score-Integrated Composite Likelihood Approach for Augmenting the Control Arm of a Randomized Controlled Trial by Incorporating Real-World Data,” Journal of Biopharmaceutical Statistics, 30, 508–520.
- Chipman HA, George EI, and McCulloch RE (2010), “BART: Bayesian Additive Regression Trees,” Annals of Applied Statistics, 4, 266–298.
- Choi J, Dekkers OM, and le Cessie S (2019), “A Comparison of Different Methods to Handle Missing Data in the Context of Propensity Score Analysis,” European Journal of Epidemiology, 34, 23–36.
- Cox DR (1972), “Regression Models and Life-Tables,” Journal of the Royal Statistical Society, Series B, 34, 187–220.
- Dahl DB (2006), “Model-based Clustering for Expression Data via a Dirichlet Process Mixture Model,” pp. 201–218, Cambridge: Cambridge University Press.
- Davi R, Mahendraratnam N, Chatterjee A, Jill Dawson C, and Sherman R (2020), “Informing Single-Arm Clinical Trials with External Controls,” Nature Reviews Drug Discovery, 19, 821–822.
- Denti F, Camerlenghi F, Guindani M, and Mira A (2021), “A Common Atoms Model for the Bayesian Nonparametric Analysis of Nested Data,” Journal of the American Statistical Association, 118, 405–416, DOI: 10.1080/01621459.2021.1933499.
- FDA (2020), “Rare Diseases at FDA,” available at https://www.fda.gov/patients/rare-diseases-fda, accessed December 7, 2021.
- FDA (2021), “Adjusting for Covariates in Randomized Clinical Trials for Drugs and Biological Products, Guidance for Industry,” available at https://www.fda.gov/regulatory-information/search-fda-guidance-documents/adjusting-covariates-randomized-clinical-trials-drugs-and-biological-products.
- Ferguson TS (1973), “A Bayesian Analysis of Some Nonparametric Problems,” Annals of Statistics, 1, 209–230.
- Fisher JP, and Adamson DC (2021), “Current FDA-approved Therapies for High-Grade Malignant Gliomas,” Biomedicines, 9, 324.
- Franklin JM, Glynn RJ, Martin D, and Schneeweiss S (2019), “Evaluating the Use of Nonrandomized Real-World Data Analyses for Regulatory Decision Making,” Clinical Pharmacology & Therapeutics, 105, 867–877.
- Ghosal S, and van der Vaart A (2017), Fundamentals of Nonparametric Bayesian Inference, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge: Cambridge University Press.
- Ghosal S, Ghosh JK, and Ramamoorthi RV (1999), “Posterior Consistency of Dirichlet Mixtures in Density Estimation,” Annals of Statistics, 27, 143–158.
- Green PJ (1995), “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination,” Biometrika, 82, 711–732.
- Grossman SA, and Ellsworth SG (2016), “Published Glioblastoma Clinical Trials from 1980 to 2013: Lessons from the Past and for the Future,” Journal of Clinical Oncology, 34, e13522–e13522.
- Grossman SA, Schreck KC, Ballman K, and Alexander B (2017), “Point/counterpoint: Randomized versus Single-Arm Phase II Clinical Trials for Patients with Newly Diagnosed Glioblastoma,” Neuro-Oncology, 19, 469–474.
- Haase S, Garcia-Fabiani MB, Carney S, Altshuler D, Núñez FJ, et al. (2018), “Mutant ATRX: Uncovering a New Therapeutic Target for Glioma,” Expert Opinion on Therapeutic Targets, 22, 599–613.
- Hansen BB, and Klopfer SO (2006), “Optimal Full Matching and Related Designs via Network Flows,” Journal of Computational and Graphical Statistics, 15, 609–627.
- Hasegawa T, Claggett B, Tian L, Solomon SD, Pfeffer MA, and Wei L-J (2017), “The Myth of Making Inferences for an Overall Treatment Efficacy with Data from Multiple Comparative Studies via Meta-Analysis,” Statistics in Biosciences, 9, 284–297.
- Hobbs BP, Carlin BP, Mandrekar SJ, and Sargent DJ (2011), “Hierarchical Commensurate and Power Prior Models for Adaptive Incorporation of Historical Information in Clinical Trials,” Biometrics, 67, 1047–1056.
- Jiang L, Nie L, and Yuan Y (2023), “Elastic Priors to Dynamically Borrow Information from Historical Data in Clinical Trials,” Biometrics, 79, 49–60.
- Johnson VE (2007), “Bayesian Model Assessment Using Pivotal Quantities,” Bayesian Analysis, 2, 719–733.
- King G, and Nielsen R (2019), “Why Propensity Scores Should Not Be Used for Matching,” Political Analysis, 27, 435–454.
- Li X, and Song Y (2020), “Target Population Statistical Inference with Data Integration Across Multiple Sources: An Approach to Mitigate Information Shortage in Rare Disease Clinical Trials,” Statistics in Biopharmaceutical Research, 12, 322–333.
- Linero AR, and Daniels MJ (2018), “Bayesian Approaches for Missing not at Random Outcome Data: The Role of Identifying Restrictions,” Statistical Science, 33, 198–213.
- Lu N, Wang C, Chen W-C, Li H, Song C, Tiwari R, Xu Y, and Yue LQ (2022), “Leverage Multiple Real-World Data Sources in Single-Arm Medical Device Clinical Studies,” Journal of Biopharmaceutical Statistics, 32, 107–123.
- Mandel JJ, Yust-Katz S, Patel AJ, Cachia D, Liu D, Park M, et al. (2017), “Inability of Positive Phase II Clinical Trials of Investigational Treatments to Subsequently Predict Positive Phase III Clinical Trials in Glioblastoma,” Neuro-Oncology, 20, 113–122.
- Meloun M, and Militký J (2011), “The Exploratory and Confirmatory Analysis of Univariate Data,” in Statistical Data Analysis, pp. 25–71, New Delhi: Woodhead Publishing India.
- Miller JW, and Harrison MT (2013), “A Simple Example of Dirichlet Process Mixture Inconsistency for the Number of Components,” in Advances in Neural Information Processing Systems (Vol. 26).
- Müller P, Quintana F, and Rosner GL (2011), “A Product Partition Model with Regression on Covariates,” Journal of Computational and Graphical Statistics, 20, 260–278.
- Müller P, Chandra NK, and Sarkar A (2023), “Bayesian Approaches to Include Real-World Data in Clinical Studies,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 381, 20220158.
- Nam JY, and de Groot JF (2017), “Treatment of Glioblastoma,” Journal of Oncology Practice, 13, 629–638.
- Nichol A, Bailey M, and Cooper D (2010), “Challenging Issues in Randomised Controlled Trials,” Injury, 41, S20–S23.
- Ostrom QT, Gittleman H, Liao P, Vecchione-Koval T, Wolinsky Y, Kruchko C, and Barnholtz-Sloan JS (2016), “CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2009–2013,” Neuro-Oncology, 18, v1–v75.
- Page GL, Quintana FA, and Müller P (2022), “Clustering and Prediction with Variable Dimension Covariates,” Journal of Computational and Graphical Statistics, 31, 466–476.
- Patel B, and Kim AH (2020), “Laser Interstitial Thermal Therapy,” Missouri Medicine, 117, 50–55.
- Peto R, and Peto J (1972), “Asymptotically Efficient Rank Invariant Test Procedures,” Journal of the Royal Statistical Society, Series A, 135, 185–207.
- Prevost TC, Abrams KR, and Jones DR (2000), “Hierarchical Models in Generalized Synthesis of Evidence: An Example based on Studies of Breast Cancer Screening,” Statistics in Medicine, 19, 3359–3376.
- Rodríguez A, Dunson DB, and Gelfand AE (2008), “The Nested Dirichlet Process,” Journal of the American Statistical Association, 103, 1131–1154.
- Rosenbaum PR, and Rubin DB (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
- Schmidli H, Häring DA, Thomas M, Cassidy A, Weber S, and Bretz F (2020), “Beyond Randomized Clinical Trials: Use of External Controls,” Clinical Pharmacology & Therapeutics, 107, 806–816.
- Sethuraman J (1994), “A Constructive Definition of Dirichlet Priors,” Statistica Sinica, 4, 639–650.
- Skare O, Bølviken E, and Holden L (2003), “Improved Sampling-Importance Resampling and Reduced Bias Importance Sampling,” Scandinavian Journal of Statistics, 30, 719–737.
- Stuart EA (2010), “Matching Methods for Causal Inference: A Review and a Look Forward,” Statistical Science, 25, 1–21.
- Sutton AJ, and Abrams KR (2001), “Bayesian Methods in Meta-Analysis and Evidence Synthesis,” Statistical Methods in Medical Research, 10, 277–303.
- Teh YW, Jordan MI, Beal MJ, and Blei DM (2006), “Hierarchical Dirichlet Processes,” Journal of the American Statistical Association, 101, 1566–1581.
- Vanderbeek AM, Rahman R, Fell G, Ventz S, Chen T, Redd R, Parmigiani G, Cloughesy TF, Wen PY, Trippa L, and Alexander BM (2018), “The Clinical Trials Landscape for Glioblastoma: Is It Adequate to Develop New Treatments?,” Neuro-Oncology, 20, 1034–1043.
- Vansteelandt S, and Daniel R (2014), “On Regression Adjustment for the Propensity Score,” Statistics in Medicine, 33, 4053–4072.
- Wang C, and Rosner GL (2019), “A Bayesian Nonparametric Causal Inference Model for Synthesizing Randomized Clinical Trial and Real-World Evidence,” Statistics in Medicine, 38, 2573–2588.
- Wang C, Li H, Chen W-C, Lu N, Tiwari R, Xu Y, and Yue LQ (2019), “Propensity Score-Integrated Power Prior Approach for Incorporating Real-World Evidence in Single-Arm Clinical Studies,” Journal of Biopharmaceutical Statistics, 29, 731–748.
- Wang C, Lu N, Chen W-C, Li H, Tiwari R, Xu Y, and Yue LQ (2020), “Propensity Score-Integrated Composite Likelihood Approach for Incorporating Real-World Evidence in Single-Arm Clinical Studies,” Journal of Biopharmaceutical Statistics, 30, 495–507.
- Zhao Z (2004), “Using Matching to Estimate Treatment Effects: Data Requirements, Matching Metrics, and Monte Carlo Evidence,” The Review of Economics and Statistics, 86, 91–107.