Author manuscript; available in PMC: 2026 Mar 7.
Published in final edited form as: J Am Stat Assoc. 2023 Jul 26;118(544):2301–2314. doi: 10.1080/01621459.2023.2231581

Bayesian Nonparametric Common Atoms Regression for Generating Synthetic Controls in Clinical Trials

Noirrit Kiran Chandra a, Abhra Sarkar b, John F de Groot c, Ying Yuan d, Peter Müller b,e
PMCID: PMC12965752  NIHMSID: NIHMS2151389  PMID: 41799294

Abstract

The availability of electronic health records (EHR) has opened opportunities to supplement increasingly expensive and difficult-to-conduct randomized controlled trials (RCT) with evidence from readily available real world data. In this article, we use EHR data to construct synthetic control arms for treatment-only single-arm trials. We propose a novel nonparametric Bayesian common atoms mixture model that allows us to find equivalent population strata in the EHR and the treatment arm and then resample the EHR data to create equivalent patient populations under both the single-arm trial and the resampled EHR. Resampling is implemented via a density-free importance sampling scheme. Using the synthetic control arm, inference for the treatment effect can then be carried out using any method available for RCTs. Alternatively, the proposed nonparametric Bayesian model allows straightforward model-based inference. In simulation experiments, the proposed method exhibits higher power than alternative methods in detecting treatment effects, specifically for nonlinear response functions. We apply the method to supplement single-arm treatment-only glioblastoma studies with a synthetic control arm based on historical trials. Supplementary materials for this article are available online.

Keywords: Common atoms mixture, Glioblastoma, Importance sampling, Mixtures, Real world data, Single-arm trials

1. Introduction

We introduce a novel Bayesian nonparametric regression model to construct synthetic control arms from external real world data (RWD) to supplement single arm treatment-only trials. The use of common atoms across multiple random probability measures is a critical feature of the proposed construction. Models with similar features have been used before in the literature, including Denti et al. (2021), Camerlenghi et al. (2019), Rodríguez, Dunson, and Gelfand (2008), and Teh et al. (2006).

Randomized controlled trials (RCT) are the gold standard in evidence-based evaluation of new treatments. RCTs are, however, increasingly associated with bottlenecks involving volunteer recruitment, patient truancy and adverse events (Nichol, Bailey, and Cooper 2010) and hence are often very time consuming, expensive and laborious. This is of particular concern for rare diseases, such as glioblastoma (GBM). With digitization of health records and other advances in medical informatics, new data sources are becoming available that can supplement RCTs. For example, relevant information on a control treatment is often available from completed RCTs, electronic health record data, insurance claims data or patient registries from hospitals (Franklin et al. 2019). Such external data, also referred to as RWD, can augment or substitute the control group in the target clinical trial (Davi et al. 2020). This has led researchers to consider the creation of synthetic control arms from RWD (see Schmidli et al. 2020 for a review). However, the heterogeneity of RWD prohibits the direct use of patient level data as a control arm, lest differences with the actual treatment population with respect to patient profiles bias inference on treatment effects (Burcu et al. 2020). Many existing methods adjust for the lack of randomization in treatment assignments by correcting the bias in the response model and hence can be sensitive to the specification of the treatment assignment as well as the response model as we discuss below. In this article, we take a fundamentally different approach by resampling the RWD to construct a cohort equivalent to the treatment arm in terms of their covariate profiles which can then serve as the (synthetic) control arm.

There is a fast growing literature on the problem of incorporating RWD in clinical trials. Traditional meta-analytic approaches aim to combine information across studies to construct comparisons of treatments (Sutton and Abrams 2001). Power prior (Prevost, Abrams, and Jones 2000; Chen and Ibrahim 2000), commensurate prior (Hobbs et al. 2011) and elastic prior (Jiang, Nie, and Yuan 2023) constructions try to incorporate information from historical data by way of informative prior models. However, these approaches may be inadequate when the RWD population is considerably more heterogeneous than the experimental arm; see Müller, Chandra, and Sarkar (2023) for a review.

Many methods to incorporate RWD in trial design and data analysis are based on propensity scores (PSs), defined as the conditional probabilities of treatment assignment given covariates. In the context of incorporating external data, investigators often use PSs for a patient being selected into the current trial versus the external data; in the case of supplementing a single-arm treatment-only trial, the PSs are identical to treatment assignments. Rosenbaum and Rubin (1983) showed that an unbiased estimate of the average treatment effect can be obtained by PS adjustments. Most PS-based methods can be broadly classified as based on matching, stratification, weighting, or regression. Matching is used to achieve covariate balance across different arms. However, matching PSs does not generally imply matching covariates (King and Nielsen 2019). Stratification splits the data into strata with respect to PSs and calculates an average treatment effect as a weighted average of within-stratum estimates (Wang et al. 2019; Chen et al. 2020; Lu et al. 2022). PS-stratification may be sensitive to the definition of the strata, and weight-based estimators may be sensitive to misspecification of the PS model (Zhao 2004). Regression adjustments, which use the PS as a regressor for the outcome, address these issues (Rosenbaum and Rubin 1983), but the estimates may again be biased if the regression model is misspecified (Vansteelandt and Daniel 2014). Bayesian nonparametric models that avoid a particular parametric family or structure, such as linearity, of the regression relationship have thus also been proposed (Wang and Rosner 2019). Nevertheless, consolidated unidimensional PSs can be inadequate in matching multivariate covariates from multiple studies (Stuart 2010; King and Nielsen 2019). Additionally, these methods often fail to use all available data efficiently, dropping unmatched records. Finally, some other methods (Hasegawa et al. 2017; Li and Song 2020), although not specifically designed to create synthetic controls, also integrate multiple studies using the covariate distributions.

In this article, we develop an alternative approach based on Bayesian nonparametric (BNP) mixture models. Mixture models imply a random partition of experimental units linked to different atoms in the mixture (Dahl 2006). We exploit this property to propose a BNP common atoms mixture (CAM) model to introduce matched clusters of patients in a treatment-only trial dataset and a (typically much larger) RWD. We show how such matched clusters allow a density-free importance resampling scheme to generate a subpopulation of the RWD such that the distribution of covariates in the subpopulation can be considered equivalent to that in the single-arm trial. That is, the patients in a matching RWD cluster can be considered digital clones of patients in a matching cluster in the single-arm trial.

The proposed CAM model allows, among other things, the following two alternatives for inference on treatment effects. Having established equivalent patient populations, inference can in principle proceed as if treatment had been assigned at random, using inference for RCTs. Alternatively, we propose model-based inference using an extension of the CAM model with a sampling model for the outcome. While both alternatives are based on the same underlying CAM model, we prefer the model-based inference on treatment effect as a more explicit and principled approach.

The proposed CAM model builds on related BNP models in the literature, including the hierarchical Dirichlet process (DP) (Teh et al. 2006) which allows for information sharing across multiple groups through common atoms, the nested DP (Rodríguez, Dunson, and Gelfand 2008) which can identify distributional clusters, and Camerlenghi et al. (2019) who proposed a latent mixture of shared and idiosyncratic processes across the sub-models. Denti et al. (2021) proposed a CAM model for the analysis of nested datasets where the distributions of the units differ only over a small fraction of the observations sampled from each unit. In contrast to these constructions, the CAM model proposed here introduces more structure as needed in our application by setting up two nonparametric Bayesian mixture models with shared atoms and constraints on the implied clusters.

The rest of this article is organized as follows. Section 2 describes the glioblastoma study that motivated this work. Section 3.1 introduces the proposed common atoms mixture model on the covariates and shows how it can handle variable-dimensional covariates of different data types; Section 3.2 introduces a novel density-free importance resampling scheme to achieve equivalent populations; and Section 3.3 discusses the general common atoms regression model, a flexible mixture of lognormals for censored survival outcomes, and an easy-to-use graphical tool for model validation. In Section 4, we discuss two alternative strategies for inference on treatment effects. Section 5 outlines posterior computation. Section 6 presents simulation studies. Section 7 shows results for the motivating GBM data. Section 8 concludes with final remarks. In Table 1, we list the many acronyms used in the article for easy reference.

Table 1.

List of acronyms.

AUC: Area under the receiver operating characteristic curve
BART: Bayesian additive regression tree
BNP: Bayesian nonparametric
CAM: Common atoms mixture
CA-PPMx: Common atoms PPMx
DP: Dirichlet process
GBM: Glioblastoma
IS: Importance sampling
PPMx: Product partition model with regression on covariates
PS: Propensity score
RCT: Randomized controlled trial
RWD: Real-world data

2. Motivating Application in Glioblastoma

Our motivating application arises from a GBM data science project at MD Anderson Cancer Center. GBM is a devastating disease with an average life expectancy of less than 12 months in the general population (Ostrom et al. 2016). Despite decades of intensive clinical research, progress in developing an effective treatment for GBM lags behind that of other cancers (Aldape et al. 2019). In the last 30 years, only two drugs (carmustine wafers and temozolomide) have been approved by the Food and Drug Administration (FDA) for patients with newly diagnosed GBM (Fisher and Adamson 2021). These drugs extend median survival by less than three months and neither offers a potential for cure. One major cause of the high failure rate of drug development for GBM is suboptimal design of phase II trials, in particular, the lack of a control arm in many studies (Grossman et al. 2017). A review of phase I/II GBM trials from 1980 to 2013 found that only 20 (5%) were randomized compared to 365 (95%) single-arm trials (Grossman and Ellsworth 2016). Reasons for the dominance of single-arm trials include the small number of GBM patients available for clinical trials and investigators' desire to speed up drug development and reduce trial costs. GBM is a rare disease by the definition of the Orphan Drug Act (FDA 2020). Unfortunately, the high heterogeneity of GBM patients makes single-arm trials highly susceptible to bias, contributing to the fact that almost all phase II trials showing promising treatment effects failed in phase III RCTs (Mandel et al. 2017). The objective of the GBM data science project is to address this pressing issue by leveraging historical data collected at the MD Anderson Cancer Center. The overarching goal is to develop a platform for future single-arm clinical trials in GBM, with synthetic controls constructed from the historical database to enhance the evaluation and screening of new drugs. Working toward this goal, we describe here a method to create synthetic controls, as the engine of the platform, for future trials.

We work with a database that comprises records from 339 highly clinically and molecularly annotated GBM patients treated at MD Anderson over more than 10 years. Once the system is set, the database is expected to be continuously updated with new patient data collected at MD Anderson Cancer Center and potentially also be combined with the data from other institutions.

After discarding variables with minimal variability across patients and relying on clinical judgment, we identified 11 clinically important categorical covariates. These covariates are commonly considered as prognostic factors in GBM treatments (Nam and de Groot 2017; Alexander et al. 2019) and are briefly described in Table 2.

Table 2.

Description of the covariates in the GBM data.

Covariate Description
Age Dichotomized at 55 years
KPS Karnofsky performance score, categorized into three classes: “≤ 60”, “(60,80]” and “> 80”
RT Dose Radiation therapy dose: dichotomized at 50 Gray
SOC Received standard-of-care (concurrent radiation therapy and temozolomide): Yes/No
CT Participation in a therapeutic trial: Yes/No
MGMT Status of MGMT (O6-methylguanine-DNA methyltransferase) gene: methylated (M), unmethylated (UM) or uninterpretable (UI)
ATRX Loss of the ATRX chromatin remodeler gene: Yes/No
Gender Gender
EOR Extent of tumor resection: “total”, “subtotal” or “laser interstitial thermal therapy” (LITT, Patel and Kim 2020)
Histologic grade Grade of astrocytoma: IV (GBM) (most cases), or I–III (low-grade or anaplastic) (few)
Surgery reason “therapeutic” or “other” (relapse)

Figure 1 shows the categorical covariates in the historical database and a future treatment-only study which we elaborate in Section 7. Figure S.1 in the supplementary materials highlights the lack of randomization in the two populations.

Figure 1.


Glioblastoma dataset of 11 baseline categorical covariates with missing entries in the two treatment arms. The left block shows the historical patients. The (smaller) right block shows a hypothetical future trial.

3. Common Atoms Mixture Model

We first introduce a model for matching patients with respect to their covariate profiles across different treatment arms and then an extension of the model to also include outcomes. Later we will introduce two alternative methods for inference on treatment effects that build on this model.

3.1. Common Atoms Mixture Model on the Covariates

Suppose we have $S$ datasets $\{(X_{s,i}, Y_{s,i})\}$, $s=1,\dots,S$, comprising $p$-dimensional covariate vectors $X_{s,i}=(X_{s,i,1},\dots,X_{s,i,p})^{T}$ and corresponding responses $Y_{s,i}$ associated with patients $i=1,\dots,n_s$. In this article, we assume the responses to be univariate. Let $s=1$ refer to the arm for the (new) experimental therapy, and $s=2,\dots,S$ denote the RWD datasets. Focusing on the motivating GBM application, we elaborate the model for $S=2$ with a single RWD set. When we have multiple historical datasets, that is, when $S>2$, we simply merge them and consider the merged dataset to be a single RWD with increased heterogeneity, as illustrated in Section S.9.6 of the supplementary materials. For a valid evaluation of treatment effects, it is then important to verify equivalent patient populations, that is, matching the distributions of $X_{s,i}$ under $s=1$ versus $s=2$, or to otherwise adjust for any detected differences (Burcu et al. 2020). As the RWD population can be from a variety of sources, such data are typically more heterogeneous than the patient population in the ongoing trial. We develop a novel BNP CAM model with this specific feature to model the two distributions. The proposed CAM model gives rise to a random partition of similar $X_{1,i}$ and a matching partition of the $X_{2,i}$. Clusters under the latter partition can be considered digital clones of the matching clusters under the former partition.

We first construct the model for covariates $X_{2,i}$ in the RWD. Let $\tilde\zeta=\{\tilde\zeta_j\}_{j=1}^{\infty}$ and $\pi_2=\{\pi_{2,j}\}_{j=1}^{\infty}$ denote cluster-specific parameters and weights, respectively. We let

$$X_{2,i}\mid\tilde\zeta,\pi_2 \overset{\text{iid}}{\sim} \sum_{j=1}^{\infty}\pi_{2,j}\,q(X_{2,i}\mid\tilde\zeta_j) \equiv F_2(X_{2,i}\mid\pi_2,\tilde\zeta),\qquad \tilde\zeta_j\mid\xi \overset{\text{iid}}{\sim} G_0(\tilde\zeta_j\mid\xi),\qquad \pi_2\sim\mathrm{GEM}(\alpha_2). \tag{1}$$

Here $q(\cdot\mid\tilde\zeta_j)$ is a suitably chosen kernel with parameter $\tilde\zeta_j$, $G_0(\cdot\mid\xi)$ is a prior distribution for the $\tilde\zeta_j$'s, and $\mathrm{GEM}(\alpha)$ is a stick-breaking prior on the mixture weights corresponding to a DP with mass parameter $\alpha>0$ (Sethuraman 1994). Let $G=\sum_{j=1}^{\infty}\pi_{2,j}\,\delta_{\tilde\zeta_j}(\cdot)$ denote a discrete probability measure with atoms at the $\tilde\zeta_j$'s. An equivalent hierarchical model representation of (1) is

$$X_{2,i}\mid\zeta_i \overset{\text{iid}}{\sim} q(X_{2,i}\mid\zeta_i),\qquad \zeta_i\mid G \overset{\text{iid}}{\sim} G,\qquad G\mid\alpha_2,\xi \sim \mathrm{DP}\{\alpha_2, G_0(\cdot\mid\xi)\}, \tag{2}$$

where $\mathrm{DP}(\alpha,G_0)$ is a DP with base measure $G_0$ and concentration parameter $\alpha$ (Ferguson 1973). The discrete nature of the DP random measure $G$ gives rise to possible ties among the $\zeta_i$'s, which define the desired clusters. For later reference we define notation for these ties and clusters. Let $\zeta=\{\zeta_j,\ j=1,\dots,k_{n_2}\}$ denote the distinct values in $\{\zeta_i;\ i=1,\dots,n_2\}$, and let $c_{2,i}=j$ if $\zeta_i=\zeta_j$ denote cluster membership indicators defining clusters $C_j=\{i:\ \zeta_i=\zeta_j\}$. We assume the distribution of $X_{1,i}$ to be a mixture with the same kernel $q$ and the same atoms $\zeta$,

$$X_{1,i}\mid\zeta \overset{\text{iid}}{\sim} \sum_{j=1}^{k_{n_2}}\pi_{1,j}\,q(X_{1,i}\mid\zeta_j) \equiv F_1(X_{1,i}\mid\pi_1,\zeta),\qquad \pi_1\sim\mathrm{Dir}\Bigl(\frac{\alpha_1}{k_{n_2}},\dots,\frac{\alpha_1}{k_{n_2}}\Bigr), \tag{3}$$

where $\pi_1=(\pi_{1,1},\dots,\pi_{1,k_{n_2}})$, $\mathrm{Dir}(a_1,\dots,a_r)$ indicates an $r$-dimensional Dirichlet distribution with parameters $(a_1,\dots,a_r)$, and $\alpha_1>0$ is a concentration parameter. Note that model (3) is defined conditionally on (1) and $\zeta$ such that $F_1$ and $F_2$ share the same set of atoms. Importantly, the construction avoids the imputation of clusters (strata) with only $X_{1,i}$'s: there is always a corresponding (non-empty) cluster of $X_{2,i}$'s from the RWD. This is important for the upcoming constructions. The motivation here is that, owing to the bigger size of the RWD compared to the trial arm, $X_2$ can be expected to exhibit greater heterogeneity than $X_1$ (see, e.g., the right panel in Figure 2).

Figure 2.


An illustration of the CAM model: In the generative model, there are a total of four atoms $\tilde\zeta_{1:4}$ shared between the RWD and the treatment arm (left panel). Despite having positive weight, the atom $\tilde\zeta_3$ is not associated with any sample from the RWD (right panel), and hence the density of the treatment arm is also only allowed to be supported on the remaining non-empty clusters $\{\tilde\zeta_1,\tilde\zeta_2,\tilde\zeta_4\}$ of the RWD. The atom $\tilde\zeta_2$ is linked with only the RWD (right panel). A cluster for the treatment arm alone is, however, not permissible.

In summary, we define $F_1(X\mid\pi_1,\zeta)=\sum_{j=1}^{k_{n_2}}\pi_{1,j}\,q(X\mid\zeta_j)$ and $F_2(X\mid\pi_2,\tilde\zeta)=\sum_{j=1}^{\infty}\pi_{2,j}\,q(X\mid\tilde\zeta_j)$, with the prior on atoms and weights as discussed. Figure 2 shows a stylized representation of the generative process of the proposed CAM model. Notice that here atom $\tilde\zeta_3$ is not linked with any $X_2$ observation and hence $k_{n_2}=3$. Accordingly, $F_1(\cdot\mid\pi_1,\zeta)$ is a mixture of three components. Finally, no observation from $X_1$ is linked to $\tilde\zeta_2$. The $X_{2,i}$'s linked to $\tilde\zeta_1$ and $\tilde\zeta_4$ can be regarded as digital clones of the $X_{1,i}$'s linked to the same atoms.

The described CAM model is different from existing BNP mixture models. In (1)–(3), the atoms linked to $X_1$ are always a subset of the atoms linked to $X_2$, which is not naturally the case for the hierarchical DP model (Teh et al. 2006). Also, unlike the nested DP (Rodríguez, Dunson, and Gelfand 2008) and the common atoms nested DP (Denti et al. 2021) models, there is no notion of clustering distributions. That is, $p\{F_1(\cdot\mid\pi_1,\zeta)=F_2(\cdot\mid\pi_2,\tilde\zeta)\}=0$ a priori. Instead, the intention here is to cluster similar covariate values across the datasets.

Regarding the concentration parameters $\alpha_s$, we assume $\log\alpha_s\sim N(\mu_\alpha,\sigma_\alpha^2)$ for $s=1,2$. Ascolani et al. (2022) showed that a hyper-prior on the concentration parameters can solve the problem of inconsistency of DP mixtures (Miller and Harrison 2013).
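To make the construction in (1)–(3) concrete, the following minimal Python sketch simulates from the generative process with a single binary covariate: GEM weights for the RWD arm via truncated stick-breaking, and trial-arm weights drawn from a finite Dirichlet supported only on the atoms occupied by the RWD. The function names, truncation level, and Beta base measure are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gem_weights(alpha, trunc=50):
    """Truncated stick-breaking approximation of GEM(alpha)."""
    v = rng.beta(1.0, alpha, size=trunc)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    w[-1] = 1.0 - w[:-1].sum()  # close the truncated stick so weights sum to one
    return w

# Shared atoms: success probabilities of one binary covariate, drawn from G0 = Beta(1,1)
trunc, n2, n1 = 50, 500, 100
atoms = rng.beta(1.0, 1.0, size=trunc)

# RWD arm (s = 2): c_{2,i} ~ pi_2, then X_{2,i} ~ q(. | atom), here Bernoulli
pi2 = gem_weights(alpha=2.0, trunc=trunc)
c2 = rng.choice(trunc, size=n2, p=pi2)
x2 = rng.binomial(1, atoms[c2])

# Trial arm (s = 1): supported only on the k_{n2} atoms occupied by the RWD
occupied = np.unique(c2)                         # indices of non-empty RWD clusters
k_n2 = occupied.size
pi1 = rng.dirichlet(np.full(k_n2, 1.0 / k_n2))   # Dir(alpha1/k, ..., alpha1/k) with alpha1 = 1
c1 = occupied[rng.choice(k_n2, size=n1, p=pi1)]
x1 = rng.binomial(1, atoms[c1])

# The defining CAM constraint: trial clusters are a subset of RWD clusters
assert set(np.unique(c1).tolist()) <= set(occupied.tolist())
```

The last assertion checks the key structural constraint of the model: no cluster with only trial-arm patients can ever arise.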

3.1.1. Handling Mixed Data Types and Missing Values

An appealing feature of the proposed CAM model over existing approaches is the easy use of covariates of different data-types and missing values. Covariates in RCTs often comprise different data-types including continuous, discrete and categorical variables. Missing values are also quite common. For example, in Figure 1, there are a large number of missing values for the ATRX gene which has only recently been identified as a therapeutic target for glioma (Haase et al. 2018) and was therefore not commonly recorded before.

Many existing methods for handling missing data rely on imputation (Choi, Dekkers, and le Cessie 2019), possibly at the expense of an additional layer of prediction errors. Alternatively, data records with missing variables may be dropped altogether, resulting in a reduced sample size.

Assuming missingness completely at random, the proposed CAM model avoids these issues by accommodating variable-dimensional covariates in a principled manner, considering a separate univariate kernel for each covariate. Note that a mixture with independent kernels can still accommodate marginal dependence between the covariates (Ghosal and van der Vaart 2017, sec. 7.2.2, pp. 175). Specifically, let $\mathcal{O}_{s,i}=\{\ell:\ X_{s,i,\ell}\ \text{is recorded}\}$ denote the set of observed covariates for patient $i$ in dataset $s$. We use independent kernels

$$q(X_{s,i}\mid\zeta_j)=\prod_{\ell\in\mathcal{O}_{s,i}} q_\ell(X_{s,i,\ell}\mid\zeta_{j,\ell}),\qquad G_0(\zeta_j\mid\xi)=\prod_{\ell=1}^{p} g_{0,\ell}(\zeta_{j,\ell}\mid\xi_\ell), \tag{4}$$

where $q_\ell(\cdot\mid\zeta_{j,\ell})$ is a univariate kernel corresponding to the $\ell$th covariate with parameters $\zeta_{j,\ell}$, and $g_{0,\ell}(\zeta_{j,\ell}\mid\xi_\ell)$ is a prior on $\zeta_{j,\ell}$ with hyper-parameters $\xi_\ell$. The likelihood function of $X_{s,i}$ is then computed on the basis of only the observed values. The kernel $q_\ell$ is chosen to accommodate the data-type of the $\ell$th covariate. The model allows co-clustering of an $X_{s,i}$ with some missing variables and another fully observed $X_{s,i'}$; see Section S.3 of the supplementary materials for additional details. Missingness patterns other than completely at random can be handled by introducing additional hierarchy in the model; see, for example, Linero and Daniels (2018) for a review.
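A small sketch of the observed-data likelihood in (4) for categorical covariates: missing entries simply drop out of the product, so patients with partially observed profiles still contribute. Encoding missingness as `NaN` and the helper name are our own illustrative choices.

```python
import numpy as np

def cluster_loglik(x_obs, theta):
    """Log-likelihood of one patient under cluster-specific categorical kernels,
    using only the observed covariates, as in (4).

    x_obs : 1-d array of covariate codes, np.nan where missing
    theta : list of per-covariate probability vectors (one per covariate)
    """
    ll = 0.0
    for l, x in enumerate(x_obs):
        if np.isnan(x):       # missing covariate: its factor drops out of the product
            continue
        ll += np.log(theta[l][int(x)])
    return ll

# Illustrative 3-covariate cluster; the second patient's second covariate is missing
theta_j = [np.array([0.7, 0.3]), np.array([0.2, 0.5, 0.3]), np.array([0.9, 0.1])]
full    = cluster_loglik(np.array([0.0, 2.0, 1.0]), theta_j)
partial = cluster_loglik(np.array([0.0, np.nan, 1.0]), theta_j)
assert partial > full         # dropping a factor can only increase the log-likelihood
```

Both patients can thus be scored against, and co-clustered under, the same atom $\zeta_j$ despite the missing entry.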

3.2. Density-Free Importance Resampling of RWD

Building on the fitted CAM for covariates, we propose an importance resampling method to create a subpopulation of X2 that can be considered to be equivalent to X1 (see below for a definition of equivalence that is being used here). Under the assumption of no unmeasured confounders, the X2,i’s in the sampled (or weighted) subpopulation can be assumed to follow the same distribution as X1,i, and be considered digital clones of the X1,i. With such equivalent populations, in principle, any desired method for randomized clinical trials can subsequently be used to carry out inference on treatment effects. Such focus on equivalent populations follows recent recommendations by the FDA (FDA 2021).

Recall that Fs denotes the mixture model for Xs,i,s=1,2, under (1) and (3), respectively. We define equivalent populations as a subset (possibly all) of X2 together with a set of weights such that expectation of any function of interest gX1,i under X1,i~F1 can be evaluated as a (weighted) Monte Carlo average using these X2 (and the weights). Here we assume that all stated expectations exist and that the order of taking expectations and limits can be switched.

Recall that $F_2(\cdot\mid\pi_2,\tilde\zeta)=\sum_{j=1}^{\infty}\pi_{2,j}\,q(\cdot\mid\tilde\zeta_j)$. Alternatively, the joint model of $(X_2,c_2)$ can be expressed as $F_2^H(X_{2,i},c_{2,i}\mid\pi_2,\tilde\zeta)=q(X_{2,i}\mid\tilde\zeta_{c_{2,i}})\,\pi_{2,c_{2,i}}$. For easier housekeeping, we assume $\zeta_j=\tilde\zeta_j$ for $j=1,\dots,k_{n_2}$, that is, the first $k_{n_2}$ atoms are linked with the $X_{2,i}$'s. Accordingly, we let $F_1(\cdot\mid\pi_1,\tilde\zeta)=\sum_{j=1}^{k_{n_2}}\pi_{1,j}\,q(\cdot\mid\tilde\zeta_j)$ using the same first $k_{n_2}$ atoms observed in the $X_2$ population. This is the exact construction of (1) and (3). For an equivalent population, we require weights $w_i$ attached to $(X_{2,i},c_{2,i})$ (using $w_i=0$ to drop samples) such that:

$$E_{F_1(\cdot\mid\pi_1,\tilde\zeta)}\{g(X_{1,i})\} = E_{F_2^H(\cdot\mid\pi_2,\tilde\zeta)}\{\hat g(X_2,c_2)\} \quad\text{with}\quad \hat g(X_2,c_2)=\sum_{i=1}^{n_2} w_i\, g(X_{2,i},c_{2,i}).$$

The weights $w_i$ are functions of $c_{2,i}$ and $\pi_{1,j}$, as follows. Define $n_{2,j}=|C_{2,j}|$, the cardinality of the earlier introduced clusters $C_{2,j}$. Then $\frac{1}{n_{2,j}}\sum_{i\in C_{2,j}} g(X_{2,i})$ is an unbiased estimator of $E_{q(\cdot\mid\tilde\zeta_j)}\{g(X)\}$ and

$$\hat g = \sum_{j}\pi_{1,j}\sum_{i\in C_{2,j}}\frac{1}{n_{2,j}}\,g(X_{2,i}) = \sum_{i=1}^{n_2}\frac{\pi_{1,c_{2,i}}}{n_{2,c_{2,i}}}\,g(X_{2,i}) \tag{5}$$

is an unbiased estimator of $E_{F_1(\cdot\mid\pi_1,\tilde\zeta)}\{g(X)\}$. We then recognize $\pi_{1,c_{2,i}}/n_{2,c_{2,i}}$ as the ideal weights. Since we only observe $X_s$ but not $\pi_1$ and $c_2$, we replace $\pi_{1,c_{2,i}}/n_{2,c_{2,i}}$ in $\hat g$ by a Monte Carlo average under posterior MCMC simulation, making the desired equality simulation-exact (i.e., exact in the limit as $n_1,n_2\to\infty$ and the number of MCMC simulations increases). Let $m=1,\dots,M$ index the posterior samples and use $\pi_{1,j}^{(m)}$, $n_{2,j}^{(m)}$, etc. to indicate parameter values in the $m$th sample. We use

$$\hat g=\sum_{i} w_i\, g(X_{2,i}),\qquad w_i\propto\sum_{m=1}^{M}\pi^{(m)}_{1,c^{(m)}_{2,i}}\Big/ n^{(m)}_{2,c^{(m)}_{2,i}}, \tag{6}$$

with $w_i$ being the importance sampling weight for $X_{2,i}$. The $X_{2,i}$'s can be resampled with these weights to obtain the desired subpopulation with distribution $F_1(x\mid\pi_1,\tilde\zeta)$ (Skare, Bølviken, and Holden 2003). This resampled subpopulation of $X_2$ can then be regarded as equivalent in distribution to $X_1$. Algorithm 1 summarizes the procedure.

To test the equivalence of the two populations, we use a Bayesian additive regression tree (BART, Chipman, George, and McCulloch 2010) in Step 5 of Algorithm 1. In extensive simulation studies in Section 6, we notice that an AUC (area under the receiver operating characteristic curve) less than 0.6 yields excellent empirical performance. Once equivalence is achieved, in principle any existing approach for inference on treatment effects can be used (see Section 4 and later).
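The equivalence check of Step 5 can be sketched as follows. As a numpy-only stand-in for BART, we use a plain logistic regression trained by gradient ascent together with a rank-based AUC; the classifier, its settings, and the helper names are illustrative assumptions rather than the authors' implementation, which uses BART.

```python
import numpy as np

rng = np.random.default_rng(1)

def auc_score(y, score):
    """Rank-based AUC: probability that a random case outranks a random control."""
    order = np.argsort(score)
    rank = np.empty(len(score))
    rank[order] = np.arange(1, len(score) + 1)
    n1, n0 = y.sum(), (1 - y).sum()
    return (rank[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def equivalence_auc(x1, x2, steps=500, lr=0.1):
    """Classifier-based check: try to separate the trial arm (label 1) from the
    resampled RWD (label 0); AUC near 0.5 indicates equivalent populations."""
    x = np.hstack([np.ones((len(x1) + len(x2), 1)),           # intercept column
                   np.vstack([x1, x2]).astype(float)])
    y = np.concatenate([np.ones(len(x1)), np.zeros(len(x2))])
    beta = np.zeros(x.shape[1])
    for _ in range(steps):                                    # plain gradient ascent
        p = 1.0 / (1.0 + np.exp(-x @ beta))
        beta += lr * x.T @ (y - p) / len(y)
    return auc_score(y, 1.0 / (1.0 + np.exp(-x @ beta)))

# Two samples from the same covariate distribution: AUC should hover near 0.5
x1 = rng.binomial(1, 0.4, size=(150, 5))
x2 = rng.binomial(1, 0.4, size=(150, 5))
auc = equivalence_auc(x1, x2)
```

With genuinely different populations, the same function would return an AUC well above the 0.6 threshold used in the text.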

Note that even if the RWD population is not a heterogeneous superset of the current trial, one can still fit the CAM model.

In case the RWD is not comparable, Step 5 of Algorithm 1 can discriminate the two populations and the AUC can quantify the degree of incongruence. In general, importance sampling schemes need the ratio of the target density (in our case, $F_1$) and the importance sampling density (in our case, $F_2$). For our problem, this would require high-dimensional density estimation. Even if the densities were known, importance sampling would be plagued by unbounded weights (Au and Beck 2003). Exploiting the common atoms structure, our proposed scheme avoids evaluation of the marginal multivariate densities. We therefore refer to it as a density-free importance resampling scheme, and for brevity often simply as an IS scheme. In the denominator of $w_i$, the use of $n_{2,j}$ (which by definition is $\geq 1$) avoids complications arising from unbounded weights. Conventional importance sampling schemes are asymptotically consistent; this is seen to hold in numerical experiments with our algorithm as well. Additional discussion of Algorithm 1 is in Section S.4 of the supplementary materials.

Algorithm 1:

Density-free importance resampling of RWD and validation

1 Input two datasets $X_1$ and $X_2$.
2 Fit the CAM model to the data using MCMC simulation. Let $\pi_1^{(m)}$ and $c_2^{(m)}$ be the $m$th MCMC sample of $\pi_1$ and $c_2$, respectively, and $n_{2,j}^{(m)}$ be the size of cluster $C_{2,j}$ in the $m$th MCMC iteration, for $m=1,\dots,M$.
3 Calculate importance sampling weights $w_i\propto\sum_{m=1}^{M}\pi^{(m)}_{1,c^{(m)}_{2,i}}\big/ n^{(m)}_{2,c^{(m)}_{2,i}}$, $i=1,\dots,n_2$.
4 Resample a subpopulation of size $n_1$ from $X_2$ with replacement, using the importance resampling weights $w_i$.
5 Test for equivalence of $X_1$ and the resampled subpopulation of $X_2$ using a supervised classification algorithm (e.g., BART as described in the text).
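Steps 3 and 4 of Algorithm 1 can be sketched as follows, assuming posterior draws of $\pi_1$ and $c_2$ are already available from Step 2; the toy draws and function names are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def importance_weights(pi1_draws, c2_draws):
    """Density-free IS weights of Step 3 of Algorithm 1.

    pi1_draws : list of M arrays, the m-th holding pi_1^{(m)} over that draw's clusters
    c2_draws  : (M, n2) int array of RWD cluster labels c_{2,i}^{(m)}
    Returns weights normalized to sum to one, w_i ∝ sum_m pi^{(m)}_{1,c} / n^{(m)}_{2,c}.
    """
    M, n2 = c2_draws.shape
    w = np.zeros(n2)
    for m in range(M):
        counts = np.bincount(c2_draws[m])            # cluster sizes n_{2,j}^{(m)}
        w += pi1_draws[m][c2_draws[m]] / counts[c2_draws[m]]
    return w / w.sum()

# Toy posterior: M = 2 draws, n2 = 6 RWD patients, 2 clusters
pi1_draws = [np.array([0.8, 0.2]), np.array([0.6, 0.4])]
c2_draws = np.array([[0, 0, 0, 1, 1, 1],
                     [0, 0, 1, 1, 1, 1]])
w = importance_weights(pi1_draws, c2_draws)
resample = rng.choice(6, size=6, replace=True, p=w)  # Step 4: with-replacement resampling
assert np.isclose(w.sum(), 1.0)
```

Patients sitting in clusters with large trial-arm weight $\pi_{1,j}$ but few RWD members receive large weights, exactly as (6) prescribes.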

3.3. Regression with CAM Model on Covariates

Note that up to here we only concerned ourselves with the covariates, without any reference to the outcomes Y. In preparation for one of the strategies in the upcoming discussion of treatment comparison (Section 4), we now augment the CAM model to include a sampling model for the outcomes. That is, we add a response model on top of the CAM model on covariates.

The extended model defines a regression of $Y_{s,i}$ on covariates $X_{s,i}$ by first grouping patients with similar covariate profiles into clusters and then adding a cluster-specific sampling model for the outcome $Y_{s,i}$. That is, the overall model specifies a regression of $Y_{s,i}$ on $X_{s,i}$ via a random partition. A major advantage of this approach is that it allows a variable-dimension covariate vector, a feature that is not straightforward to include in a regression otherwise. Similar product partition models with regression on covariates (PPMx; see also Section S.2 in the supplementary materials) were considered by Müller, Quintana, and Rosner (2011) and Page, Quintana, and Müller (2022), albeit without any notion of common atoms. We therefore refer to the model proposed below as the common atoms PPMx (CA-PPMx). Formally, we introduce cluster-specific parameters $\theta_s=\{\theta_{s,j};\ j=1,\dots,k_{n_2}\}$ and assume

$$Y_{s,i}\mid\theta_s,\ c_{s,i}=j \overset{\text{ind}}{\sim} h(Y_{s,i}\mid\theta_{s,j}), \tag{7}$$

for a suitable choice of $h$. For example, for an event-time response, $h$ could be a lognormal, exponential, or Weibull model. The response model (7) depends on the covariates indirectly via the $c_{s,i}$'s, that is, via the partition induced by the covariates. Within stratum $C_j=C_{1,j}\cup C_{2,j}$, the response model allows for a treatment comparison based on $(\theta_{1,j},\theta_{2,j})$, which can then be averaged with respect to the assumed covariate distribution to define an average treatment effect.

For the implementation in the motivating case study, we let $Y_{s,i}$ denote the log OS (overall survival) times and assume $h(Y_{s,i}\mid\theta_{s,j})$ to be a normal kernel with $\theta_{s,j}=(\mu_{s,j},\sigma^2_{s,j})$. Such mixtures are highly flexible (Ghosal, Ghosh, and Ramamoorthi 1999), making them an attractive choice for many applications. We complete the model with conjugate normal-inverse-gamma (NIG) priors on the $(\mu_{s,j},\sigma^2_{s,j})$'s. In summary, we have

$$Y_{s,i}\mid c_{s,i}=j,\ \theta_s \overset{\text{ind}}{\sim} N(\mu_{s,j},\sigma^2_{s,j}),\qquad \mu_{s,j}\mid\sigma^2_{s,j}\overset{\text{ind}}{\sim} N(\mu_0,\ \sigma^2_{s,j}/\kappa_0),\qquad \sigma^{-2}_{s,j}\overset{\text{iid}}{\sim}\mathrm{Ga}(a_0,b_0), \tag{8}$$

where $\mathrm{Ga}(a_0,b_0)$ is a gamma distribution with mean $a_0/b_0$. We add the hyper-priors $\mu_0\sim N(m_\mu,s_\mu^2)$ and $\log b_0\sim N(m_b,s_b^2)$ on the main location- and scale-controlling hyper-parameters $\mu_0$ and $b_0$ while fixing the precision hyper-parameters $\kappa_0$ and $a_0$. Choices of these hyper-parameters are discussed in Section S.7 of the supplementary materials. Finally, for a goodness-of-fit test under the proposed model, we use the approach of Johnson (2007) to build a graphical tool based on quantile plots. Such visual tools are often quite effective for detecting departures from model assumptions (Meloun and Militký 2011, chap. 2). See Section S.5 in the supplement for more details.
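Under the NIG prior in (8), the within-cluster posterior of $(\mu_{s,j},\sigma^2_{s,j})$ is available in closed form. The sketch below implements the standard conjugate update; the hyper-parameter values and data are illustrative placeholders (the paper's actual choices are in Section S.7 of the supplementary materials).

```python
import numpy as np

def nig_posterior(y, mu0=0.0, kappa0=0.1, a0=2.0, b0=1.0):
    """Standard conjugate normal-inverse-gamma update for one cluster's
    (mu, sigma^2) under model (8). Hyper-parameters are illustrative.
    Returns the posterior NIG parameters (mu_n, kappa_n, a_n, b_n)."""
    n, ybar = len(y), np.mean(y)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * ybar) / kappa_n           # precision-weighted mean
    a_n = a0 + n / 2.0
    b_n = (b0 + 0.5 * np.sum((y - ybar) ** 2)
           + 0.5 * kappa0 * n * (ybar - mu0) ** 2 / kappa_n)
    return mu_n, kappa_n, a_n, b_n

# Log overall-survival times (months) in one matched cluster; values are illustrative
y = np.log(np.array([10.0, 14.0, 9.0, 20.0, 12.0]))
mu_n, kappa_n, a_n, b_n = nig_posterior(y)
# With a weak prior (kappa0 small), mu_n sits much closer to ybar than to mu0
assert abs(mu_n - np.mean(y)) < abs(0.0 - np.mean(y))
```

In the Gibbs sampler, this update is applied per cluster at each iteration, after the partition has been refreshed.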

4. Inference on Treatment Effects

4.1. Two-Step Importance Sampling (IS) Approach

We already described the use of the weights πs,j in the CAM model to achieve equivalent patient populations. This allows a straightforward approach to treatment comparison. Using the adjusted (resampled) subpopulation of X2, one can proceed with inference on the treatment effect using any method relying on equivalent patient populations across the two arms. We refer to this approach as the “two-step IS” and use it in the simulation studies and applications in Sections 6 and 7, respectively. This approach does not make use of the outcome model of Section 3.3.

4.2. Model-Based Inference for Treatment Effects

Alternatively, we implement inference using the response model of Section 3.3, that is, the full CA-PPMx. We refer to this approach as "model-based inference". We assume that the desired inference on treatment effects takes the form of inference for some notion of difference $\delta(\cdot,\cdot)$ between the marginal distributions under the two treatment arms, $\Delta(\theta)=\delta\{f_1(\cdot\mid\theta_1,\pi_1),\ f_2(\cdot\mid\theta_2,\pi_2)\}$. However, since the covariate populations in the two treatment arms can be substantially different, a comparison of the marginal (with respect to the covariates) outcome models $f_1(Y_{1,i}\mid\theta_1,\pi_1)$ and $f_2(Y_{2,i}\mid\theta_2,\pi_2)$ can be biased. We need to appropriately adjust for the differences in the two populations. We do this by replacing $f_2$ as follows. Exploiting the common atoms structure of the proposed CA-PPMx, there is an operationally simple method to carry out this adjustment and infer treatment effects. Since within each cluster the covariate populations can be considered equivalent, the adjustment for the lack of randomization amounts to adjusting the corresponding cluster weights. We define

$$\tilde{f}_2(Y \mid \theta_2, \pi_1) = \sum_{j=1}^{k_{n_2}} \pi_{1,j}\, h(Y \mid \theta_{2,j}),$$

where the mixture components $h(Y \mid \theta_{2,j})$ of the response model in the RWD are weighted by $\pi_1$, that is, by the cluster weights associated with $X_1$ (rather than $\pi_2$). Thus, $\tilde{f}_2$ is the distribution of outcomes under control in the treatment population, or, in other words, the response of an average individual from the trial arm had they received the control therapy. With these notions, we define the population-adjusted treatment effect as

$$\tilde{\Delta}(\theta) = \delta\left\{f_1(\cdot \mid \theta_1, \pi_1),\, \tilde{f}_2(\cdot \mid \theta_2, \pi_1)\right\}. \tag{9}$$

For example, when $Y$ is a univariate response variable and $\delta(f_1, f_2) = E_{f_1}(Y) - E_{f_2}(Y)$, the effect $\tilde{\Delta}(\theta)$ simplifies to $\tilde{\Delta}(\theta) = \sum_{j=1}^{k_{n_2}} \pi_{1,j} \{E_{h(Y \mid \theta_{1,j})}(Y) - E_{h(Y \mid \theta_{2,j})}(Y)\}$, which further reduces to $\tilde{\Delta}(\theta) = \sum_{j=1}^{k_{n_2}} \pi_{1,j}(\mu_{1,j} - \mu_{2,j})$ when $\mu_{s,j} = E_{h(Y \mid \theta_{s,j})}\{T(Y)\}$ with $T(Y) = Y$.
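This reduction is a plain weighted sum; a quick numerical illustration with hypothetical cluster summaries (not the paper's data) may help:

```python
import numpy as np

# hypothetical posterior point estimates for three shared clusters
pi1 = np.array([0.5, 0.3, 0.2])     # trial-arm cluster weights pi_{1,j}
mu1 = np.array([10.0, 12.0, 15.0])  # cluster means E h(Y | theta_{1,j})
mu2 = np.array([9.0, 11.5, 13.0])   # cluster means E h(Y | theta_{2,j})

# population-adjusted treatment effect (9): per-cluster mean differences,
# weighted by the *treatment-arm* cluster proportions pi_{1,j}
delta_tilde = float(np.sum(pi1 * (mu1 - mu2)))
# 0.5*1.0 + 0.3*0.5 + 0.2*2.0 = 1.05
```

In the full Bayesian analysis this sum is evaluated at each MCMC draw, yielding a posterior distribution for $\tilde{\Delta}(\theta)$ rather than a single number.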

In general, each cluster of covariates in the CAM model can be interpreted as a homogeneous sub-population of patients. For the $j$th group, the average treatment effect is $\delta\{h(\cdot \mid \theta_{1,j}),\, h(\cdot \mid \theta_{2,j})\}$ and its proportion in the target population is $\pi_{1,j}$. The reported treatment effect (9) includes the adjustment by the sub-population proportions $\pi_{1,j}$. On a related note, the proposed model-based inference on treatment effects under the CA-PPMx model can be interpreted as a stochastic propensity score stratification approach. See Section S.6 in the supplementary materials for the details.

We prefer the Bayesian model-based approach because it avoids discarding unmatched RWD patient records from the analysis. The two-step IS can be useful to validate the results obtained by the model-based approach.

5. Posterior Computation

We develop an efficient Gibbs sampler for posterior inference in the proposed CAM model with a nonconjugate mixture of lognormals for survival outcomes. One potential complication arises from the dimension of $\pi_1$ varying with the observed atoms in $X_2$. Posterior simulation with variable-dimensional parameters generally requires complicated trans-dimensional Markov chain Monte Carlo (Green 1995), often resulting in poor mixing and computational inefficiency. Our posterior sampling algorithm avoids such complications while rigorously maintaining the architecture of the CAM model. See Section S.8 in the supplementary materials for more details.

6. Simulation Study

We first describe the simulation scenarios.

CAM scenario:

We first consider a scenario where the covariates are generated from a CAM model. We take the first $q = p - 3$ covariates to be continuous and the remaining 3 to be binary. For the trial arm $s = 1$, we generate $X_{1,i,1:q} \overset{iid}{\sim} \sum_{j=1}^{2} \pi_{1,j} N_q(\mu_j, \sigma_j^2 I_q)$ and $X_{1,i,\ell} \overset{iid}{\sim} \mathrm{Bernoulli}(\varrho_1)$ for $\ell = q+1, \ldots, p$. For the RWD arm $s = 2$, we generate $X_{2,i,1:q} \overset{iid}{\sim} \sum_{j=1}^{3} \pi_{2,j} N_q(\mu_j, \sigma_j^2 I_q)$ and $X_{2,i,\ell} \overset{iid}{\sim} \sum_{j=1}^{2} \iota_j\, \mathrm{Bernoulli}(\varrho_j)$ for $\ell = q+1, \ldots, p$, where $\iota_1 = \pi_{2,1} + \pi_{2,2}$ and $\iota_2 = \pi_{2,3}$. We take $\iota_1 \neq \iota_2$, ensuring that the $X_2$ population is substantially different from $X_1$ in having more heterogeneity.

MIX scenario:

In this scenario, we generate $X_{s,i} \overset{iid}{\sim} \sum_{j=1}^{k} \pi_{s,j} N_p(\mu_{s,j}, 0.05\, I_p)$. We take $\mu_{1,j} = \mu_{2,j}$ for all $j < k$ but set $\mu_{1,k} \neq \mu_{2,k}$ so that the atoms in the treatment arm are not exactly a subset of those in the RWD. Given the typically larger heterogeneity of the RWD, this is not a realistic scenario. We include it to evaluate the approach under model misspecification. Different weights attached to the atoms in the two populations result in significantly different marginal densities.

Interaction scenario:

In this scenario, we resample from the historical GBM database of 339 patients to create a future single-arm trial population. Let $F(X)$ denote the (unknown) distribution of the covariates in the database, and let $Z$ be an indicator variable such that $Z = s$ if $X$ is selected into arm $s$. That is, we sample $X_{1,i}$ iid from $p(X_{1,i}) \propto F(X_{1,i})\, e(X_{1,i})$ and $X_{2,i}$ from $p(X_{2,i}) \propto F(X_{2,i}) \{1 - e(X_{2,i})\}$, where $e(X) = \Pr(Z = 1 \mid X)$ is the PS of assignment to the treatment arm. We set $e(X)$ to be a logistic regression with pairwise interactions between some covariates. We can sample $X_{s,i}$ by simple weighted resampling of the historical database, without explicitly knowing $F(\cdot)$.
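The weighted-resampling trick above can be sketched in a few lines; this is an illustrative stand-in for the scenario generator (function and parameter names are hypothetical, and a main-effects-only logistic PS is used for brevity):

```python
import numpy as np

def sample_arm(X_hist, coef, intercept, n, arm, rng=None):
    """Draw one arm by weighted resampling of the historical covariates.

    Arm 1 is drawn with probability proportional to the propensity score
    e(X); arm 2 with probability proportional to 1 - e(X). Resampling the
    empirical distribution sidesteps any explicit knowledge of F(.)."""
    rng = np.random.default_rng(rng)
    e = 1.0 / (1.0 + np.exp(-(intercept + X_hist @ coef)))  # logistic PS e(X)
    w = e if arm == 1 else 1.0 - e
    idx = rng.choice(len(X_hist), size=n, replace=True, p=w / w.sum())
    return X_hist[idx]

# toy database with one covariate; a strongly positive coefficient makes
# the two resampled arms systematically different
rng = np.random.default_rng(3)
X_hist = rng.normal(size=(3000, 1))
coef = np.array([3.0])
x1 = sample_arm(X_hist, coef, 0.0, 2000, arm=1, rng=1)
x2 = sample_arm(X_hist, coef, 0.0, 2000, arm=2, rng=2)
```

By construction, the arm-1 sample is shifted toward covariate values with high $e(X)$, mimicking a nonrandomized selection mechanism.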

Oracle scenario:

In this fourth and final scenario, we proceed as in the Interaction scenario but now with $e(X)$ defined as a logistic regression with main effects of the true predictors only, that is, as if an oracle had revealed the right predictors.

Outcome model:

Under the CAM and MIX scenarios, we generate $Y_{1,i} = \delta + f(X_{1,i}) + \epsilon_{1,i}$ and $Y_{2,i} = f(X_{2,i}) + \epsilon_{2,i}$, where $f(\cdot)$ is a nonlinear function; in the Interaction and Oracle scenarios we generate $Y_{1,i} = \delta + X_{1,i}^{\mathrm{T}} \beta + \epsilon_{1,i}$ and $Y_{2,i} = X_{2,i}^{\mathrm{T}} \beta + \epsilon_{2,i}$, where $\epsilon_{s,i} \overset{iid}{\sim} N(0, 1)$ for $i = 1, \ldots, n_s$ and $s = 1, 2$, so that $\delta$ is the true treatment effect. We repeat the experiments for $\delta \in \{-1, 0, 1, 3\}$.

We repeat the simulations in the CAM and MIX scenarios for $p \in \{10, 20\}$ and $n_1 \in \{50, 100, 150\}$, and set $n_2 = 6 n_1$ in all setups, keeping the ratio of the population sizes consistent with the GBM application. For each $(n_1, p, \delta)$ combination in the CAM and MIX scenarios, we perform 500 independent replications. Under the Interaction and Oracle scenarios, there are $p = 11$ covariates and we use $n_1 = 49$. To avoid reporting summaries that might hinge on a lucky choice of the logistic regression coefficients in $e(\cdot)$, and to remove one source of randomness unrelated to the methods under comparison, we independently sample different sets of regression coefficients (from a discrete mixture distribution) for each of the 500 repeat simulations. Further details are provided in Section S.9.2 of the supplementary materials.

6.1. Analyses

We compare the CA-PPMx model with the PS-integrated power prior and composite likelihood approaches (Wang et al. 2019, 2020; Chen et al. 2020), as implemented in the psrwe R package, and a two-step population matching approach. We perform seven different analyses for each of the four scenarios to estimate the treatment effect $\Delta(\theta)$, which we define here as the difference in mean outcomes, that is, $\delta$. The analyses are (i) CA-PPMx: the proposed CA-PPMx model of Section 4.2; (ii) IS-LM: the two-step IS approach introduced in Section 3.2, where we first sample a subpopulation of size $n_1$ from $X_2$ following the importance resampling scheme of Section 3.2 and subsequently estimate the treatment effect between the subpopulation and the treatment arm by fitting a linear model; (iii) and (iv) PP-Logistic and PP-RF: two PS-based power prior approaches using logistic regression and random forest (Breiman 2001) classifiers, respectively; (v) and (vi) CL-Logistic and CL-RF: two composite likelihood based approaches with logistic and random forest classifier based PSs, respectively; and, finally, (vii) Matching: a distance-based bipartite matching method designed to match treatment and control groups in observational studies (Hansen and Klopfer 2006), followed by a linear model for detecting treatment effects, as implemented in the optmatch R package.

6.2. Equivalence of Populations

In preparation for inference under the two-step IS approach, we generate equivalent populations using the density-free importance resampling scheme discussed in Section 3.2, based on the fitted CAM model. To formally test for equivalence of the adjusted datasets, we implement Step 5 in Algorithm 1. We first merge the datasets and then try to classify patients in the merged sample as originating from the RWD or from the single-arm treatment cohort ($s = 2$ vs. $s = 1$ in our earlier notation). For classification, we use BART and report boxplots of the area under the receiver operating characteristic curve (AUC) of the classification across the independent experiments for all simulation settings in Figure 3. For comparison, we also subsample randomly (instead of using the IS weights) and report the AUCs in the same figure. We refer to the two sampling strategies as IS and Random, respectively.
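The AUC itself can be computed from any classifier's scores via the rank (Mann-Whitney) statistic; the paper uses BART (e.g., via the BART R package), but the summary is classifier-agnostic. A small tie-free sketch:

```python
import numpy as np

def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    labels : 0/1 arm indicators; scores : classifier scores for label 1.
    Assumes no tied scores (no tie correction is applied)."""
    labels = np.asarray(labels, dtype=bool)
    ranks = np.argsort(np.argsort(scores)) + 1  # 1-based ranks of the scores
    n_pos, n_neg = labels.sum(), (~labels).sum()
    # rank-sum of positives, shifted and normalized, equals the AUC
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) -> 0.75
```

An AUC near 0.5 means the classifier cannot tell the two arms apart, i.e., the adjusted populations are nearly equivalent; an AUC near 1 flags a clearly distinguishable (nonequivalent) control.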

Figure 3.

Boxplots of the area under the receiver operating characteristic curve (AUC) when BART is used to classify a merged dataset, consisting of $X_1$ and the subsampled $X_2$, into originally $X_1$ versus $X_2$. Two subsampling schemes are used: the importance sampling (IS) strategy of Section 3.2 and simple random resampling. AUCs close to 0.5 imply near equivalence between the populations. Panel (a) shows the AUCs in the CAM and MIX scenarios across different sample sizes and numbers of covariates; panel (b) shows the AUCs under the Interaction and Oracle scenarios.

In Figure 3(a), the Random resampling strategy yields high AUC, indicating that the two populations are substantially different and that adjustment of the RWD population is necessary before using it as a synthetic control. For both the CAM and MIX scenarios, the performance of the IS scheme improves with increasing sample size. This is expected, as for small sample sizes $X_2$ lacks enough data to produce a subsample equivalent to $X_1$. AUC values close to 1 under the CAM scenario imply that the true populations are indeed very different in this case. In contrast, AUC values close to 0.5 under the IS scheme indicate near equivalence after adjustment. In both scenarios, the AUC is substantially reduced under the IS resampling scheme, implying that the proposed CAM model indeed adjusts for the lack of randomization.

Results under the last two scenarios are shown in Figure 3(b). Recall that in both scenarios the simulation truth is not based on the CAM model. Still, the fit under the proposed CAM model achieves near perfect adjustment as shown in the figure.

6.3. Inference on Treatment Effects

In each simulation setup, we test $H_0: \delta = 0$ versus $H_1: \delta \neq 0$ at the 5% level of significance. We elaborate on the testing procedure in Section S.9.1 of the supplementary materials. We report power in Figure 4, with detailed numerical results appearing in Tables S.1–S.3 in the supplementary materials. Under the PS-based approaches, the power remains below 15% across all scenarios (not shown in the figure). The fully model-based nonparametric CA-PPMx has higher power than IS-LM and Matching when the true response models are nonlinear. In contrast, IS-LM and Matching perform comparably to each other and have higher power than CA-PPMx in the Interaction and Oracle scenarios, where the true response model is linear, but they are susceptible to model misspecification, as reflected in the CAM and MIX scenarios. This is because IS-LM and Matching assume a linear model for the outcome, which happens to match the simulation truth in the Interaction and Oracle scenarios. For all but the PS-based approaches, power increases with increasing sample size, indicating that PS-based methods may require a much larger RWD population to adjust for the lack of randomization.

Figure 4.

Power to detect treatment effects in different simulation setups at the 5% level of significance: Seven methods are used to estimate the effects, of which IS-LM and CA-PPMx are based on the proposed CAM model. Panel (a) corresponds to the CAM (top) and MIX (bottom) scenarios. Panel (b) shows results under the Interaction (left) and Oracle (right) scenarios.

7. Application in Glioblastoma

We return to the motivating case study of creating a synthetic control for a hypothetical upcoming single-arm GBM trial. The sample size of the trial is $n_1 = 49$, similar to past trials (Vanderbeek et al. 2018). The endpoint of interest is overall survival (OS). We evaluate the operating characteristics of the proposed design by simulating $L = 100$ trial replicates; see Berry et al. (2010, sec. 2.5.4) for a discussion of the role of frequentist operating characteristics in Bayesian inference. To create treatment arm data, we first select covariates $X_{1,i}$ by randomly selecting patients from the historical database. To generate a realistic nonequivalent patient population, we select not uniformly but using a logistic regression on the covariates (as described for the Interaction scenario in Section 6). The treatment effect is quantified by the hazard ratio (HR) between the treatment arm and the (synthetic) control arm, with the null and alternative hypotheses $H_0$: HR = 1 versus $H_1$: HR ≤ 0.6 at 50 weeks. The HR of 0.6 was suggested by clinical collaborators as a meaningful clinical target.

We show results under two alternative scenarios: (a) $H_0$: no treatment effect (i.e., HR = 1), created by keeping the OS for the patients in the treatment arm as originally observed in the historical database (since those patients received treatments with similar efficacy); and (b) $H_1$: a clinically meaningful treatment effect. We created $H_1$ by increasing the OS of patients in the treatment arm by an increment that would correspond to an HR of 0.6 under an exponential model.
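The $H_1$ construction rests on a standard fact: if $T \sim \mathrm{Exponential}(\lambda)$, then $T / 0.6 \sim \mathrm{Exponential}(0.6\lambda)$, and since the hazard of an exponential equals its rate, dividing the control-arm survival times by the target HR yields a hazard ratio of exactly 0.6. A quick numerical check on synthetic data (not the GBM database):

```python
import numpy as np

rng = np.random.default_rng(7)
target_hr = 0.6

# control-arm survival times: exponential with rate 1 (hazard = rate)
t_control = rng.exponential(scale=1.0, size=100_000)
# treatment arm: inflate survival times to hit the target hazard ratio
t_treat = t_control / target_hr  # exponential with rate 0.6

# the maximum-likelihood rate for exponential data is 1/mean
hr_hat = (1.0 / t_treat.mean()) / (1.0 / t_control.mean())
print(round(hr_hat, 6))  # 0.6
```

The same scaling applied to the observed OS values shifts the treatment arm toward the clinically meaningful effect while preserving the shape of the survival distribution.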

We apply three methods to make inference on the treatment effect: (i) IS-based two-step procedure: we first create equivalent patient populations using Algorithm 1 and then proceed with inference on the treatment effect as if patients were randomly assigned to treatment and control; (ii) Matching-based two-step procedure: operationally similar to (i), but now the Matching method discussed in Section 6 is used to create equivalent patient populations; and (iii) Model-based inference: the extension of the CAM model to include the outcomes $Y_{s,i}$, as described in Section 4.2.

7.1. IS-based Two-Step Procedure

In preparation for inference, we start with a test for equivalence of the subsampled population in each of the $L = 100$ repeat simulations. Figure 5 plots the relative frequencies of each covariate in the treatment arm (red) and in the synthetic control arm constructed from the RWD using (a) the IS sampling following Algorithm 1 (green) and (b) random sampling (blue). Very different frequencies in the two arms under random resampling indicate significant differences in the covariate distributions between the treatment and control arms. For most covariates, however, the differences are greatly reduced by the IS scheme.

Figure 5.

Covariate distributions before and after adjustments. The red bars show the distributions of the covariates in the treatment arm. The green and blue bars show the distributions of the covariates in the synthetic control arms formed using the IS and random resampling schemes, respectively.

Once we establish equivalence of the patient populations, we proceed with inference for the treatment effect. We use a Cox proportional hazards (PH) model (Cox 1972) and the logrank test (Peto and Peto 1972) to compare the survival functions. The top panel of Figure 6 shows inference summaries over the $L = 100$ repetitions, as histograms of $p$-values under $H_0$ (blue) and $H_1$ (red). Under $H_0$, the $p$-values are almost uniformly spread over [0, 1]. In contrast, under $H_1$, the histogram of $p$-values over repeat simulations is peaked close to zero.

Figure 6.

Inference on treatment effects under the two-step procedures: Panel (a) shows histograms of the $p$-values from the logrank test under the Cox PH model comparing the survival curves between the two arms; panel (b) shows the Kaplan-Meier curves and pointwise confidence intervals for the treatment (blue) and control (red) arms under scenarios $H_0$ (left) and $H_1$ (right). The top and bottom rows of (a) and (b) show the results for the IS- and Matching-based approaches, respectively.

Finally, we identify representative simulations from the $L$ repetitions under each of the two scenarios by finding the instance with $p$-value closest to the median of the respective histogram. For these two representatives, we show Kaplan-Meier (KM) survival curves in the top panels of Figure 6(b). The survival curves in the two arms are quite alike, with wide confidence intervals, under the $H_0$ scenario, whereas significant improvements in survival times can be observed for the treatment arm over the first 80 weeks under the $H_1$ scenario.

Figure 7.

Inference on treatment effects under the model-based approach: Panel (a) shows quantile-quantile plots assessing model fit (Section S.5 in the supplementary materials); the left plot of panel (b) shows posterior probabilities $p(\mathrm{HR} < 0.6 \mid \mathrm{Data})$ under repeat simulations, and on the right, posterior estimated hazard ratios for OS with pointwise 95% credible regions are shown under $H_0$ and $H_1$.

7.2. Matching-based Two-Step Procedure

We use the Matching procedure to create a synthetic control and then follow the same routine as in (i) for inference on treatment effects. The results are provided in the bottom panels of Figures 6(a) and (b). The distribution of the $p$-values under the $H_1$ scenario is less peaked around 0 than for the IS-based procedure. This is also reflected in the representative KM plot under $H_1$, which has a much wider confidence interval around the survival curve, possibly indicating that the IS-based approach does better than Matching in creating equivalent populations.

7.3. Model-Based Inference

As it is not straightforward to account for the uncertainty in creating the synthetic control under the aforementioned two-step procedures, we consider a fully model-based approach. For inference on treatment effects, we first assess the goodness of fit of the CA-PPMx model (see Section S.5 in the supplementary materials for details). Quantile-quantile plots for the two scenarios are shown in Figure 7(a). Near-diagonal lines indicate no evidence of a lack of fit. We then evaluate the posterior probability $p_\ell = p(\mathrm{HR} < 0.6 \mid \mathrm{Data}_\ell)$ (with $\ell$ indexing the $L = 100$ repeat simulations) at $t = 50$ weeks under the proposed model. The left panel of Figure 7(b) shows histograms of $p_\ell$ under $H_0$ (in blue) and under $H_1$ (in red). As desired, the posterior probabilities are clustered near 0 under $H_0$, but are peaked near 1 under $H_1$.

Finally, we again identify a representative simulation under each of the two scenarios by selecting the repeat simulation with posterior probability $p_\ell$ closest to the median of the respective histogram. For each of the two scenarios, we plot the posterior estimated hazard ratios (blue and red for simulations under $H_0$ and $H_1$, respectively), together with pointwise 95% posterior credible intervals, in the right panel of Figure 7(b). Under $H_0$ (blue), the HR is almost equal to 1 with wide credible intervals, whereas under $H_1$ (red), the HR is significantly below 1 with high posterior probability. The median (over the $L$ simulations) posterior probabilities $p(\mathrm{HR} < 0.6 \mid \mathrm{Data})$ are 0.08 and 0.98 under $H_0$ and $H_1$, respectively.

8. Discussion

With the long-term goal of setting up a platform for future single-arm early-phase clinical trials in GBM, in which new patients receive only experimental therapies, we developed in this article a Bayesian nonparametric approach for creating synthetic controls from RWD. We introduced a Bayesian CAM model that clusters covariates with similar values across different treatment arms.

The flexibility of the CAM model makes it easily generalizable to other problems, for example, to create two synthetic treatment arms to compare two treatments based on RWD from electronic health records.

Another direction for extensions could build on extracting propensity scores as inference summaries under the CA-PPMx model. This is briefly discussed in Section S.6 of the supplementary materials.

A limitation of the current model is scalability to high-dimensional covariates. In the GBM application, we rely on 11 clinically important categorical covariates that are commonly considered as prognostic factors in GBM treatments. However, in many applications candidate covariates can be high-dimensional. Implicit in the current construction is the assumption that the recorded covariates are clinically relevant for the disease or condition under consideration, and the approach may not be appropriate when large numbers of unscreened candidate covariates are used. Recent advances in Bayesian model-based clustering by Chandra, Canale, and Dunson (2023) could be useful to construct high-dimensional generalizations.

Supplementary Material

Supplementary materials include additional discussion of the motivating dataset, a brief review on the PPMx, detailed discussion of the graphical goodness-of-fit test for the regression model, an alternative interpretation of our model-based inference approach, choices of hyperparameters, details of the posterior simulation scheme, additional simulation studies and associated details, and MCMC convergence diagnostics. C++ and R programs implementing the methods developed in this article and R Markdown files with instructions are provided in a separately attached Codes.zip folder.

Acknowledgments

We thank the Editor, Dr. Michael Stein, an anonymous Associate Editor, and two anonymous referees for comments that led to significant improvements in the clarity and presentation of the paper.

Footnotes

Disclosure Statement

No potential conflict of interest was reported by the author(s).

References

1. Aldape K, Brindle KM, Chesler L, Chopra R, Gajjar A, and Gilbert MR (2019), "Challenges to Curing Primary Brain Tumours," Nature Reviews Clinical Oncology, 16, 509–520.
2. Alexander BM, Trippa L, Gaffey S, Arrillaga-Romany IC, Lee EQ, Rinne ML, et al. (2019), "Individualized Screening Trial of Innovative Glioblastoma Therapy (INSIGhT): A Bayesian Adaptive Platform Trial to Develop Precision Medicines for Patients with Glioblastoma," JCO Precision Oncology, 3, 1–13.
3. Ascolani F, Lijoi A, Rebaudo G, and Zanella G (2022), "Clustering Consistency with Dirichlet Process Mixtures," Biometrika (to appear).
4. Au S, and Beck J (2003), "Important Sampling in High Dimensions," Structural Safety, 25, 139–163.
5. Berry SM, Carlin BP, Lee JJ, and Müller P (2010), Bayesian Adaptive Methods for Clinical Trials, Boca Raton, FL: CRC Press.
6. Breiman L (2001), "Random Forests," Machine Learning, 45, 5–32.
7. Burcu M, Dreyer NA, Franklin JM, Blum MD, Critchlow CW, Perfetto EM, and Zhou W (2020), "Real-World Evidence to Support Regulatory Decision-Making for Medicines: Considerations for External Control Arms," Pharmacoepidemiology and Drug Safety, 29, 1228–1235.
8. Camerlenghi F, Dunson DB, Lijoi A, Prünster I, and Rodríguez A (2019), "Latent Nested Nonparametric Priors" (with discussion), Bayesian Analysis, 14, 1303–1356.
9. Chandra NK, Canale A, and Dunson DB (2023), "Escaping the Curse of Dimensionality in Bayesian Model-based Clustering," Journal of Machine Learning Research, 24, 1–42.
10. Chen M-H, and Ibrahim JG (2000), "Power Prior Distributions for Regression Models," Statistical Science, 15, 46–60.
11. Chen W-C, Wang C, Li H, Lu N, Tiwari R, Xu Y, and Yue LQ (2020), "Propensity Score-Integrated Composite Likelihood Approach for Augmenting the Control Arm of a Randomized Controlled Trial by Incorporating Real-World Data," Journal of Biopharmaceutical Statistics, 30, 508–520.
12. Chipman HA, George EI, and McCulloch RE (2010), "BART: Bayesian Additive Regression Trees," Annals of Applied Statistics, 4, 266–298.
13. Choi J, Dekkers OM, and le Cessie S (2019), "A Comparison of Different Methods to Handle Missing Data in the Context of Propensity Score Analysis," European Journal of Epidemiology, 34, 23–36.
14. Cox DR (1972), "Regression Models and Life-Tables," Journal of the Royal Statistical Society, Series B, 34, 187–220.
15. Dahl DB (2006), "Model-based Clustering for Expression Data via a Dirichlet Process Mixture Model," pp. 201–218, Cambridge: Cambridge University Press.
16. Davi R, Mahendraratnam N, Chatterjee A, Jill Dawson C, and Sherman R (2020), "Informing Single-Arm Clinical Trials with External Controls," Nature Reviews Drug Discovery, 19, 821–822.
17. Denti F, Camerlenghi F, Guindani M, and Mira A (2021), "A Common Atoms Model for the Bayesian Nonparametric Analysis of Nested Data," Journal of the American Statistical Association, 118, 405–416, DOI: 10.1080/01621459.2021.1933499.
18. FDA (2020), "Rare Diseases at FDA," available at https://www.fda.gov/patients/rare-diseases-fda. Accessed on 7th December 2021.
19. — (2021), "Adjusting for Covariates in Randomized Clinical Trials for Drugs and Biological Products, Guidance for Industry," available at https://www.fda.gov/regulatory-information/search-fda-guidance-documents/adjusting-covariates-randomized-clinical-trials-drugs-and-biological-products.
20. Ferguson TS (1973), "A Bayesian Analysis of Some Nonparametric Problems," Annals of Statistics, 1, 209–230.
21. Fisher JP, and Adamson DC (2021), "Current FDA-approved Therapies for High-Grade Malignant Gliomas," Biomedicines, 9, 324.
22. Franklin JM, Glynn RJ, Martin D, and Schneeweiss S (2019), "Evaluating the Use of Nonrandomized Real-World Data Analyses for Regulatory Decision Making," Clinical Pharmacology & Therapeutics, 105, 867–877.
23. Ghosal S, and van der Vaart A (2017), Fundamentals of Nonparametric Bayesian Inference, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge: Cambridge University Press.
24. Ghosal S, Ghosh JK, and Ramamoorthi RV (1999), "Posterior Consistency of Dirichlet Mixtures in Density Estimation," Annals of Statistics, 27, 143–158.
25. Green PJ (1995), "Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination," Biometrika, 82, 711–732.
26. Grossman SA, and Ellsworth SG (2016), "Published Glioblastoma Clinical Trials from 1980 to 2013: Lessons from the Past and for the Future," Journal of Clinical Oncology, 34, e13522.
27. Grossman SA, Schreck KC, Ballman K, and Alexander B (2017), "Point/counterpoint: Randomized versus Single-Arm Phase II Clinical Trials for Patients with Newly Diagnosed Glioblastoma," Neuro-Oncology, 19, 469–474.
28. Haase S, Garcia-Fabiani MB, Carney S, Altshuler D, Núñez FJ, et al. (2018), "Mutant ATRX: Uncovering a New Therapeutic Target for Glioma," Expert Opinion on Therapeutic Targets, 22, 599–613.
29. Hansen BB, and Klopfer SO (2006), "Optimal Full Matching and Related Designs via Network Flows," Journal of Computational and Graphical Statistics, 15, 609–627.
30. Hasegawa T, Claggett B, Tian L, Solomon SD, Pfeffer MA, and Wei L-J (2017), "The Myth of Making Inferences for an Overall Treatment Efficacy with Data from Multiple Comparative Studies via Meta-Analysis," Statistics in Biosciences, 9, 284–297.
31. Hobbs BP, Carlin BP, Mandrekar SJ, and Sargent DJ (2011), "Hierarchical Commensurate and Power Prior Models for Adaptive Incorporation of Historical Information in Clinical Trials," Biometrics, 67, 1047–1056.
32. Jiang L, Nie L, and Yuan Y (2023), "Elastic Priors to Dynamically Borrow Information from Historical Data in Clinical Trials," Biometrics, 79, 49–60.
33. Johnson VE (2007), "Bayesian Model Assessment Using Pivotal Quantities," Bayesian Analysis, 2, 719–733.
34. King G, and Nielsen R (2019), "Why Propensity Scores Should Not Be Used for Matching," Political Analysis, 27, 435–454.
35. Li X, and Song Y (2020), "Target Population Statistical Inference with Data Integration Across Multiple Sources-An Approach to Mitigate Information Shortage in Rare Disease Clinical Trials," Statistics in Biopharmaceutical Research, 12, 322–333.
36. Linero AR, and Daniels MJ (2018), "Bayesian Approaches for Missing not at Random Outcome Data: The Role of Identifying Restrictions," Statistical Science, 33, 198–213.
37. Lu N, Wang C, Chen W-C, Li H, Song C, Tiwari R, Xu Y, and Yue LQ (2022), "Leverage Multiple Real-World Data Sources in Single-Arm Medical Device Clinical Studies," Journal of Biopharmaceutical Statistics, 32, 107–123.
38. Mandel JJ, Yust-Katz S, Patel AJ, Cachia D, Liu D, Park M, et al. (2017), "Inability of Positive Phase II Clinical Trials of Investigational Treatments to Subsequently Predict Positive Phase III Clinical Trials in Glioblastoma," Neuro-Oncology, 20, 113–122.
39. Meloun M, and Militký J (2011), "The Exploratory and Confirmatory Analysis of Univariate Data," in Statistical Data Analysis, pp. 25–71, New Delhi: Woodhead Publishing India.
40. Miller JW, and Harrison MT (2013), "A Simple Example of Dirichlet Process Mixture Inconsistency for the Number of Components," in Advances in Neural Information Processing Systems (Vol. 26).
41. Müller P, Quintana F, and Rosner GL (2011), "A Product Partition Model with Regression on Covariates," Journal of Computational and Graphical Statistics, 20, 260–278.
42. Müller P, Chandra NK, and Sarkar A (2023), "Bayesian Approaches to Include Real-World Data in Clinical Studies," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 381, 20220158.
43. Nam JY, and de Groot JF (2017), "Treatment of Glioblastoma," Journal of Oncology Practice, 13, 629–638.
44. Nichol A, Bailey M, and Cooper D (2010), "Challenging Issues in Randomised Controlled Trials," Injury, 41, S20–S23.
45. Ostrom QT, Gittleman H, Liao P, Vecchione-Koval T, Wolinsky Y, Kruchko C, and Barnholtz-Sloan JS (2016), "CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2009–2013," Neuro-Oncology, 18, v1–v75.
46. Page GL, Quintana FA, and Müller P (2022), "Clustering and Prediction with Variable Dimension Covariates," Journal of Computational and Graphical Statistics, 31, 466–476.
47. Patel B, and Kim AH (2020), "Laser Interstitial Thermal Therapy," Missouri Medicine, 117, 50–55.
48. Peto R, and Peto J (1972), "Asymptotically Efficient Rank Invariant Test Procedures," Journal of the Royal Statistical Society, Series A, 135, 185–207.
49. Prevost TC, Abrams KR, and Jones DR (2000), "Hierarchical Models in Generalized Synthesis of Evidence: An Example based on Studies of Breast Cancer Screening," Statistics in Medicine, 19, 3359–3376.
50. Rodríguez A, Dunson DB, and Gelfand AE (2008), "The Nested Dirichlet Process," Journal of the American Statistical Association, 103, 1131–1154.
51. Rosenbaum PR, and Rubin DB (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41–55.
52. Schmidli H, Häring DA, Thomas M, Cassidy A, Weber S, and Bretz F (2020), "Beyond Randomized Clinical Trials: Use of External Controls," Clinical Pharmacology & Therapeutics, 107, 806–816.
53. Sethuraman J (1994), "A Constructive Definition of Dirichlet Priors," Statistica Sinica, 4, 639–650.
54. Skare O, Bølviken E, and Holden L (2003), "Improved Sampling-Importance Resampling and Reduced Bias Importance Sampling," Scandinavian Journal of Statistics, 30, 719–737.
55. Stuart EA (2010), "Matching Methods for Causal Inference: A Review and a Look Forward," Statistical Science, 25, 1–21.
56. Sutton AJ, and Abrams KR (2001), "Bayesian Methods in Meta-Analysis and Evidence Synthesis," Statistical Methods in Medical Research, 10, 277–303.
57. Teh YW, Jordan MI, Beal MJ, and Blei DM (2006), "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, 101, 1566–1581.
58. Vanderbeek AM, Rahman R, Fell G, Ventz S, Chen T, Redd R, Parmigiani G, Cloughesy TF, Wen PY, Trippa L, and Alexander BM (2018), "The Clinical Trials Landscape for Glioblastoma: Is It Adequate to Develop New Treatments?," Neuro-Oncology, 20, 1034–1043.
59. Vansteelandt S, and Daniel R (2014), "On Regression Adjustment for the Propensity Score," Statistics in Medicine, 33, 4053–4072.
60. Wang C, and Rosner GL (2019), "A Bayesian Nonparametric Causal Inference Model for Synthesizing Randomized Clinical Trial and Real-World Evidence," Statistics in Medicine, 38, 2573–2588.
61. Wang C, Li H, Chen W-C, Lu N, Tiwari R, Xu Y, and Yue LQ (2019), "Propensity Score-Integrated Power Prior Approach for Incorporating Real-World Evidence in Single-Arm Clinical Studies," Journal of Biopharmaceutical Statistics, 29, 731–748.
62. Wang C, Lu N, Chen W-C, Li H, Tiwari R, Xu Y, and Yue LQ (2020), "Propensity Score-Integrated Composite Likelihood Approach for Incorporating Real-World Evidence in Single-Arm Clinical Studies," Journal of Biopharmaceutical Statistics, 30, 495–507.
63. Zhao Z (2004), "Using Matching to Estimate Treatment Effects: Data Requirements, Matching Metrics, and Monte Carlo Evidence," The Review of Economics and Statistics, 86, 91–107.

Supplementary Materials

Supplementary materials for this article are available online.