Skip to main content
Biostatistics (Oxford, England) logoLink to Biostatistics (Oxford, England)
. 2018 Sep 21;21(2):e47–e64. doi: 10.1093/biostatistics/kxy053

Incorporating historical models with adaptive Bayesian updates

Philip S Boonstra 1,, Ryan P Barbaro 2
PMCID: PMC7868052  PMID: 30247557

Summary

This article considers Bayesian approaches for incorporating information from a historical model into a current analysis when the historical model includes only a subset of covariates currently of interest. The statistical challenge is 2-fold. First, the parameters in the nested historical model are not generally equal to their counterparts in the larger current model, neither in value nor interpretation. Second, because the historical information will not be equally informative for all parameters in the current analysis, additional regularization may be required beyond that provided by the historical information. We propose several novel extensions of the so-called power prior that adaptively combine a prior based upon the historical information with a variance-reducing prior that shrinks parameter values toward zero. The ideas are directly motivated by our work building mortality risk prediction models for pediatric patients receiving extracorporeal membrane oxygenation (ECMO). We have developed a model on a registry-based cohort of ECMO patients and now seek to expand this model with additional biometric measurements, not available in the registry, collected on a small auxiliary cohort. Our adaptive priors are able to use the information in the original model and identify novel mortality risk factors. We support this with a simulation study, which demonstrates the potential for efficiency gains in estimation under a variety of scenarios.

Keywords: Bias-variance tradeoff, Combining information, Hierarchical shrinkage, Power prior, Regularized horseshoe prior

1. Introduction

When a statistical model is published, there are often already models for the same outcome. Although the new model and the existing models may each differ in their target populations, underlying sets of predictors, or in other ways (e.g. Becker and Wu, 2007), there is usually some historical information available when the new model was built. In that sense, this framework of sequential but independent model development is not fully utilizing available historical information. In this article, we propose Bayesian approaches that incorporate the posterior distribution from the historical model into a prior for the new model when the set of historical covariates is strictly nested within the set of the new covariates.

Our motivation for this work is a short-term mortality risk prediction model (“Ped-RESCUERS”) for pediatric patients receiving extracorporeal membrane oxygenation (ECMO) support using information on 1611 pediatric patients treated between the years 2009 and 2012 (Barbaro and others, 2016). The source population was the Extracorporeal Life Support Organization (ELSO), an international registry of ECMO patients, and the pertinent data are limited to patient clinical characteristics (weight, age, sex, primary diagnosis, co-morbidities, complications, and pre-ECMO supportive therapies) and ECMO-specific measurements (blood gas measurements and ventilator settings). In total, Ped-RESCUERS uses eleven predictors. Patient-specific biometric measurements of renal, hepatic, neurological, and hematological dysfunction that may be associated with mortality on ECMO are not generally collected in the ELSO registry. We posited a list of eleven such additional potential risk factors and collected them on a cohort of 178 non-overlapping patients at three ECMO-providing centers. The data consist of both the eleven risk factors in PED-RESCUERS and eleven biometric measurements not in the registry. We require a model that potentially includes all 22 covariates as predictors. We report on these new data in Barbaro and others (2018). However, given the ratio of sample size to number of covariates and the subsequent variability in estimates, it is more statistically incumbent to make use of the information from the original large cohort of patients with unmeasured biometric measurements. Yet, the process of doing so may introduce bias, due to the different predictors included in each model. It is these competing objectives we seek to balance so as to increase overall efficiency, i.e. decrease root mean squared error (RMSE).

In other cases, there may only be very limited historical information, meaning that the number of historical predictors is much less than the total number of potential predictors under study. It is reasonable to expect that incorporating even such limited historical information should result in a model that is non-inferior to a modeling approach ignoring the historical information entirely. For example, one alternative to a prior based solely on historical data would be to regularize estimation and prediction with a prior that shrinks parameters toward zero, as in the Bayesian Lasso (Park and Casella, 2008) and others (Griffin and Brown, 2005; Armagan and others, 2013). Based on this logic, an ideal strategy combines these approaches: incorporating whatever historical information is available and, for those parameters about which it is not informative, controlling variability by shrinking them to zero in the “usual” way.

We achieve this here through an extension of the power prior, which includes the historical data likelihood, raised to a power Inline graphic (Ibrahim and Chen, 2000; Ibrahim and others, 2015). Setting Inline graphic or Inline graphic corresponds, respectively, to ignoring the historical likelihood entirely or a fully Bayesian update. Using Inline graphic allows for partial borrowing of the historical likelihood in the presence of heterogeneity, and this can be made adaptive by considering a hyperprior on Inline graphic itself (Duan and others, 2006; Neuenschwander and others, 2009). The novel idea in our approach is to use Inline graphic to vary the relative contributions of the historical prior (Inline graphic) and a variance-reducing prior that shrinks to zero (Inline graphic).

In the classical power prior, the historical and current models include the same predictors. In one extension, Chen and others (1999) approach a related problem by constructing a second, artificial historical likelihood that uses a constant for the outcome and copies of the added covariates from the current data. This has the effect of shrinking the corresponding regression parameters toward zero and so is actually more similar to a typical shrinkage-to-zero prior. Ibrahim and others (2002) consider the general setting of fitting generalized linear models (GLMs) using power priors when values are missing in covariates of the historical and/or current datasets. Crucially, both the historical and current likelihoods condition on the same set of covariates, and missingness is ancillary to the main statistical problem.

Beyond the power prior, there exist alternative approaches for incorporating historical information. A meta-analysis statistically combines univariable or multivariable associations from multiple studies based upon each analysis’ variance or covariance matrix (Walker and others, 2008; Chen and others, 2012; Jackson and Riley, 2014). The main advantage of a meta-analysis is its simplicity, even when combining more than two models, because only summaries statistics are required. Among other assumptions, however, all models to be combined must include the same predictors. Recently, several authors have proposed strategies for incorporating summary-level historical information via constraints on the likelihood (Chatterjee and others, 2016; Grill and others, 2017; Cheng and others, 2018). Antonelli and others (2017) propose Bayesian approaches for borrowing information from the dataset with additional covariates to improve estimation of the average causal effect in the dataset with fewer covariates. Relative to previous important work in this area, we highlight two distinctive features of our approach. First, we account for the underlying uncertainty in the historical information by using the posterior variance from the historical model as the prior variance for the current model. Second, we explicitly ignore any historical information about the intercept, which, in the case of a binary outcome, borrows information, while still allowing for differences in the underlying true prevalence between the historical and the current models.

The proceeding sections develop the ingredients of our approach that combines historical-based prior shrinkage and shrinkage to zero. Section 2 reviews the “regularized horseshoe prior” (Carvalho and others, 2009, 2010; Piironen and Vehtari, 2015, 2017a, 2017b), which is the shrinkage-to-zero prior that we use. Section 3 outlines the construction of two historical-based priors derived from models fit to a subset of the current set of covariates under consideration. Section 4 proposes how to adaptively combine the historical information with the shrinkage prior. Sections 5 and 6 demonstrate our methods with a simulation study and analysis of the motivating ECMO mortality risk prediction model, respectively. Section 7 concludes with a discussion.

2. Shrinkage-to-zero priors

Let Inline graphic denote the link function of a GLM and Inline graphic and Inline graphic denote conditional and marginal distributions, respectively. We will use the prior/posterior nomenclature to indicate whether conditioning is on data. Capital and lowercase letters, respectively, indicate random data and observed data; all types of Greek letters will be reserved for parameters. Standard font will be used for scalar or vector valued quantities, and boldface font will be reserved for matrix-valued quantities.

A GLM for an outcome Inline graphic is fit to a length-Inline graphic vector of covariates Inline graphic, Inline graphic, using Inline graphic datapoints, Inline graphic. The covariates Inline graphic are standardized to their empirical mean and empirical standard deviation. We do not distinguish between the first Inline graphic and the final Inline graphic elements of Inline graphic yet (these identify the original and added elements, respectively) but will do so in subsequent sections. The vector Inline graphic is of primary interest. Given the likelihood Inline graphic and prior Inline graphic, with Inline graphic a vector of hyperparameters that is conditionally independent of Inline graphic given Inline graphic, the posterior is Inline graphic. Often, the prior Inline graphic is selected to regularize parameters by shrinking estimates toward zero, reducing variance at a cost of some bias, thereby increasing efficiency. Many such shrinkage priors can be written as products of conditionally independent normal priors on Inline graphic: e.g., Inline graphic and Inline graphic. For example, if each Inline graphic is independently inverse-gamma distributed with a common shape and scale parameter equal to Inline graphic, each Inline graphic is marginally Student-Inline graphic distributed with Inline graphic degrees of freedom (e.g. Gelman and others, 2014). A different choice of Inline graphic conferring more adaptive shrinkage properties is the “regularized horseshoe” (Carvalho and others, 2009, 2010; Piironen and Vehtari, 2015, 2017a, 2017b). Given constants Inline graphic, Inline graphic and hyperparameters Inline graphic, Inline graphic, the hyperprior is Inline graphic, where Inline graphic, Inline graphic for Inline graphic, and Inline graphic indicates the positive half-Cauchy distribution. Letting Inline graphic, the conditional prior for Inline graphic is then

graphic file with name M50.gif (2.1)

The SZ subscript indicates “shrinkage to zero”. The hyperparameter Inline graphic globally shrinks Inline graphic, while the Inline graphics multiplicatively offset Inline graphic and thus admit large individual variance components. The original horseshoe (Carvalho and others, 2009) implicitly used Inline graphic and Inline graphic, i.e. Inline graphic, and others have since generalized it. First, Piironen and Vehtari (2017a) suggested considering alternative values of Inline graphic, which scales the global shrinkage, by linking its value to an implicit assumption about the a priori effective number of non-zero parameters in the model, say Inline graphic. The relationship is given by Inline graphic, where Inline graphic is the dispersion. So, for example, if Inline graphic, Inline graphic and Inline graphic, under the original horseshoe, the prior mean of Inline graphic is Inline graphic. If instead Inline graphic, then Inline graphic. Typically, using Inline graphic codifies a prior belief that most of the Inline graphic parameters are non-zero, which is unlikely when Inline graphic itself is large. Further, the expression for Inline graphic highlights that the choice of Inline graphic ought to scale with Inline graphic, all other things being equal. Larger sample sizes warrant smaller values of Inline graphic. Based on this, the authors recommend selecting Inline graphic and then, assuming Inline graphic is fixed, numerically solving Inline graphic for Inline graphic (Inline graphic is a function of Inline graphic), where the expectation is taken with respect to Inline graphic. The result will usually be Inline graphic.

Subsequent work by Piironen and Vehtari (2017b) argued that the original horseshoe tends to under-shrink large elements of Inline graphic, which can also result in numerical difficulties, known as “divergent transitions”, as a result of a stochastic search through heavy tails (Piironen and Vehtari, 2015). As a solution, they suggest to soft-truncate its tails with a diffuse normal prior with variance Inline graphic. Choosing a finite-valued Inline graphic results in the regularized horseshoe prior. Our strategy for choosing the hyperparameters Inline graphic and Inline graphic in this paper is to set Inline graphic equal to a large value, Inline graphic, and then numerically solve Inline graphic for Inline graphic, as before. A large Inline graphic has minimal effect in the middle of the horseshoe prior but effectively thins out the heavy tails. We further discuss Inline graphic in Section 5. The prior in (2.1) is the hierarchical shrinkage prior that we will extend in Section 4 to adaptively incorporate historical information. But first, Section 3 considers the prerequisite non-adaptive prior using the historical information alone.

3. Historical shrinkage prior

Separate Inline graphic into Inline graphic and Inline graphic, of length Inline graphic and Inline graphic, respectively. The original covariates Inline graphic were measured in the historical analysis, and the added covariates Inline graphic were not. We are interested in modeling Inline graphic, but the historical model only estimates the smoothed version Inline graphic. The historical analysis conveys information about

graphic file with name M104.gif (3.1)

This knowledge is quantified by the posterior distribution of Inline graphic given the historical data, about which one likely only has access to summary statistics, e.g. the mean and covariance matrix. Our interest is not in Model (3.1) but rather the embiggened model

graphic file with name M106.gif (3.2)

We have a dataset of Inline graphic observations, Inline graphic and a likelihood function Inline graphic. A standard analysis of Inline graphic alone might employ a shrinkage-to-zero prior as described in Section 2; that prior does not distinguish between historical and current covariates. Keeping in mind our ultimate goal of incorporating the historical information we have about Inline graphic, this section lays out an alternative prior formulation based upon the historical analysis. We will then combine these priors in Section 4.

3.1. Naive Bayesian update

A naive Bayesian (NB) update would directly apply the historical posterior on Inline graphic as a prior on Inline graphic, since these parameters correspond to the same set of covariates, namely Inline graphic. More formally, one might use Inline graphic and Inline graphic as the respective prior mean and variance for Inline graphic. Then, given an optional scaling hyperparameter Inline graphic, a conditional prior might be

graphic file with name M119.gif (3.3)

However, Model (3.1) cannot hold for all patterns Inline graphic if Model (3.2) is the true generating model unless Inline graphic (or, if for some fixed Inline graphic matrix of weights Inline graphic, Inline graphic almost surely. In the special case that Inline graphic is the identity link, this condition may be relaxed to equality in expectation, i.e. Inline graphic). Thus, in general, the naive Bayesian is implicitly assuming that Inline graphic, so as to be able to equate Inline graphic and Inline graphic in (3.3). To be consistent with this assumption, Inline graphic should be accompanied by a prior on Inline graphic that strongly shrinks to zero. We discuss this further in Section 4.

3.2. Sensible Bayesian update

Although the naive Bayesian update may improve efficiency, by construction it assumes Inline graphic and Inline graphic. In general Inline graphic and Inline graphic are not equal, neither in value nor interpretation. Further, it is illogical to begin with a strong prior assumption that Inline graphic- corresponding to the novel set of covariates of interest, is approximately zero. The naive Bayesian update will introduce bias when Inline graphic is far from zero. A more sensible Bayesian update would place the prior on the many-to-few mapping from Inline graphic to Inline graphic. Based on derivations below, we show that this mapping can be approximated by Inline graphic, where Inline graphic is a certain Inline graphic projection matrix. To see this, begin by iterating the conditional expectation of Inline graphic given Inline graphic:

graphic file with name M145.gif

Applying Model (3.1), i.e. taking Inline graphic of both sides, which—because the true(r) model is (3.2)—will only be an approximation of the conditional mean of Inline graphic given Inline graphic, we obtain the following:

graphic file with name M149.gif (3.4)
graphic file with name M150.gif (3.5)

This relates the available historical model with the current model. In particular, (3.5) obtains an approximation for the historical (and misspecified) intercept Inline graphic by plugging in Inline graphic, which is predicated on Inline graphic falling within the observed support of Inline graphic and achieved by centering the covariates. This is useful because taking the difference between (3.4) and (3.5) completely removes Inline graphic from the equation:

graphic file with name M156.gif (3.6)

Equation (3.6) is the basis of the sensible Bayesian update: it links Model (3.2) to a function of the parameters from Model (3.1), about which there is historical information. Furthermore, like the naive Bayesian update, the sensible Bayesian update avoids borrowing information on the historical intercept Inline graphic. This relaxes a critical assumption: we do not require that the historical and current data generating models are identical but, less restrictively, that the underlying true values of Inline graphic are equal.

When the link function Inline graphic is non-linear, constructing a prior based upon (3.6) would necessitate a Jacobian adjustment, and the adaptive priors that we subsequently develop in Section 4 would require numerically integrating over this Jacobian at each iteration of the Markov Chain, rendering such an approach computationally intractable. Practically, then, we must further approximate the mapping to obviate the Jacobian adjustment. Moving Inline graphic across the integrals,

graphic file with name M161.gif (3.7)

For a given Inline graphic-length vector Inline graphic, we can use Equation (3.7) to link a Inline graphic-dimensional set of parameter values Inline graphic to a linear combination of Inline graphic, capturing one dimension of information about Inline graphic. With a linearly independent set of Inline graphic vectors Inline graphic, we can create the desired Inline graphic mapping and capture the available Inline graphic dimensions of information.

In theory, (3.7) holds for any arbitrary vector Inline graphic. However, as a consequence of the derivations in this section, we intuitively understand Inline graphic to correspond to a vector of the original covariates. This is important because we need to be able to calculate or approximate the expectations in the mapping. Let Inline graphic denote a Inline graphic matrix of linearly independent columns, with the Inline graphicth row representing a hypothetical pattern of the original covariates. Analogously, let Inline graphic denote a Inline graphic matrix of Inline graphic hypothetical patterns of the added covariates. Then, the length-Inline graphic vectorized mapping is

graphic file with name M181.gif (3.8)

where Inline graphic. Analogous to the naive Bayesian update,

graphic file with name M183.gif (3.9)

In calculating the posterior, Inline graphic and Inline graphic are treated as fixed and known constants. Section S1 of supplementary material available at Biostatistics online describes in detail how to construct Inline graphic and how to use multiple imputation with chained equations (MICE) to calculate a Monte Carlo estimate of the integral in (3.8).

Contrasting the distributions in (3.3) and (3.9), the latter incorporates a linear offset to account for the differences between Models (3.1) and (3.2). The sensible Bayesian update thus approximates and adjusts for the difference between Inline graphic and Inline graphic. However, the prior in (3.9) will still be insufficient on its own, as it only informs Inline graphic dimensions of a Inline graphic parameter space. We return to this point in Section 4. In summary, both ideas merit further consideration: the sensible Bayesian is intuitively preferable by adjusting for model misspecification, and the naive Bayesian avoids modeling the distribution of Inline graphic given Inline graphic.

4. Adaptive weighting

Alone, neither type of prior from Section 2 or 3 would be acceptable in the context of this article: the shrinkage-to-zero prior in Section 2 ignores the historical data, and the priors in Section 3 may be incomplete, particularly when the historical information is limited to a small number of covariates. In this section, we develop combined versions of the historical priors that adaptively vary between the priors in Sections 2 and 3. Called “naive adaptive Bayes” (NAB) and “sensible adaptive Bayes” (SAB), these seek to incorporate the historical information without sacrificing potential efficiency gains coming from shrinking to zero. We describe the two adaptive priors before formally defining them. Both share the following commonalities. Similar to the power prior, a hyperparameter Inline graphic weights the historical information by inversely scaling the variance Inline graphic; larger (smaller) values of Inline graphic reflect greater (less) incorporation of the historical information. When Inline graphic is equal to zero, both NAB and SAB reduce to the shrinkage-to-zero prior in (2.1).

4.1. Naive adaptive Bayes

Where NAB and SAB differ is at Inline graphic, which corresponds to full use of the historical information. NAB extends the Inline graphic assumption of its non-adaptive counterpart in Section 2.1. Therefore, the historical prior on Inline graphic in (3.3) is fully used when Inline graphic, and additional shrinkage of Inline graphic is unnecessary. Moreover, Inline graphic is strongly shrunk to zero, because that is generally the only parameterization for which Inline graphic. The hyperparameters of NAB are Inline graphic (scalars) and Inline graphic (vectors). The pre-specified constants are scalars Inline graphic, Inline graphic, and Inline graphic. For notational simplicity, define Inline graphic to be

graphic file with name M210.gif

Then, the NAB conditional prior is

graphic file with name M211.gif (4.1)
graphic file with name M212.gif (4.2)

As desired, the impact of the shrinkage-to-zero prior decreases with Inline graphic. Inline graphic and Inline graphic are the same as in Section 2.1, and the constants Inline graphic and Inline graphic are selected as previously described. We set Inline graphic equal to 0.05, i.e. a small number, reflecting the assumption that Inline graphic when Inline graphic; however, we introduce an auxiliary hyperparameter Inline graphic, allowing for non-zero elements of Inline graphic if warranted by the data. The hyperparameter Inline graphic separately controls the historical prior shrinkage. Hyperpriors for Inline graphic and Inline graphic are discussed below. The constant Inline graphic guarantees propriety of the posterior for any Inline graphic. To summarize, NAB varies between standard shrinkage to zero (Inline graphic) and a Bayesian update under the assumption that Inline graphic and Inline graphic (Inline graphic).

Remark 1

The normalizing constant Inline graphic in (4.2) ensures that the prior is proper for any configuration of the hyperparameters and must be calculated when any of the hyperparameters are random. Its analytic expression is derived in Section S2 of supplementary material available at Biostatistics online. The integral calculation must be updated at each step of the Markov Chain. This would become computationally intractable in the presence of a Jacobian from a non-linear transformation and is why we approximated the mapping as in (3.7).

Remark 2

A reviewer observed the similarity between NAB and the class of “penalized complexity” priors of Simpson and others (2017), both of which use the extreme end of the hyperparameter support (Inline graphic, in our notation) to essentially recapitulate some pre-specified simple model and prevent overfitting. Penalized complexity priors are defined for a much broader context and thus allow for more general base models, as opposed to our specific objective of incorporating historical information.

4.2. Sensible adaptive Bayes

For SAB, the modified prior in (3.9) is fully employed when Inline graphic, and any additional shrinkage of Inline graphic to zero is weak. However, because the sensible Bayesian update adjusts for the difference between Inline graphic and Inline graphic, it is not necessary to assume that Inline graphic. Thus, in SAB, the value of Inline graphic does not affect the contribution of the variance-reducing prior on Inline graphic. Defining the SAB hyperparameter Inline graphic to be

graphic file with name M242.gif

the SAB conditional prior is

graphic file with name M243.gif (4.3)

An expression for Inline graphic is derived in Section S3 of supplementary material available at Biostatistics online. As with NAB, a finite-valued Inline graphic ensures that Inline graphic is proper for any Inline graphic.

4.3. Hyperpriors

We describe here our choices of hyperprior for the hyperparameters Inline graphic, Inline graphic, and, for NAB, Inline graphic. The hyperpriors on the global and local shrinkage components, Inline graphic and Inline graphic, remain as given in Section 2.

The hyperparameter Inline graphic distributes prior weight between shrinkage to zero (Inline graphic) and historical shrinkage (Inline graphic). We consider two hyperprior options. The first, called agnostic, is uniform over the unit interval. The second is a truncated normal distribution with mean and standard deviation of 1 and 0.25, respectively. This is optimistic because the mode is Inline graphic, encouraging full use of the historical information.

The hyperparameter Inline graphic independently controls the historical prior shrinkage. This could simply be set to 1; we used an inverse-gamma distribution with shape and scale equal to 2.5.

Finally, the hyperparameter vector Inline graphic, used by NAB, controls the prior scale of Inline graphic when Inline graphic. As with Inline graphic, each element of Inline graphic could be set to 1, which would give that Inline graphic is normal with standard deviation Inline graphic when Inline graphic. We instead model the components of Inline graphic as inverse-gamma, each with shape and scale equal to 0.5, allowing for some elements of Inline graphic to be large.

5. Simulation study

We conducted a simulation study of logistic regression to evaluate our proposed methodology against a variety of data generating scenarios. All analyses were conducted in the R statistical environment (R Core Team, 2016; Wickham, 2009; van Buuren and Groothuis-Oudshoorn, 2011) and its interface with Stan (Carpenter, 2017; Stan Development Team, 2017, 2018), which numerically characterizes posterior distributions using Hamiltonian Monte Carlo. Code to reproduce the simulation study is available at https://github.com/psboonstra/AdaptiveBayesianUpdates.

Varying between each scenario were the fixed, unknown values of Inline graphic to be estimated (ten possibilities described in Table 2, ranging from Inline graphic to 100 predictors), the sample size of the historical data analyses (Inline graphic), and the sample size of the current data analyses (Inline graphic). For each of the 60 unique data generating scenarios, we independently sampled 128 “historical” and “current” datasets of size Inline graphic and Inline graphic, respectively. To generate the data, preliminary values of Inline graphic were sampled from multivariable normal distributions with constant correlation equal to 0.2. After this, half of the elements of each of Inline graphic and Inline graphic were transformed using the sign function, i.e. Inline graphic. Then, given the resulting Inline graphic vector, Inline graphic was sampled from a logistic regression with parameters Inline graphic fixed at one of the values in the third column of Table 2. The true generating value of the intercept in the historical data model (Inline graphic) was larger than that of the current data model (Inline graphic), yielding different marginal prevalences of the outcome. Each historical dataset consisted of Inline graphic independent draws of Inline graphic, whereas each current dataset consisted of Inline graphic independent draws of Inline graphic. In summary, the historical and current generating models differ in the true value of the intercept; the historical and current datasets structurally differ in that the former does not use Inline graphic.

Table 2.

Summary of fixed, true values of Inline graphic from the generating logistic regression model, Inline graphic used in the simulation study as well as the asymptotic true values of Inline graphic from the misspecified reduced model, Inline graphic

Label Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic-Inline graphic-Inline graphic-Inline graphic Inline graphic-Inline graphic-Inline graphic-Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic-Inline graphic-Inline graphic Inline graphic-Inline graphic-Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic
     Inline graphic  
Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic
     Inline graphic  
Inline graphic Inline graphic Inline graphic Inline graphic
     Inline graphic  

The “Inline graphic” symbol is used to distinguish between elements of Inline graphic and Inline graphic.

Remark 3

The intercept Inline graphic in Model (3.1) is distinct from Inline graphic described in the preceding paragraph: the former denotes the intercept from the asymptotically misspecified sub-model, and the latter is the true intercept from Model (3.2) that actually generated the historical data.

The fourth column of Table 2 gives the asymptotic parameter values from the misspecified logistic regression of Inline graphic on Inline graphic, which the historical data analysis estimates. To emulate the historical analysis, an initial Bayesian logistic regression was fit to the historical outcomes Inline graphic to estimate Model (3.1). We applied a regularized horseshoe prior on Inline graphic using (2.1) with Inline graphic and Inline graphic, Inline graphic, and Inline graphic to determine the value of Inline graphic. So, for example, when Inline graphic, the assumed effective number of non-zero parameters was Inline graphic, and when Inline graphic, solving Inline graphic yields Inline graphic. Fixing Inline graphic corresponds to the largest dispersion in a logistic GLM and usually results in slightly less than Inline graphic effective parameters compared with Inline graphic (Piironen and Vehtari, 2017a). We obtained samples from the historical posterior distribution Inline graphic and estimated Inline graphic and Inline graphic, the ingredients for the adaptive priors in the current data analysis. Then, the “current” analysis was conducted: a second Bayesian logistic regression to estimate the larger model in (3.2), using the current outcomes Inline graphic. Each of the five priors in the third column of Table 1 was paired with the likelihood of Inline graphic, yielding five posterior distributions to be compared. Four of these were adaptive Bayesian updates from in Section 4: two adaptive priors times two distinct hyperpriors on Inline graphic. The other was the regularized horseshoe in (2.1) and was used as a reference; we call this approach ‘Standard’. We used Inline graphic and Inline graphic, Inline graphic, and Inline graphic to solve for Inline graphic. All of the adaptive priors are equivalent to Standard when Inline graphic. For SAB, we estimated Inline graphic (3.9) using Monte Carlo methods based upon 100 independent draws from MICE. We measured performance using Inline graphic, where Inline graphic is the fixed, true value of the regression coefficient vector, and the expectation is taken over both the original and added covariates. For each of the adaptive Bayesian updates, we calculated its RMSE ratio with respect to Standard, such that ratios less than one indicate relatively better performance of the adaptive update. For each unique data generating mechanism, we report the distribution of 128 RMSE ratios. The top panel of Figure 1 plots the RMSE ratios from the first 5 rows of Table 2, for which Inline graphic and Inline graphic, and the bottom panel plots the ratios from the final 5 rows, for which Inline graphic.

Table 1.

Summary of posterior distributions evaluated in the simulation study

Labels Likelihood Prior Prior equation Hyperprior: Inline graphic
Standard Inline graphic Inline graphic Equation (2.1)
NAB(agnostic) Inline graphic Inline graphic Equation (4.1) UnifInline graphic
NAB(optimist) Inline graphic Inline graphic Equation (4.1) Inline graphic
SAB(agnostic) Inline graphic Inline graphic Equation (4.3) UnifInline graphic
SAB(optimist) Inline graphic Inline graphic Equation (4.3) Inline graphic

Because the same likelihood is used for all methods, any differences are due to priors used. The “Standard” approach is precisely the regularized horseshoe prior and used as a benchmark for comparing performance.

Fig. 1.

Fig. 1.

Boxplots of RMSE ratios (Inline graphic-axis, on the logInline graphic-scale) comparing four adaptive priors that make use of the historical information against a standard hierarchical shrinkage prior, plotted against varying sample sizes of the historical data (Inline graphic; Inline graphic-axis) for ten true values of the regression coefficients (Inline graphic, Inline graphic; columns) taken from Table 2 and varying sample sizes of the current data (Inline graphic; rows). Each boxplot compares the posteriors across 128 independent datasets. Smaller ratios indicate better performance of the corresponding adaptive prior.

More historical data, i.e. larger Inline graphic, improved the relative performance of the adaptive updates. For example, across coefficient settings Inline graphicInline graphic, with Inline graphic, the middle quartiles of the SAB(agnostic)/Standard RMSE ratio were Inline graphic when Inline graphic versus Inline graphic when Inline graphic. The NAB-type updates also improved with increasing Inline graphic, but, in absolute terms, they did not always improve upon Standard, e.g. for Inline graphic the quartiles of the RMSE ratio were Inline graphic when Inline graphic and Inline graphic when Inline graphic.

More current data, i.e. larger Inline graphic, hurt the typical relative performance of the adaptive updates, all other aspects being fixed, but also decreased the variability between datasets. This can be seen in Figure 1: when Inline graphic (the second and fourth rows), the boxplots move closer to 1.00 and with less spread relative to Inline graphic (the first and third rows, respectively). A fixed amount of historical data becomes relatively less valuable in the presence of more current data.

The adaptive priors were relatively less useful when Inline graphic was small: across all datasets in the top panel of Figure 1, for which Inline graphic, the middle quartiles of the RMSE ratios for NAB(agnostic) was Inline graphic, and for SAB(agnostic) it was Inline graphic; across all datasets in the bottom panel, for which Inline graphic, these were Inline graphic and Inline graphic, respectively. Comparing the adaptive priors, NAB outperformed SAB in scenarios for which the integral required by the latter, i.e. (3.8), was difficult to estimate well, e.g. Inline graphic.

Figures S1 and S2 of supplementary material available at Biostatistics online plot the separate RMSE ratios for Inline graphic and Inline graphic, respectively. There is a clear trend of improving performance for estimating Inline graphic, which is to be expected given its relationship to Inline graphic. For the small Inline graphic settings, the SAB-type updates were about equivalent to the standard approach in estimating Inline graphic, and the NAB-type updates were worse. Interestingly, in the larger Inline graphic settings, the adaptive priors were marginally better at estimating Inline graphic than the standard approach. Our findings were relatively unchanged upon increasing the shape and scale of Inline graphic, which scales Inline graphic, to 25 (Figure S3 of supplementary material available at Biostatistics online). Finally, Table S1 of supplementary material available at Biostatistics online summarizes the running time of each prior considered. The SAB-type updates were slowest to run, owing in part to the required imputation step.

6. Application: mortality risk prediction in pediatric ECMO patients

We demonstrate our methods on the data example discussed in the introduction. Ped-RESCUERS was fit to Inline graphic historical patients, and Inline graphic risk factors for short-term mortality were included. Our current data consists of Inline graphic patients, on which we have measured both the Inline graphic original and the Inline graphic added covariates, all of which are defined in Table S2 of supplementary material available at Biostatistics online. The overall mortality rate in the Ped-RESCUERS cohort was 40.8%; in the current cohort it was 26.4%.

We fit the following seven Bayesian logistic regression models. Ped-RESC is the model from Barbaro and others (2016) based upon the original covariates fit to the 1611 patients. Ped-RESC2 is this same model fit to the current 178 patients, using weakly informative Cauchy priors on the regression coefficients; we include this model so as to be able to assess differences due to study populations. The other five priors are as considered in the simulation study: a regularized horseshoe prior on all 22 parameters (“Standard”), and agnostic and optimistic versions for each of the SAB and NAB priors. For all five priors, we used Inline graphic to reflect an optimistic prior assumption that 11 of the 22 parameters are non-zero. To account for sporadic missingness in the current data (about 4% in total), we used a pseudo-Bayesian strategy proposed by Zhou and Reiter (2010). We first imputed 100 datasets using MICE. Then, for each completed dataset and each prior, we sampled 400 draws from the posterior distribution of the parameters conditional upon that completed dataset, concatenating these across imputations to construct a sample of Inline graphic posterior draws “averaged” over the imputations. The log-odds ratios (ORs), i.e. elements of Inline graphic, were standardized with respect to the observed distribution of the 178 current patients, allowing for a comparison of magnitudes both between and within priors.

Table 3 gives the posterior medians of the ORs. Also included is the larger of (i) Inline graphic and (ii) Inline graphic. Bolded results correspond to those with a Inline graphic probability of falling above or below 1, a simple binary indicator of variable importance. The first two blocks of rows correspond to the original covariates, and the second two blocks of rows correspond to the added covariates. Figure 2 presents boxplots of the posterior distributions of all ORs.

Table 3.

Posterior medians of standardized odds ratios (ORs) and, in parentheses, variable importance probabilities given as percentages, defined as the larger of (i) the posterior probability that an OR exceeds one and (ii) the posterior probability that an OR falls below one

Model pH PaCOInline graphic (mmHg) MAP(CMV) MAP(HFOV) Admit hours Intubated hours
      (cmHInline graphicO) (cmHInline graphicO) pre-ECMO (log) pre-ECMO (log)
PedRESC 0.46(100.0%) 0.70(96.3%) 1.37(93.9%) 1.43(98.6%) 1.42(99.3%) 1.62(99.5%)
PedRESC2 0.37(96.0%) 0.67(76.9%) 1.54(74.1%) 1.59(83.4%) 1.64(86.8%) 2.30(95.7%)
Standard 0.99(58.8%) 1.01(58.9%) 1.01(55.6%) 1.02(60.8%) 1.01(58.4%) 1.96(90.6%)
NAB(Agn) 0.58(94.5%) 0.88(70.1%) 1.29(84.3%) 1.39(93.0%) 1.33(91.5%) 1.71(98.3%)
NAB(Opt) 0.55(97.5%) 0.82(78.4%) 1.33(88.7%) 1.41(96.2%) 1.36(95.7%) 1.67(99.0%)
SAB(Agn) 0.90(73.7%) 1.02(56.9%) 1.10(69.2%) 1.14(77.4%) 1.03(59.2%) 2.25(99.0%)
SAB(Opt) 0.86(76.9%) 1.02(56.1%) 1.13(71.4%) 1.15(78.4%) 1.02(58.1%) 2.25(99.4%)
Model Malignancy Pre-ECMO DX:Asthma DX:Bronchiolitis DX:Pertussis  
    milrinone        
PedRESC 1.22(95.1%) 1.29(95.8%) 0.75(93.9%) 0.58(100.0%) 1.33(99.5%)  
PedRESC2 1.21(71.1%) 1.31(76.6%) 0.05(99.2%) 0.76(75.1%) 2.07(97.9%)  
Standard 1.00(50.8%) 1.00(51.4%) 0.94(69.8%) 0.98(63.6%) 1.33(85.2%)  
NAB(Agn) 1.15(80.7%) 1.17(81.5%) 0.70(91.4%) 0.65(95.4%) 1.43(98.8%)  
NAB(Opt) 1.17(87.0%) 1.21(87.8%) 0.71(93.6%) 0.63(98.2%) 1.40(99.4%)  
SAB(Agn) 1.03(59.2%) 0.99(54.9%) 0.96(60.6%) 0.61(96.4%) 1.50(96.8%)  
SAB(Opt) 1.03(59.5%) 0.99(52.6%) 0.97(60.0%) 0.57(97.9%) 1.49(96.9%)  
Model Abnormal Bilirubin ALT U/L (log) Extent of Extent of Extent of
  pupillary resp. mg/dL (log)   leukocyt. (log) leukopen. (log) thrombocytopen. (log)
PedRESC
PedRESC2
Standard 0.99(56.4%) 1.07(72.5%) 5.60(99.8%) 1.04(68.7%) 0.97(66.6%) 1.01(55.1%)
NAB(Agn) 0.99(55.1%) 1.05(71.5%) 4.88(99.6%) 1.03(63.8%) 0.98(62.5%) 1.01(54.3%)
NAB(Opt) 0.99(54.6%) 1.04(69.6%) 4.92(99.5%) 1.02(63.1%) 0.98(61.5%) 1.01(54.1%)
SAB(Agn) 0.99(57.5%) 1.13(75.5%) 4.87(99.8%) 1.05(66.4%) 0.95(66.5%) 1.01(55.3%)
SAB(Opt) 0.99(57.8%) 1.12(75.2%) 4.60(99.7%) 1.04(64.6%) 0.96(65.1%) 1.01(55.3%)
Model INR VIS (log) Lactate PF ratio (log) Pre-ECMO acute  
      mMol/L (log)   kidney injury  
PedRESC  
PedRESC2  
Standard 1.02(62.0%) 1.01(57.9%) 1.57(85.6%) 0.94(71.4%) 1.00(53.7%)  
NAB(Agn) 1.01(57.9%) 1.01(53.2%) 1.10(77.6%) 0.96(68.8%) 1.00(50.2%)  
NAB(Opt) 1.01(56.7%) 1.00(53.0%) 1.07(75.1%) 0.97(67.6%) 1.00(50.0%)  
SAB(Agn) 1.05(66.8%) 1.02(59.1%) 1.94(90.7%) 0.87(77.2%) 1.00(52.8%)  
SAB(Opt) 1.05(66.8%) 1.02(60.1%) 1.97(90.6%) 0.88(76.8%) 1.00(51.9%)  

Results in bold have percentages exceeding 75.0%.

Fig. 2.

Fig. 2.

Boxplots of posterior draws of odds ratios (ORs) from seven Bayesian logistic models using different priors. All models estimate ORs corresponding to the original covariates (the bottom eleven rows). Five models (all except “PedRESC” and “PedRESC2”) estimate ORs corresponding to the added covariates (the top eleven rows).

Comparing PED-RESC and PED-RESC2, the direction and magnitude of the observed associations in the sets of original covariates were consistent between the two cohorts, with one exception: in PED-RESC2, no patients with a primary diagnosis of asthma died, i.e. the data were quasi-complete separated. All variable importance probabilities were generally closer to 1 in PED-RESC, due to its larger sample size. The regularized horseshoe (Standard) shrinks nearly all ORs, both original and added, close to one. This is one consequence of the size of Inline graphic relative to Inline graphic. In contrast, all of the adaptive priors recover some of the original associations from PED-RESC.

Using the NAB-type priors, the importance probabilities for the original risk factors were all greater than 75%, as well as those of added covariates of ALT and lactate. These two were also important according to Standard. Neither of the SAB priors found PaCOInline graphic, MAP(CMV), admit hours pre-ECMO, malignancy, preECMO milrinone, or DX:Asthma to be important and, among the added covariates, identified bilirubin, ALT, lactate, and PF ratio as important. The posterior means of Inline graphic were 0.61 and 0.64, respectively, for the agnostic versions of NAB and SAB, and 0.84 and 0.82, respectively, for the optimistic versions.

From Figure 2, there are two general differences between Standard and the adaptive priors. For the two original covariates that Standard also identified as important (DX:Pertussis and intubated hours pre-ECMO), the posterior variability of the adaptive priors is smaller than Standard. Among the remaining original covariates, the adaptive priors have larger posterior variability than Standard; this is a consequence of the shrinkage-to-zero prior, which yields small posterior variance for coefficients that it identifies as likely to be zero-valued. This does not mean Standard is automatically preferred, because some of this shrinkage likely reflects an inability to reliably estimate parameters rather than confidence that they are truly zero. In general, the SAB-type priors deviate more from PED-RESC than the NAB-type priors: NAB shrinks Inline graphic directly towards the PED-RESC estimates, whereas SAB shrinks a linear combination of Inline graphic and Inline graphic toward the PED-RESC estimates.

7. Discussion

We have proposed novel adaptive Bayesian updates of a GLM when the historical model only includes a subset of the covariates of interest. The priors, with acronyms ‘NAB’ or ‘SAB’, adaptively combine prior information from the historical model, including underlying statistical uncertainty, with variance-reducing shrinkage to zero. Thus, they are flexible enough for use in many contexts, ranging from the historical information being highly informative about a few parameters to being weakly informative about most parameters, as evidenced by our simulation study. They generally outperformed or matched in performance a standard approach that ignores historical information. We demonstrated these ideas in our motivating case study for predicting short-term mortality risk in pediatric ECMO patients.

Specifically, we combined a registry-based mortality risk prediction model (the historical model) fit to 1611 patients’ data with a broader model that includes biometric measurements (the current model) recorded on 178 patients. A standard shrinkage prior shrunk most ORs, both original and added, to one, which is typical behavior for such priors in the presence of substantial uncertainty. Taking into account a clinical perspective, it seems unlikely that only two of the eleven original covariates identified by Ped-RESCUERS remain as risk factors for mortality after including the added biometric measurements. Agreeing more with our clinical expectation, the adaptive updates included between five and all eleven of the original Ped-RESCUERS risk factors and also identified two (NAB) or four (SAB) of the added covariates as likely risk factors. Some of this “loss” of importance may be due to correlation between the original and added covariates: PaCOInline graphic and lactate are negatively correlated, both associated with the degree of acidosis in the body. Similarly, bilirubin and ALT both measure liver damage, which may explain that the NAB priors focused on ALT alone whereas SAB found both bilirubin and ALT to be important. One surprising non-finding was SAB’s failure to identify malignancy as a meaningful risk factor. NAB did identify malignancy as important, and this is more consistent with the Ped-RESCUERS model as well as our expectation based upon clinical experience and intuition.

The SAB prior depends upon both a simplifying functional approximation (3.6) and an imputation model for Inline graphic given Inline graphic (3.7). In simulated scenarios for which the imputation model was readily estimated and had sufficient predictive ability, namely Inline graphic, these approximations yielded considerable efficiency gains. Its advantage over NAB was most evident in coefficient setting Inline graphic, in which the true value of Inline graphic differed from the misspecified true value of Inline graphic, and so NAB was substantially worse than Standard because the former was biased. Subsequent investigations have also suggested that SAB improves as Inline graphic is closer to zero. In contrast, in settings Inline graphic and Inline graphic, for which Inline graphic and NAB slightly outperformed SAB, the need for imputation was likely to the detriment of SAB (although it still outperformed Standard). Such Inline graphic and small Inline graphic scenarios would correspond, for example, to an exploration of adding a panel of biomarkers, Inline graphic, which are available on just a few subjects, to an established risk prediction model, Inline graphic. In that setting, the covariates Inline graphic are probably weak predictors of Inline graphic. Empirically, we have found that estimating Inline graphic with 100 imputations worked as a good rule of thumb. Of course, there may be underlying differences between the current and historical populations in the true Inline graphic distribution that no amount of current data or number of imputations could recover. NAB is free of this particular distributional assumption and therefore not automatically inferior to SAB in all scenarios, despite the implied value judgment in our nomenclature of ‘naive’ versus ‘sensible’. The only difference between the historical priors in (3.3) and (3.9) being the additive offset Inline graphic in (3.9), replacing it with Inline graphic, Inline graphic, may be one way to leverage the advantages of both adaptive priors. Importantly, both NAB and SAB ignore information on the historical intercept, meaning that they assume that the historical and current models share Inline graphic in common but not necessarily the full data-generating mechanism Inline graphic.

Shrinkage methods classically make a bias-variance tradeoff to improve overall performance: bias in the direction of zero in exchange for a reduction in variance to improve efficiency. In contrast, the adaptive Bayesian updates we propose, which balance between historical-based shrinkage and shrinkage to zero, are making a bias-bias tradeoff. Both extremes of the adaptive priors (Inline graphic and Inline graphic) reduce variance, and the question is rather one of determining, which type of shrinkage is less biased.

Supplementary Material

kxy053_Supplementary_Materials

Acknowledgments

The authors thank two referees, the Associate Editor and the Coordinating Editor for their thorough reviews and helpful comments. Conflict of Interest: None declared.

Funding

This work was partially supported by the National Institutes of Health [P30 CA046592; R01 CA129102] and the Extracorporeal Life Support Organization.

References

  1. Antonelli J., Zigler C. and Dominici F. (2017). Guided Bayesian imputation to adjust for confounding when combining heterogeneous data sources in comparative effectiveness research. Biostatistics 18, 553–568. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Armagan A., Dunson D. B. and Lee J. (2013). Generalized double Pareto shrinkage. Statistica Sinica 23, 119–143. [PMC free article] [PubMed] [Google Scholar]
  3. Barbaro R. P., Boonstra P. S., Kuo K. W., Selewski D. T., Bailly D. K., Stone C. L., Chow J., Annich G. M., Moler F. W. and Paden M. L. (2018). Evaluating pediatric mortality risk prediction among children receiving extracorporeal respiratory support. ASAIO Journal 10.1097/mat.0000000000000813. [DOI] [PubMed] [Google Scholar]
  4. Barbaro R. P., Boonstra P. S., Paden M. L., Roberts L. A., Annich G. M., Bartlett R. H., Moler F. W. and Davis M. M. (2016). Development and validation of the pediatric risk estimate score for children using extracorporeal respiratory support (Ped-RESCUERS). Intensive Care Medicine 42, 879–888. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Becker B. J. and Wu M.-J. (2007). The synthesis of regression slopes in meta-analysis. Statistical Science 22, 414–429. [Google Scholar]
  6. Carpenter B. (2017). Stan: a probabilistic programming language. Journal of Statistical Software 76, 1–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Carvalho C. M., Polson N. G. and Scott J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97, 465–480. [Google Scholar]
  8. Carvalho C. M., Polson N. G. and Scott J. G. (2009). Handling sparsity via the horseshoe In: van Dyk D. and Welling M. (editors), Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, Volume 5 Proceedings of Machine Learning Research, Clearwater Beach, Florida USA: PMLR, pp. 73–80. [Google Scholar]
  9. Chatterjee N., Chen Y.-H., Maas P. and Carroll R. J. (2016). Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. Journal of the American Statistical Association 111, 107–117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Chen H., Manning A. K. and Dupuis J. (2012). A method of moments estimator for random effect multivariate meta-analysis. Biometrics 68, 1278–1284. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Chen M.-H., Ibrahim J. G. and Yiannoutsos C. (1999). Prior elicitation, variable selection and Bayesian computation for logistic regression models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61, 223–242. [Google Scholar]
  12. Cheng W., Taylor J. M. G.,Vokonas P. S.,Park S. K. and Mukherjee B. (2018). Improving estimation and prediction in linear regression incorporating external information from an established reduced model. Statistics in Medicine 37, 1515–1530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Duan Y., Ye K. and Smith E. P. (2006). Evaluating water quality using power priors to incorporate historical information. Environmetrics 17, 95–106. [Google Scholar]
  14. Gelman A., Carlin H. B., Stern H. S., Dunson D. B., Vehtari A. and Rubin D. B. (2014). Bayesian Data Analysis, 3rd edition. Boca Raton, FL: CRC Press. [Google Scholar]
  15. Griffin J. E. and Brown P. J. (2005). Alternative prior distributions for variable selection with very many more variables than observations Working Paper 10, University of Warwick. Centre for Research in Statistical Methodology. [Google Scholar]
  16. Grill S., Ankerst D. P., Gail M. H., Chatterjee N. and Pfeiffer R. M. (2017). Comparison of approaches for incorporating new information into existing risk prediction models. Statistics in Medicine 36, 1134–1156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Ibrahim J. G. and Chen M.-H. (2000). Power prior distributions for regression models. Statistical Science 15, 46–60. [Google Scholar]
  18. Ibrahim J. G., Chen M.-H., Gwon Y. and Chen F. (2015). The power prior: theory and applications. Statistics in Medicine 34, 3724–3749. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Ibrahim J. G., Chen M.-H. and Lipsitz S. R. (2002). Bayesian methods for generalized linear models with covariates missing at random. Canadian Journal of Statistics 30, 55–78. [Google Scholar]
  20. Jackson D. and Riley R. D. (2014). A refined method for multivariate meta-analysis and meta-regression. Statistics in Medicine 33, 541–554. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Neuenschwander B., Branson M. and Spiegelhalter D. J. (2009). A note on the power prior. Statistics in Medicine 28, 3562–3566. [DOI] [PubMed] [Google Scholar]
  22. Park T. and Casella G. (2008). The Bayesian lasso. Journal of the American Statistical Association 103, 681–686. [Google Scholar]
  23. Piironen J. and Vehtari A. (2015). Projection predictive variable selection using Stan+R. arXiv preprint arXiv:1508.02502. https://arxiv.org/abs/1508.02502v1. [Google Scholar]
  24. Piironen J. and Vehtari A. (2017a). On the hyperprior choice for the global shrinkage parameter in the horseshoe prior. In: Singh A. and Zhu J. (editors), Proceedings of the 20th International Conference on Articial Intelligence and Statistics, Volume 54, Proceedings of Machine Learning Research Fort Lauderdale, FL, USA: PMLR, pp. 905–913. [Google Scholar]
  25. Piironen J. and Vehtari A. (2017b). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics 11, 5018–5051. [Google Scholar]
  26. R Core Team. (2016). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; https://www.R-project.org/. [Google Scholar]
  27. Simpson D., Rue H., Riebler A., Martins T. G. and Sørbye S. H. (2017). Penalising model component complexity: A principled, practical approach to constructing priors. Statistical Science 32, 1–28. [Google Scholar]
  28. Stan Development Team (2017). Stan Modeling Language User’s Guide and Reference Manual, Version 2.17.0. http://mc-stan.org/.
  29. Stan Development Team (2018). RStan: the R interface to Stan. R package version 2.17.3. [Google Scholar]
  30. van Buuren S. and Groothuis-Oudshoorn K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software 45, 1–67. [Google Scholar]
  31. Walker E., Hernandez A. V. and Kattan M. W. (2008). Meta-analysis: Its strengths and limitations. Cleveland Clinic Journal of Medicine 75, 431–439. [DOI] [PubMed] [Google Scholar]
  32. Wickham H. (2009). ggplot2: Elegant Graphics for Data Analysis. New York, NY: Springer. [Google Scholar]
  33. Zhou X. and Reiter J. P. (2010). A note on Bayesian inference after multiple imputation. The American Statistician 64, 159–163. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

kxy053_Supplementary_Materials

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press

RESOURCES