Author manuscript; available in PMC: 2020 Dec 11.
Published in final edited form as: Stat Modelling. 2018 Oct 21;19(6):634–652. doi: 10.1177/1471082x18798067

Bayesian Causal Mediation Analysis with Multiple Ordered Mediators

Tianming Gao 1, Jeffrey M Albert 2
PMCID: PMC7731976  NIHMSID: NIHMS992976  PMID: 33312071

Abstract

Causal mediation analysis provides investigators insight into how a treatment or exposure can affect an outcome of interest through one or more mediators on the causal pathway. When multiple mediators on the pathway are causally ordered, identification of mediation effects on certain causal pathways requires a sensitivity parameter to be specified. A mixed model-based approach is proposed in the Bayesian framework to connect potential outcomes at different treatment levels and to identify mediation effects, independent of a sensitivity parameter, for the natural direct and indirect effects on all causal pathways. The proposed method is illustrated in a linear setting for mediators and outcome, with mediator-treatment interactions. Sensitivity analysis is performed for the prior choices in the Bayesian models. The proposed Bayesian method is applied to an adolescent dental health study to examine how socioeconomic status can affect dental caries through a sequence of causally ordered mediators, dental visits and oral hygiene index.

Keywords: Bayesian, Causally ordered, Multiple mediators, Sensitivity analysis

1. Introduction

In social and biomedical sciences, researchers are often interested in the effect of a treatment or exposure on an outcome through one or more mediators. Much work has been done in the case of a single mediator. Baron and Kenny (1986) provided a ‘causal steps’ approach to assessing a potential mediator, but this method is limited to simple linear models for the mediator and outcome where no interaction of mediator and treatment on outcome is allowed. To deal with more complex models, methods using a potential outcomes framework have been proposed (Robins and Greenland, 1992; Pearl, 2001; Albert, 2008; Imai et al., 2010).

In the case of multiple mediators, one can focus on the effect mediated through the whole group of mediators simultaneously (VanderWeele and Vansteelandt, 2014), or on a finer structured mediation through each of the multiple mediators. For the latter, methods have been proposed for causally independent mediators (MacKinnon, 2000; Imai and Yamamoto, 2013; Wang et al., 2013; Taguri et al., 2015).

For multiple causally ordered mediators, previous work has treated the first mediator on the causal pathway as a post-exposure confounder to focus on mediation effects through the second mediator only (Tchetgen and Shpitser, 2012; Imai and Yamamoto, 2013; VanderWeele and Chiba, 2014; VanderWeele et al., 2014; De Stavola et al., 2015).

Baron and Kenny (1986)’s approach can be extended to two or more causally ordered mediators, provided all mediators and outcome are linearly modeled with no interaction between treatment and any mediator; mediation effects can then be identified as products of regression coefficients, namely, the coefficient of the treatment in the mediator model times the coefficient of the mediator in the final outcome model. When interactions of mediators and treatment are present in the linear models, the product of regression coefficients approach is no longer adequate, generally, to estimate the mediation effect and potential outcomes approaches are needed to identify mediation effects (VanderWeele and Vansteelandt, 2009).
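To make the product-of-coefficients idea concrete, consider a sketch with two ordered mediators, no interactions and no covariates. The coefficient symbols are introduced here purely for illustration:

$$
\begin{aligned}
M_1 &= \beta_0 + \beta_1 T + \epsilon_1 \\
M_2 &= \gamma_0 + \gamma_1 T + \gamma_2 M_1 + \epsilon_2 \\
Y &= \alpha_0 + \alpha_1 T + \alpha_2 M_1 + \alpha_3 M_2 + \epsilon_3
\end{aligned}
$$

Changing T from 0 to 1 shifts the expected M1 by β1 and the expected M2 by γ1 + γ2β1, so the expected change in Y is

$$
\alpha_1 + \beta_1 \alpha_2 + \gamma_1 \alpha_3 + \beta_1 \gamma_2 \alpha_3 ,
$$

where the four terms correspond to the direct path, the path through M1 alone, the path through M2 alone, and the path through both mediators. Once a treatment-mediator product term enters any of the three equations, these path contributions depend on the levels at which the other arguments are held, which is why this simple accounting breaks down.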

Albert and Nelson (2011) first presented a generalized mediation analysis through two ordered mediators with identification for all paths, including those through the first mediator alone, and through both mediators. Daniel et al. (2015) provided identification of all causal pathways via the finest decomposition of the total causal effect into direct and indirect effects, in a setting of linear models with possible mediator by treatment interactions. Even after making an extended sequential ignorability assumption, certain path effects through the first mediator on the causal pathway are not identifiable without further assumptions. Sensitivity analysis was proposed using a parameter representing the association of the potential outcomes of the first mediator at the two different treatment levels (Albert and Nelson, 2011; Daniel et al., 2015). The mediation effects can be estimated at each specified value of the sensitivity parameter and the dependence of the estimates on the sensitivity parameter can be assessed.

The literature has not yet considered the use of a distribution for the sensitivity parameter (over its space) which could provide an overall mediation effect estimate. Such an approach would be especially appealing when the researchers have some prior knowledge of the sensitivity parameter. Even when no such prior knowledge is available, it may still be possible and useful to obtain an overall mediation effect.

Bayesian analysis provides a natural tool for such purposes. Computation of an overall mediation effect that provides a summary measure over a distribution on the sensitivity parameter space can be implemented naturally in the Bayesian framework by giving a prior distribution to the sensitivity parameter. Previous approaches to Bayesian inference in causal mediation analysis include extensions of Baron and Kenny (1986)’s linear model approach to Bayesian models (Yuan and MacKinnon, 2009), and potential outcomes based Bayesian inference in the case of a single mediator (Park and Kaplan, 2015).

In this work, we present methods for causal mediation analysis to identify causal effects along all pathways using a Bayesian framework. The Markov Chain Monte Carlo (MCMC) algorithm, proposed by Tanner and Wong (1987), is employed in this work to conduct the Bayesian analysis. Convergence of the Markov chains is checked both visually in trace plots and with the Raftery-Lewis diagnostic to ensure that posterior draws are from the stationary distributions (Raftery and Lewis, 1995).

The organization of this paper is as follows. We first provide notation and definitions of mediation effects in the potential outcomes framework. In the next part, Bayesian causal mediation analysis for two causally ordered mediators is introduced. The performance of the model is then evaluated in simulation studies, and sensitivity analysis is conducted on the simulated data. We then apply the proposed Bayesian methods to a study of adolescent dental health, where the effect of socioeconomic status (SES) on dental caries occurs via causal pathways through ordered mediators of dental visits and oral hygiene index (OHI). The paper ends with additional discussion of the proposed methods.

2. Notation

The notation follows the potential outcomes framework of Rubin (1974). Let Ti denote the observed binary treatment (or exposure) for subject i, with Ti = 1 if treated and Ti = 0 if not, Mi denote the observed mediator status for subject i, and Yi denote the observed outcome for subject i. The subscript i is dropped in subsequent notations for simplicity. We use the following notation for potential outcomes (that is, outcomes that would be obtained under a given, possibly contrary to observed, treatment status). M(t) is the potential outcome of mediator M if the treatment level is set to t. Y(t) is the potential outcome of Y if the treatment level is set to t. Y(t,m) is the potential outcome of Y if the treatment level is set to t, and the mediator level is set to m. Y(t, M(t′)) is the potential outcome of Y if the treatment level is set to t, and the mediator level is set to the level it would take when the treatment level is t′.

Under sequential ignorability assumptions (Imai et al., 2010), the natural direct effect is identified as

$$D(t) = E\{Y(1, M(t)) - Y(0, M(t))\}, \quad t = 0, 1$$

and the natural indirect effect is identified as,

$$I(t) = E\{Y(t, M(1)) - Y(t, M(0))\}, \quad t = 0, 1$$

The expectations are obtained as averages over the individuals.

The total causal effect is defined as,

$$\mathrm{TCE} = E\{Y(1, M(1)) - Y(0, M(0))\} = D(t) + I(1 - t), \quad t = 0, 1$$

Thus, in the single mediator case, there are two ways of decomposing the total causal effect into a natural direct effect and a natural indirect effect.
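The two decompositions can be checked numerically. The following is a minimal sketch using a toy linear model with a treatment-mediator interaction; the model and its coefficients are hypothetical and chosen only for illustration, and the function name `natural_effects` is not from the paper.

```python
import random


def natural_effects(n=100_000, seed=1):
    """Monte Carlo natural effects for a toy single-mediator model.

    Hypothetical model, chosen only for illustration:
        M(t)    = 1.0 * t + e_m
        Y(t, m) = 2.0 * t + 1.5 * m + 0.5 * t * m + e_y
    """
    rng = random.Random(seed)

    def y(t, m, e_y):
        return 2.0 * t + 1.5 * m + 0.5 * t * m + e_y

    totals = {"D(0)": 0.0, "D(1)": 0.0, "I(0)": 0.0, "I(1)": 0.0, "TCE": 0.0}
    for _ in range(n):
        e_m, e_y = rng.gauss(0, 1), rng.gauss(0, 1)
        m = (e_m, 1.0 + e_m)  # (M(0), M(1)) for this simulated subject
        for t in (0, 1):
            # Direct effect: switch treatment, hold the mediator at M(t).
            totals[f"D({t})"] += y(1, m[t], e_y) - y(0, m[t], e_y)
            # Indirect effect: hold treatment at t, switch the mediator.
            totals[f"I({t})"] += y(t, m[1], e_y) - y(t, m[0], e_y)
        totals["TCE"] += y(1, m[1], e_y) - y(0, m[0], e_y)
    return {name: total / n for name, total in totals.items()}


effects = natural_effects()
print(effects["D(0)"] + effects["I(1)"], effects["D(1)"] + effects["I(0)"], effects["TCE"])
```

Both sums agree with the total causal effect up to floating-point error, while D(0) and D(1) differ from each other because of the interaction term.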

In the case of two causally ordered mediators, we need the following potential outcomes to specify mediation effects along all pathways: M1(t), M2(t, m1), Y(t, m1, m2), M2(t″, M1(t‴)) and Y(t, M1(t′), M2(t″, M1(t‴))), which are defined by extending the single-mediator definitions given earlier in this section.

There are a total of 16 versions of potential outcomes of Y as each of t, t′, t″ and t‴ can take values of either zero or one. The direct and indirect effects can be defined from the difference between pairs of expected values of potential outcomes for Y.

For natural direct effects, there are eight possible versions depending on t′, t″ and t‴,

$$D(t', t'', t''') = E\{Y(1, M_1(t'), M_2(t'', M_1(t'''))) - Y(0, M_1(t'), M_2(t'', M_1(t''')))\}$$

For natural indirect effects through M1 alone, the general expression is,

$$I_1(t, t'', t''') = E\{Y(t, M_1(1), M_2(t'', M_1(t'''))) - Y(t, M_1(0), M_2(t'', M_1(t''')))\}$$

For natural indirect effects through M2 alone,

$$I_2(t, t', t''') = E\{Y(t, M_1(t'), M_2(1, M_1(t'''))) - Y(t, M_1(t'), M_2(0, M_1(t''')))\}$$

For natural indirect effects through both M1 and M2,

$$I_{12}(t, t', t'') = E\{Y(t, M_1(t'), M_2(t'', M_1(1))) - Y(t, M_1(t'), M_2(t'', M_1(0)))\}$$

In the above definitions of mediation effects, t, t′, t″ and t‴ can each take the value 0 or 1, giving eight versions each of the natural direct effect D and the natural indirect effects I1, I2 and I12.

Total causal effect is:

$$\mathrm{TCE} = E\{Y(1, M_1(1), M_2(1, M_1(1))) - Y(0, M_1(0), M_2(0, M_1(0)))\}$$

As Daniel et al. (2015) showed, there are 24 ways of decomposing the total causal effect into particular path effects of D, I1, I2 and I12. Causal mediation effects are initially conditioned on a set of covariates (potential confounders). Marginal causal mediation effects are then obtained by integrating or summing over the theoretical or empirical distribution of confounders.
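One way to see where the count of 24 comes from is to switch the four arguments of Y(t, M1(t′), M2(t″, M1(t‴))) from 0 to 1 one at a time: each of the 4! = 24 orderings gives a telescoping sum of one D, one I1, one I2 and one I12 term. The sketch below enumerates these orderings; the function and variable names are illustrative and not taken from the paper or from Daniel et al. (2015).

```python
from itertools import permutations

# Position 0 is the direct argument t; positions 1, 2, 3 are t', t'', t'''
# in Y(t, M1(t'), M2(t'', M1(t'''))). Switching a position from 0 to 1
# yields the effect type listed here.
EFFECT_BY_POSITION = {0: "D", 1: "I1", 2: "I2", 3: "I12"}


def decompositions():
    """Return one list of (effect, fixed levels of the other arguments) per ordering."""
    result = []
    for order in permutations(range(4)):
        state = [0, 0, 0, 0]
        terms = []
        for pos in order:
            # Levels of the three arguments that are held fixed for this term.
            others = tuple(s for i, s in enumerate(state) if i != pos)
            terms.append((EFFECT_BY_POSITION[pos], others))
            state[pos] = 1
        result.append(terms)
    return result


all_decompositions = decompositions()
print(len(all_decompositions))
print(all_decompositions[0])
```

The first ordering, for example, reads TCE = D(0,0,0) + I1(1,0,0) + I2(1,1,0) + I12(1,1,1), with each effect's arguments listed in the order used in its definition above.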

3. Bayesian Causal Mediation Analysis with Multiple Causally Ordered Mediators

3.1. Identification of Mediation Effects

As shown by Daniel et al. (2015), identification of mediation effects with two ordered mediators, defined using the potential outcomes framework outlined in section 2, requires an extension of sequential ignorability assumptions from Imai et al. (2010). The extended assumptions are presented in equation (3.1), where X is a vector of baseline confounders to be controlled for:

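$$
\begin{aligned}
\{Y(t, m_1, m_2),\, M_2(t', m_1),\, M_1(t'')\} &\perp\!\!\!\perp T \mid X = x \\
\{Y(t, m_1, m_2),\, M_2(t', m_1)\} &\perp\!\!\!\perp M_1(t'') \mid T = t'', X = x \\
Y(t, m_1, m_2) &\perp\!\!\!\perp M_2(t', m_1) \mid T = t', M_1 = m_1, X = x
\end{aligned}
\tag{3.1}
$$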

The sequential ignorability assumptions in equation (3.1) indicate that conditional on baseline confounder vector X, the treatment T is randomized (ignorable) for subsequent mediators M1, M2 and outcome Y. The assumptions also indicate that conditional on confounder vector X and observed treatment T, mediator M1 is randomized (ignorable) for mediator M2 and outcome Y; and conditional on observed confounder list X, treatment T and mediator M1, mediator M2 is randomized (ignorable) for outcome Y. That is, there are no unmeasured confounders for the Y—{T, M1, M2}, M2—{T, M1} and M1—T relationships. Under these assumptions, as well as consistency (see Daniel et al. (2015)), the expected potential outcomes of Y can be expressed as

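$$
\begin{aligned}
E\{Y(t, M_1(t'), M_2(t'', M_1(t''')))\} = {} & \int_X \int_{M_1} \int_{M_1} \int_{M_2} E(Y \mid T = t, M_1 = m_1, M_2 = m_2, X = x) \\
& \times f_{M_2 \mid T, M_1, X}(m_2 \mid t'', m_1', x) \\
& \times f_{M_1(t'), M_1(t''') \mid T, T, X}(m_1, m_1' \mid t', t''', x) \\
& \times f_X(x)\, d\mu_{M_2}(m_2)\, d\mu_{M_1}(m_1)\, d\mu_{M_1}(m_1')\, d\mu_X(x)
\end{aligned}
\tag{3.2}
$$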

One term in equation (3.2), f_{M1(t′),M1(t‴)|T,T,X}(m1, m1′ | t′, t‴, x), cannot be estimated from the observed data when t′ ≠ t‴, as it involves the joint distribution of potential values of mediator M1 at different treatment levels. Consequently, an additional assumption is needed for identification of mediation (path) effects involving potential outcomes where t′ ≠ t‴. In the case of t′ = t‴, the joint distribution of the two potential values of mediator M1 reduces to the marginal distribution of a single potential outcome of M1, and is identifiable under the extended sequential ignorability assumptions.

Daniel et al. (2015) considered the relationship between M1(0) and M1(1), where the correlation between the ‘cross-world’ potential outcomes can vary from totally independent to perfectly correlated (given X). A sensitivity parameter was introduced varying from zero, for complete independence between M1(0) and M1(1), to one, for perfect correlation between M1(0) and M1(1). The mediation effects can then be identified up to this sensitivity parameter.

In this work, a mixed model-based approach is proposed and implemented in a Bayesian framework so that ‘cross-world’ potential outcomes are jointly estimated and identification of mediation effects for all causal pathways is achieved. Moreover, the mediation effects are not identified up to a sensitivity parameter, but are summarized over a distribution of the sensitivity parameters.

From equation (3.2) we know that the joint distribution of potential outcomes of the first mediator, M1, at different treatment levels is needed to identify all mediation effects. However, only one of the potential outcomes from M1(0) and M1(1) is observed depending on the observed treatment level, so we would need an additional assumption to specify the joint distribution of the two potential outcomes of mediator M1.

The approach we propose is based on a mixed model for potential outcomes for M1. In mixed models, besides the fixed effects, we have random effects. For example, the model may include a random intercept and a random slope, and a correlation between the random intercept and the random slope can be specified in the covariance structure for the random effects.

The mixed model for mediator M1 will include treatment T and possibly a vector of confounders X as fixed effect covariates, as well as a subject-specific random intercept and random slope of treatment T (linking both potential outcomes on the same subject), with the subjects being the individual patients in the study. Unfortunately, M1 is observed at a treatment level of either zero or one for each subject, but not at both levels, so it is impossible to fit a regular mixed model with a random slope of treatment T. A mixed model with only a random intercept can still be fitted given the observed data, since all we need for that model is the observed M1 and the estimates based on the fixed effects. If we introduce a correlation between the estimable random intercept and the inestimable random slope, we can then directly estimate the random slope given the observed random intercept to make inference on M1(0) from M1(1) or vice versa.

The mixed model for linear mediator M1 is given in equation (3.3), where M1i is the value of M1, Ti is a binary treatment and Xi is the baseline covariate vector for subject i. The mixed model has β=(β0,β1,β2) as the fixed effects, ϵi as the normal errors, and bi = (b0i, b1i)′ as the random effects for subject i, which are distributed as bivariate normal with mean 0 and covariance matrix Vb.

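$$
\begin{aligned}
M_{1i} &= \beta_0 + \beta_1 T_i + \beta_2 X_i + b_{0i} + b_{1i} T_i + \epsilon_i \\
\epsilon_i &\sim N(0, \sigma_1^2) \\
b_i = (b_{0i}, b_{1i})' &\sim N(0, V_b)
\end{aligned}
\tag{3.3}
$$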

The potential values of M1 for subject i can then be predicted from the following equations:

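$$
\begin{aligned}
M_{1i}(0) &= \beta_0 + \beta_2 X_i + b_{0i} + \epsilon_i \\
M_{1i}(1) &= \beta_0 + \beta_1 + \beta_2 X_i + b_{0i} + b_{1i} + \epsilon_i
\end{aligned}
$$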

The covariance matrix of random effects Vb in equation (3.3) can be considered as sensitivity parameters to link M1(0) and M1(1) on the same subject and allow identification along all causal pathways. If we supply priors to the covariance matrix, and to all other regression parameters in the mediators and outcome models, then a fully Bayesian approach can be carried out to estimate all potential outcomes and subsequently direct and indirect effects corresponding to all causal pathways.
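To illustrate how Vb ties the two potential values of M1 together, the sketch below draws (M1(0), M1(1)) pairs from the two prediction equations, taking them literally with the same ϵi in both, and compares the sample correlation with the value implied by Vb and the error variance. All function names and the parameter values in the usage lines are illustrative assumptions; only the Python standard library is used.

```python
import math
import random


def draw_m1_pairs(n, beta, sigma1_sq, vb, x=0.0, seed=0):
    """Draw (M1(0), M1(1)) pairs from the random intercept and slope model."""
    rng = random.Random(seed)
    sd_b0 = math.sqrt(vb[0][0])
    slope = vb[0][1] / vb[0][0]  # regression coefficient of b1 on b0
    sd_b1_given_b0 = math.sqrt(vb[1][1] - vb[0][1] ** 2 / vb[0][0])
    pairs = []
    for _ in range(n):
        b0 = sd_b0 * rng.gauss(0, 1)
        b1 = slope * b0 + sd_b1_given_b0 * rng.gauss(0, 1)
        eps = math.sqrt(sigma1_sq) * rng.gauss(0, 1)  # shared by both potential values
        m1_0 = beta[0] + beta[2] * x + b0 + eps
        m1_1 = beta[0] + beta[1] + beta[2] * x + b0 + b1 + eps
        pairs.append((m1_0, m1_1))
    return pairs


def implied_correlation(sigma1_sq, vb):
    """Correlation of M1(0) and M1(1), given X, implied by the two equations."""
    var0 = vb[0][0] + sigma1_sq
    var1 = vb[0][0] + 2 * vb[0][1] + vb[1][1] + sigma1_sq
    cov = vb[0][0] + vb[0][1] + sigma1_sq
    return cov / math.sqrt(var0 * var1)


def sample_correlation(pairs):
    n = len(pairs)
    mean0 = sum(p[0] for p in pairs) / n
    mean1 = sum(p[1] for p in pairs) / n
    cov = sum((p[0] - mean0) * (p[1] - mean1) for p in pairs)
    var0 = sum((p[0] - mean0) ** 2 for p in pairs)
    var1 = sum((p[1] - mean1) ** 2 for p in pairs)
    return cov / math.sqrt(var0 * var1)


# Illustrative values only.
vb_example = [[1.0, 0.5], [0.5, 1.0]]
pairs = draw_m1_pairs(50_000, beta=(0.0, 2.0, 1.0), sigma1_sq=1.0, vb=vb_example)
print(sample_correlation(pairs), implied_correlation(1.0, vb_example))
```

Changing the off-diagonal element of `vb_example` changes the cross-world correlation, which is the sense in which Vb plays the role of a sensitivity parameter.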

3.2. Proposed Bayesian Models for Causal Mediation Analysis with Two Ordered Mediators

In this section, we present a complete Bayesian mediation analysis for linear structural models involving two causally ordered mediators and interactions between mediators and treatment. In equation (3.3), a mixed model was presented for linear mediator M1; we now extend this to the Bayesian framework by supplying priors to all parameters in the model including fixed and random effects parameters. The Bayesian model for mediator M1 is proposed as,

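$$
\begin{aligned}
M_{1i} &= \beta_0 + \beta_1 T_i + \beta_2 X_i + b_{0i} + b_{1i} T_i + \epsilon_{1i} \\
\epsilon_{1i} &\sim N(0, \sigma_1^2) \\
b_i = (b_{0i}, b_{1i})' &\sim N(0, V_b) \\
\beta = (\beta_0, \beta_1, \beta_2) &\sim N(\mu_\beta, V_\beta) \\
\sigma_1^2 &\sim \mathrm{InvGamma}(\nu, \delta) \\
V_b &\sim \mathrm{InvWishart}(r, rR)
\end{aligned}
\tag{3.4}
$$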

where a multivariate normal prior is given to the fixed effects vector β, an inverse-gamma prior is given to the error variance σ₁², and an inverse-Wishart prior is given to the covariance matrix of random effects Vb. All priors are in the form of conjugate priors to facilitate the calculation of posterior parameter distributions, which is true for models (3.5) and (3.6) as well.

The Bayesian model for the mediator M2 is proposed as,

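$$
\begin{aligned}
M_{2i} &= \gamma_0 + \gamma_1 T_i + \gamma_2 M_{1i} + \gamma_3 U_i + \gamma_{12} T_i M_{1i} + \epsilon_{2i} \\
\epsilon_{2i} &\sim N(0, \sigma_2^2) \\
\gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3, \gamma_{12}) &\sim N(\mu_\gamma, V_\gamma) \\
\sigma_2^2 &\sim \mathrm{InvGamma}(\nu_1, \delta_1)
\end{aligned}
\tag{3.5}
$$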

where Ui is a list of confounders to adjust for mediator M2. A multivariate normal prior is given to the vector γ, and an inverse-gamma prior is given to the error variance σ₂².

The Bayesian model for the outcome Y is proposed as,

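$$
\begin{aligned}
Y_i &= \alpha_0 + \alpha_1 T_i + \alpha_2 M_{1i} + \alpha_3 M_{2i} + \alpha_4 W_i + \alpha_{12} T_i M_{1i} + \alpha_{13} T_i M_{2i} + \alpha_{23} M_{1i} M_{2i} + \alpha_{123} T_i M_{1i} M_{2i} + \epsilon_{3i} \\
\epsilon_{3i} &\sim N(0, \sigma_3^2) \\
\alpha = (\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_{12}, \alpha_{13}, \alpha_{23}, \alpha_{123}) &\sim N(\mu_\alpha, V_\alpha) \\
\sigma_3^2 &\sim \mathrm{InvGamma}(\nu_2, \delta_2)
\end{aligned}
\tag{3.6}
$$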

where Wi is a vector of confounders. A multivariate normal prior is given to the vector α, and an inverse-gamma prior is given to the error variance σ₃². The models for M1, M2 and Y do not have to include the same set of confounders.

The estimation of potential outcomes and subsequently mediation effects is based upon an extension of Pearl’s mediation formula (related to Robins’ g-computation algorithm) into the Bayesian framework (Robins, 1986; Pearl, 2012). In the frequentist literature, Monte Carlo methods have been proposed when integrations in the mediation formula or g-computation approach are too complicated to implement (Robins, 1989). The g-computation method was modified for Bayesian inference in a case of a single mediator (Park and Kaplan, 2015).

The Bayesian causal mediation analysis with two causally ordered mediators is carried out in the following steps.

  1. The model (3.4) for mediator M1 is fit with priors specified on all parameters, including fixed and random effects, as well as variance parameters.

  2. The values of potential outcomes M1(0) and M1(1) are predicted jointly for each individual using the M1 model parameter posteriors from step 1.

  3. The model (3.5) for mediator M2 is fit with priors specified on all parameters.

  4. The values of M2(t″, M1(t‴)) for each individual are calculated using the predicted value of M1(t‴) from step 2, treatment level t″ and the interaction term t″ × M1(t‴), with the parameter posteriors from the M2 model in step 3.

  5. The model (3.6) for outcome Y is fit with priors specified on all parameters.

  6. The predicted potential outcomes Y(t, M1(t′), M2(t″, M1(t‴))) for each individual are then calculated using the predicted values of M2(t″, M1(t‴)) from step 4, the predicted values of M1(t′) from step 2 on the same individual, treatment level t, and the relevant interaction terms t × M1(t′), t × M2(t″, M1(t‴)), M1(t′) × M2(t″, M1(t‴)) and t × M1(t′) × M2(t″, M1(t‴)), along with parameter posteriors from the Y model in step 5.

  7. The natural direct and indirect effects are calculated from the predicted potential outcomes Y(t, M1(t′), M2(t″, M1(t‴))) for each individual, and then averaged over individuals for causal mediation effects for the population.

  8. The mediation effects are then summarized from the posterior distributions, using posterior means, Bayesian credible intervals, and frequentist measures such as the standard deviation and mean square error.
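The prediction steps above (steps 4, 6 and 7) can be sketched for a single posterior draw as follows. This is not the authors' R implementation: model fitting (steps 1, 3 and 5) is assumed to have already produced one draw of γ and α and one predicted pair (M1(0), M1(1)) per subject, residual draws are omitted so that conditional means are propagated, and all function names and numeric values are hypothetical.

```python
def m2_mean(t2, m1, x, gamma):
    """Mean of M2 under model (3.5) for treatment level t2 and mediator value m1."""
    g0, g1, g2, g3, g12 = gamma
    return g0 + g1 * t2 + g2 * m1 + g3 * x + g12 * t2 * m1


def y_mean(t, m1, m2, x, alpha):
    """Mean of Y under model (3.6), including all treatment-mediator interactions."""
    a0, a1, a2, a3, a4, a12, a13, a23, a123 = alpha
    return (a0 + a1 * t + a2 * m1 + a3 * m2 + a4 * x
            + a12 * t * m1 + a13 * t * m2 + a23 * m1 * m2 + a123 * t * m1 * m2)


def potential_y(t, t1, t2, t3, m1_pair, x, gamma, alpha):
    """Y(t, M1(t1), M2(t2, M1(t3))) for one subject; m1_pair = (M1(0), M1(1))."""
    m2 = m2_mean(t2, m1_pair[t3], x, gamma)       # step 4
    return y_mean(t, m1_pair[t1], m2, x, alpha)   # step 6


def path_effects(subjects, gamma, alpha):
    """Step 7 for one decomposition: average subject-level contrasts."""
    n = len(subjects)

    def avg(hi, lo):
        return sum(potential_y(*hi, m1, x, gamma, alpha)
                   - potential_y(*lo, m1, x, gamma, alpha)
                   for m1, x in subjects) / n

    return {
        "D(0,0,0)": avg((1, 0, 0, 0), (0, 0, 0, 0)),
        "I1(1,0,0)": avg((1, 1, 0, 0), (1, 0, 0, 0)),
        "I2(1,1,0)": avg((1, 1, 1, 0), (1, 1, 0, 0)),
        "I12(1,1,1)": avg((1, 1, 1, 1), (1, 1, 1, 0)),
        "TCE": avg((1, 1, 1, 1), (0, 0, 0, 0)),
    }


# Hypothetical inputs: two subjects as ((M1(0), M1(1)), x) and one parameter draw.
subjects = [((0.3, 2.1), 1.0), ((-0.5, 1.9), 0.0)]
gamma_draw = (0.0, 3.0, 2.0, 1.0, 4.0)
alpha_draw = (0.0, 4.0, 3.0, 2.0, 1.0, 3.0, 0.5, 0.2, 0.1)
print(path_effects(subjects, gamma_draw, alpha_draw))
```

Repeating `path_effects` over every retained posterior draw would give a posterior sample for each effect, which step 8 then summarizes. The four path effects returned here sum to the TCE because the contrasts telescope.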

To simplify the algorithm and reduce the computational burden, the number of posterior draws from models (3.4), (3.5) and (3.6) are the same, and this determines the number of generated potential mediator and outcome values for each individual. For a given subject, a posterior value of potential mediators and outcome can be viewed as a random sample from the posterior distribution of potential mediators and the outcome.

Markov chain Monte Carlo (MCMC) methods are used to fit the Bayesian models. The number of Markov chains is one. The number of burn-in iterations is 500, and total number of iterations is 1500, so the number of draws from each stationary posterior distribution is 1000.

The MCMC algorithms are implemented in the statistical software R, and the code is available in the supplemental materials. The computations for a modest-size data set are fairly quick. The time to run the analysis using the adolescent dental data (N=200) was less than two minutes on a computer with an Intel Core i5-7500 processor.

3.3. Simulation Studies

The performances of the proposed Bayesian mediation analysis methods are evaluated in simulation studies in this section. As suggested by Little (2006), calibrated Bayes, a compromise between frequentist and Bayesian approach, could capture the strengths of both approaches, and inference under a certain model should be Bayesian, but model assessment must include frequentist approaches. In this section, frequentist properties like bias, mean square error and standard deviation will be evaluated to assess the Bayesian mediation analysis methods.

To evaluate the proposed Bayesian mediation analysis methods with multiple ordered mediators, we need to first obtain the true mediation effects under a set of known parameters (in models (3.4) to (3.6)) in order to calculate the bias and mean square error. The interaction of mediators and treatment in the linear models makes closed form solutions unavailable for all mediation effects. Instead, the Monte Carlo method was used to calculate the true mediation effects under a set of known parameters. Monte Carlo estimates converge to the corresponding true values at a rate of 1/√N by the central limit theorem, where N is the sample size. In the simulations, true mediation effects were calculated from the mean of the estimates from 100 Monte Carlo simulations, each using a sample size of 10 000 to further reduce the Monte Carlo error.

In the simulation studies, the models for mediators and outcome are given in equations (3.7) to (3.9), where an interaction of mediator M1 and treatment T was included in the models for mediator M2 and outcome Y. For outcome Y, except for the interaction of mediator M1 and treatment T, no other two-way interactions or three-way interaction were included in the model; we note that models involving three-way interactions would require much larger sample size for accurate estimation of potential outcomes and mediation effects.

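$$
\begin{aligned}
M_{1i} &= \beta_0 + \beta_1 T_i + \beta_2 X_i + b_{0i} + b_{1i} T_i + \epsilon_{1i} \\
\epsilon_{1i} &\sim N(0, \sigma_1^2) \\
b_i = (b_{0i}, b_{1i})' &\sim N(0, V_b) \\
\beta = (\beta_0, \beta_1, \beta_2) &\sim N(\mu_\beta, V_\beta) \\
\sigma_1^2 &\sim \mathrm{InvGamma}(\nu, \delta) \\
V_b &\sim \mathrm{InvWishart}(r, rR)
\end{aligned}
\tag{3.7}
$$

$$
\begin{aligned}
M_{2i} &= \gamma_0 + \gamma_1 T_i + \gamma_2 M_{1i} + \gamma_3 X_i + \gamma_{12} T_i M_{1i} + \epsilon_{2i} \\
\epsilon_{2i} &\sim N(0, \sigma_2^2) \\
\gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3, \gamma_{12}) &\sim N(\mu_\gamma, V_\gamma) \\
\sigma_2^2 &\sim \mathrm{InvGamma}(\nu_1, \delta_1)
\end{aligned}
\tag{3.8}
$$

$$
\begin{aligned}
Y_i &= \alpha_0 + \alpha_1 T_i + \alpha_2 M_{1i} + \alpha_3 M_{2i} + \alpha_4 X_i + \alpha_{12} T_i M_{1i} + \epsilon_{3i} \\
\epsilon_{3i} &\sim N(0, \sigma_3^2) \\
\alpha = (\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_{12}) &\sim N(\mu_\alpha, V_\alpha) \\
\sigma_3^2 &\sim \mathrm{InvGamma}(\nu_2, \delta_2)
\end{aligned}
\tag{3.9}
$$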

In the following simulation studies, the covariate X, common to all three models in (3.7) to (3.9), is binary, taking values 0 and 1 with a 60% probability of being equal to 1. Treatment T is also binary, taking values 0 and 1 with a 50% probability of being equal to 1. T is not affected by X (or any other covariate) and thus represents a randomized variable. The set of parameters used to generate the simulation data are: $\beta = (0, 2, 1)$, $\sigma_1^2 = 1$, $V_b = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$, $\gamma = (0, 3, 2, 1, 4)$, where the last element corresponds to the interaction of M1 and T in predicting M2, $\sigma_2^2 = 1$, $\alpha = (0, 4, 3, 2, 1, 3)$, where the last element corresponds to the interaction of M1 and T in predicting Y, and $\sigma_3^2 = 1$.
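As a rough illustration of the Monte Carlo calculation of a true effect, the sketch below approximates the total causal effect under the data-generating values listed above, reading the parameter vectors in the order of equations (3.7) to (3.9). It is not the authors' code: it uses only the Python standard library, the function names are illustrative, and the mean-zero residuals of M2 and Y are left out because they do not change the expected contrast in this model. For the total causal effect in this particular model the expectation can also be worked out by hand under this reading (52.6), which gives a check on the sketch.

```python
import math
import random

BETA = (0.0, 2.0, 1.0)                   # (beta0, beta1, beta2)
GAMMA = (0.0, 3.0, 2.0, 1.0, 4.0)        # (gamma0, gamma1, gamma2, gamma3, gamma12)
ALPHA = (0.0, 4.0, 3.0, 2.0, 1.0, 3.0)   # (alpha0, alpha1, alpha2, alpha3, alpha4, alpha12)
VB = ((1.0, 0.5), (0.5, 1.0))            # random effect covariance, as printed
SIGMA1 = 1.0                             # sd of the M1 error


def one_replicate_tce(n, rng):
    """Average of Y(1, M1(1), M2(1, M1(1))) - Y(0, M1(0), M2(0, M1(0))) over n subjects."""
    sd_b1_given_b0 = math.sqrt(VB[1][1] - VB[0][1] ** 2 / VB[0][0])
    total = 0.0
    for _ in range(n):
        x = 1.0 if rng.random() < 0.6 else 0.0
        b0 = math.sqrt(VB[0][0]) * rng.gauss(0, 1)
        b1 = (VB[0][1] / VB[0][0]) * b0 + sd_b1_given_b0 * rng.gauss(0, 1)
        eps = SIGMA1 * rng.gauss(0, 1)
        m1 = (BETA[0] + BETA[2] * x + b0 + eps,
              BETA[0] + BETA[1] + BETA[2] * x + b0 + b1 + eps)
        y = []
        for t in (0, 1):
            m2 = (GAMMA[0] + GAMMA[1] * t + GAMMA[2] * m1[t]
                  + GAMMA[3] * x + GAMMA[4] * t * m1[t])
            y.append(ALPHA[0] + ALPHA[1] * t + ALPHA[2] * m1[t] + ALPHA[3] * m2
                     + ALPHA[4] * x + ALPHA[5] * t * m1[t])
        total += y[1] - y[0]
    return total / n


def monte_carlo_tce(n=10_000, reps=20, seed=0):
    """Mean over `reps` replicates, each of size n, mirroring the averaging in the text."""
    rng = random.Random(seed)
    return sum(one_replicate_tce(n, rng) for _ in range(reps)) / reps


print(monte_carlo_tce())
```

The text uses 100 replicates of size 10 000; the smaller default for `reps` here only keeps the run time short.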

In the Bayesian mediation analysis method proposed, there are parameters, involved in prior distributions, whose values we need to supply to the Bayesian models in order to run Bayesian analysis and make inference of mediation effects from posteriors. Those prior parameters may or may not be available from previous studies. In this section we will conduct sensitivity analysis to check the effects of prior choices on performance of Bayesian estimators.

The sensitivity analysis was conducted in two parts, the first part dealing with priors of fixed effect parameters, and the second part dealing with priors of random effect parameters. The reasoning behind this was that all fixed effect parameters can be inferred from the observed data itself, while the random effect parameters are not all identifiable from the observed data only. For example, there is no information in the observed data about the correlation between random intercept and random slope, such information comes directly from the prior parameter specification.

We also explored the effect of sample size in the sensitivity analysis. In Bayesian analysis, the inference is based on the posterior distributions, which are determined by both the priors and the observed data.

The Bayesian causal mediation estimation was replicated K = 100 times for each specific set of prior parameters to obtain frequentist properties such as bias, bias%, root mean square error (RMSE) and the posterior standard deviation (PSD), defined as follows:

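$$
\bar{\theta} = \frac{\sum_{k=1}^{K} \theta_k}{K}, \qquad
\mathrm{Bias} = \bar{\theta} - \theta, \qquad
\mathrm{Bias\%} = \frac{\bar{\theta} - \theta}{\theta} \times 100,
$$

$$
\mathrm{RMSE} = \sqrt{\frac{\sum_{k=1}^{K} (\theta_k - \theta)^2}{K}}, \qquad
\mathrm{PSD} = \sqrt{\frac{\sum_{k=1}^{K} \sigma_k^2}{K}}
$$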

where θk is the Bayesian mediation effect estimate for the kth simulation, θ̄ is the mean of the K mediation effect estimates, θ is the true causal effect from the Monte Carlo method, and σk² is the variance of the mediation effect estimate for the kth simulation. These definitions apply to all versions of the direct and indirect effects defined in section 2.
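These summaries are straightforward to compute once the K replicate estimates are collected. The helper below is a minimal sketch; the function name and the toy inputs are illustrative, and the RMSE is taken around the true effect, as in the definitions above.

```python
import math


def frequentist_summary(estimates, posterior_variances, truth):
    """Bias, Bias%, RMSE and PSD over K simulation replicates."""
    k = len(estimates)
    mean_estimate = sum(estimates) / k
    bias = mean_estimate - truth
    return {
        "mean": mean_estimate,
        "bias": bias,
        "bias_pct": 100.0 * bias / truth,
        "rmse": math.sqrt(sum((e - truth) ** 2 for e in estimates) / k),
        "psd": math.sqrt(sum(posterior_variances) / k),
    }


# Toy inputs: two replicate estimates, each with posterior variance 1.
print(frequentist_summary([1.0, 3.0], [1.0, 1.0], truth=2.0))
```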

3.3.1. Fixed Effect Parameter Priors

In the first part of the sensitivity analysis, we looked at the effect of the choice of priors for the fixed effect parameters on mediation effect estimates. The prior parameters examined in this part of the sensitivity analysis included both the mean vectors μα, μβ, μγ and the precision of the mean vectors in models (3.7) to (3.9).

The mean vectors were set to the true parameter values, namely μα=α, μβ=β and μγ=γ, providing “accurate priors”, that is, prior parameter values equal to those used to generate the data, and set to 0 to provide “inaccurate priors” where the values are not equal to those used to generate the data. Note that this terminology is meant to reflect the accurate (fully informed) or inaccurate assessment of the prior information (which is known for the simulations).

The precision of mean vectors, expressed as the diagonal elements in Vα⁻¹, Vβ⁻¹ and Vγ⁻¹, was varied (using values of 0, 1, 10 and 100) to show the effect of the prior precision on mediation effect estimates.

To focus on the effect of fixed effect prior parameters, the inverse-Wishart prior for model (3.7) was given a large number of degrees of freedom r=100, and a scale matrix R=Vb, indicating that the covariance matrix of random effect parameters is known. This specification eliminates the effect of the random effect parameter priors on the mediation effect estimates so that any difference in mediation effect estimates are from the various choices of fixed effect parameter priors. Non-informative prior parameter values for the inverse-gamma, namely (0.001, 0.001) were used for error variances in models (3.7) to (3.9) to run the Bayesian analysis.

The results are summarized in Table 1 for total causal effect. The simulation results were compared at sample sizes of 100 and 1 000, to explore possible effects of sample size on the robustness of Bayesian estimates to inaccurately (relative to the data generating model) specified priors.

Table 1:

Sensitivity Analysis of Fixed Effect Prior Parameters (total causal effect¹)

| μ | Precision | Est. (N=100) | Bias (N=100) | Bias% (N=100) | RMSE (N=100) | PSD (N=100) | Est. (N=1 000) | Bias (N=1 000) | Bias% (N=1 000) | RMSE (N=1 000) | PSD (N=1 000) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| accurate | 0 | 52.9 | 0.28 | 0.54 | 3.52 | 3.67 | 52.6 | −0.01 | −0.02 | 1.22 | 1.20 |
| | 1 | 52.6 | −0.02 | −0.03 | 3.44 | 3.36 | 52.6 | 0.05 | 0.10 | 1.21 | 1.22 |
| | 10 | 53.1 | 0.51 | 0.96 | 3.03 | 2.91 | 52.5 | −0.14 | −0.26 | 1.10 | 1.17 |
| | 100 | 52.7 | 0.14 | 0.27 | 1.85 | 2.17 | 52.7 | 0.08 | 0.15 | 0.93 | 0.99 |
| inaccurate | 0 | 52.1 | −0.51 | −0.97 | 3.71 | 3.58 | 52.7 | 0.10 | 0.19 | 1.42 | 1.20 |
| | 1 | 51.5 | −1.09 | −2.07 | 4.00 | 3.59 | 52.5 | −0.12 | −0.22 | 1.10 | 1.20 |
| | 10 | 39.6 | −13.0 | −24.7 | 13.9 | 4.84 | 50.5 | −2.08 | −3.95 | 2.43 | 1.20 |
| | 100 | 0.81 | −51.8 | −98.5 | 51.8 | 6.07 | 38.4 | −14.3 | −27.1 | 14.3 | 1.80 |

¹ For complete sensitivity analysis for all versions of direct and indirect effects, see supplemental materials.

The results from Table 1 show that for the accurate priors, the biases of the Bayesian estimates are small. Increasing precision of the accurate prior has little effect on the bias of the Bayesian estimates, but reduces the mean square error of the estimates. For the inaccurate priors, increasing the prior precision increases the bias and mean square error greatly. Such trends were found for all types of mediation effects (see supplemental materials) as well as the total causal effect.

As sample size increases from 100 to 1 000, the bias, root mean square error and posterior standard deviation are generally reduced, for both accurate and inaccurate priors, across precision levels. The explanation for this trend comes from the basics of Bayesian inference. Bayesian inference is based on the posterior distribution, which is determined by both the priors and the observed data. When sample size is small, the contribution of the priors to the posterior distribution is large relative to that of the observed data, so the effect of the priors on Bayesian estimates is stronger at smaller sample sizes than at larger sample sizes. As sample size increases, we gather more information about the parameters from the data than from the priors, and the posteriors are closer to the data than to the priors. Therefore any problem stemming from inaccurate priors is alleviated by the increased sample size.

The results of the sensitivity analysis show that increasing prior precision is only beneficial when the prior is accurate. For inaccurate priors, the more precise the prior is, the more bias it will produce in the Bayesian mediation effect estimates. An increase in the sample size protects against bias due to the inaccurate priors.
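The interplay of prior precision and sample size can be seen in a much simpler setting than models (3.7) to (3.9): a single normal mean with known error variance and a conjugate normal prior. The sketch below is only a one-parameter analogue, not the authors' regression models; the function name and the numbers are illustrative.

```python
def posterior_mean(prior_mean, prior_precision, sample_mean, n, error_variance=1.0):
    """Posterior mean of a normal mean with known error variance and a normal prior."""
    data_precision = n / error_variance
    weight_data = data_precision / (data_precision + prior_precision)
    return weight_data * sample_mean + (1.0 - weight_data) * prior_mean


# An "inaccurate" prior centred at 0 when the data average 10:
for n in (100, 1000):
    for precision in (0.0, 1.0, 10.0, 100.0):
        print(n, precision, round(posterior_mean(0.0, precision, 10.0, n), 3))
```

With prior precision 100 the posterior mean is pulled halfway to the prior at n = 100 (5.0) but only about 9% of the way at n = 1000 (about 9.09), which mirrors the qualitative pattern in the inaccurate-prior rows of Table 1.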

3.3.2. Random Effect Parameter Priors

In this part of the sensitivity analysis, we checked the effects of random effect parameter priors on the mediation effect estimates. Recall that the random effects make possible the joint prediction of the two potential outcomes of mediator M1, and subsequently identification of all potential outcomes and mediation effects. The prior for the covariance matrix of the random effects is in the form of an inverse-Wishart distribution. Two parameters control the inverse-Wishart distribution: the scale matrix R and the degrees of freedom r in model (3.7). The former controls the mean, and the latter controls the variability.
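As a rough guide to how r and R act together, suppose InvWishart(r, rR) follows the common convention of degrees of freedom r and scale matrix rR for a p × p covariance matrix, with p = 2 here. This convention is an assumption, since the parameterization is not spelled out in the text. Under it, the prior mean is

$$
E(V_b) = \frac{rR}{r - p - 1} = \frac{r}{r - 3}\, R, \qquad r > 3 .
$$

On that reading the prior mean is proportional to R and approaches R as r grows (about 1.03R at r = 100), while for r ≤ 3 the mean does not exist, so small values of r correspond to very diffuse priors.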

Recall that the simulation data were generated from models (3.7) to (3.9) where a covariance matrix $V_b = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$ was used for the random effect covariance.

The scale matrix explored here was in the form of $R = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, where ρ controls the correlation between the random intercept and the random slope. For each combination of ρ and degrees of freedom r, the Bayesian analysis was conducted on 100 simulation datasets, and frequentist properties, including bias, root mean square error and posterior standard deviation, were obtained from the 100 simulation results.

To focus on the effect of the random effect parameter prior, the fixed effect priors were chosen to be non-informative in this part of the simulation study. Again, the simulations were performed at two different sample sizes of 100 and 1 000 to detect a possible trend with changing sample size.

The results from Table 2 show that Bayesian mediation effect estimates were not as sensitive to the random effect parameter priors as they were to the fixed effect parameter priors. The trend holds true for both ρ, the scale matrix parameter that controls the random effects correlation, and the degrees of freedom r. This trend is confirmed at both sample sizes, even if variability of mediation effect estimators is reduced at the larger sample size.

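Table 2:

Sensitivity Analysis of Random Effect Prior Parameters

Total causal effect²

| ρ | r | Est. (N=100) | Bias (N=100) | Bias% (N=100) | RMSE (N=100) | PSD (N=100) | Est. (N=1 000) | Bias (N=1 000) | Bias% (N=1 000) | RMSE (N=1 000) | PSD (N=1 000) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| −0.5 accurate | 2 | 52.6 | 0.04 | 0.08 | 3.86 | 3.05 | 52.7 | 0.08 | 0.16 | 1.25 | 1.22 |
| | 5 | 52.5 | −0.15 | −0.28 | 3.82 | 2.50 | 52.5 | −0.05 | −0.10 | 1.24 | 1.24 |
| | 10 | 52.7 | 0.11 | 0.20 | 3.39 | 2.47 | 52.7 | 0.06 | 0.11 | 1.18 | 0.97 |
| | 100 | 52.9 | 0.33 | 0.63 | 3.94 | 3.67 | 52.7 | 0.08 | 0.16 | 1.31 | 1.22 |
| 0 | 2 | 52.5 | −0.09 | −0.18 | 4.39 | 2.89 | 52.6 | −0.04 | −0.08 | 1.19 | 1.07 |
| | 5 | 52.9 | 0.26 | 0.48 | 4.12 | 2.26 | 52.4 | −0.25 | −0.48 | 1.23 | 0.96 |
| | 10 | 52.2 | −0.42 | −0.80 | 3.98 | 1.92 | 52.6 | −0.04 | −0.08 | 1.23 | 0.79 |
| | 100 | 52.4 | −0.22 | −0.43 | 4.74 | 2.91 | 52.5 | −0.14 | −0.27 | 1.12 | 1.04 |
| 0.5 | 2 | 52.6 | 0.03 | 0.07 | 3.50 | 2.70 | 52.8 | 0.16 | 0.31 | 1.17 | 1.01 |
| | 5 | 52.7 | 0.06 | 0.12 | 4.02 | 2.32 | 52.6 | −0.02 | −0.05 | 1.18 | 0.89 |
| | 10 | 52.1 | −0.50 | −0.95 | 3.79 | 2.04 | 52.7 | 0.12 | 0.22 | 1.28 | 0.73 |
| | 100 | 52.3 | −0.30 | −0.57 | 3.92 | 3.23 | 52.6 | 0.004 | 0.007 | 1.18 | 1.22 |

Est.: estimate; RMSE: root mean square error; PSD: posterior standard deviation.

² For the complete sensitivity analysis for all versions of direct and indirect effects, see supplemental materials.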

The Bayesian mediation effect estimates were not strongly affected by the choice of random effect parameter priors. The random effects in the Bayesian model provide a link between the potential outcomes of mediator M1 at different treatment levels, and the random effect parameter prior has little effect on mediation effect estimates, especially when compared to the consequences of inaccurate fixed effect parameter priors.

Similar results were found in the frequentist approach of Daniel et al. (2015), where a ‘cross-world’ correlation was proposed as a sensitivity parameter, and mediation effects were not found to be strongly affected by this sensitivity parameter.

4. Case Study

In this section, we apply the proposed Bayesian methods to an adolescent dental health study. The research question is whether the effect of socioeconomic status (SES) on dental caries occurs through a path involving dental visits and oral hygiene index (OHI) as a sequence of mediators. In this setup, the treatment variable T is SES (1 if low, 0 if high), mediator M1 is the continuous variable of dental visit frequency, mediator M2 is the continuous variable of OHI, and outcome Y is the continuous variable of number of decayed, missing or filled teeth (DMFT). The mediator M1, dental visit, could have an effect on mediator M2, OHI, but not the other way around, so we have multiple ordered continuous mediators in this setting. The confounders adjusted for in the Bayesian analysis are a binary indicator of normal birth weight, sex and race.

The sample size was 200, of which 63.5% were in the low SES group and 80.5% had two or more dental visits per year. The mean OHI was 1.18 (in a range of 0 to 3). DMFT ranged from 0 to 20, and 53% of the children in the sample had at least one DMFT. The DMFT outcome was log transformed in the following analysis.

A Bayesian regression model (4.1) was fitted to the continuous mediator M1, dental visits, where birth weight, sex, and race (denoted by X1, X2 and X3, respectively) were the confounders adjusted for.

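$$
\begin{aligned}
M_{1i} &= \beta_0 + \beta_1 T_i + \beta_2 X_{1i} + \beta_3 X_{2i} + \beta_4 X_{3i} + b_{0i} + b_{1i} T_i + \epsilon_{1i} \\
\epsilon_{1i} &\sim N(0, \sigma_1^2) \\
b_i = (b_{0i}, b_{1i}) &\sim N(0, V_b) \\
\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4) &\sim N(\mu_\beta, V_\beta) \\
\sigma_1^2 &\sim \mathrm{InvGamma}(\nu, \delta) \\
V_b &\sim \mathrm{InvWishart}(r, rR)
\end{aligned}
\tag{4.1}
$$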

For the continuous mediator M2, oral hygiene index (OHI), the Bayesian regression model shown in model (4.2) was fitted.

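$$
\begin{aligned}
M_{2i} &= \gamma_0 + \gamma_1 T_i + \gamma_2 M_{1i} + \gamma_3 X_{1i} + \gamma_4 X_{2i} + \gamma_5 X_{3i} + \gamma_{12} T_i M_{1i} + \epsilon_{2i} \\
\epsilon_{2i} &\sim N(0, \sigma_2^2) \\
\gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5, \gamma_{12}) &\sim N(\mu_\gamma, V_\gamma) \\
\sigma_2^2 &\sim \mathrm{InvGamma}(\nu_1, \delta_1)
\end{aligned}
\tag{4.2}
$$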

Lastly, for outcome Y, the log transformed DMFT, the Bayesian regression model shown in model (4.3) was fitted.

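$$
\begin{aligned}
Y_i &= \alpha_0 + \alpha_1 T_i + \alpha_2 M_{1i} + \alpha_3 M_{2i} + \alpha_4 X_{1i} + \alpha_5 X_{2i} + \alpha_6 X_{3i} + \alpha_{12} T_i M_{1i} + \epsilon_{3i} \\
\epsilon_{3i} &\sim N(0, \sigma_3^2) \\
\alpha = (\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6, \alpha_{12}) &\sim N(\mu_\alpha, V_\alpha) \\
\sigma_3^2 &\sim \mathrm{InvGamma}(\nu_2, \delta_2)
\end{aligned}
\tag{4.3}
$$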
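Because models (4.1) to (4.3) are linear, the mean of a nested potential outcome Y(t, M1(t′), M2(t″, M1(t‴))) can be obtained by plugging the mediator means into the outcome model. The sketch below illustrates this for a single set of coefficients. The coefficient values are hypothetical and are not estimates from the study; covariates are omitted because they enter additively and cancel in every contrast; and the names DIRECT, VIA_M1, VIA_M2 and VIA_M1_M2 label one possible path of the decomposition rather than the paper's own effect notation.

```python
# Hypothetical coefficients, for illustration only (not study estimates).
BETA = {"b0": 2.0, "b1": -0.5}
GAMMA = {"g0": 1.0, "g1": 0.2, "g2": -0.1, "g12": 0.05}
ALPHA = {"a0": 0.5, "a1": 0.3, "a2": -0.2, "a3": 0.4, "a12": 0.1}


def mean_m1(t):
    """E[M1(t)]; the random effects and residual have mean zero."""
    return BETA["b0"] + BETA["b1"] * t


def mean_m2(t2, m1):
    """E[M2(t2, m1)], including the treatment-by-M1 interaction g12."""
    return GAMMA["g0"] + GAMMA["g1"] * t2 + (GAMMA["g2"] + GAMMA["g12"] * t2) * m1


def mean_y(t, t1, t2, t3):
    """E[Y(t, M1(t1), M2(t2, M1(t3)))] under the linear models."""
    m2 = mean_m2(t2, mean_m1(t3))
    return (ALPHA["a0"] + ALPHA["a1"] * t
            + (ALPHA["a2"] + ALPHA["a12"] * t) * mean_m1(t1)
            + ALPHA["a3"] * m2)


# Total causal effect: every index moves from 0 to 1.
TCE = mean_y(1, 1, 1, 1) - mean_y(0, 0, 0, 0)

# One telescoping path from (0,0,0,0) to (1,1,1,1), one index at a time.
DIRECT = mean_y(1, 0, 0, 0) - mean_y(0, 0, 0, 0)
VIA_M1 = mean_y(1, 1, 0, 0) - mean_y(1, 0, 0, 0)
VIA_M2 = mean_y(1, 1, 1, 0) - mean_y(1, 1, 0, 0)
VIA_M1_M2 = mean_y(1, 1, 1, 1) - mean_y(1, 1, 1, 0)
```

In a Bayesian analysis these functions would be evaluated at every retained posterior draw of the coefficients, which yields a posterior distribution for each effect. Changing the indices held fixed gives the other versions of each effect, which differ only when the interaction coefficients are non-zero.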

The first 1 000 iterations from the burn-in period were discarded from the construction of posteriors, and 10 000 MCMC iterations after the burn-in period were used for posterior inference.
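As a rough illustration of this step, the sketch below drops the burn-in draws from a chain and summarises the rest. It assumes an equal-tailed 95% interval, which the text does not specify; the function name and the toy chain are hypothetical.

```python
import statistics


def posterior_summary(chain, burn_in=1000):
    """Posterior mean and equal-tailed 95% interval after burn-in."""
    kept = chain[burn_in:]  # discard the burn-in draws
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(kept, n=40, method="inclusive")
    return statistics.fmean(kept), cuts[0], cuts[-1]


# Toy chain: 1 000 burn-in draws followed by 10 000 retained draws.
TOY_CHAIN = [999.0] * 1000 + [float(i) for i in range(10000)]
MEAN, LOWER, UPPER = posterior_summary(TOY_CHAIN)
```

The same summary would be applied to the draws of each mediation effect, not only to the raw model parameters.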

In the Bayesian analysis, priors must be specified for all parameters in models (4.1) to (4.3). Non-informative priors were used throughout. For the fixed effects, the prior means were vectors of 0, and the prior covariance matrices were diagonal with a large variance of 1 000, giving essentially flat priors for the fixed effect vectors. Non-informative inverse-gamma parameter values (0.001, 0.001) were used for the priors of the residual error variances. For the random effect covariance matrix, a non-informative inverse-Wishart prior with two degrees of freedom and scale matrix $R = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ was used.

The results of the Bayesian causal mediation analysis are summarized in Table 3. The results show that the indirect effect of treatment T, SES, through M2, oral hygiene index, alone on the outcome of dental caries is generally weak, as evidenced by the small values of I2(000) to I2(111). The indirect effects of SES through both M1, dental visit, and M2, oral hygiene index, represented by I12(000) to I12(111), were even weaker. The indirect effects of SES through M1, dental visit, alone on dental caries were stronger for some versions, I1(000) to I1(011), but weaker for others, I1(100) to I1(111). The direct effects of SES on dental caries were much stronger, but again with notable differences between versions, with D(000) to D(011) much stronger than D(100) to D(111). These differences among versions of the same type of mediation effect, which reflect a strong interaction of treatment and mediator, were captured by the proposed Bayesian causal mediation analysis methods in the potential outcomes framework.

Table 3:

Summary of Bayesian Causal Mediation Analysis for Adolescent Dental Health Study

| Type | Posterior Mean | 95% CI Lower | 95% CI Upper | Type | Posterior Mean | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|
| TCE | 0.374 | 0.091 | 0.663 | | | | |
| D(000) | 0.330 | 0.036 | 0.612 | I1(000) | 0.134 | −0.049 | 0.320 |
| D(001) | 0.332 | 0.043 | 0.617 | I1(001) | 0.135 | −0.044 | 0.320 |
| D(010) | 0.331 | 0.045 | 0.613 | I1(010) | 0.133 | −0.047 | 0.321 |
| D(011) | 0.330 | 0.035 | 0.615 | I1(011) | 0.134 | −0.049 | 0.324 |
| D(100) | 0.191 | −0.124 | 0.493 | I1(100) | −0.005 | −0.164 | 0.152 |
| D(101) | 0.191 | −0.113 | 0.489 | I1(101) | −0.006 | −0.163 | 0.148 |
| D(110) | 0.192 | −0.112 | 0.493 | I1(110) | −0.006 | −0.165 | 0.153 |
| D(111) | 0.192 | −0.113 | 0.500 | I1(111) | −0.003 | −0.162 | 0.156 |
| I2(000) | 0.048 | −0.113 | 0.215 | I12(000) | 0.016 | −0.135 | 0.169 |
| I2(001) | 0.032 | −0.129 | 0.188 | I12(001) | 0.000 | −0.152 | 0.151 |
| I2(010) | 0.047 | −0.113 | 0.209 | I12(010) | 0.017 | −0.132 | 0.170 |
| I2(011) | 0.031 | −0.123 | 0.190 | I12(011) | 0.001 | −0.151 | 0.150 |
| I2(100) | 0.049 | −0.113 | 0.212 | I12(100) | 0.018 | −0.133 | 0.169 |
| I2(101) | 0.029 | −0.130 | 0.190 | I12(101) | −0.002 | −0.154 | 0.150 |
| I2(110) | 0.049 | −0.110 | 0.214 | I12(110) | 0.018 | −0.134 | 0.170 |
| I2(111) | 0.032 | −0.125 | 0.192 | I12(111) | 0.001 | −0.150 | 0.150 |

5. Discussions

In this work, Bayesian models have been proposed for causal mediation analysis with multiple ordered mediators, for mediators and outcome following linear models and with possible interactions between mediators and treatment. The random effects in the proposed Bayesian model provide a connection between potential values of the first mediator on the causal pathway and allow identification of mediation effects in the finest decomposition of causal effect. A fully Bayesian approach was proposed where all parameters in the models, fixed effects or random effects, are given priors, whether non-informative or informative (for example, using prior studies). MCMC methods were employed for posterior drawing and inference of causal mediation effects.

Simulation study results show that the choice of prior for fixed effect parameters could have a significant effect on Bayesian estimator bias, especially when inaccurate priors with high precision are used in analysis. For accurate priors, increasing prior precision reduces the posterior estimate variance and mean square error.

The prior of the random effect parameter, on the other hand, has very little effect on the bias of the Bayesian mediation effect estimators. The random effects in the analysis were introduced to provide a connection between potential outcomes and consequently allow joint estimation of potential outcomes for certain mediators on the causal pathway and identification of mediation effects on all causal pathways. The natural direct and indirect effects defined in section 2 are based on the difference between two of the 16 versions of the potential outcome Y(t, M1(t′), M2(t″, M1(t‴))) at t=0,1, t′=0,1, t″=0,1 and t‴=0,1. In the context of linear mediator and final outcome models with no interaction terms, any effect of the random effect parameter prior on the joint estimation of M1(t′) and M1(t‴) would be canceled when calculating the direct and indirect effects. In the simulation studies in this research, the effect of the random effect parameter prior on mediation effect estimates has been found to be weak in the linear setting of mediators and outcome with possible interaction between mediators and treatment.
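The cancellation in the no-interaction case can be written out explicitly. The following is a sketch that assumes models (4.1) to (4.3) with the interaction coefficients set to zero, that is, γ12 = α12 = 0. The shorthand c(X), which collects the intercepts and covariate terms of the M2 and Y models, is introduced here for illustration and is not the paper's notation.

```latex
\[
\begin{aligned}
E\bigl[Y(t, M_1(t'), M_2(t'', M_1(t''')))\bigr]
  &= c(X) + \alpha_1 t + \alpha_2\, E[M_1(t')]
     + \alpha_3 \gamma_1 t'' + \alpha_3 \gamma_2\, E[M_1(t''')], \\
E[M_1(s)] &= \beta_0 + \beta_1 s + \beta_2 X_1 + \beta_3 X_2 + \beta_4 X_3, \\
\text{change } t' \text{ from 0 to 1:}\quad
  &\alpha_2 \bigl(E[M_1(1)] - E[M_1(0)]\bigr) = \alpha_2 \beta_1, \\
\text{change } t''' \text{ from 0 to 1:}\quad
  &\alpha_3 \gamma_2 \bigl(E[M_1(1)] - E[M_1(0)]\bigr) = \alpha_3 \gamma_2 \beta_1 .
\end{aligned}
\]
```

Each contrast changes a single index, so only the marginal means of M1 appear and the random effect covariance V_b does not enter these expressions directly.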

The limitations of the proposed Bayesian causal mediation analysis with multiple ordered mediators include the strong sequential ignorability assumption in model (3.1). The mediation formula requires correct specification of models for mediators M1, M2 and outcome Y. Misspecification of any of the three models may produce biased estimators.

Supplementary Material

Additional Sensitivity Analysis Tables
R Code

Figure 1:

Causal mediation analysis setup. Left panel shows mediation with a single mediator M, and right panel shows mediation with two causally ordered mediators M1 and M2, where T is the treatment and Y is the outcome in both cases.

Acknowledgements

We want to thank Dr. Suchitra Nelson for providing the data from her adolescent dental health study. This work was supported by the National Institute of Dental and Craniofacial Research, National Institutes of Health, Research Grant R01-DE025835 (J. Albert).

References

  1. Albert JM (2008). Mediation analysis via potential outcomes models. Statistics in Medicine, 27, 1282–1304.
  2. Albert JM and Nelson S (2011). Generalized causal mediation analysis. Biometrics, 67, 1028–1038.
  3. Baron RM and Kenny DA (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.
  4. Daniel RM, De Stavola BL, Cousens SN, and Vansteelandt S (2015). Causal mediation analysis with multiple mediators. Biometrics, 71, 1–14.
  5. De Stavola BL, Daniel RM, Ploubidis GB, and Micali N (2015). Mediation analysis with intermediate confounding: structural equation modeling viewed through the causal inference lens. American Journal of Epidemiology, 181, 64–80.
  6. Imai K and Yamamoto T (2013). Identification and sensitivity analysis for multiple causal mechanisms: Revisiting evidence from framing experiments. Political Analysis, 21, 141–171.
  7. Imai K, Keele L, and Yamamoto T (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25, 51–71.
  8. Little RJ (2006). Calibrated Bayes: A Bayes/frequentist roadmap. American Statistician, 60, 213–223.
  9. MacKinnon DP (2000). Contrasts in multiple mediator models. In Rose JS, editor, Multivariate applications in substance use research: new methods for new questions, book section 5, pages 141–160. Lawrence Erlbaum Associates, Inc., Mahwah, New Jersey.
  10. Park S and Kaplan D (2015). Bayesian causal mediation analysis for group randomized designs with homogeneous and heterogeneous effects: Simulation and case study. Multivariate Behavioral Research, 50, 316–333.
  11. Pearl J (2001). Direct and indirect effects. In Breese J and Koller D, editors, Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 411–420. Morgan Kaufmann Publishers Inc., San Francisco, CA.
  12. Pearl J (2012). The mediation formula: A guide to the assessment of causal pathways in nonlinear models. In Berzuini C, Dawid P, and Bernardinelli L, editors, Causality: statistical perspectives and applications, book section 12, pages 151–179. John Wiley and Sons, Ltd, Chichester, UK.
  13. Raftery AE and Lewis SM (1995). The number of iterations, convergence diagnostics and generic Metropolis algorithms. In Gilks WR, Spiegelhalter D, and Richardson S, editors, Practical Markov Chain Monte Carlo, pages 115–130. Chapman and Hall, London, U.K.
  14. Robins JM (1986). A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect. Mathematical Modelling, 7, 1393–1512.
  15. Robins JM (1989). The analysis of randomized and nonrandomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In Sechrest L, Freeman HE, and Mulley AG, editors, Health services research methodology: a focus on AIDS, pages 113–159. U.S. Dept. of Health and Human Services, National Center for Health Services Research and Health Care Technology Assessment, Washington, D.C.
  16. Robins JM and Greenland S (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3, 143–155.
  17. Rubin DB (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.
  18. Taguri M, Featherstone J, and Cheng J (2015). Causal mediation analysis with multiple causally non-ordered mediators. DOI: 10.1177/0962280215615899.
  19. Tanner MA and Wong WH (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–540.
  20. Tchetgen EJ and Shpitser I (2012). Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis. Annals of Statistics, 40, 1816–1845.
  21. VanderWeele TJ and Chiba Y (2014). Sensitivity analysis for direct and indirect effects in the presence of exposure-induced mediator-outcome confounders. Epidemiol Biostat Public Health, 11, e9027.
  22. VanderWeele TJ and Vansteelandt S (2009). Conceptual issues concerning mediation, interventions and composition. Statistics and Its Interface, 2, 457–468.
  23. VanderWeele TJ and Vansteelandt S (2014). Mediation analysis with multiple mediators. Epidemiologic Methods, 2, 95–115.
  24. VanderWeele TJ, Vansteelandt S, and Robins JM (2014). Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology, 25, 300–306.
  25. Wang W, Nelson S, and Albert JM (2013). Estimation of causal mediation effects for a dichotomous outcome in multiple-mediator models using the mediation formula. Statistics in Medicine, 32, 4211–4228.
  26. Yuan Y and MacKinnon DP (2009). Bayesian mediation analysis. Psychological Methods, 14, 301–322.
