Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2016 Nov 4.
Published in final edited form as: Eur J Epidemiol. 2015 Nov 4;30(10):1119–1127. doi: 10.1007/s10654-015-0100-z

G-Computation Demonstration in Causal Mediation Analysis

Aolin Wang a,b, Onyebuchi A Arah a,b,c
PMCID: PMC4674449  NIHMSID: NIHMS735695  PMID: 26537707

Abstract

Recent work has considerably advanced the definition, identification and estimation of controlled direct, and natural direct and indirect effects in causal mediation analysis. Despite the various estimation methods and statistical routines being developed, a unified approach for effect estimation under different effect decomposition scenarios is still needed for epidemiologic research. G-computation offers such unification and has been used for total effect and joint controlled direct effect estimation settings, involving different types of exposure and outcome variables. In this study, we demonstrate the utility of parametric g-computation in estimating various components of the total effect, including (i) natural direct and indirect effects, (ii) standard and stochastic controlled direct effects, and (iii) reference and mediated interaction effects, using Monte Carlo simulations in standard statistical software. For each study subject, we estimated their nested potential outcomes corresponding to the (mediated) effects of an intervention on the exposure wherein the mediator was allowed to attain the value it would have under a possible counterfactual exposure intervention, under a pre-specified distribution of the mediator independent of any causes, or under a fixed controlled value. A final regression of the potential outcome on the exposure intervention variable was used to compute point estimates and bootstrap was used to obtain confidence intervals. Through contrasting different potential outcomes, this analytical framework provides an intuitive way of estimating effects under the recently introduced 3- and 4- way effect decomposition. This framework can be extended to complex multivariable and longitudinal mediation settings.

Keywords: g-computation, parametric g-formula, mediation analysis, effect decomposition, simulation

Introduction

Recent work has considerably advanced the definition, identification and estimation of different controlled or natural effects in causal mediation analysis. As a result, various estimation methods such as linear structural equation modeling, outcome and mediator regression-based methods, parametric g-computation, inverse-probability-weighted (IPW) fitting of marginal structural models (MSMs), and sequential g-estimation [1,2] have been applied to mediation settings. With some notable exceptions [35], relatively few approaches can incorporate general types of mediator and outcome variables. The simulation-based approach introduced by Imai et al. [3] and the regression-based approach proposed by Valeri & VanderWeele [5] are increasingly popular approaches; the former may be unfamiliar to epidemiologists, and the latter requires different regressions or approximations for different mediator and outcome types. Meanwhile, g-computation [1], as an alternative for computing marginal effects over IPW fitting of MSM [6], holds promise to provide a unified framework for effect estimation in causal mediation analysis, especially given its capability in dealing with time-varying exposure and confounding [4,7]. However, a didactic demonstration of the application of g-computation in causal mediation analysis, reflecting recent decompositions for mediation and interaction, is lacking.

In this paper, we demonstrate the utility of (parametric) g-computation in estimating various components of the total effect, such as (i) natural direct and indirect effects, (ii) standard and stochastic controlled direct effects, and (iii) reference and mediated interactions, using standard statistical software. We focus on marginal effects (standardized over covariates). The current approach extends the previous work on g-computation demonstration for total effect [6] and the gformula package (mediation option) in Stata [4] by showing the actual steps in estimation and incorporating estimation for various component effects under 2-, 3- and 4-way effect decomposition [8,9]. The paper is organized as follows: we will first review the definition and identification criteria for different effects in mediation context. Then we will review the g-computation algorithm and introduce steps for mediation analysis. After an illustrative example using a partially simulated data set, we will discuss the strengths and limitations of g-computation and its relation to other existing estimation procedures. Readers familiar with the background material on mediation analysis under the potential outcomes framework can go directly to g-computation steps section.

Notation and definitions

Let X denote the exposure of interest, Y the outcome of interest, M the mediator of interest, and Z a set of covariates not affected by the exposure but which are assumed to be sufficient for confounding control for total, direct and indirect effects estimation. Throughout, we assumed X preceded M, which preceded Y. Let Yx and Mx denote respectively the potential values of the outcome and mediator that would have occurred had exposure X been set, possibly counter to fact, to a specific value x. Similarly, let Yxm denote the potential value of Y that would have occurred had X and M been set, possibly counter to fact, to x and m respectively. We use YxMx* to express potential outcome value had the exposure X been set to x and M to Mx*. Let x (index) and x* (reference) denote two values of the exposure we wish to compare, and m (index) and m* (reference) denote two controlled values of the mediator we wish to compare.

Total effect (TE) compares exposure level x to x*, while allowing the mediator to obtain its natural value under each exposure level. Thus, by assuming generalized consistency [10], TE can also be defined as E[YxMxYx*Mx*]. TE can be decomposed into different types of component effects. There are two-way, three-way, and four-way decompositions of the total effect as presented in Table 1 [8,9,11,12]. The counterfactual definitions are listed in Table 2 (left column). The choice of effect decomposition should be guided by substantive research questions (Table 3).

Table 1.

Summary of effect decomposition in causal mediation analysisa

N-way
decomposition
Counterfactual definition (individual level)
2-way TEb = PDE + TIE = (YxMx*Yx*Mx*) + (YxMxYxMx*)
= TDE + PIE = (YxMxYx*Mx) + (Yx*MxYx*Mx*)
= CDE + PE = (YxmYx*m) + [(YxYx*) − (YxmYx*m)]
3-way TE = PDE + MIE + PIE = (YxMx*Yx*Mx*) + (YxMxYxMx*Yx*Mx + Yx*Mx*) + (Yx*MxYx*Mx*)
= (YxMx*Yx*Mx*) + (YxmYxm*Yx*m + Yx*M*)(MxMx*) + (Yx*MxYx*Mx*)
= CDE + PAI + PIE = (YxmYx*m) + (YxmYxm*Yx*m + Yx*m*)(Mx) + (Yx*MxYx*Mx*)
4-way TE = CDE + RIE + MIE + PIE = (YxmYx*m) + (YxmYxm*Yx*m + Yx*m*)(Mx*) + (YxmYxm*Yx*m + Yx*m*)(MxMx*) + (Yx*MxYx*Mx*)
a

Table adapted from VanderWeele (2014) Tables 5–6 [9].

b

TE: total effect, PDE: pure direct effect, TIE: total indirect effect, TDE: total direct effect, PIE: pure indirect effect, CDE: controlled direct effect (standard), PE: portion eliminated, MIE: mediated interaction effect (referred to as “INTmed” by VanderWeele), PAI: portion attributable to interaction, RIE: reference interaction effect (referred to as “INTref” by VanderWeele).

Table 2.

Definition and empirical analog of total effect and component effectsa

Effect Counterfactual definition Empirical analogb
TEc E[YxYx*]d zm{E (Y|x, m, z) P (m|x, z) − E (Y|x*, m, z) P (m|x*, z)}P (z)e
PDE E[YxMx*Yx*Mx*] zm{E (Y|x, m, z) − (Y|x*, m, z)}P (m|x*, z) P (z)f
TIE E[YxMxYxMx*] zm E (Y|x, m, z){P (m|x, z) − P (m|x*, z)}P (z)
TDE E[YxMxYx*Mx] zm{E (Y|x, m, z) − (Y|x*, m, z)}P (m|x, z) P (z)
PIE E[Yx*MxYx*Mx*] zm E (Y|x*, m, z){P (m|x, z) − P (m|x*, z)}P (z)f
CDEM=m* E[Yxm*Yx*m*] z{E (Y|x, m*, z) − E (Y|x*, m*, z)}P (z)
CDEsto E[YxMYx*M] zm{E (Y|x, m, z) − E (Y|x*, m, z)}P (m′) P (z)
RIE E[(YxmYxm*Yx*m + Yx*m*)(Mx*)] zm{E (Y|x, m, z) − E (Y|x, m*, z) − E (Y|x*, m, z) + E (Y|x*, m*, z)} P (m|x*, z) P (z)
MIE E[(YxmYxm*Yx*m + Yx*m*)(MxMx*)] zm{E (Y|x, m, z) − E (Y|x, m*, z) − E (Y|x*, m, z) + E (Y|x*, m*, z)} {P (m|x, z) − P (m|x*, z)}P (z)
PAI E[(YxmYxm*Yx*m + Yx*m*)(Mx)] zm{E (Y|x, m, z) − E (Y|x, m*, z) − E (Y|x*, m, z) + E (Y|x*, m*, z)} P (m|x, z) P (z)
a

Y: outcome, X: exposure, M: mediator, Z: covariates; x and m represent the index values whereas x* and m* represent the reference values.

b

Under the stable unit treatment value assumption, consistency, conditional exchangeability, positivity, different types of effect can be identified and estimated using the empirical analogs We use E (Y|x, m, z) as a shorthand for E (Y|X = x, M = m, Z = z), and P (m|x, z) as a shorthand for P (M = m|X = x, Z = z).

c

E: total effect, PDE: pure direct effect, TIE: total indirect effect, TDE: total direct effect, PIE: pure indirect effect, CDE: controlled direct effect (standard), CDEsto: stochastic controlled direct effect, RIE: reference interaction effect (referred to as “INTref” by VanderWeele), MIE: mediated interaction effect (referred to as “INTmed” by VanderWeele), PAI: portion attributable to interaction.

d

Effects are defined as risk differences here but other measures of effects are possible (risk ratio, odds ratio etc.).

e

For continuous M and Z, summations are replaced by integrals and the probability functions by appropriate density functions

f

These two expressions are known as the mediation formula [47].

Table 3.

Potential research questions related to various effects under effect decomposition

Effect Research question
TEa Overall, to what extent does X cause Y?
PDE In particular, to what extent does X cause Y via pathways other than through M?
TIE In particular, to what extent does X cause Y via M (i.e. due to X affecting M and subsequently, M affecting Y) and the possible interaction between X and M in affecting Y? In other words, the effect of exposure that “ would be prevented if the exposure did not cause the mediator” (i.e. the portion of the effect for which mediation is “necessary”) [19,47].
TDE In particular, to what extent does X cause Y via pathways other than through M, allowing M to boost up or tune down such effect at the same time?
PIE In particular, to what extent does X cause Y via M only (i.e. due to X affecting M and subsequently, M affecting Y), not accounting for the possible interaction between X and M? In other words, the effect that the exposure would have had if “its only action were to cause the mediator” (i.e. the portion of the effect for which mediation is “sufficient”) (i.e. the portion of [19,47].
CDE What would be the effect of X on Y, when fixing M at a specific value for everyone in the population?
CDEsto What would be the effect of X on Y, when allowing M to attain certain controlled distribution (via intervention) in the population?
RIE What would be the effect of X on Y that is due to interaction between X and M only?
MIE What would be the effect of X on Y that is due to both interaction between X and M and the fact that X causes M?
PAI What would be the effect of X on Y that is due to interaction between X and M, regardless whether X causes M?
a

TE: total effect, PDE: pure direct effect, TIE: total indirect effect, TDE: total direct effect, PIE: pure indirect effect, CDE: controlled direct effect (standard), CDEsto: stochastic controlled direct effect, RIE: reference interaction effect (referred to as “INTref” by VanderWeele), MIE: mediated interaction effect (referred to as “INTmed” by VanderWeele), PAI: portion attributable to interaction.

There are four types of “direct” effects. The standard controlled direct effect (CDE) compares exposure level x to x* while fixing the mediator to a specific level. The CDE estimates the effect of X on Y while fixing M to m for every individual in the population and it can be different for different levels of m [11,12]. The stochastic CDE (CDEsto) compares exposure level x to x* while randomizing the mediator to a pre-specified distribution M’. Accordingly, the stochastic CDE subsumes both the standard CDE and the stochastic mediation contrast [13] in that the standard CDE corresponds to M’ being a constant m for the total population (i.e., full intervention) while the stochastic mediation contrast corresponds to M’ being a constant for a subset of the population and being the observed distribution for the rest (i.e., partial intervention).

The average pure direct effect (PDE) compares exposure level x to x* while the mediator M is set to the natural value it would have attained under the reference level x* of exposure (i.e., Mx*). Accordingly, the average total direct effect (TDE) differs from PDE in that the mediator M is set to the natural value it would have attained under the index level x of exposure (i.e., Mx). Direct effects are relative to the mediator M of interest, that is, they are effects through alternative pathways other than through M.

The average pure indirect effect (PIE) compares mediator level Mx to Mx* while setting exposure to reference level x*. The average total indirect effect (TIE) differs from the PIE in that the exposure is set to index level x.

Reference interaction effect (RIE) and mediated interaction effect (MIE) capture the effect of X on Y due to interaction only and the effect of X on Y due to both interaction and mediation respectively [9]. RIE and MIE are sometimes combined to reflect the effect of X on Y due to overall interaction and termed “portion attributable to interaction” (PAI) [9].

Assumptions for identification and estimation

To identify and estimate the effect decomposition quantities, we invoke the stable unit treatment value assumption (SUTVA) [1,15], and assumptions of consistency [16], conditional exchangeability (no-uncontrolled-confounding), and positivity [17]. The conditional exchangeability assumption for mediation analysis includes the following [12,18]: (i) the effect of the exposure X on the outcome Y is unconfounded conditional on a set Z of measured covariates; (ii) the effect of the mediator M on Y is unconfounded conditional on both X and Z; (iii) the effect of X on M is unconfounded conditional on Z. For identifying standard CDE, (i) and (ii) are sufficient but for stochastic CDE, all three assumptions are needed. Successful randomization of the exposure will support the assumption of no uncontrolled confounding of the exposure-mediator and exposure-outcome relations but will not guarantee the absence of uncontrolled confounding of the mediator-outcome relation [19]. In the presence of possible violation of this assumption, sensitivity analysis are needed [3,20]. To identify natural effects, a fourth conditional exchangeability assumption is needed: (iv) none of the mediator-outcome confounders are affected by exposure. Assumption (iv) is known as the cross-world independence assumption [21] because it requires that conditional on Z, the mediator that would have been observed in a world under X=x* is independent of the outcome that would have been observed in a world under X=x (i.e. Mx*YxMx). This assumption is problematic because these two variables Mx* and YxMx can never be observed together [18,22]. Recent research proposed identification criteria [23] and effect decomposition [18] aimed at circumventing the violation of this fourth assumption. However, this issue is beyond the scope of this article. In addition, we assume no selection bias and measurement error. Under the above assumptions, different types of effect can be nonparametrically identified and estimated using the empirical analogs listed in Table 2 (right column). For the current paper, we adopted a fully parametric approach, i.e., positing parametric (regression) models for each expectation in these empirical analogs to estimate their parameters from the observed data and then integrating over the covariate and/or mediator distribution [24]. We further assumed no model misspecification.

G-computation

G-computation algorithm was first introduced by Robins in 1986 [1] to estimate the causal effect of a time-varying exposure in the presence of time-varying confounders that are affected by exposure, a scenario where traditional regression-based methods would fail. In recent years, several didactic examples were given in the literature [6,25,26], promoting the use of this causal analytic technique. Increasingly, more studies have applied this methodology in estimating the effect of dynamic treatment regimes [27,28] or projecting the impact of hypothetical interventions [2933]. In a simple setting with a single-time exposure and an outcome, g-computation can be seen as the generalization of standardization. Accessible examples of g-computation with detailed discussion of its strengths and limitations can be found elsewhere [2,4,6,30].

G-computation steps in causal mediation analysis

The g-computation steps have been summarized in Figure 1. Step 1 involves obtaining the parameters of the assumed covariate distributions, and fitting the assumed models for the mediator (M model) and the outcome (Y model) using observed data. The key covariates needed for the M model are confounders of the exposure-mediator relation. The key covariates needed for the Y model are confounders of the exposure-outcome relation and mediator-outcome relation. To avoid simulating the covariate set Z, one can replace steps 1a and 2a with resampling (Figure 1). Step 2 entails simulating the covariate set Z (step 2a), an intervention variable X (step 2b), and the potential mediator (step 2c), and outcome Y (step 2d) sequentially for J (J can be as large as computationally feasible) copies of the original sample. The simulation repetition done here is to reduce Monte Carlo simulation error. The intervention variable X should be distinguished from the observed exposure variable as the intervention X and the simulated covariates are marginally independent of each other. In step 2c, we simulate each potential mediator as a function of the simulated covariates and intervention X from the previous steps (2a and 2b), using the parameters obtained from the M model in step 1b. Similarly, in step 2d, we simulate a potential outcome variable that corresponds to each specific type of effect as a function of the simulated covariates, intervention, and mediator from the previous steps, using the parameters obtained from the Y model in step 1c. Step 3 involves regressing each different potential outcome variable on the intervention variable X to obtain estimates of each marginal effect using the pooled data with J copies of the original sample. Repeat step 2–3 on K (usually 200 or more) bootstrapped samples taken at random with replacement from the original data. The Wald type 95% confidence interval (CI) was calculated as: point estimate ± 1.96 × SD, where SD was the standard deviation of the K resultant point estimates from the final regression in the third step.

Figure 1.

Figure 1

Steps for G-computation marginal structural model (G-comp MSM) in causal mediation analysis. Let Z denote a set of covariates, X denote exposure, M denote mediator, and Y denote outcome of interest. Variables in step 1 are observed variables whereas variables in step 2 are all simulated variables. Step 1a and 2a combined can be replaced by resampling.

Illustration

We used a directed acyclic graph (DAG) [34] to represent the data generating process for the illustration example (Figure 2). We used the India sample from the World Health Survey (WHS) [35]. We used all the observed covariates to simulate exposure smoking, mediator body mass index (BMI, 5-unit increase), and outcome composite health score (0–100) sequentially according to the data generating process shown in Figure 2. Covariates for confounding control included age, gender, education, urbanicity, and depression. Detailed description for generating this partially simulated data set can be found in the supplementary materials. In the illustrative example, we will focus on the interpretations for PDE and TIE from the most common decomposition used in epidemiology and the interpretations for component effects based on the recently introduced 4-way decomposition. Since smoking was binary, we used 1 to represent “yes” and 0 “no”. A second illustrative example using binary exposure, mediator and outcome was included in the supplementary materials. All analyses were done using SAS version 9.4 (SAS Institute Inc., Cary, NC) and the accompanying SAS code can be found in the supplementary materials.

Figure 2.

Figure 2

Directed acyclic graph (DAG) representing the data generating process for both illustrative examples. X, M, and Y represent the exposure smoking, mediator body mass index, and outcome composite health score. Z represents a set of exposure-mediator, exposure-outcome and mediator-outcome confounders that includes age, gender, education, urbanicity, and depression.

To estimate different component effects of smoking on health, we implemented the following steps:

  • Step 1. Obtaining empirical parameters

    • (1a)

      We obtained the marginal expectation of each variable except the outcome, and the standard deviation for the continuous age variable.

    • (1b)

      The mediator BMI was regressed on smoking, age, gender, education, urbanicity and depression to obtain the regression coefficients and root mean square error for the linear M model.

    • (1c)

      The outcome, overall health score, was then regressed on smoking, BMI, smoking BMI, age, gender, education and depression to obtain the regression coefficients and root mean square error for the linear Y model.

  • Step 2. Simulating the potential mediators and outcomes

    • (2a)

      We created 1000 copies of the original sample and simulated age, gender, education, urbanicity, and depression that followed the same distribution as the observed variables.

    • (2b)

      We simulated a smoking intervention variable (X) that followed the observed smoking prevalence but was marginally independent of all simulated covariates.

    • (2c)

      We simulated each potential BMI variable as a function of the smoking intervention, age, gender, education, urbanicity and depression (Table 4, simulating M), using the regression coefficients and root mean square error (RMSE) from the M model fit in (1b).

    • (2d)

      We simulated a potential health score variable for each type of effect as a function of the smoking intervention, potential BMI from (2c), product term between smoking intervention and potential BMI, age, gender, education and depression (Table 4, simulating Y), using the regression coefficients and RMSE from the Y model fit in (1c).

      We will use PDE and TIE as examples to explain steps (2c) and (2d) further. For binary exposure, recall that PDE compares X=1 to X=0 while setting the mediator M to M0 (natural value under reference exposure). In (2c), we simulated the potential BMI variable (M0) as a function of non-smoking (setting X=0 in the equation for simulating M) and other determinants of BMI. Next, we simulated the potential health score variable (YPDE) as a function of the smoking intervention variable (X), potential BMI variable (M0) from (2c), and other determinants of health in (2d). On the other hand, TIE compares mediator level M1 to M0 while setting exposure to index level 1. In (2c), we simulated the potential BMI variable (Mx) as a function of smoking intervention (setting X=x in the equation for simulating M) and other determinants of BMI. Then, we simulated the potential health score variable (YTIE) as a function of smoking (setting X=1 in the equation for simulating Y), potential BMI variable (Mx) from (2c), and other determinants of health in (2d). In this way, the potential BMI variable (Mx) transmitted the effect of smoking intervention to health.

  • Step 3. Fitting final marginal structural models (MSMs)

    We regressed each different potential health score variable on the smoking intervention to obtain point estimates of each marginal effect using the pooled sample. We repeated step 2–3 on 200 bootstrapped samples of the same size taken at random with replacement from the original data to obtain Wald type 95% CIs.

Table 4.

Equations used to simulate potential mediators and outcomes in step 2 of the G-computation marginal structural model for the illustrative examplea

M model and Y model from step 1b:
E (M|x, z; α) = αM + αX · x + αZ · z
E (M|x, m, z; β) = βY + βX · x + βM · m + βXM · x · m + βZ · z

Effect Simulating Mc Simulating Yc
TEd Mx = αM + αX · x + αZ · z + εMe YTE = βY + βX · x + βM · mx + βXM · x · mx + βZ · z + εYe
PDE M0 = αM + αX · 0 + αZ · z + εM YPDE = βY + βX · x + βM · m0 + βXM · x · m0 + βZ · z + εY
TIE Mx = αM + αX · x + αZ · z + εM YTIE = βY + βX · 1 + βM · mx + βXM · 1 · mx + βZ · z + εY
TDE M1 = αM + αX · 1 + αZ · z + εM YTDE = βY + βX · x + βM · m1 + βXM · x · m1 + βZ · z + εY
PIE Mx = αM + αX · x + αZ · z + εM YPIE = βY + βX · 0 + βM · mx + βXM · 0 · mx + βZ · z + εY
CDEM=m* M* = 4.8f YCDE = βY + βX · x + βM · m* + βXM · x · m* + βZ · z + εY
CDEsto M = E (M) + εM YCDEsto = βY + βX · x + βM · m′ + βXM · x · m′ + βZ · z + εY
RIE M0 = αM + αX · 0 + αZ · z + εM YRIEg = βY + 0 · x + 0 · mx + βXM · x · (m0m*) + βZ · z + εY
MIE Mmedh = αX · x + εM YMIEg = βY + 0 · x + 0 · mx + βXM · x · mmed + βZ · z + εY
PAI M1 = αM + αX · 1 + αZ · z + εM YPAIg = βY + 0 · x + 0 · mx + βXM · x · (m1m*) + βZ · z + εY
a

Exposure: smoking (1=yes, 0=no); mediator: body mass index (5-unit increase); outcome: composite health score; covariates: age, gender, education, urbanicity, depression.

b

Variables used to fit the M model and Y model in step 1 were observed variables.

c

Lower case “x”, “z”, and “mx” represented specific values of the random variables intervention “X”, simulated covariate set “Z”, and potential mediator “Mx” respectively and the values can differ for different individuals. Intervention “X” was independent of the simulated covariate set “Z”.

d

TE: total effect, PDE: pure direct effect, TIE: total indirect effect, TDE: total direct effect, PIE: pure indirect effect, CDE: controlled direct effect (standard), CDEsto: stochastic controlled direct effect, RIE: reference interaction effect (referred to as “INTref” by VanderWeele), MIE: mediated interaction effect (referred to as “INTmed” by VanderWeele), PAI: portion attributable to interaction.

e

The root mean square error (RMSE) from the M model and the Y model in step 1 respectively.

f

The mediator was fixed at 4.8 (BMI=24).

g

To simulate YRIE, YMIE, and YPAI, we assigned zero for the coefficients for random variables intervention “X” and the potential mediator “Mx” but not the coefficients for the product term between these two variables to mimic “de-activating” the direct and indirect path from X to Y, leaving only a specific type of “interaction” between X and M to transmit the effect of X to Y.

h

The mediated interaction captures the interaction between X and a version of M that is due to X only. Thus, to simulate Mmed, the mediator M responds to no other determinants of M but X.

Results from the illustrative example were presented in Table 5. This example is for illustration purpose and thus the results were not intended for quotation. Estimates followed the effect decompositions as described in Table 1. Smoking had an overall negative impact on health (TE: −0.96, 95% CI: −1.79, −0.13), but the majority of this impact was through pathways other than changing BMI (PDE: −0.70, 95% CI: −1.54, 0.14). When BMI was fixed at 24 for everyone, smoking did not appear to affect health directly (CDE: 0.27, 95% CI: −0.95, 1.50). In a hypothetical intervention where BMI was no longer affected by smoking and other covariates, smoking still had a negative impact on health (CDEsto: −0.81, 95% CI: −1.63, 0.02). The impact of smoking on health that was due to interaction with BMI (RIE: −0.99, 95% CI: −1.71, −0.26) was much larger than the part that was due to both mediation and interaction (MIE: −0.14, 95% CI: −0.27, −0.01). The presence of such smoking-BMI interaction contributed to the difference seen when comparing CDE to PDE and CDEsto.

Table 5.

Effect estimate (95% Confidence Interval) for the illustrative examplea using g-computation of marginal structure modelsb (N=5326)

Effect b (95% CI)
Total Effect (TE) −0.96 (−1.79, −0.13)
Pure Direct Effect (PDE) −0.70 (−1.54, 0.14)
Total Indirect Effect (TIE) −0.26 (−0.42, −0.10)
Total Direct Effect (TDE) −0.87 (−1.69, −0.04)
Pure Indirect Effect (PIE) −0.12 (−0.22, −0.02)
Controlled Direct Effectc (CDE) 0.27 (−0.95, 1.50)
Stochastic Controlled Direct Effectd (CDEsto) −0.81 (−1.63, 0.02)
Reference Interaction Effect (RIE) −0.99 (−1.71, −0.26)
Mediated Interaction Effect (MIE) −0.14 (−0.27, −0.01)
Portion Attributable to Interaction (PAI) −1.14 (−1.96, −0.32)
a

Exposure: smoking (1=yes, 0=no); mediator: body mass index (BMI, 5-unit increase); outcome: composite health score; covariates: age, gender, education, urbanicity, depression.

b

Effect estimates were based on 1000 simulation replicates and confidence intervals were based on 200 bootstrapped samples paired with 200 simulation replicates.

c

The mediator was fixed at 4.8 (BMI=24) for every individual in the sample.

d

The mediator was allowed to obtain a certain distribution of M’ where the mediator has the mean (mean BMI=20.7) and the variability of the observed mediator but was independent of its determinants (exposure and covariates).

Discussion

In this article we demonstrated the utility of parametric g-computation in estimating various marginal effects under different and detailed effect decompositions as well as stochastic controlled direct effects, using Monte Carlo simulations in standard statistical software. To our knowledge, this is the first use of g-computation for 3- and 4- way decomposition of effects recently introduced by VanderWeele [9]. Our approach yielded similar results as those obtained from VanderWeele’s approach [9] (online supplementary materials). However, marginal (standardized) effect measures obtained via g-computation approach are not conceptually equal to the conditional (on covariates) effect measures obtained from the latter approach and results from the two approaches may differ [36]. Alternative imputation [37] or simulation [3] based methods are also available for common mediation parameters.

G-computation has several strengths. It uses models for the outcome and the mediator, which produces more efficient estimates (with narrower confidence intervals) than the weighting approaches that use models for the mediator and/or the exposure [37,38]. It can be used to estimate various types of effect of interest, incorporate nonlinearities and exposure-covariate and mediator-covariate interactions, and deal with general types of outcome, exposure and mediator. This simulation-based approach can be used to estimate various effects on both difference and ratio scales.

However, g-computation is not without its limitations. The parametric g-computation method applied in mediation settings, like the general g-formula, relies on a correctly specified model for the outcome. For natural effects specifically, it additionally requires that the model for the mediator is correctly specified, as with other approaches published previously [3842]. When such parametric distribution for M is in doubt, a distribution-free approach with regard to the mediator [43] can be used. Alternative approaches are to use non-parametric g-computation method that combines bootstrapping and simulation as suggested in Imai et al. [3] or implement a doubly robust estimator that requires at least one of the model for exposure and mediator being correctly specified [37]. In addition, the computation time depends on the sample size and the number of covariates. As these two numbers increase, a random subset of the sample can be selected to perform the Monte Carlo simulation [4].

G-computation in mediation analysis, especially parametric g-formula implemented via Monte Carlo simulation, can be seen as a special application of the longitudinal time-varying g-computation formula [1,7]. In this case, both post-baseline exposure and confounders that are affected by exposure can be seen as mediators. The steps are similar in simulating the baseline confounders and exposure intervention first and then the consequences of the exposure following the data structure represented in a specific DAG, though additional assumptions are needed in the mediation setting. G-computation as a unifying and flexible framework will gain popularity with the increase in applications of mediation analysis to answer mechanistic questions about either contextual or individual level causes [13,44,45].

By showing the steps for g-computation in estimating different quantities of interest in causal mediation analysis, we hope to encourage a wider audience of applied researchers to implement this framework, using software packages of their choice. Given the growing interest in adopting and applying complex systems approaches to examine complex disease etiologies, this method, especially with its simulation component, will be an important intermediate step towards this journey [46].

Supplementary Material

10654_2015_100_MOESM1_ESM

Acknowledgments

Sources of funding

AW was supported by a doctoral scholarship from the Chinese Scholarship Council (CSC), and the Dissertation Year Fellowship from the University of California, Los Angeles. OAA was partly supported by grant R01-HD072296-01A1 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The authors benefited from facilities and resources provided by the California Center for Population Research at UCLA (CCPR), which receives core support (R24-HD041022) from the Eunice Kennedy Shriver NICHD.

Reference

  • 1.Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Model. 1986;7(9–12):1393–1512. [Google Scholar]
  • 2.Robins JM, Hernan MA. Estimation of the causal effects of time-varying exposures. In: Verbeke G, Davidian M, Fitzmaurice G, Molenberghs G, editors. Longitudinal Data Analysis. Chapman and Hall/CRC; 2009. pp. 553–599. [Google Scholar]
  • 3.Imai K, Keele L. Tingley DA general approach to causal mediation analysis. Psychol Methods. 2010;15(4):309–334. doi: 10.1037/a0020761. [DOI] [PubMed] [Google Scholar]
  • 4.Daniel RM, De Stavola BL, Cousens SN. gformula: Estimating causal effects in the presence of time-varying confounding or mediation using the g-computation formula. Stata J. 2011;11(4):479–517. [Google Scholar]
  • 5.Valeri L, Vanderweele TJ. Mediation Analysis Allowing for Exposure-Mediator Interactions and Causal Interpretation: Theoretical Assumptions and Implementation With SAS and SPSS Macros. Psychol Methods. 2013;18(2):137–150. doi: 10.1037/a0031034. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. Am J Epidemiol. 2011;173(7):731–738. doi: 10.1093/aje/kwq472. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.VanderWeele T, Tchetgen ET. Mediation Analysis with Time-Varying Exposures and Mediators. Harvard Univ Biostat Work Pap Ser. 2014. Working Paper 168. Available at: http://biostats.bepress.com/harvardbiostat/paper168. [Google Scholar]
  • 8.VanderWeele TJ. A three-way decomposition of a total effect into direct, indirect, and interactive effects. Epidemiology. 2013;24(2):224–232. doi: 10.1097/EDE.0b013e318281a64e. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.VanderWeele TJ. A unification of mediation and interaction: a 4-way decomposition. Epidemiology. 2014;25(5):749–761. doi: 10.1097/EDE.0000000000000121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Vansteelandt S. Estimation of Direct and Indirect Effects. In: Berzuini C, Dawid P, Bernardinelli L, editors. Causality: Statistical Perspectives and Applications. Chichester, UK: John Wiley & Sons, Ltd; 2012. pp. 126–150. [Google Scholar]
  • 11.Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3(2):143–155. doi: 10.1097/00001648-199203000-00013. [DOI] [PubMed] [Google Scholar]
  • 12.Pearl J. Direct and Indirect Effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence; San Francisco. Morgan Kaufmann; 2001. pp. 411–420. Available at: http://ftp.cs.ucla.edu/pub/stat_ser/R273-U.pdf. [Google Scholar]
  • 13.Naimi AI, Moodie EEM, Auger N, Kaufman JS. Stochastic Mediation Contrasts in Epidemiologic Research: Interpregnancy Interval and the Educational Disparity in Preterm Delivery. Am J Epidemiol. 2014;180(4):436–445. doi: 10.1093/aje/kwu138. [DOI] [PubMed] [Google Scholar]
  • 14.Rubin DB. Discussion of “Randomization analysis of experimental data in the Fisher randomization test” by Basu. J Am Stat Assoc. 1980;75(371):591–593. [Google Scholar]
  • 15.Rubin DB. Neyman (1923) and causal inference in experiments and observational studies. Stat Sci. 1990;5:472–480. [Google Scholar]
  • 16.Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–560. doi: 10.1097/00001648-200009000-00011. [DOI] [PubMed] [Google Scholar]
  • 17.Hernán MA, Robins JM. Estimating causal effects from epidemiological data. J. Epidemiol. Community Health. 2006;60:578–586. doi: 10.1136/jech.2004.029496. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Vanderweele TJ, Vansteelandt S, Robins JM. Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology. 2014;25:300–306. doi: 10.1097/EDE.0000000000000034. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Imai K, Keele L, Tingley D, Yamamoto T. Am. Polit. Sci. Rev. Vol. 105. Cambridge University Press; 2011. Unpacking the Black Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies; pp. 765–789. [Google Scholar]
  • 20.VanderWeele TJ. Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology. 2010;21:540–551. doi: 10.1097/EDE.0b013e3181df191c. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Richardson TS, Robins JM. Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality. 2013 Available at: https://www.csss.washington.edu/Papers/wp128.pdf. [Google Scholar]
  • 22.Naimi AI, Kaufman JS, MacLehose RF. Mediation misgivings: ambiguous clinical and public health interpretations of natural direct and indirect effects. Int J Epidemiol. 2014;43(5):1656–1661. doi: 10.1093/ije/dyu107. [DOI] [PubMed] [Google Scholar]
  • 23.Tchetgen Tchetgen EJ, Vanderweele TJ. Identification of natural direct effects when a confounder of the mediator is directly affected by exposure. Epidemiology. 2014;25(2):282–291. doi: 10.1097/EDE.0000000000000054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Daniel RM, De Stavola BL, Cousens SN, Vansteelandt S. Causal mediation analysis with multiple mediators. Biometrics. 2015;71:1–14. doi: 10.1111/biom.12248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Daniel RM, Cousens SN, De Stavola BL, Kenward MG, Sterne JaC. Methods for dealing with time-dependent confounding. Stat. Med. 2013;32:1584–1618. doi: 10.1002/sim.5686. [DOI] [PubMed] [Google Scholar]
  • 26.Keil AP, Edwards JK, Richardson DB, Naimi AI, Cole SR. The Parametric g-Formula for Time-to-event Data: Intuition and a Worked Example. Epidemiology. 2014;25:889–897. doi: 10.1097/EDE.0000000000000160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Young JG, Cain LE, Robins JM, O’Reilly EJ, Hernán MA. Comparative effectiveness of dynamic treatment regimes: an application of the parametric g-formula. Stat. Biosci. 2011;3:119–143. doi: 10.1007/s12561-011-9040-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Westreich D, Cole SR, Young JG, Palella F, Tien PC, Kingsley L, et al. The parametric g-formula to estimate the effect of highly active antiretroviral therapy on incident AIDS or death. Stat. Med. 2012;31:2000–2009. doi: 10.1002/sim.5316. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Ahern J, Hubbard A, Galea S. Estimating the effects of potential public health interventions on population disease burden: a step-by-step illustration of causal inference methods. Am. J. Epidemiol. 2009;169:1140–1147. doi: 10.1093/aje/kwp015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Taubman SL, Robins JM, Mittleman Ma, Hernán MA. Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. Int. J. Epidemiol. 2009;38:1599–1611. doi: 10.1093/ije/dyp192. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Cole SR, Richardson DB, Chu H, Naimi AI. Analysis of occupational asbestos exposure and lung cancer mortality using the g formula. Am. J. Epidemiol. 2013;177:989–996. doi: 10.1093/aje/kws343. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Danaei G, Pan A, Hu FB, Hernán MA. Hypothetical midlife interventions in women and risk of type 2 diabetes. Epidemiology. 2013;24:122–128. doi: 10.1097/EDE.0b013e318276c98a. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Garcia-Aymerich J, Varraso R, Danaei G, Camargo CA, Hernán MA. Incidence of adult-onset asthma after hypothetical interventions on body mass index and physical activity: an application of the parametric g-formula. Am. J. Epidemiol. 2014;179:20–26. doi: 10.1093/aje/kwt229. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82(4):669–688. [Google Scholar]
  • 35.World Health Organization. World Health Survey: Guide to Administration and Question by Question Specifications. [Accessed October 22, 2015];Geneva. 2002 Available at: http://www.who.int/healthinfo/survey/whsshortversionguide.pdf.
  • 36.Simpson EH. The interpretation of interaction in contingency tables. J. R. Stat. Soc. Ser. B. 1951;13:238–241. [Google Scholar]
  • 37.Vansteelandt S, Bekaert M, Lange T. Imputation Strategies for the Estimation of Natural Direct and Indirect Effects. Epidemiol Method. 2012;1(1):131–158. [Google Scholar]
  • 38.Tchetgen EJT, Shpitser I. Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness and sensitivity analysis. Ann Stat. 2012;40(3):1816–1845. doi: 10.1214/12-AOS990. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Tchetgen Tchetgen E, Shpitser I. Semiparametric Estimation of Models for Natural Direct and Indirect Effects. Harvard Univ Biostat Work Pap Ser. 2011 [Google Scholar]
  • 40.Van der Laan MJ, Petersen ML. Direct Effect Models. Int J Biostat. 2008;4(1) doi: 10.2202/1557-4679.1064. [DOI] [PubMed] [Google Scholar]
  • 41.VanderWeele TJ. Marginal structural models for the estimation of direct and indirect effects. Epidemiology. 2009;20(1):18–26. doi: 10.1097/EDE.0b013e31818f69ce. [DOI] [PubMed] [Google Scholar]
  • 42.Imai K, Keele L, Yamamoto T. Identification, Inference and Sensitivity Analysis for Causal Mediation Effects. Stat Sci. 2010;25(1):51–71. [Google Scholar]
  • 43.Albert JM. Distribution-free mediation analysis for nonlinear models with confounding. Epidemiology. 2012;23(6):879–888. doi: 10.1097/EDE.0b013e31826c2bb9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Zhang YT, Laraia BA, Mujahid MS, et al. Does Food Vendor Density Mediate the Association Between Neighborhood Deprivation and BMI? Epidemiology. 2015;26(3):344–352. doi: 10.1097/EDE.0000000000000271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Jackson JW, VanderWeele TJ, Viswanathan a, Blacker D, Schneeweiss S. The Explanatory Role of Stroke as a Mediator of the Mortality Risk Difference Between Older Adults Who Initiate First- Versus Second-Generation Antipsychotic Drugs. Am J Epidemiol. 2014;180(8):847–852. doi: 10.1093/aje/kwu210. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Hernán MA. Invited commentary: Agent-based models for causal inference—reweighting data and theory in epidemiology. Am J Epidemiol. 2015;181(2):103–105. doi: 10.1093/aje/kwu272. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Pearl J. The causal mediation formula--a guide to the assessment of pathways and mechanisms. Prev Sci. 2012;13(4):426–436. doi: 10.1007/s11121-011-0270-1. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

10654_2015_100_MOESM1_ESM

RESOURCES