Author manuscript; available in PMC: 2016 Jan 12.
Published in final edited form as: Ann Stat. 2012 Jun;40(3):1816–1845. doi: 10.1214/12-AOS990

Semiparametric Theory for Causal Mediation Analysis: efficiency bounds, multiple robustness, and sensitivity analysis

Eric J Tchetgen Tchetgen, Ilya Shpitser
PMCID: PMC4710381  NIHMSID: NIHMS746449  PMID: 26770002

Abstract

Whilst estimation of the marginal (total) causal effect of a point exposure on an outcome is arguably the most common objective of experimental and observational studies in the health and social sciences, in recent years, investigators have also become increasingly interested in mediation analysis. Specifically, upon evaluating the total effect of the exposure, investigators routinely wish to make inferences about the direct or indirect pathways of the effect of the exposure not through or through a mediator variable that occurs subsequently to the exposure and prior to the outcome. Although powerful semiparametric methodologies have been developed to analyze observational studies, that produce double robust and highly efficient estimates of the marginal total causal effect, similar methods for mediation analysis are currently lacking. Thus, this paper develops a general semiparametric framework for obtaining inferences about so-called marginal natural direct and indirect causal effects, while appropriately accounting for a large number of pre-exposure confounding factors for the exposure and the mediator variables. Our analytic framework is particularly appealing, because it gives new insights on issues of efficiency and robustness in the context of mediation analysis. In particular, we propose new multiply robust locally efficient estimators of the marginal natural indirect and direct causal effects, and develop a novel double robust sensitivity analysis framework for the assumption of ignorability of the mediator variable.

Key Words and Phrases: Natural direct effects, Natural indirect effects, double robust, mediation analysis, local efficiency

1 Introduction

The evaluation of the total causal effect of a given point exposure, treatment or intervention on an outcome of interest is arguably the most common objective of experimental and observational studies in the fields of epidemiology, biostatistics and in the social sciences. However, in recent years, investigators in these various fields have become increasingly interested in making inferences about the direct or indirect pathways of the exposure effect not through or through a mediator variable that occurs subsequently to the exposure and prior to the outcome. Recently, the counterfactual language of causal inference has proven particularly useful for formalizing mediation analysis. Indeed, causal inference offers a formal mathematical framework for defining varieties of direct and indirect effects, and for establishing necessary and sufficient identifying conditions of these effects. A notable contribution of causal inference to the literature on mediation analysis is the key distinction drawn between so-called controlled direct and indirect effects versus natural direct and indirect effects. In words, the controlled direct effect refers to the exposure effect that arises upon intervening to set the mediator to a fixed level that may differ from its actual observed value (Robins and Greenland, 1992, Pearl, 2001, Robins, 2003). In contrast, the natural (also known as pure) direct effect captures the effect of the exposure when one intervenes to set the mediator to the (random) level it would have been in the absence of exposure (Robins and Greenland, 1992, Pearl 2001). The controlled direct effect combines with the controlled indirect effect to produce the joint effect of the exposure and the mediator, whereas, the natural direct and indirect effects combine to produce the exposure total effect. 
As noted by Pearl (2001), controlled direct and indirect effects are particularly relevant for policy making whereas natural direct and indirect effects are more useful for understanding the underlying mechanism by which the exposure operates.

To formally define natural direct and indirect effects first requires defining counterfactuals. We assume that for each level of a binary exposure $E$ and of a mediator variable $M$, there exists a counterfactual variable $Y_{e,m}$ corresponding to the outcome $Y$ had, possibly contrary to fact, the exposure and mediator variables taken the value $(e, m)$. Similarly, for $E = e$, we assume there exists a counterfactual variable $M_e$ corresponding to the mediator variable had, possibly contrary to fact, the exposure variable taken the value $e$. The current paper concerns the decomposition of the total effect of $E$ on $Y$ in terms of natural direct and natural indirect effects, which, expressed on the mean difference scale, is given by:

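$$\underbrace{\mathbb{E}(Y_{e=1} - Y_{e=0})}_{\text{total effect}} = \mathbb{E}(Y_{e=1,M_{e=1}} - Y_{e=0,M_{e=0}}) = \underbrace{\mathbb{E}(Y_{e=1,M_{e=1}} - Y_{e=1,M_{e=0}})}_{\text{natural indirect effect}} + \underbrace{\mathbb{E}(Y_{e=1,M_{e=0}} - Y_{e=0,M_{e=0}})}_{\text{natural direct effect}}, \qquad (1)$$

where $\mathbb{E}$ stands for expectation.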

In an effort to account for confounding bias when estimating causal effects, such as the average total effect (1), from non-experimental data, investigators routinely collect and adjust for a large number of confounding factors in data analysis. Because of the curse of dimensionality, nonparametric methods of estimation are typically not practical in such settings, and one usually resorts to one of two dimension-reduction strategies: either one relies on a model for the outcome given exposure and confounders, or alternatively one relies on a model for the exposure, i.e. the propensity score. Recently, powerful semiparametric methods have been developed to analyze observational studies that produce so-called double robust and highly efficient estimates of the exposure total causal effect (Robins, 1999, Scharfstein, Rotnitzky and Robins, 1999, Bang and Robins, 2005, Tsiatis, 2006), and similar methods have also been developed to estimate controlled direct and indirect effects (Goetgeluk, Vansteelandt and Goetghebeur, 2008). An important advantage of a double robust method is that it carefully combines both of the aforementioned dimension-reduction strategies for confounding adjustment, to produce an estimator of the causal effect that remains consistent and asymptotically normal provided at least one of the two strategies is correct, without necessarily knowing which strategy is indeed correct (van der Laan and Robins, 2003). Unfortunately, similar methods for making semiparametric inferences about marginal natural direct and indirect effects are currently lacking. Thus, this paper develops a general semiparametric framework for obtaining inferences about marginal natural direct and indirect effects on the mean of an outcome, while appropriately accounting for a large number of confounding factors for the exposure and the mediator variables.

Our semiparametric framework is particularly appealing, as it gives new insight on issues of efficiency and robustness in the context of mediation analysis. Specifically, in Section 2, we adopt the sequential ignorability assumption of Imai et al (2010) under which, in conjunction with the standard consistency and positivity assumptions, we derive the efficient influence function and thus obtain the semiparametric efficiency bound for the natural direct and natural indirect marginal mean causal effects, in the nonparametric model ℳnonpar in which the observed data likelihood is left unrestricted. We further show that in order to conduct mediation inferences in ℳnonpar, one must estimate at least a subset of the following quantities:

  (i) the conditional expectation of the outcome given the mediator, exposure and confounding factors;

  (ii) the density of the mediator given the exposure and the confounders;

  (iii) the density of the exposure given the confounders.

Ideally, to minimize the possibility of modeling bias, one may wish to estimate each of these quantities nonparametrically; however, as previously argued, when, as we assume throughout, we wish to account for numerous confounders, such nonparametric estimates will likely perform poorly in finite samples. Thus, in Section 2.3 we develop an alternative multiply robust strategy. To do so, we propose to model (i), (ii) and (iii) parametrically (or semiparametrically), but rather than obtaining mediation inferences that rely on the correct specification of a specific subset of these models, we instead carefully combine these three models to produce estimators of the marginal mean direct and indirect effects that remain consistent and asymptotically normal (CAN) in a union model where at least one but not necessarily all of the following conditions hold:

  (a) the parametric or semiparametric models for the conditional expectation of the outcome (i) and for the conditional density of the mediator (ii) are correctly specified;

  (b) the parametric or semiparametric models for the conditional expectation of the outcome (i) and for the conditional density of the exposure (iii) are correctly specified;

  (c) the parametric or semiparametric models for the conditional densities of the exposure and the mediator (ii) and (iii) are correctly specified.

Accordingly, we define submodels ℳa, ℳb and ℳc of ℳnonpar corresponding to models (a), (b) and (c) respectively. Thus, the proposed approach is triply robust as it produces valid inferences about natural direct and indirect effects in the union model ℳunion = ℳa∪ℳb∪ℳc. Furthermore, as we later show in Section 2.3, the proposed estimators are also locally semiparametric efficient in the sense that they achieve the respective efficiency bounds for estimating the natural direct and indirect effects in ℳunion, at the intersection submodel ℳa∩ℳb∩ℳc = ℳa∩ℳc = ℳa∩ℳb = ℳb∩ℳc ⊂ ℳunion ⊂ ℳnonpar.

Section 3 summarizes a simulation study illustrating the finite sample performance of the various estimators described in Section 2, and Section 4 gives a real data application of these methods. Section 5 describes a strategy to improve the stability of the proposed multiply robust estimator which directly depends on inverse exposure and mediator density weights, when such weights are highly variable, and Section 6 demonstrates the favorable performance of two modified multiply robust estimators in the context of such highly variable weights. In Section 7, we compare the proposed methodology to the prevailing estimators in the literature. Based on this comparison, we conclude that the new approach should generally be preferred because an inference under the proposed method is guaranteed to remain valid under many more data generating laws than an inference based on each of the other existing approaches. In particular, as we argue below the approach of van der Laan and Petersen (2005) is not entirely satisfactory because, despite producing a CAN estimator of the marginal direct effect under the union model ℳa∪ℳc (and therefore an estimator that is double robust), their estimator requires a correct model for the density of the mediator. Thus unlike the direct effect estimator developed in this paper, the van der Laan estimator fails to be consistent under the submodel ℳb⊂ℳunion. Nonetheless, the estimator of van der Laan is in fact locally efficient in model ℳa∪ℳc, provided the model for the mediator’s conditional density is either known, or can be efficiently estimated. This property is confirmed in a supplementary online appendix, where we also provide a general map that relates the efficient influence function for model ℳunion to the corresponding influence function for model ℳa∪ℳc assuming an arbitrary parametric or semiparametric model for the mediator conditional density is correctly specified. 
In Section 8, we describe a novel double robust sensitivity analysis framework to assess the impact on inferences about the natural direct effect, of a departure from the ignorability assumption of the mediator variable. We conclude with a brief discussion.

2 The nonparametric mediation functional

2.1 Identification

Suppose i.i.d. data on $O = (Y, E, M, X)$ are collected for $n$ subjects. Recall that $Y$ is an outcome of interest, $E$ is a binary exposure variable, $M$ is a mediator variable with support $\mathcal{S}$, known to occur subsequently to $E$ and prior to $Y$, and $X$ is a vector of pre-exposure variables with support $\mathcal{X}$ that confound the association between $(E, M)$ and $Y$. The overarching goal of this paper is to provide some theory of inference about the fundamental functional of mediation analysis which Judea Pearl calls “the mediation causal formula” (Pearl, 2010) and which, expressed on the mean scale, is:

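$$\theta_0 = \iint_{\mathcal{S}\times\mathcal{X}} \mathbb{E}(Y \mid E = 1, M = m, X = x)\, f_{M|E,X}(m \mid E = 0, X = x)\, f_X(x)\, d\mu(m, x), \qquad (2)$$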

where $f_{M|E,X}$ and $f_X$ are respectively the conditional density of the mediator $M$ given $(E, X)$ and the density of $X$, and $\mu$ is a dominating measure for the distribution of $(M, X)$. Hereafter, to keep with standard statistical parlance, we shall simply refer to $\theta_0$ as the “mediation functional” or “M-functional” since it is formally a functional on the nonparametric statistical model $\mathcal{M}_{nonpar} = \{F_O(\cdot) : F_O \text{ unrestricted}\}$ of all regular laws $F_O$ of the observed data $O$ that satisfy the positivity assumption given below; i.e. $\theta_0 = \theta_0(F_O) : \mathcal{M}_{nonpar} \to \mathcal{R}$, with $\mathcal{R}$ the real line. The functional $\theta_0$ is of keen interest here because it arises in the estimation of natural direct and indirect effects, which we describe next. To do so, we make the consistency assumption:

Consistency

if $E = e$, then $M_e = M$ w.p.1, and if $E = e$ and $M = m$, then $Y_{e,m} = Y$ w.p.1.

In addition, we adopt the sequential ignorability assumption of Imai et al (2010) which states that for e, e′ ∈ {0, 1}:

Sequential ignorability

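$$\{Y_{e',m}, M_e\} \perp\!\!\!\perp E \mid X, \qquad Y_{e',m} \perp\!\!\!\perp M \mid E = e, X,$$

where $A \perp\!\!\!\perp B \mid C$ states that $A$ is independent of $B$ given $C$; paired with the following: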

Positivity

$f_{M|E,X}(m \mid E, X) > 0$ w.p.1 for each $m \in \mathcal{S}$, and $f_{E|X}(e \mid X) > 0$ w.p.1 for each $e \in \{0, 1\}$.

Then, under the consistency, sequential ignorability and positivity assumptions, Imai et al (2010) showed that:

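$$\theta_0 = \mathbb{E}(Y_{1,M_0}), \quad \text{and}$$

$$\delta_e \equiv \int_{\mathcal{X}} \mathbb{E}(Y \mid E = e, X = x)\, f_X(x)\, d\mu(x) = \iint_{\mathcal{S}\times\mathcal{X}} \mathbb{E}(Y \mid E = e, M = m, X = x)\, f_{M|E,X}(m \mid E = e, X = x)\, f_X(x)\, d\mu(m, x) = \mathbb{E}(Y_e) = \mathbb{E}(Y_{e,M_e}), \quad e = 0, 1, \qquad (3)$$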

so that $\mathbb{E}(Y_{1,M_0})$ and $\mathbb{E}(Y_e)$, $e = 0, 1$, are identified from the observed data, and so are the mean natural direct effect $\mathbb{E}(Y_{1,M_0}) - \mathbb{E}(Y_0) = \theta_0 - \delta_0$ and the mean natural indirect effect $\mathbb{E}(Y_1) - \mathbb{E}(Y_{1,M_0}) = \delta_1 - \theta_0$. For binary $Y$, one might alternatively consider the natural direct effect on the risk ratio scale, $\mathbb{E}(Y_{1,M_0})/\mathbb{E}(Y_0) = \theta_0/\delta_0$, or on the odds ratio scale, $\{\mathbb{E}(Y_{1,M_0})\,\mathbb{E}(1 - Y_0)\}/\{\mathbb{E}(1 - Y_{1,M_0})\,\mathbb{E}(Y_0)\} = \{\theta_0(1 - \delta_0)\}/\{\delta_0(1 - \theta_0)\}$, and similarly defined natural indirect effects on the risk ratio and odds ratio scales. It is instructive to contrast the expression (2) for $\mathbb{E}(Y_{1,M_0})$ with the expression (3) for $e = 1$ corresponding to $\mathbb{E}(Y_1)$, and to note that the two expressions bear a striking resemblance, except that the density of the mediator in the first expression conditions on the unexposed (with $E = 0$) whereas in the second expression the mediator density is conditional on the exposed (with $E = 1$). As we demonstrate below, this subtle difference has remarkable implications for inference.
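The identification formulas (2) and (3) can be evaluated directly when the observed data law is known and discrete. The following is a minimal sketch under that assumption; the toy law (the numbers in `p_x`, `f_m` and `mean_y`) is hypothetical and chosen only so that the arithmetic is easy to check by hand.

```python
# Hypothetical toy law with binary X, E and M (not taken from the paper).
p_x = {0: 0.6, 1: 0.4}                      # f_X(x)

def f_m(m, e, x):
    """f_{M|E,X}(m | e, x) for a binary mediator."""
    p1 = 0.2 + 0.5 * e + 0.1 * x            # P(M = 1 | E = e, X = x)
    return p1 if m == 1 else 1.0 - p1

def mean_y(e, m, x):
    """E(Y | E = e, M = m, X = x), linear with no E-by-M interaction."""
    return 1.0 + 2.0 * e + 3.0 * m + 0.5 * x

def mediation_functional(e_outcome, e_mediator):
    """Sum over (m, x) of E(Y | e_outcome, m, x) f_M(m | e_mediator, x) f_X(x)."""
    return sum(mean_y(e_outcome, m, x) * f_m(m, e_mediator, x) * p_x[x]
               for m in (0, 1) for x in (0, 1))

theta0 = mediation_functional(1, 0)         # E(Y_{1,M_0}), equation (2)
delta1 = mediation_functional(1, 1)         # E(Y_1), equation (3) with e = 1
delta0 = mediation_functional(0, 0)         # E(Y_0), equation (3) with e = 0

nde = theta0 - delta0                       # natural direct effect
nie = delta1 - theta0                       # natural indirect effect
total = delta1 - delta0                     # total effect
print(nde, nie, total)
```

Because the toy outcome regression has no exposure-mediator interaction, the direct effect equals the exposure coefficient (2.0) and the indirect effect equals the mediator coefficient times the shift in P(M = 1) caused by exposure (3.0 × 0.5 = 1.5), so the two effects add up to the total effect as in (1).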

Pearl (2001) was the first to derive the M-functional $\theta_0 = \mathbb{E}(Y_{1,M_0})$ under a different set of assumptions. Others have since contributed alternative sets of identifying assumptions. In this paper, we have chosen to work under the sequential ignorability assumption of Imai et al (2010a,b) but note that alternative related assumptions exist in the literature (Robins and Greenland, 1992, Pearl, 2001, Petersen and van der Laan, 2005, Hafeman and Vanderweele, 2010). We note, however, that Robins and Richardson (2010) disagree with the label “sequential ignorability” because the term has previously carried a different interpretation in the literature. Nonetheless, the assumption entails two ignorability-like assumptions that are made sequentially. First, given the observed pre-exposure confounders, the exposure assignment is assumed to be ignorable, that is, statistically independent of potential outcomes and potential mediators. The second part of the assumption states that the mediator is ignorable given the observed exposure and pre-exposure confounders. Specifically, the second part of the sequential ignorability assumption is made conditional on the observed value of the ignorable treatment and the observed pretreatment confounders. We note that the second part of the sequential ignorability assumption is particularly strong and must be made with care. This is partly because it is always possible that there are unobserved variables that confound the relationship between the outcome and the mediator variables even upon conditioning on the observed exposure and covariates. Furthermore, the confounders $X$ must all be pre-exposure variables, i.e. they must precede $E$. In fact, Avin et al (2005) proved that without additional assumptions, one cannot identify natural direct and indirect effects if there are confounding variables that are affected by the exposure, even if such variables are observed by the investigator.
This implies that similar to the ignorability of the exposure in observational studies, ignorability of the mediator cannot be established with certainty even after collecting as many pre-exposure confounders as possible. Furthermore, as Robins and Richardson (2010) point out, whereas the first part of the sequential ignorability assumption could in principle be enforced in a randomized study, by randomizing E within levels of X; the second part of the sequential ignorability assumption cannot similarly be enforced experimentally, even by randomization. And thus for this latter assumption to hold, one must entirely rely on expert knowledge about the mechanism under study. For this reason, it will be crucial in practice to supplement mediation analyses with a sensitivity analysis that accurately quantifies the degree to which results are robust to a potential violation of the sequential ignorability assumption. Later in the paper, we develop a set of sensitivity analyses that will allow the analyst to quantify the degree to which his or her mediation analysis results are robust to a potential violation of the sequential ignorability assumption.

2.2 Semiparametric efficiency bounds for ℳnonpar

In this section, we derive the efficient influence function for the M-functional $\theta_0$ in $\mathcal{M}_{nonpar}$. This result is then combined with the efficient influence function for the functional $\delta_e$ (Robins, Rotnitzky and Zhao, 1994, Hahn, 1998) to obtain the efficient influence functions for the natural direct and indirect effects on the mean difference scale. Thus, in the following, we shall use the efficient influence function $S_{\delta_e}^{eff,nonpar}(\delta_e)$ of $\delta_e$, which is well known to be:

$$\frac{I(E = e)}{f_{E|X}(e \mid X)}\{Y - \eta(e, e, X)\} + \eta(e, e, X) - \delta_e,$$

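where for $e, e^* \in \{0, 1\}$, we define

$$\eta(e, e^*, X) = \int_{\mathcal{S}} \mathbb{E}(Y \mid X, M = m, E = e)\, f_{M|E,X}(m \mid E = e^*, X)\, d\mu(m),$$

so that $\eta(e, e, X) = \mathbb{E}(Y \mid X, E = e)$, $e = 0, 1$.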

The following theorem is proved in the appendix.

Theorem 1

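Under the consistency, sequential ignorability and positivity assumptions, the efficient influence function of the M-functional $\theta_0$ in model $\mathcal{M}_{nonpar}$ is given by

$$S_{\theta_0}^{eff,nonpar}(\theta_0) = S_{\theta_0}^{eff,nonpar}(O; \theta_0) = \frac{I\{E = 1\}\, f_{M|E,X}(M \mid E = 0, X)}{f_{E|X}(1 \mid X)\, f_{M|E,X}(M \mid E = 1, X)}\{Y - \mathbb{E}(Y \mid X, M, E = 1)\} + \frac{I(E = 0)}{f_{E|X}(0 \mid X)}\{\mathbb{E}(Y \mid X, M, E = 1) - \eta(1, 0, X)\} + \eta(1, 0, X) - \theta_0,$$

and the efficient influence functions of the natural direct and indirect effects on the mean difference scale in model $\mathcal{M}_{nonpar}$ are respectively given by

$$S_{NDE}^{eff,nonpar}(\theta_0, \delta_0) = S_{NDE}^{eff,nonpar}(O; \theta_0, \delta_0) = S_{\theta_0}^{eff,nonpar}(\theta_0) - S_{\delta_0}^{eff,nonpar}(\delta_0) = \frac{I\{E = 1\}\, f_{M|E,X}(M \mid E = 0, X)}{f_{E|X}(1 \mid X)\, f_{M|E,X}(M \mid E = 1, X)}\{Y - \mathbb{E}(Y \mid X, M, E = 1)\} + \frac{I(E = 0)}{f_{E|X}(0 \mid X)}\{\mathbb{E}(Y \mid X, M, E = 1) - Y - \eta(1, 0, X) + \eta(0, 0, X)\} + \eta(1, 0, X) - \eta(0, 0, X) - \theta_0 + \delta_0,$$

and

$$S_{NIE}^{eff,nonpar}(\delta_1, \theta_0) = S_{NIE}^{eff,nonpar}(O; \delta_1, \theta_0) = S_{\delta_1}^{eff,nonpar}(\delta_1) - S_{\theta_0}^{eff,nonpar}(\theta_0) = \frac{I(E = 1)}{f_{E|X}(1 \mid X)}\left\{Y - \eta(1, 1, X) - \frac{f_{M|E,X}(M \mid E = 0, X)}{f_{M|E,X}(M \mid E = 1, X)}\{Y - \mathbb{E}(Y \mid X, M, E = 1)\}\right\} - \frac{I(E = 0)}{f_{E|X}(0 \mid X)}\{\mathbb{E}(Y \mid X, M, E = 1) - \eta(1, 0, X)\} + \eta(1, 1, X) - \eta(1, 0, X) + \theta_0 - \delta_1.$$

Thus, the semiparametric efficiency bounds for estimating the natural direct and the natural indirect effects in $\mathcal{M}_{nonpar}$ are respectively given by $\mathbb{E}\{S_{NDE}^{eff,nonpar}(\theta_0, \delta_0)^2\}$ and $\mathbb{E}\{S_{NIE}^{eff,nonpar}(\delta_1, \theta_0)^2\}$.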
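As a numerical sanity check on Theorem 1, one can verify on a small discrete law that the efficient influence function of $\theta_0$ has mean zero and compute its second moment. The sketch below does this by exact enumeration; the toy law (binary $Y$, $E$, $M$, $X$ with the probabilities shown) and all function names are hypothetical and serve only as an illustration.

```python
from itertools import product

# Hypothetical toy law with binary Y, E, M and X (not from the paper).
p_x = {0: 0.6, 1: 0.4}

def f_e(e, x):
    p1 = 0.3 + 0.2 * x                       # P(E = 1 | X = x)
    return p1 if e == 1 else 1.0 - p1

def f_m(m, e, x):
    p1 = 0.2 + 0.5 * e + 0.1 * x             # P(M = 1 | E = e, X = x)
    return p1 if m == 1 else 1.0 - p1

def mean_y(e, m, x):
    # E(Y | e, m, x) = P(Y = 1 | e, m, x), with an E-by-M interaction
    return 0.1 + 0.2 * e + 0.3 * m + 0.1 * x + 0.2 * e * m

def eta(e, e_star, x):
    """eta(e, e*, x) = sum_m E(Y | x, m, e) f_M(m | e*, x)."""
    return sum(mean_y(e, m, x) * f_m(m, e_star, x) for m in (0, 1))

theta0 = sum(eta(1, 0, x) * p_x[x] for x in (0, 1))

def eif_theta0(y, e, m, x):
    """Efficient influence function of theta_0 as displayed in Theorem 1."""
    value = eta(1, 0, x) - theta0
    if e == 1:
        weight = f_m(m, 0, x) / (f_e(1, x) * f_m(m, 1, x))
        value += weight * (y - mean_y(1, m, x))
    else:
        value += (mean_y(1, m, x) - eta(1, 0, x)) / f_e(0, x)
    return value

def prob(y, e, m, x):
    """Joint probability of the cell (y, e, m, x) under the toy law."""
    py1 = mean_y(e, m, x)
    return p_x[x] * f_e(e, x) * f_m(m, e, x) * (py1 if y == 1 else 1.0 - py1)

cells = list(product((0, 1), repeat=4))      # all (y, e, m, x) combinations
mean_eif = sum(prob(*c) * eif_theta0(*c) for c in cells)
bound = sum(prob(*c) * eif_theta0(*c) ** 2 for c in cells)
print(mean_eif, bound)
```

The first printed number should be zero up to rounding error, and the second is the efficiency bound for $\theta_0$ under this particular toy law.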

Although not presented here, Theorem 1 is easily extended to obtain the efficient influence functions and the respective semiparametric efficiency bounds for the direct and indirect effects on the risk ratio and the odds ratio scales by a straightforward application of the delta method. An important implication of the theorem is that all regular and asymptotically linear (RAL) estimators of $\theta_0$, $\theta_0 - \delta_0$ and $\delta_1 - \theta_0$ in model $\mathcal{M}_{nonpar}$ share the common influence functions $S_{\theta_0}^{eff,nonpar}(\theta_0)$, $S_{NDE}^{eff,nonpar}(\theta_0, \delta_0)$ and $S_{NIE}^{eff,nonpar}(\delta_1, \theta_0)$ respectively. Specifically, any RAL estimator $\hat\theta_0$ of the M-functional $\theta_0$ in model $\mathcal{M}_{nonpar}$ admits the asymptotic expansion:

$$n^{1/2}(\hat\theta_0 - \theta_0) = n^{1/2}\,\mathbb{P}_n S_{\theta_0}^{eff,nonpar}(\theta_0) + o_P(1),$$

where $\mathbb{P}_n[\cdot] = n^{-1}\sum_i [\cdot]_i$. To illustrate this property of nonparametric RAL estimators and as a motivation for multiply robust estimation when nonparametric methods are not appropriate, we provide a detailed study of three nonparametric strategies for estimating the M-functional in a simple yet instructive setting in which $X$ and $M$ are both discrete with finite support.

Strategy 1

The first strategy entails obtaining the maximum likelihood estimator upon evaluating the M-functional under the empirical law of the observed data:

$$\hat\theta_0^{ym} = \mathbb{P}_n \sum_{m \in \mathcal{S}} \hat{\mathbb{E}}(Y \mid E = 1, M = m, X)\, \hat f_{M|E,X}(m \mid E = 0, X),$$

where $\hat f_{Y|E,M,X}$ and $\hat f_{M|E,X}$ are the empirical probability mass functions, and $\hat{\mathbb{E}}(Y \mid E = e, M = m, X = x)$ is the expectation of $Y$ under $\hat f_{Y|E,M,X}$.

Strategy 2

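The second strategy is based on the following alternative representation of the M-functional:

$$\iint_{\mathcal{S}\times\mathcal{X}} \mathbb{E}(Y \mid E = 1, M = m, X = x)\, dF_{M|E,X}(m \mid E = 0, X = x)\, dF_X(x) = \sum_{e=0}^{1} \iint_{\mathcal{S}\times\mathcal{X}} \mathbb{E}(Y \mid E = 1, M = m, X = x)\, \frac{I(e = 0)}{f_{E|X}(e \mid X = x)}\, dF_{M,E,X}(m, e, x) = \mathbb{E}\left\{\frac{I(E = 0)}{f_{E|X}(0 \mid X)}\, \mathbb{E}(Y \mid E = 1, M, X)\right\}.$$

Thus, our second estimator takes the form:

$$\hat\theta_0^{ye} = \mathbb{P}_n\left\{\frac{I(E = 0)}{\hat f_{E|X}(0 \mid X)}\, \hat{\mathbb{E}}(Y \mid E = 1, M, X)\right\},$$

with $\hat f_{E|X}$ the empirical estimate of the probability mass function $f_{E|X}$.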

Strategy 3

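The last strategy is based on a third representation of the M-functional:

$$\iint_{\mathcal{S}\times\mathcal{X}} \mathbb{E}(Y \mid E = 1, M = m, X = x)\, dF_{M|E,X}(m \mid E = 0, X = x)\, dF_X(x) = \sum_{e=0}^{1} \iiint_{\mathcal{Y}\times\mathcal{S}\times\mathcal{X}} y\, \frac{I(e = 1)}{f_{E|X}(e \mid X = x)}\, \frac{f_{M|E,X}(m \mid E = 0, X = x)}{f_{M|E,X}(m \mid E = e, X = x)}\, dF_{Y,M,E,X}(y, m, e, x) = \mathbb{E}\left\{Y\, \frac{I(E = 1)}{f_{E|X}(E \mid X)}\, \frac{f_{M|E,X}(M \mid E = 0, X)}{f_{M|E,X}(M \mid E, X)}\right\}.$$

Thus, our third estimator takes the form:

$$\hat\theta_0^{em} = \mathbb{P}_n\left\{Y\, \frac{I(E = 1)}{\hat f_{E|X}(E \mid X)}\, \frac{\hat f_{M|E,X}(M \mid E = 0, X)}{\hat f_{M|E,X}(M \mid E, X)}\right\}.$$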

At first glance the three estimators $\hat\theta_0^{em}$, $\hat\theta_0^{ye}$ and $\hat\theta_0^{ym}$ might appear to be distinct; however, we observe that provided the empirical distribution function $\hat F_O = \hat F_{Y|E,M,X} \times \hat F_{M|E,X} \times \hat F_{E|X} \times \hat F_X$ satisfies the positivity assumption, and thus $\hat F_O \in \mathcal{M}_{nonpar}$, then actually $\hat\theta_0^{em} = \hat\theta_0^{ye} = \hat\theta_0^{ym} = \theta_0(\hat F_O)$ since the three representations agree on the nonparametric model $\mathcal{M}_{nonpar}$. Therefore we may conclude that these three estimators are in fact asymptotically efficient in $\mathcal{M}_{nonpar}$ with common influence function $S_{\theta_0}^{eff,nonpar}(\theta_0)$. Furthermore, from this observation, one further concludes that (asymptotic) inferences obtained using one of the three representations are identical to inferences using either of the other two representations.
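The algebraic equivalence of the three nonparametric strategies can be checked on simulated discrete data. The sketch below is only illustrative: the data generating process, the sample size and all names are hypothetical, and the empirical probability mass functions are obtained by simple cell counts.

```python
import random
from collections import Counter, defaultdict

random.seed(0)
n = 2000
data = []                                    # rows are (y, e, m, x)
for _ in range(n):
    x = int(random.random() < 0.4)
    e = int(random.random() < 0.3 + 0.3 * x)
    m = int(random.random() < 0.2 + 0.4 * e + 0.2 * x)
    y = 1.0 + e + 2.0 * m + x + random.gauss(0.0, 1.0)
    data.append((y, e, m, x))

# Cell counts and within-cell outcome sums define the empirical law.
n_x = Counter(x for _, _, _, x in data)
n_ex = Counter((e, x) for _, e, _, x in data)
n_emx = Counter((e, m, x) for _, e, m, x in data)
sum_y = defaultdict(float)
for y, e, m, x in data:
    sum_y[(e, m, x)] += y

def f_e(e, x):                               # empirical f_{E|X}(e | x)
    return n_ex[(e, x)] / n_x[x]

def f_m(m, e, x):                            # empirical f_{M|E,X}(m | e, x)
    return n_emx[(e, m, x)] / n_ex[(e, x)]

def mean_y(e, m, x):                         # empirical E(Y | e, m, x)
    return sum_y[(e, m, x)] / n_emx[(e, m, x)]

# Strategy 1: outcome regression averaged over the mediator law under E = 0.
theta_ym = sum(sum(mean_y(1, k, x) * f_m(k, 0, x) for k in (0, 1))
               for _, _, _, x in data) / n
# Strategy 2: inverse exposure weighting of the outcome regression.
theta_ye = sum((1.0 if e == 0 else 0.0) / f_e(0, x) * mean_y(1, m, x)
               for _, e, m, x in data) / n
# Strategy 3: inverse exposure weighting times a mediator density ratio.
theta_em = sum(y * (1.0 if e == 1 else 0.0) / f_e(1, x)
               * f_m(m, 0, x) / f_m(m, e, x)
               for y, e, m, x in data) / n
print(theta_ym, theta_ye, theta_em)
```

The three printed values agree up to floating point rounding, provided every (e, m, x) cell is non-empty, which plays the role of the positivity condition for the empirical law.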

At this juncture, we note that the above equivalence no longer applies when, as we have previously argued will likely occur in practice, $(M, X)$ contains 3 or more continuous variables and/or $X$ is too high dimensional for models to be saturated or nonparametric, and thus parametric (or semiparametric) models are specified for dimension reduction. Specifically, for such settings, we observe that three distinct modeling strategies are available. Under the first strategy, the estimator $\hat\theta_0^{ym,par}$ is obtained as $\hat\theta_0^{ym}$ using parametric model estimates $\hat{\mathbb{E}}^{par}(Y \mid E, M, X)$ and $\hat f^{par}_{M|E,X}(m \mid E, X)$ instead of their nonparametric counterparts; similarly, under the second strategy, the estimator $\hat\theta_0^{ye,par}$ is obtained as $\hat\theta_0^{ye}$ using estimates of the parametric models $\hat{\mathbb{E}}^{par}(Y \mid E = 1, M = m, X)$ and $\hat f^{par}_{E|X}(e \mid X)$; and finally, under the third strategy, $\hat\theta_0^{em,par}$ is obtained as $\hat\theta_0^{em}$ using $\hat f^{par}_{E|X}(e \mid X)$ and $\hat f^{par}_{M|E,X}(m \mid E, X)$. Then it follows that $\hat\theta_0^{ym,par}$ is CAN under the submodel $\mathcal{M}_a$, but is generally inconsistent if either $\hat{\mathbb{E}}^{par}(Y \mid E, M, X)$ or $\hat f^{par}_{M|E,X}(m \mid E, X)$ fails to be consistent. Similarly, $\hat\theta_0^{ye,par}$ and $\hat\theta_0^{em,par}$ are respectively CAN under the submodels $\mathcal{M}_b$ and $\mathcal{M}_c$, but each estimator generally fails to be consistent outside of the corresponding submodel. In the next section, we propose an approach that produces a triply robust estimator by combining the above three strategies so that only one of models $\mathcal{M}_a$, $\mathcal{M}_b$ and $\mathcal{M}_c$ needs to be valid for consistency of the estimator.

2.3 Triply robust estimation

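The proposed triply robust estimator $\hat\theta_0^{triply}$ solves

$$\mathbb{P}_n \hat S_{\theta_0}^{eff,nonpar}(\hat\theta_0^{triply}) = 0,$$

where $\hat S_{\theta_0}^{eff,nonpar}(\theta)$ is equal to $S_{\theta_0}^{eff,nonpar}(\theta)$ evaluated at $\{\hat{\mathbb{E}}^{par}(Y \mid E, M, X), \hat f^{par}_{M|E,X}(m \mid E, X), \hat f^{par}_{E|X}(e \mid X)\}$; that is,

$$\hat\theta_0^{triply} = \mathbb{P}_n\left[\frac{I\{E = 1\}\, \hat f^{par}_{M|E,X}(M \mid E = 0, X)}{\hat f^{par}_{E|X}(1 \mid X)\, \hat f^{par}_{M|E,X}(M \mid E = 1, X)}\{Y - \hat{\mathbb{E}}^{par}(Y \mid X, M, E = 1)\} + \frac{I(E = 0)}{\hat f^{par}_{E|X}(0 \mid X)}\{\hat{\mathbb{E}}^{par}(Y \mid X, M, E = 1) - \hat\eta^{par}(1, 0, X)\} + \hat\eta^{par}(1, 0, X)\right], \qquad (4)$$

which is CAN in model $\mathcal{M}_{union} = \mathcal{M}_a \cup \mathcal{M}_b \cup \mathcal{M}_c$, where

$$\hat\eta^{par}(e, e^*, X) = \int_{\mathcal{S}} \hat{\mathbb{E}}^{par}(Y \mid X, M = m, E = e)\, \hat f^{par}_{M|E,X}(m \mid E = e^*, X)\, d\mu(m).$$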
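To make the structure of equation (4) concrete, the following sketch evaluates the triply robust estimator for a binary mediator when the three working models are supplied as plain functions. It is an illustration under strong simplifying assumptions: the data generating process is hypothetical, and instead of fitting parametric models the true nuisance functions are plugged in with one of them deliberately replaced by a wrong one at a time.

```python
import random

def theta_triply(data, mean_y1, f_m, f_e, support=(0, 1)):
    """Equation (4) for a discrete mediator with the given support.

    data: iterable of (y, e, m, x) rows;
    mean_y1(m, x): working model for E(Y | X = x, M = m, E = 1);
    f_m(m, e, x): working mediator density; f_e(e, x): working exposure density.
    """
    total, n = 0.0, 0
    for y, e, m, x in data:
        eta10 = sum(mean_y1(k, x) * f_m(k, 0, x) for k in support)
        value = eta10
        if e == 1:
            weight = f_m(m, 0, x) / (f_e(1, x) * f_m(m, 1, x))
            value += weight * (y - mean_y1(m, x))
        else:
            value += (mean_y1(m, x) - eta10) / f_e(0, x)
        total += value
        n += 1
    return total / n

# Hypothetical data generating process (not the paper's simulation design).
def true_e(e, x):
    p1 = 0.3 + 0.3 * x
    return p1 if e == 1 else 1.0 - p1

def true_m(m, e, x):
    p1 = 0.2 + 0.4 * e + 0.2 * x
    return p1 if m == 1 else 1.0 - p1

def true_y1(m, x):
    return 2.0 + 3.0 * m + x                 # E(Y | X = x, M = m, E = 1)

random.seed(1)
data = []
for _ in range(20000):
    x = int(random.random() < 0.4)
    e = int(random.random() < true_e(1, x))
    m = int(random.random() < true_m(1, e, x))
    y = 1.0 + e + 2.0 * m + x + e * m + random.gauss(0.0, 1.0)
    data.append((y, e, m, x))

theta_true = 3.24                            # E{eta(1, 0, X)} under this law

# Misspecify exactly one working model at a time.
est_wrong_y = theta_triply(data, lambda m, x: 0.0, true_m, true_e)
est_wrong_m = theta_triply(data, true_y1, lambda m, e, x: 0.5, true_e)
est_wrong_e = theta_triply(data, true_y1, true_m, lambda e, x: 0.5)
print(est_wrong_y, est_wrong_m, est_wrong_e)
```

In this toy setting each of the three estimates stays close to the true value 3.24 even though one working model is badly wrong, which is the behavior that triple robustness predicts; with two wrong working models no such guarantee applies.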

In the next theorem, the estimator in the above display is combined with a doubly robust estimator $\hat\delta_e^{doubly}$ of $\delta_e$ (see van der Laan and Robins, 2003 or Tsiatis, 2006), to obtain multiply robust estimators of natural direct and indirect effects, where

$$\hat\delta_e^{doubly} = \mathbb{P}_n\left[\frac{I(E = e)}{\hat f^{par}_{E|X}(e \mid X)}\{Y - \hat\eta^{par}(e, e, X)\} + \hat\eta^{par}(e, e, X)\right].$$

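To state the result, we set $\hat{\mathbb{E}}^{par}(Y \mid X, M, E) = \mathbb{E}^{par}(Y \mid X, M, E; \hat\beta_y) = g^{-1}(\hat\beta_y^T h(X, M, E))$, where $g$ is a known link function and $h$ is a user-specified function of $(X, M, E)$, so that $\mathbb{E}^{par}(Y \mid X, M, E; \beta_y) = g^{-1}(\beta_y^T h(X, M, E))$ entails a working regression model for $\mathbb{E}(Y \mid X, M, E)$, and $\hat\beta_y$ solves the estimating equation

$$0 = \mathbb{P}_n[S_y(\hat\beta_y)] = \mathbb{P}_n\left[h(X, M, E)\left(Y - g^{-1}(\hat\beta_y^T h(X, M, E))\right)\right].$$

Similarly, we set $\hat f^{par}_{M|E,X}(m \mid E, X) = f^{par}_{M|E,X}(m \mid E, X; \hat\beta_m)$ for $f^{par}_{M|E,X}(m \mid E, X; \beta_m)$ a parametric model for the density of $[M \mid E, X]$ with $\hat\beta_m$ solving

$$0 = \mathbb{P}_n[S_m(\hat\beta_m)] = \mathbb{P}_n\left[\frac{\partial}{\partial\beta_m}\log f^{par}_{M|E,X}(M \mid E, X; \hat\beta_m)\right],$$

and we set $\hat f^{par}_{E|X}(e \mid X) = f^{par}_{E|X}(e \mid X; \hat\beta_e)$ for $f^{par}_{E|X}(e \mid X; \beta_e)$ a parametric model for the density of $[E \mid X]$ with $\hat\beta_e$ solving

$$0 = \mathbb{P}_n[S_e(\hat\beta_e)] = \mathbb{P}_n\left[\frac{\partial}{\partial\beta_e}\log f^{par}_{E|X}(E \mid X; \hat\beta_e)\right].$$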

Theorem 2

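Suppose that the assumptions of Theorem 1 hold, that the regularity conditions stated in the appendix hold, and that $\beta_m$, $\beta_e$ and $\beta_y$ are variation independent.

  (i) Mediation functional: Then, $\sqrt{n}(\hat\theta_0^{triply} - \theta_0)$ is RAL under model $\mathcal{M}_{union}$ with influence function
    $$S_{\theta_0}^{union}(\theta_0, \beta^*) = S_{\theta_0}^{eff,nonpar}(\theta_0, \beta^*) - \left.\frac{\partial \mathbb{E}\{S_{\theta_0}^{eff,nonpar}(\theta_0, \beta)\}}{\partial\beta^T}\right|_{\beta^*} \mathbb{E}\left\{\left.\frac{\partial S_\beta(\beta)}{\partial\beta^T}\right|_{\beta^*}\right\}^{-1} S_\beta(\beta^*),$$
    and thus converges in distribution to a $N(0, \Sigma_{\theta_0})$, where
    $$\Sigma_{\theta_0}(\theta_0, \beta^*) = \mathbb{E}\left(S_{\theta_0}^{union}(\theta_0, \beta^*)^2\right),$$
    with $\beta^T = (\beta_m^T, \beta_e^T, \beta_y^T)$ and $S_\beta(\beta) = (S_m^T(\beta_m), S_e^T(\beta_e), S_y^T(\beta_y))^T$, and with $\beta^*$ denoting the probability limit of the estimator $\hat\beta = (\hat\beta_m^T, \hat\beta_e^T, \hat\beta_y^T)^T$.

  (ii) Natural direct effect: Similarly, $\sqrt{n}(\hat\theta_0^{triply} - \hat\delta_0^{doubly} - (\theta_0 - \delta_0))$ is RAL under model $\mathcal{M}_{union}$ with influence function $S_{NDE}^{union}(\theta_0, \delta_0, \beta^*)$ defined as $S_{\theta_0}^{union}(\theta_0, \beta^*)$ with $S_{NDE}^{eff,nonpar}(\theta_0, \delta_0, \beta^*)$ replacing $S_{\theta_0}^{eff,nonpar}(\theta_0, \beta^*)$, and asymptotic variance $\Sigma_{\theta_0 - \delta_0}(\theta_0, \delta_0, \beta^*)$ defined accordingly.

  (iii) Natural indirect effect: Similarly, $\sqrt{n}(\hat\delta_1^{doubly} - \hat\theta_0^{triply} - (\delta_1 - \theta_0))$ is RAL under model $\mathcal{M}_{union}$ with influence function $S_{NIE}^{union}(\delta_1, \theta_0, \beta^*)$ defined as $S_{\theta_0}^{union}(\theta_0, \beta^*)$ with $S_{NIE}^{eff,nonpar}(\delta_1, \theta_0, \beta^*)$ replacing $S_{\theta_0}^{eff,nonpar}(\theta_0, \beta^*)$, and asymptotic variance $\Sigma_{\delta_1 - \theta_0}(\delta_1, \theta_0, \beta^*)$ defined accordingly.

  (iv) $\hat\theta_0^{triply}$, $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$ and $\hat\delta_1^{doubly} - \hat\theta_0^{triply}$ are semiparametric locally efficient in the sense that they are RAL under model $\mathcal{M}_{union}$ and respectively achieve the semiparametric efficiency bound for $\theta_0$, $\theta_0 - \delta_0$ and $\delta_1 - \theta_0$ under model $\mathcal{M}_{union}$ at the intersection submodel $\mathcal{M}_a \cap \mathcal{M}_b \cap \mathcal{M}_c$, with respective efficient influence functions $S_{\theta_0}^{eff,nonpar}(\theta_0, \beta^*)$, $S_{NDE}^{eff,nonpar}(\theta_0, \delta_0, \beta^*)$ and $S_{NIE}^{eff,nonpar}(\delta_1, \theta_0, \beta^*)$.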

Empirical versions of $\Sigma_{\theta_0 - \delta_0}(\theta_0, \delta_0, \beta^*)$ and $\Sigma_{\delta_1 - \theta_0}(\delta_1, \theta_0, \beta^*)$ are easily obtained, and the corresponding Wald type confidence intervals can be used to make formal inferences about natural direct and indirect effects. It is also straightforward to extend the approach to the risk ratio and odds ratio scales for binary $Y$. By a theorem due to Robins and Rotnitzky (2001), part (iv) of the theorem implies that when all models are correct, $\hat\theta_0^{triply}$, $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$ and $\hat\delta_1^{doubly} - \hat\theta_0^{triply}$ are semiparametric efficient in model $\mathcal{M}_{nonpar}$ at the intersection submodel $\mathcal{M}_a \cap \mathcal{M}_b \cap \mathcal{M}_c$.
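A Wald type interval of the kind mentioned above can be sketched as follows. This is a simplified illustration, not the full variance of Theorem 2: it uses only the empirical variance of the estimated influence function values, which matches the asymptotic variance when all working models are correct; otherwise the correction term for estimating $\beta$ (or a bootstrap) would be needed. The function name `wald_ci` and its arguments are hypothetical.

```python
import math

def wald_ci(estimate, influence_values, z=1.96):
    """Wald interval from per-subject estimated influence function values.

    Assumes the variance of the estimator is var(influence) / n, i.e. that no
    extra term is needed for nuisance parameter estimation.
    """
    n = len(influence_values)
    mean = sum(influence_values) / n
    var = sum((s - mean) ** 2 for s in influence_values) / n
    se = math.sqrt(var / n)
    return estimate - z * se, estimate + z * se, se

# Tiny worked example: four influence values with empirical variance 1.
lo, hi, se = wald_ci(1.0, [1.0, -1.0, 1.0, -1.0])
print(lo, hi, se)
```

In the worked example the standard error is sqrt(1/4) = 0.5, so the interval is 1.0 ± 1.96 × 0.5.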

3 A simulation study of estimators of direct effect

In this section, we report a simulation study which illustrates the finite sample performance of the various estimators described in previous sections. We generated 1000 samples of size n = 600, 1000 from the following model:

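$$\begin{aligned}
&(\text{Model.X}) && X_1 \sim \text{Bernoulli}(0.4); \quad [X_2 \mid X_1] \sim \text{Bernoulli}(0.3 + 0.4X_1); \quad [X_3 \mid X_1, X_2] \sim 0.024 - 0.4X_1 + 0.4X_2 + N(0, 1); \\
&(\text{Model.E}) && [E \mid X_1, X_2, X_3] \sim \text{Bernoulli}\left([1 + \exp\{-(0.4 + X_1 - X_2 + 0.1X_3 - 1.5X_1X_3)\}]^{-1}\right); \\
&(\text{Model.M}) && [M \mid E, X_1, X_2, X_3] \sim \text{Bernoulli}\left([1 + \exp\{-(0.5 - X_1 + 0.5X_2 - 0.9X_3 + E - 1.5X_1X_3)\}]^{-1}\right); \\
&(\text{Model.Y}) && [Y \mid M, E, X_1, X_2, X_3] \sim 1 + 0.2X_1 + 0.3X_2 + 14X_3 - 2.5E - 3.5M + 5EM + N(0, 1).
\end{aligned}$$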

We then evaluated the performance of the following four estimators of the natural direct effect: $\hat\theta_0^{em} - \hat\delta_0^{doubly}$, $\hat\theta_0^{ye} - \hat\delta_0^{doubly}$, $\hat\theta_0^{ym} - \hat\delta_0^{doubly}$, and $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$. Note that the doubly robust estimator $\hat\delta_0^{doubly}$ was used throughout to estimate $\delta_0 = \mathbb{E}(Y_0)$. To assess the impact of modeling error, we evaluated these estimators in four separate scenarios. In the first scenario, all models were correctly specified, whereas the remaining three scenarios respectively mis-specified only one of Model.E, Model.M and Model.Y. In order to mis-specify Model.E and Model.M, we respectively left out the $X_1 X_3$ interaction when fitting each model and we assumed an incorrect log-log link function. The incorrect model for $Y$ simply assumed no $EM$ interaction.

Tables 1 and 2 summarize the simulation results, which largely agree with the theory developed in the previous sections. Mainly, all proposed estimators performed well at both moderate and large sample sizes in the absence of modeling error. Furthermore, under the partially mis-specified model in which Model.Y was incorrect, both estimators $\hat\theta_0^{ye} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{ym} - \hat\delta_0^{doubly}$ showed significant bias irrespective of sample size, while $\hat\theta_0^{em} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$ both performed well. Similarly, when Model.M was incorrect, the estimators $\hat\theta_0^{em} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{ym} - \hat\delta_0^{doubly}$ resulted in large bias when compared to the relatively small bias of $\hat\theta_0^{ye} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$. Finally, mis-specifying Model.E led to estimators $\hat\theta_0^{ye} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{em} - \hat\delta_0^{doubly}$ that were significantly more biased than the estimators $\hat\theta_0^{ym} - \hat\delta_0^{doubly}$ and $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$. Interestingly, the efficiency loss of the multiply robust estimator remained small when compared to the consistent non-robust estimator under the various scenarios, suggesting that, at least in this simulation study, the benefits of robustness appear to outweigh the loss of efficiency.

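Table 1. Simulation results, n = 600

| Scenario | Measure | ym | ye | em | union |
|---|---|---|---|---|---|
| All correct | bias | 0.002 | 0.008 | 0.002 | 0.005 |
| | MC s.e.* | 0.005 | 0.007 | 0.006 | 0.006 |
| Y wrong | bias | −0.500 | −0.500 | 0.0001 | 0.004 |
| | MC s.e. | 0.005 | 0.006 | 0.006 | 0.006 |
| M wrong | bias | 0.038 | 0.008 | −0.054 | 0.003 |
| | MC s.e. | 0.005 | 0.007 | 0.006 | 0.006 |
| E wrong | bias | 0.003 | 0.027 | 0.059 | 0.004 |
| | MC s.e. | 0.005 | 0.005 | 0.005 | 0.005 |

ym: $\hat\theta_0^{ym} - \hat\delta_0^{doubly}$; ye: $\hat\theta_0^{ye} - \hat\delta_0^{doubly}$; em: $\hat\theta_0^{em} - \hat\delta_0^{doubly}$; union: $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$.

* Monte Carlo standard error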

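Table 2. Simulation results, n = 1000

| Scenario | Measure | ym | ye | em | union |
|---|---|---|---|---|---|
| All correct | bias | 0.001 | 0.009 | 0.001 | 0.001 |
| | MC s.e.* | 0.004 | 0.005 | 0.004 | 0.004 |
| Y wrong | bias | −0.484 | −0.484 | 0.003 | 0.003 |
| | MC s.e. | 0.004 | 0.004 | 0.004 | 0.004 |
| M wrong | bias | 0.136 | −0.008 | 0.056 | 0.01 |
| | MC s.e. | 0.004 | 0.05 | 0.004 | 0.01 |
| E wrong | bias | 0.001 | −0.024 | −0.054 | 0.001 |
| | MC s.e. | 0.004 | 0.004 | 0.004 | 0.004 |

ym: $\hat\theta_0^{ym} - \hat\delta_0^{doubly}$; ye: $\hat\theta_0^{ye} - \hat\delta_0^{doubly}$; em: $\hat\theta_0^{em} - \hat\delta_0^{doubly}$; union: $\hat\theta_0^{triply} - \hat\delta_0^{doubly}$.

* Monte Carlo standard error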

4 A data application

In this section, we illustrate the methods in a real world application from the psychology literature on mediation. We re-analyze data from The Job Search Intervention Study (JOBS II) also analyzed by Imai et al (2010b). JOBS II is a randomized field experiment that investigates the efficacy of a job training intervention on unemployed workers. The program is designed not only to increase reemployment among the unemployed but also to enhance the mental health of the job seekers. In the study, 1,801 unemployed workers received a pre-screening questionnaire and were then randomly assigned to treatment and control groups. The treatment group with E = 1 participated in job skills workshops in which participants learned job search skills and coping strategies for dealing with setbacks in the job search process. The control group with E = 0 received a booklet describing job search tips. An analysis considers a continuous outcome measure Y of depressive symptoms based on the Hopkins Symptom Checklist (Vinokur, Price, & Schul, 1995; Vinokur & Schul, 1997, Imai et al, 2010b). In the JOBS II data, a continuous measure of job search self-efficacy represented the hypothesized mediating variable M. The data also included baseline covariates X measured before administering the treatment including: pretreatment level of depression, education, income, race, marital status, age, sex, previous occupation, and the level of economic hardship.

Note that by randomization, the density of [E|X] was known by design not to depend on covariates, and therefore its estimation is not prone to modeling error. The continuous outcome and mediator variables were modeled using linear regression models with Gaussian error, with main effects for (E, M, X) included in the outcome regression and main effects for (E, X) included in the mediator regression. Table 3 summarizes results obtained using $\hat\theta_0^{em}$, $\hat\theta_0^{ye}$, $\hat\theta_0^{ym}$ and $\hat\theta_0^{triply}$ together with $\hat\delta_e^{doubly}$, e = 0, 1, to estimate the direct and indirect effects of the treatment.

Table 3. Estimated Causal Effects of Interest Using the Job Search Intervention Study Data

| | | ym | ye | em | union |
|---|---|---|---|---|---|
| Direct effect | Estimate | −0.0310 | −0.0310 | 0.0280 | −0.0409 |
| | s.e.* | 0.0124 | 0.0620 | 0.0465 | 0.0217 |
| Indirect effect | Estimate | −0.0160 | −0.0160 | −0.0750 | −0.0070 |
| | s.e.* | 0.0372 | 0.0620 | 0.0434 | 0.0217 |

* Nonparametric bootstrap standard errors

Point estimates of both natural direct and indirect effects closely agreed under models ℳym and ℳye, and also agreed with the results of Imai et al (2010b). We should note that inferences under our choice of ℳym are actually robust to the normality assumption and, as in Imai et al (2010b), only require that the mean structure of [Y|E, M, X] and [M|E, X] are correct. In contrast, inferences under model ℳem require a correct model for the mediator density. This distinction may partly explain the apparent disagreement in the estimated direct effect under ℳem when compared to the other methods, also suggesting that the Gaussian error model for M is not entirely appropriate. The multiply robust estimate of the natural direct effect is consistent with estimates obtained under models ℳym and ℳye, and is statistically significant, suggesting that the intervention may have beneficial direct effects on participants’ mental health; while the multiply robust approach suggests a much smaller indirect effect than all other estimators although none achieved statistical significance.
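As a rough illustration of the simplest of these analyses, the sketch below computes plug-in direct and indirect effects from two main-effects linear regressions and attaches nonparametric bootstrap standard errors. With no exposure-mediator interaction, the plug-in direct effect reduces to the coefficient of E in the outcome model and the indirect effect to the product of the M coefficient in the outcome model and the E coefficient in the mediator model. The JOBS II data are not reproduced here: the data below are synthetic, and the function names (`ols`, `ym_effects`, `bootstrap_se`), the coefficients, and the single covariate are all assumptions made for illustration only.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients of y on the columns of design."""
    return np.linalg.lstsq(design, y, rcond=None)[0]

def ym_effects(E, M, X, Y):
    """Plug-in direct and indirect effects from two main-effects linear models."""
    ones = np.ones(len(Y))
    beta = ols(np.column_stack([ones, E, M, X]), Y)   # outcome model: Y on (E, M, X)
    alpha = ols(np.column_stack([ones, E, X]), M)     # mediator model: M on (E, X)
    direct = beta[1]                # coefficient of E in the outcome model
    indirect = beta[2] * alpha[1]   # (coefficient of M) times (effect of E on M)
    return direct, indirect

def bootstrap_se(E, M, X, Y, n_boot=200, seed=0):
    """Nonparametric bootstrap standard errors for (direct, indirect)."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    draws = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample subjects with replacement
        draws[b] = ym_effects(E[idx], M[idx], X[idx], Y[idx])
    return draws.std(axis=0, ddof=1)

# Synthetic stand-in data (hypothetical coefficients, randomized exposure).
rng = np.random.default_rng(1)
n = 2000
X = rng.standard_normal(n)
E = rng.binomial(1, 0.5, size=n).astype(float)
M = 0.2 + 0.6 * E + 0.4 * X + rng.standard_normal(n)
Y = 1.0 + 0.5 * E + 0.8 * M + 0.3 * X + rng.standard_normal(n)

direct_hat, indirect_hat = ym_effects(E, M, X, Y)
se_direct, se_indirect = bootstrap_se(E, M, X, Y)
```

In this synthetic setup the true direct effect is 0.5 and the true indirect effect is 0.8 × 0.6 = 0.48, so the two estimates can be checked against known values.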

5 Improving the stability of $\hat\theta_0^{triply}$ when weights are highly variable

The triply robust estimator $\hat\theta_0^{triply}$, which involves inverse probability weights for the exposure and mediator variables, clearly relies on the positivity assumption for good finite sample performance. But as shown by Kang and Schafer (2007) in the context of missing outcome data, a practical violation of positivity in data analysis can severely compromise inferences based on such methodology, although their analysis did not directly concern the M-functional $\theta_0$. Thus, it is crucial to critically examine, as we do below in a simulation study, the extent to which the various estimators discussed in this paper are susceptible to a practical violation of the positivity assumption, and to consider possible approaches to improve the finite sample performance of these estimators in the context of highly variable empirical weights. Methodology to enhance the finite sample behaviour of $\hat\delta_j^{doubly}$ is well studied in the literature and is not considered here; see for example Robins et al (2007), Cao et al (2009) and Tan (2010). We first describe an approach to enhance the finite sample performance of $\hat\theta_0^{triply}$, particularly in the presence of highly variable empirical weights. To focus the exposition, we only consider the case of a continuous Y and a binary M, but in principle the approach could be generalized to a more general setting. The proposed enhancement involves two modifications.

The first modification adapts to the mediation context an approach developed for the missing data context (and for the estimation of total effects) in Robins et al (2007). The basic guiding principle of the approach is to carefully modify the estimation of the outcome and mediator models in order to ensure that the triply robust estimator given by equation (4) has the simple M-functional representation

$$\hat\theta_0^{triply,\dagger}=\mathbb{P}_n\{\hat\eta^{par,\dagger}(1,0,X)\}$$

where $\hat\eta^{par,\dagger}(1,0,X)$ is carefully estimated to ensure multiple robustness. The reason for favoring an estimator with the above representation is that it is expected to be more robust to practical positivity violation because it does not directly depend on inverse probability weights. However, as we show next, to ensure multiple robustness, estimation of $\eta^{par}$ involves inverse probability weights, and therefore $\hat\theta_0^{triply,\dagger}$ indirectly depends on such weights. Our strategy involves a second step to minimize the potential impact of this indirect dependence on weights.

In the following, to simplify the exposition, we assume that a simple linear model is used:

$$E^{par}(Y|X,M,E=1)=E^{par}(Y|X,M,1;\beta_y)=[1,X^T,M]\beta_y.$$

Then, similarly to Robins et al (2007), one can verify that the above M-functional representation of a triply robust estimator is obtained by estimating $f^{par}_{M|E,X}(M|E=0,X)$ with $\hat f^{par,\dagger}_{M|E,X}(M|E=0,X)$ obtained via weighted logistic regression in the unexposed only, with weight $\hat f^{par}_{E|X}(0|X)^{-1}$; and by estimating $E^{par}(Y|X,M,E=1)$ using weighted OLS of Y on (M, X) in the exposed only, with weight

$$\hat f^{par,\dagger}_{M|E,X}(M|E=0,X)\left\{\hat f^{par}_{E|X}(1|X)\,\hat f^{par,\dagger}_{M|E,X}(M|E=1,X)\right\}^{-1},$$

provided that both working models include an intercept. The second enhancement, which minimizes undue influence of variable weights on the M-functional estimator, entails using $\hat f^{par,\dagger}_{E|X}$ in the previous step instead of $\hat f^{par}_{E|X}$, where

$$\operatorname{logit}\hat f^{par,\dagger}_{E|X}(1|X)=\operatorname{logit}\hat f^{par}_{E|X}(1|X)+\hat C_1$$

with

$$\hat C_1=-\log\left(1-\mathbb{P}_n(E)\right)+\log\left(\mathbb{P}_n\left[E\,\hat f^{par}_{E|X}(0|X)/\hat f^{par}_{E|X}(1|X)\right]\right).$$

This second modification ensures a certain boundedness property of inverse propensity score weighting. Specifically, for any bounded function R = r(Y, M) of Y and M, consider for a moment the goal of estimating the counterfactual mean $E\{r(Y_1,M_1)\}$. It is well known that even though R is bounded, the simple inverse-probability weighting estimator $\mathbb{P}_n\{ER\,\hat f^{par}_{E|X}(1|X)^{-1}\}$ could easily be unbounded, particularly if positivity is practically violated. In contrast, as we show next, the estimator $\mathbb{P}_n\{ER\,\hat f^{par,\dagger}_{E|X}(1|X)^{-1}\}$ is generally bounded. To see why, note that

$$\mathbb{P}_n\left\{ER\,\hat f^{par,\dagger}_{E|X}(1|X)^{-1}\right\}=\mathbb{P}_n\left\{ER\,\hat f^{par,\dagger}_{E|X}(0|X)\hat f^{par,\dagger}_{E|X}(1|X)^{-1}\right\}+\mathbb{P}_n\{ER\}$$

$$=\mathbb{P}_n\left\{R\,\frac{E\,\hat f^{par}_{E|X}(0|X)\hat f^{par}_{E|X}(1|X)^{-1}}{\mathbb{P}_n\left[E\,\hat f^{par}_{E|X}(0|X)\hat f^{par}_{E|X}(1|X)^{-1}\right]}\right\}\left(1-\mathbb{P}_n(E)\right)+\mathbb{P}_n\{ER\}$$

which is bounded since the second term is bounded, and the first term is a convex combination of bounded variables, and therefore is also bounded. Furthermore, $\mathbb{P}_n[E\,\hat f^{par}_{E|X}(0|X)\hat f^{par}_{E|X}(1|X)^{-1}]$ converges in probability to $(1-E(E))$ provided that $\hat f^{par}_{E|X}$ converges to $f_{E|X}$, so that $\hat C_1$ converges to zero, ensuring that the expression in the above display is consistent for $E\{r(Y_1,M_1)\}$. The nonparametric bootstrap is most convenient for inference using $\hat f^{par,\dagger}_{E|X}$.
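The boundedness argument can be checked numerically. The following sketch applies the intercept shift to a deliberately poor working propensity score and compares the plain and shifted inverse-probability-weighted averages. The helper name `shifted_propensity`, the data-generating choices, and the use of cos(X) as a stand-in for a bounded r(Y, M) are all assumptions for illustration; nothing here reproduces the paper's simulations.

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def shifted_propensity(E, p1):
    """Shift the logit of a fitted propensity p1 = f(1|X) by the constant C1.

    C1 = -log(1 - mean(E)) + log(mean(E * f(0|X) / f(1|X))), so that the
    shifted odds weights E * f(0|X)/f(1|X) average exactly to 1 - mean(E).
    """
    inv_odds = (1.0 - p1) / p1
    C1 = -np.log(1.0 - E.mean()) + np.log(np.mean(E * inv_odds))
    p1_shifted = expit(np.log(p1 / (1.0 - p1)) + C1)
    return p1_shifted, C1

rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal(n)
E = rng.binomial(1, expit(2.5 * X)).astype(float)
R = np.cos(X)                  # bounded in [-1, 1]; stand-in for r(Y, M)
p_hat = expit(4.0 * X)         # hypothetical, badly calibrated working propensity

p_dag, C1 = shifted_propensity(E, p_hat)
ipw_plain = np.mean(E * R / p_hat)   # may fall far outside [-1, 1]
ipw_shift = np.mean(E * R / p_dag)   # always within the range of R
```

Because the shifted weights are normalized to average to 1 − mean(E), `ipw_shift` is a weighted combination of values of R and cannot leave the interval [−1, 1], whatever the quality of `p_hat`.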

In the next section, we study, in the context of highly variable weights, the behavior of our previous estimators of $\theta_0$ together with that of the enhanced estimators $\hat\theta_0^{triply,\dagger,j}=\mathbb{P}_n\{\hat\eta^{par,\dagger,j}(1,0,X)\}$, j = 1, 2, where $\hat\eta^{par,\dagger,1}$ is constructed as described above using $\hat f^{par}_{E|X}$ and $\hat\eta^{par,\dagger,2}$ uses $\hat f^{par,\dagger}_{E|X}$.
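The first modification can be illustrated with a small numerical sketch for binary M and a linear outcome model. It fits the mediator model in the unexposed by weighted logistic regression and the outcome model in the exposed by weighted least squares, then checks that the two augmentation terms of the triply robust estimator vanish, so that the estimator coincides with the plug-in average of the fitted η(1,0,X). Everything below is an assumption for illustration: the synthetic data, the Newton-Raphson helper `wlogit`, and the choice to fit f(M|E=1,X) by unweighted logistic regression in the exposed (the text does not spell out that step).

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def wlogit(A, y, w, iters=50):
    """Weighted logistic regression by Newton-Raphson; A must contain an intercept."""
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = expit(A @ beta)
        grad = A.T @ (w * (y - p))
        hess = (A * (w * p * (1.0 - p))[:, None]).T @ A
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Synthetic data (hypothetical coefficients).
rng = np.random.default_rng(0)
n = 2000
X = rng.standard_normal(n)
E = rng.binomial(1, expit(0.5 * X)).astype(float)
M = rng.binomial(1, expit(-0.3 + 0.8 * E + 0.5 * X)).astype(float)
Y = 1.0 + X + 2.0 * M + E + rng.standard_normal(n)
A = np.column_stack([np.ones(n), X])
e1, e0 = E == 1, E == 0

# Exposure model, then mediator model in the unexposed weighted by 1 / f(0|X).
pe1 = expit(A @ wlogit(A, E, np.ones(n)))
pm0 = expit(A @ wlogit(A[e0], M[e0], 1.0 / (1.0 - pe1[e0])))   # P(M=1|E=0,X)
pm1 = expit(A @ wlogit(A[e1], M[e1], np.ones(int(e1.sum()))))  # P(M=1|E=1,X), assumption
fM0 = np.where(M == 1, pm0, 1.0 - pm0)
fM1 = np.where(M == 1, pm1, 1.0 - pm1)

# Outcome model in the exposed: weighted OLS of Y on (1, X, M).
w = fM0 / (pe1 * fM1)
D = np.column_stack([np.ones(n), X, M])
D1, w1, Y1 = D[e1], w[e1], Y[e1]
by = np.linalg.solve(D1.T @ (w1[:, None] * D1), D1.T @ (w1 * Y1))

Ehat = D @ by                                  # fitted E(Y|X,M,E=1) at observed M
eta = by[0] + by[1] * X + by[2] * pm0          # fitted eta(1,0,X) for binary M
term_y = np.mean(E * w * (Y - Ehat))           # outcome-residual augmentation
term_m = np.mean((1.0 - E) / (1.0 - pe1) * (Ehat - eta))  # mediator augmentation
plug_in = np.mean(eta)
triply = term_y + term_m + plug_in
```

Both augmentation terms are zero up to numerical error because each weighted fit includes an intercept, which is the proviso stated in the text.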

6 A simulation study where positivity is practically violated

We adapted to the mediation setting the missing data simulation scenarios in Kang and Schafer (2007), which were specifically designed so that, when misspecified, working models are nonetheless nearly correct but yield highly variable inverse probability weights with practical positivity violation in the context of estimation. We generated 1000 samples of size n = 200 and n = 1000 from the following model:

$$\text{(Model.X)}\quad Z=(Z_1,Z_2,Z_3,Z_4)\overset{iid}{\sim}N(0,1);\quad X_1=\exp(Z_1/2);\quad X_2=Z_2/\{1+\exp(Z_1)\}+10;\quad X_3=(Z_1Z_3/25+0.6)^3\ \text{and}\ X_4=(Z_2+Z_4+20)^2,$$

so that Z may be expressed in terms of X.

$$\text{(Model.E)}\quad [E|X_1,X_2,X_3]\sim\text{Bernoulli}\left(\left[1+\exp\{-(Z_1-0.5Z_2+0.25Z_3+0.1Z_4)\}\right]^{-1}\right);$$

$$\text{(Model.M)}\quad [M|E,X_1,X_2,X_3]\sim\text{Bernoulli}\left(\left[1+\exp\{-(0.5Z_1+0.5Z_2-0.9Z_3+Z_4-1.5E)\}\right]^{-1}\right);$$

$$\text{(Model.Y)}\quad [Y|M,E,X_1,X_2,X_3]\sim 210+27.4Z_1+13.7Z_3+13.7Z_3+M+E+N(0,1).$$

Correctly specified working models were thus obtained by fitting an additive linear regression of Y on Z, a logistic regression of M with linear predictor additive in Z and E, and a logistic regression of E with linear predictor additive in Z. Incorrect specification involved fitting these models with X replacing Z, which produces highly variable weights. For instance, an estimated propensity score as small as $5.5\times10^{-33}$ occurred in the simulation study, reflecting an effective violation of positivity; similarly, a mediator predicted probability as small as $3\times10^{-20}$ also occurred in the simulation study.
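A minimal sketch of this data-generating process follows, mainly to make concrete the remark that Z can be recovered from X. Two points are assumptions rather than facts from the text: the sign convention inside the two logistic models follows the display above, and the outcome mean is transcribed as printed, including the repeated Z3 term. The function names `simulate` and `recover_Z` are hypothetical.

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def simulate(n, rng):
    """Draw (Z, X, E, M, Y) from the four-part model described in the text."""
    Z = rng.standard_normal((n, 4))
    Z1, Z2, Z3, Z4 = Z.T
    X = np.column_stack([
        np.exp(Z1 / 2.0),
        Z2 / (1.0 + np.exp(Z1)) + 10.0,
        (Z1 * Z3 / 25.0 + 0.6) ** 3,
        (Z2 + Z4 + 20.0) ** 2,
    ])
    # Sign convention inside expit is an assumption (see lead-in).
    E = rng.binomial(1, expit(Z1 - 0.5 * Z2 + 0.25 * Z3 + 0.1 * Z4))
    M = rng.binomial(1, expit(0.5 * Z1 + 0.5 * Z2 - 0.9 * Z3 + Z4 - 1.5 * E))
    # Outcome mean transcribed as printed, with the Z3 term appearing twice.
    Y = (210.0 + 27.4 * Z1 + 13.7 * Z3 + 13.7 * Z3 + M + E
         + rng.standard_normal(n))
    return Z, X, E, M, Y

def recover_Z(X):
    """Invert the X transformations, showing Z is a function of X."""
    X1, X2, X3, X4 = X.T
    Z1 = 2.0 * np.log(X1)
    Z2 = (X2 - 10.0) * (1.0 + np.exp(Z1))
    Z3 = 25.0 * (np.cbrt(X3) - 0.6) / Z1
    Z4 = np.sqrt(X4) - 20.0 - Z2      # valid when Z2 + Z4 + 20 > 0
    return np.column_stack([Z1, Z2, Z3, Z4])

rng = np.random.default_rng(2007)
Z, X, E, M, Y = simulate(1000, rng)
Z_back = recover_Z(X)
```

A misspecified analysis in the sense of the text would regress on the columns of `X` linearly instead of on `Z`.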

Tables 4 and 5 summarize simulation results for $\hat\theta_0^{ym}$, $\hat\theta_0^{ye}$, $\hat\theta_0^{em}$, $\hat\theta_0^{triply}$, $\hat\theta_0^{triply,\dagger,1}$ and $\hat\theta_0^{triply,\dagger,2}$. When all three working models are correct, all estimators perform well in terms of bias, but there are clear differences between the estimators in terms of efficiency. In fact, $\hat\theta_0^{ym}$, $\hat\theta_0^{triply}$, $\hat\theta_0^{triply,\dagger,1}$ and $\hat\theta_0^{triply,\dagger,2}$ have comparable efficiency for n = 200, 1000, but $\hat\theta_0^{ye}$ and $\hat\theta_0^{em}$ are far more variable. Moreover, under mis-specification of a single model, $\hat\theta_0^{triply}$, $\hat\theta_0^{triply,\dagger,1}$ and $\hat\theta_0^{triply,\dagger,2}$ remain nearly unbiased, and are for the most part substantially more efficient than the corresponding consistent estimator in $\{\hat\theta_0^{ym},\hat\theta_0^{ye},\hat\theta_0^{em}\}$. When at least two models are mis-specified, the multiply robust estimators $\hat\theta_0^{triply}$, $\hat\theta_0^{triply,\dagger,1}$ and $\hat\theta_0^{triply,\dagger,2}$ generally outperform the other estimators, although $\hat\theta_0^{triply}$ occasionally succumbs to the unstable weights, resulting in disastrous mean squared error; see Table 5 when Model.M and Model.E are both incorrect. In contrast, $\hat\theta_0^{triply,\dagger,2}$ generally improves on $\hat\theta_0^{triply,\dagger,1}$, which generally outperforms $\hat\theta_0^{triply}$, and for the most part $\hat\theta_0^{triply,\dagger,1}$ and $\hat\theta_0^{triply,\dagger,2}$ appear to eliminate any possible deleterious impact of highly variable weights.

Table 4. Simulation results, n = 200

| | | ym | ye | em | union | union,1 | union,2 |
|---|---|---|---|---|---|---|---|
| All correct | bias | 0.001 | −0.207 | 0.498 | 0.003 | −0.08 | −0.079 |
| | MC s.e.* | 2.614 | 8.333 | 20.214 | 2.6151 | 2.6155 | 2.6153 |
| Y wrong | bias | −9.87 | −10.221 | 0.498 | −0.147 | −0.502 | −0.202 |
| | MC s.e. | 3.322 | 10.539 | 20.214 | 4.461 | 3.177 | 3.141 |
| M wrong | bias | −0.033 | −0.207 | −9.497 | 0.001 | 0.046 | 0.046 |
| | MC s.e. | 2.613 | 8.333 | 15.376 | 2.615 | 2.614 | 2.614 |
| E wrong | bias | −0.001 | 0.132 | 210.450 | 0.066 | −0.089 | 0.087 |
| | MC s.e. | 2.614 | 4.373 | 2336.92 | 4.891 | 2.619 | 2.615 |
| Y, E wrong | bias | −9.869 | −13.535 | 210.454 | −33.090 | −1.4609 | −2.487 |
| | MC s.e. | 3.322 | 5.256 | 2336.92 | 375.334 | 5.187 | 4.245 |
| Y, M wrong | bias | −9.355 | −10.220 | −9.496 | −4.346 | −3.579 | −3.579 |
| | MC s.e. | 3.224 | 10.539 | 15.376 | 3.912 | 3.480 | 3.441 |
| E, M wrong | bias | −0.032 | 0.132 | 205.060 | 0.088 | −0.001 | $-3.77\times10^{-5}$ |
| | MC s.e. | 2.614 | 4.373 | 2289.788 | 4.763 | 2.623 | 2.618 |
| Y, E, M wrong | bias | −9.355 | −13.535 | 205.060 | −37.757 | −4.223 | −5.253 |
| | MC s.e. | 3.224 | 5.356 | 2289.78 | 379.122 | 5.835 | 4.828 |

ym: $\hat\theta_0^{ym}$; ye: $\hat\theta_0^{ye}$; em: $\hat\theta_0^{em}$; union: $\hat\theta_0^{triply}$; union,1: $\hat\theta_0^{triply,\dagger,1}$; union,2: $\hat\theta_0^{triply,\dagger,2}$.

* Monte Carlo standard error

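Table 5. Simulation results, n = 1000

| | | ym | ye | em | union | union,1 | union,2 |
|---|---|---|---|---|---|---|---|
| All correct | bias | 0.0324 | 0.004 | −0.106 | 0.034 | −0.047 | −0.047 |
| | MC s.e.* | 1.136 | 3.06 | 6.490 | 1.136 | 1.137 | 1.137 |
| Y wrong | bias | −10.256 | −10.305 | −0.106 | 0.063 | −0.147 | −0.148 |
| | MC s.e. | 1.675 | 4.005 | 6.490 | 1.769 | 1.419 | 1.407 |
| M wrong | bias | $-5\times10^{-4}$ | 0.004 | −9.706 | 0.033 | 0.076 | 0.076 |
| | MC s.e. | 1.136 | 3.060 | 5.395 | 1.137 | 1.137 | 1.135 |
| E wrong | bias | 0.032 | 0.135 | $2.4\times10^{6}$ | 1908.76 | −0.038 | −0.030 |
| | MC s.e. | 1.136 | 1.794 | $4.3\times10^{7}$ | 53911.63 | 1.400 | 1.242 |
| Y, E wrong | bias | −10.256 | −14.011 | $2.4\times10^{6}$ | $-1.1\times10^{6}$ | 6.201 | 1.024 |
| | MC s.e. | 1.675 | 2.386 | $4.3\times10^{7}$ | $2.1\times10^{7}$ | 9.406 | 5.097 |
| Y, M wrong | bias | −9.705 | −10.305 | −9.706 | −4.216 | −3.555 | −3.557 |
| | MC s.e. | 1.626 | 4.004 | 5.395 | 1.667 | 1.527 | 1.510 |
| E, M wrong | bias | $5.7\times10^{-4}$ | 0.135 | $2.5\times10^{6}$ | 2034.83 | 0.0539 | 0.0599 |
| | MC s.e. | 1.136 | 1.794 | $4.6\times10^{7}$ | 56090.10 | 1.429 | 1.272 |
| Y, E, M wrong | bias | −9.075 | −14.011 | $2.5\times10^{6}$ | $-1.2\times10^{6}$ | 4.659 | −0.755 |
| | MC s.e. | 1.626 | 2.386 | $4.6\times10^{7}$ | $2.2\times10^{7}$ | 10.121 | 5.910 |

ym: $\hat\theta_0^{ym}$; ye: $\hat\theta_0^{ye}$; em: $\hat\theta_0^{em}$; union: $\hat\theta_0^{triply}$; union,1: $\hat\theta_0^{triply,\dagger,1}$; union,2: $\hat\theta_0^{triply,\dagger,2}$.

* Monte Carlo standard error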

7 A comparison to some existing estimators

In this section, we briefly compare the proposed approach to some existing estimators in the literature. Perhaps the most common approach for estimating direct and indirect effects when Y is continuous uses a system of linear structural equations, whereby a linear structural equation for the outcome given the exposure, the mediator and the confounders is combined with a linear structural equation for the mediator given the exposure and confounders to produce an estimator of natural direct and indirect effects. The classical approach of Baron and Kenny (1986) is a particular instance of this approach. In recent work mainly motivated by Pearl's mediation functional, several authors (Imai et al, 2010, Pearl, 2010, VanderWeele, 2009, VanderWeele and Vansteelandt, 2010) have demonstrated how the simple linear structural equation approach generalizes to accommodate both the presence of an interaction between exposure and mediator variables, and a nonlinear link function either in the regression model for the outcome or in the regression model for the mediator, or both. In fact, when the effect of confounders is also modeled in such structural equations, inferences based on the latter can be viewed as special instances of inferences obtained under a particular specification of model $\mathcal{M}_a$ for the outcome and the mediator densities. Thus, as previously shown in the simulations, an estimator obtained under a system of structural equations will generally fail to produce a consistent estimator of natural direct and indirect effects when model $\mathcal{M}_a$ is incorrect, whereas by using the proposed multiply robust estimator, valid inferences can be recovered under the union model $\mathcal{M}_b\cup\mathcal{M}_c$, even if $\mathcal{M}_a$ fails.

A notable improvement on the system of structural equations approach is the double robust estimator of a natural direct effect due to van der Laan and Petersen (2005). Their estimator solves the estimating equation constructed using an empirical version of $S^{eff,\mathcal{M}_a\cup\mathcal{M}_c}_{NDE,singleton}(\theta_0,\delta_0)$ given in the online appendix. They show their estimator remains CAN in the larger submodel $\mathcal{M}_a\cup\mathcal{M}_c$ and therefore, they can recover valid inferences even when the outcome model is incorrect, provided both the exposure and mediator models are correct. Unfortunately, the van der Laan and Petersen estimator is still not entirely satisfactory because, unlike the proposed multiply robust estimator, it requires that the model for the mediator density is correct. Nonetheless, if the mediator model is correct, the authors establish that their estimator achieves the efficiency bound for model $\mathcal{M}_a\cup\mathcal{M}_c$ at the intersection submodel $\mathcal{M}_a\cap\mathcal{M}_c$ where all models are correct; and thus it is locally semiparametric efficient in $\mathcal{M}_a\cup\mathcal{M}_c$. Interestingly, as we report in the online supplement, the semiparametric efficiency bounds for models $\mathcal{M}_a\cup\mathcal{M}_c$ and $\mathcal{M}_a\cup\mathcal{M}_b\cup\mathcal{M}_c$ are distinct, because the density of the mediator variable is not ancillary for inferences about the M-functional. Thus, any restriction placed on the mediator's conditional density can, when correct, produce improvements in efficiency. This is in stark contrast with the role played by the density of the exposure variable, which, as in the estimation of the marginal causal effect, remains ancillary for inferences about the M-functional, and thus the efficiency bound for the latter is unaltered by any additional information on the former (Robins et al 1994). In the online appendix, we provide a general functional map that relates the efficient influence function for the larger model $\mathcal{M}_a\cup\mathcal{M}_b\cup\mathcal{M}_c$ to the efficient influence function for the smaller model $\mathcal{M}_a\cup\mathcal{M}_c$ where the model for the mediator is either parametric or semiparametric. 
Our map is instructive because it makes explicit using simple geometric arguments, the information that is gained from increasing restrictions on the law of the mediator. In the online appendix, we illustrate the map by recovering the efficient influence function of van der Laan in the case of a singleton model (i.e. a known conditional density) for the mediator and in the case of a parametric model for the mediator.

8 A semiparametric sensitivity analysis

We describe a semiparametric sensitivity analysis framework to assess the extent to which a violation of the ignorability assumption for the mediator might alter inferences about natural direct and indirect effects. Although only results for the natural direct effect are given here, the extension for the indirect effect is easily deduced from the presentation. Let

$$t(e,m,x)=E[Y_{1,m}|E=e,M=m,X=x]-E[Y_{1,m}|E=e,M\neq m,X=x],$$

then

$$Y_{e,m}\not\perp M\mid E=e,X,$$

i.e. a violation of the ignorability assumption for the mediator variable, generally implies that

$$t(e,m,x)\neq 0\ \text{for some}\ (e,m,x).$$

Thus, we proceed as in Robins, Rotnitzky and Scharfstein (1999), and propose to recover inferences by assuming that the selection bias function t(e, m, x), which encodes the magnitude and direction of the unmeasured confounding for the mediator, is known. In the following, the support $\mathcal{S}$ of M is assumed to be finite. To motivate the proposed approach, suppose for the moment that $f_{M|E,X}(M|E,X)$ is known; then under the assumption that the exposure is ignorable given X, we show in the appendix that:

$$E[Y_{1,m}|M_0=m,X=x]=E[Y_{1,m}|E=0,M=m,X=x]$$

$$=E[Y|E=1,M=m,X=x]-t(1,m,x)\left(1-f_{M|E,X}(m|E=1,X=x)\right)+t(0,m,x)\left(1-f_{M|E,X}(m|E=0,X=x)\right),$$

and therefore the M-functional is identified by:

$$\sum_{m\in\mathcal{S}}E\left[\left\{E[Y|E=1,M=m,X]-t(1,m,X)\left(1-f_{M|E,X}(m|E=1,X)\right)+t(0,m,X)\left(1-f_{M|E,X}(m|E=0,X)\right)\right\}f_{M|E,X}(m|E=0,X)\right],\qquad(5)$$

which is equivalently represented as:

$$E\left[\frac{I\{E=1\}f_{M|E,X}(M|E=0,X)}{f_{E|X}(1|X)f_{M|E,X}(M|E=1,X)}\times\left\{Y-t(1,M,X)\left(1-f_{M|E,X}(M|E=1,X)\right)+t(0,M,X)\left(1-f_{M|E,X}(M|E=0,X)\right)\right\}\right].\qquad(6)$$
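For readers who want to see where the conditional-mean identity preceding (5) comes from, the following is a short sketch of one possible derivation, using only the definition of t(e, m, x), consistency, and ignorability of the exposure given X. It is offered as a reading aid and may differ from the argument given in the paper's own appendix.

```latex
\begin{align*}
E[Y_{1,m}\mid E=e,X=x]
 &= E[Y_{1,m}\mid E=e,M=m,X=x]\,f_{M|E,X}(m\mid E=e,X=x)\\
 &\quad + E[Y_{1,m}\mid E=e,M\neq m,X=x]\,\{1-f_{M|E,X}(m\mid E=e,X=x)\}\\
 &= E[Y_{1,m}\mid E=e,M=m,X=x] - t(e,m,x)\,\{1-f_{M|E,X}(m\mid E=e,X=x)\}.
\intertext{Ignorability of the exposure gives
$E[Y_{1,m}\mid E=0,X=x]=E[Y_{1,m}\mid E=1,X=x]$, and consistency gives
$E[Y_{1,m}\mid E=1,M=m,X=x]=E[Y\mid E=1,M=m,X=x]$. Equating the display at $e=0$ and $e=1$:}
E[Y_{1,m}\mid E=0,M=m,X=x]
 &= E[Y\mid E=1,M=m,X=x] - t(1,m,x)\,\{1-f_{M|E,X}(m\mid E=1,X=x)\}\\
 &\quad + t(0,m,x)\,\{1-f_{M|E,X}(m\mid E=0,X=x)\}.
\end{align*}
```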

Below, these two equivalent representations (5) and (6) are carefully combined to obtain a double robust estimator of the M-functional assuming t(·,·,·) is known. A sensitivity analysis is then obtained by repeating this process and reporting inferences for each choice of t(·,·,·) in a finite set of user-specified functions $\mathcal{T}=\{t_\lambda(\cdot,\cdot,\cdot):\lambda\}$ indexed by a finite dimensional parameter λ, with $t_0(\cdot,\cdot,\cdot)\in\mathcal{T}$ corresponding to the no unmeasured confounding assumption, i.e. $t_0(\cdot,\cdot,\cdot)\equiv 0$. Throughout, the model $f^{par}_{M|E,X}(\cdot|E,X;\beta_m)$ for the probability mass function of M is assumed to be correct. Thus, to implement the sensitivity analysis, we develop a semiparametric estimator of the natural direct effect in the union model $\mathcal{M}_a\cup\mathcal{M}_c$, assuming $t(\cdot,\cdot,\cdot)=t_{\lambda^*}(\cdot,\cdot,\cdot)$ for a fixed λ*. The proposed doubly robust estimator of the natural direct effect is then given by $\hat\theta_0^{doubly}(\lambda^*)-\hat\delta_0^{doubly}$ where $\hat\delta_0^{doubly}$ is as previously described, and

$$\hat\theta_0^{doubly}(\lambda^*)=\mathbb{P}_n\left[\frac{I\{E=1\}\hat f^{par}_{M|E,X}(M|E=0,X)}{\hat f^{par}_{E|X}(1|X)\hat f^{par}_{M|E,X}(M|E=1,X)}\left\{Y-\hat E^{par}(Y|X,M,E=1)\right\}+\tilde\eta^{par}(1,0,X;\lambda^*)\right],$$

with

$$\tilde\eta^{par}(1,0,X;\lambda^*)=\sum_{m\in\mathcal{S}}\left\{\hat E^{par}(Y|X,M=m,E=1)+t_{\lambda^*}(0,m,X)\left(1-\hat f^{par}_{M|E,X}(m|E=0,X)\right)-t_{\lambda^*}(1,m,X)\left(1-\hat f^{par}_{M|E,X}(m|E=1,X)\right)\right\}\hat f^{par}_{M|E,X}(m|E=0,X).$$

Our sensitivity analysis then entails reporting the set $\{\hat\theta_0^{doubly}(\lambda)-\hat\delta_0^{doubly}:\lambda\}$ (and the associated confidence intervals), which summarizes how sensitive inferences are to a deviation from the ignorability assumption λ = 0. A theoretical justification for the approach is given by the following formal result, which is proved in the supplemental appendix.

Theorem 4

Suppose $t(\cdot,\cdot,\cdot)=t_{\lambda^*}(\cdot,\cdot,\cdot)$; then under the consistency and positivity assumptions, and the ignorability assumption for the exposure, $\hat\theta_0^{doubly}(\lambda^*)-\hat\delta_0^{doubly}$ is a CAN estimator of the natural direct effect in $\mathcal{M}_a\cup\mathcal{M}_c$.

The influence function of $\hat\theta_0^{doubly}(\lambda^*)$ is provided in the appendix, and can be used to construct a corresponding confidence interval.
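To make the mechanics of the sensitivity analysis concrete, here is a minimal sketch for a binary mediator that evaluates the θ-part of the estimator over a grid of λ values. Several things are assumptions made for illustration: the data are synthetic, the nuisance functions are plugged in at their true values rather than estimated, the selection bias function is the constant function t_λ(e, m, x) = λ, and the names `theta_doubly` and `t_const` are hypothetical. The natural direct effect would additionally require subtracting the doubly robust estimate of δ0, which is not computed here.

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def theta_doubly(lam, E, M, Y, X, pe1, pm0, pm1, mu1, t):
    """Doubly robust estimate of the M-functional under selection bias t_lam.

    pe1: f(E=1|X); pm0, pm1: P(M=1|E=0,X), P(M=1|E=1,X);
    mu1(m, X): E(Y|X, M=m, E=1); t(lam, e, m, X): selection bias function.
    """
    f0 = {1: pm0, 0: 1.0 - pm0}   # f(m | E=0, X) for m in {0, 1}
    f1 = {1: pm1, 0: 1.0 - pm1}   # f(m | E=1, X) for m in {0, 1}
    f0_obs = np.where(M == 1, f0[1], f0[0])
    f1_obs = np.where(M == 1, f1[1], f1[0])
    mu_obs = np.where(M == 1, mu1(1, X), mu1(0, X))
    aug = E * f0_obs / (pe1 * f1_obs) * (Y - mu_obs)
    eta = np.zeros(len(Y))
    for m in (0, 1):
        eta += (mu1(m, X)
                + t(lam, 0, m, X) * (1.0 - f0[m])
                - t(lam, 1, m, X) * (1.0 - f1[m])) * f0[m]
    return np.mean(aug + eta)

# Synthetic data with known nuisance functions (hypothetical values).
rng = np.random.default_rng(3)
n = 1000
X = rng.standard_normal(n)
pe1 = expit(0.5 * X)
E = rng.binomial(1, pe1).astype(float)
pm0, pm1 = expit(-0.3 + 0.5 * X), expit(0.5 + 0.5 * X)
M = rng.binomial(1, np.where(E == 1, pm1, pm0)).astype(float)
Y = 1.0 + X + 2.0 * M + E + rng.standard_normal(n)

def mu1(m, X):
    return 2.0 + X + 2.0 * m

def t_const(lam, e, m, X):
    return lam                      # constant selection bias, for illustration

grid = np.linspace(-1.0, 1.0, 5)    # includes lambda = 0 at index 2
curve = np.array([theta_doubly(l, E, M, Y, X, pe1, pm0, pm1, mu1, t_const)
                  for l in grid])
```

With a constant t_λ, the estimate is linear in λ, and its slope is the sample mean of (P(M=1|E=1,X) − P(M=1|E=0,X))(2P(M=1|E=0,X) − 1), which gives a simple check on the implementation.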

It is important to note that the sensitivity analysis technique presented here differs in crucial ways from previous techniques developed by Hafeman (2008), VanderWeele (2010) and Imai et al (2010a). First, the methodology of VanderWeele (2010) postulates the existence of an unmeasured confounder U (possibly vector valued) which, when included in X, recovers the sequential ignorability assumption. The sensitivity analysis then requires specification of a sensitivity parameter encoding the effect of the unmeasured confounder on the outcome within levels of (E, X, M), and another parameter for the effect of the exposure on the density of the unmeasured confounder given (X, M). This is a daunting task which renders the approach generally impractical, except perhaps in the simple setting where it is reasonable to postulate that a single binary confounder is unobserved, and one is willing to make further simplifying assumptions about the required sensitivity parameters (VanderWeele, 2010). In comparison, the proposed approach circumvents this difficulty by concisely encoding a violation of the ignorability assumption for the mediator through the selection bias function $t_\lambda(e,m,x)$. Thus the approach makes no reference to, and is agnostic about, the existence, dimension, and nature of unmeasured confounders U. Furthermore, in our proposal, the ignorability violation can arise due to an unmeasured confounder of the mediator-outcome relationship that is also an effect of the exposure variable, a setting not handled by the technique of VanderWeele (2010). The method of Hafeman (2008), which is restricted to binary data, shares some of the limitations given above. Finally, in contrast with our proposed double robust approach, a coherent implementation of the sensitivity analysis techniques of Imai et al (2010a, 2010b) and VanderWeele (2010) relies on correct specification of all posited models. 
We refer the reader to VanderWeele (2010) for further discussion of Hafeman (2008) and Imai et al (2010a).

9 Discussion

The main contribution of the current paper is a theoretically rigorous yet practically relevant semiparametric framework for making inferences about natural direct and indirect causal effects in the presence of a large number of confounding factors. Semiparametric efficiency bounds are given for the nonparametric model, and multiply robust locally efficient estimators are developed that can be used when nonparametric estimation is not possible.

Although the paper focuses on a binary exposure, we note that the extension to a polytomous exposure is trivial. In future work, we shall extend our results for marginal effects by considering conditional natural direct and indirect effects given a subset of pre-exposure variables. These models are particularly important in making inferences about so-called moderated mediation effects, a topic of growing interest particularly in the field of psychology (Preacher, Rucker and Hayes, 2007). In related work, we have recently extended our results to a survival analysis setting (Tchetgen Tchetgen, 2011).

A major limitation of the current paper is that it assumes that the mediator is measured without error, an assumption that may be unrealistic in practice; and if incorrect may result in biased inferences about mediated effects. We note that much of the recent literature on causal mediation analysis makes a similar assumption. In future work, it will be important to build on the results derived in the current paper to appropriately account for a mis-measured mediator.

Acknowledgments

The authors would like to acknowledge Andrea Rotnitzky who provided invaluable comments that improved the presentation of the results given in Section 7 of the manuscript. The authors also thank James Robins and Tyler VanderWeele for useful comments that significantly improved the presentation of this article.

APPENDIX

PROOF OF THEOREM 1

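Let $F_{O;t}=F_{Y|M,X,E;t}F_{M|E,X;t}F_{E|X;t}F_{X;t}$ denote a one dimensional regular parametric submodel of $\mathcal{M}_{nonpar}$, with $F_{O;0}=F_O$, and let

$$\theta_t=\theta_0(F_{O;t})=\iint_{\mathcal{S}\times\mathcal{X}}E_t(Y|E=1,M=m,X=x)f_{M|E,X;t}(m|E=0,X=x)f_{X;t}(x)\,d\mu(m,x).$$

The efficient influence function $S^{eff,nonpar}_{\theta_0}(\theta_0)$ is the unique random variable to satisfy the following equation

$$\nabla_{t=0}\theta_t=E\{S^{eff,nonpar}_{\theta_0}(\theta_0)U\}$$

for U the score of $F_{O;t}$ at t = 0, and $\nabla_{t=0}$ denoting differentiation with respect to t at t = 0. We observe that

$$\left.\frac{\partial\theta_t}{\partial t}\right|_{t=0}=\iint_{\mathcal{S}\times\mathcal{X}}\nabla_{t=0}E_t(Y|E=1,M=m,X=x)f_{M|E,X}(m|E=0,X=x)f_X(x)\,d\mu(m,x)$$

$$+\iint_{\mathcal{S}\times\mathcal{X}}E(Y|E=1,M=m,X=x)\nabla_{t=0}f_{M|E,X;t}(m|E=0,X=x)f_X(x)\,d\mu(m,x)$$

$$+\iint_{\mathcal{S}\times\mathcal{X}}E(Y|E=1,M=m,X=x)f_{M|E,X}(m|E=0,X=x)\nabla_{t=0}f_{X;t}(x)\,d\mu(m,x).$$

Considering the first term, it is straightforward to verify that:

$$\iint_{\mathcal{S}\times\mathcal{X}}\nabla_{t=0}E_t(Y|E=1,M=m,X=x)f_{M|E,X}(m|E=0,X=x)f_X(x)\,d\mu(m,x)=E\left[U\frac{I(E=1)}{f_{E|X}(E|X)}\{Y-E(Y|E,M,X)\}\frac{f_{M|E,X}(M|E=0,X)}{f_{M|E,X}(M|E=1,X)}\right].$$

Similarly, one can easily verify that

$$\iint_{\mathcal{S}\times\mathcal{X}}E(Y|E=1,M=m,X=x)\nabla_{t=0}f_{M|E,X;t}(m|E=0,X=x)f_X(x)\,d\mu(m,x)=E\left[U\frac{I(E=0)}{f_{E|X}(E|X)}\{E(Y|E=1,M,X)-\eta(1,0,X)\}\right]$$

and finally, one can also verify that

$$\iint_{\mathcal{S}\times\mathcal{X}}E(Y|E=1,M=m,X=x)f_{M|E,X}(m|E=0,X=x)\nabla_{t=0}f_{X;t}(x)\,d\mu(m,x)=E\left[U\{\eta(1,0,X)-\theta_0\}\right].$$

Thus, we obtain

$$\nabla_{t=0}\theta_t=E\{S^{eff,nonpar}_{\theta_0}(\theta_0)U\}.$$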
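Collecting the three terms multiplying U in the preceding displays gives the influence function in closed form. The expression below is simply that sum, written out as a reading aid; it matches the three-term structure used in the proof of Theorem 2.

```latex
\begin{align*}
S^{eff,nonpar}_{\theta_0}(\theta_0)
 &= \frac{I(E=1)\,f_{M|E,X}(M\mid E=0,X)}{f_{E|X}(1\mid X)\,f_{M|E,X}(M\mid E=1,X)}
    \{Y-E(Y\mid X,M,E=1)\}\\
 &\quad+\frac{I(E=0)}{f_{E|X}(0\mid X)}\{E(Y\mid X,M,E=1)-\eta(1,0,X)\}\\
 &\quad+\eta(1,0,X)-\theta_0 .
\end{align*}
```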

Given $S^{eff,nonpar}_{\delta_e}(\delta_e)$, the results for the direct and indirect effect follow from the fact that the influence function of a difference of two functionals equals the difference of the respective influence functions. Because the model is nonparametric, there is a unique influence function for each functional, and it is efficient in the model, leading to the efficiency bound results.

PROOF OF THEOREM 2

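We begin by showing that

$$E\{S^{eff,nonpar}_{\theta_0}(\theta_0;\beta_m^*,\beta_e^*,\beta_y^*)\}=0\qquad(7)$$

under model $\mathcal{M}_{union}$. First note that $(\beta_y^*,\beta_m^*)=(\beta_y,\beta_m)$ under model $\mathcal{M}_a$. Equality (7) now follows because $E^{par}(Y|X,M,E=1;\beta_y^*)=E(Y|X,M,E=1)$ and $\eta(1,0,X;\beta_y^*,\beta_m^*)=E[\{E^{par}(Y|X,M,E=1;\beta_y^*)\}|E=0,X]=\eta(1,0,X)$:

$$E\{S^{eff,nonpar}_{\theta_0}(\theta_0;\beta_m^*,\beta_e^*,\beta_y^*)\}=E\Big[\frac{I\{E=1\}f^{par}_{M|E,X}(M|E=0,X;\beta_m^*)}{f^{par}_{E|X}(1|X;\beta_e^*)f^{par}_{M|E,X}(M|E=1,X;\beta_m^*)}\underbrace{E\big[\{Y-E^{par}(Y|X,M,E=1;\beta_y^*)\}\big|E=1,M,X\big]}_{=0}\Big]$$

$$+E\Big[\frac{I(E=0)}{f^{par}_{E|X}(0|X;\beta_e^*)}\underbrace{E\big[\{E^{par}(Y|X,M,E=1;\beta_y^*)-\eta(1,0,X;\beta_y^*,\beta_m^*)\}\big|E=0,X\big]}_{=0}\Big]+\underbrace{E[\eta(1,0,X;\beta_y^*,\beta_m^*)]-\theta_0}_{=0}.$$

Second, $(\beta_y^*,\beta_e^*)=(\beta_y,\beta_e)$ under model $\mathcal{M}_b$. Equality (7) now follows because $E^{par}(Y|X,M,E=1;\beta_y^*)=E(Y|X,M,E=1)$ and $f^{par}_{E|X}(1|X;\beta_e^*)=f_{E|X}(1|X)$:

$$E\{S^{eff,nonpar}_{\theta_0}(\theta_0;\beta_m^*,\beta_e^*,\beta_y^*)\}=E\Big[\frac{I\{E=1\}f^{par}_{M|E,X}(M|E=0,X;\beta_m^*)}{f^{par}_{E|X}(1|X;\beta_e^*)f^{par}_{M|E,X}(M|E=1,X;\beta_m^*)}\underbrace{E\big[\{Y-E^{par}(Y|X,M,E=1;\beta_y^*)\}\big|E=1,M,X\big]}_{=0}\Big]$$

$$+E\Big[\frac{I(E=0)}{f^{par}_{E|X}(0|X;\beta_e^*)}E\big[\{E^{par}(Y|X,M,E=1;\beta_y^*)-\eta(1,0,X;\beta_y^*,\beta_m^*)\}\big|E=0,X\big]\Big]+E[\eta(1,0,X;\beta_y^*,\beta_m^*)]-\theta_0$$

$$=E\big[E[\{E^{par}(Y|X,M,E=1;\beta_y^*)\}|E=0,X]\big]-\theta_0=0.$$

Third, equality (7) holds under model $\mathcal{M}_c$ because

$$E\{S^{eff,nonpar}_{\theta_0}(\theta_0;\beta_m^*,\beta_e^*,\beta_y^*)\}=E\Big[\frac{I\{E=1\}f^{par}_{M|E,X}(M|E=0,X;\beta_m^*)}{f^{par}_{E|X}(1|X;\beta_e^*)f^{par}_{M|E,X}(M|E=1,X;\beta_m^*)}\{Y-E^{par}(Y|X,M,E=1;\beta_y^*)\}\Big]$$

$$+E\Big[\frac{I(E=0)}{f^{par}_{E|X}(0|X;\beta_e^*)}E\big[\{E^{par}(Y|X,M,E=1;\beta_y^*)-\eta(1,0,X;\beta_y^*,\beta_m^*)\}\big|E=0,X\big]\Big]+E[\eta(1,0,X;\beta_y^*,\beta_m^*)]-\theta_0$$

$$=E\big[E[\{E(Y|X,M,E=1)\}|E=0,X]\big]-E\big[E[E^{par}(Y|X,M,E=1;\beta_y^*)|E=0,X]\big]$$

$$+E\big[E[E^{par}(Y|X,M,E=1;\beta_y^*)|E=0,X]\big]-E[\eta(1,0,X;\beta_y^*,\beta_m^*)]+E[\eta(1,0,X;\beta_y^*,\beta_m^*)]-\theta_0$$

$$=E\big[E[\{E(Y|X,M,E=1)\}|E=0,X]\big]-\theta_0=0.$$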

Assuming that the regularity conditions of Theorem 1A in Robins, Mark and Newey (1992) hold for $S^{eff,nonpar}_{\theta_0}(\theta_0;\beta_m,\beta_e,\beta_y)$ and $S_\beta(\beta)$, the expression for $S^{union}_{\theta_0}(\theta_0,\beta^*)$ follows by standard Taylor expansion arguments, and it now follows that

$$\sqrt{n}\left(\hat\theta_0^{triply}-\theta_0\right)=\frac{1}{n^{1/2}}\sum_{i=1}^{n}S^{union}_{\theta_0,i}(\theta_0,\beta^*)+o_p(1).\qquad(8)$$

The asymptotic distribution of $\sqrt{n}(\hat\theta_0^{triply}-\theta_0)$ under model $\mathcal{M}_{union}$ follows from the previous equation by Slutsky's Theorem and the Central Limit Theorem.

We note that $\hat\delta_e^{doubly}$ is CAN in the union model $\mathcal{M}_{union}$ since it is CAN in the larger model where either the density for the exposure is correct, or the density of the mediator and the outcome regression are both correct and thus $\eta(e,e,X;\beta_y^*,\beta_m^*)=E(Y|X,E=e)$. This gives the multiply robust result for direct and indirect effects. The asymptotic distribution of direct and indirect effect estimates then follows from similar arguments as above.

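At the intersection submodel

$$\frac{\partial E\{S^{eff,nonpar}_{\theta_0}(\theta_0,\beta)\}}{\partial\beta^T}=0$$

hence

$$S^{union}_{\theta_0}(\theta_0,\beta^*)=S^{eff,nonpar}_{\theta_0}(\theta_0,\beta^*).$$

The semiparametric efficiency claim then follows for $\hat\theta_0^{triply}$, and a similar argument gives the result for direct and indirect effects.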

PROOF OF THEOREMS 3 & 4

The proofs are given in the online appendix.

Footnotes

AMS 1991 Subject Classifications. Primary: 62G05.

References

  1. Avin C, Shpitser I, Pearl J. Identifiability of path-specific effects. IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence; Edinburgh, Scotland, UK. July 30–August 5, 2005; 2005. pp. 357–363.
  2. Bang H, Robins J. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61:962–972. doi: 10.1111/j.1541-0420.2005.00377.x.
  3. Baron RM, Kenny DA. The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology. 1986;51:1173–1182. doi: 10.1037//0022-3514.51.6.1173.
  4. Bickel P, Klaassen C, Ritov Y, Wellner J. Efficient and Adaptive Estimation for Semiparametric Models. Springer; New York: 1993.
  5. Cao W, Tsiatis AA, Davidian M. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika. 2009;96:723–734. doi: 10.1093/biomet/asp033.
  6. Goetgeluk S, Vansteelandt S, Goetghebeur E. Estimation of controlled direct effects. Journal of the Royal Statistical Society – Series B. 2008;70:1049–1066.
  7. Hafeman D. Opening the Black Box: A Reassessment of Mediation from a Counterfactual Perspective [dissertation]. New York: Columbia University; 2008.
  8. Hafeman D, VanderWeele T. Alternative assumptions for the identification of direct and indirect effects. Epidemiology. 2009. doi: 10.1097/EDE.0b013e3181c311b2. In press.
  9. Hahn J. On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects. Econometrica. 1998;66:315–331.
  10. van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. Springer Verlag; New York: 2003.
  11. van der Laan M, Petersen M. Direct Effect Models. (Working Paper 187). U.C. Berkeley Division of Biostatistics Working Paper Series. 2005. http://www.bepress.com/ucbbiostat/paper187.
  12. Imai K, Keele L, Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science. 2010a;25:51–71.
  13. Imai K, Keele L, Tingley D. A General Approach to Causal Mediation Analysis. Psychological Methods. 2010b Dec;15(4):309–334. doi: 10.1037/a0020761. (lead article)
  14. Kang JDY, Schafer JL. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data (with discussion). Statist Sci. 2007;22:523–39. doi: 10.1214/07-STS227.
  15. Pearl J. Direct and indirect effects. Proceedings of the 17th Annual Conference on Uncertainty in Artificial Intelligence (UAI-01); San Francisco, CA. Morgan Kaufmann; 2001. pp. 411–42.
  16. Pearl J. The Mediation Formula: A guide to the assessment of causal pathways in nonlinear models. Technical report. 2011. http://ftp.cs.ucla.edu/pub/statser/r379.pdf.
  17. Preacher KJ, Rucker DD, Hayes AF. Assessing moderated mediation hypotheses: Strategies, methods, and prescriptions. Multivariate Behavioral Research. 2007;42:185–227. doi: 10.1080/00273170701341316.
  18. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3:143–155. doi: 10.1097/00001648-199203000-00013.
  19. Robins JM, Mark SD, Newey WK. Estimating exposure effects by modeling the expectation of exposure conditional on confounders. Biometrics. 1992;48:479–495.
  20. Robins JM, Rotnitzky A, Scharfstein D. Sensitivity Analysis for Selection Bias and Unmeasured Confounding in Missing Data and Causal Inference Models. In: Halloran ME, Berry D, editors. Statistical Models in Epidemiology: The Environment and Clinical Trials. Vol. 116. NY: Springer-Verlag; 1999. pp. 1–92. IMA.
  21. Robins J. Semantics of causal DAG models and the identification of direct and indirect effects. In: Green P, Hjort N, Richardson S, editors. Highly Structured Stochastic Systems. Oxford, UK: Oxford University Press; 2003. pp. 70–81.
  22. Robins JM, Rotnitzky A. Comment on the Bickel and Kwon article, “Inference for semiparametric models: Some questions and an answer”. Statistica Sinica. 2001;11(4):920–936.
  23. Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. Proceedings of the American Statistical Association Section on Bayesian Statistical Science. 2000;1999:6–10.
  24. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846–866.
  25. Robins JM, Sued M, Lei-Gomez Q, Rotnitzky A. Comment: Performance of double-robust estimators when “Inverse Probability” weights are highly variable. Statistical Science. 2007;22(4):544–559.
  26. Robins JM, Richardson TS. Alternative graphical causal models and the identification of direct effects. In: Shrout P, editor. To appear in Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures. Oxford University Press; 2010.
  27. Scharfstein DO, Rotnitzky A, Robins JM. Rejoinder to comments on “Adjusting for non-ignorable drop-out using semiparametric non-response models”. Journal of the American Statistical Association. 1999;94:1096–1120.
  28. Tchetgen Tchetgen EJ. On Causal Mediation Analysis with a Survival Outcome. The International Journal of Biostatistics. 2011;7(1). doi: 10.2202/1557-4679.1351. Article 33.
  29. Tsiatis AA. Semiparametric Theory and Missing Data. Springer Verlag; New York: 2006.
  30. VanderWeele TJ. Marginal structural models for the estimation of direct and indirect effects. Epidemiology. 2009;20:18–26. doi: 10.1097/EDE.0b013e31818f69ce.
  31. VanderWeele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome - with discussion. American Journal of Epidemiology. 2010;172:1339–1348. doi: 10.1093/aje/kwq332.
  32. VanderWeele TJ. Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology. 2010;21:540–551. doi: 10.1097/EDE.0b013e3181df191c.
