Semiparametric Theory for Causal Mediation Analysis: efficiency bounds, multiple robustness, and sensitivity analysis

Eric J Tchetgen Tchetgen; Ilya Shpitser

doi:10.1214/12-AOS990

Author manuscript; available in PMC: 2016 Jan 12.

Published in final edited form as: Ann Stat. 2012 Jun;40(3):1816–1845. doi: 10.1214/12-AOS990


PMCID: PMC4710381 NIHMSID: NIHMS746449 PMID: 26770002

Abstract

Whilst estimation of the marginal (total) causal effect of a point exposure on an outcome is arguably the most common objective of experimental and observational studies in the health and social sciences, in recent years, investigators have also become increasingly interested in mediation analysis. Specifically, upon evaluating the total effect of the exposure, investigators routinely wish to make inferences about the direct or indirect pathways of the effect of the exposure not through or through a mediator variable that occurs subsequently to the exposure and prior to the outcome. Although powerful semiparametric methodologies have been developed to analyze observational studies, that produce double robust and highly efficient estimates of the marginal total causal effect, similar methods for mediation analysis are currently lacking. Thus, this paper develops a general semiparametric framework for obtaining inferences about so-called marginal natural direct and indirect causal effects, while appropriately accounting for a large number of pre-exposure confounding factors for the exposure and the mediator variables. Our analytic framework is particularly appealing, because it gives new insights on issues of efficiency and robustness in the context of mediation analysis. In particular, we propose new multiply robust locally efficient estimators of the marginal natural indirect and direct causal effects, and develop a novel double robust sensitivity analysis framework for the assumption of ignorability of the mediator variable.

Key Words and Phrases: Natural direct effects, Natural indirect effects, double robust, mediation analysis, local efficiency

1 Introduction

The evaluation of the total causal effect of a given point exposure, treatment or intervention on an outcome of interest is arguably the most common objective of experimental and observational studies in the fields of epidemiology, biostatistics and in the social sciences. However, in recent years, investigators in these various fields have become increasingly interested in making inferences about the direct or indirect pathways of the exposure effect not through or through a mediator variable that occurs subsequently to the exposure and prior to the outcome. Recently, the counterfactual language of causal inference has proven particularly useful for formalizing mediation analysis. Indeed, causal inference offers a formal mathematical framework for defining varieties of direct and indirect effects, and for establishing necessary and sufficient identifying conditions of these effects. A notable contribution of causal inference to the literature on mediation analysis is the key distinction drawn between so-called controlled direct and indirect effects versus natural direct and indirect effects. In words, the controlled direct effect refers to the exposure effect that arises upon intervening to set the mediator to a fixed level that may differ from its actual observed value (Robins and Greenland, 1992, Pearl, 2001, Robins, 2003). In contrast, the natural (also known as pure) direct effect captures the effect of the exposure when one intervenes to set the mediator to the (random) level it would have been in the absence of exposure (Robins and Greenland, 1992, Pearl 2001). The controlled direct effect combines with the controlled indirect effect to produce the joint effect of the exposure and the mediator, whereas, the natural direct and indirect effects combine to produce the exposure total effect. 
As noted by Pearl (2001), controlled direct and indirect effects are particularly relevant for policy making whereas natural direct and indirect effects are more useful for understanding the underlying mechanism by which the exposure operates.

To formally define natural direct and indirect effects first requires defining counterfactuals. We assume that for each level of a binary exposure E, and of a mediator variable M, there exists a counterfactual variable Y_e,m corresponding to the outcome Y had, possibly contrary to fact, the exposure and mediator variables taken the value (e, m). Similarly, for E = e, we assume there exists a counterfactual variable M_e corresponding to the mediator variable had, possibly contrary to fact, the exposure variable taken the value e. The current paper concerns the decomposition of the total effect of E on Y in terms of natural direct and natural indirect effects, which, expressed on the mean difference scale, is given by:

\begin{array}{l} \overbrace{E (Y_{e = 1} - Y_{e = 0})}^{\text{total effect}} = E (Y_{e = 1, M_{e = 1}} - Y_{e = 0, M_{e = 0}}) \\ = \overbrace{E (Y_{e = 1, M_{e = 1}} - Y_{e = 1, M_{e = 0}})}^{\text{natural indirect effect}} + \overbrace{E (Y_{e = 1, M_{e = 0}} - Y_{e = 0, M_{e = 0}})}^{\text{natural direct effect}} . \end{array}

(1)

where $E$ stands for expectation.

In an effort to account for confounding bias when estimating causal effects, such as the average total effect (1), from non-experimental data, investigators routinely collect, and adjust for in data analysis, a large number of confounding factors. Because of the curse of dimensionality, nonparametric methods of estimation are typically not practical in such settings, and one usually resorts to one of two dimension-reduction strategies: either one relies on a model for the outcome given exposure and confounders, or alternatively one relies on a model for the exposure, i.e. the propensity score. Recently, powerful semiparametric methods have been developed to analyze observational studies that produce so-called double robust and highly efficient estimates of the exposure total causal effect (Robins, 1999, Scharfstein, Rotnitzky and Robins, 1999, Bang and Robins, 2005, Tsiatis, 2006), and similar methods have also been developed to estimate controlled direct and indirect effects (Goetgeluk, Vansteelandt and Goetghebeur, 2008). An important advantage of a double robust method is that it carefully combines both of the aforementioned dimension reduction strategies for confounding adjustment, to produce an estimator of the causal effect that remains consistent and asymptotically normal provided at least one of the two strategies is correct, without necessarily knowing which strategy is indeed correct (van der Laan and Robins, 2003). Unfortunately, similar methods for making semiparametric inferences about marginal natural direct and indirect effects are currently lacking. Thus, this paper develops a general semiparametric framework for obtaining inferences about marginal natural direct and indirect effects on the mean of an outcome, while appropriately accounting for a large number of confounding factors for the exposure and the mediator variables.

Our semiparametric framework is particularly appealing, as it gives new insight on issues of efficiency and robustness in the context of mediation analysis. Specifically, in Section 2, we adopt the sequential ignorability assumption of Imai et al (2010) under which, in conjunction with the standard consistency and positivity assumptions, we derive the efficient influence function and thus obtain the semiparametric efficiency bound for the natural direct and natural indirect marginal mean causal effects, in the nonparametric model ℳ_nonpar in which the observed data likelihood is left unrestricted. We further show that in order to conduct mediation inferences in ℳ_nonpar, one must estimate at least a subset of the following quantities:

(i) the conditional expectation of the outcome given the mediator, exposure and confounding factors;
(ii) the density of the mediator given the exposure and the confounders;
(iii) the density of the exposure given the confounders.

Ideally, to minimize the possibility of modeling bias, one may wish to estimate each of these quantities nonparametrically; however, as previously argued, when one wishes to account for numerous confounders, as we assume throughout, such nonparametric estimates will likely perform poorly in finite samples. Thus, in Section 2.3 we develop an alternative multiply robust strategy. To do so, we propose to model (i), (ii) and (iii) parametrically (or semiparametrically), but rather than obtaining mediation inferences that rely on the correct specification of a specific subset of these models, we instead carefully combine these three models to produce estimators of the marginal mean direct and indirect effects that remain consistent and asymptotically normal (CAN) in a union model where at least one but not necessarily all of the following conditions hold:

(a) the parametric or semiparametric models for the conditional expectation of the outcome (i) and for the conditional density of the mediator (ii) are correctly specified;
(b) the parametric or semiparametric models for the conditional expectation of the outcome (i) and for the conditional density of the exposure (iii) are correctly specified;
(c) the parametric or semiparametric models for the conditional densities of the exposure and the mediator (ii) and (iii) are correctly specified.

Accordingly, we define submodels ℳ_a, ℳ_b and ℳ_c of ℳ_nonpar corresponding to models (a), (b) and (c) respectively. Thus, the proposed approach is triply robust as it produces valid inferences about natural direct and indirect effects in the union model ℳ_union = ℳ_a∪ℳ_b∪ℳ_c. Furthermore, as we later show in Section 2.3, the proposed estimators are also locally semiparametric efficient in the sense that they achieve the respective efficiency bounds for estimating the natural direct and indirect effects in ℳ_union, at the intersection submodel ℳ_a∩ℳ_b∩ℳ_c = ℳ_a∩ℳ_c = ℳ_a∩ℳ_b = ℳ_b∩ℳ_c⊂ℳ_union⊂ℳ_nonpar.

Section 3 summarizes a simulation study illustrating the finite sample performance of the various estimators described in Section 2, and Section 4 gives a real data application of these methods. Section 5 describes a strategy to improve the stability of the proposed multiply robust estimator which directly depends on inverse exposure and mediator density weights, when such weights are highly variable, and Section 6 demonstrates the favorable performance of two modified multiply robust estimators in the context of such highly variable weights. In Section 7, we compare the proposed methodology to the prevailing estimators in the literature. Based on this comparison, we conclude that the new approach should generally be preferred because an inference under the proposed method is guaranteed to remain valid under many more data generating laws than an inference based on each of the other existing approaches. In particular, as we argue below the approach of van der Laan and Petersen (2005) is not entirely satisfactory because, despite producing a CAN estimator of the marginal direct effect under the union model ℳ_a∪ℳ_c (and therefore an estimator that is double robust), their estimator requires a correct model for the density of the mediator. Thus unlike the direct effect estimator developed in this paper, the van der Laan estimator fails to be consistent under the submodel ℳ_b⊂ℳ_union. Nonetheless, the estimator of van der Laan is in fact locally efficient in model ℳ_a∪ℳ_c, provided the model for the mediator’s conditional density is either known, or can be efficiently estimated. This property is confirmed in a supplementary online appendix, where we also provide a general map that relates the efficient influence function for model ℳ_union to the corresponding influence function for model ℳ_a∪ℳ_c assuming an arbitrary parametric or semiparametric model for the mediator conditional density is correctly specified. 
In Section 8, we describe a novel double robust sensitivity analysis framework to assess the impact on inferences about the natural direct effect, of a departure from the ignorability assumption of the mediator variable. We conclude with a brief discussion.

2 The nonparametric mediation functional

2.1 Identification

Suppose i.i.d data on O = (Y, E, M, X) is collected for n subjects. Recall that Y is an outcome of interest, E is a binary exposure variable, M is a mediator variable with support $S$ , known to occur subsequently to E and prior to Y, and X is a vector of pre-exposure variables with support $X$ that confound the association between (E, M) and Y. The overarching goal of this paper is to provide some theory of inference about the fundamental functional of mediation analysis which Judea Pearl calls “the mediation causal formula” (Pearl, 2010) and which expressed on the mean scale, is:

θ_{0} = \iint_{S \times X} E (Y | E = 1, M = m, X = x) f_{M | E, X} (m | E = 0, X = x) f_{X} (x) d μ (m, x),

(2)

where $f_{M | E, X}$ and $f_{X}$ are respectively the conditional density of the mediator M given (E, X) and the density of X, and μ is a dominating measure for the distribution of (M, X). Hereafter, to keep with standard statistical parlance, we shall simply refer to θ₀ as the “mediation functional” or “M-functional” since it is formally a functional on the nonparametric statistical model ℳ_nonpar = {F_O (·): F_O unrestricted} of all regular laws F_O of the observed data O that satisfy the positivity assumption given below; i.e. θ₀ = θ₀ (F_O): ℳ_nonpar → ℛ, with ℛ the real line. The functional θ₀ is of keen interest here because it arises in the estimation of natural direct and indirect effects, which we describe next. To do so, we make the consistency assumption:

Consistency

\begin{array}{l} \text{if } E = e, \text{ then } M_{e} = M \text{ w.p.1}, \\ \text{and if } E = e \text{ and } M = m \text{ then } Y_{e, m} = Y \text{ w.p.1}. \end{array}

In addition, we adopt the sequential ignorability assumption of Imai et al (2010) which states that for e, e′ ∈ {0, 1}:

Sequential ignorability

\begin{array}{l} \{Y_{e', m}, M_{e}\} \perp\!\!\!\perp E \mid X, \\ Y_{e', m} \perp\!\!\!\perp M \mid E = e, X, \end{array}

where A ⫫ B|C states that A is independent of B given C; paired with the following:

Positivity

\begin{array}{l} f_{M | E, X} (m | E, X) > 0 \text{ w.p.1 for each } m \in S, \\ \text{and } f_{E | X} (e | X) > 0 \text{ w.p.1 for each } e \in \{0, 1\} . \end{array}

Then, under the consistency, sequential ignorability and positivity assumptions, Imai et al (2010) showed that:

\begin{array}{l} θ_{0} = E (Y_{1, M_{0}}), \text{ and} \\ δ_{e} \equiv \int_{X} E (Y | E = e, X = x) f_{X} (x) d μ (x) \\ = \iint_{S \times X} E (Y | E = e, M = m, X = x) f_{M | E, X} (m | E = e, X = x) f_{X} (x) d μ (m, x) \\ = E (Y_{e}) = E (Y_{e, M_{e}}), \quad e = 0, 1, \end{array}

(3)

so that $E (Y_{1, M_{0}})$ and $E (Y_{e})$ , e = 0, 1, are identified from the observed data, and so are the mean natural direct effect $E (Y_{1, M_{0}}) - E (Y_{0}) = θ_{0} - δ_{0}$ and the mean natural indirect effect $E (Y_{1}) - E (Y_{1, M_{0}}) = δ_{1} - θ_{0}$ . For binary Y, one might alternatively consider the natural direct effect on the risk ratio scale $E (Y_{1, M_{0}}) / E (Y_{0}) = θ_{0} / δ_{0}$ or on the odds ratio scale $\{E (Y_{1, M_{0}}) E (1 - Y_{0})\} / \{E (1 - Y_{1, M_{0}}) E (Y_{0})\} = \{θ_{0} (1 - δ_{0})\} / \{δ_{0} (1 - θ_{0})\}$ and similarly defined natural indirect effects on the risk ratio and odds ratio scales. It is instructive to contrast the expression (2) for $E (Y_{1, M_{0}})$ with the expression (3) for e = 1 corresponding to $E (Y_{1})$ , and to note that the two expressions bear a striking resemblance, except that the density of the mediator in the first expression conditions on the unexposed (with E = 0) whereas in the second expression, the mediator density is conditional on the exposed (with E = 1). As we demonstrate below, this subtle difference has remarkable implications for inference.
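As a purely numerical illustration of the identification formulas (2) and (3), the following sketch evaluates θ₀, δ₀ and δ₁ for a small discrete law with binary X and M, and then forms the natural direct and indirect effects on the mean difference scale. The probability tables and conditional means are hypothetical values chosen for illustration only, and the function name `mediation_functional` is not from the paper.

```python
import numpy as np

# Hypothetical discrete law with binary X and binary M (illustrative numbers only).
f_x = np.array([0.6, 0.4])                       # f_X(x), x = 0, 1
# f_m[e, x, m] = f_{M|E,X}(m | E = e, X = x)
f_m = np.array([[[0.7, 0.3], [0.5, 0.5]],
                [[0.4, 0.6], [0.2, 0.8]]])
# mu_y[e, x, m] = E(Y | E = e, M = m, X = x)
mu_y = np.array([[[1.0, 2.0], [1.5, 2.5]],
                 [[2.0, 4.0], [2.5, 4.5]]])

def mediation_functional(e, e_star):
    """sum_x sum_m E(Y | E=e, M=m, X=x) f(m | E=e_star, X=x) f_X(x)."""
    eta = (mu_y[e] * f_m[e_star]).sum(axis=1)    # eta(e, e_star, x) for x = 0, 1
    return float((eta * f_x).sum())

theta0 = mediation_functional(1, 0)   # E(Y_{1, M_0}), expression (2)
delta0 = mediation_functional(0, 0)   # E(Y_0), expression (3) with e = 0
delta1 = mediation_functional(1, 1)   # E(Y_1), expression (3) with e = 1

nde = theta0 - delta0                 # natural direct effect
nie = delta1 - theta0                 # natural indirect effect
total = delta1 - delta0               # total effect = nde + nie
```

The only difference between `theta0` and `delta1` is the exposure level at which the mediator density is evaluated, which is the contrast highlighted in the paragraph above.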

Pearl (2001) was the first to derive the M-functional $θ_{0} = E (Y_{1, M_{0}})$ under a different set of assumptions. Others have since contributed alternative sets of identifying assumptions. In this paper, we have chosen to work under the sequential ignorability assumption of Imai et al (2010a,b) but note that alternative related assumptions exist in the literature (Robins and Greenland, 1992, Pearl, 2001, Petersen and van der Laan, 2005, Hafeman and Vanderweele, 2010). We also note that Robins and Richardson (2010) disagree with the label “sequential ignorability” because the term has previously carried a different interpretation in the literature. Nonetheless, the assumption entails two ignorability-like assumptions that are made sequentially. First, given the observed pre-exposure confounders, the exposure assignment is assumed to be ignorable, that is, statistically independent of potential outcomes and potential mediators. The second part of the assumption states that the mediator is ignorable given the observed exposure and pre-exposure confounders. Specifically, the second part of the sequential ignorability assumption is made conditional on the observed value of the ignorable treatment and the observed pretreatment confounders. We note that the second part of the sequential ignorability assumption is particularly strong and must be made with care. This is partly because it is always possible that there might be unobserved variables that confound the relationship between the outcome and the mediator variables even upon conditioning on the observed exposure and covariates. Furthermore, the confounders X must all be pre-exposure variables, i.e. they must precede E. In fact, Avin et al (2005) proved that without additional assumptions, one cannot identify natural direct and indirect effects if there are confounding variables that are affected by the exposure even if such variables are observed by the investigator. 
This implies that similar to the ignorability of the exposure in observational studies, ignorability of the mediator cannot be established with certainty even after collecting as many pre-exposure confounders as possible. Furthermore, as Robins and Richardson (2010) point out, whereas the first part of the sequential ignorability assumption could in principle be enforced in a randomized study, by randomizing E within levels of X; the second part of the sequential ignorability assumption cannot similarly be enforced experimentally, even by randomization. And thus for this latter assumption to hold, one must entirely rely on expert knowledge about the mechanism under study. For this reason, it will be crucial in practice to supplement mediation analyses with a sensitivity analysis that accurately quantifies the degree to which results are robust to a potential violation of the sequential ignorability assumption. Later in the paper, we develop a set of sensitivity analyses that will allow the analyst to quantify the degree to which his or her mediation analysis results are robust to a potential violation of the sequential ignorability assumption.

2.2 Semiparametric efficiency bounds for ℳ_nonpar

In this section, we derive the efficient influence function for the M-functional θ₀ in ℳ_nonpar; this result is then combined with the efficient influence function for the functional δ_e (Robins, Rotnitzky and Zhao, 1994, Hahn, 1998) to obtain the efficient influence function for the natural direct and indirect effects, on the mean difference scale. Thus, in the following, we shall use the efficient influence function $S_{δ_{e}}^{eff, nonpar} (δ_{e})$ of δ_e which is well known to be:

\frac{I (E = e)}{f_{E | X} (e | X)} \{Y - η (e, e, X)\} + η (e, e, X) - δ_{e},

where for e, e* ∈ {0, 1}, we define

η (e, e^{*}, X) = \int_{S} E (Y | X, M = m, E = e) f_{M | E, X} (m | E = e^{*}, X) d μ (m),

so that $η (e, e, X) = E (Y | X, E = e)$ , e = 0, 1.

The following theorem is proved in the appendix.

Theorem 1

Under the consistency, sequential ignorability and positivity assumptions, the efficient influence function of the M-functional θ₀ in model ℳ_nonpar is given by $S_{θ_{0}}^{eff, nonpar} (θ_{0}) =$

\begin{array}{l} S_{θ_{0}}^{eff, nonpar} (O; θ_{0}) = \frac{I (E = 1) f_{M | E, X} (M | E = 0, X)}{f_{E | X} (1 | X) f_{M | E, X} (M | E = 1, X)} \{Y - E (Y | X, M, E = 1)\} \\ + \frac{I (E = 0)}{f_{E | X} (0 | X)} \{E (Y | X, M, E = 1) - η (1, 0, X)\} + η (1, 0, X) - θ_{0}, \end{array}

and the efficient influence functions of the natural direct and indirect effects on the mean difference scale in model ℳ_nonpar are respectively given by $S_{NDE}^{eff, nonpar} (θ_{0}, δ_{0}) = S_{NDE}^{eff, nonpar} (O; θ_{0}, δ_{0}) =$

\begin{array}{l} S_{θ_{0}}^{eff, nonpar} (θ_{0}) - S_{δ_{0}}^{eff, nonpar} (δ_{0}) \\ = \frac{I (E = 1) f_{M | E, X} (M | E = 0, X)}{f_{E | X} (1 | X) f_{M | E, X} (M | E = 1, X)} \{Y - E (Y | X, M, E = 1)\} \\ + \frac{I (E = 0)}{f_{E | X} (0 | X)} \{E (Y | X, M, E = 1) - Y - η (1, 0, X) + η (0, 0, X)\} \\ + η (1, 0, X) - η (0, 0, X) - θ_{0} + δ_{0}, \end{array}

and $S_{NIE}^{eff, nonpar} (δ_{1}, θ_{0}) = S_{NIE}^{eff, nonpar} (O; δ_{1}, θ_{0}) =$

\begin{array}{l} S_{δ_{1}}^{eff, nonpar} (δ_{1}) - S_{θ_{0}}^{eff, nonpar} (θ_{0}) \\ = \frac{I (E = 1)}{f_{E | X} (1 | X)} \Big[ Y - η (1, 1, X) - \frac{f_{M | E, X} (M | E = 0, X)}{f_{M | E, X} (M | E = 1, X)} \{Y - E (Y | X, M, E = 1)\} \Big] \\ - \frac{I (E = 0)}{f_{E | X} (0 | X)} \{E (Y | X, M, E = 1) - η (1, 0, X)\} + η (1, 1, X) - η (1, 0, X) + θ_{0} - δ_{1} . \end{array}

Thus, the semiparametric efficiency bounds for estimating the natural direct and the natural indirect effects in ℳ_nonpar are respectively given by $E \{S_{NDE}^{eff, nonpar} {(θ_{0}, δ_{0})}^{2}\}$ and $E \{S_{NIE}^{eff, nonpar} {(δ_{1}, θ_{0})}^{2}\}$ .

Although not presented here, Theorem 1 is easily extended to obtain the efficient influence functions and the respective semiparametric efficiency bounds for the direct and indirect effects on the risk ratio and the odds ratio scales by a straightforward application of the delta method. An important implication of the theorem is that all regular and asymptotically linear (RAL) estimators of θ₀, θ₀−δ₀ and δ₁−θ₀ in model ℳ_nonpar share the common influence functions $S_{θ_{0}}^{eff, nonpar} (θ_{0})$ , $S_{NDE}^{eff, nonpar} (θ_{0}, δ_{0})$ and $S_{NIE}^{eff, nonpar} (δ_{1}, θ_{0})$ respectively. Specifically, any RAL estimator ${\hat{θ}}_{0}$ of the M-functional θ₀ in model ℳ_nonpar shares a common asymptotic expansion:

n^{1 / 2} ({\hat{θ}}_{0} - θ_{0}) = n^{1 / 2} ℙ_{n} S_{θ_{0}}^{eff, nonpar} (θ_{0}) + o_{P} (1),

where $ℙ_{n} [\cdot] = n^{- 1} \sum_{i} {[\cdot]}_{i}$ . To illustrate this property of nonparametric RAL estimators and as a motivation for multiply robust estimation when nonparametric methods are not appropriate, we provide a detailed study of three nonparametric strategies for estimating the M-functional in a simple yet instructive setting in which X and M are both discrete with finite support.

Strategy 1

The first strategy entails obtaining the maximum likelihood estimator upon evaluating the M-functional under the empirical law of the observed data:

{\hat{θ}}_{0}^{y m} = ℙ_{n} \sum_{m \in S} \hat{E} (Y | E = 1, M = m, X) {\hat{f}}_{M | E, X} (m | E = 0, X),

where ${\hat{f}}_{Y | E, M, X}$ and ${\hat{f}}_{M | E, X}$ are the empirical probability mass functions, and $\hat{E} (Y | E = e, M = m, X = x)$ is the expectation of Y under ${\hat{f}}_{Y | E, M, X}$ .

Strategy 2

The second strategy is based on the following alternative representation of the M-functional

\begin{array}{l} \iint_{S \times X} E (Y | E = 1, M = m, X = x) d F_{M | E, X} (m | E = 0, X = x) d F_{X} (x) \\ = \sum_{e = 0}^{1} \iint_{S \times X} E (Y | E = 1, M = m, X = x) \frac{I (e = 0)}{f_{E | X} (e | X = x)} d F_{M, E, X} (m, e, x) \\ = E \Big\{ \frac{I (E = 0)}{f_{E | X} (0 | X)} E (Y | E = 1, M, X) \Big\} . \end{array}

Thus, our second estimator takes the form:

{\hat{θ}}_{0}^{y e} = ℙ_{n} \Big\{ \frac{I (E = 0)}{{\hat{f}}_{E | X} (0 | X)} \hat{E} (Y | E = 1, M, X) \Big\},

with ${\hat{f}}_{E | X}$ the empirical estimate of the probability mass function $f_{E | X}$.

Strategy 3

The last strategy is based on a third representation of the M-functional

\begin{array}{l} \iint_{S \times X} E (Y | E = 1, M = m, X = x) d F_{M | E, X} (m | E = 0, X = x) d F_{X} (x) \\ = \sum_{e = 0}^{1} \iiint_{Y \times S \times X} y \frac{I (e = 1)}{f_{E | X} (e | X = x)} \frac{f_{M | E, X} (m | E = 0, X = x)}{f_{M | E, X} (m | E = e, X = x)} d F_{Y, M, E, X} (y, m, e, x) \\ = E \Big\{ Y \frac{I (E = 1)}{f_{E | X} (E | X)} \frac{f_{M | E, X} (M | E = 0, X)}{f_{M | E, X} (M | E, X)} \Big\} . \end{array}

Thus, our third estimator takes the form:

{\hat{θ}}_{0}^{e m} = ℙ_{n} \Big\{ Y \frac{I (E = 1)}{{\hat{f}}_{E | X} (E | X)} \frac{{\hat{f}}_{M | E, X} (M | E = 0, X)}{{\hat{f}}_{M | E, X} (M | E, X)} \Big\} .

At first glance the three estimators ${\hat{θ}}_{0}^{e m}$ , ${\hat{θ}}_{0}^{y e}$ and ${\hat{θ}}_{0}^{y m}$ might appear to be distinct, however, we observe that provided the empirical distribution function ${\hat{F}}_{O} = {\hat{F}}_{Y | E, M, X} \times {\hat{F}}_{M | E, X} \times {\hat{F}}_{E | X} \times {\hat{F}}_{X}$ satisfies the positivity assumption, and thus ${\hat{F}}_{O} \in ℳ_{nonpar}$ , then actually ${\hat{θ}}_{0}^{e m} = {\hat{θ}}_{0}^{y e} = {\hat{θ}}_{0}^{y m} = θ_{0} ({\hat{F}}_{O})$ since the three representations agree on the nonparametric model ℳ_nonpar. Therefore we may conclude that these three estimators are in fact asymptotically efficient in ℳ_nonpar with common influence function $S_{θ_{0}}^{eff, nonpar} (θ_{0})$ . Furthermore, from this observation, one further concludes that (asymptotic) inferences obtained using one of the three representations are identical to inferences using either of the other two representations.
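The algebraic equivalence of the three nonparametric strategies can be checked numerically. The sketch below simulates a hypothetical data-generating law with binary X and M (the coefficients are illustrative assumptions, not taken from the paper), computes all nuisance quantities as empirical cell frequencies and cell means, and evaluates the three plug-in estimators. Under this assumed law the true value of the M-functional is 4.25.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Hypothetical law with binary X and M (illustrative coefficients only).
X = rng.integers(0, 2, n)
E = rng.binomial(1, 0.3 + 0.4 * X)
M = rng.binomial(1, 0.2 + 0.3 * E + 0.2 * X)
Y = 1.0 + X + 2.0 * E + 1.5 * M + E * M + rng.normal(size=n)

# Saturated (empirical) nuisance estimates.
# f_e[e, x] = f-hat_{E|X}(e | x)
f_e = np.array([[np.mean(E[X == x] == e) for x in (0, 1)] for e in (0, 1)])
# f_m[e, x, m] = f-hat_{M|E,X}(m | e, x)
f_m = np.array([[[np.mean(M[(E == e) & (X == x)] == m) for m in (0, 1)]
                 for x in (0, 1)] for e in (0, 1)])
# mu1[x, m] = E-hat(Y | E = 1, M = m, X = x)
mu1 = np.array([[Y[(E == 1) & (X == x) & (M == m)].mean() for m in (0, 1)]
                for x in (0, 1)])

# Strategy 1: outcome regression combined with the mediator density.
theta_ym = np.mean((mu1[X] * f_m[0][X]).sum(axis=1))
# Strategy 2: outcome regression combined with the exposure density.
theta_ye = np.mean((E == 0) / f_e[0, X] * mu1[X, M])
# Strategy 3: exposure and mediator densities only.
theta_em = np.mean(Y * (E == 1) / f_e[1, X] * f_m[0, X, M] / f_m[1, X, M])
```

With saturated estimates the three numbers agree up to floating-point error, which is the finite-sample counterpart of the statement that the three representations coincide on ℳ_nonpar.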

At this juncture, we note that the above equivalence no longer applies when, as we have previously argued will likely occur in practice, (M, X) contains 3 or more continuous variables and/or X is too high dimensional for models to be saturated or nonparametric, and thus parametric (or semiparametric) models are specified for dimension reduction. Specifically, for such settings, we observe that three distinct modeling strategies are available. Under the first strategy, the estimator ${\hat{θ}}_{0}^{y m, par}$ is obtained as ${\hat{θ}}_{0}^{y m}$ using parametric model estimates ${\hat{E}}^{par} (Y | E, M, X)$ and ${\hat{f}}_{M | E, X}^{par} (m | E, X)$ instead of their nonparametric counterparts; similarly, under the second strategy, the estimator ${\hat{θ}}_{0}^{y e, par}$ is obtained as ${\hat{θ}}_{0}^{y e}$ using estimates of parametric models ${\hat{E}}^{par} (Y | E = 1, M = m, X)$ and ${\hat{f}}_{E | X}^{par} (e | X)$ ; and finally, under the third strategy, ${\hat{θ}}_{0}^{e m, par}$ is obtained as ${\hat{θ}}_{0}^{e m}$ using ${\hat{f}}_{E | X}^{par} (e | X)$ and ${\hat{f}}_{M | E, X}^{par} (m | E, X)$ . Then it follows that ${\hat{θ}}_{0}^{y m, par}$ is CAN under the submodel ℳ_a, but is generally inconsistent if either ${\hat{E}}^{par} (Y | E, M, X)$ or ${\hat{f}}_{M | E, X}^{par} (m | E, X)$ fails to be consistent. Similarly, ${\hat{θ}}_{0}^{y e, par}$ and ${\hat{θ}}_{0}^{e m, par}$ are respectively CAN under the submodels ℳ_b and ℳ_c, but each estimator generally fails to be consistent outside of the corresponding submodel. In the next section, we propose an approach that produces a triply robust estimator by combining the above three strategies so that only one of models ℳ_a, ℳ_b and ℳ_c needs to be valid for consistency of the estimator.

2.3 Triply robust estimation

The proposed triply robust estimator ${\hat{θ}}_{0}^{triply}$ solves

ℙ_{n} {\hat{S}}_{θ_{0}}^{eff, nonpar} ({\hat{θ}}_{0}^{triply}) = 0,

where ${\hat{S}}_{θ_{0}}^{eff, nonpar} (θ)$ is equal to $S_{θ_{0}}^{eff, nonpar} (θ)$ evaluated at $\{{\hat{E}}^{par} (Y | E, M, X), {\hat{f}}_{M | E, X}^{par} (m | E, X), {\hat{f}}_{E | X}^{par} (e | X)\}$ ; that is

\begin{matrix} {\hat{θ}}_{0}^{triply} = ℙ_{n} \Big[ \frac{I (E = 1) {\hat{f}}_{M | E, X}^{par} (M | E = 0, X)}{{\hat{f}}_{E | X}^{par} (1 | X) {\hat{f}}_{M | E, X}^{par} (M | E = 1, X)} \{Y - {\hat{E}}^{par} (Y | X, M, E = 1)\} \\ + \frac{I (E = 0)}{{\hat{f}}_{E | X}^{par} (0 | X)} \{{\hat{E}}^{par} (Y | X, M, E = 1) - {\hat{η}}^{par} (1, 0, X)\} + {\hat{η}}^{par} (1, 0, X) \Big], \end{matrix}

(4)

is CAN in model ℳ_union = ℳ_a∪ℳ_b∪ℳ_c, where

{\hat{η}}^{par} (e, e^{*}, X) = \int_{S} {\hat{E}}^{par} (Y | X, M = m, E = e) {\hat{f}}_{M | E, X}^{par} (m | E = e^{*}, X) d μ (m) .

In the next theorem, the estimator in the above display is combined with a doubly robust estimator ${\hat{δ}}_{e}^{doubly}$ of δ_e (see van der Laan and Robins, 2003 or Tsiatis, 2006), to obtain multiply-robust estimators of natural direct and indirect effects, where

{\hat{δ}}_{e}^{doubly} = ℙ_{n} \Big[ \frac{I (E = e)}{{\hat{f}}_{E | X}^{par} (e | X)} \{Y - {\hat{η}}^{par} (e, e, X)\} + {\hat{η}}^{par} (e, e, X) \Big] .

To state the result, we set ${\hat{E}}^{par} (Y | X, M, E) = E^{par} (Y | X, M, E; {\hat{β}}_{y}) = g^{- 1} ({\hat{β}}_{y}^{T} h (X, M, E))$ , where g is a known link function and h is a user-specified function of (X, M, E), so that $E^{par} (Y | X, M, E; β_{y}) = g^{- 1} (β_{y}^{T} h (X, M, E))$ entails a working regression model for $E (Y | X, M, E)$ , and ${\hat{β}}_{y}$ solves the estimating equation

0 = ℙ_{n} [S_{y} (\hat{β_{y}})] = ℙ_{n} [h (X, M, E) (Y - g^{- 1} ({\hat{β}}_{y}^{T} h (X, M, E)))] .

Similarly, we set ${\hat{f}}_{M | E, X}^{par} (m | E, X) = f_{M | E, X}^{par} (m | E, X; {\hat{β}}_{m})$ for $f_{M | E, X}^{par} (m | E, X; β_{m})$ a parametric model for the density of [M|E,X] with ${\hat{β}}_{m}$ solving

0 = ℙ_{n} [S_{m} ({\hat{β}}_{m})] = ℙ_{n} [\frac{\partial}{\partial β_{m}} \log f_{M | E, X}^{par} (M | E, X; {\hat{β}}_{m})],

and we set ${\hat{f}}_{E | X}^{par} (e | X) = f_{E | X}^{par} (e | X; {\hat{β}}_{e})$ for $f_{E | X}^{par} (e | X; β_{e})$ a parametric model for the density of [E|X] with ${\hat{β}}_{e}$ solving

0 = ℙ_{n} [S_{e} ({\hat{β}}_{e})] = ℙ_{n} [\frac{\partial}{\partial β_{e}} \log f_{E | X}^{par} (E | X; {\hat{β}}_{e})] .
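To make the structure of estimator (4) concrete, the following sketch implements it for a binary mediator and probes each of the three submodels in turn. It is a minimal illustration under stated assumptions, not the authors' implementation: the data-generating law is hypothetical, the function name `triply_robust_theta0` and its argument layout are invented here, and the nuisance functions are plugged in at their true or deliberately wrong values rather than estimated from the working models above. Under the assumed law the true value of θ₀ is 4.25.

```python
import numpy as np

def triply_robust_theta0(Y, E, M, pe1, pm1, mu1):
    """Estimator (4) for a binary mediator.

    pe1 : f_{E|X}(1 | X_i), shape (n,)
    pm1 : pm1[e, i] = f_{M|E,X}(1 | E = e, X_i), shape (2, n)
    mu1 : mu1[m, i] = E(Y | X_i, M = m, E = 1), shape (2, n)
    """
    idx = np.arange(len(Y))
    f_m0 = np.where(M == 1, pm1[0], 1 - pm1[0])       # f(M_i | E = 0, X_i)
    f_m1 = np.where(M == 1, pm1[1], 1 - pm1[1])       # f(M_i | E = 1, X_i)
    mu1_obs = mu1[M, idx]                             # E(Y | X_i, M_i, E = 1)
    eta10 = mu1[1] * pm1[0] + mu1[0] * (1 - pm1[0])   # eta(1, 0, X_i)
    term1 = (E == 1) * f_m0 / (pe1 * f_m1) * (Y - mu1_obs)
    term2 = (E == 0) / (1 - pe1) * (mu1_obs - eta10)
    return float(np.mean(term1 + term2 + eta10))

# Hypothetical law (illustrative coefficients only).
rng = np.random.default_rng(1)
n = 20000
X = rng.binomial(1, 0.5, n)
p_e = 0.3 + 0.4 * X
E = rng.binomial(1, p_e)
M = rng.binomial(1, 0.2 + 0.3 * E + 0.2 * X)
Y = 1.0 + X + 2.0 * E + 1.5 * M + E * M + rng.normal(size=n)

pm_true = np.vstack([0.2 + 0.2 * X, 0.5 + 0.2 * X])   # true f(M = 1 | E = e, X)
mu1_true = np.vstack([3.0 + X, 5.5 + X])              # true E(Y | X, M = m, E = 1)
half = np.full(n, 0.5)                                # a deliberately wrong density
zero = np.zeros((2, n))                               # a deliberately wrong regression

est_a = triply_robust_theta0(Y, E, M, half, pm_true, mu1_true)                 # exposure model wrong
est_b = triply_robust_theta0(Y, E, M, p_e, np.vstack([half, half]), mu1_true)  # mediator model wrong
est_c = triply_robust_theta0(Y, E, M, p_e, pm_true, zero)                      # outcome model wrong
```

In each call exactly one nuisance function is misspecified, matching submodels ℳ_a, ℳ_b and ℳ_c respectively, and all three estimates stay close to the true value.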

Theorem 2

Suppose that the assumptions of Theorem 1 hold, and that the regularity conditions stated in the appendix hold and that β_m, β_e and β_y are variation independent.

(i) Mediation functional: Then, $\sqrt{n} ({\hat{θ}}_{0}^{triply} - θ_{0})$ is RAL under model ℳ_union with influence function
$\begin{array}{l} S_{θ_{0}}^{union} (θ_{0}, β^{*}) \\ = S_{θ_{0}}^{eff, nonpar} (θ_{0}, β^{*}) - \frac{\partial E \{S_{θ_{0}}^{eff, nonpar} (θ_{0}, β)\}}{\partial β^{T}} \Big|_{β^{*}} E \Big\{ \frac{\partial S_{β} (β)}{\partial β^{T}} \Big|_{β^{*}} \Big\}^{- 1} S_{β} (β^{*}), \end{array}$
and thus converges in distribution to a $N (0, Σ_{θ_{0}})$ , where
$Σ_{θ_{0}} (θ_{0}, β^{*}) = E \{S_{θ_{0}}^{union} {(θ_{0}, β^{*})}^{2}\},$
with $β^{T} = (β_{m}^{T}, β_{e}^{T}, β_{y}^{T})$ and $S_{β} (β) = {(S_{m}^{T} (β_{m}), S_{e}^{T} (β_{e}), S_{y}^{T} (β_{y}))}^{T}$ , and with β* denoting the probability limit of the estimator $\hat{β} = {({\hat{β}}_{m}^{T}, {\hat{β}}_{e}^{T}, {\hat{β}}_{y}^{T})}^{T}$ .
(ii) Natural direct effect: Similarly, $\sqrt{n} ({\hat{θ}}_{0}^{triply} - {\hat{δ}}_{0}^{doubly} - (θ_{0} - δ_{0}))$ is RAL under model ℳ_union with influence function $S_{NDE}^{union} (θ_{0}, δ_{0}, β^{*})$ defined as $S_{θ_{0}}^{union} (θ_{0}, β^{*})$ with $S_{NDE}^{eff, nonpar} (θ_{0}, δ_{0}, β^{*})$ replacing $S_{θ_{0}}^{eff, nonpar} (θ_{0}, β^{*})$ , and asymptotic variance $Σ_{θ_{0} - δ_{0}} (θ_{0}, δ_{0}, β^{*})$ defined accordingly.
(iii) Natural indirect effect: Similarly, $\sqrt{n} ({\hat{δ}}_{1}^{doubly} - {\hat{θ}}_{0}^{triply} - (δ_{1} - θ_{0}))$ is RAL under model ℳ_union with influence function $S_{NIE}^{union} (δ_{1}, θ_{0}, β^{*})$ defined as $S_{θ_{0}}^{union} (θ_{0}, β^{*})$ with $S_{NIE}^{eff, nonpar} (δ_{1}, θ_{0}, β^{*})$ replacing $S_{θ_{0}}^{eff, nonpar} (θ_{0}, β^{*})$ , and asymptotic variance $Σ_{δ_{1} - θ_{0}} (δ_{1}, θ_{0}, β^{*})$ defined accordingly.
(iv) ${\hat{θ}}_{0}^{triply}$ , ${\hat{θ}}_{0}^{triply} - {\hat{δ}}_{0}^{doubly}$ and ${\hat{δ}}_{1}^{doubly} - {\hat{θ}}_{0}^{triply}$ are semiparametric locally efficient in the sense that they are RAL under model ℳ_union and respectively achieve the semiparametric efficiency bound for θ₀, θ₀ − δ₀, and δ₁ − θ₀ under model ℳ_union at the intersection submodel ℳ_a∩ℳ_b∩ℳ_c, with respective efficient influence functions: $S_{θ_{0}}^{eff, nonpar} (θ_{0}, β^{*})$ , $S_{NDE}^{eff, nonpar} (θ_{0}, δ_{0}, β^{*})$ and $S_{NIE}^{eff, nonpar} (δ_{1}, θ_{0}, β^{*})$ .

Empirical versions of $Σ_{θ_{0} - δ_{0}} (δ_{1}, θ_{0}, β^{*})$ and $Σ_{δ_{1} - θ_{0}} (δ_{1}, θ_{0}, β^{*})$ are easily obtained, and the corresponding Wald-type confidence intervals can be used to make formal inferences about natural direct and indirect effects. It is also straightforward to extend the approach to the risk ratio and odds ratio scales for binary Y. By a theorem due to Robins and Rotnitzky (2001), part iv) of the theorem implies that when all models are correct, ${\hat{θ}}_{0}^{triply}$ , ${\hat{θ}}_{0}^{triply} - {\hat{δ}}_{0}^{doubly}$ and ${\hat{δ}}_{1}^{doubly} - {\hat{θ}}_{0}^{triply}$ are semiparametric efficient in model ℳ_nonpar at the intersection submodel ℳ_a∩ℳ_b∩ℳ_c.
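To make the Wald-type interval concrete, the following Python sketch computes an empirical variance and the corresponding interval. It assumes that the estimated influence function has already been evaluated at each observation; the names `wald_interval` and `if_values` are hypothetical and serve only as an illustration, not as the implementation used in this paper.

```python
import math
from statistics import NormalDist


def wald_interval(estimate, if_values, level=0.95):
    """Wald-type interval from estimated influence function values.

    if_values: one estimated influence function value per observation
    (assumed to have been computed elsewhere; hypothetical input).
    """
    n = len(if_values)
    # Empirical version of Sigma = E(S^2); the influence function has mean zero.
    sigma2 = sum(v * v for v in if_values) / n
    # Standard error of the estimator: sqrt(Sigma / n).
    se = math.sqrt(sigma2 / n)
    # Standard normal quantile for a two-sided interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se, se
```

For instance, `wald_interval(theta_hat, if_values)` would return the lower limit, the upper limit and the standard error, where `theta_hat` stands for any of the point estimates above.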

3 A simulation study of estimators of direct effect

In this section, we report a simulation study which illustrates the finite sample performance of the various estimators described in previous sections. We generated 1000 samples of size n = 600, 1000 from the following model:

\begin{array}{l} \text{(Model.X)} \ X_{1} \sim Bernoulli (0.4); \ [X_{2} | X_{1}] \sim Bernoulli (0.3 + 0.4 X_{1}); \\ [X_{3} | X_{1}, X_{2}] \sim - 0.024 - 0.4 X_{1} + 0.4 X_{2} + N (0, 1); \\ \text{(Model.E)} \ [E | X_{1}, X_{2}, X_{3}] \sim Bernoulli ({[1 + \exp \{- (0.4 + X_{1} - X_{2} + 0.1 X_{3} - 1.5 X_{1} X_{3})\}]}^{- 1}); \\ \text{(Model.M)} \ [M | E, X_{1}, X_{2}, X_{3}] \sim Bernoulli ({[1 + \exp \{- (0.5 - X_{1} + 0.5 X_{2} - 0.9 X_{3} + E - 1.5 X_{1} X_{3})\}]}^{- 1}); \\ \text{(Model.Y)} \ [Y | M, E, X_{1}, X_{2}, X_{3}] \sim 1 + 0.2 X_{1} + 0.3 X_{2} + 14 X_{3} - 2.5 E - 3.5 M + 5 E M + N (0, 1) . \end{array}
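As an illustration, the data-generating model above can be sketched in Python as follows. The coefficients are taken from the display; the function names `expit`, `draw_one` and `simulate`, and the use of a seeded generator, are assumptions made for this sketch rather than a description of the code actually used.

```python
import math
import random


def expit(u):
    """Inverse logit, [1 + exp(-u)]^{-1}."""
    return 1.0 / (1.0 + math.exp(-u))


def draw_one(rng):
    """Draw a single observation (X1, X2, X3, E, M, Y) from the model."""
    x1 = int(rng.random() < 0.4)                       # Model.X
    x2 = int(rng.random() < 0.3 + 0.4 * x1)
    x3 = -0.024 - 0.4 * x1 + 0.4 * x2 + rng.gauss(0.0, 1.0)
    # Model.E: logistic exposure model with an X1*X3 interaction
    e = int(rng.random() < expit(0.4 + x1 - x2 + 0.1 * x3 - 1.5 * x1 * x3))
    # Model.M: logistic mediator model with an X1*X3 interaction
    m = int(rng.random() < expit(0.5 - x1 + 0.5 * x2 - 0.9 * x3 + e
                                 - 1.5 * x1 * x3))
    # Model.Y: linear outcome model with an E*M interaction
    y = (1 + 0.2 * x1 + 0.3 * x2 + 14 * x3 - 2.5 * e - 3.5 * m + 5 * e * m
         + rng.gauss(0.0, 1.0))
    return {"x1": x1, "x2": x2, "x3": x3, "e": e, "m": m, "y": y}


def simulate(n, seed=0):
    """Generate one sample of size n; the seed argument is a convenience."""
    rng = random.Random(seed)
    return [draw_one(rng) for _ in range(n)]
```

Calling `simulate(600, seed=s)` for 1000 different values of `s` would mimic the replication scheme described above.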

We then evaluated the performance of the following four estimators of the natural direct effect: ${\hat{θ}}_{0}^{e m} - {\hat{δ}}_{0}^{doubly}$ , ${\hat{θ}}_{0}^{y e} - {\hat{δ}}_{0}^{doubly}$ , ${\hat{θ}}_{0}^{y m} - {\hat{δ}}_{0}^{doubly}$ , and ${\hat{θ}}_{0}^{triply} - {\hat{δ}}_{0}^{doubly}$ . Note that the doubly robust estimator ${\hat{δ}}_{0}^{doubly}$ was used throughout to estimate $δ_{0} = E (Y_{0})$ . To assess the impact of modeling error, we evaluated these estimators in four separate scenarios. In the first scenario, all models were correctly specified, whereas the remaining three scenarios respectively mis-specified only one of Model.E, Model.M and Model.Y. To mis-specify Model.E and Model.M, we left out the X₁ X₃ interaction when fitting each model and assumed an incorrect log-log link function. The incorrect model for Y simply assumed no EM interaction.

Tables 1 and 2 summarize the simulation results, which largely agree with the theory developed in the previous sections. Mainly, all proposed estimators performed well at both moderate and large sample sizes in the absence of modeling error. Furthermore, under the partially mis-specified model in which Model.Y was incorrect, both estimators ${\hat{θ}}_{0}^{y e} - {\hat{δ}}_{0}^{doubly}$ and ${\hat{θ}}_{0}^{y m} - {\hat{δ}}_{0}^{doubly}$ showed significant bias irrespective of sample size, while ${\hat{θ}}_{0}^{e m} - {\hat{δ}}_{0}^{doubly}$ and ${\hat{θ}}_{0}^{triply} - {\hat{δ}}_{0}^{doubly}$ both performed well. Similarly, when Model.M was incorrect, the estimators ${\hat{θ}}_{0}^{e m} - {\hat{δ}}_{0}^{doubly}$ and ${\hat{θ}}_{0}^{y m} - {\hat{δ}}_{0}^{doubly}$ resulted in large bias when compared to the relatively small bias of ${\hat{θ}}_{0}^{y e} - {\hat{δ}}_{0}^{doubly}$ and ${\hat{θ}}_{0}^{triply} - {\hat{δ}}_{0}^{doubly}$ . Finally, mis-specifying Model.E led to estimators ${\hat{θ}}_{0}^{y e} - {\hat{δ}}_{0}^{doubly}$ and ${\hat{θ}}_{0}^{e m} - {\hat{δ}}_{0}^{doubly}$ that were significantly more biased than the estimators ${\hat{θ}}_{0}^{y m} - {\hat{δ}}_{0}^{doubly}$ and ${\hat{θ}}_{0}^{triply} - {\hat{δ}}_{0}^{doubly}$ . Interestingly, the efficiency loss of the multiply robust estimator remained small when compared to the consistent non-robust estimator under the various scenarios, suggesting that, at least in this simulation study, the benefits of robustness appear to outweigh the loss of efficiency.

Table 1.

Simulation results n = 600

		ℳ_ym	ℳ_ye	ℳ_em	ℳ_union
All correct	bias	0.002	0.008	0.002	0.005
	MC s.e.^*	0.005	0.007	0.006	0.006
Y wrong	bias	−0.500	−0.500	0.0001	0.004
	MC s.e.	0.005	0.006	0.006	0.006
M wrong	bias	0.038	0.008	−0.054	0.003
	MC s.e.	0.005	0.007	0.006	0.006
E wrong	bias	0.003	0.027	0.059	0.004
	MC s.e.	0.005	0.005	0.005	0.005


$ℳ_{y m} : {\hat{θ}}_{0}^{y m} - {\hat{δ}}_{0}^{doubly}$ ; $ℳ_{y e} : {\hat{θ}}_{0}^{y e} - {\hat{δ}}_{0}^{doubly}$ ; $ℳ_{e m} : {\hat{θ}}_{0}^{e m} - {\hat{δ}}_{0}^{doubly}$ ; $ℳ_{union} : {\hat{θ}}_{0}^{triply} - {\hat{δ}}_{0}^{doubly}$ .

Monte Carlo standard error

Table 2.

Simulation results n=1000

		ℳ_ym	ℳ_ye	ℳ_em	ℳ_union
All correct	bias	0.001	0.009	0.001	0.001
	MC s.e.^*	0.004	0.005	0.004	0.004
Y wrong	bias	−0.484	−0.484	0.003	0.003
	MC s.e.	0.004	0.004	0.004	0.004
M wrong	bias	0.136	−0.008	0.056	0.01
	MC s.e.	0.004	0.05	0.004	0.01
E wrong	bias	0.001	−0.024	−0.054	0.001
	MC s.e.	0.004	0.004	0.004	0.004


Monte Carlo standard error

4 A data application

In this section, we illustrate the methods in a real world application from the psychology literature on mediation. We re-analyze data from The Job Search Intervention Study (JOBS II) also analyzed by Imai et al (2010b). JOBS II is a randomized field experiment that investigates the efficacy of a job training intervention on unemployed workers. The program is designed not only to increase reemployment among the unemployed but also to enhance the mental health of the job seekers. In the study, 1,801 unemployed workers received a pre-screening questionnaire and were then randomly assigned to treatment and control groups. The treatment group with E = 1 participated in job skills workshops in which participants learned job search skills and coping strategies for dealing with setbacks in the job search process. The control group with E = 0 received a booklet describing job search tips. An analysis considers a continuous outcome measure Y of depressive symptoms based on the Hopkins Symptom Checklist (Vinokur, Price, & Schul, 1995; Vinokur & Schul, 1997, Imai et al, 2010b). In the JOBS II data, a continuous measure of job search self-efficacy represented the hypothesized mediating variable M. The data also included baseline covariates X measured before administering the treatment including: pretreatment level of depression, education, income, race, marital status, age, sex, previous occupation, and the level of economic hardship.

Note that by randomization, the density of [E|X] was known by design not to depend on covariates, and therefore its estimation is not prone to modeling error. The continuous outcome and mediator variables were modeled using linear regression models with Gaussian error, with main effects for (E, M, X) included in the outcome regression and main effects for (E, X) included in the mediator regression. Table 3 summarizes results obtained using ${\hat{θ}}_{0}^{e m}$ , ${\hat{θ}}_{0}^{y e}$ , ${\hat{θ}}_{0}^{y m}$ and ${\hat{θ}}_{0}^{triply}$ together with ${\hat{δ}}_{e}^{doubly}$ , e = 0, 1, to estimate the direct and indirect effects of the treatment.

Table 3.

Estimated Causal Effects of Interest Using the Job Search Intervention Study Data

		ℳ_ym	ℳ_ye	ℳ_em	ℳ_union
Direct effect	Estimate	−0.0310	−0.0310	0.0280	−0.0409
	s.e.^*	0.0124	0.0620	0.0465	0.0217
Indirect effect	Estimate	−0.0160	−0.0160	−0.0750	−0.0070
	s.e.^*	0.0372	0.0620	0.0434	0.0217


Nonparametric bootstrap standard errors

Point estimates of both natural direct and indirect effects closely agreed under models ℳ_ym and ℳ_ye, and also agreed with the results of Imai et al (2010b). We should note that inferences under our choice of ℳ_ym are actually robust to the normality assumption and, as in Imai et al (2010b), only require that the mean structure of [Y|E, M, X] and [M|E, X] are correct. In contrast, inferences under model ℳ_em require a correct model for the mediator density. This distinction may partly explain the apparent disagreement in the estimated direct effect under ℳ_em when compared to the other methods, also suggesting that the Gaussian error model for M is not entirely appropriate. The multiply robust estimate of the natural direct effect is consistent with estimates obtained under models ℳ_ym and ℳ_ye, and is statistically significant, suggesting that the intervention may have beneficial direct effects on participants’ mental health; while the multiply robust approach suggests a much smaller indirect effect than all other estimators although none achieved statistical significance.

5 Improving the stability of ${\hat{θ}}_{0}^{triply}$ when weights are highly variable

The triply robust estimator ${\hat{θ}}_{0}^{triply}$ , which involves inverse probability weights for the exposure and mediator variables, clearly relies on the positivity assumption for good finite sample performance. But as recently shown by Kang and Schafer (2007) in the context of missing outcome data, a practical violation of positivity in data analysis can severely compromise inferences based on such methodology, although their analysis did not directly concern the M-functional θ₀. Thus, it is crucial to critically examine, as we do below in a simulation study, the extent to which the various estimators discussed in this paper are susceptible to a practical violation of the positivity assumption, and to consider possible approaches to improve the finite sample performance of these estimators in the context of highly variable empirical weights. Methodology to enhance the finite sample behaviour of ${\hat{δ}}_{j}^{doubly}$ is well studied in the literature and is not considered here; see for example Robins et al (2007), Cao et al (2009) and Tan (2010). We first describe an approach to enhance the finite sample performance of ${\hat{θ}}_{0}^{triply}$ , particularly in the presence of highly variable empirical weights. To focus the exposition, we only consider the case of a continuous Y and a binary M, but in principle, the approach could be generalized to a more general setting. The proposed enhancement involves two modifications.

The first modification adapts to the mediation context an approach developed for the missing data context (and for the estimation of total effects) in Robins et al (2007). The basic guiding principle of the approach is to carefully modify the estimation of the outcome and mediator models in order to ensure that the triply robust estimator given by equation (4) has the simple M-functional representation

{\hat{θ}}_{0}^{triply, †} = ℙ_{n} \{{\hat{η}}^{par, †} (1, 0, X)\}

where ${\hat{η}}^{par, †} (1, 0, X)$ is carefully estimated to ensure multiple robustness. The reason for favoring an estimator with the above representation is that it is expected to be more robust to practical positivity violation because it does not directly depend on inverse probability weights. However, as we show next, to ensure multiple robustness, estimation of η^par involves inverse probability weights, and therefore, ${\hat{θ}}_{0}^{triply, †}$ indirectly depends on such weights. Our strategy involves a second step to minimize the potential impact of this indirect dependence on weights.

In the following, we assume to simplify the exposition that a simple linear model is used

E^{par} (Y | X, M, E = 1) = E^{par} (Y | X, M, 1; β_{y}) = [1, X^{T}, M] β_{y} .

Then, similarly to Robins et al (2007), one can verify that the above M-functional representation of a triply robust estimator is obtained by estimating $f_{M | E, X}^{par} (M | E = 0, X)$ with ${\hat{f}}_{M | E, X}^{par, †} (M | E = 0, X)$ obtained via weighted logistic regression in the unexposed only, with weight ${\hat{f}}_{E | X}^{par} {(0 | X)}^{- 1}$ ; and by estimating $E^{par} (Y | X, M, E = 1)$ using weighted OLS of Y on (M, X) in the exposed only, with weight

{\hat{f}}_{M | E, X}^{par, †} (M | E = 0, X) {\{{\hat{f}}_{E | X}^{par} (1 | X) {\hat{f}}_{M | E, X}^{par, †} (M | E = 1, X)\}}^{- 1};

provided that both working models include an intercept. The second enhancement, which aims to minimize undue influence of variable weights on the M-functional estimator, entails using ${\hat{f}}_{E | X}^{par, †}$ in the previous step instead of ${\hat{f}}_{E | X}^{par}$ , where

logit {\hat{f}}_{E | X}^{par, †} (1 | X) = logit {\hat{f}}_{E | X}^{par} (1 | X) + {\hat{C}}_{1}

with

{\hat{C}}_{1} = - \log (1 - ℙ_{n} (E)) + \log (ℙ_{n} [E {\hat{f}}_{E | X}^{par} (0 | X) / {\hat{f}}_{E | X}^{par} (1 | X)])

This second modification ensures a certain boundedness property of inverse propensity score weighting. Specifically, for any bounded function R = r(Y, M) of Y and M, consider for a moment the goal of estimating the counterfactual mean $E \{r (Y_{1}, M_{1})\}$ . It is well known that even though R is bounded, the simple inverse-probability weighting estimator $ℙ_{n} \{E R {\hat{f}}_{E | X}^{par} {(1 | X)}^{- 1}\}$ could easily be unbounded, particularly if positivity is practically violated. In contrast, as we show next, the estimator $ℙ_{n} \{E R {\hat{f}}_{E | X}^{par, †} {(1 | X)}^{- 1}\}$ is generally bounded. To see why, note that

\begin{array}{l} ℙ_{n} \{E R {\hat{f}}_{E | X}^{par, †} {(1 | X)}^{- 1}\} = ℙ_{n} \{E R {\hat{f}}_{E | X}^{par, †} (0 | X) {\hat{f}}_{E | X}^{par, †} {(1 | X)}^{- 1}\} + ℙ_{n} \{E R\} \\ = ℙ_{n} \{R \frac{E {\hat{f}}_{E | X}^{par} (0 | X) {\hat{f}}_{E | X}^{par} {(1 | X)}^{- 1}}{ℙ_{n} [E {\hat{f}}_{E | X}^{par} (0 | X) {\hat{f}}_{E | X}^{par} {(1 | X)}^{- 1}]} (1 - ℙ_{n} (E))\} + ℙ_{n} \{E R\} \end{array}

which is bounded since the second term is bounded, and the first term is a convex combination of bounded variables, and therefore is also bounded. Furthermore, $ℙ_{n} [E {\hat{f}}_{E | X}^{par, †} (0 | X) {\hat{f}}_{E | X}^{par, †} {(1 | X)}^{- 1}]$ converges in probability to $(1 - E (E))$ provided that ${\hat{f}}_{E | X}^{par}$ converges to $f_{E | X}$ , ensuring that the expression in the above display is consistent for $E \{r (Y_{1}, M_{1})\}$ . The nonparametric bootstrap is most convenient for inference using ${\hat{f}}_{E | X}^{par, †}$ .
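The boundedness property can be checked numerically with a short Python sketch. It assumes that fitted propensity scores strictly between 0 and 1 are already available and that the sample contains both exposed and unexposed units; the function names `calibrated_propensity` and `ipw_mean` are hypothetical and introduced only for this illustration.

```python
import math


def calibrated_propensity(e, p1):
    """Shift the fitted propensity on the logit scale by the constant C1.

    e:  list of 0/1 exposure indicators
    p1: fitted values of f(1 | X), assumed strictly inside (0, 1)
    """
    n = len(e)
    mean_e = sum(e) / n
    # P_n[E f(0|X) / f(1|X)]: average fitted odds of non-exposure among exposed
    odds_term = sum(ei * (1.0 - pi) / pi for ei, pi in zip(e, p1)) / n
    c1 = -math.log(1.0 - mean_e) + math.log(odds_term)
    shifted = []
    for pi in p1:
        odds = pi / (1.0 - pi) * math.exp(c1)   # logit shifted by C1
        shifted.append(odds / (1.0 + odds))
    return shifted


def ipw_mean(e, r, p1):
    """Inverse-probability weighted mean P_n{E R / f(1 | X)}."""
    return sum(ei * ri / pi for ei, ri, pi in zip(e, r, p1)) / len(e)
```

With a bounded R taking values in [0, 1] and one exposed unit whose fitted propensity is close to zero, `ipw_mean` computed with the unmodified propensity can far exceed 1, whereas with the output of `calibrated_propensity` it stays within [0, 1].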

In the next section, we study in the context of highly variable weights, the behavior of our previous estimators of θ₀ together with that of the enhanced estimators ${\hat{θ}}_{0}^{triply, †, j} = ℙ_{n} {{\hat{η}}^{par, †, j} (1, 0, X)}$ , j =1, 2, where ${\hat{η}}^{par, †, 1}$ is constructed as described above using ${\hat{f}}_{E | X}^{par}$ , and ${\hat{η}}^{par, †, 2}$ uses ${\hat{f}}_{E | X}^{par, †}$ .

6 A simulation study where positivity is practically violated

We adapted to the mediation setting the missing data simulation scenarios in Kang and Schafer (2007), which were specifically designed so that, when misspecified, working models are nonetheless nearly correct but yield highly variable inverse probability weights with practical positivity violation in the context of estimation. We generated 1000 samples of size n = 200, 1000 from the following model:

\begin{array}{l} \text{(Model.X)} \ Z = (Z_{1}, Z_{2}, Z_{3}, Z_{4}) \overset{i i d}{\sim} N (0, 1); \ X_{1} = \exp (Z_{1} / 2); \\ X_{2} = Z_{2} / \{1 + \exp (Z_{1})\} + 10; \ X_{3} = {(Z_{1} Z_{3} / 25 + 0.6)}^{3} \\ \text{and } X_{4} = {(Z_{2} + Z_{4} + 20)}^{2}, \text{ so that } Z \text{ may be expressed in terms of } X . \\ \text{(Model.E)} \ [E | X_{1}, X_{2}, X_{3}] \sim Bernoulli ({[1 + \exp \{(Z_{1} - 0.5 Z_{2} + 0.25 Z_{3} + 0.1 Z_{4})\}]}^{- 1}); \\ \text{(Model.M)} \ [M | E, X_{1}, X_{2}, X_{3}] \sim Bernoulli ({[1 + \exp \{- (0.5 - Z_{1} + 0.5 Z_{2} - 0.9 Z_{3} + Z_{4} - 1.5 E)\}]}^{- 1}) \\ \text{(Model.Y)} \ [Y | M, E, X_{1}, X_{2}, X_{3}] \sim 210 + 27.4 Z_{1} + 13.7 Z_{3} + 13.7 Z_{3} + M + E + N (0, 1) \end{array}

Correctly specified working models were thus obtained by fitting, respectively, an additive linear regression of Y on Z, a logistic regression of M with linear predictor additive in Z and E, and a logistic regression of E with linear predictor additive in Z. Incorrect specification involved fitting these models with X replacing Z, which produces highly variable weights. For instance, an estimated propensity score as small as 5.5 × 10⁻³³ occurred in the simulation study, reflecting an effective violation of positivity; similarly, a mediator predicted probability as small as 3 × 10⁻²⁰ also occurred in the simulation study.
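The relationship between Z and the transformed covariates X used for the mis-specified fits can be sketched in Python as follows. The transformations and the exposure probability follow Model.X and Model.E above; the function names are hypothetical, and the inverse map additionally assumes Z₁ ≠ 0 and Z₂ + Z₄ + 20 > 0, which holds with overwhelming probability under standard normal draws.

```python
import math


def z_to_x(z1, z2, z3, z4):
    """Transformed covariates X as defined in Model.X."""
    x1 = math.exp(z1 / 2.0)
    x2 = z2 / (1.0 + math.exp(z1)) + 10.0
    x3 = (z1 * z3 / 25.0 + 0.6) ** 3
    x4 = (z2 + z4 + 20.0) ** 2
    return x1, x2, x3, x4


def x_to_z(x1, x2, x3, x4):
    """Recover Z from X (assumes z1 != 0 and z2 + z4 + 20 > 0)."""
    z1 = 2.0 * math.log(x1)
    z2 = (x2 - 10.0) * (1.0 + math.exp(z1))
    # Real cube root that also works for negative x3
    cube_root = math.copysign(abs(x3) ** (1.0 / 3.0), x3)
    z3 = 25.0 * (cube_root - 0.6) / z1
    z4 = math.sqrt(x4) - 20.0 - z2
    return z1, z2, z3, z4


def true_propensity(z1, z2, z3, z4):
    """Pr(E = 1 | Z) under Model.E."""
    return 1.0 / (1.0 + math.exp(z1 - 0.5 * z2 + 0.25 * z3 + 0.1 * z4))
```

A working model that is linear in the output of `z_to_x` on the logit scale is mis-specified relative to `true_propensity`, even though no information about Z is lost in the transformation.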

Tables 4 and 5 summarize simulation results for ${\hat{θ}}_{0}^{y m}$ , ${\hat{θ}}_{0}^{y e}$ , ${\hat{θ}}_{0}^{e m}$ , ${\hat{θ}}_{0}^{triply}$ , ${\hat{θ}}_{0}^{triply, †, 1}$ and ${\hat{θ}}_{0}^{triply, †, 2}$ . When all three working models are correct, all estimators perform well in terms of bias, but there are clear differences between the estimators in terms of efficiency. In fact, ${\hat{θ}}_{0}^{y m}$ , ${\hat{θ}}_{0}^{triply}$ , ${\hat{θ}}_{0}^{triply, †, 1}$ and ${\hat{θ}}_{0}^{triply, †, 2}$ have comparable efficiency for n = 200, 1000, but ${\hat{θ}}_{0}^{y e}$ and ${\hat{θ}}_{0}^{e m}$ are far more variable. Moreover, under mis-specification of a single model, ${\hat{θ}}_{0}^{triply}$ , ${\hat{θ}}_{0}^{triply, †, 1}$ and ${\hat{θ}}_{0}^{triply, †, 2}$ remain nearly unbiased, and for the most part substantially more efficient than the corresponding consistent estimator in $\{{\hat{θ}}_{0}^{y m}, {\hat{θ}}_{0}^{y e}, {\hat{θ}}_{0}^{e m}\}$ . When at least two models are mis-specified, the multiply robust estimators ${\hat{θ}}_{0}^{triply}$ , ${\hat{θ}}_{0}^{triply, †, 1}$ and ${\hat{θ}}_{0}^{triply, †, 2}$ generally outperform the other estimators, although ${\hat{θ}}_{0}^{triply}$ occasionally succumbs to the unstable weights, resulting in disastrous mean squared error; see Table 5 when Model.M and Model.E are both incorrect. In contrast, ${\hat{θ}}_{0}^{triply, †, 2}$ generally improves on ${\hat{θ}}_{0}^{triply, †, 1}$ , which generally outperforms ${\hat{θ}}_{0}^{triply}$ , and for the most part ${\hat{θ}}_{0}^{triply, †, 1}$ and ${\hat{θ}}_{0}^{triply, †, 2}$ appear to eliminate any possible deleterious impact of highly variable weights.

Table 4.

Simulation results n = 200

		ℳ_ym	ℳ_ye	ℳ_em	ℳ_union	$ℳ_{union}^{†, 1}$	$ℳ_{union}^{†, 2}$
All correct	bias	0.001	−0.207	0.498	0.003	−0.08	−0.079
	MC s.e.^*	2.614	8.333	20.214	2.6151	2.6155	2.6153
Y wrong	bias	−9.87	−10.221	0.498	−0.147	−0.502	−0.202
	MC s.e.	3.322	10.539	20.214	4.461	3.177	3.141
M wrong	bias	−0.033	−0.207	−9.497	0.001	0.046
	MC s.e.	2.613	8.333	15.376	2.615	2.614
E wrong	bias	−0.001	0.132	210.450	0.066	−0.089	0.087
	MC s.e.	2.614	4.373	2336.92	4.891	2.619	2.615
Y, E wrong	bias	−9.869	−13.535	210.454	−33.090	−1.4609	−2.487
	MC s.e.	3.322	5.256	2336.92	375.334	5.187	4.245
Y, M wrong	bias	−9.355	−10.220	−9.496	−4.346	−3.579
	MC s.e.	3.224	10.539	15.376	3.912	3.480	3.441
E, M wrong	bias	−0.032	0.132	205.060	0.088	−0.001	−3.77×10⁻⁵
	MC s.e.	2.614	4.373	2289.788	4.763	2.623	2.618
Y, E, M wrong	bias	−9.355	−13.535	205.060	−37.757	−4.223	−5.253
	MC s.e.	3.224	5.356	2289.78	379.122	5.835	4.828

$ℳ_{y m} : {\hat{θ}}_{0}^{y m}$ ; $ℳ_{y e} : {\hat{θ}}_{0}^{y e}$ ; $ℳ_{e m} : {\hat{θ}}_{0}^{e m}$ ; $ℳ_{union} : {\hat{θ}}_{0}^{triply}$ ; $ℳ_{union}^{†, 1} : {\hat{θ}}_{0}^{triply, †, 1}$ ; $ℳ_{union}^{†, 2} : {\hat{θ}}_{0}^{triply, †, 2}$ .

Monte Carlo standard error

Table 5.

Simulation results n = 1000

		ℳ_ym	ℳ_ye	ℳ_em	ℳ_union	$ℳ_{union}^{†, 1}$	$ℳ_{union}^{†, 2}$
All correct	bias	0.0324	0.004	−0.106	0.034	−0.047
	MC s.e.	1.136	3.06	6.490	1.136	1.137
Y wrong	bias	−10.256	−10.305	−0.106	0.063	−0.147	−0.148
	MC s.e.	1.675	4.005	6.490	1.769	1.419	1.407
M wrong	bias	−5×10⁴	0.004	−9.706	0.033	0.076
	MC s.e.	1.136	3.060	5.395	1.137	1.135
E wrong	bias	0.032	0.135	2.4×10⁶	1908.76	−0.038	−0.030
	MC s.e.	1.136	1.794	4.3×10⁷	53911.63	1.400	1.242
Y, E wrong	bias	−10.256	−14.011	2.4×10⁶	−1.1×10⁶	6.201	1.024
	MC s.e.	1.675	2.386	4.3×10⁷	2.1×10⁷	9.406	5.097
Y, M wrong	bias	−9.705	−10.305	−9.706	−4.216	−3.555	−3.557
	MC s.e.	1.626	4.004	5.395	1.667	1.527	1.510
E, M wrong	bias	5.7×10⁴	0.135	2.5×10⁶	2034.83	0.0539	0.0599
	MC s.e.	1.136	1.794	4.6×10⁷	56090.10	1.429	1.272
Y, E, M wrong	bias	−9.075	−14.011	2.5×10⁶	−1.2×10⁶	4.659	−0.755
	MC s.e.	1.626	2.386	4.6×10⁷	2.2×10⁷	10.121	5.910

Monte Carlo standard error

7 A comparison to some existing estimators

In this section, we briefly compare the proposed approach to some existing estimators in the literature. Perhaps the most common approach for estimating direct and indirect effects when Y is continuous uses a system of linear structural equations, whereby a linear structural equation for the outcome given the exposure, the mediator and the confounders is combined with a linear structural equation for the mediator given the exposure and confounders to produce an estimator of natural direct and indirect effects. The classical approach of Baron and Kenny (1986) is a particular instance of this approach. In recent work mainly motivated by Pearl’s mediation functional, several authors (Imai et al, 2010, Pearl, 2010, VanderWeele, 2009, VanderWeele and Vansteelandt, 2010) have demonstrated how the simple linear structural equation approach generalizes to accommodate both the presence of an interaction between exposure and mediator variables, and a nonlinear link function either in the regression model for the outcome or in the regression model for the mediator, or both. In fact, when the effect of confounders is also modeled in such structural equations, inferences based on the latter can be viewed as special instances of inferences obtained under a particular specification of model ℳ_a for the outcome and the mediator densities. Thus, as previously shown in the simulations, an estimator obtained under a system of structural equations will generally fail to produce a consistent estimator of natural direct and indirect effects when model ℳ_a is incorrect, whereas by using the proposed multiply robust estimator, valid inferences can be recovered under the union model $ℳ_{b} \cup ℳ_{c}$ , even if $ℳ_{a}$ fails.

A notable improvement on the system of structural equations approach is the double robust estimator of a natural direct effect due to van der Laan and Petersen (2005). Their estimator solves the estimating equation constructed using an empirical version of $S_{NDE, singleton}^{eff, ℳ_{a} \cup ℳ_{c}} (θ_{0}, δ_{0})$ given in the online appendix. They show their estimator remains CAN in the larger submodel $ℳ_{a} \cup ℳ_{c}$ and therefore, they can recover valid inferences even when the outcome model is incorrect, provided both the exposure and mediator models are correct. Unfortunately, the van der Laan and Petersen estimator is still not entirely satisfactory because, unlike the proposed multiply robust estimator, it requires that the model for the mediator density is correct. Nonetheless, if the mediator model is correct, the authors establish that their estimator achieves the efficiency bound for model $ℳ_{a} \cup ℳ_{c}$ at the intersection submodel $ℳ_{a} \cap ℳ_{c}$ where all models are correct; and thus it is locally semiparametric efficient in $ℳ_{a} \cup ℳ_{c}$ . Interestingly, as we report in the online supplement, the semiparametric efficiency bounds for models $ℳ_{a} \cup ℳ_{c}$ and $ℳ_{a} \cup ℳ_{b} \cup ℳ_{c}$ are distinct, because the density of the mediator variable is not ancillary for inferences about the M-functional. Thus, any restriction placed on the mediator’s conditional density can, when correct, produce improvements in efficiency. This is in stark contrast with the role played by the density of the exposure variable, which as in the estimation of the marginal causal effect, remains ancillary for inferences about the M-functional and thus the efficiency bound for the latter is unaltered by any additional information on the former (Robins et al 1994).
In the online appendix, we provide a general functional map that relates the efficient influence function for the larger model $ℳ_{a} \cup ℳ_{b} \cup ℳ_{c}$ to the efficient influence function for the smaller model $ℳ_{a} \cup ℳ_{c}$ where the model for the mediator is either parametric or semiparametric. Our map is instructive because it makes explicit, using simple geometric arguments, the information that is gained from increasing restrictions on the law of the mediator. In the online appendix, we illustrate the map by recovering the efficient influence function of van der Laan and Petersen in the case of a singleton model (i.e. a known conditional density) for the mediator and in the case of a parametric model for the mediator.

8 A semiparametric sensitivity analysis

We describe a semiparametric sensitivity analysis framework to assess the extent to which a violation of the ignorability assumption for the mediator might alter inferences about natural direct and indirect effects. Although only results for the natural direct effect are given here, the extension for the indirect effect is easily deduced from the presentation. Let

t (e, m, x) = E [Y_{1, m} | E = e, M = m, X = x] - E [Y_{1, m} | E = e, M \neq m, X = x],

then

Y_{e', m} \not\perp M | E = e, X,

i.e. a violation of the ignorability assumption for the mediator variable, generally implies that

t (e, m, x) \neq 0 for some (e, m, x) .

Thus, we proceed as in Robins, Rotnitzky and Scharfstein (1999), and propose to recover inferences by assuming that the selection bias function t (e, m, x), which encodes the magnitude and direction of the unmeasured confounding for the mediator, is known. In the following, $S$ (the support of M) is assumed to be finite. To motivate the proposed approach, suppose for the moment that $f_{M | E, X} (M | E, X)$ is known; then, under the assumption that the exposure is ignorable given X, we show in the appendix that:

\begin{array}{l} E [Y_{1, m} | M_{0} = m, X = x] = E [Y_{1, m} | E = 0, M = m, X = x] \\ = E [Y | E = 1, M = m, X = x] - t (1, m, x) (1 - f_{M | E, X} (m | E = 1, X = x)) \\ + t (0, m, x) (1 - f_{M | E, X} (m | E = 0, X = x)), \end{array}

and therefore the M-functional is identified by:

\begin{array}{l} \sum_{m \in S} E {E [Y | E = 1, M = m, X] - t (1, m, X) (1 - f_{M | E, X} (m | E = 1, X)) \\ + t (0, m, X) (1 - f_{M | E, X} (m | E = 0, X))} f_{M | E, X} (m | E = 0, X), \end{array}

(5)

which is equivalently represented as:

\begin{array}{l} E [\frac{I \{E = 1\} f_{M | E, X} (M | E = 0, X)}{f_{E | X} (1 | X) f_{M | E, X} (M | E = 1, X)} \\ \times \{Y - t (1, M, X) (1 - f_{M | E, X} (M | E = 1, X)) + t (0, M, X) (1 - f_{M | E, X} (M | E = 0, X))\}] . \end{array}

(6)

Below, these two equivalent representations (5) and (6) are carefully combined to obtain a double robust estimator of the M-functional assuming t (·,·,·) is known. A sensitivity analysis is then obtained by repeating this process and reporting inferences for each choice of t (·,·,·) in a finite set of user-specified functions $T = \{t_{λ} (\cdot, \cdot, \cdot) : λ\}$ indexed by a finite dimensional parameter λ, with $t_{0} (\cdot, \cdot, \cdot) \in T$ corresponding to the no unmeasured confounding assumption, i.e. t₀ (·,·,·) ≡ 0. Throughout, the model $f_{M | E, X}^{par} (\cdot | E, X; β_{m})$ for the probability mass function of M is assumed to be correct. Thus, to implement the sensitivity analysis, we develop a semiparametric estimator of the natural direct effect in the union model $ℳ_{a} \cup ℳ_{c}$ , assuming t (·,·,·) = t_λ* (·,·,·) for a fixed λ*. The proposed doubly robust estimator of the natural direct effect is then given by ${\hat{θ}}_{0}^{doubly} (λ^{*}) - {\hat{δ}}_{0}^{doubly}$ where ${\hat{δ}}_{0}^{doubly}$ is as previously described, and

\begin{array}{l} {\hat{θ}}_{0}^{doubly} (λ^{*}) = ℙ_{n} [\frac{I \{E = 1\} {\hat{f}}_{M | E, X}^{par} (M | E = 0, X)}{{\hat{f}}_{E | X}^{par} (1 | X) {\hat{f}}_{M | E, X}^{par} (M | E = 1, X)} \{Y - {\hat{E}}^{par} (Y | X, M, E = 1)\} \\ + {\tilde{η}}^{par} (1, 0, X; λ^{*})], \end{array}

with

\begin{array}{l} {\tilde{η}}^{par} (1, 0, X; λ^{*}) = \sum_{m \in S} \{{\hat{E}}^{par} (Y | X, M = m, E = 1) + t_{λ^{*}} (0, m, X) (1 - {\hat{f}}_{M | E, X}^{par} (m | E = 0, X)) \\ - t_{λ^{*}} (1, m, X) (1 - {\hat{f}}_{M | E, X}^{par} (m | E = 1, X))\} {\hat{f}}_{M | E, X}^{par} (m | E = 0, X) . \end{array}
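For a single subject, the bias-corrected plug-in term can be evaluated as in the following Python sketch. It assumes the fitted outcome regression, the fitted mediator probabilities and the values of the chosen selection bias function are supplied as dictionaries keyed by the mediator value m; the function name `eta_tilde` and this input layout are hypothetical choices made for illustration.

```python
def eta_tilde(mu1, f1, f0, t1, t0, support=(0, 1)):
    """Plug-in term for one subject with covariates X.

    mu1[m]: fitted E(Y | X, M = m, E = 1)
    f1[m]:  fitted f(m | E = 1, X)
    f0[m]:  fitted f(m | E = 0, X)
    t1[m]:  t_lambda(1, m, X) for the chosen sensitivity parameter
    t0[m]:  t_lambda(0, m, X) for the chosen sensitivity parameter
    support: finite support S of the mediator (binary by default)
    """
    total = 0.0
    for m in support:
        corrected = (mu1[m]
                     + t0[m] * (1.0 - f0[m])
                     - t1[m] * (1.0 - f1[m]))
        # Weight by the mediator distribution among the unexposed
        total += corrected * f0[m]
    return total
```

Setting `t1` and `t0` to zero for every m recovers the usual plug-in term under mediator ignorability, so a sensitivity analysis amounts to re-evaluating this quantity, and the resulting estimator, over a grid of λ values.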

Our sensitivity analysis then entails reporting the set $\{{\hat{θ}}_{0}^{doubly} (λ) - {\hat{δ}}_{0}^{doubly} : λ\}$ (and the associated confidence intervals), which summarizes how sensitive inferences are to a deviation from the ignorability assumption $λ = 0$ . A theoretical justification for the approach is given by the following formal result, which is proved in the supplemental appendix.

Theorem 4

Suppose t (·,·,·) =t_λ*(·,·,·), then under the consistency, positivity assumptions, and the ignorability assumption for the exposure, ${\hat{θ}}_{0}^{doubly} (λ^{*}) - {\hat{δ}}_{0}^{doubly}$ is a CAN estimator of the natural direct effect in $ℳ_{a} \cup ℳ_{c}$ .

The influence function of ${\hat{θ}}_{0}^{doubly} (λ^{*})$ is provided in the appendix, and can be used to construct a corresponding confidence interval.

It is important to note that the sensitivity analysis technique presented here differs in crucial ways from previous techniques developed by Hafeman (2008), VanderWeele (2010) and Imai et al (2010a). First, the methodology of VanderWeele (2010) postulates the existence of an unmeasured confounder U (possibly vector valued) which, when included in X, recovers the sequential ignorability assumption. The sensitivity analysis then requires specification of a sensitivity parameter encoding the effect of the unmeasured confounder on the outcome within levels of (E, X, M), and another parameter for the effect of the exposure on the density of the unmeasured confounder given (X, M). This is a daunting task which renders the approach generally impractical, except perhaps in the simple setting where it is reasonable to postulate that a single binary confounder is unobserved, and one is willing to make further simplifying assumptions about the required sensitivity parameters (VanderWeele, 2010). In comparison, the proposed approach circumvents this difficulty by concisely encoding a violation of the ignorability assumption for the mediator through the selection bias function t_λ (e, m, x). Thus the approach makes no reference to, and is agnostic about, the existence, dimension, and nature of unmeasured confounders U. Furthermore, in our proposal, the ignorability violation can arise due to an unmeasured confounder of the mediator-outcome relationship that is also an effect of the exposure variable, a setting not handled by the technique of VanderWeele (2010). The method of Hafeman (2008), which is restricted to binary data, shares some of the limitations given above. Finally, in contrast with our proposed double robust approach, coherent implementations of the sensitivity analysis techniques of Imai et al (2010a, 2010b) and VanderWeele (2010) both rely on correct specification of all posited models.
We refer the reader to VanderWeele (2010) for further discussion of Hafeman (2008) and Imai et al (2010a).

9 Discussion

The main contribution of the current paper is a theoretically rigorous yet practically relevant semiparametric framework for making inferences about natural direct and indirect causal effects in the presence of a large number of confounding factors. Semiparametric efficiency bounds are given for the nonparametric model, and multiply robust locally efficient estimators are developed that can be used when nonparametric estimation is not possible.

Although the paper focuses on a binary exposure, we note that the extension to a polytomous exposure is trivial. In future work, we shall extend our results for marginal effects by considering conditional natural direct and indirect effects given a subset of pre-exposure variables. These models are particularly important in making inferences about so-called moderated mediation effects, a topic of growing interest, particularly in the field of psychology (Preacher, Rucker and Hayes, 2007). In related work, we have recently extended our results to a survival analysis setting (Tchetgen Tchetgen, 2011).

A major limitation of the current paper is that it assumes that the mediator is measured without error, an assumption that may be unrealistic in practice and that, if incorrect, may result in biased inferences about mediated effects. We note that much of the recent literature on causal mediation analysis makes a similar assumption. In future work, it will be important to build on the results derived in the current paper to appropriately account for a mismeasured mediator.

Acknowledgments

The authors would like to acknowledge Andrea Rotnitzky who provided invaluable comments that improved the presentation of the results given in Section 7 of the manuscript. The authors also thank James Robins and Tyler VanderWeele for useful comments that significantly improved the presentation of this article.

APPENDIX

PROOF OF THEOREM 1

Let $F_{O; t} = F_{Y | M, X, E; t} F_{M | E, X; t} F_{E | X; t} F_{X; t}$ denote a one-dimensional regular parametric submodel of $ℳ_{nonpar}$, with $F_{O; 0} = F_{O}$, and let

θ_{t} = θ_{0} (F_{O; t}) = \iint_{S \times X} E_{t} (Y | E = 1, M = m, X = x) f_{M | E, X; t} (m | E = 0, X = x) f_{X; t} (x) d μ (m, x)

The efficient influence function $S_{θ_{0}}^{eff, nonpar} (θ_{0})$ is the unique random variable that satisfies the following equation

\nabla_{t = 0} θ_{t} = E \{S_{θ_{0}}^{eff, nonpar} (θ_{0}) U\}

for $U$ the score of $F_{O; t}$ at $t = 0$, and $\nabla_{t = 0}$ denoting differentiation with respect to $t$ at $t = 0$. We observe that

\begin{array}{l} \frac{\partial θ_{t}}{\partial t} |_{t = 0} = \iint_{S \times X} \nabla_{t = 0} E_{t} (Y | E = 1, M = m, X = x) f_{M | E, X} (m | E = 0, X = x) f_{X} (x) d μ (m, x) \\ + \iint_{S \times X} E (Y | E = 1, M = m, X = x) \nabla_{t = 0} f_{M | E, X; t} (m | E = 0, X = x) f_{X} (x) d μ (m, x) \\ + \iint_{S \times X} E (Y | E = 1, M = m, X = x) f_{M | E, X} (m | E = 0, X = x) \nabla_{t = 0} f_{X; t} (x) d μ (m, x) \end{array}

Considering the first term, it is straightforward to verify that

\begin{array}{l} \iint_{S \times X} \nabla_{t = 0} E_{t} (Y | E = 1, M = m, X = x) f_{M | E, X} (m | E = 0, X = x) f_{X} (x) d μ (m, x) \\ = E [U \frac{I (E = 1)}{f_{E | X} (E | X)} \{Y - E (Y | E, M, X)\} \frac{f_{M | E, X} (M | E = 0, X)}{f_{M | E, X} (M | E = 1, X)}] \end{array}

Similarly, one can easily verify that

\begin{array}{l} \iint_{S \times X} E (Y | E = 1, M = m, X = x) \nabla_{t = 0} f_{M | E, X; t} (m | E = 0, X = x) f_{X} (x) d μ (m, x) \\ = E [U \frac{I (E = 0)}{f_{E | X} (E | X)} \{E (Y | E = 1, M, X) - η (1, 0, X)\}] \end{array}

and finally, one can also verify that

\begin{array}{l} \iint_{S \times X} E (Y | E = 1, M = m, X = x) f_{M | E, X} (m | E = 0, X = x) \nabla_{t = 0} f_{X; t} (x) d μ (m, x) \\ = E [U \{η (1, 0, X) - θ_{0}\}] \end{array}
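The intermediate steps behind this last identity can be sketched as follows. The notation $U_{X} = \nabla_{t = 0} \log f_{X; t} (X)$ for the score of the marginal law of $X$ is introduced here for illustration only and is not used elsewhere in the paper. The sketch assumes that the score factorizes along the submodel, so that $E (U | X) = U_{X}$, and uses $E (U) = 0$.

```latex
\begin{aligned}
& \iint_{S \times X} E (Y | E = 1, M = m, X = x) f_{M | E, X} (m | E = 0, X = x)
  \nabla_{t = 0} f_{X; t} (x) \, d μ (m, x) \\
&\quad = \int_{X} η (1, 0, x) \, U_{X} (x) \, f_{X} (x) \, d μ (x)
  && \text{since } \nabla_{t = 0} f_{X; t} (x) = U_{X} (x) f_{X} (x) \\
&\quad = E \{ η (1, 0, X) \, U_{X} \} \\
&\quad = E \{ η (1, 0, X) \, U \}
  && \text{since } E (U | X) = U_{X} \\
&\quad = E [ U \{ η (1, 0, X) - θ_{0} \} ]
  && \text{since } E (U) = 0 .
\end{aligned}
```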

Thus, we obtain

\nabla_{t = 0} θ_{t} = E \{S_{θ_{0}}^{eff, nonpar} (θ_{0}) U\}
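For reference, summing the random variables that multiply $U$ in the three preceding displays gives the following explicit form. Here $f_{E | X} (E | X)$ has been evaluated at the value of $E$ selected by each indicator; this is a restatement of the terms above, not an additional result.

```latex
\begin{aligned}
S_{θ_{0}}^{eff, nonpar} (θ_{0})
 &= \frac{I (E = 1) \, f_{M | E, X} (M | E = 0, X)}
         {f_{E | X} (1 | X) \, f_{M | E, X} (M | E = 1, X)}
    \{ Y - E (Y | X, M, E = 1) \} \\
 &\quad + \frac{I (E = 0)}{f_{E | X} (0 | X)}
    \{ E (Y | X, M, E = 1) - η (1, 0, X) \} \\
 &\quad + η (1, 0, X) - θ_{0} .
\end{aligned}
```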

Given $S_{δ_{e}}^{eff, nonpar} (δ_{e})$, the results for the direct and indirect effects follow from the fact that the influence function of a difference of two functionals equals the difference of the respective influence functions. Because the model is nonparametric, there is a unique influence function for each functional, and it is efficient in the model, which yields the efficiency bound results.

PROOF OF THEOREM 2

We begin by showing that

E \{S_{θ_{0}}^{eff, nonpar} (θ_{0}; β_{m}^{*}, β_{e}^{*}, β_{y}^{*})\} = 0 \qquad (7)

under model $ℳ_{union}$. First note that $(β_{y}^{*}, β_{m}^{*}) = (β_{y}, β_{m})$ under model $ℳ_{a}$. Equality (7) now follows because $E^{par} (Y | X, M, E = 1; β_{y}) = E (Y | X, M, E = 1)$ and $η (1, 0, X; β_{y}, β_{m}) = E [\{E^{par} (Y | X, M, E = 1; β_{y})\} | E = 0, X] = η (1, 0, X)$:

\begin{array}{l} E \{S_{θ_{0}}^{eff, nonpar} (θ_{0}; β_{m}, β_{e}^{*}, β_{y})\} \\ = E [\frac{I (E = 1) f_{M | E, X}^{par} (M | E = 0, X; β_{m})}{f_{E | X}^{par} (1 | X; β_{e}^{*}) f_{M | E, X}^{par} (M | E = 1, X; β_{m})} \overbrace{E \{Y - E^{par} (Y | X, M, E = 1; β_{y}) | E = 1, M, X\}}^{= 0}] \\ + E [\frac{I (E = 0)}{f_{E | X}^{par} (0 | X; β_{e}^{*})} \overbrace{E \{E^{par} (Y | X, M, E = 1; β_{y}) - η (1, 0, X; β_{y}, β_{m}) | E = 0, X\}}^{= 0}] \\ + E [η (1, 0, X; β_{y}, β_{m})] - θ_{0} \\ = 0 \end{array}

Second, $(β_{y}^{*}, β_{e}^{*}) = (β_{y}, β_{e})$ under model $ℳ_{b}$. Equality (7) now follows because $E^{par} (Y | X, M, E = 1; β_{y}) = E (Y | X, M, E = 1)$ and $f_{E | X}^{par} (e | X; β_{e}) = f_{E | X} (e | X)$:

\begin{array}{l} E \{S_{θ_{0}}^{eff, nonpar} (θ_{0}; β_{m}^{*}, β_{e}, β_{y})\} \\ = E [\frac{I (E = 1) f_{M | E, X}^{par} (M | E = 0, X; β_{m}^{*})}{f_{E | X}^{par} (1 | X; β_{e}) f_{M | E, X}^{par} (M | E = 1, X; β_{m}^{*})} \overbrace{E \{Y - E^{par} (Y | X, M, E = 1; β_{y}) | E = 1, M, X\}}^{= 0}] \\ + E [\frac{I (E = 0)}{f_{E | X}^{par} (0 | X; β_{e})} E \{E^{par} (Y | X, M, E = 1; β_{y}) - η (1, 0, X; β_{y}, β_{m}^{*}) | E = 0, X\}] \\ + E [η (1, 0, X; β_{y}, β_{m}^{*})] - θ_{0} \\ = E [E \{E^{par} (Y | X, M, E = 1; β_{y}) | E = 0, X\}] - θ_{0} = 0 \end{array}

Third, equality (7) holds under model $ℳ_{c}$ because

\begin{array}{l} E \{S_{θ_{0}}^{eff, nonpar} (θ_{0}; β_{m}, β_{e}, β_{y}^{*})\} \\ = E [\frac{I (E = 1) f_{M | E, X}^{par} (M | E = 0, X; β_{m})}{f_{E | X}^{par} (1 | X; β_{e}) f_{M | E, X}^{par} (M | E = 1, X; β_{m})} E \{Y - E^{par} (Y | X, M, E = 1; β_{y}^{*}) | E = 1, M, X\}] \\ + E [\frac{I (E = 0)}{f_{E | X}^{par} (0 | X; β_{e})} E \{E^{par} (Y | X, M, E = 1; β_{y}^{*}) - η (1, 0, X; β_{y}^{*}, β_{m}) | E = 0, X\}] \\ + E [η (1, 0, X; β_{y}^{*}, β_{m})] - θ_{0} \\ = E [E \{E (Y | X, M, E = 1) | E = 0, X\}] - E [E \{E^{par} (Y | X, M, E = 1; β_{y}^{*}) | E = 0, X\}] \\ + E [E \{E^{par} (Y | X, M, E = 1; β_{y}^{*}) | E = 0, X\}] - E [η (1, 0, X; β_{y}^{*}, β_{m})] \\ + E [η (1, 0, X; β_{y}^{*}, β_{m})] - θ_{0} \\ = E [E \{E (Y | X, M, E = 1) | E = 0, X\}] - θ_{0} = 0 \end{array}

Assuming that the regularity conditions of Theorem 1A in Robins, Mark and Newey (1992) hold for $S_{θ_{0}}^{eff, nonpar} (θ_{0}; β_{m}, β_{e}, β_{y})$ and $S_{β} (β)$, the expression for $S_{θ_{0}}^{union} (θ_{0}, β^{*})$ follows by standard Taylor expansion arguments, and it now follows that

\sqrt{n} ({\hat{θ}}_{0}^{triply} - θ_{0}) = \frac{1}{n^{1 / 2}} \sum_{i = 1}^{n} S_{θ_{0}, i}^{union} (θ_{0}, β^{*}) + o_{p} (1) \qquad (8)

The asymptotic distribution of $\sqrt{n} ({\hat{θ}}_{0}^{triply} - θ_{0})$ under model $ℳ_{union}$ follows from the previous equation by Slutsky’s Theorem and the Central Limit Theorem.

We note that ${\hat{δ}}_{e}^{doubly}$ is CAN in the union model $ℳ_{union}$ since it is CAN in the larger model where either the density for the exposure is correct, or the density of the mediator and the outcome regression are both correct and thus $η (e, e, X; β_{y}^{*}, β_{m}^{*}) = E (Y | X, E = e)$. This gives the multiply robust result for direct and indirect effects. The asymptotic distribution of the direct and indirect effect estimates then follows from arguments similar to those above.

At the intersection submodel

\frac{\partial E \{S_{θ_{0}}^{eff, nonpar} (θ_{0}, β)\}}{\partial β^{T}} = 0

hence

S_{θ_{0}}^{union} (θ_{0}, β) = S_{θ_{0}}^{eff, nonpar} (θ_{0}, β) .

The semiparametric efficiency claim then follows for ${\hat{θ}}_{0}^{triply}$ and a similar argument gives the result for direct and indirect effects.
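Equality (7) can also be checked numerically. The sketch below is an illustration only: the data-generating law (binary X, E and M, with the coefficients shown), the function names, and the deliberately wrong working models are all assumptions made here, not part of the paper. Nuisance functions are plugged in as known functions rather than estimated, so the sketch checks the mean-zero property of the estimating function under each of $ℳ_{a}$, $ℳ_{b}$ and $ℳ_{c}$, not the full asymptotic theory.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical true law; every coefficient is an illustrative assumption.
def p_e1(x):                      # f_{E|X}(1 | x)
    return expit(-0.2 + 0.6 * x)

def p_m1(e, x):                   # f_{M|E,X}(1 | e, x)
    return expit(-0.5 + 1.0 * e + 0.5 * x)

def mu1(m, x):                    # E(Y | E = 1, M = m, X = x)
    return 2.0 + 2.0 * m + 1.0 * x

def simulate(n, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)
    e = rng.binomial(1, p_e1(x))
    m = rng.binomial(1, p_m1(e, x))
    # The conditional mean of y equals mu1(m, x) whenever e = 1.
    y = 1.0 + 1.0 * e + 2.0 * m + 1.0 * x + rng.normal(size=n)
    return y, e, m, x

def theta_true():
    # theta_0 = E[eta(1, 0, X)], with X ~ Bernoulli(0.5).
    xs = np.array([0.0, 1.0])
    eta = mu1(1, xs) * p_m1(0, xs) + mu1(0, xs) * (1.0 - p_m1(0, xs))
    return float(eta.mean())

def theta_triply(y, e, m, x, pe, pm, mu):
    """Solve the sample mean of the efficient influence function for theta_0.

    pe(x) = P(E=1|x), pm(e, x) = P(M=1|e, x), mu(m, x) = E(Y|E=1, m, x)
    are working models, which may be misspecified.
    """
    f_m0 = np.where(m == 1, pm(0, x), 1.0 - pm(0, x))   # f(M | E=0, X)
    f_m1 = np.where(m == 1, pm(1, x), 1.0 - pm(1, x))   # f(M | E=1, X)
    eta = mu(1, x) * pm(0, x) + mu(0, x) * (1.0 - pm(0, x))
    term1 = (e == 1) * f_m0 / (pe(x) * f_m1) * (y - mu(m, x))
    term2 = (e == 0) / (1.0 - pe(x)) * (mu(m, x) - eta)
    return float(np.mean(term1 + term2 + eta))

# Deliberately wrong working models (constants that ignore the data).
def bad_pe(x):
    return np.full(np.shape(x), 0.5)

def bad_pm(e, x):
    return np.full(np.shape(x), 0.5)

def bad_mu(m, x):
    return np.zeros(np.shape(x))

if __name__ == "__main__":
    y, e, m, x = simulate(200_000, seed=1)
    cases = {
        "all correct": (p_e1, p_m1, mu1),
        "M_a (exposure model wrong)": (bad_pe, p_m1, mu1),
        "M_b (mediator model wrong)": (p_e1, bad_pm, mu1),
        "M_c (outcome model wrong)": (p_e1, p_m1, bad_mu),
        "all wrong": (bad_pe, bad_pm, bad_mu),
    }
    print("truth", round(theta_true(), 3))
    for name, (pe, pm, mu) in cases.items():
        print(name, round(theta_triply(y, e, m, x, pe, pm, mu), 3))
```

Under this assumed law, the estimate should stay close to the true value whenever at most one working model is wrong, and should generally drift away from it when all three are wrong.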

PROOF OF THEOREMS 3 & 4

The proofs are given in the online appendix.

Footnotes

AMS 1991 Subject Classifications. Primary: 62G05.

References

Avin C, Shpitser I, Pearl J. Identifiability of path-specific effects. IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence; Edinburgh, Scotland, UK. July 30–August 5, 2005; 2005. pp. 357–363.
Bang H, Robins J. Doubly robust estimation in Missing data and causal inference models. Biometrics. 2005;61:692–972. doi: 10.1111/j.1541-0420.2005.00377.x.
Baron RM, Kenny DA. The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology. 1986;51:1173–1182. doi: 10.1037//0022-3514.51.6.1173.
Bickel P, Klassen C, Ritov Y, Wellner J. Efficient and Adaptive Estimation for Semi-parametric Models. Springer; New York: 1993.
Cao W, Tsiatis AA, Davidian M. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika. 2009;96:732–734. doi: 10.1093/biomet/asp033.
Goetgeluk S, Vansteelandt S, Goetghebeur E. Estimation of controlled direct effects. Journal of the Royal Statistical Society – Series B. 2008;70:1049–1066.
Hafeman D. Opening the Black Box: A Reassessment of Mediation from a Counterfactual Perspective [dissertation]. New York: Columbia University; 2008.
Hafeman D, VanderWeele T. Alternative assumptions for the identification of direct and indirect effects. Epidemiology. 2009 doi: 10.1097/EDE.0b013e3181c311b2. In press.
Hahn J. On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects. Econometrica. 1998;66:315–331.
van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. Springer Verlag; New York: 2003.
van der Laan M, Petersen M. Direct Effect Models. (Working Paper 187). U.C. Berkeley Division of Biostatistics Working Paper Series. 2005 http://www.bepress.com/ucbbiostat/paper187.
Imai K, Keele L, Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science. 2010a;25:51–71.
Imai K, Keele L, Tingley D. A General Approach to Causal Mediation Analysis. Psychological Methods. 2010b Dec;15(4):309–334. doi: 10.1037/a0020761. (lead article)
Kang JDY, Schafer JL. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data (with discussion). Statist Sci. 2007;22:523–39. doi: 10.1214/07-STS227.
Pearl J. Direct and indirect effects. Proceedings of the 17th Annual Conference on Uncertainty in Artificial Intelligence (UAI-01); San Francisco, CA. Morgan Kaufmann; 2001. pp. 411–42.
Pearl J. The Mediation Formula: A guide to the assessment of causal pathways in nonlinear models. Technical report. 2011 http://ftp.cs.ucla.edu/pub/statser/r379.pdf.
Preacher KJ, Rucker DD, Hayes AF. Assessing moderated mediation hypotheses: Strategies, methods, and prescriptions. Multivariate Behavioral Research. 2007;42:185–227. doi: 10.1080/00273170701341316.
Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3:143–155. doi: 10.1097/00001648-199203000-00013.
Robins JM, Mark SD, Newey WK. Estimating exposure effects by modeling the expectation of exposure conditional on confounders. Biometrics. 1992;48:479–495.
Robins JM, Rotnitzky A, Scharfstein D. Sensitivity Analysis for Selection Bias and Unmeasured Confounding in Missing Data and Causal Inference Models. In: Halloran ME, Berry D, editors. Statistical Models in Epidemiology: The Environment and Clinical Trials. Vol. 116. NY: Springer-Verlag; 1999. pp. 1–92. IMA.
Robins J. Semantics of causal DAG models and the identification of direct and indirect effects. In: Green P, Hjort N, Richardson S, editors. Highly Structured Stochastic Systems. Oxford, UK: Oxford University Press; 2003. pp. 70–81.
Robins JM, Rotnitzky A. Comment on the Bickel and Kwon article, “Inference for semiparametric models: Some questions and an answer”. Statistica Sinica. 2001;11(4):920–936.
Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. Proceedings of the American Statistical Association Section on Bayesian Statistical Science. 2000;1999:6–10.
Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846–866.
Robins JM, Sued M, Lei-Gomez Q, Rotnitzky A. Comment: Performance of double-robust estimators when “Inverse Probability” weights are highly variable. Statistical Science. 2007;22(4):544–559.
Robins JM, Richardson TS. Alternative graphical causal models and the identification of direct effects. In: Shrout P, editor. To appear in Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures. Oxford University Press; 2010.
Scharfstein DO, Rotnitzky A, Robins JM. Rejoinder to comments on “Adjusting for non-ignorable drop-out using semiparametric non-response models”. Journal of the American Statistical Association. 1999;94:1096–1120.
Tchetgen Tchetgen EJ. On Causal Mediation Analysis with a Survival Outcome. The International Journal of Biostatistics. 2011;7(1) doi: 10.2202/1557-4679.1351. Article 33.
Tsiatis AA. Semiparametric Theory and Missing Data. Springer Verlag; New York: 2006.
VanderWeele TJ. Marginal structural models for the estimation of direct and indirect effects. Epidemiology. 2009;20:18–26. doi: 10.1097/EDE.0b013e31818f69ce.
VanderWeele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome - with discussion. American Journal of Epidemiology. 2010;172:1339–1348. doi: 10.1093/aje/kwq332.
VanderWeele TJ. Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology. 2010;21:540–551. doi: 10.1097/EDE.0b013e3181df191c.

