Biostatistics (Oxford, England). 2014 Nov 12;16(2):339–351. doi: 10.1093/biostatistics/kxu048

Sensitivity analyses for parametric causal mediation effect estimation

Jeffrey M Albert 1,*, Wei Wang 2
PMCID: PMC4441101  PMID: 25395683

Abstract

Causal mediation analysis uses a potential outcomes framework to estimate the direct effect of an exposure on an outcome and its indirect effect through an intermediate variable (or mediator). Causal interpretations of these effects typically rely on sequential ignorability. Because this assumption is not empirically testable, it is important to conduct sensitivity analyses. Sensitivity analyses so far offered for this situation have either focused on the case where the outcome follows a linear model or involve nonparametric or semiparametric models. We propose alternative approaches that are suitable for responses following generalized linear models. The first approach uses a Gaussian copula model involving latent versions of the mediator and the final outcome. The second approach uses a so-called hybrid causal-observational model that extends the association model for the final outcome, providing a novel sensitivity parameter. These models, while still assuming a randomized exposure, allow for unobserved (as well as observed) mediator-outcome confounders that are not affected by exposure. The methods are applied to data from a study of the effect of mother education on dental caries in adolescence.

Keywords: Causal inference, Copula, Interaction, Mediation analysis, Mediation formula, Potential outcome, Structural equations model

1. Introduction

Mediation analysis has become popular as an approach for studying the mechanisms through which a treatment or exposure affects an outcome. In practice, mediation analysis utilizes one or more measured intermediate variables (or mediators) hypothesized to lie on the causal pathway between the exposure and the outcome. Typically, the analysis involves the decomposition of the overall exposure effect into a direct effect and an indirect (mediation) effect occurring through one or more intermediate variables.

A classical approach to mediation analysis involves the fit of linear regression models for the outcome and mediator (Baron and Kenny, 1986). Recently, causal model (potential outcomes) approaches have been developed that provide causally interpretable mediation effects under flexible situations, including generalized linear models and models with interactions (Robins and Greenland, 1992; Albert, 2008; Imai and others, 2010b).

One very general approach to causal mediation analysis is based on the mediation formula (Imai and others, 2010b; Albert and Nelson, 2011; Pearl, 2012). Although a nonparametric version of this method is possible (Imai and others, 2010b), a very general and practical use of the mediation formula is via parametric models. Particular applications of the parametric mediation formula are in several recent papers (Wang and Albert, 2012; Wang and others, 2013).

The mediation formula, which provides a key tool for the methods to be presented in this paper, is based on the potential outcomes framework. Some basic definitions and notation for this framework are as follows. We let Inline graphic denote the potential outcome for (response variable) Inline graphic if causal factor Inline graphic were set to level Inline graphic for individual Inline graphic. Note that we will sometimes drop the subscript Inline graphic when not needed for clarity. The central problem of causal inference is that a response, Inline graphic, for any given individual, is observed (at a given time) for only one level of the exposure Inline graphic. Potential outcomes for exposure levels not actually observed are referred to as counterfactuals. To define mediation effects, we use nested potential outcomes. For example, Inline graphic denotes the potential outcome for Inline graphic if exposure, Inline graphic, were set to Inline graphic, but the mediator, Inline graphic, set to its potential outcome if Inline graphic were set to Inline graphic. We let Inline graphic denote the average of this potential outcome over a specified population. Then we define the direct and indirect effects, respectively, as

(1.1)

Note that Inline graphic and Inline graphic represent “natural” direct and indirect effects whereby the mediator, Inline graphic, is set to the value (potential outcome) that would “naturally” be observed under the specified exposure level. These estimands are in contrast to those of “controlled” direct and indirect effects whereby Inline graphic is fixed at a particular common value. The total effect, Inline graphic can be written as Inline graphic, demonstrating that the natural direct and indirect effects represent an exact decomposition of the total exposure effect.

A key assumption in the identification of the natural direct and indirect effects (as used, e.g., by Imai and others, 2010b) is sequential ignorability. This assumption (discussed in detail in Section 2 below) comprises two conditions loosely stated as follows: (i) there are no unobserved confounders of the Inline graphicInline graphic or Inline graphicInline graphic relationships and (ii) there are no unobserved (baseline or post-exposure) Inline graphicInline graphic confounders. Unfortunately, these assumptions are not testable, although the first condition can sometimes be justified by the design, namely, when the exposure or treatment is randomly assigned or a sufficient set of prognostic baseline variables measured. The second part of the assumption, however, is generally difficult to make with confidence due to the possibility of unobserved Inline graphicInline graphic confounders. For this reason, a number of scholars (including Imai and others, 2010b) emphasize the importance of conducting a sensitivity analysis to accompany mediation effect estimates.

Approaches to sensitivity analysis offered in the past (Imai and others, 2010b) have often relied on linear or linearizable (e.g., probit) models. In these cases, sequential ignorability amounts to assuming a zero correlation between the error terms for the Inline graphic and Inline graphic models. Imai and others (2010b) proposed a sensitivity analysis that examines the impact of varying this correlation, and Imai and others (2010a) extended this approach to a binary outcome using a probit regression model. VanderWeele (2010) presented a general nonparametric methodology that considered an unobserved confounder between the final outcome and mediator. Although simple bias formulae are provided under simplifying assumptions, more generally, this approach may require complex specifications of quantities involving the distribution of (or distributions that condition on) one or more confounders. Hafeman (2011) simplified the required specifications but limited the sensitivity analysis to the case of binary variables (exposure, mediator, and outcome). Recent contributions include Tchetgen Tchetgen and Shpitser (2012) who proposed a double robust semiparametric approach, and VanderWeele and Chiba (2014) who provided a nonparametric approach allowing for unobserved Inline graphicInline graphic confounders affected by Inline graphic. A limitation of the latter two methods is that the number of sensitivity parameters increases with the dimensionality of Inline graphic, making these approaches difficult when Inline graphic is not binary.

In the present paper, we propose two new approaches to sensitivity analysis designed to accommodate different types of variables, each assumed to follow a generalized linear model. The first approach uses a Gaussian copula model for the joint distribution of potential outcomes for Inline graphic and Inline graphic. Wang and others (2013) used a similar idea but made stronger assumptions than those of the present approach, resulting in a different derived model. The second approach considers a “hybrid” model that extends the association model for Inline graphic by incorporating causal as well as observational effects. The new approaches represent novel alternatives to the predominant approach that specifies effects of unobserved confounders, an approach that goes back to Cornfield and others (1959); see also Rosenbaum (2002).

Following some background in Section 2, we present the new methods in Section 3. In Section 4, we jointly apply the two new methods to data from a study of parental factors in the development of dental caries among adolescents. A concluding discussion, including advantages and limitations of each method, is provided in Section 5.

2. Background

To demonstrate identifiability of natural direct and indirect effects, Imai and others (2010b) assumed the following, with Inline graphic denoting a vector of baseline covariates (unaffected by Inline graphic):

(2.1a)
(2.1b)

Assumption (2.1a) implies that the exposure Inline graphic is random (or that exposure groups are exchangeable) conditional on Inline graphic, while Assumption (2.1b) requires that there are no unobserved confounders (whether baseline or post exposure) of the Inline graphicInline graphic relationship conditioning on Inline graphic and Inline graphic. The two assumptions together are referred to as “sequential ignorability”.

Imai and others (2010b) showed that under sequential ignorability (along with a standard consistency assumption),

(2.2)

The right-hand side of (2.2) contains only association parameters and is thus estimable under appropriate association models for Inline graphic and Inline graphic. Rather than integrate over an assumed distribution for Inline graphic as indicated in (2.2), a convenient alternative approach is to sum over the empirical distribution of Inline graphic for the sample or other chosen reference group (e.g., subsample of exposed subjects). Equation (2.2), along with (1.1), yields estimable expressions for the natural direct and indirect effects. These expressions have been referred to by Pearl (2012) as the mediation formula, although we tend to use that name for Equation (2.2) itself.

The assumption (2.1b) can be relaxed slightly for natural direct and indirect effects defined (as in (1.1)) in terms of expected potential outcomes (2.2). In particular, it suffices to assume an “independence in expectation” version of (2.1b). In the present paper, we consider an alternative assumption which also allows identification of the natural direct and indirect effects in combination with randomization of Inline graphic (2.1a). This assumption, which we refer to as “mediator comparability”, is written as,

(2.3)

for all Inline graphic, Inline graphic, Inline graphic. This assumption says that the mean potential outcome of Inline graphic for a given exposure level Inline graphic and mediator value Inline graphic is the same for subgroups of the two observed exposure groups (Inline graphic, 1) that are observed at a common level Inline graphic of the mediator (and covariate values, Inline graphic), this holding for each Inline graphic, Inline graphic. As with sequential ignorability, the mediator comparability assumption is not testable from the data. Identifiability of natural direct and indirect effects using the mediator comparability assumption is shown in Appendix A of supplementary material available at Biostatistics online.

In a parametric approach to causal mediation analysis, the above potential outcome assumptions are used in conjunction with parametric association models for the outcome variables. For example, we will consider the following generalized structural equations model (GSEM),

(2.4a)
(2.4b)

where Inline graphic, Inline graphic are invertible link functions, and the Inline graphic and Inline graphic are unknown regression parameters, with Inline graphic, Inline graphic representing parameter vectors compatible with Inline graphic. In our approach to estimation, which is based on maximum likelihood, we need to further specify conditional distributions for each response variable Inline graphic. We may also need to estimate non-regression parameters, for example, variances for normally distributed outcomes. In the present paper, we will assume response variables to follow exponential family distributions.
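To illustrate how the mediation formula (2.2) can be evaluated with models of the form (2.4a,b), the following Python sketch computes the mean nested potential outcome for a logistic outcome model and a normal linear mediator model, summing over the empirical covariate values of a reference group. It is a minimal sketch under stated assumptions rather than the implementation used in this paper: the function and dictionary names, all coefficient values, and the single binary covariate are hypothetical, and the integral over the mediator is approximated by Monte Carlo draws.

```python
import math
import random


def expit(t):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-t))


def mean_nested_outcome(x, x_prime, covariates, beta, alpha, sigma,
                        n_draws=1000, seed=0):
    """Approximate the mean of Y(x, M(x')) by the mediation formula.

    For each covariate value z in the reference group, the mediator is drawn
    from its fitted normal model under exposure x', and the fitted outcome
    mean under exposure x is averaged over those draws.
    """
    rng = random.Random(seed)  # same seed => common random numbers across calls
    total = 0.0
    for z in covariates:
        mu_m = alpha["const"] + alpha["x"] * x_prime + alpha["z"] * z
        acc = 0.0
        for _ in range(n_draws):
            m = rng.gauss(mu_m, sigma)
            acc += expit(beta["const"] + beta["x"] * x
                         + beta["m"] * m + beta["z"] * z)
        total += acc / n_draws
    return total / len(covariates)


# Hypothetical fitted values (illustrative placeholders, not estimates).
beta = {"const": -1.5, "x": 0.5, "m": 0.7, "z": 0.3}   # outcome model, logit scale
alpha = {"const": 0.8, "x": 0.3, "z": 0.1}             # mediator model
sigma = 0.6                                            # mediator residual SD
covariates = [0, 1] * 25                               # reference-group covariate values

e11 = mean_nested_outcome(1, 1, covariates, beta, alpha, sigma)
e01 = mean_nested_outcome(0, 1, covariates, beta, alpha, sigma)
e00 = mean_nested_outcome(0, 0, covariates, beta, alpha, sigma)

direct = e11 - e01     # exposure changed, mediator held at its level under exposure
indirect = e01 - e00   # exposure fixed, mediator changed
print(direct, indirect, direct + indirect)
```

The decomposition shown holds the mediator at its level under exposure for the direct effect; the alternative decomposition is obtained by swapping the roles of the two exposure levels. Reusing the seed across calls is a design choice that reduces Monte Carlo noise in the differences.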

3. New sensitivity analysis methods

Our proposed methods, like that of Imai and others (2010b), assume randomization (or conditional exchangeability) of Inline graphic. The methods thus focus on the departure from the second sequential ignorability condition (i.e., (2.1b) or (2.3)), which is, roughly speaking, the assumption of no unobserved Inline graphicInline graphic confounders. Further, also like Imai and others (2010b), these approaches assume that any Inline graphicInline graphic confounders are not affected by exposure. Note that such an effect would imply a causal model with a three-link path, thus departing from the present (two-link) mediation set-up. The former situation is considered by Imai and Yamamoto (2013) focusing on the linear model case; see also Albert and Nelson (2011) and Lange and others (2014).

In addition, we note that while our focus is on the estimands Inline graphic and Inline graphic, the approaches for Inline graphic and Inline graphic are completely analogous. The alternative versions of the direct and indirect effects are generally equal only in the case of a linear (identity link) model for Inline graphic, and in the absence of an Inline graphic interaction. For other generalized linear models, they will tend to be very close (as in the case of our data example) when there is no interaction.

3.1. Copula model approach

The first approach uses a Gaussian copula model for the joint distribution of potential outcomes of Inline graphic and Inline graphic. Specifically, we suppose the existence of continuous latent variables, Inline graphic and Inline graphic, corresponding to the (partially) observable potential outcomes, Inline graphic and Inline graphic, respectively, for each Inline graphic, Inline graphic, and Inline graphic. We assume that Inline graphic and Inline graphic are bivariate normally distributed with correlation Inline graphic (assumed constant over Inline graphic) conditional on Inline graphic and Inline graphic.

We make the standard consistency assumption that Inline graphic if Inline graphic, and Inline graphic if Inline graphic and Inline graphic, where Inline graphic is the observed level of exposure Inline graphic for individual Inline graphic (similarly, for Inline graphic, Inline graphic). For discrete Inline graphic (taking values Inline graphic), we further say that Inline graphic is “compatible” with Inline graphic if Inline graphic, Inline graphic, where Inline graphic is the cumulative distribution function of Inline graphic. From here on, a pair of values written with the same symbol, one with an asterisk (the latent version) and one without (the discrete, observed version), such as Inline graphic and Inline graphic, should be understood to be compatible.

Given the assumed bivariate normal distribution for Inline graphic and Inline graphic we have the regression relationship,

(3.1)

where Inline graphic, Inline graphic denote the variances of Inline graphic and Inline graphic, respectively, all expectations (variances, and so on) in (3.1) being conditional on Inline graphic. From here on, we will generally omit conditioning on Inline graphic and Inline graphic from the notation unless needed for clarity; note that Inline graphic can be dropped anyway due to the assumption of randomized Inline graphic (2.1a). We will assume that Inline graphic is constant over Inline graphic and write Inline graphic. We also write Inline graphic and Inline graphic for brevity.

Our goal is the estimation of Inline graphic for all Inline graphic, Inline graphic, which will allow estimation of the natural direct and indirect effects defined earlier. The case of Inline graphic requires particular attention, and in fact is where the sequential ignorability assumption (i.e., (2.1b) or (2.3)) is ordinarily needed. For concreteness, and applicability to our data example, we focus on the estimand Inline graphic, needed for Inline graphic and Inline graphic. The approach for the estimand Inline graphic, which is involved in the expressions for Inline graphic and Inline graphic, is derived in a parallel manner by simply switching the values for Inline graphic and Inline graphic. We indicate maximum likelihood estimates (i.e., obtained under (2.4)) using the standard hat notation.

When Inline graphic and Inline graphic are continuous and assumed to be bivariate normally distributed then (3.1) holds for Inline graphic, Inline graphic in place of Inline graphic, Inline graphic, and Inline graphic in place of Inline graphic. In the special case where Inline graphic and Inline graphic follow additive linear models (i.e., (2.4a,b) with identity link functions), it can then be shown that Inline graphic and Inline graphic where Inline graphic. Under homogeneous variance assumptions, allowing estimation of Inline graphic and Inline graphic, the correlation parameter, Inline graphic, may be used as the sensitivity parameter. The above and more general expressions for Inline graphic and Inline graphic in the continuous Inline graphic case are derived in Appendix B of supplementary material available at Biostatistics online.

For discrete Inline graphic, we assume the regression relationship (3.1) for the latent final response variable Inline graphic and mediator Inline graphic (also latent if Inline graphic is discrete). In this case, constraining the marginal distribution of Inline graphic to be standard normal, the right side of (3.1) can be reduced to give,

(3.2)

where Inline graphic. Given our assumptions, we have Inline graphic).

We can then estimate Inline graphic for given Inline graphic and Inline graphic using Monte Carlo simulation. Here we present the algorithm for the case where Inline graphic is continuous (and assumed normally distributed), in which case we use Inline graphic in place of Inline graphic (and Inline graphic in place of Inline graphic) in the above expressions. Further details, including the case where Inline graphic is discrete, are given in Appendix C of supplementary material available at Biostatistics online. The algorithm is implemented for selected values for Inline graphic and Inline graphic. For each person Inline graphic in the chosen reference group, do the following for replications Inline graphic,

  1. Draw Inline graphic from Inline graphic. Let Inline graphic.

  2. Supposing the response for a given Inline graphic is Inline graphic, let Inline graphic, where Inline graphic, Inline graphic and Inline graphic

Then, compute Inline graphic and Inline graphic. Note that Inline graphic (in the second step above) represents the estimated conditional (cumulative) probability of Inline graphic given Inline graphic, and Inline graphic represents the estimated conditional expected value of Inline graphic given Inline graphic, both conditional on Inline graphic.

The natural direct and indirect effects, using the above estimated expected value, may be recomputed for varying values for Inline graphic and Inline graphic thus providing a sensitivity analysis. Confidence intervals for mediation effects for given Inline graphic and Inline graphic can be computed using the percentile bootstrap resampling method.
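The conditioning step at the heart of the copula approach can be illustrated with a short Python sketch for a binary outcome. It is not the algorithm of Appendix C; it only shows the standard bivariate normal identity on which such a computation rests. The assumptions are ours: the latent outcome is standard normal, the binary outcome equals 1 when the latent outcome exceeds a threshold chosen to match a given marginal probability, the mediator is expressed as a standard normal score, and the names `conditional_prob` and `monte_carlo_mean` are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def conditional_prob(p_marginal, rho, z_mediator):
    """P(Y = 1 | mediator score = z) under a Gaussian copula.

    The latent outcome given the mediator score z is normal with mean rho * z
    and variance 1 - rho**2, and Y = 1 when the latent outcome exceeds
    -inv_cdf(p_marginal), so that P(Y = 1) = p_marginal marginally.
    """
    c = STD_NORMAL.inv_cdf(p_marginal)
    return STD_NORMAL.cdf((c + rho * z_mediator) / math.sqrt(1.0 - rho * rho))


def monte_carlo_mean(p_marginal, rho, n_draws=20000, seed=1):
    """Average the conditional probability over standard normal mediator scores."""
    rng = random.Random(seed)
    draws = (conditional_prob(p_marginal, rho, rng.gauss(0.0, 1.0))
             for _ in range(n_draws))
    return sum(draws) / n_draws


# With rho = 0 the mediator score carries no information about the outcome.
print(conditional_prob(0.3, 0.0, 1.7))
# With rho > 0, higher mediator scores raise the conditional probability.
print(conditional_prob(0.3, 0.5, -1.0), conditional_prob(0.3, 0.5, 1.0))
# Averaging over the mediator distribution recovers the marginal probability.
print(monte_carlo_mean(0.3, 0.5))
```

Repeating such a computation over a grid of correlation values is what turns the copula model into a sensitivity analysis.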

3.2. Hybrid model approach

In our second approach, we consider a hybrid causal-observational model for Inline graphic,

(3.3)

where the right-hand side represents some specified parametric model. A key feature of this model is that it distinguishes different roles of the exposure, namely (i) a causal factor directly affecting Inline graphic (denoted as Inline graphic) and (ii) the observed exposure group or cohort Inline graphic representing selection bias. It is possible to further generalize this model by distinguishing the exposure term in Inline graphic, but this will not be pursued here.

For our application, we consider the following additive hybrid model

(3.4)

where Inline graphic and Inline graphic represent causal and cohort (selection bias) effects, respectively. Mediator comparability is obtained when the coefficients of all terms involving Inline graphic in the hybrid model (e.g., Inline graphic in (3.4)) are equal to 0. Note that the hybrid model (3.4) is distinct from previous causal models that may bear a resemblance to it, for example, Vansteelandt and others (2012) and Lange and others (2012). In the causal models in the latter papers, the dual exposure indicators both have a causal interpretation, as needed to define natural direct and indirect effects, whereas, in the hybrid model, they refer to different roles, i.e., causal versus observational, for the exposure variable.

We also consider the following useful reparameterization of the hybrid model (3.4):

(3.5)

where Inline graphic and Inline graphic. As in model (2.4a), Inline graphic in (3.5) represents an association parameter, namely the effect of Inline graphic on Inline graphic, conditional on Inline graphic and Inline graphic, on the scale corresponding to the link function, Inline graphic. The sensitivity parameter, Inline graphic, represents the (nonidentifiable) proportion of the association effect due to the cohort effect, and, consequently, Inline graphic is the proportion of the association effect due to the causal effect of exposure. As the causal and cohort effects, Inline graphic and Inline graphic in (3.4), need not be in the same direction, it is possible for the “proportion”, Inline graphic, to be negative or Inline graphic. Note that this reparameterization is only possible in the case of Inline graphic (i.e., a non-zero observed effect of exposure).

The case of Inline graphic in model (3.5) implies that Inline graphic is (mean) independent of Inline graphic given Inline graphic and Inline graphic, and thus implies mediator comparability (2.3). In contrast, Inline graphic represents a departure from mediator comparability, and thus from sequential ignorability. Note that, as seen in (3.5), Inline graphic does not affect Inline graphic (nor, consequently, Inline graphic) when Inline graphic. This is as expected because then, under consistency, Inline graphic, which is estimable under randomization of Inline graphic (2.1a), that is, without having to assume (2.1b) or (2.3). In Appendix D of supplementary material available at Biostatistics online, we provide a demonstration of the identifiability of the natural direct and indirect effects under the hybrid model with specified Inline graphic. The effect of departures from sequential ignorability on estimates (and percentile bootstrap confidence intervals) of the natural direct and indirect effects can thus be studied by varying Inline graphic.
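The way the sensitivity parameter enters the effect estimates can be sketched in Python as follows. The sketch reflects our reading of the reparameterized model (3.5) for a logistic outcome and a normal mediator: the association coefficient of exposure is split into a causal part, applied to the exposure level that is set, and a cohort part, applied to the exposure group whose mediator distribution is used. Covariates are omitted, the name `kappa` for the sensitivity parameter and all numeric values are hypothetical, and the integral over the mediator uses a simple quantile grid.

```python
import math
from statistics import NormalDist


def expit(t):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical parameter values (illustrative only; covariates omitted).
BETA0, BETA_X, BETA_M = -1.5, 0.5, 0.7   # outcome model, logit scale
ALPHA0, ALPHA_X, SIGMA = 0.8, 0.3, 0.6   # mediator model and residual SD

# Equally weighted standard normal quantiles used as integration nodes.
N_NODES = 2000
Z_GRID = [NormalDist().inv_cdf((k + 0.5) / N_NODES) for k in range(N_NODES)]


def mean_outcome(x_causal, x_cohort, kappa):
    """Mean of Y(x_causal, M(x_cohort)) under the additive hybrid model."""
    # Causal share (1 - kappa) multiplies the exposure level that is set;
    # cohort share kappa multiplies the observed exposure group.
    exposure_term = ((1.0 - kappa) * BETA_X * x_causal
                     + kappa * BETA_X * x_cohort)
    mu_m = ALPHA0 + ALPHA_X * x_cohort
    values = (expit(BETA0 + exposure_term + BETA_M * (mu_m + SIGMA * z))
              for z in Z_GRID)
    return sum(values) / N_NODES


def natural_effects(kappa):
    """Return (direct, indirect) effects for a given sensitivity value."""
    y11 = mean_outcome(1, 1, kappa)
    y01 = mean_outcome(0, 1, kappa)
    y00 = mean_outcome(0, 0, kappa)
    return y11 - y01, y01 - y00


for kappa in (-0.5, -0.25, 0.0, 0.5, 1.0):
    direct, indirect = natural_effects(kappa)
    print(kappa, round(direct, 4), round(indirect, 4))
```

In this sketch the total effect does not depend on `kappa`, because the sensitivity parameter drops out whenever the causal and cohort exposure levels coincide; only the split between direct and indirect effects changes, and the direct effect vanishes when `kappa` equals 1.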

Some comments are in order on the specification of Inline graphic. The direct specification of Inline graphic may be difficult as Inline graphic represents a measure of “collider-stratification bias” (see, e.g., Greenland, 2003) whose magnitude, and even direction, may be non-intuitive to many researchers. Fortunately, this bias, or an approximation, may be derived as a function of the effects of (possibly multiple) unobserved Inline graphicInline graphic confounders on Inline graphic and Inline graphic. Detailed guidelines for the elicitation of plausible values for Inline graphic are provided in Appendix E of supplementary material available at Biostatistics online.

4. Application to dental data

We apply the two proposed sensitivity analysis methods to data from a dental caries study (Nelson and others, 2010). The study examined a cohort of very low birth weight (VLBW, with and without bronchopulmonary dysplasia, BPD) and a matched group of normal birth weight (NBW) children followed from birth through adolescence in an earlier study (Singer and others, 1997). This earlier study recorded parent information, including education, knowledge, and stress factors, from questionnaires given at the child's birth and at ages 3 and 8. The children who continued on to the dental study underwent a dental clinical exam at around age 14. This exam provided, among other information, the number of decayed, missing, and filled teeth (DMFT) and the oral hygiene index score (OHI), an indicator of effective oral hygiene behavior.

Previous research considered the effect of mother education on child's dental outcomes at adolescence (Nelson and others, 2010). Nelson and others (2012) found that adolescents whose mothers had less than a high school education when the child was age 3 (compared with those whose mother had at least a high school education) had a higher mean DMFT. In the present paper, we seek to assess the extent to which this effect is mediated through OHI.

Our specific model variables included a binary exposure variable, “MomEd” (Inline graphic for mother's education at or below high school level, “low”, 0 otherwise, “high”), a binary final outcome, “DMFTD” (Inline graphic if the child had DMFT Inline graphic at the age 14 exam, 0, otherwise), and a roughly continuous mediator, OHI (ranging from 0 to 3, where higher is worse). We assumed that the model variables are causally related as indicated in the graph in Figure 1. In an initial mediation analysis, we assumed sequential ignorability controlling for the (child) variables: sex, race (African American versus other), and birth status (two indicator variables for the three categories: VLBW plus BPD, VLBW alone, and NBW).

Fig. 1.


Mediation model for dental data (confounding variables not shown).

To describe the links in the mediation model, we assumed a special case of the GSEM (2.4a,b) comprising a logistic regression model for the binary outcome variable and a linear regression model for the continuous mediator, as follows:

(4.1a)
(4.1b)

where Inline graphic and Inline graphic is an unknown variance parameter. As an alternative, we considered the same set of models but with (4.1a) extended by an exposure-by-mediator (Inline graphic) interaction term. The component models ((4.1a), without and with interaction, and (4.1b)) may be fit separately using standard likelihood methods. The estimated regression parameters for both the additive and interaction GSEM (the model for Inline graphic is the same in both cases) are given in Table 1.

Table 1.

Parameter estimates (standard errors) from Inline graphic fit of model (4.1) of Inline graphic (without and with Inline graphic × Inline graphic interaction) and Inline graphic outcomes from dental data

Covariates    | DMFTD (additive)            | DMFTD (interaction)    | OHI
Intercept     | Inline graphic (0.45)       | Inline graphic (0.52)  | 0.83 (0.15)
MomEd         | 0.53 (0.33)                 | 0.92 (0.61)            | 0.29 (0.12)
OHI           | 0.66 (0.22)                 | 0.86 (0.35)            |
MomEd × OHI   |                             | Inline graphic (0.44)  |
VLBW vs NBW   | 0.21 (0.42)                 | 0.20 (0.43)            | Inline graphic (0.15)
BPD vs NBW    | Inline graphic0.46 (0.39)   | Inline graphic (0.39)  | 0.19 (0.14)
Race          | 0.57 (0.33)                 | 0.55 (0.33)            | 0.29 (0.12)
Sex           | Inline graphic0.23 (0.33)   | Inline graphic (0.34)  | 0.04 (0.12)

For each GSEM, we then estimated the natural direct effect, Inline graphic, and the natural indirect effect, Inline graphic, using the mediation formula (2.2) plugging in parameter estimates obtained under model (4.1a,b) and using the VLBW subsample as the reference group. We present results from the additive model; results from the interaction model (not shown) are nearly the same, indicating that our mediation effect inferences are not changed substantially by inclusion of a MomEd by OHI (Inline graphic) interaction effect in the DMFTD (Inline graphic) model.

Using the mediation formula under the assumption of sequential ignorability, we obtained an estimated natural direct effect (bootstrap Inline graphic confidence interval) of Inline graphic (Inline graphic, 0.27), indicating that the probability of a DMFT would increase by an estimated 0.12 if MomEd (Inline graphic) were changed from “high” to “low” but OHI (Inline graphic) were held fixed as if the mother had “low” education. The estimated natural indirect effect was Inline graphic (0.01, 0.09), indicating that the probability of a DMFT would increase by an estimated 0.04 if MomEd were changed from “high” to “low” in a way that naturally affected OHI without affecting DMFTD through any other path. We also note that the estimated total effect (95% confidence interval) was Inline graphic (0.003, 0.31) and the indirect effect (mediation) proportion was an estimated 0.26. As indicated by the above confidence intervals, the indirect as well as the total (but not the direct) effect is statistically significant at the nominal 0.05 α-level.
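The percentile bootstrap used for the confidence intervals above can be sketched generically in Python. This is an illustration on simulated data, not a reanalysis of the dental data: for brevity the statistic is the total effect, computed as a difference in outcome proportions between exposure groups, whereas in the analysis above the same resampling would wrap the full mediation-formula computation. The function names, sample size, and outcome probabilities are all hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def total_effect(rows):
    """Difference in mean outcome between rows with x = 1 and rows with x = 0."""
    exposed = [y for x, y in rows if x == 1]
    unexposed = [y for x, y in rows if x == 0]
    return mean(exposed) - mean(unexposed)


def percentile_bootstrap_ci(rows, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample rows with replacement,
    recompute the statistic, and take empirical quantiles of the replicates."""
    rng = random.Random(seed)
    n = len(rows)
    replicates = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    replicates.sort()
    lower = replicates[int(math.floor((1.0 - level) / 2.0 * n_boot))]
    upper = replicates[int(math.ceil((1.0 + level) / 2.0 * n_boot)) - 1]
    return lower, upper


# Simulated (exposure, binary outcome) pairs with hypothetical probabilities.
data_rng = random.Random(42)
rows = [(x, 1 if data_rng.random() < (0.45 if x == 1 else 0.30) else 0)
        for x in [0, 1] * 150]

estimate = total_effect(rows)
ci_lower, ci_upper = percentile_bootstrap_ci(rows, total_effect)
print(estimate, ci_lower, ci_upper)
```

Any function of the resampled rows can be passed as `statistic`, so a direct or indirect effect estimator could be substituted without changing the resampling code.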

The above estimates and confidence intervals rely on the assumption of sequential ignorability (or (2.1a) and mediator comparability (2.3)). In the context of the dental data, mediator comparability implies that if set to a common exposure and mediator level, subgroups of the two observed exposure groups (with high versus low mother education) that are observed at a common observed mediator (oral hygiene index) level (and the same baseline control variable values) would have the same probability of dental caries. This assumption is unlikely for the dental data because there are apt to be unobserved confounders of the relationship between the oral hygiene index and dental caries. It is of interest to examine how the estimates of the natural direct and indirect effects would change under violations of the mediator comparability assumption.

We conducted sensitivity analyses for the dental data using the two new approaches described in Section 3. The results of these analyses, including estimated (natural) direct and indirect effects and 95% confidence intervals, are presented in Figure 2. Scientific considerations may allow us to narrow the range of plausible values for the sensitivity parameters, and thus obtain sharper conclusions about mediation effects. The elicitation of plausible sensitivity parameter values for the present data example is described in Appendix E of supplementary material available at Biostatistics online.

Fig. 2.


Sensitivity analysis for dental data. Maximum likelihood estimates of direct (top) and indirect (bottom) effects versus sensitivity parameters from copula model (left) and hybrid model (right). Solid Inline graphic, dotted Inline graphic confidence interval bounds. For copula method, the upper bounds are for Inline graphic (direct), Inline graphic (indirect); the lower bounds are for Inline graphic (direct), Inline graphic (indirect).

In the context of our data example, the hybrid model sensitivity parameter, Inline graphic, is interpreted as the proportion of the association parameter, Inline graphic, that is due to selection bias, the remaining proportion being due to a true causal effect of MomEd. In the hybrid model (3.5), as in the corresponding association model (4.1a), Inline graphic is interpreted as the log DMFTD odds ratio for low versus high MomEd groups conditional on OHI and the included baseline covariates.

For the dental data, Inline graphic was estimated to be 0.53, providing an estimated odds ratio for low versus high MomEd of Inline graphic. Our elicitation suggested a range of plausible values for Inline graphic of Inline graphic to 0, and a best (pessimistic) guess value of Inline graphic. The value of Inline graphic implies that the selection bias portion of the association effect is Inline graphic, while the portion due to the causal effect of MomEd on DMFTD is Inline graphic. Note that this decomposition can be expressed on the odds ratio scale: Inline graphic or Inline graphic (2.09). For the copula model, our elicitation (Appendix E of supplementary material available at Biostatistics online) provided plausible (pessimistic) values of Inline graphic and Inline graphic, and a plausible range for Inline graphic of [0.1, 0.7] and for Inline graphic of [0.1, 0.75].
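The arithmetic behind this decomposition can be sketched in Python. The association coefficient 0.53 is the estimate reported above; the name `kappa` for the sensitivity parameter and the values it takes in the loop are hypothetical and chosen only for illustration, not the elicited values.

```python
import math


def decompose_log_odds_ratio(beta_x, kappa):
    """Split an association log odds ratio into cohort and causal parts.

    kappa is the proportion of beta_x attributed to the cohort
    (selection bias) effect; 1 - kappa is attributed to the causal effect.
    """
    cohort = kappa * beta_x
    causal = (1.0 - kappa) * beta_x
    return {
        "cohort_log_or": cohort,
        "causal_log_or": causal,
        "cohort_or": math.exp(cohort),
        "causal_or": math.exp(causal),
    }


beta_x = 0.53  # estimated log odds ratio for low versus high MomEd
for kappa in (-0.5, -0.25, 0.0):  # illustrative sensitivity values
    parts = decompose_log_odds_ratio(beta_x, kappa)
    # On the odds ratio scale the two parts multiply to exp(beta_x).
    print(kappa, round(parts["cohort_or"], 3), round(parts["causal_or"], 3),
          round(parts["cohort_or"] * parts["causal_or"], 3))
```

Because the parts add on the log odds scale, they multiply on the odds ratio scale; a negative `kappa` therefore implies a causal odds ratio larger than the observed association odds ratio.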

As indicated in the hybrid model curves in Figure 2, the elicited value of Inline graphic provides an estimate (95% CI) for Inline graphic of 0.16 (Inline graphic, 0.36) and an estimate (95% CI) for Inline graphic of Inline graphic. The plausible range for Inline graphic of [Inline graphic, 0] provides a range for the estimate of Inline graphic of 0.12 to 0.23 and for the estimate of Inline graphic of Inline graphic to 0.04. Figure 2 further shows that the 95% confidence intervals for Inline graphic fail to exclude 0 for values of Inline graphic; also, for Inline graphic, the estimate for Inline graphic is no longer positive. For Inline graphic, neither the sign nor lack of statistical significance changes over the plausible range for Inline graphic.

From the copula model, the pessimistic scenario of Inline graphic, Inline graphic results in an estimate (95% CI) for Inline graphic of 0.2 (0.05, 0.37) and for Inline graphic of Inline graphic. As noted in Appendix E of supplementary material available at Biostatistics online, this scenario may be viewed as more pessimistic than the corresponding values obtained for the hybrid model. For the (lower bound) plausible values of Inline graphic, we obtain an estimate (95% CI) for Inline graphic of 0.13 (Inline graphic, 0.28) and for Inline graphic of 0.028 (Inline graphic, 0.069). From these results, and as indicated in Figure 2, the estimate of Inline graphic is no longer statistically significant (and goes from positive to negative) as Inline graphic and Inline graphic increase over their plausible ranges. Estimates for Inline graphic increase as the Inline graphic's increase and are statistically significant in part of the plausible range (as seen for the “pessimistic” scenario discussed above). These results are similar (apart from some differences in confidence interval coverage) to those obtained from the hybrid model approach.

In regard to other features of the graphs, we note that the degenerate confidence interval at Inline graphic in the hybrid model approach is due to the fact that this value implies no natural direct effect of Inline graphic on Inline graphic. Another difference between the two methods is that the confidence intervals from the copula model are generally more symmetric, reflecting the assumption of an underlying bivariate normal distribution for Inline graphic and Inline graphic.

5. Discussion

We have presented two new approaches to sensitivity analysis for estimation of natural direct and indirect effects via the parametric mediation formula. Both approaches are flexible in allowing different variable types (i.e., discrete or continuous) for the outcome and mediator, each response variable following a generalized linear model. The approaches involve models (the copula and hybrid, respectively) providing particular departures from sequential ignorability, specifically (2.1b) or the mediator comparability assumption (2.3). Although our focus was on natural direct and indirect effects on a difference scale (as in (1.1)), because both sensitivity analysis methods provide estimates of expected potential outcomes, it would be straightforward to extend the method to inference on mediation effects defined on alternative scales, for example, relative risk or odds ratio for a binary Inline graphic. Of course, the relationship between the sensitivity analysis parameters presented here and the mediation effects may be different depending on the scale used for the latter.
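To illustrate the remark about alternative scales, the sketch below computes direct and indirect effects on the difference, relative risk, and odds ratio scales from the same three expected potential outcomes for a binary outcome. It assumes one common decomposition convention (direct effect contrasting exposure levels with the mediator held at its unexposed level, indirect effect contrasting mediator levels with exposure held at 1), which may differ from the paper's definition (1.1). The class and function names and the numerical inputs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PotentialOutcomeMeans:
    """Expected potential outcomes E[Y(z, M(z'))] for a binary outcome Y."""
    y00: float  # E[Y(0, M(0))]
    y10: float  # E[Y(1, M(0))]
    y11: float  # E[Y(1, M(1))]


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def effects(m, scale="difference"):
    """Return (direct, indirect) effects on the requested scale."""
    if scale == "difference":
        return m.y10 - m.y00, m.y11 - m.y10
    if scale == "relative_risk":
        return m.y10 / m.y00, m.y11 / m.y10
    if scale == "odds_ratio":
        return odds(m.y10) / odds(m.y00), odds(m.y11) / odds(m.y10)
    raise ValueError(f"unknown scale: {scale}")


# Hypothetical expected potential outcomes, for illustration only
means = PotentialOutcomeMeans(y00=0.30, y10=0.40, y11=0.45)
for scale in ("difference", "relative_risk", "odds_ratio"):
    direct, indirect = effects(means, scale)
    print(f"{scale}: direct={direct:.3f}, indirect={indirect:.3f}")
```

Under this convention the direct and indirect effects sum to the total effect on the difference scale and multiply to the total effect on the two ratio scales.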

The two new sensitivity analysis methods have been encoded in SAS macros, which are available for downloading from http://epbiwww.case.edu/index.php/people/faculty/53-albert. These macros include alternative model and distributional options than those discussed in this paper. Additional examples, assuming normal and negative binomial distributions for Inline graphic, are given in Appendix G of supplementary material available at Biostatistics online.

A difficulty of previous sensitivity analysis models that involve a hypothetical unobserved confounder is that the resulting model is generally incompatible with the model without the confounder (Lin and others, 1998). This problem is avoided in the proposed approaches as the corresponding sensitivity analysis models do not directly involve an unobserved confounder. Rather, in each of the new approaches the sensitivity analysis and association models are compatible by construction; for example, in the hybrid model the latter is obtained from the former when Inline graphic. However, unobserved confounders can be considered, and play a useful role, in the elicitation of plausible sensitivity parameter values for the proposed methods as described in Appendix E of supplementary material available at Biostatistics online.

Between our two proposed methods, the copula model approach has an advantage of involving bounded (correlation) parameters. In addition, due to its underlying bivariate normal assumption and consequent linear structure, it appears to have more favorable inference properties, in particular, relatively narrow and symmetric confidence intervals across the range of sensitivity parameter values. The hybrid model approach, on the other hand, has the advantage of greater computational simplicity and a single sensitivity parameter (though more complex hybrid models, for example, involving an Inline graphicInline graphic interaction, may be possible).
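As a rough illustration of why the copula sensitivity parameter is bounded, the sketch below draws a pair of latent variables from a standard bivariate normal distribution with correlation `rho` and dichotomizes them at zero to mimic a binary mediator and outcome. This is only a toy version of a Gaussian copula construction, not the model of Section 3: the threshold, the sample size, and all names are assumptions made for the example.

```python
import math
import random


def latent_pair(rho, rng):
    """Draw (u, v) from a standard bivariate normal with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    u = rng.gauss(0.0, 1.0)
    e = rng.gauss(0.0, 1.0)
    # v has unit variance and correlation rho with u
    v = rho * u + math.sqrt(1.0 - rho * rho) * e
    return u, v


def sample_correlation(pairs):
    """Pearson correlation of a list of (u, v) pairs."""
    n = len(pairs)
    mu = sum(u for u, _ in pairs) / n
    mv = sum(v for _, v in pairs) / n
    suv = sum((u - mu) * (v - mv) for u, v in pairs)
    suu = sum((u - mu) ** 2 for u, _ in pairs)
    svv = sum((v - mv) ** 2 for _, v in pairs)
    return suv / math.sqrt(suu * svv)


rng = random.Random(2014)  # fixed seed so the sketch is reproducible
pairs = [latent_pair(0.5, rng) for _ in range(20000)]

# Dichotomize the latent variables at 0 to mimic a binary mediator and outcome
binary = [(int(u > 0), int(v > 0)) for u, v in pairs]
agree = sum(m == y for m, y in binary) / len(binary)
print(f"latent correlation = {sample_correlation(pairs):.2f}, "
      f"binary agreement = {agree:.2f}")
```

Because `rho` is a correlation, any sensitivity analysis grid is confined to the interval (-1, 1), and `rho = 0` corresponds to no residual dependence between the latent variables.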

One way to reduce the sensitivity parameter dimensionality, and allow a simpler graphical presentation, for the copula model is to assume Inline graphic (analogously, Inline graphic for the estimation of Inline graphic and Inline graphic). A more flexible approach would be to assume a functional relationship between the two correlation parameters, for example, Inline graphic, where Inline graphic is a specified constant.
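The dimension reduction described above can be sketched as a one-dimensional path through the two-parameter space. The example assumes a simple proportional relationship, with the second correlation equal to a constant `c` times the first. The functional form used in the paper is not reproduced here, so `c`, the names `rho_a` and `rho_b`, and the grid are hypothetical.

```python
def correlation_path(rho_values, c=1.0):
    """Return (rho_a, rho_b) pairs along the path rho_b = c * rho_a.

    With c = 1.0 the two correlation parameters are constrained to be equal.
    """
    pairs = []
    for rho_a in rho_values:
        rho_b = c * rho_a
        # Both parameters are correlations and must stay inside (-1, 1)
        if not (-1.0 < rho_a < 1.0 and -1.0 < rho_b < 1.0):
            raise ValueError(f"correlation out of range: ({rho_a}, {rho_b})")
        pairs.append((rho_a, rho_b))
    return pairs


# Illustrative grid over 0.1, 0.2, ..., 0.7 for the first correlation
grid = [round(0.1 * k, 2) for k in range(1, 8)]
for rho_a, rho_b in correlation_path(grid, c=1.0):
    print(f"rho_a={rho_a:.2f}  rho_b={rho_b:.2f}")
```

Each pair on the path would then be passed to the copula model fit, so that estimates and confidence bounds can be plotted against a single axis.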

As elaborated in Appendix E of supplementary material available at Biostatistics online, carrying out both methods has the advantage that each can be used to calibrate and check the other. Although more work is needed in systematic approaches to combining sensitivity analyses, the present paper suggests the great potential for multiple models to provide a more refined and complete sensitivity analysis.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Funding

Support for this research was provided in part by the National Institute of Dental and Craniofacial Research, National Institutes of Health (R01DE022674 to J.A.).

Acknowledgement

The authors thank the Associate Editor and referees for insightful and constructive comments that helped greatly in improving the paper. The authors also thank Cuiyu Geng for assistance in the construction of graphs and preparation of the paper, Dr Suchitra Nelson for helpful discussion and for providing data from her study of dental outcomes in VLBW and NBW children [NIDCR/NIH research grant number R21-DE16469], and Dr Lynn Singer for providing access to data from her cohort study of VLBW and NBW adolescents, supported by the Maternal and Child Health Program, Health Resources and Services Administration, Department of Health and Human Services (grant numbers MC-390592, MC-00127, and MC-00334). Conflict of Interest: None declared.

References

  1. Albert J. M. (2008). Mediation analysis via potential outcomes models. Statistics in Medicine 27, 1282–1304.
  2. Albert J. M., Nelson S. (2011). Generalized causal mediation analysis. Biometrics 67, 1028–1038.
  3. Baron R. M., Kenny D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology 51, 1173–1181.
  4. Cornfield J., Haenszel W., Hammond E. C., Lilienfeld A. M., Shimkin M. B., Wynder E. L. (1959). Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22, 173–203.
  5. Greenland S. (2003). Quantifying biases in causal models: classical confounding vs. collider-stratification bias. Epidemiology 14, 300–306.
  6. Hafeman D. M. (2011). Confounding of indirect effects: a sensitivity analysis exploring the range of bias due to a cause common to both the mediator and the outcome. American Journal of Epidemiology 174, 710–717.
  7. Imai K., Keele L., Tingley D. (2010a). A general approach to causal mediation analysis. Psychological Methods 15, 309–334.
  8. Imai K., Keele L., Yamamoto T. (2010b). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science 25, 51–71.
  9. Imai K., Yamamoto T. (2013). Identification and sensitivity analysis for multiple causal mechanisms: revisiting evidence from framing experiments. Political Analysis 21, 141–171.
  10. Lange T., Rasmussen M., Thygesen L. C. (2014). Assessing natural direct and indirect effects through multiple pathways. American Journal of Epidemiology 179, 513–518.
  11. Lange T., Vansteelandt S., Bekaert M. (2012). A simple unified approach for estimating natural direct and indirect effects. American Journal of Epidemiology 176, 190–195.
  12. Lin D. Y., Psaty B. M., Kronmal R. A. (1998). Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54, 948–963.
  13. Nelson S., Albert J. M., Lombardi G., Wishnek S., Asaad G., Kirchner H. L., Singer L. T. (2010). Dental caries and enamel defects in very low birth weight adolescents. Caries Research 44, 509–518.
  14. Nelson S., Lee W., Albert J. M., Singer L. T. (2012). Early maternal psychosocial factors are predictors for adolescent caries. Journal of Dental Research 44, 509–518.
  15. Pearl J. (2012). The causal mediation formula: a guide to the assessment of pathways and mechanisms. Prevention Science 13, 426–436.
  16. Robins J. M., Greenland S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology 3, 143–155.
  17. Rosenbaum P. R. (2002). Observational Studies, 2nd edition. New York: Springer-Verlag.
  18. Singer L. T., Yamashita T. S., Lilien L., Collin M., Baley J. (1997). A longitudinal study of infants with bronchopulmonary dysplasia and very low birthweight. Pediatrics 100, 987–993.
  19. Tchetgen Tchetgen E. J., Shpitser I. (2012). Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis. Annals of Statistics 40, 1816–1845.
  20. VanderWeele T. J. (2010). Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology 21, 540–551.
  21. VanderWeele T. J., Chiba Y. (2014). Sensitivity analysis for direct and indirect effects in the presence of exposure-induced mediator-outcome confounders. Epidemiology, Biostatistics, and Public Health e9027-1–e9027-15.
  22. Vansteelandt S., Bekaert M., Lange T. (2012). Imputation strategies for the estimation of natural direct and indirect effects. Epidemiologic Methods 1(1), Article 7.
  23. Wang W., Albert J. M. (2012). Estimation of mediation effects for zero-inflated regression models. Statistics in Medicine 31, 3118–3132.
  24. Wang W., Nelson S., Albert J. M. (2013). Estimation of causal mediation effects for a dichotomous outcome in multiple-mediator models using the mediation formula. Statistics in Medicine 32, 4211–4228.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
