Abstract
Mediation analysis is critical to understanding the mechanisms underlying exposure-outcome relationships. In this paper, we identify the instrumental variable-direct effect of the exposure on the outcome not through the mediator, using randomization of the instrument. We call this estimand the complier stochastic direct effect (CSDE). To our knowledge, such an estimand has not previously been considered or estimated. We propose and evaluate several estimators for the CSDE: a ratio of inverse-probability of treatment-weighted estimators (IPTW), a ratio of estimating equation estimators (EE), a ratio of targeted minimum loss-based estimators (TMLE), and a TMLE that targets the CSDE directly. These estimators are applicable for a variety of study designs, including randomized encouragement trials, like the Moving to Opportunity housing voucher experiment we consider as an illustrative example, treatment discontinuities, and Mendelian randomization. We found the IPTW estimator to be the most sensitive to finite sample bias, resulting in bias of over 40% even when all models were correctly specified in a sample size of N=100. In contrast, the EE estimator and TMLE that targets the CSDE directly were far less sensitive. The EE and TML estimators also have advantages in terms of efficiency and reduced reliance on correct parametric model specification.
Keywords: Mediation, targeted minimum loss-based estimation, instrumental variables
1. Introduction
Mediation analysis is critical to understanding the mechanisms underlying exposure-outcome relationships. It can be used to decompose the total effect into its path-specific effects, usually categorized as direct effects, meaning the effect of the exposure on the outcome not operating through a mediator, and indirect effects, meaning the path from the exposure to the mediator to the outcome (Ogburn, 2012). For example, such decomposition has led to understanding of which locations in the brain are responsible for transmitting pain (Chen et al., 2015) and of mechanisms underlying associations between early life body size and breast cancer (Rice et al., 2016). Such scenarios reflect observed data O = (W, Z, M, Y), where W are covariates, Z is the exposure, M is the mediator, and Y is the outcome.
Less research has been devoted to estimating path-specific effects where there is an instrument for the exposure, reflecting observed data O = (W, A, Z, M, Y), where A is an instrument for the overall effect of Z by affecting M and Y only through Z, satisfying the econometric criteria for an instrument (Joffe et al., 2008). In this paper, we consider such a data structure and are concerned with estimating the path-specific direct effect of Z on Y not through M, using instrument A to address observed and unobserved confounding of the exposure-outcome relationship. Such an estimand would be an instrumental variable (IV)-direct effect or complier direct effect. To our knowledge, such an estimand has not previously been considered or estimated, though Frölich and Huber (2017) considered a similar complier direct effect with separate instruments for Z and M, A1 and A2, that are themselves conditionally independent. These authors demonstrated how one can identify IV mediation estimands for the direct effect of Z on Y not through M and for the indirect effect of Z on Y through M using these two distinct instruments. Relatedly, Joffe et al. (2008) considered the same observed data, but where Z and M are sequential exposures, with A being an instrument for each. In this case, A affects Z and M but not Y. The authors were concerned with estimating the overall effect of Z, because it could no longer be identified using standard IV approaches.
Recent work considering the same observed data structure has identified and estimated stochastic direct and indirect effects of the instrument on the outcome, not operating through a mediator in the direct effects case and operating through a mediator in the indirect effects case (Rudolph et al., 2017b), treating Z as a time-varying confounder (Didelez et al., 2006; VanderWeele and Tchetgen Tchetgen, 2017; Zheng and van der Laan, 2017). Other work has considered another instrumental variable data structure in which Z and W interact to form an instrument for M, relaxing the sequential ignorability assumption (Ten Have et al., 2007; Dunn and Bentall, 2007; Albert, 2008; Small, 2011). However, to our knowledge, there has been no research on the identification or estimation of IV mediation estimands in the scenario where we have observed data O = (W, A, Z, M, Y), where A is an instrument for the total effect of exposure Z, which in turn may affect mediator M and outcome Y, and where A adheres to the exclusion restriction assumption of instruments and so does not directly affect either M or Y.
We address this research gap by identifying an IV causal quantity of the direct effect of the exposure on the outcome (the effect of Z on Y not through M), using randomization of the instrument, A. We call this estimand the complier stochastic direct effect (CSDE). We propose and evaluate several estimators for this estimand: 1) an inverse-probability-of-treatment weighted estimator (IPTW), 2) an estimating equation estimator (EE), and 3) several targeted minimum loss-based estimators (TMLE). Both the EE and TML estimators are robust to several combinations of model misspecifications, the details of which are described in a later section. In contrast, the IPTW estimator may not be consistent if the instrument or mediation models are incorrectly specified.
The paper is organized as follows. In Section 2, we introduce notation and the structural causal model representing our data structure. In Section 3, we define the causal quantity of interest, the CSDE, and establish its identification from the data distribution under specified assumptions. Section 4 details the IPTW, EE, and TML estimators. Section 5 presents the simulation study that demonstrates each estimator’s consistency, efficiency, and robustness properties in finite samples. In Section 6, we apply these estimators to real-world data, estimating the direct effect of using a Section 8 housing voucher to move out of public housing on subsequent adolescent substance use outcomes, not mediated by parental mental health, employment, and parent-child closeness, using randomization of housing voucher receipt as the instrument. Section 7 concludes.
2. Notation and Structural Causal Model
We observe data O = (W, A, Z, M, Y) for each of n individuals, where we assume O1,…,On are i.i.d. draws from the true, unknown data distribution, P0, of O. The subscript 0 denotes values under this true, unknown distribution P0. P is any probability distribution (including P0) in the statistical model, $\mathcal{M}$, which is the set of distributions for which our estimand is identifiable and is discussed further below and in Section 3. Values under a particular P are not given subscripts. The subscript n denotes estimates.
W is a vector of exogenous baseline covariates, $W = f_W(U_W)$, where $U_W$ is unobserved exogenous error on W (Pearl, 2009). In our statistical model, A is a binary instrument (with the attendant assumptions of instrumental variables (Angrist et al., 1996)) of binary exposure Z, with $A = f_A(W, U_A)$ and $Z = f_Z(W, A, U_Z)$, where again, $U_A$ and $U_Z$ are unobserved exogenous errors. M is a binary mediator, with $M = f_M(W, Z, U_M)$, and Y is an outcome, $Y = f_Y(W, Z, M, U_Y)$, where $U_M$ and $U_Y$ represent exogenous errors. Adhering to the constraints of our statistical model in which A is an instrument, Y does not depend on A conditional on Z, and M does not depend on A conditional on Z. This is equivalent to the exclusion restriction assumption of instruments (Angrist et al., 1996). However, the estimand and estimation approaches we consider also work in the scenario where M may depend on A conditional on Z: $M = f_M(W, A, Z, U_M)$. We describe differences in the estimator details for such a scenario in the Web appendix. In this alternative scenario, A is not an instrument for the total effect of Z on Y, and the estimation approach suggested by Joffe et al. (2008) would also be appropriate.
The density of the true distribution P0 of O can be factorized as
$$p_0(O) = p_0(Y \mid M, Z, W)\, p_0(M \mid Z, W)\, p_0(Z \mid A, W)\, p_0(A \mid W)\, p_0(W).$$
We note that the identification assumption of monotonicity of A on Z, detailed in the next section, places an additional constraint on the statistical model.
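The structure of this model can be made concrete with a small simulation. The following is a minimal sketch, not the data-generating mechanism used in Section 5: every coefficient and the function names `draw_unit` and `simulate` are illustrative assumptions. It encodes the exclusion restriction (M and Y never read A directly) and builds monotonicity in by drawing both potential exposures from a shared error term.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def draw_unit(rng):
    """Draw one unit from a hypothetical SCM with the Section 2 structure.

    All numeric coefficients are illustrative assumptions.
    """
    w = int(rng.random() < 0.5)        # W = f_W(U_W)
    a = int(rng.random() < 0.5)        # A = f_A(W, U_A); randomized here, so W is ignored
    u_z = rng.random()                 # shared error U_Z for both potential exposures
    z0 = int(u_z < 0.1 + 0.1 * w)      # Z_0: exposure if A = 0 were assigned
    z1 = int(u_z < 0.6 + 0.2 * w)      # Z_1: exposure if A = 1 were assigned, so Z_1 >= Z_0
    z = z1 if a == 1 else z0           # Z = f_Z(W, A, U_Z)
    # Exclusion restriction: M and Y depend on A only through Z.
    m = int(rng.random() < expit(-0.5 + z + 0.5 * w))            # M = f_M(W, Z, U_M)
    y = int(rng.random() < expit(-1.0 + 0.8 * z + m + 0.3 * w))  # Y = f_Y(W, Z, M, U_Y)
    return {"W": w, "A": a, "Z": z, "M": m, "Y": y, "Z0": z0, "Z1": z1}


def simulate(n, seed=0):
    """Simulate n i.i.d. units; the potential exposures are kept only for checking."""
    rng = random.Random(seed)
    return [draw_unit(rng) for _ in range(n)]


data = simulate(2000, seed=1)
complier_share = sum(d["Z1"] - d["Z0"] for d in data) / len(data)
```

In real data only one of Z0 and Z1 is observed for each unit, so compliers cannot be identified individually; the simulation keeps both solely to verify that the monotonicity constraint holds by construction.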
3. Complier Stochastic Direct Effect Estimand and Identification
Our causal quantity of interest is the CSDE, which we define as
(1) |
where, for each $a \in \{0, 1\}$, $Z_a$ indicates the potential exposure that would be observed if instrument A = a were assigned, and compliers are those for whom $Z_1 - Z_0 = 1$. $Y_{z,g}$ indicates the potential outcome that would be observed if exposure Z = z were assigned and under a given stochastic intervention g on the mediator, where the user sets M equal to m with probability g(m | W). We note that this stochastic intervention marginalizes over Z and also note that g can be set equal to the true distribution or to a data-dependent version estimated from the observed data.
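The statistical parameter, $\Psi$, is a mapping $\Psi: \mathcal{M} \rightarrow \mathbb{R}$ from a probability distribution P in our statistical model to a real number.
It is the ratio $\Psi(P) = \Psi_{SDE}(P) / \Psi_{FS}(P)$, where $\Psi_{SDE}(P)$ is the statistical parameter for the stochastic direct effect (SDE) of A on Y given by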
(2) |
and where $\Psi_{FS}(P)$ is the statistical parameter for the first-stage (FS) effect of A on Z given by
$$\Psi_{FS}(P) = E_W\{E(Z \mid A = 1, W) - E(Z \mid A = 0, W)\} \qquad (3)$$
The causal quantity is identified by the statistical parameter ,
(4) |
under the assumptions enumerated below. The proof for the identification result is in the Web appendix.
The assumptions needed for identifiability are:
Exclusion restriction: the instrument A affects the outcome Y only through the exposure Z. That is, the potential outcome that would be observed if A = a and Z = Za were assigned, under the stochastic intervention in which M is set to m with a given probability, equals the potential outcome that would be observed if Z = Za were assigned under the same stochastic intervention, where Za is the potential exposure that would be observed if instrument A = a were assigned.
Sequential randomization:
Z1 − Z0 ≥ 0, which is the monotonicity assumption, meaning that the instrument A cannot decrease exposure,
Positivity assumptions: for all and a.e. which can also be written, for all m in the support of and
E(Z1 − Z0) ≠ 0, which means that the average effect of the instrument on the exposure does not equal 0.
4. Estimators
We now describe several estimators of a data-dependent version of the CSDE parameter that assumes a known stochastic intervention on M estimated from the observed data. In the first subsection, we describe several estimators that estimate this parameter by estimating the numerator and denominator separately: a ratio of IPTW estimators, a ratio of EE estimators, and two ratios of TMLEs. In the second subsection, we describe a TMLE that targets the CSDE ratio itself, thus making it a compatible plug-in estimator.
4.1. Estimators that estimate the numerator and denominator separately
4.1.1. Inverse Probability of Treatment Weighted Estimator
We first describe how to estimate the CSDE by using an IPTW estimator of the numerator (the SDE of A on Y) and of the denominator (the first-stage effect of A on Z) separately. The R code to program this estimator is included in the supplementary Web appendix.
The inverse probability of treatment weights for estimating are
(5) |
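The weights depend on the instrument mechanism (the conditional probability of A given W) and the mediator mechanism (the conditional probability of M given Z and W), each of which is replaced by an estimator. The instrument mechanism can be estimated by predicted probabilities from a logistic regression model of A on W. One could use machine learning in model fitting, but we describe estimation in terms of parametric model fitting for simplicity. The mediator mechanism can be estimated by predicted probabilities from a logistic regression model of M on W and Z. The stochastic intervention on M is treated as known, estimated from the observed data by marginalizing out Z (VanderWeele and Tchetgen Tchetgen, 2017). The IPTW estimate of the SDE numerator is the empirical mean of the outcome, Y, weighted by an estimate of the weights in Equation 5.
The inverse probability of treatment weights for estimating the first-stage denominator involve the instrument mechanism, which is estimated as above. The IPTW estimate of the denominator is the empirical mean of Z, weighted by an estimate of these weights.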
The ratio of these two IPTW estimates gives the IPTW estimate of the CSDE parameter. The associated variance can be estimated as the sample variance of the estimator’s influence curve (IC), which is
(6) |
and where
(7) |
and where
(8) |
We note that the above is the influence curve using the true instrument and mediator mechanisms. If we use parametric models and maximum likelihood estimates of these mechanisms, then the sample variance of the above influence curve will be conservative.
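The ratio-of-weighted-means construction can be sketched as follows. This is an illustration under stated assumptions rather than the Web appendix code: the weight forms (the signed inverse instrument probability, multiplied in the numerator by the ratio of the stochastic intervention to the mediator mechanism) and the delta-method influence curve for a ratio are assumed here, since Equations 5 to 8 are not reproduced in this text. The function name `iptw_csde` and its callable arguments `g_a`, `g_m`, and `g_star` are hypothetical.

```python
from statistics import fmean, variance


def iptw_csde(rows, g_a, g_m, g_star):
    """Ratio of IPTW estimates with an influence-curve-based standard error.

    rows   : list of dicts with binary W, A, Z, M, Y
    g_a    : g_a(a, w)    -> P(A = a | W = w)           (instrument mechanism)
    g_m    : g_m(m, z, w) -> P(M = m | Z = z, W = w)    (mediator mechanism)
    g_star : g_star(m, w) -> stochastic intervention probability for M = m
    The weight forms below are assumptions made for this sketch.
    """
    num_terms, den_terms = [], []
    for r in rows:
        sign = 1.0 if r["A"] == 1 else -1.0
        h = sign / g_a(r["A"], r["W"])                        # signed instrument weight
        m_ratio = g_star(r["M"], r["W"]) / g_m(r["M"], r["Z"], r["W"])
        num_terms.append(r["Y"] * h * m_ratio)                # numerator contribution
        den_terms.append(r["Z"] * h)                          # first-stage contribution
    num, den = fmean(num_terms), fmean(den_terms)
    psi = num / den
    # Delta-method influence curve of a ratio: (IC_num - psi * IC_den) / den.
    ic = [((nt - num) - psi * (dt - den)) / den
          for nt, dt in zip(num_terms, den_terms)]
    se = (variance(ic) / len(rows)) ** 0.5
    return psi, se


# Tiny worked example with a randomized instrument and no mediator reweighting.
rows = [
    {"W": 0, "A": 1, "Z": 1, "M": 0, "Y": 1},
    {"W": 0, "A": 1, "Z": 1, "M": 0, "Y": 0},
    {"W": 0, "A": 0, "Z": 0, "M": 0, "Y": 0},
    {"W": 0, "A": 0, "Z": 0, "M": 0, "Y": 0},
]
psi_n, se_n = iptw_csde(
    rows,
    g_a=lambda a, w: 0.5,
    g_m=lambda m, z, w: 0.5,
    g_star=lambda m, w: 0.5,
)
```

Because the denominator term enters the influence curve divided by the first-stage estimate, a first stage close to zero inflates both the point estimate and its estimated variance.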
4.1.2. Estimating Equation Estimator
We now describe how to estimate the CSDE by using an EE estimator of the numerator and of the denominator separately. The efficient influence curve we detail for the SDE numerator is novel in that it respects the constraints on our statistical model, namely the exclusion restriction and monotonicity assumptions necessary for identification. This EE estimator uses the same estimator of the conditional distribution of Z given A and W for both the numerator and denominator. The R code to program this estimator is included in the supplementary Web appendix.
The efficient influence curve (EIC) for , is given by
(9) |
where P represents and where
(10) |
and where
(11) |
The EE estimator can be calculated in the following steps:
We first solve DSDE to obtain the EE estimate of . We calculate the first component of DSDE as follows, noting that this first component is specifically formulated to respect the exclusion restriction. Let are defined and their estimation is described in Section 4.1.1. Recall that is treated as known, estimated from the observed data as described in Section 4.1.1. can be estimated by predicted values of Y from a regression of Y on W Z and M. One could use machine learning in model fitting but we will describe estimation in terms of parametric model fitting for simplicity. can be written where can be estimated as described above, can be estimated from a constrained logistic regression model of Z on A and W to respect the monotonicity assumption, and where an estimate of is obtained by marginalizing out
We now calculate the second component of DSDE. To estimate we integrate out M using the data-dependent stochastic intervention on M evaluated at
Finally, we calculate the third component of DSDE. To estimate we integrate out Z from
The estimate of is given by solving DSDE.
The estimate of is given by solving DFS, where each component is calculated as described above.
The ratio of these two estimates gives the EE estimate of .
The associated variance can be estimated as the sample variance of the EIC, , which is given in Equation 9.
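The two integration steps used for the second and third components can be illustrated for binary Z and M. The sketch below covers only the plug-in part (integrating out M under the stochastic intervention, then integrating out Z) and does not solve the estimating equation itself. It assumes the SDE parameter takes the g-computation form implied by those steps; the function names and the callables `q_y`, `g_star`, and `p_z` are hypothetical stand-ins for fitted models.

```python
def integrate_out_m(q_y, g_star, w, z):
    """Average the outcome regression over M drawn from the stochastic intervention."""
    return sum(q_y(w, z, m) * g_star(m, w) for m in (0, 1))


def integrate_out_z(q_y, g_star, p_z, w, a):
    """Average the M-integrated regression over Z given A = a and W = w."""
    return sum(integrate_out_m(q_y, g_star, w, z) * p_z(z, a, w) for z in (0, 1))


def plug_in_csde(ws, q_y, g_star, p_z):
    """Plug-in numerator, denominator, and ratio, averaging over observed W values.

    q_y(w, z, m)  : E(Y | W = w, Z = z, M = m)
    g_star(m, w)  : stochastic intervention probability for M = m
    p_z(z, a, w)  : P(Z = z | A = a, W = w)
    """
    n = len(ws)
    sde = sum(integrate_out_z(q_y, g_star, p_z, w, 1)
              - integrate_out_z(q_y, g_star, p_z, w, 0) for w in ws) / n
    fs = sum(p_z(1, 1, w) - p_z(1, 0, w) for w in ws) / n
    return sde / fs, sde, fs


# Illustrative inputs: the outcome regression is linear with a Z coefficient of 0.3.
def q_y(w, z, m):
    return 0.2 + 0.3 * z + 0.1 * m


def g_star(m, w):
    return 0.5


def p_z(z, a, w):
    p1 = 0.7 if a == 1 else 0.2      # P(Z = 1 | A = a, W = w)
    return p1 if z == 1 else 1.0 - p1


csde, sde, fs = plug_in_csde([0, 1, 0, 1], q_y, g_star, p_z)
```

In this toy setting the numerator is 0.3 × (0.7 − 0.2) = 0.15 and the first stage is 0.5, so the ratio recovers the Z coefficient of 0.3.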
4.1.3. Targeted Minimum Loss-Based Estimator
We now describe two TMLE approaches to estimate using separate estimates for the numerator, , and denominator, .
Inefficient TMLE.
The first approach uses a previously developed TMLE for estimating the SDE numerator (Rudolph et al., 2017b) and uses the TMLE for an average treatment effect (van der Laan and Rubin, 2006) for estimating the first-stage denominator. However, the previously developed TMLE for the numerator respects neither the exclusion restriction nor the monotonicity constraint on our statistical model, so we refer to this as an inefficient TMLE. The inefficient TMLE estimate of the CSDE is given by the ratio of the numerator estimate over the denominator estimate. The variance of this estimator can be calculated as the sample variance of the EIC, given in Equation 9.
Efficient TMLE.
The second approach proposes a novel TMLE for the SDE numerator that respects the exclusion restriction and monotonicity statistical constraints. Additionally, as in the EE estimation approach detailed in Section 4.1.2, we use the same estimate of the conditional distribution of Z given A and W for both the numerator and denominator, and employ constrained regression in estimating this conditional distribution to enforce the monotonicity statistical constraint. Thus, we refer to this as an efficient TMLE and describe, step-by-step, how to compute this particular TMLE. The R code to program this efficient TMLE is included in the supplementary Web appendix.
Consider submodel This CY differs from the CY in the previously developed inefficient TMLE for in that the targeting step does not introduce dependence on A (Rudolph et al., 2017b); instead, the exclusion restriction constraint is preserved. The components of this submodel can be estimated as described in Step 1 of Section 4.1.2.
Let ϵn be the MLE fitted coefficient on CY in the logistic regression model of Y on CY, with the logit of the initial outcome regression estimate as an offset, using the binary log-likelihood loss function. Alternatively, a non-negative portion of CY may be moved into the weights and a weighted logistic regression model may be fitted. Y can be bounded to the [0,1] scale as previously recommended (Gruber and van der Laan, 2010).
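The updated estimator is obtained by adding ϵn × CY to the logit of the initial estimate and transforming back to the probability scale, noting again that conditional independence with A is preserved.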
We next integrate out M using the data-dependent stochastic intervention on
The next step is to target the estimate of the conditional distribution of Z given A and W described above. We denote the targeted version that is used in the numerator separately, to distinguish it from the targeted version that is used in the denominator. Consider the submodel defined as:
where Let be the MLE fitted coefficients on in the weighted logistic regression model of Z with as an offset, using the binary log-likelihood loss function, and weights CZ. The updated estimator is given by
We can now estimate by integrating out Z from
The estimate of is given by where is the empirical distribution of W. It is the empirical mean of the difference in setting a = 1 versus a = 0.
We now turn our attention to targeting in the denominator. We denote the targeted used in the denominator with to distinguish it from the targeted version used in the numerator, Consider submodel defined as: where be the MLE fitted coefficients on in the weighted logistic regression model of Z with as an offset, using the binary log-likelihood loss function, and weights CZ.
The updated estimator is given by
The estimate of is given by It is the empirical mean of the difference in setting a = 1 versus a = 0.
The ratio of these two estimates of over gives the efficient TMLE estimate of . The TMLE solves the empirical means of the efficient influence curves (EIC) for (Equations 10–11 in Section 4.1.2), replacing P with where represents
The variance of the TMLE of is estimated as the sample variance of
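The targeting steps above share one computational pattern: a one-parameter logistic regression with an offset, followed by an update on the logit scale. The sketch below illustrates that pattern with a Newton-Raphson fit. It is a generic illustration, not the Web appendix implementation; the clever covariate values are placeholders, and the names `fit_epsilon` and `update` are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_epsilon(y, q_init, clever, weights=None, tol=1e-10, max_iter=100):
    """MLE of eps in logit E(Y) = logit(q_init) + eps * clever (offset model).

    Initial estimates must lie strictly in (0, 1). Optional non-negative
    weights give the weighted variant of the regression.
    """
    if weights is None:
        weights = [1.0] * len(y)
    offsets = [logit(q) for q in q_init]
    eps = 0.0
    for _ in range(max_iter):
        preds = [expit(o + eps * c) for o, c in zip(offsets, clever)]
        score = sum(w * c * (yi - p)
                    for w, c, yi, p in zip(weights, clever, y, preds))
        info = sum(w * c * c * p * (1.0 - p)
                   for w, c, p in zip(weights, clever, preds))
        if info == 0.0:
            break
        step = score / info            # Newton-Raphson step
        eps += step
        if abs(step) < tol:
            break
    return eps


def update(q_init, clever, eps):
    """Targeted estimate: shift the initial estimate on the logit scale."""
    return [expit(logit(q) + eps * c) for q, c in zip(q_init, clever)]


# Placeholder data: a flat initial estimate and two clever covariate values.
y = [1, 0, 1, 1, 0, 1]
q_init = [0.5] * 6
clever = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
eps_n = fit_epsilon(y, q_init, clever)
q_star = update(q_init, clever, eps_n)
residual_score = sum(c * (yi - q) for c, yi, q in zip(clever, y, q_star))
```

After the update, the empirical mean of the clever covariate times the residual is numerically zero, which is the sense in which a TMLE solves the corresponding component of the efficient influence curve equation.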
4.2. TMLE that estimates the CSDE ratio directly
We now describe a TMLE that targets the CSDE ratio itself. This TMLE is both efficient, because it respects the model constraints, and compatible, because it simultaneously targets the numerator and denominator. We henceforth refer to this as the compatible TMLE and describe, step-by-step, how to compute it. Several of the steps that follow are identical to those for the Efficient TMLE described in Section 4.1.3 above (e.g., Steps 1–4). The difference lies in the targeting of the conditional distribution of Z given A and W. The R code to program this compatible TMLE is included in the supplementary Web appendix.
Again, consider submodel Recall that this CY preserves the exclusion restriction constraint on our statistical model such that Y is conditionally independent of A given M, Z, W The components of this submodel can be estimated as described in Step 1 of Section 4.1.2.
Let ϵn be the MLE fitted coefficient on CY in the logistic regression model of Y on CY, with the logit of the initial outcome regression estimate as an offset, using the binary log-likelihood loss function. Alternatively, a non-negative portion of CY may be moved into the weights and a weighted logistic regression model may be fitted. Y can be bounded to the [0,1] scale as previously recommended (Gruber and van der Laan, 2010).
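The updated estimator is obtained by adding ϵn × CY to the logit of the initial estimate and transforming back to the probability scale, noting again that conditional independence with A is preserved.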
We next integrate out M using the data-dependent stochastic intervention on
The next step is to target given above, for both the numerator and denominator and in such a way that preserves monotonicity of A on Z Consider submodel defined as: where Let the fitted coefficients on in the weighted logistic regression model of Z with as an offset, using the binary log-likelihood loss function, and weights CZ.
The updated estimator is given by
We can now estimate by integrating
The estimate of is the empirical distribution of W. It is the empirical mean of the difference in setting a = 1 versus a = 0.
The estimate of is given by It is the empirical mean of the difference in setting a = 1 versus a = 0.
The ratio of the estimates of gives the TMLE estimate of The TMLE solves the empirical mean of the EIC for (Equation 9), replacing P with
The variance of the TMLE of is estimated as the sample variance of
5. Simulation
5.1. Overview
We conduct a simulation study to examine finite sample performance of the IPTW, EE, and TML estimators of the CSDE under the two data-generating mechanisms (DGMs) shown in Table 1. In the Moving to Opportunity data used for the empirical illustration, the observed data additionally include Δ, an indicator of selection into the sample (in the Moving to Opportunity data, one child from each family is selected to participate). This results in a factorized likelihood:
(12) |
Table 1.
Moderate-Strong Instrument Simulation | |
---|---|
P(W1 = 0.50) | |
P(W2 = 0.50) | |
P(Δ = 0.58) | |
P(A = 0.50) | |
P(Z = 0.58) | |
P(M = 0.52) | |
P(Y = 0.76) | |
Weak Instrument Simulation | |
P(Z = 0.31) |
Under the assumptions enumerated in Section 3, our causal quantity of interest, the CSDE, is identified by the statistical parameter, where is identified
(13) |
and where is identified
(14) |
This slight modification to the SCM results in correspondingly slight modifications to the estimators. For the IPTW estimators, the weights are multiplied by the inverse probability of sampling weights. The EE estimators solve the numerator and denominator of an EIC that is nearly identical to that given in Equations 10–11, only now multiplied by the inverse probability of sampling weights, giving the modified EICs
(15) |
and where
(16) |
The TMLEs are now inverse-weighted TMLEs where the clever covariates are multiplied by the same inverse probability of sampling weights.
Table 1 uses the same notation as in Section 2, excepting the addition of Δ. The first DGM represents the primary simulation and a moderate-strong instrument scenario. The second DGM represents a weak instrument scenario that may be more likely to result in CSDE estimates that lie outside the bounds of the parameter space when estimated by non-substitution-based estimators.
We compare performance of our three estimators. We show estimator performance in terms of absolute bias, percent bias, closeness to the efficiency bound (mean estimator standard error (SE) × the square root of the number of observations), 95% confidence interval (CI) coverage, and mean squared error (MSE) across 1,000 simulations for sample sizes of N=5,000, N=500, and N=100. In addition, we consider 1) correct specification of all models, 2) misspecification of the Y model that included a term for Z only, 3) misspecification of the M model that included a term for W only, 4) misspecification of the M and Y models, 5) misspecification of the Z model that included a term for A only, and 6) misspecification of the Z and Y models.
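The performance measures listed above can be computed from the per-replication estimates and standard errors as in the following sketch. The function name `performance`, the normal-approximation interval with multiplier 1.96, and the default bounds of (−1, 1) for the out-of-bounds check (appropriate only for a binary outcome) are assumptions made for illustration.

```python
from statistics import fmean


def performance(estimates, ses, truth, n, bounds=(-1.0, 1.0)):
    """Summarize simulation replications for one estimator.

    estimates, ses : per-replication point estimates and standard errors
    truth          : true parameter value under the DGM
    n              : sample size of each simulated dataset
    bounds         : assumed parameter space, used for the out-of-bounds rate
    """
    bias = fmean(estimates) - truth
    covered = [1.0 if est - 1.96 * se <= truth <= est + 1.96 * se else 0.0
               for est, se in zip(estimates, ses)]
    outside = [1.0 if est < bounds[0] or est > bounds[1] else 0.0
               for est in estimates]
    return {
        "bias": bias,
        "pct_bias": 100.0 * bias / truth,
        "se_root_n": fmean(ses) * n ** 0.5,     # closeness to the efficiency bound
        "coverage": 100.0 * fmean(covered),     # 95% CI coverage, in percent
        "mse": fmean([(est - truth) ** 2 for est in estimates]),
        "pct_out_of_bounds": 100.0 * fmean(outside),
    }


summary = performance([0.2, 0.3], [0.05, 0.05], truth=0.25, n=100)
```

Scaling the mean standard error by the square root of n puts different sample sizes on a common scale, which is what allows the efficiency column to be compared across the N=5,000, N=500, and N=100 panels.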
5.2. Results
Table 2 gives results under the moderate-strong instrument simulation scenario using correct model specification for comparing the TML, IPTW, and EE estimators. We see that the TML, IPTW, and EE estimators are consistent when all models are correctly specified and sample sizes are large (N=5,000), showing biases of less than 1 % in the case of the TML and EE estimators and just over 1% in the case of the IPTW estimator. Bias increases for the IPTW estimator under the smaller sample sizes of N=500 and N=100 to 4% and 42% respectively, indicating that even under correct model specification, this estimator is challenged in finite samples. In contrast, the TML and EE estimators continue to perform well when sample size decreases to N=500 and N=100, although the efficient TMLE that targets the numerator and denominator separately shows a bias of 13% under N=100. The compatible TML and EE estimators perform similarly and close to the efficiency bound for all sample sizes, though efficiency decreases slightly with decreasing sample size. 95% CI coverage for both is close to 95% for N=5,000 and N=500 and is reduced slightly for N=100. 95% CI coverage is conservative for the IPTW estimator—around 99% for all sample sizes.
Table 2.
Estimator | Bias | %Bias | SE × √n | 95% CI Cov | MSE |
---|---|---|---|---|---|
All correctly specified | |||||
N=5000 | |||||
TMLE, compatible | 0.000 | 0.07 | 1.11 | 94.90 | 0.000 |
TMLE, efficient | 0.000 | 0.07 | 1.11 | 94.90 | 0.000 |
IPTW | 0.003 | 1.45 | 6.53 | 98.70 | 0.005 |
EE | 0.000 | 0.12 | 1.11 | 94.50 | 0.000 |
N=500 | |||||
TMLE, compatible | −0.001 | −0.47 | 1.11 | 94.90 | 0.002 |
TMLE, efficient | −0.001 | −0.47 | 1.11 | 95.00 | 0.002 |
IPTW | −0.009 | −4.08 | 6.89 | 98.40 | 0.051 |
EE | −0.002 | −0.75 | 1.11 | 95.50 | 0.002 |
N=100 | |||||
TMLE, compatible | −0.009 | −3.96 | 1.14 | 90.64 | 0.014 |
TMLE, efficient | 0.028 | 13.43 | 1.17 | 86.50 | 0.045 |
IPTW | −0.093 | −42.39 | 21.71 | 99.20 | 1.484 |
EE | −0.005 | −2.15 | 1.12 | 93.30 | 0.013 |
Table 3 gives simulation results under various model misspecifications with large sample size of N=5,000. The IPTW estimator is consistent if the A and M models are correctly specified. Deriving the robustness properties for the EE and TML estimators from the EIC, under large sample size, one of three scenarios is required for estimates of to be consistent: 1) the models need to be correctly specified, or 2) the models need to be correctly specified, or 3) the models need to be correctly specified. For estimates of to be consistent, either the model needs to be correctly specified. Thus, for the EE and TML estimators, robustness requirements for the denominator are subsumed in the robustness requirements for the numerator. In the simulation results that follow, we note that A is randomly assigned in the DGM we consider, aligned with its role as an instrument and with our motivating example.
Table 3.
Estimator | Bias | %Bias | SE × √n | 95% CI Cov (bootstrap) | MSE |
---|---|---|---|---|---|
M model misspecified, N=5,000 | |||||
TMLE, compatible | 0.000 | 0.02 | 1.05 | 94.30 | 0.000 |
TMLE, efficient | 0.000 | 0.02 | 1.05 | 94.30 | 0.000 |
IPTW | −0.024 | −11.28 | 5.32 | 99.40 | 0.003 |
EE | 0.000 | 0.08 | 1.05 | 94.20 | 0.000 |
Y model misspecified, N=5,000 | |||||
TMLE, compatible | 0.000 | 0.09 | 1.14 | 95.60 | 0.000 |
TMLE, efficient | 0.000 | 0.09 | 1.14 | 95.60 | 0.000 |
IPTW | 0.003 | 1.45 | 6.53 | 98.70 | 0.005 |
EE | 0.000 | 0.06 | 1.19 | 96.10 | 0.000 |
M and Y models misspecified, N=5,000 | |||||
TMLE, compatible | 0.096 | 44.90 | 1.06 | 0.00 | 0.009 |
TMLE, efficient | 0.096 | 44.90 | 1.06 | 0.00 | 0.009 |
IPTW | −0.024 | −11.28 | 5.32 | 99.40 | 0.003 |
EE | 0.095 | 44.74 | 1.06 | 0.00 | 0.009 |
Z models misspecified, N=5,000 | |||||
TMLE, compatible | 0.001 | 0.28 | 1.16 | 87.50 (92.90) | 0.000 |
TMLE, efficient | 0.012 | 6.28 | 1.38 | 83.10 (92.60) | 0.001 |
IPTW | 0.003 | 1.45 | 6.53 | 98.70 | 0.005 |
EE | 0.001 | 0.26 | 1.16 | 88.30 (92.80) | 0.000 |
Z and Y models misspecified, N=5,000 | |||||
TMLE, compatible | 0.066 | 33.64 | 1.16 | 2.60 (47.00) | 0.005 |
TMLE, efficient | 0.041 | 20.72 | 1.42 | 45.70 (46.90) | 0.002 |
IPTW | 0.003 | 1.45 | 6.53 | 98.70 | 0.005 |
EE | 0.071 | 36.09 | 1.23 | 2.10 (38.30) | 0.005 |
As expected from each estimator’s robustness properties, we see that all estimators are consistent under misspecification of the Y model, with performance equivalent to performance under correct model specification and N=5,000. Also as expected, under misspecification of the M model, the IPTW estimator is no longer consistent, with 11% bias, but the TML and EE estimators remain consistent. When both the M and Y models are misspecified, all three estimators are inconsistent, with biases ranging from 11% for IPTW to 45% for TML and EE, and 95% CI coverage of the TML and EE estimators is reduced to 0%.
The compatible TML, EE, and IPTW estimators are consistent under misspecification of the Z model. In this scenario, the efficient TMLE demonstrates slight bias (6%), possibly because is targeted separately in the numerator and denominator, resulting in incompatibility. In this scenario, and more generally under any model misspecification scenario consistent with the robustness properties guaranteeing estimator consistency, the IC-based inference for the TML and EE estimators may be inaccurate, because the IC differs under misspecification. Indeed, we see here that using IC-based inference results in undercoverage (Table 3). Coverage improves when bootstrapping is used for inference (also shown in Table 3), because bootstrapping will approximate the true variance of the estimator in this case where parametric models are used in fitting. However, bootstrapping may not provide valid inference under misspecification if machine learning is used in model fitting, because the resulting estimator is no longer asymptotically linear. Others have addressed the challenge of inference in scenarios of model misspecification under estimator consistency by targeting the nuisance parameters of the IC to provide accurate IC-based inference at the cost of efficiency (Benkeser et al., 2017; van der Laan, 2014).
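Bootstrap-based inference of the kind reported in Table 3 can be sketched generically. The following is a minimal nonparametric bootstrap, assuming the estimator is supplied as a function that refits all nuisance models on each resample; the function name `bootstrap` and the simple percentile interval are illustrative choices, not the procedure used to produce the reported coverage.

```python
import random
from statistics import fmean, stdev


def bootstrap(rows, estimator, n_boot=200, seed=0):
    """Nonparametric bootstrap standard error and percentile interval.

    rows      : list of observations (any type the estimator accepts)
    estimator : function mapping a list of observations to a point estimate;
                it should refit every nuisance model on the resample
    """
    rng = random.Random(seed)
    n = len(rows)
    reps = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]  # draw n with replacement
        reps.append(estimator(resample))
    reps.sort()
    lower = reps[int(0.025 * (n_boot - 1))]
    upper = reps[int(0.975 * (n_boot - 1))]
    return stdev(reps), (lower, upper)


# Toy usage with the sample mean standing in for a CSDE estimator.
values = [0.1, 0.4, 0.2, 0.3, 0.5, 0.2, 0.3, 0.4]
se_boot, ci_boot = bootstrap(values, fmean, n_boot=200, seed=1)
```

For a ratio estimator, resamples with a first-stage estimate near zero can produce extreme replicates, so the percentile interval and the bootstrap standard error may disagree noticeably in weak instrument settings.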
Misspecification of both the Z and Y models results not only in invalid inference for the EE and TML estimators but also in inconsistent estimates. We note that for the two scenarios where the Z model was misspecified the true DGM was changed to to make misspecifying the Z model meaningful.
Table 4 gives results under the weak instrument simulation scenario using correct model specification, comparing the IPTW, EE, efficient TML, and compatible TML estimators. Finite sample performance is challenged in this scenario: all estimators have larger biases and worse efficiency for a given sample size than in the stronger instrument scenario. Performance of the IPTW estimator is particularly affected. Under sample size N=500 and correct model specification, the IPTW estimator is 27% biased compared to roughly 4% bias for the other estimators. With sample size N=100 under this weak instrument scenario, all estimators perform poorly, with the IPTW estimator displaying particularly egregious performance.
Table 4.
Estimator | Bias | %Bias | SE × √n | 95% CI Cov | MSE | % Out of Bounds |
---|---|---|---|---|---|---|
N=5000 | ||||||
TMLE compatible | −0.000 | −0.08 | 1.13 | 73.50 | 0.001 | 0.00 |
TMLE, efficient | −0.000 | −0.07 | 1.13 | 73.50 | 0.001 | 0.00 |
IPTW | 0.016 | 8.14 | 19.29 | 98.70 | 0.045 | 0.10 |
EE | −0.000 | −0.07 | 1.13 | 74.40 | 0.001 | 0.00 |
N=500 | ||||||
TMLE compatible | 0.008 | 3.53 | 1.38 | 74.60 | 0.010 | 0.10 |
TMLE, efficient | 0.008 | 3.55 | 1.38 | 74.50 | 0.010 | 0.10 |
IPTW | 0.057 | 27.05 | 28.51 | 99.90 | 0.964 | 18.80 |
EE | 0.008 | 3.69 | 1.40 | 75.30 | 0.010 | 0.10 |
N=100 | ||||||
TMLE compatible | 0.048 | 22.19 | 344.04 | 86.49 | 3.302 | 4.10 |
TMLE, efficient | 0.112 | 50.95 | 1226.43 | 88.51 | 3.550 | 469 |
IPTW | −1.14 × 10^12 | −5.11 × 10^14 | 5.37 × 10^28 | 100.00 | 1.24 × 10^27 | 53.10 |
EE | 0.042 | 19.40 | 53.66 | 88.50 | 0.740 | 3.30 |
In part, the poor efficiency of the IPTW estimator is due to very small or even negative denominator estimates in this weak instrument scenario, which results in CSDE estimates lying outside of the parameter space. Indeed, we see that 18% and 53% of the IPTW estimates were out of the bounds of the parameter space for the N=500 and N=100 sample sizes, respectively. In contrast, the EE and TMLE estimates largely stay within the parameter space for N=500. For the smallest sample size of N=100, about 3–4% lie outside of the parameter space for the EE and TMLE estimates.
6. Empirical Illustration
6.1. Overview and set-up
We now apply our proposed TMLE to the Moving to Opportunity study (MTO): a longitudinal, randomized trial where families living in public housing were randomized to receive a Section 8 housing voucher that they could then use to move out of public housing (Kling et al., 2007). In this example, the CSDE is the direct effect of using the housing voucher to move out of public housing on adolescent substance use outcomes, not mediated by aspects of parental wellbeing, among those who would comply with the intervention.
The instrument, A, is defined as randomization to receive a Section 8 housing voucher that one can then use to rent on the private market. The exposure, Z, is defined as adherence to the intervention: using the housing voucher, if one received it, to move out of public housing. We examined direct effects not operating through each of five mediators, M, all measured at an interim assessment that occurred 4–7 years after the baseline assessment: 1) parental employment; 2) parental anxiety, defined as feeling worried, tense or anxious most of the time or worrying much more than others in his/her situation for at least one month during the past year; 3) parental depression, consistent with DSM-IV diagnostic criteria, as measured by the CIDI-SF instrument (Kessler et al., 1998); 4) parental distress, as measured by the Kessler Psychological Distress Scale (K6) (Kessler et al., 2002); and 5) parental warmth towards the adolescent, as measured by direct observation, consisting of nine items. The first three mediators were binary. The last two were indices bounded in [0,1]. We examined three adolescent substance use outcomes, Y, which were also measured at the interim assessment: past-month cigarette use, past-month marijuana use, and past-month problematic drug use. Problematic drug use was defined as using hard drugs or using marijuana before school or work in the past month. We used a high-dimensional vector of covariates measured at baseline, W, that included social-demographic information for the adolescent and his/her family, information on the adolescent’s behavior and learning while a child, neighborhood characteristics, and reasons for the family’s participation in MTO. Definitions of W, A, Z and Y align with previous work estimating direct and indirect effects of A on Y in the MTO study (Rudolph et al., 2017a). These variables follow the same structural causal model as detailed in Section 2.
Aligned with a prior analysis estimating direct effects in MTO (Rudolph et al., 2017a), our sample includes adolescents participating in MTO who were 12–17 years old at the interim assessment. We exclude the Baltimore site, as Section 8 voucher receipt did not increase a family’s likelihood of moving to a low-poverty neighborhood, which differs from other sites and from the intention of the intervention. We conducted analyses stratified by gender, as previous work documented qualitatively and quantitatively different intervention effects between girls and boys (Orr et al., 2003; Clampet-Lundquist et al., 2011). We combine sites with similar intervention effects, as has been done previously (Rudolph et al., 2017a). Lastly, we restrict to those with nonmissing mediator and outcome data. Multiple imputation by chained equations (Buuren and Groothuis-Oudshoorn, 2011) was used to create 30 imputed datasets to address missing covariate values (none had more than 5% missing). The University of California, Davis, and Columbia University determined this analysis of deidentified data to be non-human subjects research.
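The text does not state how estimates were combined across the 30 imputed datasets. One common choice is Rubin's rules, sketched below as an assumption rather than as the authors' procedure. The function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool point estimates and their variances across imputed datasets.

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns (pooled_estimate, total_variance) following Rubin's rules.
    """
    m = len(estimates)
    pooled = mean(estimates)          # average of the per-dataset estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical values for three imputed datasets
    est, var = pool_rubin([0.1, 0.2, 0.3], [0.01, 0.01, 0.01])
    print(est, var ** 0.5)
```

The total variance adds the between-imputation component so that uncertainty due to the missing covariate values is reflected in the pooled standard error.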
6.2. Results
Total complier average causal effects (CACEs) (Angrist et al., 1996), also called treatment-on-treated effects (Orr et al., 2003), are shown in Web Figures 1–3 (see supplementary Web appendix). The CACE is the effect of Z on Y among compliers, identified using randomization of the instrument A. In other words, it is the total effect of moving with the voucher out of public housing on the outcome, among those who comply with the intervention. Moving with the voucher out of public housing increased risk of cigarette use among boys by 8 percentage points (RD: 0.08, 95%CI: −0.00, 0.17) and reduced risk of marijuana use among girls by 7 percentage points (RD: −0.07, 95%CI: −0.13, −0.01). This aligns with previous work finding that the intervention generally improved health and risk behavior outcomes among girls but had negative impacts in terms of these same types of outcomes for boys (Orr et al., 2003; Clampet-Lundquist et al., 2011).
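For reference, under the standard instrumental variable assumptions of Angrist et al. (1996), and ignoring covariate adjustment, the CACE can be written as a ratio of intent-to-treat effects. This unadjusted form is shown only as a sketch; the estimates reported here adjust for W and need not reduce to it.

```latex
\mathrm{CACE} = \frac{E[Y \mid A = 1] - E[Y \mid A = 0]}{E[Z \mid A = 1] - E[Z \mid A = 0]}
```

The denominator is the first-stage effect of A on Z, which is why a weak instrument (a small denominator) makes ratio estimators of this type unstable.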
We next estimated the first-stage, data-dependent, stochastic effect of A on each mediator, M, which is used in each of the CSDE estimators. These first-stage effects are shown in Table 5. Across outcome samples and genders, we see that being randomized to receive a Section 8 voucher increases parental employment and anxiety but decreases parental depression. Effects of voucher receipt on distress and warmth were mixed or null. These first-stage effects of A on M operate through Z. The effect of A on Z (voucher receipt on moving with the voucher out of public housing) ranged from an increased likelihood of 47% to 52%, depending on the outcome sample.
Table 5.
Mediator | Boys, RD (95% CI) | Girls, RD (95% CI) |
---|---|---|
Cigarette Use Sample | ||
Parental employment | 0.058 (0.047, 0.070) | 0.026 (0.015, 0.037) |
Parental anxiety | 0.036 (0.028, 0.044) | 0.041 (0.039, 0.043) |
Parental depression | −0.004 (−0.007, −0.001) | −0.004 (−0.006, −0.001) |
Parental distress | 0.004 (−0.001, 0.009) | 0.013 (0.009, 0.017) |
Parental warmth | −0.006 (−0.032, 0.019) | −0.005 (−0.028, 0.019) |
Marijuana Use Sample | ||
Parental employment | 0.079 (0.067, 0.092) | 0.027 (0.015, 0.038) |
Parental anxiety | 0.011 (0.002, 0.021) | 0.042 (0.041, 0.043) |
Parental depression | −0.024 (−0.026, −0.021) | −0.003 (−0.005, −0.001) |
Parental distress | −0.021 (−0.027, −0.015) | 0.012 (0.008, 0.016) |
Parental warmth | 0.005 (−0.023, 0.033) | −0.001 (−0.025, 0.023) |
Problematic Drug Use Sample | ||
Parental employment | 0.061 (0.052, 0.070) | 0.052 (0.041, 0.063) |
Parental anxiety | 0.039 (0.031, 0.047) | 0.050 (0.045, 0.056) |
Parental depression | −0.006 (−0.009, −0.003) | −0.011 (−0.016, −0.005) |
Parental distress | 0.004 (−0.001, 0.009) | 0.016 (0.009, 0.022) |
Parental warmth | −0.005 (−0.027, 0.017) | −0.011 (−0.040, 0.017) |
The TMLE estimates of the CSDEs by outcome sample, gender, and mediator are shown in Figures 1–3. The estimates are similar across mediators.
Lastly, we compare the CSDE estimates in Figures 1–3 with the stochastic direct and indirect effect (SDE and SIE) estimates (see supplementary Web appendix Figures 1–3). The SDE is the direct effect of A on Y not through M, and the SIE is the indirect effect of A on Y through M. The total intent-to-treat average treatment effects (the total effect of A on each Y) are included in the figures. Such further examination may be of interest because previous research has shown that significant indirect effects may be present with null total or direct effects (Imai et al., 2010). However, all SDE and SIE estimates are null.
Together, these results suggest that none of the five parental well-being variables tested are on the causal pathway from Section 8 voucher receipt to subsequent adolescent substance use. The first-stage results (Table 5) provided evidence against mediation by parental distress or warmth. The CSDE and SDE/SIE estimates then provided evidence against mediation by the remaining variables: parental employment, anxiety, and depression.
7. Conclusion
In this paper, we identified the IV direct effect of exposure Z on outcome Y not operating through mediator M, which we call the complier stochastic direct effect. We detailed four estimators of such effects: a ratio of inverse-probability of treatment-weighted estimators, a ratio of estimating equation estimators, a ratio of targeted minimum loss-based estimators, and a TMLE that targets the CSDE directly. These estimators would be applicable for a variety of study designs, including 1) randomized encouragement trials, like the MTO housing voucher experiment we consider as an illustrative example, 2) treatment discontinuities, such as when a policy or practice changes abruptly either over time or at a certain value, and 3) Mendelian randomization (Baiocchi et al., 2014). To facilitate implementation of our proposed estimators, we include step-by-step instructions in the main text and commented R code in the supplementary Web appendix.
Estimators of the CSDE will be challenged by the finite sample sizes encountered in real-world data. Both mediation estimators and IV estimators are less efficient than their total average treatment effect counterparts. Because estimators of the CSDE combine both mediation and IV components, these estimators will likewise have efficiency challenges, particularly in finite samples. Thus, it is important to choose an estimator that is more robust to finite sample bias.
We found the IPTW estimator to be the most sensitive to finite sample bias, resulting in bias of over 40% even when all models were correctly specified in a sample size of N=100 (Table 2). In contrast, the EE estimator and compatible TMLE were far less sensitive, demonstrating slight losses of efficiency in sample sizes of N=100.
The EE and TML estimators also have advantages over the IPTW estimator in terms of efficiency and reduced reliance on correct parametric model specification due to 1) being robust to certain combinations of model misspecifications and 2) having theory-based inference when incorporating data-adaptive methods like machine learning into model fitting. In addition, the compatible TMLE solves the efficient influence equation of the CSDE ratio, with the targeting being done in such a way that it is compatible across the numerator and denominator. This compatibility improves the TMLE’s performance particularly in the weaker instrument scenario, as was shown in Table 4 comparing the compatible and efficient TMLEs. A previous TMLE for the complier average total effect (as opposed to the complier direct effect) used a separate TMLE for each of the numerator and denominator, so did not have the advantage of this compatibility (Rudolph and van der Laan, 2017).
The estimators we propose are limited, however, in that they use a data-dependent stochastic intervention on M, which assumes that the stochastic draw is from a known distribution of M estimated from the observed data. Solving the compatible EIC for the non-data-dependent version would be significantly more complex, but we plan to complete such an extension.
Perhaps the most significant limitation is that we were unable to identify a corresponding complier stochastic indirect effect without additional restrictive assumptions. This limitation is corroborated by recent work by Frölich and Huber (2017), in which a similar IV indirect effect could only be identified by assuming two distinct instruments, one for Z and one for M, that themselves were conditionally independent.
Supplementary Material
Acknowledgments
The authors gratefully acknowledge R00DA042127 (PI: Rudolph).
Contributor Information
Kara E. Rudolph, Department of Epidemiology, Columbia University, New York, New York
Oleg Sofrygin, Division of Biostatistics, University of California, Berkeley.
Mark J. van der Laan, Division of Biostatistics, University of California, Berkeley
References
- Albert JM (2008) Mediation analysis via potential outcomes models. Statistics in Medicine, 27, 1282–1304.
- Angrist JD, Imbens GW and Rubin DB (1996) Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444–455.
- Baiocchi M, Cheng J and Small DS (2014) Instrumental variable methods for causal inference. Statistics in Medicine, 33, 2297–2340.
- Benkeser D, Carone M, van der Laan M and Gilbert P (2017) Doubly robust nonparametric inference on the average treatment effect. Biometrika, 104, 863–880.
- van Buuren S and Groothuis-Oudshoorn K (2011) mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45.
- Chén OY, Crainiceanu C, Ogburn EL, Caffo BS, Wager TD and Lindquist MA (2015) High-dimensional multivariate mediation with application to neuroimaging data. Biostatistics.
- Clampet-Lundquist S, Edin K, Kling JR and Duncan GJ (2011) Moving teenagers out of high-risk neighborhoods: How girls fare better than boys. American Journal of Sociology, 116, 1154–89.
- Didelez V, Dawid AP and Geneletti S (2006) Direct and indirect effects of sequential treatments. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, 138–146. AUAI Press.
- Dunn G and Bentall R (2007) Modelling treatment-effect heterogeneity in randomized controlled trials of complex interventions (psychological treatments). Statistics in Medicine, 26, 4719–4745.
- Frölich M and Huber M (2017) Direct and indirect treatment effects: causal chains and mediation analysis with instrumental variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1645–1666.
- Gruber S and van der Laan MJ (2010) A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. The International Journal of Biostatistics, 6.
- Imai K, Keele L, Yamamoto T et al. (2010) Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25, 51–71.
- Joffe MM, Small D, Ten Have T, Brunelli S and Feldman HI (2008) Extended instrumental variables estimation for overall effects. The International Journal of Biostatistics, 4.
- Kessler RC, Andrews G, Colpe LJ, Hiripi E, Mroczek DK, Normand SL, Walters EE and Zaslavsky AM (2002) Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychological Medicine, 32, 959–976.
- Kessler RC, Andrews G, Mroczek D, Ustun B and Wittchen H-U (1998) The World Health Organization Composite International Diagnostic Interview Short-Form (CIDI-SF). International Journal of Methods in Psychiatric Research, 7, 171–185.
- Kling JR, Liebman JB and Katz LF (2007) Experimental analysis of neighborhood effects. Econometrica, 75, 83–119.
- van der Laan MJ (2014) Targeted estimation of nuisance parameters to obtain valid statistical inference. The International Journal of Biostatistics, 10, 29–57.
- van der Laan MJ and Rubin D (2006) Targeted maximum likelihood learning. The International Journal of Biostatistics, 2.
- Ogburn EL (2012) Commentary on "Mediation analysis without sequential ignorability: Using baseline covariates interacted with random assignment as instrumental variables" by Dylan Small. Journal of Statistical Research, 46, 105.
- Orr L, Feins J, Jacob R, Beecroft E, Sanbonmatsu L, Katz LF, Liebman JB and Kling JR (2003) Moving to opportunity: Interim impacts evaluation.
- Pearl J (2009) Causality. Cambridge University Press.
- Rice MS, Bertrand KA, VanderWeele TJ, Rosner BA, Liao X, Adami H-O and Tamimi RM (2016) Mammographic density and breast cancer risk: a mediation analysis. Breast Cancer Research, 18, 94.
- Rudolph KE and van der Laan MJ (2017) Robust estimation of encouragement-design intervention effects transported across sites. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1509–1525.
- Rudolph KE, Sofrygin O, Schmidt NM, Crowder R, Glymour MM, Ahern J and Osypuk TL (2017a) Mediation of neighborhood effects on adolescent substance use by the school and peer environments in a large-scale housing voucher experiment. Epidemiology, In Press.
- Rudolph KE, Sofrygin O, Zheng W and van der Laan MJ (2017b) Robust and flexible estimation of data-dependent stochastic mediation effects: a proposed method and example in a randomized trial setting. Epidemiologic Methods, In Press.
- Small DS (2011) Mediation analysis without sequential ignorability: Using baseline covariates interacted with random assignment as instrumental variables. arXiv preprint arXiv:1109.1070.
- Ten Have TR, Joffe MM, Lynch KG, Brown GK, Maisto SA and Beck AT (2007) Causal mediation analyses with rank preserving models. Biometrics, 63, 926–934.
- VanderWeele TJ and Tchetgen Tchetgen EJ (2017) Mediation analysis with time varying exposures and mediators. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 917–938.
- Zheng W and van der Laan M (2017) Longitudinal mediation analysis with time-varying mediators and exposures, with application to survival outcomes. Journal of Causal Inference.