Abstract
In observational studies, treatments are typically not randomized and therefore estimated treatment effects may be subject to confounding bias. The instrumental variable (IV) design plays the role of a quasi-experimental handle since the IV is associated with the treatment and only affects the outcome through the treatment. In this paper, we present a novel framework for identification and inference using an IV for the marginal average treatment effect amongst the treated (ETT) in the presence of unmeasured confounding. For inference, we propose three different semiparametric approaches: (i) inverse probability weighting (IPW), (ii) outcome regression (OR), and (iii) doubly robust (DR) estimation, which is consistent if either (i) or (ii) is consistent, but not necessarily both. A closed-form locally semiparametric efficient estimator is obtained in the simple case of binary IV and outcome and the efficiency bound is derived for the more general case.
Keywords: Counterfactuals, Double robustness, Instrumental variable, Unmeasured confounding, Effect of treatment on the treated
1. Introduction
Studies in epidemiology and the social sciences often aim to evaluate the effect of a treatment. For practical reasons, the average treatment effect among treated individuals (ETT) is sometimes of greater interest than the average treatment effect in the whole population. In epidemiological studies concerning the toxic effects of a new drug, or the effect of a treatment only on those who take it, the ETT is the parameter of interest; it is known as “the effect of exposure on the exposed” or the “standardized morbidity” (Miettinen, 1974; Greenland and Robins, 1986). In econometrics, the ETT is often used to evaluate the effect of a policy among those to whom the policy is applied. For example, Angrist (1995) evaluated the average effect of military service on the civilian earnings of veterans, and Heckman et al. (1997, 1998) evaluated the average effect of job training on program participants.
In observational studies, or in randomized studies with non-compliance, a primary challenge is unmeasured confounding: outcomes may differ between treatment groups not only because of the treatment effect, but also because of unmeasured factors that affect both treatment selection and the outcome.
Instrumental variables (IV) are useful in addressing unmeasured confounding. An IV is a variable that is associated with the treatment and affects the outcome only through the treatment. The key idea of the IV method is to extract exogenous variation in the treatment that is unconfounded with the outcome, and to use this bias-free component to make causal inference about the treatment effect (Robins, 1989; Angrist et al., 1996; Heckman, 1997).
The development of the IV approach can be traced back to Wright (1928) and Goldberger (1972) under linear structural equations in econometrics. Imbens and Angrist (1994), Angrist et al. (1996) and Heckman (1997) formalized the IV approach within the framework of potential outcomes, or counterfactuals. Under additive and multiplicative structural nested models (SNMs), Robins (1989, 1994) evaluated the average treatment effect among treated individuals (ETT) conditional on the IV and observed covariates. Identification is achieved by assuming a certain degree of homogeneity with respect to the IV in an SNM of the conditional ETT (Hernán and Robins, 2006). Essentially, the assumption states that the magnitude of the conditional ETT does not vary with the IV; this is also referred to as the no-current treatment value interaction assumption. Under a similar identifying assumption, Vansteelandt and Goetghebeur (2003), Robins and Rotnitzky (2004), Tan (2010), Clarke et al. (2015) and Matsouaka and Tchetgen Tchetgen (2014) investigated estimation of this conditional causal effect using additive, multiplicative and logistic SNMs.
The literature mentioned above has some limitations. First, it focuses on the ETT conditional on the IV and observed covariates. Identification of such a conditional ETT was achieved by specifying a functional form for the causal effect of treatment. This is unattractive because it places constraints directly on the main parameter of interest, and misspecification of this functional form would lead to biased results. Second, the available inference methods require the treatment propensity score to be correctly specified, even for an outcome regression-based estimator (Tan, 2010).
In this paper, we remedy these limitations with a novel framework for identification and estimation, using an IV, of the marginal ETT in the presence of unmeasured confounding. By directly targeting the marginal ETT, we leave the conditional causal effect unrestricted. Our methods are therefore particularly valuable when the primary goal is an accurate estimate of the marginal treatment effect, because no model for how the effect varies with the IV and covariates needs to be specified. Additionally, we propose a new identification strategy that is applicable to any type of outcome and provides necessary and sufficient global identification conditions. Moreover, for inference, we propose three semiparametric estimators that allow flexible covariate adjustment: (i) inverse probability weighting (IPW), (ii) outcome regression (OR) and (iii) doubly robust (DR) estimation, which is consistent if either (i) or (ii) is consistent but not necessarily both.
The outline of the paper is as follows. In Section 2, we introduce the notation and state the main assumptions. We study the nonparametric identification of the ETT in Section 3. We introduce IPW, OR and DR estimators in Section 4. In Section 5, we assess the performance of the various estimators in a simulation study. In Section 6, we further illustrate the methods with a study of the impact of participation in a 401(k) retirement program on savings. We conclude with a brief discussion in Section 7.
2. Preliminary Results
Suppose that one observes independent and identically distributed data O = (A, Y, Z, C), where A is a binary treatment, Y is the outcome of interest, which may be dichotomous, polytomous, discrete or continuous, and the candidate IV Z and the covariates C are both pre-exposure variables. Let a, y, z, c denote the possible values of A, Y, Z, C. Let $Y_{az}$ denote the potential outcome if A and Z are set to a and z, and let $Y_a$ denote the potential outcome if only A is set to a. We formalize the IV assumptions using potential outcomes:
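(IV.1) Stochastic exclusion restriction: $Y_{az} = Y_a$ almost surely, for a = 0, 1 and all z.
(IV.2) Unconfounded IV-outcome relation: $Y_{az} \perp\!\!\!\perp Z \mid C$, for a = 0, 1 and all z.
(IV.3) IV relevance: $Z \not\perp\!\!\!\perp A \mid C$.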
Assumption (IV.1) states that Z has no direct effect on the outcome Y; thus we use Ya to denote the potential outcome under treatment a, for a = 0, 1. Assumption (IV.2) is ensured under physical randomization of Z but holds more generally if C includes all common causes of Z and Y. Assumptions (IV.1)–(IV.2) together imply that, conditional on C, the IV is independent of the potential outcome under no treatment, i.e., Y0 ⫫ Z|C. Assumption (IV.3) states that A and Z are associated conditional on C, even if the association is not causal. If assumptions (IV.1)–(IV.3) are satisfied, Z is said to be a valid IV.
We make the consistency assumption Y = AY1 + (1 − A)Y0. The marginal effect of treatment on the treated is ETT = E(Y1 − Y0|A = 1). Because E(Y1|A = 1) = E(Y|A = 1) can be consistently estimated by the average observed outcome among treated individuals, throughout we focus on making inferences about $\psi = E(Y_0 \mid A = 1)$.
Suppose there exist unmeasured variables, denoted by U, such that controlling for (U, Z, C) suffices to account for confounding, i.e., Y0 ⫫ A|(U, Z, C); however,
$$Y_0 \not\perp\!\!\!\perp A \mid (Z, C), \tag{1}$$
where ⫫ denotes statistical independence. As pointed out by Robins et al. (2000), potential outcomes can be viewed as the ultimate unmeasured confounders. This is because by the consistency assumption, the observed outcome Y is a deterministic function of the treatment and the potential outcomes. Thus, given (Y0, Y1), U does not contain any further information about Y. To make explicit use of (1), we define the extended propensity score
$\pi(Y_0, Z, C) = \Pr(A = 1 \mid Y_0, Z, C)$, viewed as a function of Y0 as well as of (Z, C).
3. Nonparametric Identification
While assumptions (IV.1)–(IV.3) suffice to obtain a valid test of the sharp null hypothesis of no treatment effect (Robins, 1994) and can also be used to test for the presence of confounding bias (Pearl, 1995), the ETT is not uniquely determined by the observed data without additional restrictions. For simplicity, we first consider the situation where covariates are omitted and the outcome and IV are both binary. From the observed data, one can identify Pr(Y0, Z|A = 0), Pr(Z|A = 1) and Pr(A = 0). These quantities are functions of the unknown parameters Pr(Z = 1), Pr(Y0 = 1) and Pr(A = 0|Y0, Z). Without any additional assumption, there are six unknown parameters (one for Pr(Z = 1), one for Pr(Y0 = 1) and four for Pr(A = 0|Y0, Z)), but only five degrees of freedom are available from the observed data (one for Pr(A = 0), one for Pr(Z|A = 1) and three for Pr(Y, Z|A = 0)). As a result, the joint distribution f(A, Y0, Z) is not uniquely identified; in particular, ψ is not identified.
For identification purposes, additional assumptions, such as Robins’ no-current treatment value interaction assumption (Hernán and Robins, 2006), must be imposed to reduce the set of candidate models for the joint distribution f(A, Y0, Z, C). Below, we give a general necessary and sufficient condition for identification. Let $\mathcal{P}$ and $\mathcal{F}$ denote the collections of candidates for Pr(A = 0|Y0, Z, C) and f(Y0|C), respectively, which are known to satisfy (IV.1) and (IV.2). Because Y = Y0 whenever A = 0, Y0 ⫫ Z|C, and f(Z|C) and f(C) are identified, the observed data determine the product Pr(A = 0|Y0, Z, C) f(Y0|C); two candidates are therefore observationally equivalent exactly when they yield the same product.
Condition 1. Any two distinct elements $(\Pr_1, f_1)$ and $(\Pr_2, f_2)$ of $\mathcal{P} \times \mathcal{F}$ satisfy the inequality, as functions of (Y0, Z, C):
$$\Pr_1(A = 0 \mid Y_0, Z, C)\, f_1(Y_0 \mid C) \neq \Pr_2(A = 0 \mid Y_0, Z, C)\, f_2(Y_0 \mid C).$$
The following proposition states that condition 1 is a necessary and sufficient condition for identifiability of the joint distribution of (A, Y0, Z, C), where Y0 and Z may be dichotomous, polytomous, discrete or continuous.
Proposition 1. The joint distribution of (A, Y0, Z, C) is identified in the model defined by $\mathcal{P}$ and $\mathcal{F}$ if and only if condition 1 holds.
Condition 1 is convenient to check for parametric models, but it may be harder to check for semiparametric and nonparametric models, since $\mathcal{P}$ and $\mathcal{F}$ can be complicated. The following corollary gives a more convenient condition.
Corollary 1. Suppose that for any two candidates $\Pr_1, \Pr_2 \in \mathcal{P}$, the ratio Pr1(A = 0|Y0, Z, C)/Pr2(A = 0|Y0, Z, C) is either a constant or varies with Z. Then the joint distribution of (A, Y0, Z, C) is identified.
Although Corollary 1 gives only a sufficient condition for identification, it covers a large class of models. We further illustrate Proposition 1 and Corollary 1 with several examples. For simplicity, we again omit covariates; at the end of this section we show that similar results can be derived with covariates. We first consider the case of a binary outcome with a binary IV.
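Example 1. Consider the model in which the candidate treatment mechanisms Pr(A = 0|Y0, Z) range over all conditional probabilities for binary (Y0, Z), that is, four free parameters, and f(Y0) ranges over all Bernoulli distributions. The model is saturated since it contains all possible treatment mechanisms. It can be shown that neither the joint distribution nor ψ is identified, even under assumptions (IV.1)–(IV.3).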
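The degrees-of-freedom argument behind Example 1 can be checked numerically. The Python sketch below uses hypothetical parameter values: it takes one saturated treatment mechanism, changes Pr(Y0 = 1), and rescales Pr(A = 0|Y0, Z) so that the product Pr(A = 0|Y0, Z) f(Y0) is unchanged. The two candidates then imply the same observable law but different values of ψ = E(Y0|A = 1).

```python
def observed_a0(p_y0, p_z, pr_a0):
    """Pr(A = 0, Y = y, Z = z) for binary (Y0, Z) with Y0 independent of Z.

    pr_a0[(y, z)] is Pr(A = 0 | Y0 = y, Z = z); p_y0 = Pr(Y0 = 1); p_z = Pr(Z = 1).
    Together with Pr(Z = z), these four numbers fix every observable probability,
    because Y = Y0 whenever A = 0.
    """
    f_y = {0: 1 - p_y0, 1: p_y0}
    f_z = {0: 1 - p_z, 1: p_z}
    return {(y, z): pr_a0[(y, z)] * f_y[y] * f_z[z] for y in (0, 1) for z in (0, 1)}


def psi(p_y0, p_z, pr_a0):
    """psi = E(Y0 | A = 1) = Pr(Y0 = 1, A = 1) / Pr(A = 1)."""
    joint = observed_a0(p_y0, p_z, pr_a0)
    pr_a1 = 1 - sum(joint.values())
    pr_y0_1_a1 = p_y0 - joint[(1, 0)] - joint[(1, 1)]
    return pr_y0_1_a1 / pr_a1


# Candidate 1: Pr(Y0 = 1) = 0.3 and an arbitrary saturated mechanism.
p_z = 0.5
p1 = 0.3
pr_a0_1 = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.3}

# Candidate 2: Pr(Y0 = 1) = 0.35, with Pr(A = 0 | Y0, Z) rescaled so that
# Pr(A = 0 | Y0, Z) * f(Y0) is the same as under candidate 1.
p2 = 0.35
f1 = {0: 1 - p1, 1: p1}
f2 = {0: 1 - p2, 1: p2}
pr_a0_2 = {(y, z): v * f1[y] / f2[y] for (y, z), v in pr_a0_1.items()}
assert all(0.0 <= v <= 1.0 for v in pr_a0_2.values())  # still valid probabilities

print(psi(p1, p_z, pr_a0_1))  # 0.18 / 0.53, about 0.340
print(psi(p2, p_z, pr_a0_2))  # 0.23 / 0.53, about 0.434
```

Because the two candidates cannot be distinguished from data, any estimator of ψ in the saturated model is at best consistent for a set of values, not a single one.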
Example 1 shows that the joint density f(A, Y0, Z) is not identified under (IV.1)–(IV.3) when the treatment selection mechanism is left unrestricted. However, we show next that f(A, Y0, Z) is identified under a treatment mechanism that is separable on the additive scale.
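Example 2. Consider a model in which Pr(A = 0|Y0, Z) is the sum of a function of Z and a function of Y0, that is, a treatment mechanism that is separable on the additive scale. The model is separable since it excludes an interaction between Y0 and Z. It can be shown that both the joint distribution and ψ are identified under assumptions (IV.1)–(IV.3).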
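For binary Y0 and Z, any additive separable mechanism can be written as Pr(A = 0|Y0, Z) = a + bZ + cY0; the labels a, b and c are ours, introduced only for this illustration. The sketch below shows one way to see identification: with g(y, z) = Pr(A = 0, Y0 = y|Z = z), which is observable, the Z-contrast g(0, 1) − g(0, 0) equals b(1 − p) and g(1, 1) − g(1, 0) equals bp, where p = Pr(Y0 = 1). Their ratio recovers the odds p/(1 − p) whenever b ≠ 0, which is exactly where IV relevance (IV.3) enters.

```python
def observed_a0(p_y0, a, b, c):
    """g(y, z) = Pr(A = 0, Y0 = y | Z = z) when Pr(A = 0 | Y0, Z) = a + b*Z + c*Y0.

    Because Y = Y0 whenever A = 0 and Y0 is independent of Z, g is observable.
    """
    f_y = {0: 1 - p_y0, 1: p_y0}
    return {(y, z): (a + b * z + c * y) * f_y[y] for y in (0, 1) for z in (0, 1)}


def identify(g, p_z):
    """Recover Pr(Y0 = 1), (a, b, c) and psi = E(Y0 | A = 1) from g and Pr(Z = 1)."""
    b_1mp = g[(0, 1)] - g[(0, 0)]  # b * (1 - p)
    b_p = g[(1, 1)] - g[(1, 0)]    # b * p
    if abs(b_1mp) < 1e-15:
        raise ValueError("b = 0: Z does not shift Pr(A = 0 | Y0, Z); (IV.3) fails")
    odds = b_p / b_1mp              # p / (1 - p)
    p = odds / (1 + odds)
    b = b_1mp / (1 - p)
    a = g[(0, 0)] / (1 - p)
    c = g[(1, 0)] / p - a
    # psi = Pr(Y0 = 1, A = 1) / Pr(A = 1), both computable once p is known.
    f_z = {0: 1 - p_z, 1: p_z}
    pr_a0 = sum(g[(y, z)] * f_z[z] for y in (0, 1) for z in (0, 1))
    pr_y0_1_a0 = sum(g[(1, z)] * f_z[z] for z in (0, 1))
    psi = (p - pr_y0_1_a0) / (1 - pr_a0)
    return p, a, b, c, psi


g = observed_a0(p_y0=0.3, a=0.5, b=0.2, c=-0.1)
print(identify(g, p_z=0.4))  # approximately (0.3, 0.5, 0.2, -0.1, 0.3467)
```

The five quantities recovered here match the five unknowns counted for Example 2 (Pr(Z = 1) is read off directly).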
Example 2 agrees with the intuition that identification follows from having fewer parameters than the saturated model. Under the assumed model, we have five unknown parameters and five available degrees of freedom from the empirical distribution. We show in the next example that the joint distribution and ψ can be identified in a general separable model when the outcome and instrument are both continuous.
Example 3. Consider the logistic separable treatment mechanism $\operatorname{logit}\Pr(A = 1 \mid Y_0, Z) = q(Z) + h(Y_0)$, where q and h are unknown differentiable functions with h(0) = 0. It can be shown that the resulting collection of treatment mechanisms satisfies condition 1, and thus the joint distribution is identified under (IV.1)–(IV.3).
These results can be generalized to include covariates C. For instance, by allowing both q and h to depend on C in Example 3,
$$\operatorname{logit}\Pr(A = 1 \mid Y_0, Z, C) = q(Z, C) + h(Y_0, C),$$
where h(0, C) = 0, the joint distribution is identified whenever the interaction term of Y0 and Z is absent.
In the Supplementary Materials, we present proofs for the above examples, and additional examples, such as the case of continuous outcome with binary IV, Probit link and a separable treatment mechanism.
4. Estimation
While Section 3 provides nonparametric identification conditions, such conditions will seldom suffice for reliable statistical inference. In observational studies, the set of covariates C is typically too large for nonparametric inference because of the curse of dimensionality (Robins and Ritov, 1997). To make progress, we posit parametric models for various nuisance parameters and provide three approaches to semiparametric inference, each relying on a different subset of these models. We describe an IPW, an OR and a DR estimator of the marginal ETT under assumptions (IV.1)–(IV.2) and condition 1. Throughout, we posit a parametric model fZ|C(z|c) = Pr(Z = z|C = c; ρ) for the conditional density of Z given C. Let $\hat\rho$ denote the maximum likelihood estimator (MLE) of ρ. Let $\mathbb{P}_n$ denote the empirical measure, that is, $\mathbb{P}_n\{f(O)\} = n^{-1}\sum_{i=1}^{n} f(O_i)$. Let Ê denote the expectation taken under the empirical distribution of C, and let $\hat{\Pr}(A = 1) = \mathbb{P}_n(A)$ denote the empirical probability of receiving treatment.
4.1. IPW estimator
For estimation, we first propose an IPW IV approach that extends standard IPW estimation of the ETT to the IV setting. We make the positivity assumption that, for all values of Y0, Z and C, the probability of being unexposed to treatment is bounded away from 0. The IPW approach relies on the crucial assumption that the extended propensity score model π(Y0, Z, C; γ) is correctly specified, with unknown finite-dimensional parameter γ, and on the following representation of ψ:
$$\psi = \frac{E[(1 - A)\, Y\, \pi(Y, Z, C)/\{1 - \pi(Y, Z, C)\}]}{\Pr(A = 1)} \tag{2}$$
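A derivation of the above equation is given in the Supplementary Materials; briefly, E(AY0) = E{π(Y0, Z, C)Y0} = E[(1 − A)Y0 π(Y0, Z, C)/{1 − π(Y0, Z, C)}], and Y = Y0 whenever A = 0. We solve the following equations to obtain an estimator of γ: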
| (3) |
| (4) |
| (5) |
| (6) |
where (h1, h2, t, l) satisfies the regularity condition (A.1) described in the Supplementary Materials. Equations (4) and (5) identify the association between (Z, C) and A in π(0, Z, C). If there is no selection bias, equations (3)–(5) suffice to estimate the propensity score. By using the IV properties (IV.1)–(IV.2), equation (6) identifies the degree of selection bias encoded in the dependence of π on Y0. Both equations (4) and (6) require the conditional density of the IV, fZ|C(z|c; ρ), to be correctly modeled. By equation (2), an estimator of the extended propensity score leads to an estimator of ψ. We have the following result:
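Proposition 2. Under (IV.1)–(IV.2) and condition 1, suppose the extended propensity score model π(Y0, Z, C; γ) and fZ|C(z|c; ρ) are correctly specified. Then the IPW estimator
$$\hat\psi_{\mathrm{ipw}} = \frac{\sum_{i=1}^{n} (1 - A_i)\, Y_i\, \pi(Y_i, Z_i, C_i; \hat\gamma)/\{1 - \pi(Y_i, Z_i, C_i; \hat\gamma)\}}{\sum_{i=1}^{n} A_i}$$
is consistent for ψ.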
We emphasize that the extended propensity score model can use any well-defined link function (e.g., logit or probit), and Proposition 2 still holds as long as condition 1 holds. The functions h1, h2, t and l can be chosen based on the model for the extended propensity score. For example, suppose logit π(Y0, Z, C; γ) = θ0 + θ1Z + θ2C + ηY0, where θ = (θ0, θ1, θ2^T)^T is a k-dimensional parameter vector. The k-dimensional function (h1, h2, t)T can be chosen as and l can be chosen as any scalar function of (Z, C), e.g., l(Z, C) = Z. Thus we have exactly k + 1 estimating equations for the k + 1 parameters (θ, η). The choice of h1, h2, t and l will generally affect efficiency but not consistency, as long as the identification conditions hold and the required models are correctly specified. The choice of h1, h2, t and l that leads to the most efficient IPW estimator can be derived using results in Newey and McFadden (1994). Due to space constraints, we illustrate in detail only the choice of the analogous functions for the efficient DR estimator in Section 4.3; a similar derivation could be made here.
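Representation (2) can be sanity-checked by simulation before any parameter is estimated. The Python sketch below uses a hypothetical covariate-free design (the coefficients are ours, not taken from the paper) in which treatment depends on Y0, and plugs the true extended propensity score into the sample analogue of (2). In practice π would be replaced by π(·; γ̂) from (3)–(6). Only untreated units enter the numerator, and for them the observed Y equals Y0.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def true_pi(y0, z):
    """Hypothetical extended propensity score Pr(A = 1 | Y0, Z); it depends on Y0."""
    return expit(-0.5 + 1.5 * z - 0.8 * y0)


def simulate(n, seed=1):
    """Hypothetical design: Z ~ Bern(0.5), Y0 ~ Bern(0.4) independent of Z,
    and A drawn from true_pi(Y0, Z). Returns (A, Y0, Z) triples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        y0 = int(rng.random() < 0.4)
        a = int(rng.random() < true_pi(y0, z))
        rows.append((a, y0, z))
    return rows


def ipw_psi(rows, pi):
    """Sample analogue of (2): sum over untreated of Y*pi/(1-pi), divided by sum of A.
    Only untreated rows are used, and for them the observed outcome equals Y0."""
    num = sum(y0 * pi(y0, z) / (1 - pi(y0, z)) for a, y0, z in rows if a == 0)
    return num / sum(a for a, _, _ in rows)


rows = simulate(100_000)
# Infeasible oracle (uses Y0 of treated units) and the naive untreated mean E(Y | A = 0).
oracle = sum(y0 for a, y0, _ in rows if a == 1) / sum(a for a, _, _ in rows)
naive = sum(y0 for a, y0, _ in rows if a == 0) / sum(1 - a for a, _, _ in rows)
print(ipw_psi(rows, true_pi), oracle, naive)
# IPW with the true pi tracks the oracle; the naive untreated mean does not.
```

The naive untreated mean is biased here precisely because Pr(A = 1|Y0, Z) depends on Y0, which is the situation described by (1).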
4.2. OR and DR estimators
Since Y0 is never observed for the treated group, we use the following equation to decompose E[Y0|A = 1, Z, C] into two parts: one can be estimated directly using restricted MLE and the other can be computed by solving an estimating equation. Specifically, we have
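$$E\{g(Y_0, C) \mid A = 1, Z, C\} = \frac{E[g(Y, C)\exp\{\alpha(Y, Z, C)\} \mid A = 0, Z, C]}{E[\exp\{\alpha(Y, Z, C)\} \mid A = 0, Z, C]} \tag{7}$$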
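where g is any function of Y0 and C, and α(Y0, Z, C) is the generalized (log) odds ratio function relating A and Y0 conditional on Z and C, with Y0 = 0 as the reference value:
$$\alpha(Y_0, Z, C) = \log\frac{\Pr(A = 1 \mid Y_0, Z, C)\,\Pr(A = 0 \mid Y_0 = 0, Z, C)}{\Pr(A = 0 \mid Y_0, Z, C)\,\Pr(A = 1 \mid Y_0 = 0, Z, C)}.$$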
Since the association between Y0 and A given (Z, C) is attributed to unmeasured confounding, α(Y0, Z, C) can be interpreted as the selection bias function. Thus, we express the conditional mean function E{g(Y0, C)|A = 1, Z, C} in terms of f(Y|A = 0, Z, C) and α(Y0, Z, C). We prove equation (7) in the Supplementary Materials.
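Identity (7) is what allows the OR approach to work with the untreated outcome law alone. The Python sketch below checks it numerically at a single (Z, C) cell, using a hypothetical four-point support for Y0, a hypothetical density f(Y0|z, c), and a hypothetical treatment mechanism whose log odds ratio α is nonlinear in Y0. The treated-arm mean computed directly by Bayes' rule agrees with the exp{α}-tilted untreated mean on the right-hand side of (7).

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


# One (Z, C) cell with hypothetical inputs: Y0 on {0, 1, 2, 3}, its density
# f(Y0 | z, c), and a treatment mechanism whose log odds ratio is nonlinear in Y0.
support = [0, 1, 2, 3]
f_y0 = [0.1, 0.4, 0.3, 0.2]


def pi(y):
    return expit(0.3 - 0.7 * y + 0.2 * y * y)


def g(y):
    return y  # g(Y0, C) = Y0 targets E(Y0 | A = 1, z, c)


# Left-hand side of (7): Bayes' rule with the (unobservable) treated-arm law.
w1 = [f * pi(y) for y, f in zip(support, f_y0)]
lhs = sum(g(y) * w for y, w in zip(support, w1)) / sum(w1)

# Right-hand side of (7): only the untreated law f(Y | A = 0, z, c) and alpha are used.
w0 = [f * (1 - pi(y)) for y, f in zip(support, f_y0)]
f_a0 = [w / sum(w0) for w in w0]
alpha = [logit(pi(y)) - logit(pi(0)) for y in support]  # reference level Y0 = 0
rhs = (sum(g(y) * math.exp(a) * f for y, a, f in zip(support, alpha, f_a0))
       / sum(math.exp(a) * f for a, f in zip(alpha, f_a0)))

print(lhs, rhs)  # equal up to floating-point rounding
```

The baseline odds of treatment cancel between numerator and denominator, which is why only α, and not the full propensity score, is needed on the right-hand side.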
Let f(Y|A = 0, Z, C; ξ) denote a model for the density of the outcome among the unexposed conditional on Z and C, and let $\hat\xi$ denote the restricted MLE of ξ obtained using only data on the unexposed. Let η denote the parameter indexing a parametric model α(Y0, Z, C; η) for the selection bias function. We obtain an estimator of η by solving:
| (8) |
for any choice of functions ω and g such that the regularity condition (A.2) stated in the Supplementary Materials holds. Intuitively, the left-hand side of equation (8) is an empirical estimator of the expected conditional covariance between ω(Z, C) and g(Y0, C) given C, which should be zero by (IV.1)–(IV.2) because Y0 ⫫ Z|C. Equation (8) requires the conditional density of the IV, fZ|C(z|c; ρ), to be correctly modeled. Based on equation (7), we can construct an estimator of ψ from $\hat\eta$ and $\hat\xi$.
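Proposition 3. Under (IV.1)–(IV.2) and condition 1, suppose α(Y0, Z, C; η), fZ|C(z|c; ρ) and f(Y|A = 0, Z, C; ξ) are correctly specified. Then the OR estimator
$$\hat\psi_{\mathrm{or}} = \frac{\sum_{i=1}^{n} A_i\, \hat E(Y_0 \mid A = 1, Z_i, C_i)}{\sum_{i=1}^{n} A_i},$$
where $\hat E(Y_0 \mid A = 1, Z, C)$ denotes the right-hand side of (7) with g(Y0, C) = Y0, evaluated at $\hat\eta$ and $\hat\xi$, is consistent for ψ.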
Functions g and ω in equation (8) can be chosen based on the model we posit for α(Y0, Z, C). For example, assuming
| (9) |
g can be chosen as g(Y0, C) = ∂α(Y0, Z, C; η)/∂η = Y0 and ω can be chosen as any scalar function of (Z, C), e.g., ω(Z, C) = Z. The choice of g and ω may impact efficiency but does not affect consistency as long as the identification conditions hold and the required models are correctly specified. The choice of g and ω that leads to the most efficient OR estimator can be derived using Newey and McFadden (1994).
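With a binary outcome and the selection bias model α(Y0, Z, C; η) = ηY0 used in this example, the right-hand side of (7) with g = Y0 has the closed form μ0e^η/(μ0e^η + 1 − μ0) = expit{logit(μ0) + η}, where μ0 = Pr(Y = 1|A = 0, Z, C). The Python sketch below turns this into a plug-in OR estimate of ψ. It assumes that a fitted map μ̂0 (from the restricted MLE of ξ) and η̂ (from (8)) are already available; the function names are hypothetical and the fitting steps are not shown.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def mean_y0_treated(mu0, eta):
    """Right-hand side of (7) with g = Y0 and alpha = eta * Y0, for binary Y.

    mu0 = Pr(Y = 1 | A = 0, Z, C), with 0 < mu0 < 1. Tilting the Bernoulli(mu0)
    law by exp(eta * Y) gives mu0 e^eta / (mu0 e^eta + 1 - mu0) = expit(logit(mu0) + eta).
    """
    return expit(logit(mu0) + eta)


def or_psi(rows, mu0_hat, eta_hat):
    """Plug-in OR estimate: average of E(Y0 | A = 1, Z_i, C_i) over treated units.

    rows: (a, z, c) records; mu0_hat: hypothetical fitted map (z, c) -> Pr(Y = 1 | A = 0, z, c).
    """
    treated = [(z, c) for a, z, c in rows if a == 1]
    return sum(mean_y0_treated(mu0_hat(z, c), eta_hat) for z, c in treated) / len(treated)


rows = [(1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(or_psi(rows, lambda z, c: 0.2 + 0.3 * z, 0.0))   # eta = 0: no selection bias
print(or_psi(rows, lambda z, c: 0.2 + 0.3 * z, -0.6))  # eta < 0 lowers the treated-arm risk
```

With η = 0 the estimate reduces to the average untreated risk at the treated units' (Z, C), i.e., no selection on Y0.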
Tan (2010) proposed an OR estimator of the conditional ETT that requires correctly specified models for both the treatment propensity score and the outcome regression function. In contrast, our OR estimator does not depend on a model for the baseline propensity score: it requires only the selection bias function α, the IV density and the outcome regression among the unexposed.
The proposed estimator of the nuisance parameter η is closely related to the regression estimator of Vansteelandt and Goetghebeur (2003) when Y is binary. Vansteelandt and Goetghebeur (2003) developed a two-stage logistic estimator that combines a logistic structural mean model (SMM) at the first stage with a logistic regression association model at the second stage. Specifically, they focused on estimating ζ(Z, C) = logit Pr(Y1 = 1|A = 1, Z, C) − logit Pr(Y0 = 1|A = 1, Z, C), which encodes the conditional ETT given Z and C. Let ν denote the parameter indexing a model ζ(Z, C; ν) for ζ(Z, C). They proposed to estimate ν from the estimating equation
| (10) |
where expit(x) = exp(x)/{1 + exp(x)} and ϑ(Z, C; ϱ) = logit Pr(Y = 1|A = 1, Z, C; ϱ).
Recall that we obtain an estimator of η indexing α(Y0, Z, C; η) in the equation (8), which can be re-expressed as
| (11) |
where δ(Z, C) = logit Pr(Y0 = 1|A = 0, Z, C). Equations (10) and (11) differ mainly in how Pr(Y0 = 1|A = 1, Z, C) is estimated. More specifically, (10) obtains Pr(Y0 = 1|A = 1, Z, C) using Pr(Y1 = 1|A = 1, Z, C) as the baseline risk, whereas (11) uses Pr(Y0 = 1|A = 0, Z, C) as the baseline risk. This difference is important: Vansteelandt and Goetghebeur (2003) did not obtain a DR estimator of ζ(Z, C), whereas, as we show next, our parameterization yields a DR estimator of the marginal ETT.
So far, we have constructed estimators using two different approaches. Both assume correct models for α(Y0, Z, C; η) and fZ|C(z|c; ρ). The IPW approach further relies on a consistent estimator of the baseline extended propensity score β(Z, C) = logit Pr(A = 1|Y0 = 0, Z, C), which, under the logit link and together with α(Y0, Z, C; η), provides a consistent estimator of the extended propensity score π(Y0, Z, C; γ) = expit{α(Y0, Z, C; η) + β(Z, C; θ)}. The OR approach further relies on a consistent estimator of f(Y|A = 0, Z, C), which, together with α(Y0, Z, C; η), provides a consistent estimator of E(Y0|A = 1, Z, C) by (7). Define $\mathcal{M}_{\pi}$ as the collection of laws with parametric models fZ|C(z|c; ρ), α(Y0, Z, C; η) and β(Z, C; θ), with f(Y|A = 0, Z, C) unrestricted. Likewise, define $\mathcal{M}_{\mu}$ as the collection of laws with parametric models fZ|C(z|c; ρ), α(Y0, Z, C; η) and f(Y|A = 0, Z, C; ξ), with β(Z, C) unrestricted. The main appeal of a doubly robust estimator is that it remains consistent if either β(Z, C; θ) or f(Y|A = 0, Z, C; ξ) is correctly specified. To derive a DR estimator of ψ in the union model $\mathcal{M}_{\pi} \cup \mathcal{M}_{\mu}$, we first propose a DR estimator of the parameter η of the selection bias model α(Y0, Z, C; η). For notational convenience, let
| (12) |
Consider the estimating equation for the selection bias parameter
| (13) |
where
Equation (13) is key to obtaining DR estimators of the selection bias function and thus of the ETT. Intuitively, the left-hand side of equation (13) is also an empirical estimator of the expected conditional covariance between ω(Z, C) and g(Y0, C) given C, which should be zero by (IV.1)–(IV.2). Apart from the model fZ|C(z|c; ρ) for the IV, equation (8) involves only the outcome regression model, whereas equation (13) involves both the outcome regression and the propensity score models. Hence, the estimator of η obtained from (8) depends on correct specification of the outcome regression, whereas, as we show in the following proposition, the estimator of η obtained from (13) is doubly robust. We solve equation (13) jointly with equations (3)–(5) with replaced by . The choice of h1, h2, g and ω can be made as in Sections 4.1 and 4.2.
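Proposition 4. Under (IV.1)–(IV.2) and condition 1, the DR estimator $\hat\eta_{\mathrm{dr}}$ of η obtained by solving (13), and the corresponding DR estimator $\hat\psi_{\mathrm{dr}}$ of ψ, are consistent in the union model, that is, whenever fZ|C(z|c; ρ) and α(Y0, Z, C; η) are correctly specified and at least one of β(Z, C; θ) and f(Y|A = 0, Z, C; ξ) is correctly specified.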
Proposition 4 implies that the resulting estimators of η and ψ are both DR, since their consistency requires only one of the extended propensity score model and the outcome regression model to be correctly specified, not necessarily both.
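The following Python sketch illustrates why combining the two working models can be doubly robust. It uses the binary design of Step 1 in Section 5, in which α(Y0, Z, C) = −0.6Y0 and β(Z, C) = 0.4 + 2Z + 0.8C1 − 1.6C1Z, and evaluates population moments exactly by enumerating the 16 cells of (C1, C2, Z, Y0). The augmented form E[A m(Z, C) + (1 − A) exp{α(Y, Z, C) + β(Z, C)}{Y − m(Z, C)}]/Pr(A = 1), with m(Z, C) = E(Y0|A = 1, Z, C), is one DR representation of ψ that can be verified directly; it is our illustration and is not necessarily identical to the authors' estimator, which is built from (12)–(13). Provided α is correct, it returns ψ0 when either β or m is misspecified.

```python
import itertools
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def bern(p, x):
    """Pr(X = x) for X ~ Bernoulli(p)."""
    return p if x == 1 else 1.0 - p


ETA = -0.6  # alpha(Y0, Z, C) = ETA * Y0 in the Step 1 design


def beta_true(z, c1):
    """Baseline log odds beta(Z, C) = logit Pr(A = 1 | Y0 = 0, Z, C) in Step 1."""
    return 0.4 + 2 * z + 0.8 * c1 - 1.6 * c1 * z


def pi(y0, z, c1):
    return expit(beta_true(z, c1) + ETA * y0)


def cells():
    """Yield (Pr(C1, C2, Z, Y0), c1, c2, z, y0) over all 16 cells of the Step 1 design."""
    for c1, c2, z, y0 in itertools.product((0, 1), repeat=4):
        w = (bern(0.4, c1) * bern(0.6, c2)
             * bern(expit(0.2 + 0.4 * c1 - 0.5 * c2), z)
             * bern(expit(0.6 + 0.8 * c1 - 2 * c2), y0))
        yield w, c1, c2, z, y0


def m_true(z, c1, c2):
    """E(Y0 | A = 1, Z, C): the correctly specified outcome-side nuisance."""
    p1 = expit(0.6 + 0.8 * c1 - 2 * c2)
    num = p1 * pi(1, z, c1)
    return num / (num + (1 - p1) * pi(0, z, c1))


def dr_value(beta, m):
    """Population value of E[A m + (1 - A) exp(alpha + beta)(Y - m)] / Pr(A = 1)."""
    num = den = 0.0
    for w, c1, c2, z, y0 in cells():
        p = pi(y0, z, c1)                        # true Pr(A = 1 | Y0, Z, C)
        mz = m(z, c1, c2)
        odds = math.exp(ETA * y0 + beta(z, c1))  # uses the working beta
        num += w * (p * mz + (1 - p) * odds * (y0 - mz))  # Y = Y0 when A = 0
        den += w * p
    return num / den


psi0 = (sum(w * pi(y0, z, c1) * y0 for w, c1, c2, z, y0 in cells())
        / sum(w * pi(y0, z, c1) for w, c1, c2, z, y0 in cells()))


def beta_wrong(z, c1):
    return 0.4 + 2 * z  # drops the C1 terms


def m_wrong(z, c1, c2):
    return 0.5  # ignores (Z, C) entirely


print(psi0, dr_value(beta_wrong, m_true), dr_value(beta_true, m_wrong))
print(dr_value(beta_wrong, m_wrong))  # both nuisances wrong: no guarantee of recovering psi0
```

The two cancellations are visible in the code: with the correct β the augmentation term equals π(Y0 − m), and with the correct m it averages to zero within each (Z, C) cell by (7).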
4.3. Local efficiency
The large-sample variance of the DR estimators of η and ψ at the intersection submodel, where all models are correctly specified, depends on the choice of g(Y, C) and ω(Z, C) in equation (13). In the Supplementary Materials, we derive the semiparametric efficient score of (η, ψ) in a model that assumes only that Z is a valid IV and that the selection bias function α(Y0, Z, C; η) is correctly specified. As discussed there, the efficient score is generally not available in closed form, except in special cases such as when Z and Y are both polytomous. Here, we illustrate the result by constructing a locally efficient estimator of (η, ψ) when Z and Y are both binary. In this vein, similar to the definition of , define
where υ is any function of (Y0, Z, C).
A one-step locally efficient estimator of η in is given by
where and
is the efficient score of η evaluated at the estimated intersection submodel . Further, let denote a DR estimator for ψ evaluated at the estimated intersection submodel with substituted in for . Then the efficient estimator of ψ is given by
5. Simulations
Simulations for both binary and continuous outcomes were conducted to evaluate the finite-sample performance of the causal effect estimators derived in Sections 4.1–4.3. Simulations were conducted under three scenarios: (i) both the outcome regression and the extended propensity score are correctly specified; (ii) only the extended propensity score is correctly specified; and (iii) only the outcome regression model is correctly specified.
Simulations were first carried out for a binary outcome. For scenario (i), the simulation study was conducted in the following steps:
Step 1: A hypothetical study population of size n = 1000 (or n = 5000) was generated and each individual had baseline covariates C1 and C2 generated independently from Bernoulli distributions with probability 0.4 and 0.6 respectively. Then the IV Z was generated from the model: logit Pr(Z = 1|C) = 0.2 + 0.4C1 − 0.5C2 and potential outcomes Y0, Y1 from models logit Pr(Y0 = 1|Z, C) = 0.6 + 0.8C1 − 2C2 and logit Pr(Y1 = 1|Z, C) = 0.7 − 0.3C1. The treatment variable A was generated from logit Pr(A = 1|Y0, Z, C) = 0.4 + 2Z + 0.8C1 − 0.6Y0 − 1.6C1Z, and the observed outcome was Y = Y0(1 − A) + Y1A.
Step 2: The parameters γ = (θ1, θ2, θ3, θ4, η) of the following extended propensity score model
| (14) |
were estimated using estimating equations (3)–(6) with h1(Z, C) = (Z, C1Z)^T, h2(C) = C1, t(Y, C) = Y and l(Z, C) = Z, and the IPW estimator of ψ was evaluated.
Step 3: The selection bias function was correctly specified as in (9); the parameter ξ of the outcome regression model
| (15) |
was estimated by restricted MLE; η was estimated by solving equation (8) with ω(Z, C) = Z and g(Y, C) = Y; and the OR estimator of ψ was evaluated.
Step 4: The selection bias function was correctly specified as in (9), ξ in (15) was estimated by restricted MLE, and the parameters γ in (14) were estimated using (3)–(5) and (13), with h1, h2, t, l, ω and g chosen as in Steps 2 and 3; the DR estimator of ψ was then evaluated.
Step 5: Steps 1–4 were repeated 1000 times.
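The data-generating part of Step 1 can be sketched in Python as follows. The sketch assumes, which the description suggests but does not state, that Y0 and Y1 are drawn independently given C; the latent Y0 and Y1 are retained only so that quantities such as E(Y0|A = 1) can be checked within a simulation. The function name is hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_step1(n, rng):
    """One simulated data set from the binary-outcome design of Step 1.

    Returns observed fields (A, Y, Z, C1, C2) together with the latent Y0 and Y1,
    which are kept only so that true quantities can be checked in a simulation.
    """
    data = []
    for _ in range(n):
        c1 = int(rng.random() < 0.4)
        c2 = int(rng.random() < 0.6)
        z = int(rng.random() < expit(0.2 + 0.4 * c1 - 0.5 * c2))
        y0 = int(rng.random() < expit(0.6 + 0.8 * c1 - 2 * c2))  # does not depend on Z
        y1 = int(rng.random() < expit(0.7 - 0.3 * c1))
        a = int(rng.random() < expit(0.4 + 2 * z + 0.8 * c1 - 0.6 * y0 - 1.6 * c1 * z))
        y = y0 * (1 - a) + y1 * a  # consistency
        data.append({"A": a, "Y": y, "Z": z, "C1": c1, "C2": c2, "Y0": y0, "Y1": y1})
    return data


sample = simulate_step1(1000, random.Random(2024))
treated = [d for d in sample if d["A"] == 1]
print(len(treated), sum(d["Y0"] for d in treated) / len(treated))
```

Repeating this 1000 times with n = 1000 or n = 5000, and applying the estimators of Steps 2–4 to the observed fields only, reproduces the structure of Step 5.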
The data-generating mechanism described in Step 1 satisfies assumptions (IV.1)–(IV.2) for both a = 0, 1. As discussed in Section 3, ψ is identified from the observed data since the treatment mechanism is a logit model that is separable in Y0 and Z (it contains no Y0–Z interaction). Also, in the Supplementary Materials, we verify that model (15) for E(Y|A = 0, C, Z) contains the true data-generating mechanism. Simulations for scenario (ii) were similar to those for scenario (i), except that (14) was replaced with
| (16) |
which is misspecified if θ4 ≠ 0 in equation (14). For scenario (iii), the potential outcome model (15) was replaced with
| (17) |
which is misspecified if ξ3 ≠ 0 and ξ5 = 0 in equation (15). We use the R package BB (Varadhan and Gilbert, 2009) to solve the nonlinear estimating equations. Simulation results for 1000 Monte Carlo samples are reported in Figure 1, and empirical coverage rates are presented in Table 1. Under correct model specification, all estimators have negligible bias, which diminishes with increasing sample size. In agreement with our theoretical results, the IPW and OR estimators are biased, with poor empirical coverage, when the extended propensity score or the outcome model, respectively, is misspecified. The DR estimator performs well in terms of bias and coverage when either model is misspecified but the other is correct. When all models are correctly specified, the relative efficiencies of the locally semiparametric efficient estimator compared with the DR estimators of η and ψ are 0.840 and 0.810, respectively, based on Monte Carlo standard errors at sample size n = 5000. This suggests that a substantial efficiency gain may be possible at the intersection submodel when using the locally efficient score.
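The Monte Carlo summaries above can be computed from the replicate estimates and their standard errors along the following lines. This Python sketch makes two assumptions that the text does not spell out: Wald intervals are taken to be estimate ± z × se with z ≈ 1.96, and relative efficiency is taken to be a ratio of Monte Carlo variances (a ratio of Monte Carlo standard errors would be an alternative reading). The function names are hypothetical.

```python
import statistics


def mc_summary(estimates, std_errors, truth, level=0.95):
    """Monte Carlo bias, empirical SD and Wald-interval coverage for one estimator."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    covered = [abs(est - truth) <= z * se for est, se in zip(estimates, std_errors)]
    return {
        "bias": statistics.fmean(estimates) - truth,
        "mc_sd": statistics.stdev(estimates),
        "coverage": sum(covered) / len(covered),
    }


def relative_efficiency(est_a, est_b):
    """Ratio of Monte Carlo variances var(est_a) / var(est_b); below 1 favours est_a."""
    return statistics.variance(est_a) / statistics.variance(est_b)


print(mc_summary([1.0, 2.0, 3.0], [0.1, 0.1, 1.0], truth=2.0))
print(relative_efficiency([1, 2, 3], [2, 4, 6]))
```

In the simulation study, `estimates` and `std_errors` would hold the 1000 replicate values of one estimator, and `truth` the value ψ0 marked in Figure 1.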
Figure 1:

Performance of the IPW, OR and DR estimators of ψ with binary outcomes.
Note: In each boxplot, the true value ψ0 is marked by the horizontal lines, white boxes are for n = 1000 and grey boxes are for n = 5000.
Table 1:
Empirical coverage rates based on 95% Wald confidence intervals for both binary and continuous outcomes
| Scenario | Estimator | Binary Y, n = 1000 | Binary Y, n = 5000 | Continuous Y, n = 1000 | Continuous Y, n = 5000 |
|---|---|---|---|---|---|
| (i) both π and μ are correct | IPW | 0.86 | 0.90 | 0.96 | 0.95 |
| | OR | 0.84 | 0.92 | 0.97 | 0.95 |
| | DR | 0.85 | 0.91 | 0.97 | 0.96 |
| (ii) only π is correct | IPW | 0.86 | 0.90 | 0.96 | 0.95 |
| | OR | 0.79 | 0.60 | 0.39 | 0.00 |
| | DR | 0.86 | 0.91 | 0.97 | 0.95 |
| (iii) only μ is correct | IPW | 0.78 | 0.53 | 0.39 | 0.00 |
| | OR | 0.84 | 0.92 | 0.97 | 0.95 |
| | DR | 0.85 | 0.92 | 0.96 | 0.96 |
The coverage was evaluated under three scenarios: (i) both the outcome regression and the extended propensity score are correctly specified; (ii) only the extended propensity score is correct; and (iii) only the outcome regression model is correct.
Simulations for a continuous outcome were conducted similarly as for the binary outcome in the following steps.
Step 1*: Covariates C1 and C2 were generated as in Step 1, Z was generated from model logit Pr(Z = 1|C) = 0.7 + 0.8C1 − C2, and Y0, Y1 from models Y0|Z, C ~ N(0.5 + C1 + 3C2, 1) and Y1|Z, C ~ N(1.1 − 1.3C1, 1), A was generated from logit Pr(A = 1|Y0, Z, C) = −0.2 − 3Z −3C1 + 0.3Y0 + 4C1Z, and Y = Y0(1 − A) + Y1A.
Step 2*: Same as Step 2.
Step 3*: Same as Step 3, except that the following outcome regression models were fit to the data.
| (18) |
| (19) |
Step 4*: Same as Step 4 except that (15) was replaced by (18) and (19).
Step 5*: Same as Step 5
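For the continuous-outcome design of Step 1*, the true value ψ0 = E(Y0|A = 1) marked in Figure 2 has no closed form, but it reduces to a sum over (C1, C2, Z) of one-dimensional integrals over Y0. The Python sketch below evaluates those integrals with a midpoint rule on a truncated grid (grid width and step are our choices) and includes a simulator for one Step 1* data set so that the two can be compared. As in the binary case, Y0 and Y1 are assumed to be drawn independently given C, and the function names are hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pr_a1(y0, z, c1):
    """Treatment mechanism of Step 1*: Pr(A = 1 | Y0, Z, C)."""
    return expit(-0.2 - 3 * z - 3 * c1 + 0.3 * y0 + 4 * c1 * z)


def simulate_step1_star(n, rng):
    """One Step 1* data set as (A, Y, Z, C1, C2, Y0) tuples; Y0 is kept for checking only."""
    out = []
    for _ in range(n):
        c1 = int(rng.random() < 0.4)
        c2 = int(rng.random() < 0.6)
        z = int(rng.random() < expit(0.7 + 0.8 * c1 - c2))
        y0 = rng.gauss(0.5 + c1 + 3 * c2, 1.0)
        y1 = rng.gauss(1.1 - 1.3 * c1, 1.0)
        a = int(rng.random() < pr_a1(y0, z, c1))
        out.append((a, y0 * (1 - a) + y1 * a, z, c1, c2, y0))
    return out


def psi0_quadrature(step=1e-3, half_width=8.0):
    """psi0 = E(Y0 | A = 1): exact sum over (C1, C2, Z), midpoint rule over Y0."""
    num = den = 0.0
    k = int(round(2 * half_width / step))
    for c1 in (0, 1):
        for c2 in (0, 1):
            p_c = (0.4 if c1 else 0.6) * (0.6 if c2 else 0.4)
            p_z1 = expit(0.7 + 0.8 * c1 - c2)
            mu = 0.5 + c1 + 3 * c2  # Y0 | Z, C ~ N(mu, 1) does not depend on Z
            for z in (0, 1):
                w = p_c * (p_z1 if z else 1 - p_z1)
                for i in range(k):
                    y = mu - half_width + (i + 0.5) * step
                    dens = math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2 * math.pi) * step
                    p = pr_a1(y, z, c1)
                    num += w * dens * p * y
                    den += w * dens * p
    return num / den


psi0 = psi0_quadrature()
sample = simulate_step1_star(100_000, random.Random(7))
treated_y0 = [row[5] for row in sample if row[0] == 1]
print(psi0, sum(treated_y0) / len(treated_y0))  # close, up to Monte Carlo error
```

The quadrature value gives a fixed reference for bias, which avoids adding Monte Carlo noise to the reported bias of each estimator.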
Simulation for a continuous outcome under scenario (ii) was carried out similarly as that for scenario (i) except that (14) was replaced by (16). For scenario (iii), the potential outcome models (18) and (19) were replaced with the linear models
| (20) |
| (21) |
We use the R package nleqslv (Hasselman, 2014) to solve the nonlinear estimating equations.
We verify in Example A.1 of the Supplementary Materials that ψ is identified from the observed data. The simulation results for 1000 Monte Carlo samples are reported in Figure 2, and empirical coverage rates are presented in Table 1. Results are similar to those for the binary outcome. Under correct model specification, all estimators have negligible bias, which diminishes with increasing sample size. The IPW and OR estimators are biased, with poor empirical coverage, when the corresponding model is misspecified. The DR estimator performs well in terms of bias and coverage when either the extended propensity score or the outcome regression model is correctly specified.
Figure 2:

Performance of the IPW, OR and DR estimators of ψ with continuous outcomes
Note: In each boxplot, the true value ψ0 is marked by the horizontal lines, white boxes are for n = 1000 and grey boxes are for n = 5000.
6. Application
Since the 1980s, tax-deferred programs such as Individual Retirement Accounts (IRAs) and the 401(k) plan have played an important role as a channel for personal savings in the United States. Aiming to encourage saving for retirement, the 401(k) plan offers tax deductions on deposits into retirement accounts and tax-free accrual of interest. The 401(k) plan shares similarities with IRAs in that both are deferred compensation plans for wage earners, but the 401(k) plan is provided only by employers. The study includes 9275 people; once offered the 401(k) plan, individuals decide whether to participate in the program. However, participants usually have a stronger preference for savings, which suggests the presence of selection bias. This was addressed as individual heterogeneity by Abadie (2003), and it has been pointed out that a simple comparison of personal savings between participants and non-participants may yield upward-biased results. It was also postulated that, given income, 401(k) eligibility is unrelated to individual preferences for savings and thus can be used as an instrument for participation in the 401(k) program (Poterba and Venti, 1994; Poterba et al., 1995). The complier causal effect of the 401(k) plan was studied by Abadie (2003). Here, we reanalyze these data to illustrate the proposed estimators of the marginal ETT.
We illustrate the methods with a dichotomous outcome, defined as the indicator that a person's net savings lie above the first quartile of the observed sample (equal to −$500). The treatment variable is a binary indicator of participation in a 401(k) plan, and the IV is a binary indicator of 401(k) eligibility. The covariates are standardized log family income (log10(income) − 4.5), standardized age (age − 41) and its square, marital status and family size. Age ranges from 25 to 64 years, marital status is a binary indicator and family size ranges from 1 to 13 people. These covariates are thought to be associated with unobserved preferences for savings. For a family that actually participated in the 401(k) program, let ψ = E(Y0|A = 1) denote the probability that they would have had net financial assets above the first quartile had they, possibly contrary to fact, been forced not to participate in the program. The ETT = E(Y1 − Y0|A = 1) is then the effect of the 401(k) plan, on the difference scale, on the probability that family net financial assets lie above the first quartile among participants. Equivalently, the ETT can be interpreted as the effect of the intervention in reducing a participant's risk of poor savings performance, as measured by falling below the first quartile of the empirical distribution of savings in the sample. Before implementing our IV estimators, we first obtained a standard IPW estimator of the ETT under an assumption of no unmeasured confounding, i.e., defined as with α = 0. Thus, the propensity score was modeled as:
and estimated by standard maximum likelihood. The IPW estimate of ψ was with standard error (se) 0.014, where the se was evaluated using the sandwich estimator accounting for all sources of variability. In comparison, the empirical estimate of E(Y|A = 1) was 0.883 (se = 0.006). Thus an estimate of ETT was (se = 0.016), which suggests that the 401(k) plan may have a significant effect on increasing family net financial assets among participants.
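The standard analysis just described, which assumes no unmeasured confounding, can be sketched as follows. This is a minimal illustration, assuming a logistic propensity score in measured covariates only and the textbook odds-weighting estimator of ψ = E(Y0|A = 1); the sandwich se comes from stacking the propensity-score score equations with those for E(Y|A = 1) and ψ, so that estimation of the propensity score is accounted for. The design matrix `D`, the helper names (`fit_logistic`, `stacked_scores`, `ipw_ett`) and the simulated data are hypothetical; this is not the authors' code and it does not reproduce the reported numbers.

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(D, a, iters=25):
    """Logistic-regression MLE of P(A = 1 | X) by Newton-Raphson."""
    beta = np.zeros(D.shape[1])
    for _ in range(iters):
        p = expit(D @ beta)
        hess = D.T @ (D * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, D.T @ (a - p))
    return beta


def stacked_scores(theta, D, a, y):
    """Per-subject estimating functions for theta = (beta, mu1, psi)."""
    k = D.shape[1]
    beta, mu1, psi = theta[:k], theta[k], theta[k + 1]
    odds = np.exp(D @ beta)                       # P(A=1|X) / P(A=0|X)
    u_beta = D * (a - expit(D @ beta))[:, None]   # logistic score
    u_mu1 = a * (y - mu1)                         # mu1 = E(Y | A = 1)
    u_psi = (1 - a) * odds * (y - psi)            # psi = E(Y0 | A = 1), no unmeasured confounding
    return np.column_stack([u_beta, u_mu1, u_psi])


def ipw_ett(D, a, y, eps=1e-6):
    """Return (psi, se_psi, ett, se_ett) using a sandwich covariance."""
    k, n = D.shape[1], len(y)
    beta = fit_logistic(D, a)
    odds = np.exp(D @ beta)
    psi = np.sum((1 - a) * odds * y) / np.sum((1 - a) * odds)
    theta = np.concatenate([beta, [y[a == 1].mean(), psi]])
    # Bread: central-difference Jacobian of the averaged estimating functions.
    bread = np.empty((k + 2, k + 2))
    for j in range(k + 2):
        step = np.zeros(k + 2)
        step[j] = eps
        up = stacked_scores(theta + step, D, a, y).mean(axis=0)
        down = stacked_scores(theta - step, D, a, y).mean(axis=0)
        bread[:, j] = (up - down) / (2.0 * eps)
    U = stacked_scores(theta, D, a, y)
    meat = U.T @ U / n
    bread_inv = np.linalg.inv(bread)
    cov = bread_inv @ meat @ bread_inv.T / n
    grad = np.zeros(k + 2)
    grad[k], grad[k + 1] = 1.0, -1.0              # delta method for ETT = mu1 - psi
    ett = theta[k] - theta[k + 1]
    return psi, np.sqrt(cov[k + 1, k + 1]), ett, np.sqrt(grad @ cov @ grad)


# Hypothetical data: one measured confounder and no unmeasured confounding.
rng = np.random.default_rng(7)
n = 50_000
x = rng.normal(size=n)
D = np.column_stack([np.ones(n), x])
a = rng.binomial(1, expit(-0.3 + 0.8 * x))
y0 = rng.binomial(1, expit(0.5 + 1.0 * x))
y1 = rng.binomial(1, expit(1.2 + 1.0 * x))
y = np.where(a == 1, y1, y0)
psi_hat, se_psi, ett_hat, se_ett = ipw_ett(D, a, y)
print(psi_hat, se_psi, ett_hat, se_ett)
```

Stacking the equations, rather than treating the fitted propensity score as known, is what lets the sandwich se reflect all sources of variability.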
However, this result may be spurious: even after controlling for observed covariates, unmeasured factors may still confound the relationship between 401(k) participation and family net financial assets. Under assumptions (IV.1)–(IV.2) and condition 1, we applied the methods proposed in Section 4 to estimate the ETT in the presence of unmeasured confounders. The following parametric models were considered:
We specified the selection bias function as in (9), so that it depends linearly on Y0. Possible deviations from this simple model were explored by allowing for interactions of Y0 with observed covariates in the extended propensity score. Thus, we posited the following parametric model for the extended propensity score, which satisfies identifying condition 1 as a submodel of the separable model:
Table 2 reports point estimates and estimated standard errors for the IV, extended propensity score and outcome regression models. Although the DR estimator also involves an outcome regression model among the unexposed, it is the same model as required for the regression estimator, so these estimates are reported only once. The instrument is strongly associated with family income (log OR = 2.823, se = 0.106), age (log OR = 0.007, se = 0.002) and age squared (log OR = −0.002, se = 2e−4). The selection bias parameter was estimated to be 0.320 (se = 0.115) by IPW, 0.385 (se = 0.135) by OR and 0.280 (se = 0.101) by DR estimation. This provides strong evidence that unmeasured confounding may be present: the stronger one's preference for saving, the more likely one is to participate in the 401(k) plan. All three estimators of the marginal ETT also agree with each other: they are significant, but with a smaller Z-score than when selection bias is ignored (for example, the IPW estimator gives ETT = 0.134, se = 0.013). The efficient estimators of the selection bias parameter and of the ETT are 0.273 and 0.137, respectively, both in agreement with the other three estimators. Thus we may conclude that, even after adjustment for unobserved preferences for savings, the 401(k) plan can still increase net financial assets among participants.
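Why ignoring selection bias matters can be seen in a small simulation. This is a sketch under stated assumptions: a logistic extended propensity score logit π(Z, X, Y0) = γ0 + γ1 Z + γ2 X + α Y0 is our hypothetical stand-in for the paper's model, not a reconstruction of it, and π is treated as known. Because E(1 − A | Z, X, Y0) = 1 − π and Y = Y0 when A = 0, we have E{(1 − A) π(Z, X, Y)/(1 − π(Z, X, Y)) Y} = E(π Y0) = E(A Y0), so dividing by P(A = 1) recovers ψ = E(Y0|A = 1). Weights built from P(A = 1|Z, X) alone, as in the α = 0 analysis, do not. Estimating π itself requires the IV-based methods of Section 4, which this sketch does not implement.

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


rng = np.random.default_rng(11)
n = 400_000
alpha = 1.0                                        # hypothetical selection bias parameter
x = rng.binomial(1, 0.5, n)                        # binary measured covariate
z = rng.binomial(1, expit(-0.2 + 0.5 * x))         # instrument depends on X only
y0 = rng.binomial(1, expit(-0.4 + 0.6 * x))        # Y0 independent of Z given X
lin = -1.0 + 1.5 * z + 0.3 * x                     # part of logit pi(Z, X, Y0) free of Y0
a = rng.binomial(1, expit(lin + alpha * y0))       # treatment depends on the unobserved Y0
y1 = rng.binomial(1, expit(0.3 + 0.6 * x))
y = np.where(a == 1, y1, y0)                       # consistency: Y = Y0 when A = 0

truth = y0[a == 1].mean()                          # sample version of psi = E(Y0 | A = 1)

# Weights from the extended propensity score, evaluated at the observed Y among the
# untreated (where Y = Y0); treated subjects get weight zero via the (1 - A) factor.
odds_ext = np.exp(lin + alpha * y)
psi_ext = np.sum((1 - a) * odds_ext * y) / np.sum(a)

# Naive weights from P(A = 1 | Z, X), estimated by cell means, which ignore Y0 (alpha = 0).
psi_naive = 0.0
for zc in (0, 1):
    for xc in (0, 1):
        cell = (z == zc) & (x == xc)
        p = a[cell].mean()
        psi_naive += np.sum((1 - a[cell]) * y[cell]) * p / (1 - p)
psi_naive /= np.sum(a)

print(round(truth, 3), round(psi_ext, 3), round(psi_naive, 3))
```

With α = 1 the naive estimate falls well below the truth, because within each (Z, X) cell treated subjects have Y0 = 1 more often than untreated ones; an estimate of ψ that is too low in turn inflates the ETT.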
Table 2:
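Point estimates and estimated se [in brackets] of the IPW, OR and DR estimators of the ETT of the 401(k) plan, together with the parameters of the IV, extended propensity score and outcome regression models required by those estimators (linc: standardized log family income; age2: squared standardized age; fsize: family size; marr: marital status)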
| | IV model | IPW propensity | Outcome regression | DR propensity |
|---|---|---|---|---|
| Intercept | −0.180 [0.058] | −8.685 [1.832] | 1.307 [0.073] | −8.629 [1.796] |
| linc | 2.695 [0.107] | 1.626 [0.210] | 0.618 [0.128] | 1.633 [0.209] |
| age | 0.007 [0.002] | −0.009 [0.005] | 0.035 [0.003] | −0.009 [0.005] |
| fsize | −0.037 [0.019] | −0.004 [0.033] | −0.127 [0.022] | −0.005 [0.033] |
| marr | −0.145 [0.063] | −0.032 [0.108] | −0.133 [0.075] | −0.031 [0.108] |
| age2 | 0.002 [2e-04] | 0.001 [4e-04] | 6e-04 [3e-04] | 0.001 [4e-04] |
| Z | | 9.150 [1.820] | −0.210 [0.074] | 9.126 [1.781] |
| α | | 0.320 [0.115] | 0.385 [0.135] | 0.280 [0.101] |
| ψ = E(Y0\|A = 1) | | 0.749 [0.012] | 0.746 [0.012] | 0.750 [0.012] |
| ETT | | 0.134 [0.013] | 0.137 [0.014] | 0.132 [0.014] |
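The evidence for selection bias rests on the ratio of each estimate of α to its standard error. A sketch recomputing Wald statistics, two-sided p-values and 95% intervals from the reported (estimate, se) pairs for α, assuming approximate normality of the estimators; the normal reference distribution and the helper name `wald` are our assumptions.

```python
import math


def wald(est, se, crit=1.959964):
    """Wald z-statistic, two-sided normal p-value and 95% confidence interval."""
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p, (est - crit * se, est + crit * se)


# Selection bias parameter alpha: (estimate, se) as reported for IPW, OR and DR.
alpha_hat = {"IPW": (0.320, 0.115), "OR": (0.385, 0.135), "DR": (0.280, 0.101)}
results = {name: wald(est, se) for name, (est, se) in alpha_hat.items()}
for name, (z, p, (lo, hi)) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

All three z-statistics are close to 2.8, and each interval excludes zero.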
These findings roughly agree with those of Abadie (2003), in the sense that the IV estimate corrects the observational estimate towards the null. However, it may be difficult to compare our findings directly with those of Abadie, who reported the complier average treatment effect under a monotonicity assumption on the IV-exposure relationship and assuming no unmeasured confounding of this first-stage relation. Our approaches rely on neither assumption; instead, they rely for identification on condition 1, encoded in the functional form of the extended propensity score model. To assess the robustness of the selection bias model, we explored additional functional forms, adding to the extended propensity score an interaction between Y0 and each of the covariates log income, marital status and family size. There was no evidence in favor of any such interaction.
7. Discussion
In this paper, we establish that access to an IV allows identification of the association between exposure to the treatment and the potential outcome when unexposed, which directly encodes the magnitude of selection bias into treatment due to confounding. We propose IPW, OR and DR estimators of the treatment effect among treated individuals. Vansteelandt and Goetghebeur (2003) and Robins (1994) proposed identification and inference approaches under the assumption of no current treatment value interaction, so their estimators remain consistent under the null hypothesis of no ETT. In contrast, the identification and inference approaches we propose may be particularly valuable when an intention-to-treat (ITT) analysis indicates a non-null treatment effect, in which case Robins' identifying assumption of no current treatment value interaction may be violated.
The proposed methods assume a binary treatment. They can be generalized without much effort to categorical treatments. However, when the treatment is continuous (for example, when A is a treatment dose), a parametric model for the treatment effect as well as a model for the density of A may be unavoidable for estimation. We leave this as a topic for future research.
Acknowledgements
The content is solely the responsibility of the authors. Professor Eric Tchetgen Tchetgen is supported by R01 AI032475, R21 AI113251, R01 ES020337, R01 AI104459. Wang Miao is supported by China Scholarship Council.
Footnotes
In another line of research, Imbens and Angrist (1994) and Angrist et al. (1996) defined the treatment effect among individuals who would comply with their assigned treatment. Under a monotonicity assumption about the effect of the IV on exposure, the complier average treatment effect can be identified. Further research along these lines includes fully parametric estimation strategies (Tan, 2006; Barnard et al., 2003; Frangakis et al., 2004) as well as semiparametric methods (Abadie, 2003; Abadie et al., 2002; Tan, 2006; Ogburn et al., 2014).
Supplementary Material
Appendix A contains proofs of the propositions. Appendix B presents proofs of the examples in the main text, and more examples about identification of the models. Appendix C presents more derivations mentioned in the main text. Appendix D presents derivations of semiparametric efficiency theory.
References
- Abadie A (2003). Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics, 113:231–263.
- Abadie A, Angrist J, and Imbens G (2002). Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica, 70:91–117.
- Angrist J (1995). Using social security data on military applicants to estimate the effect of voluntary military service on earnings.
- Angrist J, Imbens G, and Rubin D (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91:444–455.
- Barnard J, Frangakis C, Hill J, and Rubin D (2003). Principal stratification approach to broken randomized experiments: A case study of school choice vouchers in New York City. Journal of the American Statistical Association, 98:299–323.
- Clarke P, Palmer T, and Windmeijer F (2015). Estimating structural mean models with multiple instrumental variables using the generalised method of moments. Statistical Science, 30:96–117.
- Frangakis C, Brookmeyer R, Varadhan R, Safaeian M, Vlahov D, and Strathdee S (2004). Methodology for evaluating a partially controlled longitudinal treatment using principal stratification, with application to a needle exchange program. Journal of the American Statistical Association, 99:239–249.
- Goldberger A (1972). Structural equation methods in the social sciences. Econometrica: Journal of the Econometric Society, pages 979–1001.
- Greenland S and Robins J (1986). Identifiability, exchangeability, and epidemiological confounding. International Journal of Epidemiology, 15:413–419.
- Hahn J (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, pages 315–331.
- Hasselman B (2014). nleqslv: Solve systems of non linear equations. R package version 2.1.1.
- Heckman J (1997). Instrumental variables: A study of implicit behavioral assumptions used in making program evaluations. Journal of Human Resources, 32:441–462.
- Heckman J, Ichimura H, and Todd P (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. The Review of Economic Studies, 64:605–654.
- Heckman J, Ichimura H, and Todd P (1998). Matching as an econometric evaluation estimator. The Review of Economic Studies, 65:261–294.
- Hernán M and Robins JM (2006). Instruments for causal inference: an epidemiologist’s dream? Epidemiology, 17:360–372.
- Imbens G and Angrist J (1994). Identification and estimation of local average treatment effects. Econometrica: Journal of the Econometric Society, pages 467–475.
- Matsouaka RA and Tchetgen Tchetgen EJ (2014). Likelihood based estimation of logistic structural nested mean models with an instrumental variable.
- Miettinen O (1974). Proportion of disease caused or prevented by a given exposure, trait or intervention. American Journal of Epidemiology, 99:325–332.
- Newey WK and McFadden D (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics, 4:2111–2245.
- Ogburn E, Rotnitzky A, and Robins J (2014). Doubly robust estimation of the local average treatment effect curve. Journal of the Royal Statistical Society, in press.
- Pearl J (1995). On the testability of causal models with latent and instrumental variables. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 435–443. Morgan Kaufmann Publishers Inc.
- Poterba J and Venti S (1994). 401(k) plans and tax-deferred saving. In Studies in the Economics of Aging, pages 105–142. University of Chicago Press.
- Poterba J, Venti S, and Wise D (1995). Do 401(k) contributions crowd out other personal saving? Journal of Public Economics, 58:1–32.
- Robins J (1989). The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. Health Service Research Methodology: A Focus on AIDS, 113:159.
- Robins J (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics - Theory and Methods, 23:2379–2412.
- Robins J and Ritov Y (1997). Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Statistics in Medicine, 16:285–319.
- Robins J and Rotnitzky A (2004). Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models. Biometrika, 91:763–783.
- Robins J, Rotnitzky A, and Scharfstein D (2000). Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, volume 116, pages 1–94. Springer.
- Tan Z (2006). Regression and weighting methods for causal inference using instrumental variables. Journal of the American Statistical Association, 101:1607–1618.
- Tan Z (2010). Marginal and nested structural models using instrumental variables. Journal of the American Statistical Association, 105:157–169.
- Vansteelandt S and Goetghebeur E (2003). Causal inference with generalized structural mean models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65:817–835.
- Varadhan R and Gilbert P (2009). BB: An R package for solving a large system of nonlinear equations and for optimizing a high-dimensional nonlinear objective function. Journal of Statistical Software, 32:1–26.
- Wright S (1928). Appendix to the tariff on animal and vegetable oils. New York: MacMillan. (1934), “The Method of Path Coefficients,” Annals of Mathematical Statistics, 5:161–215.
