Summary
We consider estimating causal odds ratios using an instrumental variable under a logistic structural nested mean model (LSNMM). Current methods for LSNMMs either rely heavily on possibly "uncongenial" modeling assumptions or involve intricate numerical challenges, which have impeded their use. In this article, we present an alternative method that ensures a congenial parametrization, circumvents the computational complexity of existing methods, and is easy to implement. We apply the proposed method to (1) estimate the causal effect of years of education on earnings using data from the NLSYM and (2) assess the impact that moving families from high-poverty to low-poverty neighborhoods had on lifetime major depressive disorder among adolescents in the "Moving to Opportunity (MTO) for Fair Housing Demonstration Project" from the Department of Housing and Urban Development.
Keywords: Causality, Instrumental variable, Odds ratio, Confounding, Structural model, Non-compliance
1. Introduction
Instrumental variable (IV) methods are used to estimate, under certain assumptions, the effects of an exposure on an outcome when unobserved confounding is suspected to be present. An IV is a pre-exposure variable that, conditional on a set of measured baseline covariates, is (1) associated with the exposure, (2) associated with the outcome only through the exposure, that is, there is no direct effect of the IV on the outcome upon intervening on exposure (also known as the exclusion restriction assumption), and (3) independent of any unmeasured confounding variable of the effects of the exposure on the outcome (Vansteelandt and others, 2011).
In clinical trials affected by non-compliance, random assignment to treatment (the exposure in this context) is often used as an IV, whereas in observational studies the choice of a valid IV is challenging (Martens and others, 2006; Murray, 2006; Glymour and others, 2012). Nevertheless, once an appropriate IV has been identified, it may potentially be used to recover a consistent estimate of the effect of an exposure despite the presence of unmeasured confounding.
Compared to IV methods for continuous outcomes, those for binary outcomes have received far less attention. Our aim is to estimate the conditional causal odds ratio (COR), that is, the effect of an exposure on a binary outcome, conditional on the exposure level, IV, and measured covariates under an LSNMM. As demonstrated by Robins (1994), one cannot identify the COR solely based on the standard IV assumptions (1)–(3). In fact, Robins and Rotnitzky (2004) show explicitly that the resulting causal model is overparametrized, in the sense that the likelihood for the LSNMM depends on more unknown parameters than the observed data likelihood and thus the model must somehow be restricted for identification purposes.
Any viable approach to identify and estimate the COR requires additional restrictions beyond the standard IV assumptions (Vansteelandt and Goetghebeur, 2005; Liu and others, 2015).
Both Vansteelandt and Goetghebeur (2003) and Robins and Rotnitzky (2004) make the “no-current interaction assumption” (Assumption (4)), but develop distinct approaches for estimating the COR parameter of an LSNMM under Assumptions (1)–(4). Vansteelandt and Goetghebeur (2003) posit a so-called association model for the exposure, IV and covariates with the binary outcome. However, when the association model is not saturated, it is prone to model incompatibility with the LSNMM (Robins and Rotnitzky, 2004; Vansteelandt and others, 2011). Robins and Rotnitzky (2004) develop an alternative approach that avoids this limitation using a parametrization that is always compatible. Unfortunately, their approach can be computationally prohibitive and challenging, especially with a continuous IV. It requires repeatedly solving a complicated integral equation numerically for each observation used in the process of finding an estimator of the LSNMM, which makes it difficult to implement.
The purpose of this article is to develop a different strategy to estimate the parameters of an LSNMM under assumptions (1)–(4) using a novel parametrization, which resolves the aforementioned difficulties with previous estimators of an LSNMM. Unlike Vansteelandt and Goetghebeur (2003), but similar to Robins and Rotnitzky (2004), we use a (variation independent) compatible parametrization of the observed data likelihood. Furthermore, unlike Robins and Rotnitzky (2004), our approach does not involve solving iterative integral equations and is thus readily implementable regardless of the nature or the dimension of the exposure, IV, and covariates.
Using the potential outcome framework, in Section 2, we lay out the IV conditions, present the causal effect of interest, and describe identification assumptions. We continue in Section 3 with a review of existing estimation methods for the LSNMM. In Section 4, we introduce the new parametrization, show how it contrasts with the approach of Robins and Rotnitzky (2004), and demonstrate some of its important properties. Using the proposed parametrization, we also present a straightforward maximum likelihood approach for estimating the LSNMM. In addition, we provide a goodness-of-fit (GOF) test statistic which is useful for evaluating parametric assumptions about nuisance parameters of the fitted likelihood model under Assumptions (1) through (4). In Section 5, we report a simulation study for both a binary and a continuous exposure, using continuous baseline covariates. We also closely examine the finite-sample performance of the proposed GOF test statistic. Finally, we apply our method to two different data sets to assess the impact of years of education on wages in the United States National Longitudinal Survey of Young Men (NLSYM) study and to evaluate the effects of moving from high-poverty to low-poverty neighborhoods on lifetime major depressive disorder among youth using data from the US Department of Housing and Urban Development "Moving to Opportunity (MTO) for Fair Housing Demonstration Project" (reported in the online supplementary material available at Biostatistics online).
2. Notation and assumptions
Suppose we observe $n$ independent and identically distributed copies of the vector $O = (Y, X, Z, C)$, where $Z$ is an IV for the effect of an exposure $X$ on an outcome $Y$ given a set of measured baseline covariates $C$. The vector $C$ includes all measured confounders of the effects of $X$ on $Y$ and of the effects of $Z$ on $Y$. We define the potential outcome $Y(x, z)$ as the outcome we would have observed had, possibly contrary to fact, $X$ been set to $x$ and $Z$ set to $z$ by external intervention. Likewise, $Y(x)$ denotes the outcome had $X$ been set to $x$. To identify causal effects, we make the consistency assumption that if a person is assigned to a specific exposure $X = x$ and an IV level $Z = z$, their observed outcome $Y$ coincides with their potential outcome $Y(x, z)$ (Robins and Rotnitzky, 2004). The IV conditions presented in the introduction may now be formally expressed as: (1) non-null association between $Z$ and $X$ given $C$; (2) exclusion restriction: $Y(x, z) = Y(x)$ almost surely for all $(x, z)$; and (3) independence of potential outcomes and IV: $Z \perp\!\!\!\perp Y(x) \mid C$ for all $x$. The notation $A \perp\!\!\!\perp B \mid C$ indicates stochastic independence between variables $A$ and $B$ given $C$.
We define an LSNMM as

$$\operatorname{logit} E(Y \mid Z, X, C) - \operatorname{logit} E\{Y(0) \mid Z, X, C\} = \gamma(Z, X, C), \qquad (2.1)$$

where $\gamma(Z, 0, C) = 0$ for any $(Z, C)$. Throughout the article, we will index the LSNMM with a finite dimensional parameter $\psi^{*}$, that is, $\gamma(Z, X, C) = \gamma(Z, X, C; \psi^{*})$. The primary inferential goal of this article is to identify and estimate the parameter $\psi^{*}$. The function $\gamma$ represents a causal contrast; it compares, on the log odds scale, the average outcome under the observed exposure with the average outcome had exposure been removed, for the subset of the population with $(Z, X, C) = (z, x, c)$. As such, $\gamma$ characterizes a COR as a function of $(z, x, c)$ and may be used to evaluate heterogeneity of the exposure causal effect by the IV and other pre-exposure variables. Our focus on CORs aligns with a common practice in the analysis of observational studies where measured baseline covariates are used to control for confounders. CORs are of interest in settings where the effect of reducing the exposure to $x = 0$ is investigated within subgroups of patients who share similar characteristics $(z, c)$. Moreover, such conditional exposure effects are likely to be more transportable across populations (Vansteelandt and Keiding, 2011; Burgess, 2013).
Assumptions (1)–(3) do not suffice to identify $\psi^{*}$ (see Robins and others, 2000). Both Vansteelandt and Goetghebeur (2003) and Robins and Rotnitzky (2004) make as an additional assumption (4) the "no current treatment value interaction" assumption $\gamma(Z, X, C; \psi^{*}) = m(X, C; \psi^{*})$ almost surely, for $m$ a function of $X$ and $C$ only; that is, there is no effect modification of the exposure effect by the IV. Therefore, conditional on the observed covariates, the effect of treatment is constant for treated individuals across levels of the IV $Z$. Familiar choices for $m(X, C; \psi)$ are $\psi X$ or $\psi_{0} X + \psi_{1} X C_{1}$, where $C_{1}$ is a component of $C$. Throughout this article, we assume that $m(X, C; \psi^{*})$ is correctly specified with unknown parameter $\psi^{*}$, which is identified under assumptions (1)–(4).
We would like to point out that Assumption (4) is not empirically testable without making an additional assumption. This is because under assumptions (1)–(4) the likelihood is in fact a perfect fit to the observed data (see Tchetgen Tchetgen and Vansteelandt, 2013 for further details). Vansteelandt and Goetghebeur (2005) and Clarke and Windmeijer (2010) studied the impact on inference of possible violations of Assumption (4). Alternative identification conditions other than (4) are of great interest and constitute an ongoing research topic (Tchetgen Tchetgen and Vansteelandt, 2013). Richardson and Robins (2010) and Richardson and others (2011) elucidated this identification issue and provided a careful analysis of identification in a basic binary IV model.
3. Review of SNMM estimation for binary outcomes
3.1. Double-logistic estimator
3.1.1. Estimation:
From (2.1), $\operatorname{logit} E(Y \mid Z, X, C) = m(X, C; \psi^{*}) + \operatorname{logit} E\{Y(0) \mid Z, X, C\}$. Using this relationship, Vansteelandt and Goetghebeur (2003) suggested modeling the observed association of the binary outcome $Y$ with the exposure $X$, the IV $Z$, and the covariates $C$ to derive an unbiased estimating equation for $\psi^{*}$. They developed the so-called double-logistic estimator and showed that $\psi^{*}$ can be identified if one postulates an association model for $E(Y \mid Z, X, C)$. To estimate $\psi^{*}$, an estimator of the association model parameters is first obtained, for instance by maximum likelihood estimation. Then, combining the fitted association model with the LSNMM (2.1) yields, for each subject $i$, an unbiased prediction of the conditional mean of the counterfactual outcome $Y(0)$ within levels of $(Z, C)$. A consistent point estimator of $\psi^{*}$ can finally be obtained by solving an estimating equation in which these predictions are contrasted across levels of the IV, indexed by an arbitrary function of $Z$ and $C$. The choice of this function does not affect consistency but does affect efficiency (see Robins, 1994; Clarke and others, 2015; or Vansteelandt and others, 2011 for optimal choices that yield an efficient estimator of $\psi^{*}$).
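To make the two-stage logic concrete, the following is a minimal sketch of a double-logistic-style estimator in a toy setting with a binary IV, a binary exposure, and no covariates, so that the stage-one association model can be taken as saturated (empirical cell means). The data generating values, the bracketing interval, and the simplified setting are illustrative assumptions, not the authors' specification.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit, logit

rng = np.random.default_rng(0)
n = 5000

# Toy data: binary IV Z, unmeasured confounder U, binary exposure X, binary outcome Y.
Z = rng.binomial(1, 0.5, n)
U = rng.normal(size=n)
X = rng.binomial(1, expit(-0.5 + 1.2 * Z + U))
Y = rng.binomial(1, expit(-0.3 + 0.8 * X + U))

# Stage 1: saturated association model for E(Y | Z, X): empirical cell means.
p_hat = np.empty(n)
for z in (0, 1):
    for x in (0, 1):
        cell = (Z == z) & (X == x)
        p_hat[cell] = Y[cell].mean()

# Stage 2: removing a candidate exposure effect psi on the logit scale yields
# predictions of the exposure-free counterfactual mean; under the IV
# assumptions these must be mean-independent of Z, giving the estimating
# equation below, solved here by simple root finding.
def ee(psi):
    H = expit(logit(p_hat) - psi * X)
    return float(np.sum((Z - Z.mean()) * H))

psi_hat = brentq(ee, -10.0, 10.0)
```

In the general covariate case the same moment condition is indexed by an arbitrary function of the IV and covariates, and the stage-one model is typically a parsimonious logistic regression rather than cell means, which is exactly where the congeniality issue discussed next arises.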
3.1.2. Model congeniality:
Whether this method yields a consistent estimator of $\psi^{*}$ depends largely on how well the association model reproduces major features of the data. When the association model is not saturated, it can be incompatible with the LSNMM (2.1) and lead to an inconsistent estimator (Robins and Rotnitzky, 2004; Vansteelandt and others, 2011). Even worse, for some values of the association model parameters, there may be no solution to the estimating equation. Model incompatibility, or lack of model congeniality (to use the terminology of Meng (1994) and Vansteelandt and others (2011)), arises when two models for the observed data cannot hold simultaneously for all parameter values allowed by the respective models. As a consequence, there is no data generating mechanism for which both can hold, which is worse than model misspecification; in the latter case, there exist data generating mechanisms corresponding to the model, but none of them coincide with the mechanism that produced the observed data. With a large number of covariates, it is a daunting task to fit a saturated model that includes all available covariates and all possible higher order interactions. Therefore, one may instead need to use a parsimonious model that is flexible enough to capture important features of the data. Unfortunately, it is exactly when parametric restrictions are imposed on the association model, particularly with respect to the main effects of $X$ and its interactions with relevant covariates in $C$, that major incompatibilities can arise between the association model and the LSNMM (2.1), leading to a lack of model congeniality and unreliable COR estimation.
3.2. Congenial parametrization of Robins and Rotnitzky
To guarantee a parametrization that is always congenial, Robins and Rotnitzky (2004) proposed one based on the contrast between $\operatorname{logit} E\{Y(0) \mid Z, X, C\}$ and $\operatorname{logit} E\{Y(0) \mid Z, C\}$, which encodes the degree of unobserved confounding and is referred to as the selection bias function. In the absence of exposure (i.e., $X = 0$) or if there is no unmeasured confounding (i.e., $X \perp\!\!\!\perp Y(0) \mid Z, C$), the selection bias function is zero. Using the selection bias function, the observed association $\operatorname{logit} E(Y \mid Z, X, C)$ can be expressed as the sum of the causal contrast, the selection bias function, and a remainder term (3.1); this remainder is the unique solution to an integral equation involving the selection bias function and the cumulative distribution function (CDF) of the conditional exposure density given $(Z, C)$. In other words, the remainder is a functional of the selection bias function and the exposure distribution, implicitly defined by an integral equation that must be solved for each observation.
Estimation and numerical optimization burden: Parametric working models for the selection bias function, the conditional exposure distribution, and the remaining nuisance components are postulated to make inference. The MLE then maximizes the resulting likelihood, in which the implicitly defined remainder term is evaluated at the solution of the integral equation. As previously stated, the parametrization of Robins and Rotnitzky (2004) has the advantage of providing an association model that is always compatible with the LSNMM (2.1). Unfortunately, for most choices of working models, the integral equation cannot be solved in closed form, except when the exposure is binary. Numerical optimization of the joint density of the observables under this parametrization therefore involves finding, at each trial parameter value, a numerical solution of the integral equation for each observation, within each iteration of the algorithm (Robins and Rotnitzky, 2004; Vansteelandt and others, 2011). When the exposure takes more than two values, or is continuous or multivariate, this approach is computationally challenging, particularly when the IV is continuous and the covariates $C$ are numerous. To put things in perspective: with 500 subjects in the data set and an optimization algorithm that requires 100 iterations to converge, 50 000 integral equations must be solved in total to obtain the final estimate of the causal odds ratio. This numerical drawback has impeded the widespread use of the approach, despite its sound mathematical and theoretical underpinnings.
4. New parametrization
We now propose a different congenial parametrization that obviates the need to solve integral equations. Let $f(X \mid Z, C)$ denote the conditional density function of the exposure given the IV and covariates (or its probability mass function if $X$ is discrete) and $F(X \mid Z, C)$ the corresponding CDF. While Robins and Rotnitzky (2004) parametrize the conditional exposure density given $(Z, C)$ (among other things) to arrive at the parametric model (3.1), our parametrization instead works with the conditional density of the exposure given the IV, the covariates, and the counterfactual outcome $Y(0)$. All proofs related to this section are given in the supplementary material available online at http://www.biostatistics.oxfordjournals.org.

Under the proposed parametrization, the observed association in model (3.1) can be re-expressed so that one is free to choose models for the causal contrast $m(X, C; \psi)$, the selection bias function, and the conditional exposure density; the remaining component of the observed data density is then fully determined by these choices. Under Assumptions (1)–(3) and the proposed parametrization, we have the following key result.

Theorem 1 shows that the conditional mean of the counterfactual outcome $Y(0)$ implied by the chosen components marginalizes correctly, that is, it does not depend on the IV given the covariates, as required by the independence assumption (3). This result gives one the freedom to posit variation independent parametric models for the three components such that the marginalization property will hold for all parameter values, even if all models are incorrect.
4.1. Maximum likelihood estimation
Let $f(Z \mid C)$ and $f(C)$ denote the density functions of the IV given the covariates and of the covariates, respectively. The observed data likelihood then factorizes into the component indexed by the parameters of interest and the nuisance components $f(Z \mid C)$ and $f(C)$. To draw inference under the new parametrization, we obtain the maximum likelihood estimator (MLE) of the model parameters by positing parametric models for the causal contrast, the selection bias function, and the conditional exposure density; the outcome component of the likelihood is then derived from these models. The resulting likelihood can be maximized using PROC NLMIXED in SAS or the optim function in R. For most choices of the selection bias function and of the exposure distribution made in practice, the integral appearing in the likelihood has a closed form solution. Nevertheless, when a particular choice does not lead to a closed form expression, the integral can be approximated numerically, say using Gauss-Hermite quadrature (Liu and Pierce, 1994) or Monte Carlo simulation, and easily incorporated in any standard software code.
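As a concrete illustration of the quadrature option, the sketch below approximates an integral of the kind that arises when a logistic mean is averaged over a normal exposure density; the integrand and the normal law are illustrative stand-ins for whatever working models one adopts, not the paper's models.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expit
from scipy.stats import norm

def gh_normal_expectation(g, mu, sigma, n_nodes=40):
    """Approximate E[g(X)] for X ~ N(mu, sigma^2) by Gauss-Hermite quadrature."""
    # Nodes/weights satisfy  sum_i w_i h(t_i) ~ \int h(t) exp(-t^2) dt;
    # the substitution x = mu + sqrt(2)*sigma*t maps that to a normal expectation.
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    x = mu + np.sqrt(2.0) * sigma * t
    return np.sum(w * g(x)) / np.sqrt(np.pi)

# Example: average a logistic mean over a standard normal exposure law.
approx = gh_normal_expectation(lambda x: expit(0.5 + x), mu=0.0, sigma=1.0)

# Brute-force check by adaptive quadrature.
exact, _ = quad(lambda x: expit(0.5 + x) * norm.pdf(x), -10.0, 10.0)
```

Because the integrand is smooth, a few dozen nodes already agree with adaptive quadrature to high precision, which is what makes this a practical fallback inside a likelihood evaluation.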
Finally, the MLE of $\psi^{*}$ is uncorrelated with the MLE of the parameters indexing the law of the IV given the covariates; in fact, we need not estimate the latter to obtain an estimate of the former. Thus, the MLE of $\psi^{*}$ cannot exploit any prior information about the IV law, such as the known randomization probability in a randomized experiment. However, as we show in Section D of the supplementary material available at Biostatistics online, one can leverage knowledge about the law of the IV given the covariates to construct a GOF test statistic for the nuisance models of the likelihood function derived in this section, which is asymptotically normal with mean zero only if the likelihood is correctly specified. The GOF statistic is based on an influence function for $\psi^{*}$ in a model where the likelihood is otherwise unrestricted, and it therefore naturally accounts for the variability of all unknown nuisance parameters under the null of no model misspecification.
5. Simulation study
In this section, we provide a data generating process following our proposed parametrization. We sampled the baseline covariates $C = (C_{1}, C_{2})$ from a bivariate normal distribution with correlation coefficient of 0.5. Then, we generated a binary IV $Z$ given the covariates and specified the remaining components of the likelihood: the conditional exposure density, the selection bias function, and the implied outcome model; for the causal contrast $m(X, C; \psi)$ we made a simple choice. Overall, we generated a total of 2000 data sets of size $n$ and estimated the model parameters, the empirical type I error of the GOF test statistic, and its power, that is, the proportion of simulated data sets for which the test rejected the null hypothesis. We ran the simulations using SAS PROC NLMIXED.
5.1. Binary exposure
For the binary exposure setting, we specified the selection bias function and the implied conditional exposure probability given the IV, the covariates, and the counterfactual outcome, and generated the binary exposure $X$ from the corresponding Bernoulli distribution.
For each parameter, we report in Table 1 the bias, the mean square error (MSE), and the coverage probability, that is, the proportion of 95% confidence intervals that covered the true parameter. These results highlight the good performance of our approach, with small bias and small MSE. Furthermore, coverage probabilities hover around 95%, reflecting good coverage. The Monte-Carlo type I error rate of the GOF test statistic is 0.012, indicating that the GOF test rejects the null hypothesis of a correctly specified model less often than the nominal level, which may partially reflect the conservative variance estimator used to construct the test statistic.
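The summary metrics reported in Table 1 can be assembled from Monte Carlo output as follows; the estimates and standard errors below are synthetic draws used only to show how bias, MSE, and coverage are computed, not a re-run of our simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
true_psi = 0.5
n_sims = 2000

# Hypothetical Monte Carlo output: one estimate and one standard error
# per simulated data set (here faked as normal draws around the truth).
est = rng.normal(true_psi, 0.2, n_sims)
se = np.full(n_sims, 0.2)

bias = est.mean() - true_psi
mse = np.mean((est - true_psi) ** 2)

# Coverage: proportion of nominal 95% Wald intervals containing the truth.
lo, hi = est - 1.96 * se, est + 1.96 * se
coverage = np.mean((lo <= true_psi) & (true_psi <= hi))
```

With correctly calibrated standard errors, coverage should sit near 0.95 up to Monte Carlo error, which is the benchmark against which the table entries are judged.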
Table 1.
Simulation Results: Binary Exposure

| Model | Parameter | Bias | MSE | Coverage | S.E. |
|---|---|---|---|---|---|
|  |  | 0.002 | 0.102 | 0.96 | 0.319 |
|  |  | -0.004 | 0.106 | 0.95 | 0.326 |
|  |  | 0.005 | 0.007 | 0.95 | 0.083 |
|  |  | -0.002 | 0.001 | 0.95 | 0.038 |
|  |  | -0.001 | 0.001 | 0.94 | 0.033 |
|  |  | 0.001 | 0.003 | 0.95 | 0.054 |
|  |  | 0.006 | 0.031 | 0.96 | 0.177 |
|  |  | 0.000 | 0.002 | 0.95 | 0.041 |
|  |  | -0.001 | 0.002 | 0.95 | 0.040 |
|  |  | -0.001 | 0.004 | 0.95 | 0.065 |
|  |  | 0.000 | 0.001 | 0.94 | 0.032 |
|  |  | 0.001 | 0.000 | 0.95 | 0.030 |

Corresponding GOF test: Type I error = 0.012.
5.2. Continuous exposure
We next consider a continuous exposure $X$. We show in Section E of the supplementary material available at Biostatistics online that the exposure follows a mixture of two normal distributions. Estimation results are summarized in Table 2. As in the binary exposure setting, the results confirm small bias and MSE as well as good coverage probabilities, indicating that our approach performs very well. The realized type I error of the GOF test of the nuisance models is 0.039, which, although closer to the nominal level than in the binary exposure case, is still conservative.
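For intuition, a two-component normal mixture density of the general form just mentioned can be coded and sanity-checked as below; the weight, means, and standard deviations are placeholders, since our exact values are given in Section E of the supplementary material.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Placeholder mixture parameters (illustrative only).
w, mu1, mu2, s1, s2 = 0.4, -1.0, 1.5, 0.8, 1.2

def mixture_pdf(x):
    # Density of a two-component normal mixture: w*N(mu1, s1^2) + (1-w)*N(mu2, s2^2).
    return w * norm.pdf(x, mu1, s1) + (1 - w) * norm.pdf(x, mu2, s2)

# Sanity check: a valid density integrates to 1 over the real line.
total, _ = quad(mixture_pdf, -np.inf, np.inf)
```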
Table 2.
Simulation Results: Continuous Exposure

| Model | Parameter | Bias | MSE | Coverage | S.E. |
|---|---|---|---|---|---|
|  |  | 0.001 | 0.001 | 0.95 | 0.042 |
|  |  | 0.000 | 0.003 | 0.95 | 0.052 |
|  |  | 0.000 | 0.004 | 0.94 | 0.062 |
|  |  | 0.000 | 0.000 | 0.95 | 0.016 |
|  |  | 0.000 | 0.000 | 0.95 | 0.014 |
|  |  | -0.001 | 0.002 | 0.94 | 0.042 |
|  |  | -0.001 | 0.000 | 0.95 | 0.010 |
|  |  | 0.003 | 0.029 | 0.94 | 0.172 |
|  |  | 0.000 | 0.005 | 0.95 | 0.073 |
|  |  | 0.002 | 0.003 | 0.95 | 0.054 |
|  |  | -0.004 | 0.019 | 0.95 | 0.138 |
|  |  | 0.001 | 0.001 | 0.95 | 0.041 |
|  |  | 0.002 | 0.001 | 0.95 | 0.040 |

Corresponding GOF test: Type I error = 0.039.
5.3. Power of the goodness-of-fit test
In addition to the type I error, we assessed the power of the GOF test statistic to detect model misspecification under various departures from the assumed likelihood model, considering different misspecifications of the nuisance models. The results, presented in Table 3, show that the power of the GOF test varies with the type of misspecification. The greatest power is achieved when a main-effect term is omitted from a fitted nuisance model; relatively lower power is observed when an interaction term is left out; and moderate power is observed for related misspecifications.
Table 3.
Goodness-of-fit Test: Power to Detect Departures from the True Models

| Misspecified model | Missing covariates | Parameter values | Power |
|---|---|---|---|
| (1) Binary exposure |  |  |  |
|  |  | 0.6, 1.5 | 0.41 |
|  |  | 1.5 | 0.40 |
|  |  | 0.8 | 0.03 |
|  |  | 1.5 | 0.89 |
| (2) Continuous exposure |  |  |  |
|  |  | 0.5 | 0.95 |
|  |  | 0.6, 1.5 | 0.62 |
|  |  | 0.6 | 0.43 |
|  |  | 0.8 | 0.06 |
|  |  | 0.6 | 0.88 |

Covariates (with corresponding parameter values) used in the generating model but omitted in the fitted model.
We observed similar patterns for both binary and continuous exposures (Table 3). The power of the proposed GOF test varies considerably with the form of model misspecification. For instance, when we specify only the main effects of the baseline covariates and ignore interaction terms in a nuisance model, the power to reject the misspecified model is low. However, when the true model has no interaction and a main-effect term is omitted from the fitted model, the GOF test rejects the posited model with substantially higher power. Finally, omitting a quadratic term results in relatively moderate power.
6. Data application
To illustrate the proposed method, we analyze two different data sets, one with a continuous exposure and the other with a binary exposure. The first application uses data from the 1976 subset of the United States NLSYM, examining the effect of years of education on earnings in a sample of 3010 working men aged 24–34 (Card, 1995).
Our second application considers the impact that moving low-income families from high-poverty to low-poverty neighborhoods had on lifetime major depressive disorder among adolescents in the MTO study (see Section G of the supplementary material available at Biostatistics online).
The effect of years of education on wages
There has been a longstanding interest in the causal impact of duration of schooling on earnings (Card, 1995; Heckman and others, 2006). Following Card (1995), we use the indicator $Z$ of whether a study participant lived in the proximity of a four-year college in 1966 as an IV to study the effect of years of education on hourly wages. For illustration purposes, we use this classical example to estimate the effect of education on $Y$, the indicator of earning an hourly wage greater than or equal to the median (i.e., 537.5 cents). We consider 12 years of education as the reference point and define $X$ = Years of Education − 12. The vector of potential confounding variables $C$ includes mother's and father's years of education as well as indicators of whether the person is black; lived with both natural parents, with one natural parent and one step parent, or with mother only at age 14; lived in one of the nine regions of residence or in a standard metropolitan statistical area (SMSA); and whether his family had a library card when he was 14 years old. Missing covariate values were imputed using simple imputation.
Although Card (1995) makes a compelling case for the validity of this choice of IV, one cannot rule out with certainty that there may be other factors, such as family or neighborhood characteristics or changes in the institutional structure of the education system, that are associated with the instrument $Z$ and can affect hourly wages apart from years of education. Nevertheless, for expository purposes, we focus our illustration on this instrument.
To fit the LSNMM, we specified parametric models for the causal contrast $m(X, C; \psi)$, the selection bias function, and the conditional exposure density. The goodness-of-fit test statistic p-value is 0.26, indicating no evidence that the data are inconsistent with the likelihood model we have used, assuming the model for the causal contrast is correctly specified.
Table 4 reports the estimates of the causal contrast parameters, where the main effect of years of education is 0.58 (95% CI: 0.23–0.94). Based on these results, for given covariate values we can infer the effect of additional years of education on the probability of earning an hourly wage greater than or equal to the median wage in 1976. For example, for a black study participant with a high-school diploma who lived with a high-school-educated single mother (i.e., 12 years of education) in a metropolitan area and whose family had a library card when he was 14 years old, the odds of earning an hourly wage greater than or equal to the median hourly wage in 1976 would have been 3.5 times what they are currently had he received 15 years of education instead.
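As a back-of-the-envelope check of this worked example, one can reproduce the quoted odds ratio from the Table 4 estimates by treating them as multipliers of $X$ in a contrast of the assumed form $m(X, C; \psi) = X(\psi_{0} + \psi_{1}' C)$; this functional form, and the decision to include mother's education at 12 years, are our reading of the model for this illustration.

```python
import math

# Point estimates from Table 4 for the covariate profile in the text:
# black, family had a library card, lived with a single mother with
# 12 years of education, living in an SMSA; X moves from 0 to 3
# (15 vs. 12 years of education).
slope = (0.58           # main effect of years of education
         + 0.06 * 1     # black
         - 0.20 * 1     # library card
         + 0.12 * 1     # lived with single mom
         - 0.13 * 1     # SMSA
         - 0.001 * 12)  # mother's years of education

cor = math.exp(3 * slope)  # causal odds ratio, approximately 3.5
```

The result matches the value of 3.5 reported in the text, which supports this reading of how the Table 4 coefficients enter the causal contrast.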
Table 4.
Effect of Years of Education on Earning Abilities

| Parameter | Estimate | S.E. | P-value | 95% Conf. Interval |
|---|---|---|---|---|
| main effect | 0.58 | 0.18 | 0.0012 | (0.23, 0.94) |
| black | 0.06 | 0.07 | 0.44 | (-0.09, 0.20) |
| library card | -0.20 | 0.07 | 0.004 | (-0.33, -0.06) |
| lived with mom and dad | 0.14 | 0.10 | 0.17 | (-0.06, 0.34) |
| mother's education | -0.001 | 0.004 | 0.83 | (-0.01, 0.01) |
| region 2 | 0.10 | 0.15 | 0.48 | (-0.19, 0.40) |
| region 3 | -0.10 | 0.14 | 0.49 | (-0.38, 0.18) |
| region 4 | -0.20 | 0.17 | 0.22 | (-0.53, 0.13) |
| region 5 | 0.11 | 0.14 |  | (-0.17, 0.40) |
| region 6 | 0.04 | 0.16 | 0.81 | (-0.27, 0.34) |
| region 7 | 0.27 | 0.15 |  | (-0.04, 0.57) |
| region 8 | -0.19 | 0.23 |  | (-0.66, 0.27) |
| region 9 | -0.34 | 0.18 | 0.06 | (-0.69, 0.01) |
| lived with single mom | 0.12 | 0.54 | 0.01 | (-0.17, 0.32) |
| SMSA | -0.13 | 0.06 | 0.023 | (-0.24, -0.02) |
| lived with step dad | -0.18 | 0.11 |  | (-0.40, 0.05) |

The binary outcome $Y$ is the indicator of whether an hourly wage is greater than or equal to the median wage in 1976 of 537.5 cents.
In addition, as shown in the full table (Section F of the supplementary material available at Biostatistics online), the selection bias function is significantly different from zero providing explicit empirical evidence of the impact of unobserved confounding. This bias does not appear to depend on the interaction between years of education and the IV, but does depend on years of education, on the interaction between years of education and the family having a library card, and on the interaction between years of education and geographic location (region 9).
7. Conclusion
In this article, we have presented a new parametrization for a logistic structural nested mean model (LSNMM) for a binary outcome and we have proposed a corresponding maximum likelihood approach for estimation. Our approach builds upon the theoretical framework of Vansteelandt and Goetghebeur (2003) and Robins and Rotnitzky (2004). Unlike Vansteelandt and Goetghebeur (2003), and similar to Robins and Rotnitzky (2004), our approach yields a parametric model that is guaranteed to always be congenial (or compatible) with the LSNMM. However, unlike Robins and Rotnitzky (2004), we obviate the need to numerically solve integral equations, which can be computationally cumbersome and is not easily scalable with the dimension of the exposure $X$. In addition, a key attraction of our approach is that it is readily implemented using standard statistical software. Our simulation results confirm the good performance of the proposed approach. To illustrate our approach, we applied it using two different data sets, one with a binary exposure (whether a single-mother moved with her family out of a poor neighborhood) and the other with a continuous exposure (the number of years of education).
Our simulations showed that the proposed GOF test is quite conservative in the settings we considered, and its power to detect departures from the assumed model can be moderate to low. In some settings, the low power of the GOF statistic may also reflect the conservative estimate of variance used to standardize the statistic. The main advantage of the current GOF test is its simplicity. In future work, we plan to study the performance of the GOF statistic when standardized by a consistent estimator of its variance, which was not considered here due to severe computational roadblocks.
As previously discussed, the MLE of $\psi^{*}$ is uncorrelated with the MLE of the parameters indexing the law of the IV given the covariates; in fact, we need not estimate the latter to obtain an estimate of the former. This, in turn, implies that the MLE of $\psi^{*}$ cannot exploit any prior information about the IV law, such as the known randomization probability in a randomized experiment. This is a notable limitation of the likelihood approach. To remedy this problem, Vansteelandt and Goetghebeur (2003) and Robins and Rotnitzky (2004) propose methods that are doubly robust under the sharp null hypothesis of no exposure causal effect by explicitly using any available knowledge about the IV law. Robins and Rotnitzky (2004), in particular, propose to use an influence function of $\psi^{*}$ for inference in the semiparametric model defined by Assumptions (1)–(4) only, which is endowed with the above robustness property but suffers the same computational limitations as their likelihood approach.
An alternative to the likelihood approach can be obtained by solving the estimating equation of Robins and Rotnitzky (2004) under our proposed parametrization, which we do not pursue further here. In addition, to assess the exclusion restriction Assumption (2), they suggest as a possible analytical strategy a sensitivity analysis that varies their so-called "weak exclusion assumption function" over a plausible range; such an approach can also be used with our proposed parametrization. Finally, the method we have described in this article assumes random sampling and therefore is not directly applicable to case-control sampling or other outcome dependent sampling designs. A straightforward adjustment for the sampling design entails applying inverse-probability weighting (IPW) for selection into the sample. However, weighting may potentially be inefficient. A more efficient approach that makes use of recent developments in the analysis of secondary outcomes in case-control studies (see Sofer and others, 2014; Tchetgen Tchetgen, 2014), extending the methods presented herein, will be described elsewhere.
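To sketch the IPW adjustment just mentioned in its simplest form, the toy example below fits a logistic model by maximizing a selection-weighted log-likelihood under an outcome-dependent sampling scheme with known selection probabilities; the design, the model, and the probabilities are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(2)
n = 4000

# Source population: logistic outcome model with hypothetical coefficients.
X = rng.normal(size=n)
Y = rng.binomial(1, expit(-1.0 + 0.7 * X))

# Outcome-dependent sampling with known selection probabilities: cases oversampled.
pi = np.where(Y == 1, 0.9, 0.3)
keep = rng.uniform(size=n) < pi
Xs, Ys, ws = X[keep], Y[keep], 1.0 / pi[keep]

# Weighting each subject's log-likelihood contribution by 1 / P(selected)
# recovers estimates valid for the source population.
def neg_weighted_loglik(beta):
    p = np.clip(expit(beta[0] + beta[1] * Xs), 1e-12, 1 - 1e-12)
    return -np.sum(ws * (Ys * np.log(p) + (1 - Ys) * np.log1p(-p)))

fit = minimize(neg_weighted_loglik, x0=np.zeros(2), method="Nelder-Mead")
intercept, slope = fit.x
```

An unweighted fit to the selected sample would distort the intercept (and, with covariate-dependent selection, the slope); the weights undo the sampling design at the cost of some efficiency, which motivates the more efficient alternatives cited above.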
It is worth noting that direct comparison of the three approaches discussed herein would be difficult if not impossible in the context of simulation since it would be challenging to posit models for the nuisance parameters that agree under the different parametrizations. However, in the absence of covariates, or in the presence of low dimensional covariates allowing for the use of saturated models, we generally expect all methods would perform reasonably well in practice.
Acknowledgments
The authors acknowledge research support from the National Institutes of Health (NIH). We are also grateful to Nicole Schmidt for helpful comments and suggestions regarding the MTO data. Conflict of Interest: None declared.
Funding
Both authors were supported by NIH grants 1R01MD006064, 1R21HD066312 (T. Osypuk, PI) and R21ES019712 (R. A. Matsouaka) and 1R21ES019712, R01AI104459 and R01HL080644 (E. J. Tchetgen Tchetgen).
Supplementary materials
Supplementary material is available online at http://biostatistics.oxfordjournals.org.
References
- Burgess S. (2013). Identifying the odds ratio estimated by a two-stage IV analysis with a logistic regression model. Statistics in Medicine 32, 4726–4747.
- Card D. (1995). Using geographic variation in college proximity to estimate the return to schooling. In: Christofides L. N., Grant E. K. and Swidinsky R. (editors), Aspects of Labor Market Behaviour: Essays in Honour of John Vanderkamp. Toronto: University of Toronto Press, pp. 201–222.
- Clarke P. S. and Windmeijer F. (2010). Identification of causal effects on binary outcomes using structural mean models. Biostatistics 11, 756–770.
- Clarke P. S., Palmer T. M. and Windmeijer F. (2015). Estimating structural mean models with multiple IVs using the generalised method of moments. Statistical Science 30(1), 96–117.
- Glymour M. M., Tchetgen Tchetgen E. J. and Robins J. M. (2012). Response to letters on “Credible Mendelian randomization studies: approaches for evaluating the instrumental variable assumptions”. American Journal of Epidemiology 176, 458–459.
- Heckman J. J., Lochner L. J. and Todd P. E. (2006). Earnings functions, rates of return and treatment effects: the Mincer equation and beyond. Handbook of the Economics of Education 1, 307–458.
- Liu Q. and Pierce D. A. (1994). A note on Gauss–Hermite quadrature. Biometrika 81, 624–629.
- Liu L., Miao W., Sun B., Robins J. and Tchetgen Tchetgen E. J. (2015). Doubly robust estimation of a marginal average effect of treatment on the treated with an instrumental variable. Harvard University, Biostatistics Working Paper Series, Paper 191.
- Martens E. P., Pestman W. R., de Boer A., Belitser S. V. and Klungel O. H. (2006). Instrumental variables: application and limitations. Epidemiology 17, 260–267.
- Meng X. L. (1994). Multiple-imputation inferences with uncongenial sources of input. Statistical Science 9(4), 538–558.
- Murray M. P. (2006). Avoiding invalid instruments and coping with weak instruments. The Journal of Economic Perspectives 20, 111–132.
- Richardson T. S. and Robins J. M. (2010). Analysis of the binary instrumental variable model. In: Dechter R., Geffner H. and Halpern J. Y. (editors), Heuristics, Probability and Causality: A Tribute to Judea Pearl. College Publications, pp. 415–444.
- Richardson T. S., Evans R. J. and Robins J. M. (2011). Transparent parameterizations of models for potential outcomes. Bayesian Statistics 9, 569–610.
- Robins J. M. (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics - Theory and Methods 23, 2379–2412.
- Robins J. and Rotnitzky A. (2004). Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models. Biometrika 91, 763–783.
- Robins J. M., Rotnitzky A. and Scharfstein D. O. (2000). Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: Halloran M. E. and Berry D. (editors), Statistical Models in Epidemiology, the Environment, and Clinical Trials. New York: Springer, pp. 1–94.
- Sofer T., Cornelis M. C., Kraft P. and Tchetgen Tchetgen E. J. (2014). Control function assisted IPW estimation with a secondary outcome in case-control studies. Harvard University, Biostatistics Working Paper Series, Paper 174.
- Tchetgen Tchetgen E. J. (2014). A general regression framework for a secondary outcome in case–control studies. Biostatistics 15, 117–128.
- Tchetgen Tchetgen E. J. and Vansteelandt S. (2013). Alternative identification and inference for the effect of treatment on the treated with an instrumental variable. Harvard University, Biostatistics Working Paper Series, Paper 166.
- Vansteelandt S. and Goetghebeur E. (2003). Causal inference with generalized structural mean models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65, 817–835.
- Vansteelandt S. and Goetghebeur E. (2005). Sense and sensitivity when correcting for observed exposures in randomized clinical trials. Statistics in Medicine 24, 191–210.
- Vansteelandt S. and Keiding N. (2011). Invited commentary: G-computation—lost in translation? American Journal of Epidemiology 173, 739–742.
- Vansteelandt S., Bowden J., Babanezhad M. and Goetghebeur E. (2011). On instrumental variables estimation of causal odds ratios. Statistical Science 26, 403–422.