Biometrika. 2017 Jan 23;104(1):229–236. doi: 10.1093/biomet/asw064

On falsification of the binary instrumental variable model

Linbo Wang, James M Robins, Thomas S Richardson
PMCID: PMC5819759  PMID: 29505035

SUMMARY

Instrumental variables are widely used for estimating causal effects in the presence of unmeasured confounding. The discrete instrumental variable model has testable implications for the law of the observed data. However, current assessments of instrumental validity are typically based solely on subject-matter arguments rather than these testable implications, partly due to a lack of formal statistical tests with known properties. In this paper, we develop simple procedures for testing the binary instrumental variable model. Our methods are based on existing techniques for comparing two treatments, such as the $t$-test and the Gail–Simon test. We illustrate the importance of testing the instrumental variable model by evaluating the exogeneity of college proximity using the National Longitudinal Survey of Young Men.

Keywords: Binary response, Gail–Simon test, Instrumental variable, Qualitative interaction, t-test, Two-by-two table

1. INTRODUCTION

The instrumental variable method has been widely used for estimating causal effects in the presence of unmeasured confounders. A variable $Z$ is called an instrumental variable if: (a) it is independent of unmeasured confounders $U$; (b) it does not have a direct effect on the outcome $Y$; (c) it has a nonzero average causal effect on the treatment $D$ (Angrist et al., 1996). In many applications, assumption (a) is reasonable only after controlling for observed covariates $V$ (Baiocchi et al., 2014). The resulting model is called the conditional instrumental variable model. Figure 1 gives a directed acyclic graphical model representation (Pearl, 2009) of the conditional instrumental variable model, in which the faithfulness (Spirtes et al., 2000) of the edge $Z \to D$ is assumed.

Fig. 1.


Directed acyclic graph representing an instrumental variable model. The variables $Z$, $D$, $Y$ and $V$ are observed; $U$ is unobserved.

Unlike the assumption of no unmeasured confounders between $D$ and $Y$, the instrumental variable model with discrete observables $(Z, D, Y)$ imposes nontrivial constraints on the observed-data distribution. In particular, Balke & Pearl (1997) and Bonet (2001) give the following necessary and sufficient condition for the observed-data distribution of $(Z, D, Y)$ to be compatible with an unconditional binary instrumental variable model where $Z$, $D$ and $Y$ take values $0$ and $1$:

$$\mathrm{pr}(D=d, Y=y \mid Z=1) + \mathrm{pr}(D=d, Y=1-y \mid Z=0) \le 1 \quad (d=0,1;\ y=0,1). \tag{1}$$

Here the unconditional instrumental variable model refers to the model with an empty control variable set $V$. In particular, if the potential instrument $Z$ is randomized so that assumption (a) holds, then violation of each inequality in (1) corresponds to a nonzero average controlled direct effect of $Z$ on $Y$, which violates assumption (b) (Cai et al., 2008; Richardson et al., 2011). Although assumption (c) imposes on the observables the constraint

$$\mathrm{pr}(D=1 \mid Z=1) \ne \mathrm{pr}(D=1 \mid Z=0), \tag{2}$$

it is in general not possible to reject (2) with a statistical test. Hence hereafter we do not discuss constraint (2). Similarly, the testable implications of a conditional binary instrumental variable model are given by

$$\mathrm{pr}(D=d, Y=y \mid Z=1, V=v) + \mathrm{pr}(D=d, Y=1-y \mid Z=0, V=v) \le 1 \quad (d=0,1;\ y=0,1;\ v \in \mathcal{V}), \tag{3}$$

where $\mathcal{V}$ contains all possible values for $V$. In practice, the inequalities (1) can be used to partially test the binary unconditional instrumental variable model. Likewise, (3) can be used to test the binary conditional instrumental variable model. In contrast, it is impossible to empirically falsify the assumption of no unmeasured confounders between $D$ and $Y$ in an observational study without an instrument.
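As a concrete illustration of how (1) constrains the observed data, the following minimal sketch evaluates the four left-hand sides of (1) for a given set of conditional probabilities and reports which inequalities fail. The function names and the two example distributions are hypothetical, chosen only for illustration; the check is a population-level one and ignores sampling variability.

```python
from itertools import product


def iv_inequality_lhs(p):
    """Left-hand sides of the four inequalities in (1).

    p[z][(d, y)] is pr(D=d, Y=y | Z=z) for binary z, d, y.
    Returns a dict mapping (d, y) to
    pr(D=d, Y=y | Z=1) + pr(D=d, Y=1-y | Z=0).
    """
    return {(d, y): p[1][(d, y)] + p[0][(d, 1 - y)]
            for d, y in product((0, 1), repeat=2)}


def violated(p, tol=1e-12):
    """Return the (d, y) pairs whose inequality in (1) is violated."""
    return [dy for dy, lhs in iv_inequality_lhs(p).items() if lhs > 1 + tol]


# Hypothetical conditional distributions, for illustration only.
compatible = {
    0: {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1},
    1: {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.3},
}
incompatible = {
    0: {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.1},
    1: {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.2, (1, 1): 0.1},
}

print(violated(compatible))    # no inequality fails
print(violated(incompatible))  # the inequality with d=0, y=1 fails
```

In the second example pr(D=0, Y=1 | Z=1) + pr(D=0, Y=0 | Z=0) = 0.6 + 0.7 exceeds 1, so no binary instrumental variable model can generate it.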

Although there have been many discussions of estimation of causal effects under the binary instrumental variable model (Vansteelandt et al., 2011; Clarke & Windmeijer, 2012), less attention has been paid to testing its validity. Prior to our work, Ramsahai & Lauritzen (2011) considered testing an unconditional binary instrumental variable model using a likelihood ratio test. Their approach involves solving a constrained optimization problem and cannot be used to test the conditional binary instrumental variable model as described by Fig. 1. Furthermore, their approach tests the four inequalities in (1) jointly. Hence, without modification, it can only be used to falsify the binary instrumental variable model, but cannot identify which specific average controlled direct effect of $Z$ on $Y$ must be positive or negative. In related work, Kang et al. (2013) provided a falsification test for the instrumental variable assumptions given knowledge of a subpopulation where the edge $Z \to D$ is absent. In this paper we develop a novel perspective on falsification of the binary instrumental variable model. Specifically, we show that testing (1) or (3) is equivalent to testing for a nonpositive effect of the instrument $Z$ on a constructed variable.

2. TESTS FOR THE UNCONDITIONAL BINARY INSTRUMENTAL VARIABLE MODEL

To fix ideas, we first consider testing the instrumental variable inequality

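$$\mathrm{pr}(D=0, Y=1 \mid Z=1) + \mathrm{pr}(D=0, Y=0 \mid Z=0) \le 1. \tag{4}$$

Equation (4) can be rewritten as

$$\mathrm{pr}(D=0, Y=1 \mid Z=1) - 1 + \mathrm{pr}(D=0, Y=0 \mid Z=0) \le 0.$$

Define a new variable

$$Q^{01} = \begin{cases} I(D=0, Y=1), & Z=1, \\ 1 - I(D=0, Y=0), & Z=0, \end{cases}$$

where $I(\cdot)$ is the indicator function. It then follows that

$$\mathrm{pr}(D=0, Y=1 \mid Z=1) - \{1 - \mathrm{pr}(D=0, Y=0 \mid Z=0)\} = \mathrm{pr}(Q^{01}=1 \mid Z=1) - \mathrm{pr}(Q^{01}=1 \mid Z=0) \equiv \Delta^{01}.$$

Testing (4) is therefore equivalent to the testing problem

$$H_0^{01}: \Delta^{01} \le 0 \quad \text{versus} \quad H_a^{01}: \Delta^{01} > 0, \tag{5}$$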

which is simply one-sided testing for a $2 \times 2$ table.
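The reduction to a two-by-two table can be sketched in a few lines. The code below builds the constructed variable from (Z, D, Y) triples and computes a one-sided p-value for (5). The unpooled Wald z-statistic is only one possible choice of test and is an assumption of this sketch, as are the function names and the handling of degenerate tables.

```python
import math


def q01(d, y, z):
    """Constructed variable: I(D=0, Y=1) if Z=1, and 1 - I(D=0, Y=0) if Z=0."""
    if z == 1:
        return int(d == 0 and y == 1)
    return 1 - int(d == 0 and y == 0)


def one_sided_p_value(data):
    """P-value for H0: Delta01 <= 0 against Ha: Delta01 > 0.

    data is a list of (z, d, y) triples with binary entries and at least
    one unit in each arm. Uses an unpooled Wald z-statistic (an
    illustrative choice, not the only valid test).
    """
    q1 = [q01(d, y, z) for z, d, y in data if z == 1]
    q0 = [q01(d, y, z) for z, d, y in data if z == 0]
    p1, p0 = sum(q1) / len(q1), sum(q0) / len(q0)
    se = math.sqrt(p1 * (1 - p1) / len(q1) + p0 * (1 - p0) / len(q0))
    if se == 0:  # degenerate table: decide by the sign of the difference
        return 0.0 if p1 > p0 else 1.0
    stat = (p1 - p0) / se
    return 0.5 * math.erfc(stat / math.sqrt(2))  # upper-tail normal probability


# Hypothetical data as (z, d, y) triples, 100 units per arm.
violating = ([(1, 0, 1)] * 70 + [(1, 1, 0)] * 30
             + [(0, 0, 0)] * 60 + [(0, 1, 1)] * 40)
print(one_sided_p_value(violating))
```

In the example the estimated difference is 0.7 - 0.4 = 0.3, which corresponds to an empirical left-hand side of 1.3 in (4).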

In general, we have four inequalities of the form (4) with a binary instrumental variable model, so multiplicity adjustment is needed. Write $H_0^{dy}$ $(d=0,1;\ y=0,1)$ for the null hypothesis defined analogously to $H_0^{01}$ in (5). Suppose for now that we have a one-sided test for each $H_0^{dy}$ such that the size of the test goes to zero asymptotically in the interior of the null space defined by $H_0^{dy}$. Furthermore, assume that the rejection region of each test has no intersection with the null space defined by the corresponding $H_0^{dy}$ (Perlman & Wu, 1999). To get a level-$\alpha$ test for (1), a naive Bonferroni correction would require that each test have size less than or equal to $\alpha/4$ for testing $H_0^{dy}$. However, the left-hand sides of the four inequalities in (1) sum to 2, and hence at most two of them can simultaneously hold with equality. Based on this, we now show that it suffices to control the level of each test at $\alpha/2$.

Specifically, let $u_{dy} = \mathrm{pr}(D=d, Y=y \mid Z=1) + \mathrm{pr}(D=d, Y=1-y \mid Z=0)$ denote the left-hand side of the corresponding inequality in (1), and let $\zeta = (u_{00}, u_{01}, u_{10})$. The null space defined by (1) can be represented by an octahedron $\mathcal{Z}_0$ in the simplex $\mathcal{Z}$, where $\mathcal{Z}$ is defined as

$$\mathcal{Z} = \{\zeta : u_{00} + u_{01} + u_{10} \le 2,\ u_{00}, u_{01}, u_{10} \ge 0\}.$$

Figure 2 gives a graphical depiction of $\mathcal{Z}$ and $\mathcal{Z}_0$. Each of the four blue-shaded facets corresponds to one inequality in (1) holding with equality. Six points, shown in red, have two inequalities in (1) holding with equality. The interior of the null space $\mathcal{Z}_0$ corresponds to cases where none of the four inequalities in (1) holds with equality.

Fig. 2.


Representation of the simplex $\mathcal{Z}$ and the null space $\mathcal{Z}_0$. The edges of the simplex $\mathcal{Z}$ are represented by thick black lines, and the null space $\mathcal{Z}_0$ is the octahedron whose vertices are shown in red; four of the eight surfaces are shaded blue.
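The claim that at most two of the inequalities in (1) can hold with equality follows from a short calculation. Writing $u_{dy}$ for the left-hand side of the inequality in (1) indexed by $(d, y)$ (a shorthand used here for illustration), each sum below runs over a full conditional distribution:

```latex
\begin{align*}
\sum_{d,y} u_{dy}
  &= \sum_{d,y} \mathrm{pr}(D=d, Y=y \mid Z=1)
   + \sum_{d,y} \mathrm{pr}(D=d, Y=1-y \mid Z=0) \\
  &= 1 + 1 = 2 .
\end{align*}
% If three of the u_{dy} were equal to 1, the sum would be at least 3,
% because the fourth u_{dy} is nonnegative; this contradicts the total of 2.
```

When exactly two of the $u_{dy}$ equal 1, the other two must be zero, which is the situation at the six red vertices.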

We are now ready to present our multiplicity adjustment procedure. The proof of Theorem 1 is given in the Appendix.

THEOREM 1.

Propose a testing procedure as follows: reject (1) if, for $d=0,1$ and $y=0,1$, at least one of $H_0^{dy}$ is rejected by the corresponding one-sided test at level $\alpha/2$. Under the null hypotheses (1):

  • (i) if two inequalities in (1) hold with equality at the true parameter value, then the proposed test has size $\alpha$;

  • (ii) if only one of the inequalities in (1) holds with equality at the true parameter value, then asymptotically the proposed test has size $\alpha/2$;

  • (iii) if none of the inequalities in (1) holds with equality at the true parameter value, then asymptotically the proposed test has size $0$.

In particular, the proposed test always has asymptotic size no greater than $\alpha$.
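The procedure in Theorem 1 can be sketched as follows. The sketch assumes the data are summarized as counts for each (Z, D, Y) cell with at least one unit in each arm, and it uses an unpooled Wald z-test for each of the four two-by-two tables. Both the test and the function names are illustrative assumptions, and any one-sided test meeting the conditions stated before the theorem could be substituted.

```python
import math
from itertools import product


def wald_p_value(x1, n1, x0, n0):
    """One-sided p-value for H0: p1 - p0 <= 0 from counts x1/n1 and x0/n0."""
    p1, p0 = x1 / n1, x0 / n0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    if se == 0:  # degenerate table: decide by the sign of the difference
        return 0.0 if p1 > p0 else 1.0
    return 0.5 * math.erfc((p1 - p0) / se / math.sqrt(2))


def falsify_binary_iv(counts, alpha=0.05):
    """counts[z][(d, y)] is the number of units with Z=z, D=d, Y=y.

    Returns (reject, p_values); reject is True when at least one of the
    four one-sided tests rejects at level alpha / 2.
    """
    n1, n0 = sum(counts[1].values()), sum(counts[0].values())
    p_values = {}
    for d, y in product((0, 1), repeat=2):
        x1 = counts[1][(d, y)]            # constructed variable equals 1, Z=1
        x0 = n0 - counts[0][(d, 1 - y)]   # constructed variable equals 1, Z=0
        p_values[(d, y)] = wald_p_value(x1, n1, x0, n0)
    reject = any(p <= alpha / 2 for p in p_values.values())
    return reject, p_values


# Hypothetical counts, for illustration only.
counts = {
    1: {(0, 0): 0, (0, 1): 70, (1, 0): 30, (1, 1): 0},
    0: {(0, 0): 60, (0, 1): 0, (1, 0): 0, (1, 1): 40},
}
reject, p_values = falsify_binary_iv(counts)
print(reject, min(p_values, key=p_values.get))
```

Returning the individual p-values alongside the decision keeps track of which inequality drove a rejection.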

We now turn to the choice of the one-sided tests. Over the past century, there has been much discussion on testing association in $2 \times 2$ tables, including size and power comparisons for different test statistics and methods of computing the $p$-value; see Lydersen et al. (2009) for a review. When the sample size is large, asymptotic tests such as those based on the $t$-statistic are popular among researchers. However, under independent and identically distributed sampling they may not preserve the test size with small samples, in which case unconditional exact tests such as the Fisher–Boschloo test are recommended.

Remark 1.

The computation time for unconditional tests can be excessive when the sample size is moderate or large, in which case it may be desirable to use the procedure of Berger & Boos (1994) to reduce computation time. The proposed test still has asymptotic size no greater than Inline graphic provided Inline graphic, where Inline graphic is the confidence level for the nuisance parameter.

Remark 2.

The Wald test for the $2 \times 2$ table corresponding to (5) coincides with the Wald test for (4), where $\mathrm{pr}(D=0, Y=1 \mid Z=1)$ and $\mathrm{pr}(D=0, Y=0 \mid Z=0)$ are estimated via maximum likelihood. However, our introduction of $Q^{01}$ builds the connection between testing unconditional instrumental inequalities and testing $2 \times 2$ tables, and hence motivates many more approaches to testing the unconditional instrumental inequalities.

We now discuss the interpretation of results from our testing procedure. As noted by Cai et al. (2008) and Richardson et al. (2011), under the randomization assumption, the average controlled direct effect of $Z$ on $Y$ with $D$ set to $d$, $\mathrm{ACDE}(d)$, satisfies

$$\mathrm{pr}(D=d, Y=1 \mid Z=1) + \mathrm{pr}(D=d, Y=0 \mid Z=0) - 1 \le \mathrm{ACDE}(d) \le 1 - \mathrm{pr}(D=d, Y=0 \mid Z=1) - \mathrm{pr}(D=d, Y=1 \mid Z=0). \tag{6}$$

It follows that violation of each inequality in (1) corresponds to a nonzero average controlled direct effect of $Z$ on $Y$. Our testing procedure is therefore interpretable in the sense that if we reject the binary instrumental variable model, we would also know which average controlled direct effect is positive or negative. For example, suppose we reject the null that $\mathrm{pr}(D=0, Y=1 \mid Z=1) + \mathrm{pr}(D=0, Y=0 \mid Z=0) \le 1$; then from (6) we would also conclude that $\mathrm{ACDE}(0)$ is positive.
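The link between a violated inequality and the sign of a direct effect can be made concrete. The sketch below takes the bounds in (6) to have the form given by Cai et al. (2008), namely pr(D=d, Y=1 | Z=1) + pr(D=d, Y=0 | Z=0) - 1 as the lower bound and 1 - pr(D=d, Y=0 | Z=1) - pr(D=d, Y=1 | Z=0) as the upper bound. That form, the function name and the example probabilities are assumptions of this illustration.

```python
def acde_bounds(p, d):
    """Bounds on the average controlled direct effect of Z on Y with D set to d.

    p[z][(d, y)] is pr(D=d, Y=y | Z=z); Z is assumed randomized.
    Returns (lower, upper).
    """
    lower = p[1][(d, 1)] + p[0][(d, 0)] - 1
    upper = 1 - p[1][(d, 0)] - p[0][(d, 1)]
    return lower, upper


# Hypothetical distribution violating the inequality in (1) with d=0, y=1.
p = {
    0: {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.1},
    1: {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.2, (1, 1): 0.1},
}
print(acde_bounds(p, 0))  # lower bound is positive: direct effect is positive
print(acde_bounds(p, 1))  # interval contains zero: sign not identified
```

The lower bound is positive exactly when the inequality in (1) with y=1 is violated, and the upper bound is negative exactly when the inequality with y=0 is violated.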

3. TESTS FOR THE CONDITIONAL BINARY INSTRUMENTAL VARIABLE MODEL

Suppose now we wish to test the instrumental variable inequality that

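$$\mathrm{pr}(D=0, Y=1 \mid Z=1, V=v) + \mathrm{pr}(D=0, Y=0 \mid Z=0, V=v) \le 1, \quad v \in \mathcal{V}. \tag{7}$$

Using the same arguments as in § 2, we can rewrite the testing problem of (7) as

$$H_{0,c}^{01}: \text{for all } v \in \mathcal{V},\ \Delta^{01}(v) \le 0 \quad \text{versus} \quad H_{a,c}^{01}: \text{there exists } v \in \mathcal{V} \text{ such that } \Delta^{01}(v) > 0, \tag{8}$$

where $\Delta^{01}(v) = \mathrm{pr}(Q^{01}=1 \mid Z=1, V=v) - \mathrm{pr}(Q^{01}=1 \mid Z=0, V=v)$ and a subscript c denotes conditional.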

The testing problem (8) concerns the null hypothesis that a particular treatment is at least as good as the other treatment in all subsets of units, which has been studied extensively. For example, with $V$ discrete, the Gail–Simon test for qualitative interaction can be used to test hypotheses of the form (8) with a slight modification (Gail & Simon, 1985, p. 364). Chang et al. (2015) considered the problem with a general $V$ based on $L_1$-type functionals of uniformly consistent nonparametric kernel estimators of the conditional expectations that define $\Delta^{01}(v)$. These tests make no assumptions on the functional form of $\Delta^{01}(v)$, which is particularly appealing as $Q^{01}$ is not directly interpretable.
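For discrete covariates, a one-sided test in the spirit of Gail & Simon (1985) can be sketched as follows. It assumes independent, approximately normal stratum estimates of the effect with known standard errors, sums the squared standardized positive estimates, and refers the total to the mixture of chi-squared distributions with binomial weights that arises when all stratum effects are zero. This is an illustrative variant written for this note rather than a transcription of the original procedure, and the function names are hypothetical.

```python
import math


def chi2_sf(x, k):
    """pr(chi-squared with k degrees of freedom > x), for integer k >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if k % 2 == 1:
        sf, a = math.erfc(math.sqrt(h)), 0.5   # one degree of freedom
    else:
        sf, a = math.exp(-h), 1.0              # two degrees of freedom
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while 2 * a < k:
        sf += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1.0
    return sf


def one_sided_interaction_p_value(estimates, std_errors):
    """P-value for H0: effect <= 0 in every stratum.

    estimates[i] and std_errors[i] are the estimated effect and its
    standard error in stratum i (moderate number of strata assumed).
    """
    n_strata = len(estimates)
    q = sum((e / s) ** 2 for e, s in zip(estimates, std_errors) if e > 0)
    if q == 0:
        return 1.0
    return sum(math.comb(n_strata, k) * 0.5 ** n_strata * chi2_sf(q, k)
               for k in range(1, n_strata + 1))


# Hypothetical stratum estimates and standard errors.
print(one_sided_interaction_p_value([0.30, -0.05, 0.02], [0.08, 0.07, 0.09]))
```

Strata with negative estimates contribute nothing to the statistic, so evidence accumulates only from strata where the estimated effect has the sign that would violate the inequality.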

As we have four hypotheses of the form (7), a multiplicity adjustment is warranted. However, unlike the case with unconditional instrumental variable models, the four inequalities in (3) can be violated simultaneously, as each of them concerns multiple covariate values. In other words, no result analogous to Theorem 1 holds unless $V$ takes only one value. Instead, a naive Bonferroni correction may be used to account for multiple comparisons, so that to get an overall level-$\alpha$ test, hypotheses of the form (7) are tested at level $\alpha/4$.

Remark 3.

When $V$ is discrete, one can alternatively apply Theorem 1 to test, for each $v \in \mathcal{V}$, the hypotheses

$$\mathrm{pr}(D=d, Y=y \mid Z=1, V=v) + \mathrm{pr}(D=d, Y=1-y \mid Z=0, V=v) \le 1 \quad (d=0,1;\ y=0,1), \tag{9}$$

and then use a Bonferroni correction to account for multiple testing due to levels of $V$. In this way, each hypothesis in (9) is tested at level $\alpha/(2K)$, where $K$ is the number of possible levels for $V$. Since tests of the forms (7) and (9) are different, neither approach generally dominates the other.
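The stratum-by-stratum approach of Remark 3 can be sketched as below. The sketch assumes discrete covariates, cell counts per stratum with at least one unit in each arm, and an unpooled Wald z-test for each two-by-two table. The test choice, function names and stratum labels are illustrative assumptions.

```python
import math
from itertools import product


def wald_p_value(x1, n1, x0, n0):
    """One-sided p-value for H0: p1 - p0 <= 0 from counts x1/n1 and x0/n0."""
    p1, p0 = x1 / n1, x0 / n0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    if se == 0:  # degenerate table: decide by the sign of the difference
        return 0.0 if p1 > p0 else 1.0
    return 0.5 * math.erfc((p1 - p0) / se / math.sqrt(2))


def stratum_p_values(counts):
    """Four one-sided p-values for one stratum; counts[z][(d, y)] are cell counts."""
    n1, n0 = sum(counts[1].values()), sum(counts[0].values())
    return {(d, y): wald_p_value(counts[1][(d, y)], n1,
                                 n0 - counts[0][(d, 1 - y)], n0)
            for d, y in product((0, 1), repeat=2)}


def falsify_by_stratum(strata, alpha=0.05):
    """Apply the two-at-a-time adjustment within strata, Bonferroni across them.

    strata maps each covariate level to its counts. Every hypothesis is
    tested at level alpha / (2 * number of strata).
    """
    level = alpha / (2 * len(strata))
    flagged = {}
    for v, counts in strata.items():
        hits = [dy for dy, p in stratum_p_values(counts).items() if p <= level]
        if hits:
            flagged[v] = hits
    return bool(flagged), flagged


# Hypothetical strata, for illustration only.
strata = {
    "low": {1: {(0, 0): 0, (0, 1): 70, (1, 0): 30, (1, 1): 0},
            0: {(0, 0): 60, (0, 1): 0, (1, 0): 0, (1, 1): 40}},
    "high": {1: {(0, 0): 20, (0, 1): 20, (1, 0): 30, (1, 1): 30},
             0: {(0, 0): 40, (0, 1): 30, (1, 0): 20, (1, 1): 10}},
}
print(falsify_by_stratum(strata))
```

The returned dictionary records which stratum and which inequality triggered the rejection; as Remark 4 cautions, strata that are not flagged should not be read as validated.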

Remark 4.

The Gail–Simon test examines the hypotheses (3) for all possible values of $V$ simultaneously. Alternatively, it may be tempting to test (3) for different levels of $V$ and claim that $Z$ is a valid instrument within the subset of the population for which the hypotheses (3) are not rejected. Failure to violate an instrumental variable inequality, however, does not prove that $Z$ is an instrument. This will ultimately rest on whether, based on subject-matter knowledge, we believe that we have measured enough covariates $V$ to control confounding, as well as subject-matter arguments for the absence of direct effects of $Z$ on $Y$.

Consequently, one should avoid using the test (3) as a way to restrict the range of $V$, unless a substantive argument could be made that $Z$ is an instrument for one range of $V$ but not for another.

4. THE CAUSAL EFFECT OF EDUCATION ON EARNINGS

We illustrate the use of the proposed tests by examining the instrumental variable model assumed by Okui et al. (2012). The goal of their analysis was to estimate the causal effect of education on earnings. To account for unobserved preferences for education levels, Okui et al. (2012) followed Card (1995) and used presence of a nearby four-year college as an instrument. The validity of this approach relies on the assumptions that college proximity affects earnings only through education and, conditional on adjusted potential confounders, college proximity is independent of underlying factors that also affect earnings. These assumptions, however, are hardly watertight. In fact, as pointed out by Card (1995), living near a college may influence earnings through higher elementary and secondary school quality, and it may also be associated with higher motivation to achieve labour market success.

To investigate the possible exogeneity of college proximity, we use the dataset provided by Okui et al. (2012), which contains 3010 observations from the National Longitudinal Survey of Young Men. Following Tan (2006), we consider education after high school as the treatment $D$. The outcome wage is dichotomized at its median. For illustrative purposes, we consider three instrumental variable models with nested sets of covariates: (I) experience only; (II) experience and race; (III) experience, race and region of residence. The third set was also considered previously by Okui et al. (2012). We use the Gail–Simon test with Bonferroni correction to examine the testable implications of these instrumental variable models.

Table 1 summarizes the test results. The model conditional only on experience is rejected by the proposed test. The $p$-value from the test on $H_{0,c}^{01}$ is significant at the 0·05 level, and the $p$-value from the test on $H_{0,c}^{11}$ is also borderline significant. These show that either college proximity has positive direct effects on earnings in some subgroups, or after adjusting for experience college proximity is still correlated with unmeasured confounders such as underlying motivation for labour market success. The proposed test fails to reject the instrumental variable model of Okui et al. (2012). However, as we discussed in Remark 4, with large sample sizes, failure to violate the instrumental variable inequalities shows that an instrumental variable model is compatible with the observed data, but does not validate such a model. Specifically, if one believes that the sample size is sufficiently large, then the results in Table 1 show that the instrumental variable model of Okui et al. (2012) is compatible with the observed data. One should use their model if one also believes that college proximity affects earnings only through education, and that there is no unmeasured confounding after adjusting for experience, race and region of residence. In contrast, one should not trust the instrumental variable model conditional only on experience, regardless of one’s prior substantive belief.

Table 1.

The $p$-values and numbers of subgroups obtained from partial tests for the binary instrumental variable models using college proximity as an instrument for education after high school

Covariate set $V$   $H_{0,c}^{00}$   $H_{0,c}^{01}$   $H_{0,c}^{10}$   $H_{0,c}^{11}$   Number of subgroups
(I)                 1·000            0·010            1·000            0·034            24
(II)                1·000            0·132            1·000            0·143            47
(III)               1·000            1·000            1·000            1·000            819

5. DISCUSSION

Although instrumental variable methods are widely used to identify causal effects in the presence of unmeasured confounding, their assumptions have mainly been assessed based on subject-matter arguments rather than statistical evidence. However, there are controversies about the validity of many instruments, especially if they are not randomized; for example, see Rosenzweig & Wolpin (2000) for a discussion on using natural experiments as instruments. Therefore, it should be routine to check the instrumental variable model against the observed data; see also Didelez et al. (2010). In this paper, we introduce a simple method for testing the binary instrumental variable model.

Our approach can be extended to test discrete instrumental variable models with binary outcomes. According to Pearl (1995), testable implications in this case include

$$\max\{p(0, d \mid 0), \ldots, p(0, d \mid z_{\max})\} + \max\{p(1, d \mid 0), \ldots, p(1, d \mid z_{\max})\} \le 1 \quad (d=0, \ldots, d_{\max}), \tag{10}$$

where $p(y, d \mid z) = \mathrm{pr}(Y=y, D=d \mid Z=z)$, $D$ takes values in $\{0, \ldots, d_{\max}\}$ and $Z$ takes values in $\{0, \ldots, z_{\max}\}$. With slight modifications of the multiplicity adjustments, the techniques introduced in this paper can be used to test the inequalities (10); see the Appendix for details. In general, there are other observed-data constraints implied by the discrete instrumental variable model (Bonet, 2001), the testing of which is an interesting topic for future research.

Monotonicity is also often assumed in instrumental variable analysis. See Huber & Mellace (2015) for a joint test of the unconditional instrumental variable model and the monotonicity assumption. A future research problem would be to extend the proposed method to test the binary instrumental variable model under monotonicity.

Although we have focused primarily on testing the binary instrumental variable model, as we explain in § 2, with randomized experiments our proposed tests can be directly applied to identify the sign of the average controlled direct effects of $Z$ on $Y$ with $D$ held at $d$ $(d=0,1)$. These average controlled direct effects quantify the extent to which the randomized treatment $Z$ affects the outcome $Y$ not through the mediator $D$, and are important for explaining causal mechanisms.

ACKNOWLEDGEMENT

We thank Chengchun Shi for helpful comments. This research was supported by the U.S. National Institutes of Health and Office of Naval Research. This work was initiated when the first author was a graduate student at the University of Washington.

Appendix

Proof of Theorem 1

The second and third claims in Theorem 1 follow directly from the assumption that the size of each one-sided test goes to zero asymptotically in the interior of the null space defined by the corresponding $H_0^{dy}$. We now consider the case where two inequalities in (1) hold with equality at the true parameter value. Without loss of generality, we assume that these are the inequalities with $(d, y) = (0, 0)$ and $(0, 1)$. As the left-hand sides of the four inequalities in (1) sum to 2 and each is nonnegative, we immediately get that for $d=1$ and $y=0,1$ the left-hand side of the corresponding inequality is zero, and hence $\mathrm{pr}(D=1, Y=y \mid Z=1) = 0$ and $\mathrm{pr}(D=1, Y=1-y \mid Z=0) = 0$. As a result, for $d=1$ and $y=0,1$ one cannot reject $H_0^{1y}$, with probability 1. On the other hand, as at most one of $H_0^{00}$ and $H_0^{01}$ can be violated empirically, they cannot be rejected simultaneously given our assumptions on the tests. The probability of rejecting at least one of $H_0^{00}$ and $H_0^{01}$ therefore equals $\alpha$ in this case.

Multiplicity adjustment with the discrete instrumental variable model

The constraints in (10) can be written as

$$p(0, d \mid z_1) + p(1, d \mid z_2) \le 1 \quad (z_1, z_2 = 0, \ldots, z_{\max},\ z_1 \ne z_2;\ d=0, \ldots, d_{\max}). \tag{A1}$$

There are $(d_{\max}+1)\, z_{\max} (z_{\max}+1)$ inequalities in (A1), the left-hand sides of which sum to $z_{\max}(z_{\max}+1)$. Hence at most $z_{\max}(z_{\max}+1)$ of them can hold with equality simultaneously. Similar to Theorem 1, the proposed testing procedure for the unconditional discrete instrumental variable model proceeds as follows: reject (A1) if, for $z_1, z_2 = 0, \ldots, z_{\max}$ with $z_1 \ne z_2$ and $d = 0, \ldots, d_{\max}$, at least one of the hypotheses in (A1) is rejected by the corresponding one-sided test at level $\alpha / \{z_{\max}(z_{\max}+1)\}$. For the conditional discrete instrumental variable model, the Bonferroni correction is appropriate; see also Remark 3.
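The counting behind this adjustment can be checked numerically. The sketch below enumerates the left-hand sides of (A1) for a hypothetical three-level instrument and binary treatment, and confirms that there are (d_max + 1) z_max (z_max + 1) inequalities whose left-hand sides sum to z_max (z_max + 1), since every conditional distribution is counted z_max times. The function name and the probabilities are illustrative assumptions.

```python
from itertools import product


def a1_left_hand_sides(p, z_max, d_max):
    """Left-hand sides of (A1), keyed by (d, z1, z2) with z1 != z2.

    p[z][(y, d)] is pr(Y=y, D=d | Z=z).
    """
    return {(d, z1, z2): p[z1][(0, d)] + p[z2][(1, d)]
            for d in range(d_max + 1)
            for z1, z2 in product(range(z_max + 1), repeat=2)
            if z1 != z2}


# Hypothetical conditional distributions for Z in {0, 1, 2}, D in {0, 1}.
p = {
    0: {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.30, (1, 1): 0.40},
    1: {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},
    2: {(0, 0): 0.40, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.10},
}
lhs = a1_left_hand_sides(p, z_max=2, d_max=1)
print(len(lhs), round(sum(lhs.values()), 10))
```

With a binary instrument and a binary treatment the same count gives four inequalities summing to 2, which recovers the setting of Theorem 1.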

References

  1. Angrist, J. D., Imbens, G. W. & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. J. Am. Statist. Assoc. 91, 444–55.
  2. Baiocchi, M., Cheng, J. & Small, D. S. (2014). Instrumental variable methods for causal inference. Statist. Med. 33, 2297–340.
  3. Balke, A. & Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. J. Am. Statist. Assoc. 92, 1171–6.
  4. Berger, R. L. & Boos, D. D. (1994). P values maximized over a confidence set for the nuisance parameter. J. Am. Statist. Assoc. 89, 1012–6.
  5. Bonet, B. (2001). Instrumentality tests revisited. In Proc. 17th Conf. Uncert. Artif. Intel., Breese, J. & Koller, D., eds. San Francisco, California: Morgan Kaufmann Publishers, pp. 48–55.
  6. Cai, Z., Kuroki, M., Pearl, J. & Tian, J. (2008). Bounds on direct effects in the presence of confounded intermediate variables. Biometrics 64, 695–701.
  7. Card, D. (1995). Using geographic variation in college proximity to estimate the return to schooling. In Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, Christofides, L. N., Swidinsky, R. & Grant, E. K., eds. Toronto: University of Toronto Press, pp. 201–22.
  8. Chang, M., Lee, S. & Whang, Y.-J. (2015). Nonparametric tests of conditional treatment effects with an application to single-sex schooling on academic achievements. Economet. J. 18, 307–46.
  9. Clarke, P. S. & Windmeijer, F. (2012). Instrumental variable estimators for binary outcomes. J. Am. Statist. Assoc. 107, 1638–52.
  10. Didelez, V., Meng, S. & Sheehan, N. A. (2010). Assumptions of IV methods for observational epidemiology. Statist. Sci. 25, 22–40.
  11. Gail, M. & Simon, R. (1985). Testing for qualitative interactions between treatment effects and patient subsets. Biometrics 41, 361–72.
  12. Huber, M. & Mellace, G. (2015). Testing instrument validity for LATE identification based on inequality moment constraints. Rev. Econ. Statist. 97, 398–411.
  13. Kang, H., Kreuels, B., Adjei, O., Krumkamp, R., May, J. & Small, D. S. (2013). The causal effect of malaria on stunting: A Mendelian randomization and matching approach. Int. J. Epidemiol. 42, 1390–8.
  14. Lydersen, S., Fagerland, M. W. & Laake, P. (2009). Recommended tests for association in 2×2 tables. Statist. Med. 28, 1159–75.
  15. Okui, R., Small, D. S., Tan, Z. & Robins, J. M. (2012). Doubly robust instrumental variable regression. Statist. Sinica 22, 173–205.
  16. Pearl, J. (1995). Causal inference from indirect experiments. Artif. Intel. Med. 7, 561–82.
  17. Pearl, J. (2009). Causality. Cambridge: Cambridge University Press.
  18. Perlman, M. D. & Wu, L. (1999). The emperor’s new tests. Statist. Sci. 14, 355–69.
  19. Ramsahai, R. & Lauritzen, S. (2011). Likelihood analysis of the binary instrumental variable model. Biometrika 98, 987–94.
  20. Richardson, T. S., Evans, R. J. & Robins, J. M. (2011). Transparent parameterizations of models for potential outcomes. Bayesian Statist. 9, 569–610.
  21. Rosenzweig, M. R. & Wolpin, K. I. (2000). Natural “natural experiments” in economics. J. Econ. Lit. 38, 827–74.
  22. Spirtes, P., Glymour, C. N. & Scheines, R. (2000). Causation, Prediction, and Search. Cambridge, Massachusetts: MIT Press.
  23. Tan, Z. (2006). Regression and weighting methods for causal inference using instrumental variables. J. Am. Statist. Assoc. 101, 1607–18.
  24. Vansteelandt, S., Bowden, J., Babanezhad, M. & Goetghebeur, E. (2011). On instrumental variables estimation of causal odds ratios. Statist. Sci. 26, 403–22.

Articles from Biometrika are provided here courtesy of Oxford University Press
