The Search for Identification
Since the 1980s, social scientists have become increasingly interested in how to make credible claims about cause and effect using observational data (1). Ignited by seminal papers demonstrating the promise of natural experiments (2), the causal revolution has put social scientists on the lookout for “as-if” random sources of variation in the world. The search for instrumental variables, discontinuities, and other naturally occurring randomizations has become a massive industry. But what if we had a large set of instrumental variables right under our skin, encoded in our genes? After all, our genes are a random draw of the genes of our parents, and many human behaviors and traits—from alcoholism to personality to political partisanship—have a genetic component. If we used the genotypes that influence, say, education, as an instrument for schooling completed, we might be able to figure out how much education causally affects wages net of confounders, like childhood nutrition, that affect both education and wages. One commonly adopted approach, Mendelian randomization (MR), deploys genes as instruments to unpack causal relationships in this way (3). In PNAS, DiPrete et al. (4) identify the conditions under which the identifying assumptions behind MR are not met and propose an alternative, genetic instrumental variables, which they argue makes it possible to credibly deploy genes as instruments.
These efforts are important because if we could use genes as instrumental variables, we could make causal inferences about an expanded set of social and behavioral treatments and outcomes. And, because everyone has genes, we could study these relationships in broader populations, rather than be limited to highly local effects in narrow populations of compliers, as is often the case with many instruments today. These advantages, coupled with advances in the infrastructure to support research employing genetic data, mean that the payoff for figuring out how to credibly employ genes as instruments could be substantial.
What Makes a Good Instrument?
But a good instrument needs to meet several criteria. First, the instrument must be relevant. That is, it must be correlated with the treatment; we seek to avoid instruments that are only weakly correlated with the treatment (5). Second, the instrument should be exogenous and satisfy the exclusion restriction, which stipulates that the instrument must not affect the outcome other than through the treatment. Third, the instrument should have a monotonic relationship with the treatment, rather than increase take-up of the treatment for some and decrease it for others.
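A small simulation can make these criteria concrete. The sketch below is illustrative only: the variable names, the effect sizes, and the choice of a single-instrument Wald ratio are assumptions made for the example, not anything drawn from the studies cited here. It builds an instrument that is relevant, is excluded from the outcome equation, and shifts the treatment in the same direction for everyone, and then compares a naive regression with the instrumental variable estimate.

```python
import numpy as np

def simulate(n=200_000, beta=0.5, seed=0):
    """Draw an instrument z, unobserved confounder u, treatment t, outcome y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                   # instrument, independent of u
    u = rng.normal(size=n)                   # unobserved confounder
    t = 0.6 * z + u + rng.normal(size=n)     # relevance: z shifts t
    y = beta * t + u + rng.normal(size=n)    # exclusion: z enters y only via t
    return z, t, y

def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]

def wald_iv(z, t, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, t)."""
    return np.cov(z, y)[0, 1] / np.cov(z, t)[0, 1]

z, t, y = simulate()
beta_ols = ols_slope(t, y)    # pulled away from 0.5 by the confounder u
beta_iv = wald_iv(z, t, y)    # close to 0.5 when the criteria hold
print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}")
```

Under these assumed parameters the naive slope absorbs the confounder, while the ratio of the reduced form to the first stage should land near the true effect of 0.5. Shrinking the 0.6 first-stage coefficient toward zero is one way to see the weak-instrument problem appear.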
When do genes meet these criteria? The relevance criterion is met when, for example, single variants have strong effects on the treatment. But, as we have learned from candidate gene studies, these cases are rare. Most single gene markers have little predictive value, making them weak instruments (6). This problem has been solved by using genome-wide polygenic scores (PGS), which add up the effects of many different genetic variants across the genome. These scores have much more predictive power than any single candidate gene variant. For example, the latest PGS for height has an R² of 0.30 (7), while the PGS for education predicts 13% of the variation in educational attainment in the National Longitudinal Study of Adolescent to Adult Health (8).
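The logic of a polygenic score can be sketched as a weighted sum of allele counts. Everything in the example below is simulated and hypothetical: the number of variants, the allele frequencies, the effect sizes, and the noise added to mimic estimated weights are assumptions, and the resulting R² values have no connection to the height or education scores cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_snps = 5_000, 500

# Simulated genotypes: 0, 1, or 2 copies of the effect allele at each variant.
allele_freqs = rng.uniform(0.1, 0.9, size=n_snps)
genotypes = rng.binomial(2, allele_freqs, size=(n_people, n_snps))

# Many small true effects, plus noisy "GWAS" estimates of those effects.
true_effects = rng.normal(0.0, 0.05, size=n_snps)
estimated_effects = true_effects + rng.normal(0.0, 0.05, size=n_snps)

def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts across variants."""
    return genotypes @ weights

def r_squared(x, y):
    """Share of variance in y linearly predicted by x."""
    return np.corrcoef(x, y)[0, 1] ** 2

genetic_value = polygenic_score(genotypes, true_effects)
noise = rng.normal(0.0, genetic_value.std(), size=n_people)  # heritability near 0.5
phenotype = genetic_value + noise

pgs = polygenic_score(genotypes, estimated_effects)
top_variant = genotypes[:, np.argmax(np.abs(true_effects))]

r2_pgs = r_squared(pgs, phenotype)
r2_single = r_squared(top_variant, phenotype)
print(f"R^2 of PGS: {r2_pgs:.3f}  R^2 of strongest single variant: {r2_single:.3f}")
```

In this toy setting even the single strongest variant explains very little of the phenotype, while the score that aggregates all variants explains far more, despite its noisy weights.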
The turn toward using PGSs has important implications for the satisfaction of the other two instrumental variable criteria. Because polygenic scores capture linear, additive effects, their use as instruments depends on whether there are gene–gene interactions (epistasis) and gene–environment interactions that might challenge the monotonicity assumption. But given that PGSs sum the additive, averaged effects estimated across cohorts from very different environments, it is reasonable to expect that they do not pick up cross-over epistatic or gene-by-environment effects that could create a set of defiers that react in the opposite way.
A remaining obstacle to employing genes as instruments is that the last decade of research has taught us that pleiotropy is the rule in human genetics rather than the exception (9). That is, a given set of genes can have effects on many traits. Genes are expressed in multiple tissues at different times of development, and pop up in many biological pathways. This creates the strong possibility that genes that we use as instruments could violate the exclusion restriction by having effects on our outcome through pathways that do not pass through the treatment. Polygenic scores that sum small genetic effects across the genome can thus compound the problem by creating the occasion for many pleiotropic violations of the exclusion restriction.
Sometimes we might be less concerned about pleiotropy-induced violations of the exclusion restriction. One case is when single genes have large effects on the treatment and we know from other studies that our ultimate phenotype is highly polygenic. Take, for example, eye color, which is largely controlled by variation in two genes. If we wanted to assess discrimination in the education system based on eye color, we could instrument iris pigmentation with these two markers and then determine eye color’s impact (perhaps using sibling models to eliminate population stratification). We know from genome-wide association studies that compared with the huge effect in the first stage, bias from pleiotropy is going to be small in absolute terms because education is affected by thousands of variants across the genome, each with tiny effects. And if we know the reduced form correlation between the eye-color genotypes and the outcome is large, we would be good to go. But in rare cases such as these, we might as well just compare siblings who are discordant on the phenotype.
Another case where we might defend MR involves pathways that are dependent on the environmental context. If alcohol dehydrogenase variants only affect heart disease through their effects on alcohol consumption and metabolism, we should see no effect of those same variants in communities where alcohol is not consumed for cultural reasons. A concrete example of this is provided by Pitt et al. (10), who study a genetic variant that is protective against arsenic poisoning in a Bangladeshi population whose wells were contaminated by this heavy metal. By showing there was no effect of this genetic variant on educational outcomes in a United States sample, where there is presumably no arsenic in the water, the authors help build a case for the validity of their identifying assumptions. While such a placebo test does not conclusively rule out the possibility of exclusion restriction violations in alternate environments, it can help us gain greater confidence in the proposed genetic instruments.
Block That Path?
DiPrete et al. (4) advance a solution for the pleiotropy problem when using polygenic scores, arguing that it is possible to block pleiotropic pathways by including the PGS for the outcome conditional on treatment as a control. To develop their argument, they use simulations to compare how well different estimators recover the true effect of a treatment in the presence of pleiotropy and other forms of confounding. First, DiPrete et al. estimate the effect of a heritable treatment using a naïve ordinary least-squares regression estimator, regressing the outcome on the treatment. To this they compare the typical MR approach using a PGS for the treatment as an instrument. In the presence of moderate to strong pleiotropy, this instrumental variable regression recovers biased estimates. In their next step, DiPrete et al. add as a control estimates of the PGS for the outcome conditional on the treatment, an approach they call “enhanced Mendelian randomization.” The authors argue that adding in the true PGS should theoretically capture all of the linear causal paths that flow from a person’s genes to the outcome that do not pass through the treatment. Adding it as a control, they argue, would thus block any pleiotropic pathways opened by the PGS for the treatment that would violate the exclusion restriction.
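The comparison of estimators described above can be imitated in a few lines. This is a toy sketch under stated assumptions, not DiPrete et al.'s simulation design: the two genetic scores are drawn directly as correlated normal variables (with `rho` standing in for pleiotropy), the true conditional PGS is treated as known, and the helper `two_sls` is a bare-bones two-stage least-squares routine written for this example.

```python
import numpy as np

def two_sls(y, endog, instruments):
    """2SLS with an intercept; returns coefficients on the `endog` columns.

    Exogenous controls are passed in both `endog` and `instruments`,
    so that they serve as their own instruments.
    """
    ones = np.ones(len(y))
    Z = np.column_stack([ones, instruments])
    X = np.column_stack([ones, endog])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]    # first stage
    coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]     # second stage
    return coef[1:]

rng = np.random.default_rng(2)
n, beta, rho = 200_000, 0.3, 0.4   # rho > 0 means pleiotropy is present

pgs_t = rng.normal(size=n)                                       # PGS for the treatment
pgs_y = rho * pgs_t + np.sqrt(1 - rho**2) * rng.normal(size=n)   # true conditional PGS
u = rng.normal(size=n)                                           # confounder unrelated to genes
t = pgs_t + u + rng.normal(size=n)
y = beta * t + pgs_y + u + rng.normal(size=n)

beta_ols = np.polyfit(t, y, 1)[0]        # naive regression of y on t
beta_mr = two_sls(y, t, pgs_t)[0]        # MR: PGS for t as the instrument
beta_emr = two_sls(                      # MR plus the true conditional PGS as control
    y,
    np.column_stack([t, pgs_y]),
    np.column_stack([pgs_t, pgs_y]),
)[0]
print(f"OLS: {beta_ols:.3f}  MR: {beta_mr:.3f}  MR + control: {beta_emr:.3f}")
```

With these assumed parameters, both the naive regression and plain MR overstate the effect, while adding the true conditional score as a control should bring the estimate back toward 0.3. Setting `rho` to zero removes the pleiotropic path and makes plain MR consistent again.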
A practical problem, however, is that researchers typically do not know the true conditional PGS for the outcome. Due to finite sample sizes, we can only retrieve noisy measures. This is a problem because it means that a conditional PGS that we estimate would not fully block all pleiotropic pathways. What can be done? DiPrete et al. (4) propose estimating two separate conditional polygenic scores on independent, nonoverlapping samples of the same population. The authors argue that these two PGSs would represent two independent indicators of the true PGS that differ from each other due to random sampling error. As long as we can credibly assume that the measurement error that plagues PGS estimates is classical rather than systematic, we can use one indicator as an instrument for the other, purging the latter of measurement error. The indicator purged of measurement error can then be used as the control in the second stage, an approach they call genetic instrumental variables (GIV).
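The measurement-error step can be illustrated with the same kind of toy simulation. The sketch below is a simplified reading of the idea, not the authors' implementation: the true conditional score `g_y` is assumed to be unobservable, `pgs_y1` and `pgs_y2` are hypothetical noisy indicators with independent classical error, and `two_sls` is a minimal helper written for this example.

```python
import numpy as np

def two_sls(y, endog, instruments):
    """2SLS with an intercept; returns coefficients on the `endog` columns."""
    ones = np.ones(len(y))
    Z = np.column_stack([ones, instruments])
    X = np.column_stack([ones, endog])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]    # first stage
    coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]     # second stage
    return coef[1:]

rng = np.random.default_rng(3)
n, beta, rho = 200_000, 0.3, 0.4

pgs_t = rng.normal(size=n)                                      # PGS for the treatment
g_y = rho * pgs_t + np.sqrt(1 - rho**2) * rng.normal(size=n)    # true conditional PGS (unobserved)
u = rng.normal(size=n)                                          # confounder unrelated to genes
t = pgs_t + u + rng.normal(size=n)
y = beta * t + g_y + u + rng.normal(size=n)

# Two noisy indicators of g_y, as if estimated in independent samples.
pgs_y1 = g_y + rng.normal(size=n)
pgs_y2 = g_y + rng.normal(size=n)

endog = np.column_stack([t, pgs_y1])

# Noisy score used directly as a control: the pleiotropic path is only partly blocked.
beta_noisy = two_sls(y, endog, np.column_stack([pgs_t, pgs_y1]))[0]

# Second indicator instruments the first, purging its classical measurement error.
beta_giv = two_sls(y, endog, np.column_stack([pgs_t, pgs_y2]))[0]
print(f"noisy control: {beta_noisy:.3f}  two-indicator IV: {beta_giv:.3f}")
```

Under these assumptions the noisy control leaves residual bias, while instrumenting one indicator with the other should return an estimate near 0.3. The correction relies on the two error terms being independent of each other and of everything else in the model, which is the classical measurement error assumption.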
For the conditional PGS for the outcome to effectively block all pleiotropic pathways, researchers must assume that the effects of genes do not interact (no epistasis) and are linear. The conditional PGS can further fail when there are rare but relevant variants that are not observed in the samples on which the PGS is constructed. In both cases, the PGS would display systematic measurement error from functional form misspecification and sample limitations in its construction. As such, one PGS cannot be used to purge another of measurement error.
Crucially, the overall strategy depends on first adequately accounting for population structure and environmental confounding. One important form of confounding that may persist is genetic nurturance (11). Where there is genetic nurturance, genetic variants that parents possess but do not pass on to their child can still affect the child’s outcome by influencing the environment in which the child grows up. One example of this may be that parents who possess alleles associated with higher educational attainment may be more educated and better able to provide enriching home environments for their children, improving their children’s educational outcomes indirectly rather than directly through genetic transmission to the next generation. This presents a problem for GIV because it introduces environmental heterogeneity that is correlated with the genotype of the child, jeopardizing the exogeneity of the instrument. DiPrete et al.’s (4) simulations show that GIV recovers biased estimates in the presence of such genetically correlated, environmental confounders.
A further problem arises when there are strong unobserved environmental confounders that are not correlated with the genes in question but that causally affect the treatment and the outcome. This is shown in greater detail in the simulation results in the Supporting Information of DiPrete et al. (4). In such cases, adding a PGS for the outcome that is estimated conditional on the treatment can induce a noncausal association between genes and unobserved confounders. The resulting path that is opened up can introduce bias, exacerbating problems posed by unobserved confounders. In their empirical example of the effect of height on education, milk consumption could be an environmental factor that affects both height and schooling outcomes that may not be picked up by family fixed effects, their recommended strategy. Indeed, behavior genetic analysis suggests that the lion’s share of phenotypic variance explained by environmental factors is not shared by siblings (12). For researchers, this challenge means that when there is low pleiotropy but significant unobserved confounding, GIV will perform worse than MR.
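The induced association described here is a form of collider bias, and a short simulation shows how it arises. The sketch is illustrative and rests on assumed parameters: a single score `g` stands in for the genes, there is no pleiotropy at all, and the coefficient on `g` from a regression that conditions on the treatment plays the role of the weights that a conditional PGS would be built from.

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta = 200_000, 0.3

g = rng.normal(size=n)                  # genetic score; affects y only through t
u = rng.normal(size=n)                  # unobserved confounder, independent of g
t = g + u + rng.normal(size=n)          # treatment is a common effect of g and u
y = beta * t + u + rng.normal(size=n)   # no direct path from g to y

# Marginally, the genes and the confounder are unrelated.
corr_g_u = np.corrcoef(g, u)[0, 1]

# Genetic "effect" on y conditional on t: regress y on g and t together.
X = np.column_stack([np.ones(n), g, t])
_, coef_g, coef_t = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"corr(g, u): {corr_g_u:.3f}  coef on g given t: {coef_g:.3f}  coef on t: {coef_t:.3f}")
```

Although `g` has no direct effect on `y` in this setup, conditioning on the treatment should produce a clearly nonzero coefficient on `g`, because holding `t` fixed makes `g` informative about `u`. A conditional score built from such coefficients would carry the confounder into the model rather than block a pleiotropic path.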
Thus, as DiPrete et al. (4) discuss in their Supporting Information, whether GIV is useful depends on the research problem in question. If pleiotropy is the only form of confounding and the PGS accurately captures the relevant genetic pathways to the outcome, GIV can be an effective approach. We might imagine a case where the entire bias is due to genetic confounds: that is, where the phenotypes are highly heritable (say working memory and cortical surface area) (13). But the rub is that this may only be true in a limited set of cases and, unlike in the case of simulations, we often do not know how big an environmental confound may be present, the extent to which it is correlated with the genetic measures in our model, how much genetic pleiotropy exists or, for that matter, the extent to which estimates from traditional approaches are biased in the first place. While DiPrete et al. (4) present an important step forward, much work remains to be done to fully harness the would-be instrumental variables lurking under our skin.
Acknowledgments
We thank Ian Lundberg for drawing a Directed Acyclic Graph (DAG) and members of the Stewart laboratory and the Princeton Biosociology laboratory for help with preparing the manuscript.
Footnotes
The authors declare no conflict of interest.
See companion article on page E4970.
References
- 1. Angrist JD, Pischke J-S. The credibility revolution in empirical economics: How better research design is taking the con out of econometrics. J Econ Perspect. 2010;24:3–30.
- 2. Angrist JD. Lifetime earnings and the Vietnam era draft lottery: Evidence from Social Security administrative records. Am Econ Rev. 1990;80:313–336.
- 3. Smith GD, Ebrahim S. ‘Mendelian randomization’: Can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32:1–22. doi: 10.1093/ije/dyg070.
- 4. DiPrete TA, Burik CAP, Koellinger PD. Genetic instrumental variable regression: Explaining socioeconomic and health outcomes in nonexperimental data. Proc Natl Acad Sci USA. 2018;115:E4970–E4979. doi: 10.1073/pnas.1707388115.
- 5. Bound J, Jaeger DA, Baker RM. Problems with instrumental variables estimation when the correlation between the instruments and the endogeneous explanatory variable is weak. J Am Stat Assoc. 1995;90:443–450.
- 6. Duncan LE, Keller MC. A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry. Am J Psychiatry. 2011;168:1041–1049. doi: 10.1176/appi.ajp.2011.11020191.
- 7. Lello L, et al. Accurate genomic prediction of human height. bioRxiv. 2017. doi: 10.1101/190124.
- 8. Lee JJ, et al. Gene discovery and polygenic prediction from a 1.1-million-person GWAS of educational attainment. Nat Genet, in press.
- 9. Bulik-Sullivan B, et al. ReproGen Consortium; Psychiatric Genomics Consortium; Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3. An atlas of genetic correlations across human diseases and traits. Nat Genet. 2015;47:1236–1241. doi: 10.1038/ng.3406.
- 10. Pitt M, Rosenzweig M, Hassan N. Identifying the Cost of a Public Health Success: Arsenic Well Water Contamination and Productivity in Bangladesh. National Bureau of Economic Research; Cambridge, MA: 2015.
- 11. Kong A, et al. The nature of nurture: Effects of parental genotypes. Science. 2018;359:424–428. doi: 10.1126/science.aan6877.
- 12. Turkheimer E. Three laws of behavior genetics and what they mean. Curr Dir Psychol Sci. 2000;9:160–164.
- 13. Panizzon MS, et al. Distinct genetic influences on cortical surface area and cortical thickness. Cereb Cortex. 2009;19:2728–2735. doi: 10.1093/cercor/bhp026.