American Journal of Epidemiology. 2021 Jun 2;190(12):2658–2661. doi: 10.1093/aje/kwab142

Invited Commentary: The Promise and Pitfalls of Causal Inference With Multivariate Environmental Exposures

Corwin M Zigler
PMCID: PMC8796803  PMID: 34079988

Abstract

The accompanying article by Keil et al. (Am J Epidemiol. 2021;190(12):2647–2657) deploys Bayesian g-computation to investigate the causal effect on birth weight of 6 airborne metal exposures linked to power-plant emissions. In so doing, it articulates the potential value of framing the analysis of environmental mixtures as an explicit contrast between exposure distributions that might arise in response to a well-defined intervention—here, the decommissioning of coal plants. Framing the mixture analysis as that of an approximate “target trial” is an important approach that deserves incorporation into the already rich literature on the analysis of environmental mixtures. However, its deployment in the power plant example highlights challenges that can arise when the target trial is at odds with the exposure distribution observed in the data, a discordance that seems particularly difficult in studies of environmental mixtures. Bayesian methods such as model averaging and informative priors can help, but they are ultimately limited in their ability to overcome this salient challenge.

Keywords: Bayesian inference, causal inference, environmental mixtures, power plants


Editor’s note: The opinions expressed in this article are those of the author and do not necessarily reflect the views of the American Journal of Epidemiology.

A key element throughout the history of causal inference is the framing of observational studies as approximations of (hypothesized) randomized trials (1, 2). The work by Keil et al. (3) in this issue of the Journal deploys this perspective to evaluate the extent to which a reduction in 6 metal exposures linked to coal-fired power plant emissions might causally affect birth weights in Milwaukee, Wisconsin. A chief contribution of this work is the authors’ articulation of how explicit causal inference methods can direct research questions away from some of the thorny statistical issues related to analysis of complex mixtures and focus investigations around the evaluation of causal estimands that correspond to consequences of specific, well-defined actions. I congratulate Keil et al. for pursuing this important continuation of the recent uptick in use of causal methods in environmental epidemiology (4–11) and enthusiastically endorse these efforts to advance an already rich body of literature on the analysis of complex environmental mixtures.

FRAMING CAUSAL QUESTIONS WITH MIXTURES

Using the infrastructure of potential outcomes, Keil et al. (3) frame their analysis as an explicit contrast between exposure distributions associated with a specific action of the type that might occur in a (hypothetical) designed experiment. Framing observational studies as approximate experiments often involves stretching notions of what interventions could hypothetically occur. When dealing in the realm of the hypothetical, it is important to remember that simply declaring that an action could have hypothetically occurred does not guarantee that 1) investigating its impact has public health relevance or 2) the observed data contain the relevant information with which to reliably characterize it. The example in the paper by Keil et al. (3) stands on much firmer ground with regard to point 1 than it does to point 2, highlighting important friction that can arise in the practice of environmental epidemiology, especially when dealing with complex mixtures.

Hypothetical, simulated, and infeasible interventions

The nominal focus of Keil et al. is “a hypothetical intervention to reduce exposure to 6 airborne metals by decommissioning 3 coal-fired power plants in Milwaukee County prior to 2010” (3, p. 2648) and its causal effect (vs. no action) on birth weight. In contrast to similarly framed analyses of power plants (12–16), an important feature of the analysis of Keil et al. is that the decommissioning of coal plants never actually occurred. Absent actual decommissioning of plants, Keil et al. assume that hypothetical decommissioning would lead to a uniform proportional reduction in each of the 6 metals relative to their observed values across the study area. This reduction dictates the underlying trial the methods are designed to approximate: The “control arm” of the trial is the observed exposure mixture, and the “intervention arm” is the uniform proportional reduction in levels of all 6 metals. Confounders are those variables that are related to metals and birth weight; positivity relates to the likelihood of a census tract’s exhibiting a particular level of the 6 metals; and consistency assumes that hypothetical exposure levels are unrelated to the manner in which they were realized. The relationship to coal plants bears nominal relevance to the framing of the hypothetical target trial but is not, strictly speaking, related to the methodological underpinnings of the causal inference methodology. One purpose of the framing around coal plants is that it isolates one of infinitely many exposure contrasts that could be used to generate a hypothetical trial of these 6 exposures; the simultaneous uniform proportional reduction is the exposure contrast of interest, and not, say, a shift in each individual exposure dimension (while holding the others fixed) or a change in the relative mixture proportions of each component, both of which would correspond to different underlying hypothetical trials. Whether this uniform proportional decrease actually corresponds to what would happen upon decommissioning of coal plants is considered in Keil et al.’s paper (3), but detailed assessment of this correspondence was apparently beyond the scope of the investigation.
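To make the exposure contrast concrete, the following is a minimal sketch of how the two “arms” of this hypothetical trial might be constructed; the array shapes, the 90% reduction factor, and all numerical values are hypothetical placeholders, not quantities taken from Keil et al. (3).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed exposure matrix: rows are census tracts,
# columns are the 6 airborne metals (illustrative lognormal values).
X_obs = rng.lognormal(mean=0.0, sigma=1.0, size=(250, 6))

# "Control arm": the observed exposure mixture, left as-is.
X_control = X_obs.copy()

# "Intervention arm": a uniform proportional reduction applied to all
# 6 metals simultaneously. The 90% factor is purely illustrative.
reduction = 0.90
X_intervention = (1.0 - reduction) * X_obs

# Note what this contrast is NOT: a shift in one metal holding the
# others fixed, or a change in the relative mixture proportions; those
# would define different hypothetical trials.
```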

Defining versus designing the hypothetical trial

When considering causal estimation in the hypothetical target trial, it is important to suspend focus on power plants and evaluate the approximation of the target trial on the basis of the distribution of the 6 metal exposures. Here, the authors clearly identify that approximating their target trial would require extrapolation beyond the range of observed data. Keil et al.’s Web Figure 1 shows that the levels of these 6 metals implied by the uniform proportional reduction would fall almost entirely below the range of values observed in the study area. Thus, the hypothetical trial entails an “intervention arm” for which there is no representation in the observed data; estimates will necessarily extrapolate to these levels of exposure, and validity is almost entirely dependent upon the adequacy of the statistical model used to perform that extrapolation. This is dangerous territory: Results will be highly dependent on model specification (a cartoonish but possibly instructive example appears in Gutman and Rubin (17)). An alternative target trial, defined in a secondary analysis to be bounded by percentiles of the observed exposure distribution, is easier to approximate, but its relationship with the decommissioning of power plants is not clear. Hence the inherent friction: The readily interpretable framing relative to power plant closures is undercut by the realities of the observed data, and a framing that more closely hews to the observed data sacrifices the intervention-focused interpretation. I suspect that this friction arises more often in studies of environmental mixtures than in those of univariate exposures, since specified hypothetical levels of multiple exposures may not cohere with the inherent correlations and interdependencies of the natural exposure mixture, thus increasing the likelihood of inconsistencies with the observed data.
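Continuing the hypothetical sketch above, a crude positivity diagnostic makes the extrapolation problem visible: compare the intervention-arm exposures with the lower edge of the observed data, in the spirit of (but not reproducing) Keil et al.’s Web Figure 1. All data here remain illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X_obs = rng.lognormal(0.0, 1.0, size=(250, 6))  # hypothetical observed metals
X_intervention = 0.10 * X_obs                   # illustrative 90% uniform cut

# The per-metal observed minimum marks the lower edge of the data.
obs_min = X_obs.min(axis=0)

# Fraction of intervention-arm values falling below anything ever
# observed for that metal: fractions near 1 mean the causal contrast
# rests almost entirely on model-based extrapolation beyond the
# support of the data.
frac_below_support = (X_intervention < obs_min).mean(axis=0)
print(np.round(frac_below_support, 2))
```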

BAYESIAN CAUSAL INFERENCE: MODEL AVERAGING AND REGULARIZATION

For estimation, Keil et al. combine g-computation with the mechanics of Bayesian model averaging (BMA) and informative prior distributions (3). The potential benefit of incorporating model uncertainty into g-computation offers, in a general sense, a clear answer to the common question of “Why Bayesian?” when it comes to established causal inference techniques such as g-computation that have been employed outside the Bayesian paradigm for decades. The set of correlated exposures, possible confounders, and higher-order interactions contained within the regression model underlying g-computation primes the analysis to exhibit unstable and erratic estimates. Coupling this challenge with that of extrapolating beyond the observed range of the data motivates averaging over many possible models instead of pinning hope on only one. While Keil et al. point toward some advantages of their approach (3), I see some clear future directions for more closely tailoring these methods to causal inference investigations. More importantly, these methods are not capable of resolving the friction at the heart of the analysis, and they cannot overcome poor “design” of the hypothetical target trial.
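As a point of reference for readers less familiar with the estimator, the following is a bare-bones, frequentist sketch of the g-computation steps that the Bayesian machinery wraps; the simulated data, the single linear outcome model, and the 90% reduction are all hypothetical stand-ins for Keil et al.’s far richer specification.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 250

# Hypothetical confounders and 6 metal exposures that depend on them.
C = rng.normal(size=(n, 3))
X = np.exp(0.3 * C[:, [0]] + rng.normal(0.0, 1.0, size=(n, 6)))

# Simulated birth weight (g) with made-up exposure/confounder effects.
beta_x = np.array([20.0, 5.0, 0.0, 10.0, 0.0, 15.0])
beta_c = np.array([30.0, -15.0, 10.0])
y = 3300 - X @ beta_x + C @ beta_c + rng.normal(0.0, 200.0, size=n)

# Step 1: fit an outcome model for birth weight given exposures and
# confounders (here one linear model; Keil et al. average over many
# specifications instead of committing to one).
model = LinearRegression().fit(np.hstack([X, C]), y)

# Step 2: standardize -- predict each unit's outcome under both arms,
# holding confounders at their observed values.
y_control = model.predict(np.hstack([X, C]))
y_intervention = model.predict(np.hstack([0.10 * X, C]))  # 90% cut

# Step 3: the g-computation effect estimate is the difference in mean
# predicted outcomes between the two arms.
print(f"Estimated mean difference: {y_intervention.mean() - y_control.mean():.1f} g")
```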

Bayesian model averaging

BMA simultaneously considers a model class of all 2⁸³ possible regression specifications formed from the 83 quantities enumerated by Keil et al. (3), effectively letting the data “choose” which terms are relevant for predicting birth weight, without forcing the investigator to include an unnecessarily high-dimensional set of regression parameters that risk exacerbating the difficulties created by multiple highly correlated exposures. The final posterior distribution summarizing inference for causal effects weighs each possible model in the model class according to its support from the data (although, as a matter of implementation, I wonder how a space consisting of 2⁸³ distinct models could have been effectively explored with 36,000 random-walk Markov chain Monte Carlo iterations).
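To illustrate the mechanics (though emphatically not the scale) of BMA, the toy sketch below enumerates a tiny model class exactly and weights each model by a BIC approximation to its marginal likelihood; with 83 candidate terms, exact enumeration of all 2⁸³ models is impossible, which is why Keil et al. must instead sample the model space by Markov chain Monte Carlo. Everything here, including the BIC approximation, is a generic illustration rather than their algorithm.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 4                     # tiny p so all 2**p models can be enumerated
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)  # true terms: 0 and 2

def bic(X_sub: np.ndarray, y: np.ndarray) -> float:
    """BIC of an OLS fit; exp(-BIC/2) approximates the marginal likelihood."""
    n_obs, k = X_sub.shape
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = float(((y - X_sub @ beta) ** 2).sum())
    return n_obs * np.log(rss / n_obs) + k * np.log(n_obs)

# Enumerate every nonempty subset of candidate terms and convert BICs
# to approximate posterior model weights (uniform prior over models).
models, bics = [], []
for k in range(1, p + 1):
    for subset in itertools.combinations(range(p), k):
        models.append(subset)
        bics.append(bic(X[:, subset], y))
w = np.exp(-0.5 * (np.array(bics) - min(bics)))
w /= w.sum()

# Report the three highest-weight models; a BMA causal estimate would
# average each model's g-computation contrast using these weights.
for m, wt in sorted(zip(models, w), key=lambda t: -t[1])[:3]:
    print(f"terms {m}: weight ~ {wt:.2f}")
```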

As Keil et al. state in their Web Appendix 1, “under standard causal identification conditions and correct model assumptions,” BMA can form the basis of birth weight predictions under specified exposure levels. But does every model in the class of 2⁸³ models satisfy the requisite assumptions about, for example, confounding? Even when models satisfying the identification conditions are implicitly assumed to lie within the model class, BMA provides no guarantee that the posterior distribution will concentrate around these models. This issue has motivated a steady stream of research on “Bayesian confounding adjustment,” including work explicitly targeting model averaging for multiple exposures (18–24). The full implications of this line of work in the analysis of Keil et al. are not clear, although the different causal estimate obtained when the confounder main effects are forced into the model (sensitivity analysis G) is suggestive.

What’s more, I find it noteworthy that the primary analysis assigns substantial posterior weight to models that omit some (or even all!) of the 6 metal exposure main effects. As sensitivity analysis H shows, forcing the inclusion of all exposure main effects yields an effect estimate roughly twice the magnitude of that in the primary analysis! A careful consideration of why this might occur seems important, as it may relate to (among other issues) confounding differences across exposures or correlation among the exposures, both of which would illustrate the persistence of standard concerns around regression analysis of complex mixtures. In any case, how to interpret averages over models with different exposure variables (and wildly different causal estimates) is unclear, especially in relation to the hypothetical target trial and its relationship to coal plant closures.

Informative prior distributions and regularization

While it is not required for BMA, Keil et al. (3) additionally incorporate strong informative prior information on the underlying regression coefficients, as most completely described in Keil et al.’s Web Table 1. The “hierarchical specification” takes on a peculiar form, assuming that the regression coefficients associated with the 14 main-effect terms come from the same underlying normal distribution, effectively shrinking the parameter estimates toward each other. The unknown mean of this normal distribution (μ1) is given a highly informative prior centered at 0 with variance 1, implying shrinkage of all of these effects toward 0 (a separate but analogous distribution is specified for the remaining 69 higher-order terms). Why one would specify that the main effect for, say, maternal age should be shrunk toward that of, say, chromium is never discussed. The nonhierarchical versions of the model shrink the regression parameters toward 0 without shrinking them toward each other.
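In schematic form, my reading of that specification (with the within-group scales σ₁ and σ₂, and other details, paraphrased rather than quoted from Web Table 1) is:

```latex
\begin{align*}
\beta_j \mid \mu_1 &\sim N(\mu_1,\, \sigma_1^2), \quad j = 1, \ldots, 14
  \quad \text{(all main effects, metals and covariates alike)} \\
\mu_1 &\sim N(0,\, 1)
\end{align*}
% An analogous pair (mu_2, sigma_2^2) governs the 69 higher-order terms.
```

Written this way, the structure makes the peculiarity plain: every main-effect coefficient, whether for a metal or a covariate such as maternal age, is pulled toward a single shared center, which is itself pulled toward 0.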

Bayesian regularization—essentially shrinking regression coefficients toward 0 with informative priors—has benefits in mixture problems when regression estimates are prone to “blow up” due to sparsity and correlation among exposure terms. This is the presumed motivation for the informative priors in Keil et al.’s paper (3), although the rationale for adopting a strategy that falls outside the more mainstream literature on regularization (25–28) is not clear. Judging from the sensitivity analyses, the implications of the prior specification in the birth weight analysis are at least as consequential as the account of model uncertainty.
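For readers wanting the underlying identity, the standard prior-penalty correspondence (a textbook result, not a representation of Keil et al.’s exact prior) is that, with Gaussian errors and independent βⱼ ∼ N(0, τ²) priors, the posterior mode solves a ridge regression:

```latex
\begin{align*}
\hat{\beta}_{\mathrm{MAP}}
  = \arg\min_{\beta}\ \lVert y - X\beta \rVert_2^2
    + \lambda \lVert \beta \rVert_2^2,
  \qquad \lambda = \sigma^2 / \tau^2.
\end{align*}
```

Smaller prior variance τ² means stronger shrinkage toward 0, which is precisely what stabilizes regression estimates when exposure terms are sparse and highly correlated.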

Model-averaged extrapolation

Despite possible improvements over “standard” g-computation, there is no reason to expect the Bayesian methodologies advanced by Keil et al. (3) to overcome the challenges inherent to the “design” of the hypothetical target trial. The simulation study is underequipped to support the claim that the Bayesian approach provides assurance that precision improvements do not come at the expense of large bias, as it evaluates settings where the simple data-generating model lies either in or close to the analysis model class. The possibilities for model misspecification in the birth weight analysis surely extend beyond slightly different variations of the linear model structure.

SUMMARY

Keil et al. (3) offer a useful installment of explicit causal inference methodology to advance the study of complex environmental mixtures. In so doing, they highlight a key friction between the benefits of framing studies of mixtures with clearly defined actions (such as coal plant decommissioning) and the possible limitations of observed data for reliably estimating causal effects of those actions. While the adoption of Bayesian g-computation presents some potential improvements over more standard methods, these advances cannot resolve the rift between the nominal link to power plants and the necessary extrapolation beyond the range of the observed data. The dizzying array of sensitivity analyses highlights substantial dependence on model specification, making it difficult to discern which of the many disparate results should be regarded as “primary” in any sense. Nonetheless, it is worth considering how the methods outlined here would fare in settings with fewer data limitations, particularly those that would permit more refined characterization of power plant exposures with distance-based proxies or methods specifically designed to characterize pollution from point sources, both of which have been employed in epidemiologic (and causal inference) investigations of power plants (12, 13, 15, 29, 30).

ACKNOWLEDGMENTS

Author affiliations: Department of Statistics and Data Sciences, College of Natural Sciences, University of Texas at Austin, Austin, Texas, United States (Corwin M. Zigler); and Department of Women’s Health, Dell Medical School, University of Texas at Austin, Austin, Texas, United States (Corwin M. Zigler).

This work was supported by grant R01ES026217 from the National Institute of Environmental Health Sciences and grant 83587201 from the Environmental Protection Agency.

This publication’s contents are solely the responsibility of the grantee and do not necessarily represent the official views of the Environmental Protection Agency. Further, the Environmental Protection Agency does not endorse the purchase of any commercial products or services mentioned in this publication.

Conflict of interest: none declared.

REFERENCES

1. Rubin DB. For objective causal inference, design trumps analysis. Ann Appl Stat. 2008;2(3):808–840.
2. Hernán MA, Alonso A, Logan R, et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology. 2008;19(6):766–779.
3. Keil AP, Buckley JP, Kalkbrenner AE. Bayesian g-computation for estimating impacts of interventions on exposure mixtures: demonstration with metals from coal-fired power plants and birth weight. Am J Epidemiol. 2021;190(12):2647–2657.
4. Zigler CM, Dominici F. Point: clarifying policy evidence with potential-outcomes thinking—beyond exposure-response estimation in air pollution epidemiology. Am J Epidemiol. 2014;180(12):1133–1140.
5. Dominici F, Greenstone M, Sunstein CR. Particulate matter matters. Science. 2014;344(6181):257–259.
6. Dominici F, Zigler C. Best practices for gauging evidence of causality in air pollution epidemiology. Am J Epidemiol. 2017;186(12):1303–1309.
7. Zigler CM, Choirat C, Dominici F. Impact of National Ambient Air Quality Standards Nonattainment Designations on particulate pollution and health. Epidemiology. 2018;29(2):165–174.
8. Wei Y, Wang Y, Wu X, et al. Causal effects of air pollution on mortality rate in Massachusetts. Am J Epidemiol. 2020;189(11):1316–1323.
9. Goldman GT, Dominici F. Don’t abandon evidence and process on air pollution policy. Science. 2019;363(6434):1398–1400.
10. Carone M, Dominici F, Sheppard L. In pursuit of evidence in air pollution epidemiology: the role of causally driven data science. Epidemiology. 2020;31(1):1–6.
11. Wu X, Braun D, Schwartz J, et al. Evaluating the impact of long-term exposure to fine particulate matter on mortality among the elderly. Sci Adv. 2020;6(29):eaba5692.
12. Casey JA, Karasek D, Ogburn EL, et al. Retirements of coal and oil power plants in California: association with reduced preterm birth among populations nearby. Am J Epidemiol. 2018;187(8):1586–1594.
13. Casey JA, Su JG, Henneman LRF, et al. Improved asthma outcomes observed in the vicinity of coal power plant retirement, retrofit and conversion to natural gas. Nat Energy. 2020;5(5):398–408.
14. Kim C, Daniels MJ, Hogan JW, et al. Bayesian methods for multiple mediators: relating principal stratification and causal mediation in the analysis of power plant emission controls. Ann Appl Stat. 2019;13(3):1927–1956.
15. Kim C, Henneman LRF, Choirat C, et al. Health effects of power plant emissions through ambient air quality. J R Stat Soc Ser A Stat Soc. 2020;183(4):1677–1703.
16. Henneman LRF, Choirat C, Zigler CM. Accountability assessment of health improvements in the United States associated with reduced coal emissions between 2005 and 2012. Epidemiology. 2019;30(4):477–485.
17. Gutman R, Rubin DB. Analyses that inform policy decisions. Biometrics. 2012;68(3):671–675.
18. Wang C, Parmigiani G, Dominici F. Bayesian effect estimation accounting for adjustment uncertainty. Biometrics. 2012;68(3):661–671.
19. Zigler CM, Dominici F. Uncertainty in propensity score estimation: Bayesian methods for variable selection and model-averaged causal effects. J Am Stat Assoc. 2014;109(505):95–107.
20. Wilson A, Zigler C, Patel C, et al. Model-averaged confounder adjustment for estimating multivariate exposure effects with linear regression. Biometrics. 2018;74(3):1034–1044.
21. Wang C, Dominici F, Parmigiani G, et al. Accounting for uncertainty in confounder and effect modifier selection when estimating average causal effects in generalized linear models. Biometrics. 2015;71(3):654–665.
22. Talbot D, Lefebvre G, Atherton J. The Bayesian causal effect estimation algorithm. J Causal Inference. 2015;3(2):207–236.
23. Hahn PR, Carvalho CM, Puelz D, et al. Regularization and confounding in linear regression for treatment effect estimation. Bayesian Anal. 2018;13(1):163–182.
24. Cefalu M, Dominici F, Arvold N, et al. Model averaged double robust estimation. Biometrics. 2017;73(2):410–421.
25. Tibshirani R. Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Ser B Stat Methodol. 2011;73(3):273–282.
26. Carvalho CM, Polson NG, Scott JG. The horseshoe estimator for sparse signals. Biometrika. 2010;97(2):465–480.
27. Polson NG, Scott JG. Shrink globally, act locally: sparse Bayesian regularization and prediction. In: Bernardo JM, Bayarri MJ, Berger JO, et al., eds. Bayesian Statistics 9: Proceedings of the Ninth Valencia International Meeting. New York, NY: Oxford University Press; 2011:501–538.
28. Gelman A, Jakulin A, Pittau MG, et al. A weakly informative default prior distribution for logistic and other regression models. Ann Appl Stat. 2008;2(4):1360–1383.
29. Henneman LRF, Choirat C, Ivey C, et al. Characterizing population exposure to coal emissions sources in the United States using the HyADS model. Atmos Environ (1994). 2019;203:271–280.
30. Tessum CW, Hill JD, Marshall JD. InMAP: a model for air pollution interventions. PLoS One. 2017;12(4):e0176131.
