Abstract
The very insightful and clear paper by VanderWeele and Vansteelandt in this issue of the Journal (Am J Epidemiol. 2010;172(12):1339–1348) bridges the gap between biostatistics methodologists who focus on causal methods for mediation analysis and the practitioners of mediation analysis, to the benefit of both groups. To continue bridging this gap, this invited commentary relates the important issue of “natural direct effects” to the well-known epidemiologic method of direct standardization. Additionally, attention is paid to the importance of temporal sequencing in substantiating the mediation relations among the exposure, mediator, and outcome. A crucial mathematical distortion under the logistic model, called “absence of collapsibility,” is noted in motivating VanderWeele and Vansteelandt's use of the log-linear model for comparing the effect of exposure adjusted for the mediator with the effect of exposure unadjusted for the mediator. It is also noted that this issue applies to one approach to assessing confounding. Finally, some issues are raised for consideration when testing the interaction between the exposure and mediator before assessing mediation.
Keywords: collapsibility, confounding, epidemiologic methods, logistic regression, log-linear models, standardization
Drs. VanderWeele and Vansteelandt (1) have covered much ground in presenting mediation analyses for binary outcomes in the context of case-control and cohort studies. First, with their presentation of these methods for understanding statistical mediation, they have highlighted the importance of understanding the conceptual mechanisms by which exposures to risk factors or treatments affect outcomes. Second, they have presented 2 different types of effects in mediation analysis, controlled and natural effects, along with an innovative technical approximation for the latter. With these effects, they have highlighted the approaches for determining mediation and noted some of the difficulties in doing so. Third, one of these difficulties arises from the assumptions underlying the mediation approaches. They have clarified the assumptions that are needed for unbiased mediation inference, especially with respect to natural effects under standard mediation models in general and logistic/log-linear models in particular. Understanding and, if possible, evaluating such assumptions are crucial to understanding the vulnerabilities of mediation analyses and, therefore, of investigations of the mechanisms of risk factor or treatment processes. Finally, they have shown the need to address exposure–mediator interactions and how they can be resolved by using natural effects rather than controlled effects.
With respect to the importance of understanding the mechanism from mediation analyses, ideally, one would like clear temporal relations among the exposure, mediator, and outcome to at least ensure the directions of the arrows in VanderWeele and Vansteelandt's Figures 1 and 2 (1). The example mold data were based on a cross-sectional survey, so that the directions of the paths in terms of the arrows are not ensured and are based strictly on the conceptual model of the investigators that perceived control of the home is influenced by mold and not vice versa. Similarly, depression is influenced by perceived home control and mold and not vice versa. One could reverse the directions in that high perceived home control could lead to efforts to reduce the mold problem. Further, depression could stifle perception of home control that, in turn, would reduce efforts to control mold.
Although the authors (1) do a very admirable job of distinguishing between controlled and natural effects, both conceptually and mathematically (especially in terms of assumptions), there has been some ambiguity about when to use these different effects, especially given that they are the same under linear and log-linear models without interactions. One may argue that natural effects should be used because they accommodate interactions and nonlinear relations such as the logistic model. Nonetheless, the prevailing use of controlled effects in the social science literature, as well as in the biostatistics journals (including papers by this discussant), suggests that it may take time for natural effects to take hold in practice. The authors provide clinically insightful and clear interpretations of natural effects in a very nice review paper on mediation (2).
In trying to tie the current causal approaches to mediation with more established epidemiologic methods, the natural direct effect is an extension in several ways of direct standardization. Direct standardization has traditionally been used to adjust for confounding via a difference between weighted estimates of rates, risks, or proportions, where the weights are probabilities corresponding to a discrete distribution of a confounder. In the case of the natural direct effect as presented by the authors (1), the “standard” distribution is the mediator distribution for the unexposed or untreated group. Moreover, the natural effect is a model-based risk ratio instead of a difference in standardized risks. In the case of the standardization, the weighted sum is used to equalize the 2 exposure groups with respect to the confounder. Likewise, in the case of the natural direct effect, the integral in the authors' Appendix is used to equalize the 2 exposure groups with respect to the continuous distribution of the mediator.
Others (3, 4) have presented natural direct effects more in line with direct standardization. Instead of defining the natural direct effect as a log risk ratio parameter under a model as the authors (1) have done, Huang et al. (3) defined the natural direct effect in terms of an odds ratio computed from standardized probabilities where the standard distribution of the mediator was the distribution for the unexposed or untreated group. Janes et al. (4) used the same approach to adjust the odds ratio for a confounder.
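The standardization analogy described above can be made concrete with a small numerical sketch. The code below is only an illustration under simplifying assumptions that are not in the original papers: the mediator is treated as discrete with 3 levels (rather than continuous), covariates are ignored, and all risks and weights are hypothetical numbers. The function name `standardized_risk` is likewise invented for this example. The mediator distribution among the unexposed plays the role of the “standard” distribution, and the same weights are applied to both exposure groups.

```python
def standardized_risk(stratum_risks, standard_weights):
    """Average stratum-specific risks over a chosen standard distribution."""
    if len(stratum_risks) != len(standard_weights):
        raise ValueError("risks and weights must have the same length")
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    return sum(r * w for r, w in zip(stratum_risks, standard_weights))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Hypothetical inputs, for illustration only (not taken from the mold data).
risk_exposed = [0.10, 0.20, 0.40]          # P(Y=1 | X=1, M=m) for m = 0, 1, 2
risk_unexposed = [0.05, 0.10, 0.20]        # P(Y=1 | X=0, M=m) for m = 0, 1, 2
mediator_dist_unexposed = [0.5, 0.3, 0.2]  # P(M=m | X=0): the "standard"

# Both exposure groups are weighted by the SAME mediator distribution,
# which equalizes them with respect to the mediator.
std_exposed = standardized_risk(risk_exposed, mediator_dist_unexposed)
std_unexposed = standardized_risk(risk_unexposed, mediator_dist_unexposed)

# Classical direct standardization contrasts the standardized risks
# by a difference; the natural direct effect uses a ratio-type contrast.
risk_difference = std_exposed - std_unexposed
direct_risk_ratio = std_exposed / std_unexposed
# Odds ratio computed from the standardized probabilities.
direct_odds_ratio = odds(std_exposed) / odds(std_unexposed)

print(f"standardized risks: {std_exposed:.3f} vs {std_unexposed:.3f}")
print(f"risk difference:    {risk_difference:.3f}")
print(f"risk ratio:         {direct_risk_ratio:.3f}")
print(f"odds ratio:         {direct_odds_ratio:.3f}")
```

In this sketch, the difference corresponds to the traditional standardized contrast, whereas the ratio and odds ratio versions correspond loosely to the risk ratio and odds ratio formulations of the natural direct effect discussed above.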
The second analogy to more traditional epidemiologic methods relates to the proportion explained 2 paragraphs above equation 7 in VanderWeele and Vansteelandt's article (1), which in the mediation context is the indirect effect under the linear and log-linear models. It is also used to assess confounding when comparing the effect of exposure with and without adjusting for a potential confounder (5). If this difference is greater than a particular threshold (e.g., 15%), then confounding is indicated.
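As a rough sketch of this change-in-estimate comparison, the code below computes the relative change between an unadjusted and an adjusted exposure coefficient and compares it with a threshold. The formula (unadjusted minus adjusted, divided by unadjusted), the function names, and the coefficient values are assumptions made for illustration; only the 15% threshold comes from the text above.

```python
def proportion_explained(unadjusted, adjusted):
    """Relative change in the exposure coefficient after adjustment."""
    if unadjusted == 0:
        raise ValueError("unadjusted effect must be nonzero")
    return (unadjusted - adjusted) / unadjusted


def exceeds_threshold(unadjusted, adjusted, threshold=0.15):
    """True if the relative change is larger in magnitude than the threshold."""
    return abs(proportion_explained(unadjusted, adjusted)) > threshold


# Hypothetical log-scale coefficients, for illustration only.
phi1 = 0.80    # exposure effect without the third variable in the model
theta1 = 0.60  # exposure effect with the third variable in the model

change = proportion_explained(phi1, theta1)
flagged = exceeds_threshold(phi1, theta1)
print(f"proportion explained: {change:.2f}, exceeds 15%: {flagged}")
```

The same arithmetic is read as an indirect effect when the third variable is a mediator and as an indication of confounding when it is a confounder; the calculation itself cannot tell the two apart.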
These similarities between methods for assessing and adjusting for confounding and mediation highlight the fact that the distinction between a confounder and mediator is based on their different temporal relations with respect to the exposure or treatment variable. The distinction can be seen in the pathways involving C and M in Figure 1 (1). Without temporal ordering as in the case of the mold example, the distinction between C and M is a conceptual one with no information from observed data.
With these similarities between confounding and mediation in mind, both are subject to the same mathematical limitation of the logistic model. Although VanderWeele and Vansteelandt (1) do not explicitly mention it, their approximation based on the log-linear model for estimating the natural effects circumvents a limitation of the logistic model known as the absence of “collapsibility.” Consider the probability of the outcome under a logistic model with log odds ratios for the exposure and the mediator. Because the logistic model is nonlinear, integrating this probability over the distribution of the mediator (the expectation in the last equation of the authors' Appendix (1)) to obtain the marginal probability without adjustment for the mediator would produce an effect of the exposure, φ1, that is not an odds ratio. Consequently, the resulting φ1 parameter would not be comparable to the Θ1 parameter from the model that includes the mediator. A similar limitation has been shown with respect to potential confounders (6–8). This distortion does not occur under the log-linear model in the Appendix.
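A short derivation may help show why the two models behave differently. It is a sketch under simplifying assumptions that are introduced here and are not part of the authors' Appendix: the mediator M is independent of the exposure X, there is no exposure–mediator interaction, and covariates are omitted. The symbols θ0 and θ2 (intercept and mediator coefficient) are notation added for illustration; Θ1 and φ1 are the adjusted and unadjusted exposure effects from the text.

```latex
% Log-linear outcome model: the exposure term factors out of the expectation.
\[
P(Y = 1 \mid X = x, M = m) = \exp(\theta_0 + \Theta_1 x + \theta_2 m)
\]
\[
P(Y = 1 \mid X = x)
  = E_M\!\left[\exp(\theta_0 + \Theta_1 x + \theta_2 M)\right]
  = \exp(\theta_0 + \Theta_1 x)\, E_M\!\left[\exp(\theta_2 M)\right]
\]
\[
\phi_1 = \log \frac{P(Y = 1 \mid X = 1)}{P(Y = 1 \mid X = 0)} = \Theta_1
\]

% Logistic outcome model: no such factorization is available.
\[
P(Y = 1 \mid X = x)
  = E_M\!\left[\frac{\exp(\theta_0 + \Theta_1 x + \theta_2 M)}
                    {1 + \exp(\theta_0 + \Theta_1 x + \theta_2 M)}\right]
\]
\[
\phi_1 = \operatorname{logit} P(Y = 1 \mid X = 1)
       - \operatorname{logit} P(Y = 1 \mid X = 0)
  \neq \Theta_1 \quad \text{in general when } \theta_2 \neq 0
\]
```

Under the log-linear model the term involving M cancels in the ratio, so the marginal and conditional exposure effects coincide; under the logistic model the expectation of a nonlinear function does not reduce in this way.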
This mathematical distortion under the logistic model impacts the interpretation of the “proportion explained” in the paragraph immediately preceding equation 7 (1). That is, under the logistic model, φ1 − Θ1 may still be different from zero even when there is no mediation (or confounding if M were a confounder). This is due to the fact that, as long as there is a relation between M and Y, φ1 may be different from Θ1 even when M is not related to X at all (no mediation or no confounding). Similar distortions occur for the Cox model. The log-linear approximation by the authors circumvents this problem because the distortion does not occur with the log-linear model (9).
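The distortion can be checked numerically. The following sketch assumes a binary mediator M that is independent of the exposure X (so there is neither mediation nor confounding), no exposure–mediator interaction, and hypothetical coefficient values; the function and variable names are invented for this illustration. It computes the unadjusted exposure effect φ1 by averaging the conditional risks over M and compares it with the conditional effect Θ1 = 1.

```python
import math


def expit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def marginal_effect(inv_link, link, theta0, theta1, theta2, p_m):
    """Unadjusted exposure effect on the link scale when M is independent of X.

    The conditional risks are averaged over a binary mediator with
    P(M = 1) = p_m in BOTH exposure groups, then contrasted on the link scale.
    """
    def risk(x):
        return ((1.0 - p_m) * inv_link(theta0 + theta1 * x)
                + p_m * inv_link(theta0 + theta1 * x + theta2))
    return link(risk(1)) - link(risk(0))


# Hypothetical values, for illustration only.
theta1 = 1.0  # conditional exposure effect (Theta_1 in the text)
theta2 = 2.0  # mediator-outcome effect
p_m = 0.5     # P(M = 1), identical for exposed and unexposed

# Logistic model: the marginal log odds ratio differs from theta1.
phi1_logistic = marginal_effect(expit, logit, -1.0, theta1, theta2, p_m)

# Log-linear model (intercept chosen so that all risks stay below 1):
# the marginal log risk ratio equals theta1.
phi1_loglinear = marginal_effect(math.exp, math.log, -4.0, theta1, theta2, p_m)

print(f"logistic:   Theta1 = {theta1:.3f}, phi1 = {phi1_logistic:.3f}")
print(f"log-linear: Theta1 = {theta1:.3f}, phi1 = {phi1_loglinear:.3f}")
```

With these inputs, φ1 under the logistic model is noticeably smaller than Θ1 even though M is unrelated to X, whereas under the log-linear model the two agree up to floating-point error.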
Finally, the authors (1) address the important mediation assumption of no mediator–exposure interaction. It turns out that, under certain assumptions about unobserved confounding, identification and bias problems may not be as severe even when the baseline intervention affects the mediator, in contrast to research by others (e.g., Kraemer et al. (10)).
Acknowledgments
Author affiliation: Center for Clinical Epidemiology and Biostatistics, Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, Pennsylvania (Thomas Ten Have).
T. T. H. received funding from R01MH078016 for this review.
Conflict of interest: none declared.
References
1. VanderWeele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol. 2010;172(12):1339–1348. doi: 10.1093/aje/kwq332.
2. VanderWeele TJ, Vansteelandt S. Conceptual issues concerning mediation, interventions and composition. Stat Interface. 2009;2:457–468.
3. Huang B, Sivaganesan S, Succop P, et al. Statistical assessment of mediational effects for logistic mediational models. Stat Med. 2004;23(17):2713–2728. doi: 10.1002/sim.1847.
4. Janes H, Dominici F, Zeger S. On quantifying the magnitude of confounding. Biostatistics. 2010;11(3):572–582. doi: 10.1093/biostatistics/kxq007.
5. Rothman K, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia, PA: Lippincott-Raven; 1998.
6. Gail M, Wieand S, Piantadosi S. Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika. 1984;71(3):431–444.
7. Robins J, Rotnitzky A. Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models. Biometrika. 2004;91(4):763–783.
8. Vansteelandt S, Goetghebeur E. Causal inference with generalized structural mean models. J R Stat Soc Series B Stat Methodol. 2003;65(part 4):817–835.
9. Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988;44(4):1049–1060.
10. Kraemer HC, Wilson GT, Fairburn CG, et al. Mediators and moderators of treatment effects in randomized clinical trials. Arch Gen Psychiatry. 2002;59(10):877–883. doi: 10.1001/archpsyc.59.10.877.