Prior work on mediation analysis has considered the estimation of direct and indirect effects using parametric models with a binary outcome (1–4) and has considered case-control designs within either a causal (1) or a path-diagram (5) framework. Here we discuss how approaches to causal mediation analysis can be applied to a matched case-control study design. We assume throughout that the outcome is rare; this assumption is needed for the closed-form analytical formulas for the direct and indirect effects described below, and these formulas continue to require it even under incidence density sampling.
For cohort data with a binary outcome Y, a binary mediator M, exposure A, and baseline covariates C, consider the following models:
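$$\operatorname{logit}\{P(Y = 1 \mid a, m, c)\} = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 a m + \theta_4' c \qquad \text{(Model 1)}$$

$$\operatorname{logit}\{P(M = 1 \mid a, c)\} = \beta_0 + \beta_1 a + \beta_2' c \qquad \text{(Model 2)}$$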
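Under the assumptions that, on a causal diagram (6) and conditional on the measured covariates C, there is 1) no exposure-outcome confounding, 2) no mediator-outcome confounding, 3) no exposure-mediator confounding, and 4) no mediator-outcome confounder affected by the exposure, the natural direct effect (NDE) and natural indirect effect (NIE) on the odds ratio (OR) scale, comparing exposure level a with reference level a* and conditional on C = c, are given approximately (2) by

$$\mathrm{OR}^{\mathrm{NDE}} \approx \frac{\exp(\theta_1 a)\,\{1 + \exp(\theta_2 + \theta_3 a + \beta_0 + \beta_1 a^* + \beta_2' c)\}}{\exp(\theta_1 a^*)\,\{1 + \exp(\theta_2 + \theta_3 a^* + \beta_0 + \beta_1 a^* + \beta_2' c)\}}$$

$$\mathrm{OR}^{\mathrm{NIE}} \approx \frac{\{1 + \exp(\beta_0 + \beta_1 a^* + \beta_2' c)\}\,\{1 + \exp(\theta_2 + \theta_3 a + \beta_0 + \beta_1 a + \beta_2' c)\}}{\{1 + \exp(\beta_0 + \beta_1 a + \beta_2' c)\}\,\{1 + \exp(\theta_2 + \theta_3 a + \beta_0 + \beta_1 a^* + \beta_2' c)\}}$$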
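To make the binary-mediator expressions concrete, here is a minimal Python sketch that evaluates the approximate OR-scale NDE and NIE from given values of the Model 1 and Model 2 coefficients. The function names, the default contrast a = 1 versus a* = 0, and the coefficient values in the usage line are illustrative assumptions, not part of the original method or of any published mediation software; the rare-outcome approximation is inherited from the formulas themselves.

```python
import math
from typing import Sequence, Tuple


def mediator_lp(beta0: float, beta1: float, beta2: Sequence[float],
                a: float, c: Sequence[float]) -> float:
    """Linear predictor of the logistic mediator model (Model 2): beta0 + beta1*a + beta2'c."""
    return beta0 + beta1 * a + sum(b * x for b, x in zip(beta2, c))


def or_effects_binary_mediator(theta1: float, theta2: float, theta3: float,
                               beta0: float, beta1: float, beta2: Sequence[float],
                               c: Sequence[float], a: float = 1.0,
                               a_star: float = 0.0) -> Tuple[float, float]:
    """Approximate OR-scale (NDE, NIE) for a binary mediator under a rare outcome,
    comparing exposure level a with a_star at covariate level c."""
    lp_a = mediator_lp(beta0, beta1, beta2, a, c)          # mediator log-odds at exposure a
    lp_star = mediator_lp(beta0, beta1, beta2, a_star, c)  # mediator log-odds at exposure a*
    nde = (math.exp(theta1 * a) * (1 + math.exp(theta2 + theta3 * a + lp_star))) / (
        math.exp(theta1 * a_star) * (1 + math.exp(theta2 + theta3 * a_star + lp_star)))
    nie = ((1 + math.exp(lp_star)) * (1 + math.exp(theta2 + theta3 * a + lp_a))) / (
        (1 + math.exp(lp_a)) * (1 + math.exp(theta2 + theta3 * a + lp_star)))
    return nde, nie


# Hypothetical coefficient values, for illustration only.
print(or_effects_binary_mediator(0.4, 0.8, 0.2, -1.0, 0.5, [0.3], [1.0]))
```

As a sanity check on the design, when θ2 = θ3 = 0 the NIE is 1 and the NDE reduces to exp{θ1(a − a*)}, and in general the product of the two equals the total-effect odds ratio under the same approximation.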
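For a binary outcome and a continuous mediator, under Model 1 for the outcome and

$$E[M \mid a, c] = \beta_0 + \beta_1 a + \beta_2' c \qquad \text{(Model 3)}$$

for the mediator, where M is normally distributed conditional on A and C with constant variance σ², under assumptions 1–4 the natural direct and indirect effects comparing exposure level a with reference level a*, conditional on C = c, are given approximately (1) by

$$\mathrm{OR}^{\mathrm{NDE}} \approx \exp\left\{\left[\theta_1 + \theta_3\left(\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 \sigma^2\right)\right](a - a^*) + 0.5\,\theta_3^2 \sigma^2 \left(a^2 - a^{*2}\right)\right\}$$

$$\mathrm{OR}^{\mathrm{NIE}} \approx \exp\left\{\left(\theta_2 \beta_1 + \theta_3 \beta_1 a\right)(a - a^*)\right\}$$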
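A similar minimal Python sketch for the continuous-mediator expressions follows, assuming the coefficient and variance values are already available (for example, from fitted regressions); the function name and example values are hypothetical. Because σ² enters the NDE only through terms multiplied by θ3, setting θ3 = 0 gives an NDE of exp{θ1(a − a*)} whatever σ² is, which is why normality of M matters only for the NDE and only under exposure-mediator interaction.

```python
import math
from typing import Sequence, Tuple


def or_effects_continuous_mediator(theta1: float, theta2: float, theta3: float,
                                   beta0: float, beta1: float, beta2: Sequence[float],
                                   sigma2: float, c: Sequence[float], a: float = 1.0,
                                   a_star: float = 0.0) -> Tuple[float, float]:
    """Approximate OR-scale (NDE, NIE) for a normally distributed mediator under a
    rare outcome, comparing exposure level a with a_star at covariate level c."""
    mu_star = beta0 + beta1 * a_star + sum(b * x for b, x in zip(beta2, c))  # E[M | a*, c]
    log_nde = ((theta1 + theta3 * (mu_star + theta2 * sigma2)) * (a - a_star)
               + 0.5 * theta3 ** 2 * sigma2 * (a ** 2 - a_star ** 2))
    log_nie = (theta2 * beta1 + theta3 * beta1 * a) * (a - a_star)  # does not involve c or sigma2
    return math.exp(log_nde), math.exp(log_nie)


# Hypothetical coefficient values, for illustration only.
print(or_effects_continuous_mediator(0.4, 0.5, 0.3, 1.0, 0.6, [0.2], 1.5, [1.0]))
```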
The expressions are sometimes evaluated at the mean level of the covariates C. The assumption of normally distributed M is only necessary for the NDE and only if there is exposure-mediator interaction so that θ3 ≠ 0, and even then this assumption can sometimes be relaxed (3).
VanderWeele and Vansteelandt (1) noted that the approach to mediation analysis described above is applicable in unmatched case-control designs if the mediator model is fitted among the control subjects, since the control subjects constitute either a sample of the underlying population (under incidence density sampling) or, under a rare outcome, a close approximation of the underlying population (if controls are sampled from the noncases).
In a matched case-control design, controls are matched to cases on a subset of the measured covariates C. The argument generally given for such matching is one of efficiency (7), and it is still possible to fit the outcome model for Y (for example, by conditional logistic regression) and to estimate (θ1, θ2, θ3) with such matched case-control data. For the mediator model, however, the matched controls no longer constitute a sample of the underlying population, because the distribution of the matched covariates among them has been made to resemble that among the cases. One way around this would be to fit the mediator model to the data from the underlying sample of controls from which the matched controls were drawn, if such data were already available prior to the matching. Suppose, instead, that the mediator model is fitted to the matched controls. Because matching selects controls on the basis of the matched covariates alone, the distribution of M given A and C among the matched controls is still that of the underlying controls. Under model 2 or model 3, it is therefore still possible to obtain valid estimates of β0 and β1, and of β2 for both the components of C that are matched to the cases and those that are not. Provided that, in the underlying population, the error term in model 3 is normally distributed with constant variance σ² conditional on the covariates, the error variance from model 3 fitted among the matched controls can also be used as an estimate of σ². All of the parameters in the expressions for the direct and indirect effects above can thus be estimated, and hence so can the natural direct and indirect effects conditional on the covariates C = c. This allows standard causal mediation analysis software (2) to be used even with matched case-control designs.
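The claim that the mediator model can be fitted among the matched controls can be checked by simulation. The sketch below is an illustration under stated assumptions, not the authors' procedure: it generates a hypothetical rare-outcome population with one binary matched covariate C1 and one unmatched covariate C2, draws controls frequency-matched to the cases on C1 (a simplification of individual matching), and fits model 3 by ordinary least squares with NumPy. The helper names and all parameter values are invented for illustration.

```python
import numpy as np


def fit_mediator_ols(m, a, c):
    """OLS fit of model 3, E[M | a, c] = beta0 + beta1*a + beta2'c.

    Returns (beta0, beta1, beta2, sigma2), with sigma2 the residual variance on n - p df.
    """
    m = np.asarray(m, dtype=float)
    c = np.asarray(c, dtype=float).reshape(len(m), -1)
    X = np.column_stack([np.ones(len(m)), np.asarray(a, dtype=float), c])
    coef, *_ = np.linalg.lstsq(X, m, rcond=None)
    resid = m - X @ coef
    n, p = X.shape
    sigma2 = float(resid @ resid) / (n - p)
    return float(coef[0]), float(coef[1]), coef[2:], sigma2


def simulate_matched_controls(n_pop=200_000, controls_per_case=4, seed=1):
    """Hypothetical population; controls drawn from noncases within each C1 level."""
    rng = np.random.default_rng(seed)
    c1 = rng.binomial(1, 0.3, n_pop)   # matched covariate
    c2 = rng.normal(0.0, 1.0, n_pop)   # unmatched covariate
    a = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.5 * c1 + 0.3 * c2))))
    # True mediator model: beta0=0.5, beta1=0.8, beta2=(1.0, 0.4), sigma2=1.
    m = 0.5 + 0.8 * a + 1.0 * c1 + 0.4 * c2 + rng.normal(0.0, 1.0, n_pop)
    y = rng.binomial(1, 1 / (1 + np.exp(-(-6.0 + 0.3 * a + 0.5 * m + 1.5 * c1))))
    idx = []
    for level in (0, 1):
        n_cases = int(np.sum((y == 1) & (c1 == level)))
        pool = np.flatnonzero((y == 0) & (c1 == level))
        idx.append(rng.choice(pool, size=controls_per_case * n_cases, replace=False))
    idx = np.concatenate(idx)
    return (m[idx], a[idx], np.column_stack([c1[idx], c2[idx]]),
            float(c1.mean()), float(c1[idx].mean()))


if __name__ == "__main__":
    m, a, c, c1_pop_mean, c1_ctrl_mean = simulate_matched_controls()
    print(fit_mediator_ols(m, a, c))
    print("mean C1: population", c1_pop_mean, "matched controls", c1_ctrl_mean)
```

With these settings the coefficients and residual variance estimated from the matched controls should land close to the data-generating values, up to sampling error and a small noncase-selection bias, while the mean of C1 among the matched controls sits far above its population mean of about 0.3.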
With the approach to matched case-control designs described above, the conditional direct and indirect effects could be reported at any specific covariate level or at several covariate levels. However, in a matched case-control study, the practice of setting the covariates C = c to their average value among the controls in the direct and indirect effect expressions needs to be interpreted with some caution. The average value of the covariates among the matched controls will generally not equal their average value in the population, since the matched covariates have been made to follow their distribution among the cases. One way around this would again be to fit the mediator model to the data from the underlying sample of controls from which the matched controls were drawn. Alternatively, if the mean values of the covariates in the underlying population were otherwise known, these could be used in the direct and indirect effect expressions above. Finally, if the mean values of the covariates among the matched controls were used, the direct and indirect effect estimates could still be interpreted as conditional direct and indirect effects, conditional on the covariates taking their mean values among the matched controls. If this were done, those mean covariate values should be reported along with the direct and indirect effect estimates so that readers can interpret the effects appropriately.
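Reporting the effects at more than one covariate profile, together with the profile values themselves, can be sketched as follows for the continuous-mediator case. The helper name, the two profiles (a matched-control mean of 0.78 for C1 versus an assumed known population mean of 0.30, with C2 held at 0), and the coefficient values are hypothetical. Under model 3 the NIE expression does not involve c, so only the NDE changes between profiles, and only when θ3 ≠ 0; with a binary mediator both effects generally depend on c.

```python
import math
from typing import Dict, Sequence


def effects_by_covariate_profile(theta1: float, theta2: float, theta3: float,
                                 beta0: float, beta1: float, beta2: Sequence[float],
                                 sigma2: float, profiles: Dict[str, Sequence[float]],
                                 a: float = 1.0, a_star: float = 0.0) -> Dict[str, dict]:
    """For each named covariate profile c, return c together with the approximate
    OR-scale NDE and NIE for a continuous mediator (outcome model 1, mediator model 3)."""
    report = {}
    for name, c in profiles.items():
        mu_star = beta0 + beta1 * a_star + sum(b * x for b, x in zip(beta2, c))
        nde = math.exp((theta1 + theta3 * (mu_star + theta2 * sigma2)) * (a - a_star)
                       + 0.5 * theta3 ** 2 * sigma2 * (a ** 2 - a_star ** 2))
        nie = math.exp((theta2 * beta1 + theta3 * beta1 * a) * (a - a_star))
        # Keep the covariate values next to the estimates so readers can interpret them.
        report[name] = {"covariates": list(c), "OR_NDE": nde, "OR_NIE": nie}
    return report


# Hypothetical profiles and coefficients, for illustration only.
profiles = {"matched-control mean": [0.78, 0.0], "population mean (assumed known)": [0.30, 0.0]}
for name, row in effects_by_covariate_profile(0.3, 0.5, 0.2, 0.5, 0.8, [1.0, 0.4], 1.0,
                                              profiles).items():
    print(name, row)
```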
Acknowledgments
The research was supported by National Institutes of Health grants ES017876 and AI104459.
Conflict of interest: none declared.
References
1. VanderWeele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol. 2010;172(12):1339–1348.
2. Valeri L, VanderWeele TJ. Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol Methods. 2013;18(2):137–150.
3. Tchetgen Tchetgen EJ. A note on formulae for causal mediation analysis in an odds ratio context. Epidemiol Methods. 2014;2(1):21–31.
4. VanderWeele TJ. Explanation in Causal Inference: Methods for Mediation and Interaction. New York, NY: Oxford University Press; 2015.
5. Wang J, Spitz MR, Amos CI, et al. Method for evaluating multiple mediators: mediating effects of smoking and COPD on the association between the CHRNA5-A3 variant and lung cancer risk. PLoS One. 2012;7(10):e47705.
6. Pearl J. Causality. 2nd ed. New York, NY: Cambridge University Press; 2009.
7. Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2008.
