Abstract
A mediator is a factor that occurs after the exposure of interest, precedes the outcome of interest (i.e. between the exposure and the outcome) and is associated with both the exposure and the outcome of interest (i.e. is on the pathway between exposure and outcome). Mediation analyses can be valuable in many reproductive health contexts, as mediation analysis can help researchers to better identify, quantify and understand the underlying pathways of the association they are studying. The purpose of this commentary is to introduce the concept of mediation and provide examples that solidify understanding of mediation for valid discovery and interpretation in the field of reproductive medicine.
Introduction
In clinical and public health research, it can be useful to decompose an association between an exposure and outcome into component parts or pathways by conducting a mediation analysis. By understanding these pathways of association, we can better understand the causal structure of the association, and in doing so determine where to intervene and where future research is needed.
By way of example, we will first introduce a scenario in which mediation approaches would be useful. In reproductive endocrinology, we are interested in maximizing implantation and pregnancy while also minimizing the risk of adverse outcomes for the babies conceived using fertility treatment. IVF has been shown to be associated with a higher risk of preterm birth compared with spontaneous conception (Oberg et al., 2018). IVF also influences the probability of multiple gestations, with IVF pregnancies having a higher, albeit still low, probability of multiples even when only one embryo is transferred. It is well established that multiple gestations increase the risk of preterm birth (Blondel et al., 2002). Thus, when counseling IVF patients, it would be helpful to know whether the higher risk of preterm birth among IVF patients is driven by underlying infertility diagnoses, by multiple gestations or by something about the IVF procedure itself. Mediation analysis can quantify and separate the effect of IVF itself from the effect of multiple gestations (a possible mediator) on preterm birth. The relative magnitudes of these effects may enable us to provide more precise counseling for patients regarding the relationship between IVF and the risk of preterm birth. Additionally, understanding this relationship may open future research directions related to both IVF and preterm birth. Therefore, utilizing statistical mediation approaches can help disentangle complex etiology and more effectively direct resources to improve health.
Mediation Definitions
At its simplest, a mediator is a factor that occurs after the exposure of interest, precedes the outcome of interest and is associated with both the exposure and the outcome of interest. In Fig. 1A, our exposure is represented by the letter E, our outcome is represented by the letter O and our mediator is represented by the letter M. This figure is constructed using the assumptions of directed acyclic graphs (DAGs) (Robins, 1987; Pearl, 1995; Greenland et al., 1999) and assumes that temporality and causal association move in the direction of the arrows. Thus, a mediator occurs after the exposure but before the outcome and is causally associated with both.
Figure 1A and B visually represent the multiple pathways that we are trying to disentangle with mediation analyses. Depending on the research question of interest and the assumptions we make, we can estimate different quantities related to these pathways. One pathway is shown going directly from our exposure, IVF, to our outcome, preterm birth. This pathway is called the direct effect and represents the effect that the exposure has on the outcome that does not occur through the mediator (Fig. 1A and B, bold line). Another pathway is through the mediator; this pathway is called the indirect effect and represents the effect that the exposure has on the outcome that occurs through the mediator (Fig. 1A and B, dashed lines).
Two distinct but related quantities that express the magnitude of the direct effect are called the natural direct effect and the controlled direct effect. The natural direct effect represents the average extent to which a person’s outcome would differ if they were exposed compared to if they were unexposed if the mediator did not change, and instead, in both settings the mediator was held constant at the value it would be if that person had been unexposed. To put another way, the natural direct effect is the effect the exposure has on the outcome that is independent of the mediator. In our example, the natural direct effect is quantifying what would happen if we blocked the associations through the mediator, that is, if IVF had no effect on multiple gestations or multiple gestations had no effect on preterm birth. The controlled direct effect, in contrast, is the effect the exposure has on the outcome if the mediator is set to the same value for every participant in the population (i.e. independent of exposure status). For example, we could ask specifically what the effect of IVF on preterm birth would be if every person had a singleton gestation. Whether we are interested in quantifying the natural or the controlled direct effect will depend on the question that we are attempting to answer with the study design and statistical analyses. Controlled direct effects may be of greater interest in policy evaluation because the analysis incorporates intervening on (i.e. removing if harmful or making ubiquitous if beneficial) the mediator (VanderWeele, 2016).
The controlled direct effect, however, does not have a corresponding indirect effect, whereas the natural direct effect does: the natural indirect effect. This quantity reflects, on average, how a person’s outcome would differ if their mediator value changed from what it would be if they were exposed to what it would be if they were not, while the exposure itself did not change. Although the natural effects have complex interpretations, they match our intuitive understanding of mediation: when combined, the natural direct and natural indirect effects make up the total effect. On an additive scale, as in linear regression, the natural direct and natural indirect effects sum to the total effect. For relative measures, such as odds ratios (OR) and risk ratios (RR), the product of the natural direct and natural indirect effects is the total effect. This total effect corresponds to the quantity we would typically estimate in an analysis that ignored the mediator and quantified only the overall exposure–outcome relationship.
In our example, we can estimate these informative effects. The total effect is the overall effect that IVF has on preterm birth. The total effect can be decomposed into the natural direct effect, which is the effect of IVF on preterm birth that occurs independent of multiple gestations (Fig. 1B: bold line), and the natural indirect effect, which is the effect of IVF on preterm birth that can be attributed to multiple gestations (Fig. 1B: dashed line).
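The effects defined in this section can be written compactly in counterfactual notation. The following is a sketch under assumed notation that does not appear elsewhere in this commentary: Y(e, m) denotes the outcome a person would have if the exposure were set to e and the mediator to m, M(e) denotes the mediator value that person would have under exposure e, and e = 1 and e = 0 denote exposed and unexposed. The natural indirect effect is written here with the exposure fixed at the exposed level, which is one common convention.

```latex
\begin{align*}
\text{Controlled direct effect:} \quad & \mathrm{CDE}(m) = E[Y(1, m)] - E[Y(0, m)] \\
\text{Natural direct effect:} \quad & \mathrm{NDE} = E[Y(1, M(0))] - E[Y(0, M(0))] \\
\text{Natural indirect effect:} \quad & \mathrm{NIE} = E[Y(1, M(1))] - E[Y(1, M(0))] \\
\text{Total effect (difference scale):} \quad & \mathrm{TE} = E[Y(1, M(1))] - E[Y(0, M(0))] = \mathrm{NDE} + \mathrm{NIE} \\
\text{Total effect (ratio scale):} \quad & RR_{TE} = \frac{E[Y(1, M(1))]}{E[Y(0, M(0))]}
  = \underbrace{\frac{E[Y(1, M(0))]}{E[Y(0, M(0))]}}_{RR_{NDE}} \times
    \underbrace{\frac{E[Y(1, M(1))]}{E[Y(1, M(0))]}}_{RR_{NIE}}
\end{align*}
```

Written this way, the decomposition is an identity: the intermediate term E[Y(1, M(0))] cancels in the sum on the difference scale and in the product on the ratio scale.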
Proportion Mediated
Another quantity we could estimate is the proportion mediated, which quantifies how much of the total association of interest we can attribute to the indirect effect through the mediator. In our example, this would be the proportion of the association between IVF and preterm birth that is explained by or is attributable to multiple gestations. The proportion mediated ranges from 0 to 100%, with 100% representing a scenario in which the entire effect of the exposure on the outcome is due to the mediator (occasionally, random variability or effects in opposite directions can result in proportions mediated outside the range of 0 to 100%). Thus, associations with a high proportion mediated indicate that the exposure has a large effect on the mediator, and the mediator has a large effect on the outcome (larger natural indirect effect). Conversely, associations with a low proportion mediated indicate that the exposure has a minor effect on the mediator, the mediator has a minor effect on the outcome or both (larger natural direct effect).
Confounding
It is important to recognize that a mediator differs from a confounder. A confounder, as described previously in this series of commentaries (Correia et al., 2020), is a variable that is associated with both the exposure and the outcome but occurs before the exposure. When a confounder is present and not addressed in the analysis or in the study design, the effect estimates may be biased. In Fig. 1A, a confounder C is associated with both our exposure and our outcome, but occurs before our exposure. Confounding and mediation are important but distinct concepts that require different statistical approaches and assumptions. Recognizing and addressing confounding is important to ensure unbiased effect estimates. Recognizing mediation can be important for disentangling pathways of association and constructing targeted interventions for improved clinical success. Furthermore, even if not conducting a formal mediation analysis, adjusting for an intermediate variable, like multiple gestation, in a model when there are confounders (measured or unmeasured) of multiple gestation and the outcome will lead to bias (Hernan et al., 2002). Defining the DAGs at the beginning of any study will help to clarify and confirm the variables and their temporal relationship, and confirm which must be approached as confounders or mediators.
Let us consider our initial example where we are interested in estimating the extent to which the association between IVF and preterm birth is mediated through multiple gestations (Fig. 1B). Multiple gestation is on the causal pathway between our exposure and our outcome, as multiple gestation is a consequence of IVF as opposed to something that precedes IVF. We also may be concerned that age is associated with both IVF and preterm birth. However, age is not on the causal pathway between IVF and preterm birth, as it precedes both IVF and preterm birth; thus, age should be considered a confounding variable and accounted for in the statistical analysis (Correia et al., 2020). The association between IVF and preterm birth may also be confounded by the underlying infertility diagnosis or severity, so incorporating information on infertility history as a potential confounder will help clarify the relationship between IVF and preterm birth.
Interpreting Mediation Analyses: A Hypothetical Example
In a hypothetical study, as shown in Table I, we can model the relationships between IVF utilization, multiple gestations and preterm birth to quantify the total effect as well as the natural direct effect and the natural indirect effect. Suppose we find that the total effect of IVF on preterm birth is 2.3 on the RR scale. The natural direct effect of IVF on preterm birth is attenuated (RR = 1.2) when compared to the total effect, suggesting that part of the effect operates through the mediator. The natural indirect effect has an RR of 1.9, and the product of the two natural effects (1.2 × 1.9 ≈ 2.3) recovers the total effect. When we quantify the proportion mediated, we see that 84% of the effect of IVF on preterm birth is mediated through multiple gestations. Thus, given the high proportion mediated, we can infer that in our hypothetical population, the majority of the relationship between IVF and preterm birth is due to the mediator, multiple gestations. Understanding these distinct components of the effect through mediation analyses can help clarify our understanding of how IVF is related to preterm birth. When counseling an IVF patient, these hypothetical data indicate that the effect of IVF on preterm birth is driven primarily by multiple gestations. Based on these hypothetical findings, it may be appropriate to more closely monitor, screen or intervene on IVF patients with multiple gestations, whereas less concern about preterm birth may be reasonable for IVF patients with singleton gestations.
Table I.
Effect | RR of preterm birth |
---|---|
Total effect of IVF on preterm birth | 2.3 |
Natural direct effect (independent of multiple gestations) | 1.2 |
Natural indirect effect (attributed to multiple gestations) | 1.9 |
Proportion mediated by multiple gestations* | 84% |
RR: risk ratio
*Calculated as $RR_{NDE}(RR_{NIE} - 1)/(RR_{NDE} \times RR_{NIE} - 1)$ (Vanderweele and Vansteelandt, 2010).
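The arithmetic behind Table I can be checked with a few lines of code. The sketch below applies the footnote formula to the hypothetical RRs in the table; the function and variable names are illustrative and not taken from any mediation software package.

```python
def proportion_mediated(rr_nde: float, rr_nie: float) -> float:
    """Proportion mediated on the ratio scale, using the Table I footnote formula:
    RR_NDE * (RR_NIE - 1) / (RR_NDE * RR_NIE - 1)."""
    rr_te = rr_nde * rr_nie  # on the ratio scale, total effect = NDE x NIE
    if rr_te == 1.0:
        raise ValueError("Total effect RR is 1; proportion mediated is undefined.")
    return rr_nde * (rr_nie - 1.0) / (rr_te - 1.0)


# Hypothetical values from Table I
rr_nde = 1.2  # natural direct effect (independent of multiple gestations)
rr_nie = 1.9  # natural indirect effect (attributed to multiple gestations)

rr_total = rr_nde * rr_nie
pm = proportion_mediated(rr_nde, rr_nie)

print(f"Total effect RR: {rr_total:.2f}")  # rounds to the 2.3 shown in Table I
print(f"Proportion mediated: {pm:.0%}")
```

The guard against a total effect RR of exactly 1 reflects that the denominator of the formula is then zero, so the proportion mediated cannot be computed.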
Additional Assumptions of Mediation Analysis
As with other types of analysis, the typical study design and statistical assumptions, which have been covered in other commentaries (Correia et al., 2020; Dodge et al., 2020), are required to make appropriate inferences in mediation analysis as well. As with all causal inference models, we must assume that there are no unmeasured confounders of the exposure–outcome relationship (Correia et al., 2020). However, for mediation, we must satisfy additional confounding assumptions. These differ depending on the types of effects we are interested in estimating (VanderWeele, 2015). To estimate the natural direct and natural indirect effects, we must assume the following regarding the confounding structure of our research question:
(i) no unmeasured confounders of the exposure–outcome relationship
(ii) no unmeasured confounders of the exposure–mediator relationship
(iii) no unmeasured confounders of the mediator–outcome relationship
(iv) no mediator–outcome confounder that is affected by exposure
Consider again our example of the relationship between IVF and preterm birth. We discussed the possibility of age confounding the association between IVF and preterm birth. We may also be concerned that age is associated with multiple gestations, independent of conception by IVF (Knopman et al., 2014; Busnelli et al., 2019). In that circumstance, age would be a potential confounder of the primary relationship between IVF and preterm birth, but it would also be a confounder of the exposure to mediator relationship and a confounder of the mediator to outcome relationship (Fig. 2). We could also imagine a scenario where there was additional confounding both between the exposure and the mediator (Ue) and/or between the mediator and the outcome (Um) (Fig. 2). If we ignored these confounders while applying mediation analyses, the effect estimates we produced would be incorrect. If these confounding relationships exist, we would have residual confounding between our exposure and our mediator and also between our mediator and our outcome. This potential confounding will violate the additional assumptions required of mediation and lead to incorrect estimates if not addressed in the analysis.
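A small simulation can make the consequence of an unmeasured mediator–outcome confounder concrete. The sketch below is purely illustrative and rests on several assumptions that are not part of the IVF example: a randomized binary exposure, a continuous mediator and outcome, linear effects with no exposure–mediator interaction, and arbitrarily chosen coefficients (the true direct effect is set to 0.5). The variable `u` plays the role of Um in Fig. 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulated data (all coefficients are arbitrary, hypothetical choices)
e = rng.binomial(1, 0.5, size=n).astype(float)         # randomized exposure
u = rng.normal(size=n)                                  # mediator-outcome confounder (Um)
m = 0.8 * e + 1.0 * u + rng.normal(size=n)              # mediator
y = 0.5 * e + 0.7 * m + 1.0 * u + rng.normal(size=n)    # outcome; true direct effect = 0.5


def ols(outcome, *covariates):
    """Ordinary least squares; returns [intercept, coefficient_1, coefficient_2, ...]."""
    X = np.column_stack([np.ones(len(outcome)), *covariates])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef


# Direct effect estimated by adjusting for the mediator only (u treated as unmeasured)
direct_ignoring_u = ols(y, e, m)[1]

# Direct effect estimated when the confounder is measured and adjusted for
direct_adjusting_u = ols(y, e, m, u)[1]

print(f"True direct effect:          0.50")
print(f"Estimate ignoring u:         {direct_ignoring_u:.2f}")
print(f"Estimate adjusting for u:    {direct_adjusting_u:.2f}")
```

Even though the exposure is randomized in this simulated setting, the direct effect estimate that ignores `u` falls well below the true value, because conditioning on the mediator induces an association between the exposure and the unmeasured confounder. Adjusting for `u` recovers an estimate close to 0.5.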
The appropriate method for mediation analysis depends on the confounding structure and on the research question. Additionally, for valid mediation analysis, it is important to understand whether there is interaction between our exposure and mediator, that is, whether the effect of the exposure on the outcome varies by the level of the mediator and whether the effect of the mediator on the outcome varies by level of the exposure. In a scenario where exposure–mediator interaction is present, classical regression mediation approaches may not yield findings equivalent to those of counterfactual causal inference approaches to mediation (VanderWeele, 2015).
Mediation Analyses
Historically, several classic regression approaches have been utilized to confirm and quantify mediation and conduct mediation analyses, including the difference method and the product method (Baron and Kenny, 1986; Mackinnon et al., 1995; Judd et al., 2001; VanderWeele, 2016). In recent years, epidemiologic and statistical methodologists have placed greater emphasis on mediation to yield causal effect estimates (e.g. RR, OR) corresponding with these pathways, and the simple regression approaches have been generalized to allow for causal interpretation under certain assumptions (Valeri and Vanderweele, 2013; VanderWeele, 2016). Other methods for causal mediation are valid under different modeling or causal assumptions and include weighting (Lange et al., 2012), simulation (Imai et al., 2010), imputation (Vansteelandt et al., 2012) and parametric g-formula (VanderWeele and Tchetgen Tchetgen, 2017) approaches. The details of these statistical approaches are beyond the scope of this commentary; however, we encourage readers to seek advanced training and utilize multidisciplinary teams that include collaborators with this skillset to conduct mediation analyses.
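To give a sense of what the classic difference and product methods compute, the sketch below applies both to simulated data. It is a simplified illustration, not a recommended analysis: it assumes a continuous mediator and outcome, linear models, a single measured confounder, no exposure–mediator interaction and no unmeasured confounding, and all variable names and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Simulated data (arbitrary, hypothetical coefficients)
c = rng.normal(size=n)                                   # measured confounder
e = (rng.normal(size=n) + 0.5 * c > 0).astype(float)     # exposure depends on c
m = 0.6 * e + 0.4 * c + rng.normal(size=n)               # mediator
y = 0.3 * e + 0.5 * m + 0.2 * c + rng.normal(size=n)     # outcome


def ols(outcome, *covariates):
    """Ordinary least squares; returns [intercept, coefficient_1, coefficient_2, ...]."""
    X = np.column_stack([np.ones(len(outcome)), *covariates])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef


# Outcome model without the mediator: coefficient on e estimates the total effect
total = ols(y, e, c)[1]

# Mediator model: coefficient on e is the exposure -> mediator path (a)
a = ols(m, e, c)[1]

# Outcome model with the mediator: coefficient on e is the direct effect,
# coefficient on m is the mediator -> outcome path (b)
coefs = ols(y, e, m, c)
direct, b = coefs[1], coefs[2]

indirect_product = a * b               # product method
indirect_difference = total - direct   # difference method

print(f"Total effect:                 {total:.3f}")
print(f"Direct effect:                {direct:.3f}")
print(f"Indirect (product method):    {indirect_product:.3f}")
print(f"Indirect (difference method): {indirect_difference:.3f}")
```

With linear models fitted by least squares on the same sample and covariates, the two methods give the same indirect effect. That agreement does not generally hold for non-linear models such as logistic regression or when exposure–mediator interaction is present, which is part of the motivation for the counterfactual approaches cited above.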
Concluding Remarks
Mediation analyses are a useful tool to consider when addressing research questions. However, the required assumptions differ from those of traditional exposure–outcome approaches that account for confounding, and these assumptions must be addressed to yield valid estimates (VanderWeele, 2015; VanderWeele, 2016). Research disciplines differ in their analytic approaches to quantifying mediation (VanderWeele, 2016). However, in some scenarios, these methods can yield equivalently valid estimates. The purpose of this primer is to highlight the utility of mediation analysis in many contexts of reproductive health. In reproductive medicine, questions are often posed or results interpreted in ways that implicitly invoke mediation, without recognition that doing so calls for these specific design and analysis tools. Unless a researcher is specifically addressing a mediation hypothesis, it would not be appropriate to add a mediator to the regression model. Additional resources regarding the assumptions, strengths and differences of these approaches have been published in the statistical methods literature (Valeri and Vanderweele, 2013; VanderWeele, 2015). We hope this brief commentary supports readers as they consider mediation, and we again encourage readers to seek advanced training around mediation and utilize multidisciplinary teams to conduct mediation analyses.
Authors’ roles
L.V.F., K.F.B.C., L.E.D., M.R.H. and S.A.M. contributed to the conception, drafting of the manuscript and final approval. A.M.M., P.L.W., L.H.S. and T.L.T. contributed to the interpretation of the data, revising of the manuscript critically for important intellectual content and final approval.
Funding
2 L50 HD085412-03 (L.E.D.).
Conflict of interest
L.V.F. received a consultant fee from Ovia Health and had conference travel and an honorarium paid by Merck & Co. S.A.M. has received a consulting fee for service as an Advisory Board member for the Endometriosis Disease Burden and Endometriosis International Steering Committee working groups of AbbVie, Inc. No other conflicts of interest have been reported.
References
- Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol 1986;51:1173–1182.
- Blondel B, Kogan MD, Alexander GR, Dattani N, Kramer MS, Macfarlane A, Wen SW. The impact of the increasing number of multiple births on the rates of preterm birth and low birthweight: an international study. Am J Public Health 2002;92:1323–1330.
- Busnelli A, Dallagiovanna C, Reschini M, Paffoni A, Fedele L, Somigliana E. Risk factors for monozygotic twinning after in vitro fertilization: a systematic review and meta-analysis. Fertil Steril 2019;111:302–317.
- Correia KFB, Dodge LE, Farland LV, Hacker MR, Ginsburg E, Whitcomb BW, Wise LA, Missmer SA. Confounding and effect measure modification in reproductive medicine. Hum Repro 2020. doi: 10.1093/humrep/deaa051
- Dodge LE, Farland LV, Correia KFB, Missmer SA, Seidler EA, Wilkinson J, Modest AM, Hacker MR. Choice of statistical model in observational studies of ART. Hum Repro 2020. doi: 10.1093/humrep/deaa050
- Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37–48.
- Hernan MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002;155:176–184.
- Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. Psychol Methods 2010;15:309–334.
- Judd CM, Kenny DA, McClelland GH. Estimating and testing mediation and moderation in within-subject designs. Psychol Methods 2001;6:115–134.
- Knopman JM, Krey LC, Oh C, Lee J, McCaffrey C, Noyes N. What makes them split? Identifying risk factors that lead to monozygotic twins after in vitro fertilization. Fertil Steril 2014;102:82–89.
- Lange T, Vansteelandt S, Bekaert M. A simple unified approach for estimating natural direct and indirect effects. Am J Epidemiol 2012;176:190–195.
- Mackinnon DP, Warsi G, Dwyer JH. A simulation study of mediated effect measures. Multivar Behav Res 1995;30:41.
- Oberg AS, VanderWeele TJ, Almqvist C, Hernandez-Diaz S. Pregnancy complications following fertility treatment-disentangling the role of multiple gestation. Int J Epidemiol 2018;47:1333–1342.
- Pearl J. Causal diagrams for empirical research. Biometrika 1995;82:669–688.
- Robins J. A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. J Chronic Dis 1987;40 Suppl 2:139s–161s.
- Valeri L, Vanderweele TJ. Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol Methods 2013;18:137–150.
- VanderWeele TJ. Explanation in Causal Inference: Methods for Mediation and Interaction. New York: Oxford University Press, 2015.
- VanderWeele TJ. Mediation analysis: a practitioner’s guide. Annu Rev Public Health 2016;37:17–32.
- VanderWeele TJ, Tchetgen Tchetgen EJ. Mediation analysis with time varying exposures and mediators. J R Stat Soc Series B Stat Methodology 2017;79:917–938.
- Vanderweele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol 2010;172:1339–1348.
- Vansteelandt S, Bekaert M, Lange T. Imputation strategies for the estimation of natural direct and indirect effects. Epidemiologic Methods 2012;131.