Contemporary Clinical Trials Communications. 2017 Jun 22;7:130–135. doi: 10.1016/j.conctc.2017.06.005

Comparison of methods for the analysis of relatively simple mediation models

Judith JM Rijnhart a, Jos WR Twisk a, Mai JM Chinapaw b, Michiel R de Boer c, Martijn W Heymans a
PMCID: PMC5898549  PMID: 29696178

Abstract

Background/aims

Statistical mediation analysis is an often used method in trials, to unravel the pathways underlying the effect of an intervention on a particular outcome variable. Throughout the years, several methods have been proposed, such as ordinary least square (OLS) regression, structural equation modeling (SEM), and the potential outcomes framework. Most applied researchers do not know that these methods are mathematically equivalent when applied to mediation models with a continuous mediator and outcome variable. Therefore, the aim of this paper was to demonstrate the similarities between OLS regression, SEM, and the potential outcomes framework in three mediation models: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction.

Methods

Secondary data analysis of a randomized controlled trial that included 546 schoolchildren. In our data example, the mediator and outcome variable were both continuous. We compared the estimates of the total, direct and indirect effects, proportion mediated, and 95% confidence intervals (CIs) for the indirect effect across OLS regression, SEM, and the potential outcomes framework.

Results

OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates in the crude mediation model, the confounder-adjusted mediation model, and the mediation model with an interaction term for exposure-mediator interaction.

Conclusions

Since OLS regression, SEM, and the potential outcomes framework yield the same results in three mediation models with a continuous mediator and outcome variable, researchers can continue using the method that is most convenient to them.

Keywords: Mediation analysis, Indirect effect, Ordinary least square regression, Structural equation modeling, Potential outcomes framework, Cross-sectional data

Abbreviations: OLS, ordinary least square; SEM, structural equation modeling; BMI, body mass index; SBC, sweetened beverages consumption; CI, confidence interval; SE, standard error; FIML, full-information maximum likelihood

1. Introduction

Statistical mediation analysis is an important statistical tool in the field of clinical trials. Many studies use statistical mediation analysis to unravel the pathways underlying the effect of an intervention on a particular outcome variable [1], [2], [3]. With statistical mediation analysis the total effect of an intervention on an outcome variable is decomposed into a direct and indirect effect. The indirect effect goes through a mediator variable (a and b paths in Fig. 1), and the remaining effect reflects the direct effect (c’ path in Fig. 1) [4]. Therefore, mediation analysis is useful for determining which mediator variables may be targeted by the intervention and thus play a role in the treatment effect.

Fig. 1. Path diagram of a relatively simple mediation model.

In 1981, Judd and Kenny proposed the use of the sequence of regression equations (1), (2), (3) for statistical mediation analysis [5]:

Y = i_1 + cX + ε_1 (1)
M = i_2 + aX + ε_2 (2)
Y = i_3 + c’X + bM + ε_3 (3)

where in equation (1), c represents the total effect of the exposure variable X on the outcome variable Y. In equation (2), a represents the effect of the exposure variable X on the mediator variable M. In equation (3), c' represents the direct effect of the exposure variable X on the outcome variable Y, and b represents the effect of the mediator variable M on the outcome variable Y. In all three equations i represents the intercept and ε represents the error term. Based on the coefficients from these three equations, the indirect effect can be calculated as the product of the a and b coefficients or as the difference between the c and c’ coefficients. Furthermore, the proportion mediated can be calculated as either ab/(ab + c’), ab/c, or 1-(c’/c) [6].
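The equality of the product-of-coefficients and difference-in-coefficients estimates can be checked numerically. The following is a minimal Python sketch, not the authors' analysis: the data are simulated, the generating values (−0.4, −0.15, 0.1) are arbitrary assumptions, and `ols` is a hypothetical helper that solves the normal equations with the standard library only.

```python
import random

def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Solves the normal equations (X'X) beta = X'y by Gauss-Jordan elimination
    and returns [intercept, slope_1, slope_2, ...].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for p in range(k):
        # Partial pivoting for numerical stability
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [v - f * w for v, w in zip(A[r], A[p])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data (assumption): binary exposure X, continuous mediator M and outcome Y
random.seed(1)
n = 500
X = [i % 2 for i in range(n)]
M = [-0.4 * x + random.gauss(0, 1) for x in X]
Y = [-0.15 * x + 0.1 * m + random.gauss(0, 1) for x, m in zip(X, M)]

_, c = ols([X], Y)               # equation (1): total effect c
_, a = ols([X], M)               # equation (2): a path
_, c_prime, b = ols([X, M], Y)   # equation (3): direct effect c' and b path

indirect_product = a * b
indirect_difference = c - c_prime
proportions = (a * b / (a * b + c_prime), a * b / c, 1 - c_prime / c)

print(indirect_product, indirect_difference)
print(proportions)
```

With a continuous mediator and outcome fitted on the same subjects, the two indirect effect estimates and the three proportion-mediated formulas agree up to floating-point error.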

Equations (1), (2), (3) can be fitted using ordinary least square (OLS) regression, which is often used within epidemiology, or structural equation modeling (SEM), which is often used within psychology [7]. Another regression-based method for statistical mediation analysis is the potential outcomes framework. The aim of this framework is to enhance causal inferences about the mediation model [8]. Ideally, causal inferences should be based on a comparison of a subject's values of the mediator and outcome variable under both exposure levels [9]. However, in practice the values of the mediator and outcome variable are only measured under the observed exposure level. The mediator and outcome values under the other exposure level remain unobserved. The potential outcomes framework provides definitions of causal effects that can be used to decompose the total effect of an exposure variable on an outcome variable into causal direct and indirect effects, without requiring the measurement of mediator and outcome values under both exposure levels for each subject [9]. These definitions are based on the coefficients in equations (2), (3).

With the availability of several methods for statistical mediation analysis, the question arises which method for statistical mediation analysis should be preferred. Although a previous study did compare the results from OLS regression with SEM [10], so far the results from OLS regression and SEM have not been compared with the results from the potential outcomes framework. Therefore, the aim of this paper is to demonstrate the similarities between OLS regression, SEM, and the potential outcomes framework. To do this, we used the three methods to estimate the mediated effect in three mediation models with a continuous mediator and outcome variable: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction.

2. Methods

2.1. Data example

The data example in this paper comes from a randomized controlled trial assessing the effect of an intervention aiming to prevent unhealthy weight gain among school-aged children [11], [12]. In this trial, 546 schoolchildren were randomized to either the experimental (n = 285) or control condition (n = 261). The main outcome in this trial was the change in body mass index (BMI). The association between the intervention and the change in BMI appeared to be mediated by the change in sweetened beverages consumption (SBC) [13]. The mediator and outcome variable were both measured at baseline and after eight months. For both variables, standardized residual change scores were used in the mediation analyses, so that the baseline values of these variables were taken into account.

2.2. Methods for statistical mediation analysis

2.2.1. Ordinary least square regression

With OLS regression, equations (1), (2), (3) (see Section 1) are fitted as three separate regression models. The regression coefficients in these models are estimated by minimizing the sum of the squared deviations of each observation to the regression line [14]. The indirect effect based on the product of the a and b coefficients and the indirect effect based on the difference between the c and c’ coefficients will be the same when the mediator and outcome variable are both continuous [15]. Likewise, the three methods for calculating the proportion mediated (ab/(ab + c’), ab/c, and 1-(c’/c)) will yield the same result when the mediator and outcome variable are both continuous [6]. Several methods have been proposed for the calculation of a confidence interval (CI) for the indirect effect. The most often used methods are Sobel's CI, the percentile bootstrap CI, and the distribution of the product CI [16].
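As an illustration of the simplest of these intervals, the sketch below computes Sobel's CI in Python. It assumes the commonly used first-order standard error, sqrt(a²·SE_b² + b²·SE_a²), and a normal critical value of 1.96; `sobel_ci` is a hypothetical helper name. The inputs are the rounded crude OLS estimates reported in Table 1, so the result is only approximate.

```python
import math

def sobel_ci(a, se_a, b, se_b, z=1.96):
    """Indirect effect a*b with its first-order Sobel SE and a symmetric CI."""
    indirect = a * b
    se = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    return indirect, se, (indirect - z * se, indirect + z * se)

# Rounded crude estimates from Table 1 (OLS column)
indirect, se, (lower, upper) = sobel_ci(a=-0.44, se_a=0.08, b=0.06, se_b=0.04)
print(round(indirect, 3), round(se, 3), round(lower, 2), round(upper, 2))
# → -0.026 0.018 -0.06 0.01
```

The interval is symmetric around the point estimate by construction, which is why it cannot reflect a skewed sampling distribution of the product ab.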

2.2.2. Structural equation modeling

With SEM, equations (2), (3) (see Section 1) are fitted simultaneously as one model. SEM models are based on maximum likelihood estimation, which is an iterative estimation procedure maximizing the agreement between the predicted and the observed covariance matrix [17]. When only equations (2), (3) are fitted, the indirect effect can be calculated as the product of the a and b coefficients. Furthermore, the total effect of the exposure variable on the outcome variable can be calculated as the summation of the direct and indirect effect (ab + c’), and the proportion mediated as the indirect effect divided by the total effect ab/(ab + c’). As in OLS regression, Sobel's CI, the percentile bootstrap CI, and the distribution of the product CI can also be calculated for the indirect effect estimated in SEM [16].

2.2.3. Potential outcomes framework

There are two approaches available for the potential outcomes framework, an analytical and a simulation-based approach [18]. Both approaches use two regression models based on equations (2), (3) (see Section 1) as input for calculating the causal direct and indirect effect and will generally lead to the same results. The only R package that offers the potential outcomes framework for mediation analysis employs the simulation-based approach [19]. Since we used this R package to analyse the data example in this paper, we will limit our explanation of the potential outcomes framework to the simulation-based approach. Information on the analytical approach can be found elsewhere [18].

In the simulation-based approach, a pre-specified number of bootstrap samples is first drawn with replacement from the original data set [8]. After this, two new exposure variables are added to each bootstrap sample: one representing the intervention level, assigning the same value to all subjects, e.g. 1, and one representing the control level, again assigning the same value to all subjects, e.g. 0. Then, an OLS model based on equation (2) is fitted to each bootstrap sample. Based on this model, the value of the mediator variable is simulated for both the intervention and control level, where M(0) denotes the simulated value of the mediator variable for the control level, and M(1) denotes the simulated value of the mediator variable for the intervention level. These two simulated values of the mediator variable for each subject are added as new variables to each bootstrap sample.

Then an OLS model based on equation (3) is fitted to each bootstrap sample. Based on this model, the value of the outcome variable is simulated for four combinations of the exposure and mediator values. Here, Y(0,M(0)) denotes the simulated value of the outcome variable for the control level of the exposure variable and the simulated mediator value for the control level, Y(0,M(1)) denotes the simulated value of the outcome variable for the control level of the exposure variable and the simulated mediator value for the intervention level, Y(1,M(0)) denotes the simulated value of the outcome variable for the intervention level of the exposure variable and the simulated mediator value for the control level, and Y(1,M(1)) denotes the simulated value of the outcome variable for the intervention level of the exposure variable and the simulated mediator value for the intervention level. These four simulated values of the outcome variable are also added as new variables to each bootstrap sample.

The direct and indirect effect for each level of the exposure variable separately are estimated for each subject in each bootstrap sample. The direct effect is calculated by subtracting the values of the outcome variable under both exposure levels from each other, while the value of the mediator variable is held constant at the exposure level of interest [8]. So, for the control level the direct effect is calculated as Y(1,M(0)) − Y(0,M(0)), and for the intervention level as Y(1,M(1)) − Y(0,M(1)). The direct effect for the exposure level of interest is the average of all individual direct effects in all bootstrap samples. The overall direct effect is the average of the two direct effects for both exposure levels. When there is no exposure-mediator interaction, the two direct effects for the two exposure levels will be the same. The indirect effect is calculated by subtracting the values of the outcome variable under the mediator values for both exposure levels from each other, while the value of the exposure variable is held constant at the exposure level of interest. So, for the control level the indirect effect is calculated as Y(0,M(1)) − Y(0,M(0)), and for the intervention level as Y(1,M(1)) − Y(1,M(0)). The indirect effect for the exposure level of interest is the average of all individual indirect effects in all bootstrap samples. The overall indirect effect is the average of the two indirect effects under both exposure levels. When there is no exposure-mediator interaction, the two indirect effects for the two exposure levels will be the same. The percentile bootstrap CI can be constructed for the indirect effect. The total effect equals the sum of the overall direct and indirect effect. The proportion mediated can be calculated as the ratio of the indirect effect to the total effect [20].
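The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation of the R package ‘mediation’: the data are simulated, the mediator and outcome values are taken as model predictions rather than random draws, no covariates are included (so the predictions are identical for all subjects), only 200 bootstrap samples are used, and `ols`, `potential_outcome_effects` and `bootstrap` are hypothetical helper names.

```python
import random
from statistics import mean

def ols(columns, y):
    """Least-squares fit with intercept via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [v - f * w for v, w in zip(A[r], A[p])]
    return [A[i][k] / A[i][i] for i in range(k)]

def potential_outcome_effects(X, M, Y):
    """Overall direct and indirect effect from predicted potential outcomes."""
    i2, a = ols([X], M)               # mediator model, equation (2)
    i3, c_prime, b = ols([X, M], Y)   # outcome model, equation (3)
    m0, m1 = i2, i2 + a               # predicted M(0) and M(1)

    def y(x, m):                      # predicted Y(x, m)
        return i3 + c_prime * x + b * m

    # Average the effects at the control and the intervention level
    direct = mean([y(1, m0) - y(0, m0), y(1, m1) - y(0, m1)])
    indirect = mean([y(0, m1) - y(0, m0), y(1, m1) - y(1, m0)])
    return direct, indirect

def bootstrap(X, M, Y, n_boot=200, seed=0):
    """Average effects over bootstrap samples, with a percentile CI."""
    rng = random.Random(seed)
    n = len(Y)
    directs, indirects = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        d, ind = potential_outcome_effects(
            [X[i] for i in idx], [M[i] for i in idx], [Y[i] for i in idx])
        directs.append(d)
        indirects.append(ind)
    indirects.sort()
    ci = (indirects[int(0.025 * n_boot)], indirects[int(0.975 * n_boot) - 1])
    return mean(directs), mean(indirects), ci

# Simulated data (assumption): binary exposure, continuous mediator and outcome
random.seed(1)
n = 300
X = [i % 2 for i in range(n)]
M = [-0.4 * x + random.gauss(0, 1) for x in X]
Y = [-0.15 * x + 0.1 * m + random.gauss(0, 1) for x, m in zip(X, M)]

direct_full, indirect_full = potential_outcome_effects(X, M, Y)
direct_boot, indirect_boot, indirect_ci = bootstrap(X, M, Y)
print(direct_full, indirect_full)
print(direct_boot, indirect_boot, indirect_ci)
```

Without an exposure-mediator interaction, the direct effect computed on the full data set reduces to c’ and the indirect effect to ab, which is why the framework reproduces the OLS estimates in this setting.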

2.3. Statistical analyses

We performed all statistical analyses with R statistical software version 3.1.1 [21]. The R package ‘lavaan’ was used to apply SEM [22], and the R package ‘mediation’ to apply the potential outcomes framework [19]. The percentile bootstrap CIs were estimated using the R package ‘boot’ and 5000 bootstrap resamples [23]. The distribution of the product CI was estimated using the R package ‘Rmediation’ [24].
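To give an impression of how an asymmetric interval for the product ab can arise, the sketch below approximates the distribution of the product by Monte Carlo simulation in Python. This is a swapped-in approximation, not the computation performed by the authors with ‘Rmediation’: it assumes that the estimates of a and b are independent and normally distributed, `monte_carlo_product_ci` is a hypothetical helper name, and the inputs are the rounded crude estimates from Table 1, so the limits need not match Table 2 exactly.

```python
import random

def monte_carlo_product_ci(a, se_a, b, se_b, draws=20000, seed=0):
    """95% interval for a*b from simulated draws of independent normal a and b."""
    rng = random.Random(seed)
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b)
                      for _ in range(draws))
    # Simple percentile indices for the 2.5th and 97.5th percentiles
    return products[int(0.025 * draws)], products[int(0.975 * draws) - 1]

lower, upper = monte_carlo_product_ci(a=-0.44, se_a=0.08, b=0.06, se_b=0.04)
print(round(lower, 3), round(upper, 3))
```

Unlike Sobel's CI, an interval constructed this way is not forced to be symmetric around the point estimate.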

We compared the estimates of the total, direct, and indirect effect with corresponding standard errors (SEs), and the proportion mediated across OLS regression, SEM, and the potential outcomes framework in three mediation models: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction. For the crude model we also compared Sobel's CI, the percentile bootstrap CI, and the distribution of the product CI for the indirect effect across and within the three methods for statistical mediation analysis.

In the confounder-adjusted model we assessed the potential confounding effect of the daily average of minutes of active transport to school, e.g. biking or walking, at baseline on the effect estimates in the mediation model. To adjust the effect estimates for confounding, we added the confounder variable to equations (1), (2), (3) (see Section 1) [25], [26]. After this, we compared the effect estimates from the confounder-adjusted model with the crude model to assess the influence of the confounder on the effect estimates. Furthermore, to investigate exposure-mediator interaction, we added an exposure-mediator interaction term to equation (3) [19]. This interaction term was computed as the multiplication of the exposure and mediator variable. A significant exposure-mediator interaction term indicates that the relationship between the mediator and outcome variable, and thus the indirect effect, is different for the two levels of the exposure variable.
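Written out, the outcome model with an exposure-mediator interaction term takes the following form. The symbol h for the interaction coefficient is an assumption introduced here for illustration; the other symbols follow equations (2) and (3).

```latex
\begin{align}
Y &= i_3 + c'X + bM + hXM + \varepsilon_3 \\
\text{effect of } M \text{ on } Y \text{ when } X = 0 &: \quad b \\
\text{effect of } M \text{ on } Y \text{ when } X = 1 &: \quad b + h \\
\text{indirect effect at the control level} &= a \cdot b \\
\text{indirect effect at the intervention level} &= a \cdot (b + h)
\end{align}
```

When h = 0, the two indirect effects coincide and the model reduces to equation (3).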

3. Results

3.1. Crude model

Table 1 shows the coefficients with corresponding SEs for the mediation model with the intervention as the exposure variable, SBC as the mediator variable, and BMI as the outcome variable. It can be seen that OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates and SEs. The proportion of the total effect mediated was therefore also the same across the three methods.

Table 1.

Crude coefficients and standard errors (SEs) yielded by the three compared methods.

| Tested pathway | Effect estimate | OLS regression | SEM | Potential outcomes^a,b |
|---|---|---|---|---|
| Intervention → BMI | Total effect c | −0.17 (0.09) | −0.17 (0.09) | −0.17 |
| Intervention → SBC | a coefficient | −0.44 (0.08) | −0.44 (0.08) | −0.44 (0.08) |
| SBC → BMI \| Intervention^c | b coefficient | 0.06 (0.04) | 0.06 (0.04) | 0.06 (0.04) |
| Intervention → BMI \| SBC^c | Direct effect c’ | −0.15 (0.09) | −0.15 (0.09) | −0.15 |
| Intervention → SBC → BMI | Indirect effect^d | −0.02 (0.02) | −0.02 (0.02) | −0.02 |
| | Proportion mediated | 11.7% | 11.7% | 11.7% |

OLS: ordinary least square; SEM: structural equation modeling; SE: standard error; BMI: body mass index; SBC: sweetened beverages consumption.

^a The estimation of SEs for the indirect and total effect is not facilitated within the R package ‘mediation’.

^b The a and b coefficients are derived from the mediator and outcome model that serve as input for the ‘mediate’ function in the R package ‘mediation’.

^c The vertical bar represents a conditional statement, which means that the effect depicted in front of the vertical bar is adjusted for the variable after the vertical bar.

^d Sobel's SE is presented for the indirect effect estimated within OLS regression and SEM.

The results in Table 1 can be interpreted as follows: schoolchildren in the intervention group had a smaller increase in BMI eight months after the intervention than schoolchildren in the control group (total effect c). Furthermore, the decrease in SBC after eight months was larger in the intervention group (a coefficient). A change in SBC was associated with a change in BMI (b coefficient). The direct effect of the intervention on change in BMI was −0.15, and the indirect effect of the intervention on change in BMI through the change in SBC was −0.02. The proportion of the total effect of the intervention on BMI mediated by SBC was 11.7%. However, since the indirect effect is close to zero, the part of the total effect that is mediated by SBC might not be of clinical importance.

Table 2 shows the 95% CIs for the indirect effect yielded by each method. Sobel's CI and the distribution of the product CI are both not implemented within the R package ‘mediation’ and were therefore not estimated for the indirect effect yielded by the potential outcomes framework. Sobel's CI and the percentile bootstrap CI ranged from −0.06 to 0.01 and the distribution of the product CI from −0.07 to 0.01. The 95% CIs therefore did not differ across the three methods, and the different types of CI differed only slightly within each method.

Table 2.

95% Confidence Intervals (CIs) for the indirect effect yielded by the three compared methods.

| Type of CI | OLS regression | SEM | Potential outcomes^a |
|---|---|---|---|
| Sobel's | −0.06 to 0.01 | −0.06 to 0.01 | Not available |
| Percentile bootstrap | −0.06 to 0.01 | −0.06 to 0.01 | −0.06 to 0.01 |
| Distribution of the product | −0.07 to 0.01 | −0.07 to 0.01 | Not available |

OLS: ordinary least square; SEM: structural equation modeling.

^a Sobel's confidence interval and the distribution of the product confidence interval are not implemented within the R package ‘mediation’.

3.2. Confounder-adjusted model

Table 3 shows the coefficients with corresponding SEs for the mediation model adjusted for confounding by the daily average of minutes of active transport to school at baseline. As in the crude model, OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates and SEs. The results in Table 3 can be interpreted in a similar way as the results in Table 1. For example, the direct effect of the intervention on change in BMI was −0.12 and the indirect effect through SBC was −0.03 after adjustment for confounding by the daily average of minutes of active transport to school at baseline.

Table 3.

Confounder-adjusted coefficients and standard errors (SEs) yielded by the three compared methods.

| Effect estimate | OLS regression | SEM | Potential outcomes^a,b |
|---|---|---|---|
| Total effect c | −0.15 (0.09) | −0.15 (0.09) | −0.15 |
| a coefficient | −0.45 (0.08) | −0.45 (0.08) | −0.45 (0.08) |
| b coefficient | 0.06 (0.04) | 0.06 (0.04) | 0.06 (0.04) |
| Direct effect c’ | −0.12 (0.09) | −0.12 (0.09) | −0.12 |
| Indirect effect^c | −0.03 (0.02) | −0.03 (0.02) | −0.03 |
| Proportion mediated | 20.0% | 20.0% | 20.0% |

OLS: ordinary least square; SEM: structural equation modeling; SE: standard error.

^a The estimation of SEs for the indirect and total effect is not facilitated within the R package ‘mediation’.

^b The a and b coefficients are derived from the mediator and outcome model that serve as input for the ‘mediate’ function in the R package ‘mediation’.

^c Sobel's SE is presented for the indirect effect estimated within OLS regression and SEM.

When comparing the effect estimates from the crude model in Table 1 with the effect estimates from the confounder-adjusted model in Table 3, we observe that the total effect decreases in magnitude from −0.17 to −0.15. Furthermore, the direct effect decreases in magnitude from −0.15 to −0.12, while the indirect effect increases in magnitude from −0.02 to −0.03. Finally, the proportion mediated increases from 11.7% to 20.0%. Therefore we can conclude that the daily average of minutes of active transport to school at baseline is a confounder of the effect estimates in this mediation model.

3.3. Model with an interaction term for exposure-mediator interaction

Table 4 shows the coefficients with corresponding SEs for the mediation model assessing exposure-mediator interaction. As in the previous models, OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates. However, the SEs of the b coefficient, interaction coefficient, and indirect effect yielded by SEM were slightly smaller than in OLS regression and the potential outcomes framework.

Table 4.

Coefficients and standard errors (SEs) yielded by the three compared methods when including an exposure-mediator interaction term.

| Effect estimate | OLS regression | SEM | Potential outcomes^a,b |
|---|---|---|---|
| Total effect c | −0.17 (0.09) | −0.17 (0.09) | −0.17 |
| a coefficient | −0.44 (0.08) | −0.44 (0.08) | −0.44 (0.08) |
| b coefficient | 0.05 (0.06) | 0.05 (0.04) | 0.05 (0.06) |
| Direct effect c’ | −0.14 (0.09) | −0.14 (0.09) | −0.14 |
| Interaction coefficient | 0.02 (0.09) | 0.02 (0.07) | 0.02 (0.09) |
| Indirect effect^c | −0.02 (0.03) | −0.02 (0.02) | −0.02 |
| Proportion mediated | 11.7% | 11.7% | 11.7% |

OLS: ordinary least square; SEM: structural equation modeling; SE: standard error.

^a The estimation of SEs for the indirect and total effect is not facilitated within the R package ‘mediation’.

^b The a, b, and interaction coefficient are derived from the mediator and outcome model that serve as input for the ‘mediate’ function in the R package ‘mediation’.

^c Sobel's SE is presented for the indirect effect estimated within OLS regression and SEM.

The effect estimates in Table 4 represent the effect estimates for the control group. So for the control group, the direct effect of the intervention on change in BMI was −0.14 and the indirect effect through the change in SBC was −0.02. The proportion of the total effect of the intervention on change in BMI mediated by SBC was 11.7%. To derive the effect estimates for the treatment group, the value of the interaction coefficient, 0.02, should be added to the b coefficient and direct effect c’. The direct effect for the treatment group is then −0.12 and the b coefficient 0.07. Consequently, the indirect effect for the treatment group is −0.03 (−0.44·0.07) and the proportion mediated 17.6% (−0.03/(−0.12 + −0.03)). However, the interaction coefficient for exposure-mediator interaction was non-significant within all three methods (p = 0.84 in OLS regression and the potential outcomes framework, p = 0.80 in SEM). In this data example it is therefore not necessary to report the results separately for the treatment and control group.

4. Discussion

In this study we showed that OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates for three mediation models with a continuous mediator and outcome variable: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction. These results are supported by Iacobucci and colleagues [10] who showed the mathematical equivalence of the coefficients yielded by OLS regression and SEM, and Imai and colleagues [27] who provided proof for the equivalence of the coefficients yielded by SEM and the potential outcomes framework in mediation models with a continuous mediator and outcome variable.

With respect to the SEs for the effect estimates and the 95% CIs for the indirect effect, some differences were observed. First, in the model with an interaction term for exposure-mediator interaction, SEM yielded a smaller SE of the b coefficient, interaction coefficient, and indirect effect than the other two methods. Iacobucci and colleagues [10] also found that the SEs in SEM are often smaller than in OLS regression. However, these differences are mostly very small, and dependent on the sample size of the study and the software used [25]. Therefore, these differences can safely be ignored. Second, small differences in the confidence limits were observed between the different types of 95% CIs. These differences can be explained by the fact that Sobel's CI does not take into account the possible skewed distribution of the indirect effect, whereas the other two CIs do take this into account [16]. It is therefore not advised to calculate Sobel's CI for the indirect effect.

To illustrate how mediation models can be adjusted for confounding, we adjusted the mediation model in this paper for one confounder. In practice it is important to be aware that one should not only consider exposure-outcome confounders, but also exposure-mediator confounders and mediator-outcome confounders. Although randomized allocation to the intervention might eliminate potential confounders of the intervention-outcome relationship and the intervention-mediator relationship, it is not able to eliminate mediator-outcome confounding [28]. Therefore, potential mediator-outcome confounders should always be considered during the analyses.

4.1. Causal steps method

Although still widely applied in the literature, we did not apply the causal steps method of Baron and Kenny [7] to the data example in this article. According to the causal steps method, the relationship between an exposure and outcome variable is mediated when the a, b, and c coefficients are all significant. Partial mediation occurs when the c’ coefficient is also significant, and full mediation occurs when the c’ coefficient is non-significant. The first limitation of the causal steps method is that it relies heavily on the significance of the coefficients in equations (1), (2), (3). The second limitation of the causal steps method is that it does not provide an estimate of the indirect effect. A highly significant but small indirect effect may have little clinical relevance, while a non-significant indirect effect may be clinically relevant, but the study may lack the statistical power to justify this conclusion [29]. A CI reflects the degree of precision of an indirect effect by providing a range of possible population values for the indirect effect. It is therefore advisable to consider the clinical relevance and precision of the indirect effect with a CI, instead of its statistical significance.

4.2. Statistical mediation analysis of other data situations

In this paper, we compared the results yielded by OLS regression, SEM, and the potential outcomes framework based on mediation models with a continuous mediator and outcome variable. In practice other data situations might occur than the ones discussed in this paper. Some of these data situations are handled the same way by the three methods, while other situations are handled differently. Table 5 provides an overview of how OLS regression, SEM, and the potential outcomes framework handle these other data situations.

Table 5.

Overview of the way each method for statistical mediation analysis handles other types of data situations.

| Situation | Ordinary least square regression | Structural equation modeling | Potential outcomes framework^a |
|---|---|---|---|
| Handling of missing data | Listwise deletion by default. Other missing data techniques can be applied manually. | Listwise deletion by default. Full-information maximum likelihood is facilitated. | Listwise deletion by default. Multiple imputation can be applied manually. |
| Inclusion of constructs measured by multiple variables | As a sum score, factor score, or computed index. | As a latent variable through factor analysis, controlling for measurement error. | As a sum score, factor score, or computed index. |
| Multiple mediator models | Separate estimation of the indirect effect through each mediator variable. | Simultaneous estimation of all indirect effects in the mediation model. | Provides an estimate of the total indirect effect through all mediator variables combined. |
| Dichotomous mediator and/or outcome variable | Fit logistic regression models instead of OLS regression models. Standardization of the coefficients before estimating the indirect effect is advised when the mediator and outcome are both dichotomous. | Fit equations (1), (2), (3) as logistic regressions instead of linear regressions. Standardization of the coefficients before estimating the indirect effect is advised when the mediator and outcome are both dichotomous. | Replace OLS regression models with logistic regression models. Only use when the outcome prevalence is lower than 10%. |
| Multilevel and longitudinal data | Replace OLS regression models with multiple linear mixed models. | Use multilevel SEM. | Replace OLS regression models with multiple linear mixed models. |

OLS: ordinary least square; SEM: structural equation modeling.

More information on the way the three methods handle these situations can be found in the references in the text.

^a Based on the way the R package ‘mediation’ handles these situations, which may deviate from the way the SAS, STATA, and SPSS macros handle these situations.

4.2.1. Handling of missing data

When there is missing data, most statistical software packages handle missing data by default with listwise deletion. However, when the missing data is not completely at random, listwise deletion will result in biased effect estimates [30]. When the missing data is at random, i.e. related to observed variables, multiple imputation and full-information maximum likelihood (FIML) produce unbiased effect estimates. Most software packages do facilitate the use of SEM in combination with FIML. However, missing data handling techniques are often not directly facilitated in combination with OLS regression and the potential outcomes framework, and should therefore be applied manually [19].

4.2.2. Inclusion of constructs measured by multiple variables

Sometimes multiple variables are used to measure several aspects of the same construct. For example, quality of life is a construct that is often measured with multiple variables. In OLS regression and the potential outcomes framework, constructs measured with multiple variables can only be included in the model as a sum score, factor score, or computed index. In this case, the data will not be used to its full advantage [31]. Within SEM, constructs measured by multiple items can be modeled as latent variables, which controls for measurement error. This is one of the major advantages of SEM.

4.2.3. Multiple mediator models

In many situations the relationship between an exposure and outcome variable is hypothesized to be mediated by more than one mediator variable. In these situations a multiple mediator model needs to be fitted to the data. When using the potential outcomes framework based on the R package ‘mediation’, only the overall indirect effect can be estimated [19]. In both OLS regression and SEM, mediator-specific indirect effects and the overall indirect effect can both be estimated. However, in OLS regression, the number of regression models will increase as the number of mediators increases, which reduces efficiency [31]. SEM is in this case more efficient, since all mediators can be included in the same model.

4.2.4. Dichotomous mediator and/or outcome variables

When the mediator is dichotomous, equation (2) (see Section 1) needs to be fitted with logistic regression, and when the outcome is dichotomous, equations (1) and (3) (see Section 1) need to be fitted with logistic regression [6], [19]. When the outcome is dichotomous, it is advisable to standardize the coefficients yielded by multiple logistic regression and SEM before estimating the indirect effect, due to the non-collapsibility of the odds ratio in logistic regression [6]. The potential outcomes framework only yields unbiased estimates of the indirect effect when the prevalence of the outcome is less than ten percent [20].
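One way to carry out this standardization can be sketched as follows. The sketch assumes a continuous mediator M and a dichotomous outcome, assumes that equations (1) and (3) use the coefficients c, c' and b, and follows the latent-variable rescaling commonly described for logistic regression (see [6]), in which the residual variance of the latent outcome Y* is fixed at π²/3. The symbols for the variances and the covariance of X and M are introduced here for illustration only.

```latex
\begin{align}
\sigma^2_{Y^*(1)} &= c^2\,\sigma^2_X + \frac{\pi^2}{3} \\
\sigma^2_{Y^*(3)} &= c'^2\,\sigma^2_X + b^2\,\sigma^2_M
                     + 2\,c'\,b\,\sigma_{XM} + \frac{\pi^2}{3} \\
c_{\text{std}} &= \frac{c}{\sigma_{Y^*(1)}}, \qquad
c'_{\text{std}} = \frac{c'}{\sigma_{Y^*(3)}}, \qquad
b_{\text{std}} = \frac{b}{\sigma_{Y^*(3)}}
\end{align}
```

After this rescaling the coefficients from equations (1) and (3) are expressed on a comparable scale, so that the difference between the standardized total and direct effect and the product of a with the standardized b can be compared.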

4.2.5. Multilevel and longitudinal data

An important assumption of the three methods compared in this paper is the independence of the observations [32]. This assumption is violated when the data have a multilevel or longitudinal structure. In that situation, the dependence among observations should be taken into account to avoid biased effect estimates. Instead of OLS regression, mixed models can be used to estimate a model based on multilevel or longitudinal data. When using the potential outcomes framework, the OLS regression models that serve as its input also need to be replaced by mixed models [19]. When using SEM, multilevel SEM can be used to estimate mediation models based on multilevel or longitudinal data [32].

5. Conclusion

In this paper we demonstrated that OLS regression, SEM, and the potential outcomes framework yielded the same results when analyzing three mediation models with a continuous mediator and a continuous outcome variable: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction. Additionally, we discussed how each method for statistical mediation analysis handles other types of data situations, in order to support researchers in choosing the optimal method.

Conflict of interest

The authors declare that there is no conflict of interest.

Funding

This work was supported by the department of Epidemiology and Biostatistics of the VU University Medical Center.

Major category

Study Design, Statistical Design, Study Protocols.

Acknowledgements

We want to thank the board and participants of the Dutch Obesity Intervention in Teenagers Study for providing us their data for the real-life data example in this paper.

References

  • 1. Chalder T., Goldsmith K.A., White P.D., Sharpe M., Pickles A.R. Rehabilitative therapies for chronic fatigue syndrome: a secondary mediation analysis of the PACE trial. Lancet Psychiatry. 2015;2:141–152. doi: 10.1016/S2215-0366(14)00069-8.
  • 2. Fletcher A., Wolfenden L., Wyse R., Bowman J., McElduff P., Duncan S. A randomised controlled trial and mediation analysis of the ‘Healthy Habits’, telephone-based dietary intervention for preschool children. Int. J. Behav. Nutr. Phys. Activity. 2013;10:1. doi: 10.1186/1479-5868-10-43.
  • 3. Sugiyama T., Steers W.N., Wenger N.S., Duru O.K., Mangione C.M. Effect of a community-based diabetes self-management empowerment program on mental health-related quality of life: a causal mediation analysis from a randomized controlled trial. BMC Health Serv. Res. 2015;15:1. doi: 10.1186/s12913-015-0779-2.
  • 4. MacKinnon D.P. Routledge; 2008. Introduction to Statistical Mediation Analysis.
  • 5. Judd C.M., Kenny D.A. Process analysis estimating mediation in treatment evaluations. Eval. Rev. 1981;5:602–619.
  • 6. MacKinnon D.P., Lockwood C.M., Brown C.H., Wang W., Hoffman J.M. The intermediate endpoint effect in logistic and probit regression. Clin. Trials. 2007;4:499–513. doi: 10.1177/1740774507083434.
  • 7. Baron R.M., Kenny D.A. The moderator–mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol. 1986;51:1173. doi: 10.1037//0022-3514.51.6.1173.
  • 8. Imai K., Keele L., Tingley D. A general approach to causal mediation analysis. Psychol. Methods. 2010;15:309–334. doi: 10.1037/a0020761.
  • 9. Pearl J. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.; 2001. Direct and indirect effects; pp. 411–420.
  • 10. Iacobucci D., Saldanha N., Deng X. A meditation on mediation: evidence that structural equations models perform better than regressions. J. Consum. Psychol. 2007;17:139–153.
  • 11. Singh A.S., Chinapaw M.J., Kremers S.P., Visscher T.L., Brug J., van Mechelen W. Design of the Dutch Obesity Intervention in Teenagers (NRG-DOiT): systematic development, implementation and evaluation of a school-based intervention aimed at the prevention of excessive weight gain in adolescents. BMC Public Health. 2006;6:304. doi: 10.1186/1471-2458-6-304.
  • 12. Yıldırım M., Singh A., Velde S., Stralen M., MacKinnon D., Brug J. Mediators of longitudinal changes in measures of adiposity in teenagers using parallel process latent growth modeling. Obesity. 2013;21:2387–2395. doi: 10.1002/oby.20463.
  • 13. Chin A., Paw M., Singh A.S., Brug J., van Mechelen W. Why did soft drink consumption decrease but screen time not? Mediating mechanisms in a school-based obesity prevention program. Int. J. Behav. Nutr. Phys. Activity. 2008;5:1. doi: 10.1186/1479-5868-5-41.
  • 14. Seber G.A., Lee A.J. John Wiley & Sons; 2012. Linear Regression Analysis.
  • 15. MacKinnon D.P., Warsi G., Dwyer J.H. A simulation study of mediated effect measures. Multivar. Behav. Res. 1995;30:41–62. doi: 10.1207/s15327906mbr3001_3.
  • 16. Hayes A.F., Scharkow M. The relative trustworthiness of inferential tests of the indirect effect in statistical mediation analysis does method really matter? Psychol. Sci. 2013;24:1918–1927. doi: 10.1177/0956797613480187.
  • 17. Bollen K.A., Pearl J. Springer; Netherlands: 2013. Eight Myths about Causality and Structural Equation Models. Handbook of Causal Analysis for Social Research; pp. 301–328.
  • 18. VanderWeele T.J. Oxford University Press; 2015. Explanation in Causal Inference: Methods for Mediation and Interaction.
  • 19. Tingley D., Yamamoto T., Hirose K., Keele L., Imai K. Mediation: R package for causal mediation analysis. J. Stat. Softw. 2014;59:1–38.
  • 20. Valeri L., VanderWeele T.J. Mediation analysis allowing for exposure–mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol. Methods. 2013;18:137. doi: 10.1037/a0031034.
  • 21. Team R.C.D. 2014. R: A Language and Environment for Statistical Computing.
  • 22. Rosseel Y. lavaan: an R package for structural equation modeling. J. Stat. Softw. 2012;48:1–36.
  • 23. Canty A., Ripley B. 2012. Boot: Bootstrap R (S-Plus) Functions. R Package Version; p. 1.
  • 24. Tofighi D., MacKinnon D.P. RMediation: an R package for mediation analysis confidence intervals. Behav. Res. Methods. 2011;43:692–700. doi: 10.3758/s13428-011-0076-x.
  • 25. Hayes A.F. Guilford Press; 2013. Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-based Approach.
  • 26. Imai K., Keele L., Tingley D., Yamamoto T. Unpacking the black box of causality: learning about causal mechanisms from experimental and observational studies. Am. Polit. Sci. Rev. 2011;105:765–789.
  • 27. Imai K., Keele L., Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Stat. Sci. 2010:51–71.
  • 28. MacKinnon D., Fairchild A., Fritz M. Mediation analysis. Annu. Rev. Psychol. 2007;58:593–614. doi: 10.1146/annurev.psych.58.110405.085542.
  • 29. Gardner M.J., Altman D.G. Confidence intervals rather than P values: estimation rather than hypothesis testing. BMJ. 1986;292:746–750. doi: 10.1136/bmj.292.6522.746.
  • 30. Newman D.A. 2009. Missing Data Techniques and Low Response Rates. Statistical and Methodological Myths and Urban Legends: Doctrine, Verity and Fable in the Organizational and Social Sciences; p. 7.
  • 31. Li S.D. Testing mediation using multiple regression and structural equation modeling analyses in secondary data. Eval. Rev. 2011;35:240–268. doi: 10.1177/0193841X11412069.
  • 32. Preacher K.J., Zyphur M.J., Zhang Z. A general multilevel SEM framework for assessing multilevel mediation. Psychol. Methods. 2010;15:209. doi: 10.1037/a0020141.

Articles from Contemporary Clinical Trials Communications are provided here courtesy of Elsevier
