Author manuscript; available in PMC: 2020 Sep 1.
Published in final edited form as: Multivariate Behav Res. 2019 Jun 10;55(2):188–210. doi: 10.1080/00273171.2019.1618545

Indirect Effects in Sequential Mediation Models: Evaluating Methods for Hypothesis Testing and Confidence Interval Formation

Davood Tofighi 1, Ken Kelley 2
PMCID: PMC6901816  NIHMSID: NIHMS1045929  PMID: 31179751

Abstract

Complex mediation models, such as a two-mediator sequential model, have become more prevalent in the literature. To test an indirect effect in a two-mediator model, we conducted a large-scale Monte Carlo simulation study of the Type I error, statistical power, and confidence interval coverage rates of 10 frequentist and Bayesian confidence/credible intervals (CIs) for normally and non-normally distributed data. The simulation included never-studied methods and conditions (e.g., Bayesian CI with flat and weakly informative prior methods, two model-based bootstrap methods, and two non-normality conditions) as well as understudied methods (e.g., profile-likelihood, Monte Carlo with maximum likelihood standard error [MC-ML] and robust standard error [MC-Robust]). The popular BC bootstrap showed inflated Type I error rates and CI under-coverage. We recommend different methods depending on the purpose of the analysis. For testing the null hypothesis of no mediation, we recommend MC-ML, profile-likelihood, and two Bayesian methods. To report a CI, if data has a multivariate normal distribution, we recommend MC-ML, profile-likelihood, and the two Bayesian methods; otherwise, for multivariate non-normal data we recommend the percentile bootstrap. We argue that the best method for testing hypotheses is not necessarily the best method for CI construction, which is consistent with the findings we present.

Keywords: indirect effect, confidence interval, sequential mediation, Bayesian credible interval


Theories hypothesizing and studies testing sequential mediation chains, in which two or more mediators are sequentially measured over time, have become prevalent across a variety of areas in psychology (e.g., Ato García, Vallejo Seco, & Ato Lozano, 2014; Bernier, McMahon, & Perrier, 2017; Deković, Asscher, Manders, Prins, & van der Laan, 2012; Koning, Maric, MacKinnon, & Vollebergh, 2015; Reh, Tröster, & Van Quaquebeke, 2018). We focus on the mediation model in Figure 1, which illustrates a sequential two-mediator chain. In particular, Figure 1 shows an empirical example in which there is random assignment to drink-refusal training (X), which is hypothesized to improve resistance skills (M1), which is then hypothesized to reduce intention to drink alcohol (M2), which ultimately leads to reduced drinking following treatment (Y). Under a set of correct specification assumptions, including the assumptions that there are no omitted variables that influence the posited variables in the mediation model and that the mediators and outcome variable are continuously distributed, the magnitude of the specific indirect effect of X on Y through M1 and M2 is the product of the regression (path) coefficients, β1 × β2 × β3 (VanderWeele, 2015).

Figure 1.

Figure 1.

A two-mediator sequential mediation chain in which the mediators are sequentially related. The model has one antecedent (independent) variable, X (drink refusal training), two sequential mediators, M1 (resistance skills) and M2 (intention to drink alcohol), and one outcome variable, Y (number of drinks per week). Rectangles show observed variables. An arrow between two variables indicates a linear regression effect of the variable on the left, on the other variable. Term β denotes a population coefficient (path) for a linear regression in which a dependent (endogenous) variable (e.g., an outcome variable or a mediator) is predicted by another endogenous variable or an independent (exogenous) variable. Term ε denotes a residual term for each dependent (endogenous) variable. Under the no-omitted-confounder assumption, a specific indirect effect of X on Y through M1 and M2 equals β1 β2 β3.
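To make the model in Figure 1 concrete, the following is a minimal, illustrative lavaan-style specification; the variable names (X, M1, M2, Y), the data frame name, and the assignment of the labels b4-b6 to the remaining direct paths are placeholders and not the authors' code.

```r
# Illustrative lavaan syntax for the sequential two-mediator model in Figure 1.
# Variable names and the labeling of the non-indirect paths (b4-b6) are placeholders.
library(lavaan)

model <- '
  M1 ~ b1*X                  # X  -> M1  (beta1)
  M2 ~ b2*M1 + b4*X          # M1 -> M2  (beta2); X -> M2 (a direct path)
  Y  ~ b3*M2 + b5*M1 + b6*X  # M2 -> Y   (beta3); remaining direct paths
  ind := b1*b2*b3            # specific indirect effect of X on Y via M1 and M2
'
# fit <- sem(model, data = dat)   # `dat` is a data frame with columns X, M1, M2, Y
# parameterEstimates(fit)         # includes the defined parameter "ind"
```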

Two important outcomes of conducting a sequential mediation analysis are (a) the test of the null hypothesis of no indirect effect and (b) the confidence/credible interval (CI) for the population indirect effect. To evaluate the types of methods used to test indirect effects in sequential mediation analysis, we conducted a survey of published literature in several areas of psychology from 2017–2018 to investigate methods currently used and recommended (see the supplemental materials for details). In addition, we reviewed methodological journal articles and books (e.g., Falk & Biesanz, 2014; Fritz, Taylor, & MacKinnon, 2012; Hayes, 2013; MacKinnon, 2008; Preacher & Hayes, 2008; Shrout & Bolger, 2002; Taylor, MacKinnon, & Tein, 2008; Williams & MacKinnon, 2008) that advocate specific tests of indirect effects in sequential mediation analysis. Our survey identified several critical issues that have not been thoroughly addressed in the literature, concerning the test of an indirect effect in a sequential two-mediator model and CI formation for the population value of the indirect effect.

Among other things, our review highlighted a lack of comprehensive Monte Carlo simulation studies evaluating six promising but understudied methods of CI formation and of testing an indirect effect in a sequential mediation model, a single-mediator model, or both. These methods include two variations of the Bayesian credible interval (Muthén & Asparouhov, 2012; Y. Yuan & MacKinnon, 2009), one with flat (non-informative, uninformative) priors (Bayes-Flat) and one with weakly informative (Bayes-Weak) priors for the regression coefficients (note that the latter is the default specification in the rstanarm package; Muth, Oravecz, & Gabry, 2018; Stan Development Team, 2018). Additionally, we evaluate other methods from the frequentist approach, such as (a) the profile-likelihood method (Neale & Miller, 1997; Pawitan, 2001; Pek & Wu, 2015), (b) the Monte Carlo CI with maximum likelihood (ML) standard errors method (MC-ML; MacKinnon, Lockwood, & Williams, 2004; Preacher & Selig, 2012; Tofighi & MacKinnon, 2016), (c) the robust Monte Carlo (MC-Robust) CI method, which is an extension of MC-ML,1 (d) a semi-parametric (model-based) bootstrap using the Bollen-Stine bootstrap (BS; 1992), and (e) another semi-parametric bootstrap using the Yuan, Hayashi, and Yanagihara (YHY; 2007) method. To our knowledge, no Monte Carlo simulation study to date has examined the Bayes-Weak, BS, and YHY methods for any single-mediator or sequential mediation model. Previous Monte Carlo simulation studies have examined the profile-likelihood CI, MC-Robust, and Bayes-Flat (Chen, Choi, Weiss, & Stapleton, 2014; Cheung, 2007; Falk, 2018; Falk & Biesanz, 2014) for single-mediator models; however, there is no published work on these methods in the context of a sequential mediation model.

Table 1.

Type I Error Rates for a Subset of Conditions where β1 = β2 and β3 = 0.

β1 = β2 N Bayes-F Bayes-W BC BCa BS MC-ML MC-Rob Percentile Profile YHY
Normality Condition
0.14 50 0.001 0.002 0.008 0.010 0.000 0.002 0.002 0.000 0.001 0.001
100 0.001 0.003 0.011 0.018 0.001 0.005 0.004 0.003 0.001 0.004
200 0.005 0.006 0.036 0.054 0.010 0.004 0.008 0.009 0.015 0.007
500 0.037 0.028 0.095 0.089 0.027 0.038 0.031 0.023 0.039 0.026
0.36 50 0.010 0.021 0.067 0.100 0.025 0.020 0.056 0.023 0.046 0.025
100 0.031 0.033 0.092 0.116 0.046 0.037 0.046 0.042 0.038 0.057
200 0.056 0.053 0.061 0.100 0.046 0.057 0.038 0.057 0.053 0.045
500 0.063 0.065 0.074 0.055 0.045 0.056 0.054 0.050 0.048 0.057
0.48 50 0.038 0.042 0.090 0.126 0.044 0.054 0.067 0.042 0.046 0.047
100 0.060 0.050 0.071 0.086 0.061 0.067 0.060 0.049 0.054 0.050
200 0.040 0.053 0.068 0.080 0.052 0.051 0.056 0.056 0.046 0.053
500 0.060 0.060 0.045 0.061 0.061 0.048 0.052 0.059 0.060 0.060
0.62 50 0.035 0.056 0.086 0.100 0.072 0.058 0.071 0.052 0.062 0.068
100 0.057 0.039 0.068 0.056 0.050 0.061 0.038 0.054 0.045 0.052
200 0.058 0.049 0.053 0.052 0.051 0.060 0.052 0.051 0.053 0.057
500 0.045 0.035 0.052 0.055 0.051 0.053 0.060 0.042 0.048 0.054
Moderate Non-normality Condition (skewness=2, kurtosis=7)
0.14 50 0.001 0.001 0.004 0.009 0.001 0.000 0.000 0.004 0.005 0.002
100 0.002 0.003 0.024 0.010 0.002 0.007 0.000 0.001 0.006 0.002
200 0.006 0.004 0.048 0.058 0.009 0.006 0.006 0.008 0.016 0.006
500 0.034 0.025 0.107 0.097 0.034 0.026 0.031 0.033 0.026 0.054
0.36 50 0.011 0.016 0.075 0.096 0.030 0.026 0.029 0.038 0.034 0.043
100 0.038 0.039 0.114 0.104 0.065 0.031 0.046 0.059 0.055 0.057
200 0.062 0.049 0.074 0.096 0.063 0.052 0.058 0.061 0.050 0.072
500 0.057 0.049 0.069 0.075 0.051 0.039 0.075 0.055 0.056 0.061
0.48 50 0.036 0.033 0.085 0.127 0.070 0.041 0.058 0.071 0.049 0.069
100 0.043 0.040 0.094 0.116 0.077 0.067 0.079 0.058 0.061 0.066
200 0.035 0.048 0.076 0.079 0.077 0.061 0.060 0.064 0.046 0.068
500 0.049 0.052 0.066 0.096 0.053 0.048 0.071 0.069 0.050 0.048
0.62 50 0.053 0.045 0.072 0.091 0.073 0.047 0.063 0.078 0.055 0.079
100 0.050 0.050 0.071 0.089 0.075 0.040 0.071 0.061 0.054 0.073
200 0.046 0.049 0.070 0.086 0.065 0.044 0.081 0.067 0.045 0.060
500 0.066 0.051 0.058 0.072 0.066 0.061 0.048 0.056 0.053 0.065
Extreme Non-normality Condition (skewness=3, kurtosis=21)
0.14 50 0.000 0.003 0.010 0.014 0.002 0.002 0.000 0.001 0.005 0.001
100 0.003 0.005 0.026 0.027 0.007 0.002 0.000 0.005 0.006 0.003
200 0.007 0.008 0.048 0.085 0.010 0.010 0.008 0.009 0.014 0.021
500 0.030 0.027 0.108 0.125 0.047 0.027 0.033 0.046 0.048 0.054
0.36 50 0.023 0.022 0.078 0.093 0.045 0.034 0.019 0.051 0.034 0.061
100 0.036 0.053 0.116 0.132 0.069 0.045 0.048 0.059 0.056 0.073
200 0.067 0.058 0.096 0.110 0.068 0.046 0.063 0.071 0.053 0.085
500 0.058 0.051 0.072 0.097 0.055 0.052 0.060 0.074 0.047 0.067
0.48 50 0.042 0.046 0.104 0.124 0.084 0.043 0.073 0.073 0.056 0.079
100 0.058 0.046 0.082 0.120 0.072 0.050 0.063 0.080 0.044 0.088
200 0.047 0.046 0.096 0.117 0.102 0.058 0.058 0.071 0.064 0.070
500 0.037 0.051 0.080 0.086 0.047 0.047 0.075 0.068 0.049 0.067
0.62 50 0.061 0.050 0.092 0.102 0.071 0.060 0.071 0.077 0.060 0.077
100 0.056 0.053 0.073 0.114 0.085 0.062 0.056 0.085 0.043 0.090
200 0.043 0.055 0.087 0.086 0.076 0.040 0.058 0.061 0.057 0.090
500 0.053 0.063 0.079 0.088 0.069 0.034 0.048 0.070 0.054 0.073

Note. BC = Bias corrected bootstrap; BCa = Bias corrected and accelerated bootstrap; BS = Bollen-Stine semi-parametric bootstrap; MC-Rob = Monte Carlo with robust standard errors; MC-ML = Monte Carlo with ML standard errors; Bayes-F = Bayesian credible interval with flat (non-informative) prior; Percentile = Percentile bootstrap; Profile = Profile-likelihood; Bayes-W = Bayesian credible interval with weakly informative prior; YHY = Yuan-Hayashi-Yanagihara semi-parametric bootstrap. The darkest shade of gray shows an inflated Type I error rate above Bradley’s (1978) upper limit (.075), while the lightest shade of gray shows a conservative Type I error rate below Bradley’s lower limit (.025). Medium gray (between light and dark gray) shows an accurate Type I error rate within the limits, [.025, .075].

The shape of the sampling distribution of the indirect effect in a sequential two-mediator model is different from that of the indirect effect in a single-mediator model. Recall that the indirect effect in a sequential two-mediator model is the product of three coefficients, whereas the indirect effect in a single-mediator model is the product of two coefficients. Because the performance of the methods used to form an interval estimate depends on the shape of the sampling distribution of the indirect effect, as well as on the size of the parameters and the sample size, generalizing the statistical evaluation of these methods from a single-mediator model to a sequential mediation model is premature. For example, Williams and MacKinnon (2008) concluded that the percentile, BC bootstrap, and MC-ML methods showed worse Type I error and coverage in a sequential two-mediator model than in a single-mediator model. Thus, because of the growing importance of sequential mediation, it is imperative to have a formal evaluation of the competing methods so that recommendations can be made to researchers.

In addition to the complications enumerated, the assumption of the normality of the residual terms, henceforth simply referred to as the assumption of normality, is often violated in psychological science data (Cain, Zhang, & Yuan, 2017; Micceri, 1989). When the assumption of normality is violated, ML estimates in large samples remain consistent but are less efficient (Andreassen, Lorentzen, & Olsson, 2006; Olsson, Foss, Troye, & Howell, 2000). The standard errors for the model parameters and indirect effect estimates tend to be inconsistently estimated (Finch, West, & MacKinnon, 1997), and methods such as the bias-corrected and accelerated (BCa) CI tend to show inconsistent coverage and inflated Type I error rates (Biesanz, Falk, & Savalei, 2010). Further, the likelihood-ratio test statistic might not have a chi-squared distribution for smaller sample sizes, thus adversely impacting the performance of chi-squared-based methods such as the profile-likelihood CI and fit indices (West, Finch, & Curran, 1995). Even when the assumption of normality of the residuals holds, the sampling distribution of an indirect effect is not normally distributed, especially for smaller sample sizes and effect sizes (Craig, 1936; Springer & Thompson, 1966). All previous studies of sequential mediation models have evaluated the performance of tests of indirect effects with normal data (Taylor et al., 2008; Tofighi & MacKinnon, 2016; Williams & MacKinnon, 2008). For a single-mediator model, Finch et al. (1997) studied the impact of various degrees of non-normality, “moderate” (skewness = 2, kurtosis = 7) and “extreme” (skewness = 3, kurtosis = 21), on the standard error and bias of the indirect effect, but not on CI coverage. Biesanz et al. (2010) studied the effect of moderate non-normality (skewness = 2, kurtosis = 7) of the outcome variable (Y), not the mediator (M), using the percentile and BCa bootstrap CIs for performing a test of the null hypothesis. Using a latent mediator and outcome model, Falk (2018) studied the effect of non-normality (skewness = 1.98, kurtosis = 9.59, which is close to what the two previous studies considered “moderate” non-normality) of the indicators for the latent variables on the Type I error, power, and coverage of the percentile, BC, MC-ML, MC-Robust, and profile-likelihood methods. For a single-mediator model, the effect of non-normality on the following methods has not been considered: Bayes-Weak, Bayes-Flat, BS, and YHY.

The purpose of this article is to address these critical issues and to provide a solid foundation for making recommendations to researchers when interest concerns testing the null hypothesis of no sequential mediation and reporting a CI for the population indirect effect. To begin, we review 10 recently developed and existing methods for constructing a CI and testing indirect effects: (a) Bayesian credible interval with flat prior (Bayes-Flat), (b) Bayesian credible interval with weakly informative prior (Bayes-Weak), (c) Monte Carlo (parametric bootstrap) CI with ML standard error (MC-ML; MacKinnon et al., 2004; Preacher & Selig, 2012; Tofighi & MacKinnon, 2016), (d) robust Monte Carlo CI with Huber-White (Huber, 1967; White, 1980) standard errors (MC-Robust), (e) Bollen and Stine (BS; 1992) semi-parametric (model-based) bootstrap, (f) Yuan, Hayashi, and Yanagihara (YHY; 2007) semi-parametric (model-based) bootstrap, (g) profile likelihood (Neale & Miller, 1997; Pawitan, 2001; Pek & Wu, 2015), (h) percentile non-parametric bootstrap (Bollen & Stine, 1990; MacKinnon et al., 2004), (i) bias-corrected (BC) non-parametric bootstrap (Efron, 1987; MacKinnon et al., 2004; Shrout & Bolger, 2002), and (j) bias-corrected and accelerated (BCa) non-parametric bootstrap (Efron & Tibshirani, 1993). We then conduct a large-scale Monte Carlo simulation study examining the Type I error when there is no mediation, statistical power when mediation exists, and CI coverage for the 10 methods across combinations of sample sizes and values of regression coefficients for both multivariate normal and non-normal data. We focus on the two-mediator sequential model in Figure 1. Finally, we present an empirical example illustrating the application of the recommended frequentist methods. The empirical example is from a study by Sanchez et al. (2017) for which all materials and data are publicly available through the Open Science Framework and can be accessed at https://osf.io/g5fvw/. The code and detailed analysis results for the example are available in the supplemental materials. As mentioned earlier, not all of these methods have been thoroughly examined for testing indirect effects in two-mediator sequential mediation chains with both multivariate normal and non-normal data. The results of the simulation study should help guide best practices for applications of sequential mediation models. We believe that this is one of the largest and most comprehensive Monte Carlo simulation studies evaluating sequential mediation models.

Tests of Indirect Effects in Sequential Mediation Models

Non-parametric Bootstrap

To compute a (1 − α)100% CI for an indirect effect, denoted by θ = β1 β2 β3, the non-parametric bootstrap draws R repeated samples (i.e., bootstrap samples; R ≥ 1,000 is recommended; Shrout & Bolger, 2002) with replacement from the original data set (Bollen & Stine, 1990; Efron & Tibshirani, 1993). A mediation model is fitted to the original data to provide an estimate of the indirect effect, where $\hat{\theta} = \hat{\beta}_1 \hat{\beta}_2 \hat{\beta}_3$ denotes the ML estimate from the original sample. The indirect effect is also computed for each bootstrap sample, resulting in $\hat{\theta}^*_1, \hat{\theta}^*_2, \ldots, \hat{\theta}^*_R$, where $\hat{\theta}^*_r$ denotes the indirect effect estimate for the rth bootstrap sample, to approximate the sampling distribution of the estimated indirect effect and to compute a (1 − α)100% CI for the population indirect effect. The percentile method uses the α/2 and 1 − α/2 quantiles of the bootstrap samples to obtain the confidence limits.
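As a concrete illustration of the percentile bootstrap just described, the following R sketch fits the three regression equations by OLS; the data frame name `dat` and its columns (X, M1, M2, Y) are placeholders rather than the authors' code.

```r
# Percentile bootstrap CI for the indirect effect b1*b2*b3 (illustrative sketch).
# Assumes a data frame `dat` with columns X, M1, M2, Y.
est_indirect <- function(d) {
  b1 <- coef(lm(M1 ~ X,           data = d))["X"]
  b2 <- coef(lm(M2 ~ M1 + X,      data = d))["M1"]
  b3 <- coef(lm(Y  ~ M2 + M1 + X, data = d))["M2"]
  unname(b1 * b2 * b3)
}

R <- 1000                                    # number of bootstrap samples
theta_hat  <- est_indirect(dat)              # estimate from the original sample
theta_boot <- replicate(R, {
  idx <- sample(nrow(dat), replace = TRUE)   # resample rows with replacement
  est_indirect(dat[idx, ])
})
ci_percentile <- quantile(theta_boot, c(.025, .975))  # 95% percentile CI
```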

Efron (1987) also proposed the bias-corrected (BC) bootstrap procedure to account for the median bias (difference between median and mean) of the bootstrap samples. In addition, the bias-corrected and accelerated (BCa) bootstrap was proposed to correct for skewness and to yield more accurate coverage for smaller sample sizes (Chernick & LaBudde, 2011; Davison & Hinkley, 1997). Both methods compute adjusted percentiles α1 and α2 instead of α/2 and 1 − α/2. The adjusted percentiles α1 and α2 are then used to compute new confidence limits from the bootstrap sample. In both methods, the first step is to calculate the proportion of the bootstrap indirect effect estimates, denoted by p*, that are less than the original sample estimate $\hat{\theta}$. Then, $z_0 = \Phi^{-1}(p^*)$, $z_{\alpha/2} = \Phi^{-1}(\alpha/2)$, and $z_{1-\alpha/2} = -z_{\alpha/2}$ are computed, where $\Phi^{-1}$ denotes the inverse of the cumulative standard normal distribution function (e.g., $\Phi^{-1}(.025) = -1.96$). Note that $z_0$ is an estimate of bias. Next, the adjusted percentiles α1 and α2 are computed for each method. For the BC interval, the adjusted percentiles for the lower and upper CI limits are $\alpha_1 = \Phi(2 z_0 + z_{\alpha/2})$ and $\alpha_2 = \Phi(2 z_0 + z_{1-\alpha/2})$, where $\Phi$ is the cumulative standard normal distribution function. For the BCa interval, the adjusted percentiles are $\alpha_1 = \Phi\!\left(z_0 + \frac{z_0 + z_{\alpha/2}}{1 - a\,(z_0 + z_{\alpha/2})}\right)$ and $\alpha_2 = \Phi\!\left(z_0 + \frac{z_0 + z_{1-\alpha/2}}{1 - a\,(z_0 + z_{1-\alpha/2})}\right)$, where a is an “acceleration” constant that adjusts for skewness and can be estimated during the bootstrapping process or using a jackknife method (Davison & Hinkley, 1997).
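Continuing the sketch above, the BC adjustment can be computed from the same bootstrap draws; this is an illustrative translation of the formulas, not the authors' implementation.

```r
# Bias-corrected (BC) adjustment of the bootstrap percentiles (illustrative),
# using theta_hat and theta_boot from the percentile-bootstrap sketch above.
alpha  <- .05
p_star <- mean(theta_boot < theta_hat)       # proportion of bootstrap estimates below theta_hat
z0     <- qnorm(p_star)                      # bias estimate
a1     <- pnorm(2 * z0 + qnorm(alpha / 2))       # adjusted lower percentile
a2     <- pnorm(2 * z0 + qnorm(1 - alpha / 2))   # adjusted upper percentile
ci_bc  <- quantile(theta_boot, c(a1, a2))
# BCa additionally estimates an acceleration constant `a` (e.g., by jackknife) and
# uses pnorm(z0 + (z0 + z_k) / (1 - a * (z0 + z_k))) in place of the two lines above.
```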

Parametric (Monte Carlo) Bootstrap

The MC-ML method, also known as the parametric bootstrap (Efron & Tibshirani, 1993), is a flexible method that can be extended to estimate CIs to test sequential mediation models (Tofighi & MacKinnon, 2016). To implement MC-ML, the posited mediation model is first estimated using the ML method. Then, R (≥ 1,000) random samples are drawn from a multivariate normal distribution whose mean equals the vector of ML coefficient estimates from the fitted model and whose covariance matrix equals the ML estimate of the covariance matrix of the coefficient estimates. The Monte Carlo sample of the indirect effect equals the product of the Monte Carlo samples of the coefficients that comprise the indirect effect. The limits of a (1 − α)100% CI are the α/2 and 1 − α/2 quantiles of the Monte Carlo sample of the indirect effects.
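A minimal MC-ML sketch is shown below; it assumes a lavaan fit `fit` with the labeled paths b1, b2, and b3 from the earlier model syntax sketch, and it is illustrative rather than the authors' implementation.

```r
# Illustrative MC-ML (parametric bootstrap) CI for the indirect effect.
library(MASS)

est  <- coef(fit)[c("b1", "b2", "b3")]                       # ML point estimates
vmat <- vcov(fit)[c("b1", "b2", "b3"), c("b1", "b2", "b3")]  # their ML covariance matrix

R <- 100000
draws    <- MASS::mvrnorm(R, mu = est, Sigma = vmat)  # draws from N(est, vmat)
theta_mc <- draws[, 1] * draws[, 2] * draws[, 3]      # columns follow the order b1, b2, b3
ci_mc_ml <- quantile(theta_mc, c(.025, .975))         # 95% MC-ML interval
```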

One variation of the Monte Carlo method that has not been studied for a two-mediator sequential model is MC-Robust (Falk, 2018). We study the MC-Robust CI with robust Huber-White (Huber, 1967; White, 1980) standard errors (and a robust covariance matrix of the parameter estimates) to adjust for the potential non-normality of data. Falk (2018) studied MC-Robust with the robust Satorra-Bentler (2010) standard error correction for a single-mediator model. The difference between MC-ML and MC-Robust is that the latter uses the robust estimates of the standard errors (and covariances) of the parameter estimates rather than the default ML standard errors. “Robust standard errors are estimates of standard errors that are supposedly robust against non-normality” (Kline, 2016, p. 238). To obtain the robust estimate of the standard errors, the ML estimate of the covariance of the parameter estimates is adjusted using the Huber-White sandwich estimator to correct for potential non-normality. In the sandwich estimator, the “meat” is the correction matrix that is pre- and post-multiplied by the “bread,” which is the ML estimate of the covariance matrix (Huber, 1967; Savalei, 2014; White, 1980). As Freedman (2006) explains, “the sandwich algorithm, under stringent regularity conditions, yields variances for the MLE [maximum likelihood estimators] that are asymptotically correct even when the specification—and hence the likelihood function—are incorrect” (p. 302). Thus, it can be quite a useful way to approximate something that otherwise is unknown.
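For an OLS formulation of the model, a Huber-White covariance matrix can be obtained from the sandwich package and substituted for the ML covariance in the Monte Carlo step above; the sketch below is illustrative, uses the same placeholder data frame `dat`, and is not the authors' code.

```r
# Illustrative substitution of Huber-White (sandwich) covariances for the ML
# covariance matrix in the Monte Carlo step (OLS formulation; names are placeholders).
library(sandwich)

fit_y <- lm(Y ~ M2 + M1 + X, data = dat)
b     <- coef(fit_y)
v_rob <- vcovHC(fit_y, type = "HC0")   # Huber-White covariance of the estimates
# Monte Carlo draws for this equation would then use MASS::mvrnorm(R, b, v_rob),
# and analogously for the M1 and M2 equations, before forming the product b1*b2*b3.
```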

Model-Based (Semi-parametric) Bootstrap

One potential issue with the non-parametric bootstrap techniques is that the bootstrap samples are drawn from the raw data without any consideration of the hypothesized mediation model. Bollen and Stine (1990) proposed a semi-parametric, model-based bootstrap, also known as the Bollen-Stine (BS) bootstrap. In the BS bootstrap, the sample data are first transformed to mimic the population data. Next, bootstrap samples are drawn from the transformed sample data. Then, each bootstrap sample is used to estimate the model, yielding R bootstrap estimates of the quantity of interest. Finally, a CI is obtained by finding the lower and upper quantiles of the bootstrap sample.

To discuss the transformation more specifically, let $y_i$ denote the p × 1 vector of observations for person i, let S be the sample covariance matrix, let Σ be the hypothesized “population” covariance matrix implied by the mediation model, and let $\hat{\Sigma}$ be the ML estimate of the hypothesized covariance matrix. The BS bootstrap transforms the data before resampling as follows: $z_i = \hat{\Sigma}^{1/2} S^{-1/2} y_i$, where the superscript −1 denotes the matrix inverse and the superscript 1/2 denotes a square root of a positive definite matrix M such that $(M^{1/2})^{T} M^{1/2} = M$. Note that the transformation is performed to ensure that the covariance matrix of the transformed data equals that of the estimated hypothesized population: $\mathrm{cov}(z_i) = \hat{\Sigma}$. As a result, the transformation assumes an “exact” fit of the data to the hypothesized population.
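A small R sketch of this transformation step follows; the data matrix `Y_raw` and the model-implied covariance matrix `Sigma_hat` are placeholders (the latter could, for example, be extracted from a fitted lavaan model), and the code is illustrative rather than the authors' implementation.

```r
# Illustrative Bollen-Stine transformation of the raw data prior to resampling.
mat_sqrt <- function(M) {              # symmetric square root via eigendecomposition
  e <- eigen(M, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
}

S <- cov(Y_raw)                        # Y_raw: n x p numeric data matrix
# Sigma_hat: model-implied covariance matrix (e.g., lavInspect(fit, "implied")$cov)
Z <- Y_raw %*% solve(mat_sqrt(S)) %*% mat_sqrt(Sigma_hat)   # cov(Z) is approximately Sigma_hat
# Bootstrap samples are then drawn with replacement from the rows of Z.
```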

The YHY method (K.-H. Yuan et al., 2007), an extension of the BS bootstrap, was developed to accommodate an approximate fit between the sample and the hypothesized population model. That is, instead of using $\hat{\Sigma}$ to transform the data, the YHY method uses the covariance matrix $S_a = a\,S + (1 - a)\,\hat{\Sigma}$, $0 < a < 1$, where a is a constant that is estimated through a numerical algorithm. $S_a$ can be thought of as a weighted average of the sample covariance matrix and the estimated population covariance matrix. The data are transformed as follows: $z_i = S_a^{1/2} S^{-1/2} y_i$. The YHY method resamples the transformed data to achieve an approximate, rather than exact, fit between the sample and the hypothesized covariance matrix.
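Reusing the objects from the previous sketch, the YHY transformation differs only in the target covariance matrix; the value of a below is a placeholder, since its numerical estimation is not shown.

```r
# Illustrative YHY transformation: the target is a weighted average of S and Sigma_hat
# rather than Sigma_hat itself (reusing mat_sqrt, S, Sigma_hat, and Y_raw from above).
a     <- 0.5                           # placeholder; in YHY, a is estimated numerically
S_a   <- a * S + (1 - a) * Sigma_hat
Z_yhy <- Y_raw %*% solve(mat_sqrt(S)) %*% mat_sqrt(S_a)
```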

Profile Likelihood

The profile-likelihood approach (Cheung, 2007; Meeker & Escobar, 1995; Pawitan, 2001; Pek & Wu, 2015) produces a CI using the likelihood function, which is the product of the likelihoods for each data point given a specified probability distribution. To compute a profile-likelihood CI, the maximum likelihood estimates are first obtained by maximizing the logarithm of the likelihood function for the model (a step typically handled by software). Let the vector θ contain the hypothesized mediation model parameters and let L(θ) denote the likelihood function. The log-likelihood function is defined as LL(θ) = log L(θ), and the maximum of the log-likelihood function is denoted by LL1,

$LL_1 = \max_{\theta} LL(\theta) = LL(\hat{\theta}),$

where $\hat{\theta}$ is the ML estimator of θ.

Next, the profile log-likelihood is formed by first assuming that the magnitude of the indirect effect is known, LL(ψ | IE), where IE = β1 β2 β3 denotes the indirect effect, and then maximizing the profile log-likelihood function over the unknown “nuisance” parameters denoted by ψ:

$LL_0(IE) = \max_{\psi} LL(\psi \mid IE) = LL(\hat{\psi}_0 \mid IE),$

where the nuisance parameters are the other parameters in the model that are not involved in computing the indirect effect (e.g., β4 and β5). The function LL0(IE) is a profile log-likelihood function that depends on the fixed but unknown value of IE. Note that the ML estimate $\hat{\psi}_0$ also depends on the fixed but unknown value of IE. The profile log-likelihood function can be treated as any log-likelihood function. For example, one can compare the profile log-likelihood function to the original log-likelihood function, LL. The parameter space of a profile log-likelihood function is a subset of the original model parameter space because the value of the indirect effect is held fixed. Asymptotically, the following expression has a chi-squared distribution with one degree of freedom (D. R. Cox & Hinkley, 2000):

$-2\left[LL_0(IE) - LL_1\right] \sim \chi^2(1).$

Finally, the lower and upper bounds of the profile-likelihood (1 − α)100% CI correspond to the minimum and maximum values of the indirect effect that satisfy the inequality $-2\left[LL_0(IE) - LL_1\right] \le \chi^2_{\alpha}(1)$, where $\chi^2_{\alpha}(1)$ denotes the upper α critical value of the chi-squared distribution with one degree of freedom.
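One way to carry out this search in practice is to refit the model repeatedly with the indirect effect constrained to a fixed value and locate where the likelihood-ratio statistic crosses the chi-squared critical value. The sketch below assumes lavaan's support for nonlinear equality constraints in the model syntax and the placeholder data frame `dat`; for brevity the direct paths are omitted (as in the simulation design), the search brackets are ad hoc, and the article itself used OpenMx rather than this code for its profile-likelihood CIs.

```r
# Illustrative profile-likelihood CI for the indirect effect via repeated constrained fits.
library(lavaan)

model_free <- '
  M1 ~ b1*X
  M2 ~ b2*M1
  Y  ~ b3*M2
'
fit1   <- sem(model_free, data = dat)
LL1    <- logLik(fit1)                           # maximized log-likelihood
ie_hat <- prod(coef(fit1)[c("b1", "b2", "b3")])  # ML estimate of beta1*beta2*beta3

# -2 * [LL0(ie0) - LL1] minus the critical value; its roots are the CI limits
profile_gap <- function(ie0) {
  model_0 <- paste0(model_free, "\n b1*b2*b3 == ", ie0)  # constrain the indirect effect
  fit0 <- sem(model_0, data = dat)
  as.numeric(-2 * (logLik(fit0) - LL1) - qchisq(.95, df = 1))
}

# Ad hoc brackets around the estimate; widen them if uniroot reports same-sign endpoints
lower <- uniroot(profile_gap, lower = ie_hat - 1, upper = ie_hat)$root
upper <- uniroot(profile_gap, lower = ie_hat, upper = ie_hat + 1)$root
c(lower, upper)   # 95% profile-likelihood CI
```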

Bayesian Approach

The Bayesian approach yields a credible interval for the product of coefficients using the posterior distribution of the indirect effect (Muthén & Asparouhov, 2012; Y. Yuan & MacKinnon, 2009). Each parameter has a prior distribution, which represents the researcher’s belief about the distribution of the parameter before data collection. If there are prior estimates of the coefficients from previous studies (e.g., a meta-analysis), the estimates may be used to form prior distributions (called informative priors). If there is no information available, one may use a distribution that carries vague or general information about the parameters (called a non-informative, uninformative, flat, or diffuse prior).2 A weakly informative (regularizing) prior is used to improve computational stability and inference about a parameter (McElreath, 2016). The Bayesian approach combines the likelihood of the observed data from the current study with the prior distributions to estimate the posterior distribution of all model parameters. A posterior distribution is thus a conditional probability distribution that combines the prior distribution with the likelihood from the observed data. The posterior distribution is used to compute an interval estimate for each parameter. The mechanism of combining the prior distribution and the observed data is known as Bayes’ theorem.

When an analytic approach to estimating the posterior distribution of the parameters is not available, the Bayesian approach simulates a random sample from the posterior distribution using Markov chain Monte Carlo procedures (MCMC; Gilks, Richardson, & Spiegelhalter, 1998; Metropolis & Ulam, 1949). Consider the use of the Bayesian method to estimate the sequential mediation model shown in Figure 1. Random draws (usually in the thousands) are taken from the posterior distribution of all the parameters in the sequential mediation chain. The product of the corresponding coefficients in the indirect effect is computed for each random draw. To create a (1 − α)100% credible interval, the quantiles that correspond to the lower and upper α/2 of the draws from the posterior distribution of the indirect effect are located. Note that MCMC describes a general family of techniques used to draw random samples from a posterior distribution. A few specific MCMC algorithms include Metropolis-Hastings (MH; Hastings, 1970; Metropolis & Ulam, 1949), Gibbs sampling (Geman & Geman, 1984), and Hamiltonian Monte Carlo (Duane, Kennedy, Pendleton, & Roweth, 1987; Neal, 2011). For our simulation study, we use the rstanarm package (Stan Development Team, 2018) within R, which implements the Hamiltonian Monte Carlo algorithm. One advantage of Hamiltonian Monte Carlo compared to the MH and Gibbs sampling algorithms is that it typically requires fewer draws from the posterior distribution (McElreath, 2016).

In the simulation study, we consider two sets of priors for the coefficients that are available in rstanarm to estimate the Bayesian 95% credible interval: a weakly informative prior and a flat (non-informative) prior. A weakly informative prior is designed to “provide some information on the relative a priori plausibility of the possible parameter values, for example when we know enough about the variables in our model that we can essentially rule out extreme positive or negative values” (Muth et al., 2018, p. 150). Weakly informative priors have been recommended because they can provide computational stability by regulating the range of the parameter values to prevent extreme values (McElreath, 2016; Stan Development Team, 2018). We assume the priors to be independent for each coefficient of the indirect effect such that p(β1, β2, β3) = p(β1) p(β2) p(β3), where p(β1, β2, β3) is the joint prior distribution. The weakly informative prior for each coefficient is a normal distribution with a mean of 0 and a standard deviation of 2.5, N(0, 2.5²), which is the default prior in the rstanarm package. The standard deviation of the prior is automatically adjusted based on the actual range of the dependent variables to ensure that the rescaled prior is weakly informative (Gabry & Goodrich, 2018). We also considered a flat (non-informative) prior for the regression coefficients that assumes a wide range of positive and negative values to be equally likely for the coefficients. In rstanarm, the default flat prior for regression coefficients is a normal distribution with a mean of 0 and a standard deviation of 10, N(0, 10²); rstanarm rescales the standard deviation based on the actual range of the dependent variables in the model to ensure that the rescaled prior covers a wide range of parameter values (Gabry & Goodrich, 2018). Because of the large variance, the density over the most likely parameter values is approximately flat; this distribution presumably conveys little information about the coefficients and can be thought of as non-informative. For the standard deviations of the residual terms (σM1, σM2, and σY), we used the exponential distribution, denoted by exp(λ = 1), where λ is a rate parameter that equals one. Note that this is the default weakly informative prior in rstanarm. The parameter λ is also automatically rescaled based on the range of the dependent variables.
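A minimal rstanarm-based sketch of this procedure fits the three regression equations and combines posterior draws into draws of the indirect effect; the data frame `dat` and variable names are placeholders, and the prior arguments simply mirror the defaults described above rather than reproduce the authors' code.

```r
# Illustrative Bayesian credible interval for the indirect effect using rstanarm.
library(rstanarm)

fit_m1 <- stan_glm(M1 ~ X,           data = dat,
                   prior = normal(0, 2.5, autoscale = TRUE))
fit_m2 <- stan_glm(M2 ~ M1 + X,      data = dat,
                   prior = normal(0, 2.5, autoscale = TRUE))
fit_y  <- stan_glm(Y  ~ M2 + M1 + X, data = dat,
                   prior = normal(0, 2.5, autoscale = TRUE))

# Posterior draws of each path coefficient (default: 4 chains x 1,000 kept draws each)
b1 <- as.matrix(fit_m1)[, "X"]
b2 <- as.matrix(fit_m2)[, "M1"]
b3 <- as.matrix(fit_y)[,  "M2"]

ind_post <- b1 * b2 * b3                  # posterior draws of the indirect effect
quantile(ind_post, c(.025, .975))         # 95% equal-tailed credible interval
```

Because the equations of a recursive mediation model have independent residuals and the priors are independent across equations, multiplying draws across the separately fitted equations approximates the posterior of the product β1 β2 β3; the default weakly informative exponential prior on each residual standard deviation (prior_aux) is left unchanged in this sketch.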

Simulation

The purpose of the Monte Carlo simulation study was to empirically assess the Type I error rate, statistical power, and coverage rates of 10 methods of constructing a 95% CI to test the indirect effect (H0: β1 β2 β3 = 0) for the two-mediator sequential model shown in Figure 1. Based on the review of previous simulation studies of mediation models, we manipulated the following four factors in a fully factorial design: (a) distribution, (b) coefficients, (c) sample size, and (d) method of testing the indirect effect (i.e., the 10 methods). We describe the levels of each factor below.

Distribution.

Based on previous simulation studies of different levels of non-normality, we generated multivariate data under three distributions to obtain three levels of skewness and kurtosis for each variable (Curran, West, & Finch, 1996; Finch et al., 1997; Nevitt & Hancock, 2001). We first considered a multivariate normal distribution, which implies skewness = kurtosis = 0 for each variable and represents an ideal condition in which all the methods are expected to show their optimal statistical properties. The second condition represents a “moderate” multivariate non-normal distribution with a marginal univariate skewness of 2 and kurtosis of 7 for each variable. The third condition corresponds to an “extreme” multivariate non-normal distribution with a univariate skewness of 3 and kurtosis of 21.

Coefficients.

The second factor manipulated in the Monte Carlo simulation study was the combination of the three coefficients, β1, β2, and β3, which were computed using effect size values of semi-partial R² for the endogenous variables (mediators and outcome variable). For the two-mediator model in Figure 1, there are three semi-partial R²s for the endogenous variables: $R^2_{M_1}$, $R^2_{M_2}$, and $R^2_{Y}$. Following Thoemmes et al. (2010), we used these effect sizes to compute the corresponding population values of the β coefficients used to compute the indirect effects (i.e., β1 β2 β3). Previous simulation studies (e.g., Biesanz et al., 2010; Taylor et al., 2008) used Cohen’s (1988) guidelines on R² effect sizes: 0.02 (small), 0.13 (medium), and 0.23 (large). In addition, our review of the effect sizes of mediation studies in the literature found semi-partial R² values larger than 0.23, for example, 0.334 (Adamczyk, 2018) and 0.439 (Huertas-Valdivia, Llorens-Montes, & Ruiz-Moreno, 2017). Thus, we chose the following values for the coefficients: 0, 0.14, 0.36, 0.48, and 0.62. Note that to study the empirical Type I error, one or more of the β coefficients was set to zero, whereas for the power study, none of the coefficients were zero. More specifically, for the Type I error simulation studies, we considered three conditions for the β coefficients: one coefficient equals zero, where β1 ≠ 0, β2 ≠ 0, and β3 = 0; two coefficients equal zero, where β1 ≠ 0 and β2 = β3 = 0; and all coefficients equal zero, where β1 = β2 = β3 = 0. We did not consider other orderings of which coefficients equal zero because our preliminary simulation studies indicated the results did not depend on the order of the coefficients being zero. For example, for the condition where two coefficients equal zero, preliminary simulation results were virtually the same for the following conditions: β1 ≠ 0 and β2 = β3 = 0, β2 ≠ 0 and β1 = β3 = 0, and β3 ≠ 0 and β1 = β2 = 0.

Sample size.

The third factor manipulated in the Monte Carlo simulation study was sample size (N), which took on the values 50, 100, 200, and 500 across all the other manipulated factors. These values bracket the most commonly used sample sizes seen in empirical studies in psychology and other behavioral sciences. The median, first quartile, and third quartile of the sample sizes in our literature review were 209, 110.5, and 359, respectively. A sample size of 50 is generally too small to provide an adequate test of mediation; we included it as one of the smaller sizes found in our literature review (e.g., Graham, Martin Ginis, & Bray, 2017). A sample size of 200 roughly equals the median sample size used in behavioral science studies of regression and SEM (Jaccard & Wan, 1995; MacCallum & Austin, 2000), which is also close to the median sample size of 209 in our literature review. We chose N = 500, which was the 85th percentile of the sample sizes in our literature review, as the upper limit because our preliminary simulation study results showed that sample sizes larger than 500 did not provide additional insight about the performance of the methods.

Method.

The fourth factor manipulated in the Monte Carlo simulation study, method, consisted of the 10 methods of calculating a 95% CI for testing an indirect effect.

Study Designs and Data Generation

The outcomes of the simulation study were the Type I error rate, statistical power, and CI coverage. The Type I error rate was measured as the proportion of times the CI did not include zero, and hence falsely rejected the null hypothesis of a zero indirect effect, when the population value of the indirect effect was in fact zero. Statistical power was measured as the proportion of times a test correctly rejected the null hypothesis of a zero indirect effect when the population value of the indirect effect was in fact non-zero. CI coverage is the proportion of times the CI included the population value of the indirect effect. For the Type I error rate assessment, at least one of the coefficients must be zero (i.e., no mediation), which resulted in studying a total of 6,120 conditions. For statistical power, none of the three coefficients is zero, which resulted in 7,680 combinations of non-zero coefficients, distribution, sample size, and method. For the coverage study, we thus considered all 13,800 conditions. We know of no other Monte Carlo simulation study on mediation that examined as many conditions. Thus, the findings we report are the most comprehensive that we are aware of.

Consistent with previous simulation studies of a two-mediator sequential model (e.g., Cheung, 2007), we chose a model in which the effect of the independent variable on the outcome variable was fully mediated through both M1 and M2; that is, we assumed the following coefficients in Figure 1 to be zero: β4 = β5 = β6 = 0. Because we use semi-partial R² to generate data, zero versus non-zero values of the coefficients β4, β5, and β6 do not change the results in terms of the statistical properties of the tests of the indirect effect (Williams & MacKinnon, 2008), which was also supported by our pilot simulation study. We generated data using the population model in Figure 1 based on the combinations of the β coefficients, sample sizes, and skewness and kurtosis values mentioned earlier. We used the simulateData function in the lavaan (Version 0.6-1) package (Rosseel, 2012) to generate multivariate normal and non-normal data such that each variable has the specified skewness and kurtosis value (Vale & Maurelli, 1983). The independent variable (X) had a standard normal distribution. The intercepts (not depicted in Figure 1) were all set to zero. For each combination of factors (i.e., Distribution × β × N × Method), 1,000 independent replication datasets were generated.
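For concreteness, a call of the following form generates data for one condition; the coefficient and residual-variance values below are placeholders chosen to give roughly unit variances, not the exact population values derived via the Thoemmes et al. (2010) procedure, and the skewness/kurtosis values correspond to the "moderate" condition.

```r
# Illustrative data generation for one simulation condition with lavaan::simulateData.
library(lavaan)

pop_model <- '
  M1 ~ 0.36*X
  M2 ~ 0.36*M1
  Y  ~ 0.36*M2
  X  ~~ 1*X
  M1 ~~ 0.87*M1      # residual variances chosen so total variances are approximately 1
  M2 ~~ 0.87*M2
  Y  ~~ 0.87*Y
'
dat_sim <- simulateData(pop_model, sample.nobs = 200,
                        skewness = 2, kurtosis = 7,   # "moderate" non-normality
                        seed = 123)
# Note: a scalar skewness/kurtosis applies to all variables; the article kept X
# standard normal, which would require per-variable values instead.
```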

For each replication dataset, we estimated the two-mediator model in Figure 1 in which β4, β5, and β6 were constrained to zero. For the percentile, BC, BS, and YHY bootstraps, we used lavaan built-in functions with 1,000 bootstrap samples, as recommended by Shrout and Bolger (2002). For the BCa method, we first estimated 1,000 bootstrap samples in lavaan and then used the boot package (Canty & Ripley, 2017) in R to compute the confidence limits. To calculate the MC-ML and MC-Robust CIs, we first estimated the mediation model in lavaan and OpenMx with regular ML standard errors and robust Huber-White standard errors, respectively. The parameter estimates and their covariance matrices (ML and Huber-White) from the estimated mediation models were input into the ci function in the RMediation (Version 1.1.4) package. We used R = 100,000 Monte Carlo samples to compute the Monte Carlo CIs, which assured a minimum desired accuracy of 0.00001.3 OpenMx (Version 2.9.6) was used to compute the profile-likelihood CIs for the indirect effects (Neale et al., 2016).

For Bayesian credible intervals, we used two sets of priors for the regression coefficients that are available in rstanarm (Stan Development Team, 2018): Bayes-Weak, where the weakly informative prior is N(0, 2.52), and Bayes-Flat where the flat prior is N(0, 102). Then, as previously noted, rstanarm rescales the standard deviations based on the actual range of values of the endogenous variables to ensure that the rescaled prior is weakly informative and non-informative, respectively (Gabry & Goodrich, 2018). For the standard deviations of the residual terms we used the default weakly informative prior, which is the exponential distribution, exp(λ = 1), in rstanarm; the rate parameter λ is automatically rescaled based on the range of the endogenous variables to make the prior weakly informative.

Results

Because of the large number of conditions, to save space we only show a subset of the results. More tables and figures are shown in the supplemental materials. Nevertheless, the results we present are indicative of the general set of results.

Type I Error Rate and Accuracy

Multivariate normal distribution.

Table 1 shows the Type I error rates for a subset of parameters where β1 = β2 and β3 = 0 (see the supplemental materials for more tables as well as trellis plots showing trends in the Type I error rate across sample sizes). To assess the accuracy of the Type I error rates, we use Bradley’s (1978) liberal criterion, [.025, .075]. The darkest shade of gray shows an inflated Type I error rate above Bradley’s upper limit, whereas the lightest shade of gray shows a conservative Type I error rate below Bradley’s lower limit. The medium gray shade (between light and dark gray) shows an accurate Type I error rate within the limits. It appears that except for BC and BCa, all the other methods exhibit comparable Type I error rates. BC and BCa showed inconsistent Type I error rates, with multiple instances below and above Bradley’s criteria for N ≤ 200. For N = 500, BC and BCa both showed inflated Type I error rates for smaller values of the non-zero βs. All the other methods tend to become more accurate as the sample size and the magnitude of the non-zero βs increase. The stacked bar graph in Figure 2 provides the proportion of times a method showed inflated, accurate, and conservative Type I error rates according to Bradley’s criterion. Except for BC and BCa, all the other eight methods provide comparable performance in terms of the highest proportion of accurate Type I error rates and the lowest proportion of inflated Type I error rates. As the sample size and magnitude of the non-zero coefficients increased, the eight methods tended to become more accurate and less conservative. For N = 50, the BC (BCa) bootstrap showed the worst performance, with the highest inflation rate of 31% (50%) and the lowest accuracy of 38% (25%) compared to the other methods. For N = 100, 200, and 500, the BC (BCa) bootstrap method showed inflated Type I error percentages of 38% (44%), 12% (38%), and 19% (38%), respectively. When two of the coefficients were zero with another coefficient non-zero, such as when β2 = 0, β3 = 0, and β1 ≠ 0, all 10 methods were conservative (e.g., an empirical Type I error rate around .0025 instead of the nominal Type I error rate of .05), showing Type I error rates below the lower limit of Bradley’s interval.

Figure 2.

Figure 2.

Stacked bar graphs showing the accuracy of the Type I error rate proportions for testing an indirect effect, β1 β2 β3, where β3 = 0. Marginal proportions are calculated for each combination of method and sample size, averaging across all combinations of the non-zero β factors (4 × 4 × 1,000 replications). Using Bradley’s (1978) robustness interval for α = .05, [.025, .075], the marginal proportion of inflated Type I errors is the proportion of the simulation replications for each combination of method and sample size that exceeds .075. The marginal proportion of accurate Type I errors is the proportion of the replications that fall within Bradley’s interval. The marginal proportion of conservative Type I errors is the proportion of the replications that are less than .025. BC = Bias corrected bootstrap; BCa = Bias corrected and accelerated bootstrap; BS = Bollen-Stine semi-parametric bootstrap; MC-Rob = Monte Carlo with robust standard errors; MC-ML = Monte Carlo with ML standard errors; Bayes-F = Bayesian credible interval with flat (non-informative) prior; Percentile = Percentile bootstrap; Profile = Profile-likelihood; Bayes-W = Bayesian credible interval with weakly informative prior; YHY = Yuan-Hayashi-Yanagihara semi-parametric bootstrap.

Multivariate non-normal distribution.

For a subset of conditions, Table 1 also shows the Type I error rates for the moderate multivariate non-normality condition (skewness = 2, kurtosis = 7) and the extreme non-normality condition (skewness = 3, kurtosis = 21), respectively (see the supplemental materials for additional tables and graphs). Compared to the normal condition results, BC and BCa showed more instances of inflated Type I error rates for the moderate multivariate non-normality condition, and this worsened for the extreme non-normality condition. For the moderate multivariate non-normality condition, the MC-Robust, percentile, BS, and YHY methods showed multiple instances of inflated Type I error rates; the frequency of inflated Type I error rates increased for the extreme non-normality condition. The rest of the methods provided comparable Type I error rates across the conditions. As the sample size and magnitude of the non-zero coefficients increased, all methods except BC and BCa tended to become more accurate. Similar to the multivariate normal condition results, when two of the coefficients were zero, β2 = β3 = 0 and β1 ≠ 0, all 10 methods were conservative, showing a Type I error rate below the lower limit of Bradley’s (1978) interval.

Statistical Power

We present power results for a subset of conditions where β1 = β2 = β3 in Table 2 with horizontal data bars. Data bars are similar to a bar chart in that each bar represents a relative height equal to the power value in a cell (see the supplemental materials for more results). Data bars make it easier to compare ranges of values. We do not discuss BC and BCa bootstrap further in the power study because they did not meet the necessary condition of showing an accurate Type I error rate when the null hypothesis was true, thus violating the principles of statistical hypothesis testing (Davison, 2003; Lehmann & Romano, 2005).

Table 2.

Power to Detect Indirect Effect for a Subset of Conditions where β1 = β2 = β3.

[Table 2 appears as an image in the original article; the power values are not reproduced in this text version.]

Note. BC = Bias corrected bootstrap; BCa = Bias corrected and accelerated bootstrap; BS = Bollen-Stine semi-parametric bootstrap; MC-Rob = Monte Carlo with robust standard errors; MC-ML = Monte Carlo with ML standard errors; Bayes-F = Bayesian credible interval with flat (non-informative) prior; Percentile = Percentile bootstrap; Profile = Profile-likelihood; Bayes-W = Bayesian credible interval with weakly informative prior; YHY = Yuan-Hayashi-Yanagihara semi-parametric bootstrap.

Multivariate normal distribution.

For N ≥ 200 and βs ≥ .36, the power for all eight methods exceeded .96. In addition, as the sample size and size of the indirect effect increased, power for all methods appeared to increase or to remain the same (note that there is a ceiling effect for power). Also, for larger sample sizes and effect sizes, difference in power between the methods tended to decrease. For the small effect size (β1 = β2 = β3 = 0.14), one needs at least 500 observations to achieve a minimum power of .64 with either of the Bayesian methods and a maximum power of .70 with the profile-likelihood method; maximum power for N = 200 was .14 with the profile-likelihood method. The difference between the eight methods did not exceed .09. The maximum difference of .09 occurred for the medium effect size (β1 = β2 = β3 = 0.36) and N = 50 with the maximum of .46 for the profile-likelihood method and the minimum of .37 for the two Bayesian methods and BS bootstrap.

Multivariate non-normal distribution.

For the moderate multivariate non-normality condition (skewness = 2 and kurtosis = 7), the largest difference in power between the eight methods was .14, which occurred for medium effect size (β1 = β2 = β3 = 0.36) and N = 50; for this condition, the profile-likelihood method had a maximum power of .42 and the MC-ML method had a minimum power of .28. The second largest power difference was .13, which occurred for N = 100 and the medium effect sizes; for this condition, the MC-ML method had a minimum power of .76 while the percentile bootstrap, BS, and YHY methods had a maximum power of .88. For the extreme non-normality condition (skewness = 3 and kurtosis = 21), the difference in power between the methods increased compared to moderate multivariate non-normality and normality conditions. The largest power difference was .33, which occurred for medium effect size and N = 100; the MC-ML had a minimum power of .61 and the percentile and YHY methods had a maximum power of .93.

Coverage

Table 3 shows coverage values for the 10 methods for a subset of values of the β coefficients (β1 = β2 = β3) and sample sizes for the multivariate normality, moderate multivariate non-normality, and extreme multivariate non-normality conditions, respectively. To facilitate interpretation, we use Bradley’s (1978) criterion to identify under-coverage (< .925), over-coverage (> .975), and accurate coverage (between .925 and .975). Cells with an upward-pointing arrow indicate over-coverage, a downward-pointing arrow indicates under-coverage, and cells with no arrows indicate accurate coverage. Also, Figures 3–5 show jittered dot plots of coverage values for all 10 methods as a function of sample size, collapsed (averaged) across coefficient values, for the multivariate normality, moderate multivariate non-normality, and extreme multivariate non-normality conditions, respectively. The dot plots show the distribution of the coverage values and facilitate understanding of how coverage values are distributed in relation to Bradley’s interval, whose limits are drawn with solid lines in each plot. In discussing coverage, under-coverage is considered a less desirable outcome than over-coverage (Falk & Biesanz, 2014). Thus, ideally, a confidence/credible interval should show empirical coverage that equals or exceeds the nominal value of 1 − α (.95) while exhibiting a low frequency of under-coverage. Also, there are different degrees of under-coverage in terms of how far coverage falls below the lower limit of Bradley’s (1978) criterion.

Table 3.

Coverage of 95% Intervals for a Subset of Conditions where β1 = β2 = β3.

[Table 3 appears as an image in the original article; the coverage values are not reproduced in this text version.]

Note. BC = Bias corrected bootstrap; BCa = Bias corrected and accelerated bootstrap; BS= Bollen-Stine semi-parametric bootstrap; MC-Rob = Monte Carlo with robust standard errors; MC-ML = Monte Carlo with ML standard errors; Bayes-F = Bayesian credible interval with flat (non-informative) prior; Percentile = Percentile bootstrap; Profile = Profile-likelihood; Bayes-W = Bayesian credible interval with weakly informative prior; YHY = Yuan-Hayashi-Yanagihara semi-parametric bootstrap. Upward arrow and italic font show over-coverage (> .975); downward arrow and bold font show under-coverage (< .925).

Figure 3.

Figure 3.

Jittered dotplot of coverage for multivariate normal condition. BC = Bias corrected bootstrap; BCa = Bias corrected and accelerated bootstrap; BS = Bollen-Stine semi-parametric bootstrap; MC-Rob = Monte Carlo with robust standard errors; MC-ML = Monte Carlo with ML standard errors; Bayes-F = Bayesian credible interval with flat (non-informative) prior; Percentile = Percentile bootstrap; Profile = Profile-likelihood; Bayes-W = Bayesian credible interval with weakly informative prior; YHY = Yuan-Hayashi-Yanagihara semi-parametric bootstrap. Solid horizontal lines show the limits of Bradley’s (1978) interval, [.925, .975].

Figure 5.

Figure 5.

Jittered dotplot of coverage for the extreme multivariate non-normality condition (skewness = 3, kurtosis = 21). BC = Bias corrected bootstrap; BCa = Bias corrected and accelerated bootstrap; BS = Bollen-Stine semi-parametric bootstrap; MC-Rob = Monte Carlo with robust standard errors; MC-ML = Monte Carlo with ML standard errors; Bayes-F = Bayesian credible interval with flat (non-informative) prior; Percentile = Percentile bootstrap; Profile = Profile-likelihood; Bayes-W = Bayesian credible interval with weakly informative prior; YHY = Yuan-Hayashi-Yanagihara semi-parametric bootstrap. Solid horizontal lines show the limits of Bradley’s (1978) interval, [.925, .975].

Multivariate normal distribution.

As shown in Table 3 and Figure 3, when N = 50, the profile-likelihood and both Bayesian methods showed the best coverage in terms of accuracy and no under-coverage, while BC and BCa frequently showed under-coverage, with coverage values of .90 and .89, respectively. For N = 100, the profile-likelihood, MC-ML, and both Bayesian methods showed the best coverage, followed closely by the percentile, BS, and YHY bootstraps. BC and BCa showed under-coverage, with the lowest coverage of .89 and .87, respectively; however, compared to N = 50, under-coverage occurred less frequently. For N = 200, all 10 methods exhibited comparably accurate coverage. Overall, for the normal condition, the profile-likelihood and Bayesian methods showed the best coverage, followed by the MC-ML, percentile, BS, and YHY methods. We do not recommend BC and BCa, especially for N ≤ 100.

Multivariate non-normal distribution.

For both multivariate non-normality conditions (Table 3 and Figures 4–5), we arranged the methods into three groups sorted from best to worst coverage: (a) the percentile bootstrap, YHY, and BS methods; (b) BC, BCa, and MC-Robust; and (c) the profile-likelihood, MC-ML, and two Bayesian methods. For both multivariate non-normality conditions, the methods in the first group showed the best coverage, although coverage was worse than in the normality condition for N = 50 and 100. These three methods showed comparable coverage within and across the multivariate non-normality conditions. The minimum coverage for the percentile, YHY, and BS methods ranged from .89 to .90 for both multivariate non-normality conditions. In the second-best group, for the moderate multivariate non-normality condition, all three methods showed comparable coverage that tended to improve as the sample size, the size of the indirect effect, or both increased. The minimum coverage for these methods ranged from .85 to .86. For the extreme multivariate non-normality condition, the coverage for BC and BCa remained the same; however, the coverage of MC-Robust degraded by 3% on average. In the third group, all four methods showed the lowest coverage. For the moderate multivariate non-normality condition, the minimum coverage ranged from .81 to .82; for the extreme multivariate non-normality condition, the minimum coverage ranged from .67 to .68. The poorer performance could be because these four methods rely more heavily on the assumption that the coefficient estimates have a multivariate normal distribution, without any adjustment for non-normality. In addition, an interesting result was that coverage for these methods got worse for larger sizes of the indirect effect, larger sample sizes, or both; one possible explanation is that as the sample size increased, the standard errors decreased and the CIs became narrower, so coverage worsened.

Figure 4.

Figure 4.

Jittered dotplot of coverage for the moderate multivariate non-normality condition (skewness = 2, kurtosis = 7). BC = Bias corrected bootstrap; BCa = Bias corrected and accelerated bootstrap; BS = Bollen-Stine semi-parametric bootstrap; MC-Rob = Monte Carlo with robust standard errors; MC-ML = Monte Carlo with ML standard errors; Bayes-F = Bayesian credible interval with flat (non-informative) prior; Percentile = Percentile bootstrap; Profile = Profile-likelihood; Bayes-W = Bayesian credible interval with weakly informative prior; YHY = Yuan-Hayashi-Yanagihara semi-parametric bootstrap. Solid horizontal lines show the limits of Bradley’s (1978) interval, [.925, .975].

Empirical Example

The empirical example is part of a study by Sanchez et al. (2017), for which all of the data and study materials are available to the public via the Open Science Framework (please follow the link https://osf.io/g5fvw/ to access the materials). Sanchez et al. conducted five studies of stigma by transfer with different stigmatized groups (e.g., White women). Stigma by transfer means that a member of a stigmatized group is more likely to view racists as sexists and vice versa. We focus on one of the mediation models in Study 1, in which the authors examined the indirect effect of viewing the profiles of racist individuals (Treatment = racist profile vs. control) on gender identity threat (Gender Stigma) via the two sequential mediators (a) perceived social dominance orientation (Perceived SDO) and (b) perceived sexism (Perceived Sexism), as shown in Figure 6.

Figure 6.

A sequential two-mediator model. The independent variable, Treatment, denotes random assignment and takes on the value 1 (Racism) or 0 (Control). The two sequential mediators are perceived social dominance orientation (Perceived SDO) and Perceived Sexism. The dependent (outcome) variable is Gender Stigma. The variable Liking is a covariate. The quantity of interest is the indirect effect of Treatment on Gender Stigma transmitted sequentially through Perceived SDO and Perceived Sexism, controlling for the effect of Liking. Under the no-omitted-confounder assumption, the specific indirect effect of Treatment on Gender Stigma through Perceived SDO and Perceived Sexism equals the product of three coefficients, b1 b2 b3.

The data from Study 1 consist of a subset of participants: only the female participants (N = 100) were included. The female participants were randomly assigned to view either responses to the Modern Racism Scale and the Old Fashioned Racism Scale (McConahay, 1986) from an individual showing evidence of a "moderate" racist attitude (Racism condition) or a neutral profile (e.g., responses to personality measures) showing no evidence of a sexist or racist attitude (Control condition). To measure Perceived Sexism, participants responded to a 5-item scale (e.g., "How likely is it that this person treats women fairly?") evaluating the profiled person, in which 1 indicated "very slightly or not at all" and 5 indicated "extremely or a lot" (α = .967).

To measure Perceived SDO, the participants completed a 16-item SDO scale (Pratto, Sidanius, Stallworth, & Malle, 1994) as the profiled person would have done (α = .979). Responses to each item (e.g., "Some groups of people are simply inferior to other groups.") ranged from 1 (very negative or strongly disagree) to 7 (very positive or strongly agree). To measure Gender Stigma (α = .974), respondents answered a question about the profiled person, "How much would you be concerned that this person would judge you based on the following characteristics?", where the characteristics were "My gender", "My sex", and "My being a woman". The answers ranged from 1 ("not at all") to 7 ("a great deal"). Finally, Liking was measured with a 3-item scale (e.g., "If you were in a room with this person, would you have a lot of things to talk about?"; α = .778). The answers ranged from 1 ("very slightly or not at all") to 5 ("extremely or a lot"). For Perceived SDO, Perceived Sexism, and Liking, composite scores of the respective scales were used in the final analysis.

We fit three ordinary least-squares (OLS) regression equations corresponding to the dependent variables (two mediators and one outcome variable) in Figure 6. OLS regression allows us to compute case residuals,4 which we examined using plots (e.g., qq plots) as well as a test of multivariate normality, the Henze–Zirkler (HZ; Henze & Zirkler, 1990) test, which has been recommended in the literature (Mecklin & Mundfrom, 2005). We also checked the univariate normality of the residuals because it is a necessary condition for multivariate normality. Case residuals can also be used to check for outliers using t-tests with Bonferroni-adjusted p-values (Cohen, Cohen, West, & Aiken, 2003; Fox, 2016). Skewness (kurtosis) for the residuals associated with Perceived SDO, Perceived Sexism, and Gender Stigma was –0.9 (1.7), 0.2 (0.2), and –1.1 (3.9), respectively. Mardia's (1970) multivariate measures of skewness and kurtosis for the residuals were 2.57 and 21, respectively. The result of the HZ test (statistic = 1.307, p = .001) indicated that we reject the hypothesis of multivariate normality because the p-value is very small. We flagged two observations as outliers using t-tests of the studentized residuals: observation 35 (Bonferroni p < .001) and observation 19 (Bonferroni p = .036). We removed the two observations from the data and refit the regression equations.5 Skewness (kurtosis) for the new residuals associated with Perceived SDO, Perceived Sexism, and Gender Stigma was –0.7 (1.4), 0.3 (0.2), and 0.02 (–0.3), respectively. As can be seen, removing the outliers helped reduce the absolute values of skewness and kurtosis of the residuals for Gender Stigma; outliers can be one source of non-normality. The result of the HZ test (statistic = 1.042, p = .03) again indicated rejection of the hypothesis of multivariate normality. Examining the skewness and kurtosis values and the qq plots of the residuals, together with the p-value of .03, it is unclear whether the evidence against multivariate normality is as strong as it was when the two cases were included. As a result, in conducting the mediation analysis, we consider two scenarios: one in which multivariate normality seems reasonable and one in which multivariate normality is violated (i.e., in which we reject the null hypothesis of normality).
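To make these diagnostics concrete, the following is a minimal R sketch of the checks described above, not the authors' supplemental code. It assumes the Sanchez et al. (2017) data are in a data frame named dat with hypothetical column names treatment, liking, sdo, sexism, and stigma; the HZ test is taken from the MVN package and the Bonferroni-adjusted outlier test from the car package.

```r
# A minimal sketch of the residual diagnostics, assuming a (hypothetical)
# data frame `dat` with columns: treatment, liking, sdo, sexism, stigma.
library(MVN)   # Henze-Zirkler multivariate normality test
library(car)   # Bonferroni-adjusted outlier test

# Fit the three OLS regressions corresponding to Figure 6
m1 <- lm(sdo ~ treatment + liking, data = dat)
m2 <- lm(sexism ~ sdo + treatment + liking, data = dat)
m3 <- lm(stigma ~ sexism + sdo + treatment + liking, data = dat)

# Collect case residuals and test multivariate normality (HZ test);
# mvn() also reports univariate skewness and kurtosis
res <- data.frame(r_sdo = resid(m1), r_sexism = resid(m2), r_stigma = resid(m3))
mvn(res, mvnTest = "hz")

# Inspect a qq plot of the outcome-equation residuals
qqnorm(res$r_stigma); qqline(res$r_stigma)

# Flag outliers with t-tests of studentized residuals (Bonferroni-adjusted)
outlierTest(m3)
```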

To compute the 95% CIs for the model in Figure 6, we use lavaan and OpenMx, which have built-in functions to compute bootstrap and profile-likelihood CIs for an indirect effect, respectively. The results are shown in Table 4. To conduct the mediation analysis, our recommendation depends on the purpose of the analysis (significance testing, reporting an interval estimate, or reporting model fit) and on the assumption about the distribution of the data (see the Discussion section for additional detail). If one assumes that multivariate normality is not violated, we would use the profile-likelihood, MC-ML, Bayes-Flat, or Bayes-Weak method both to conduct significance testing and to compute a CI. The profile-likelihood 95% CI was [0.25, 0.88] and the MC-ML 95% CI was [0.24, 0.87]. Based on these CIs, the indirect effect appears to differ from zero at α = .05, and the MC-ML CI suggests it ranges from 0.24 to 0.87. However, if a researcher were to err on the side of caution and assume that multivariate normality is violated, she would use the profile-likelihood or MC-ML method for significance testing but the percentile bootstrap 95% CI, [0.23, 0.90], for the interval estimate; in this case, the indirect effect appears to range from 0.23 to 0.90. Note that the percentile bootstrap CI is wider than either the profile-likelihood or the MC-ML CI because it is non-parametric and thus does not assume a specific distribution for the data or the residuals.
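As an illustration of how such an interval can be obtained, the following is a minimal lavaan sketch of the percentile bootstrap CI for the sequential indirect effect, using the same hypothetical data frame and variable names as in the previous sketch; the authors' actual analysis code is in the supplemental materials.

```r
# A minimal lavaan sketch of the percentile bootstrap CI for the sequential
# indirect effect in Figure 6 (hypothetical data frame `dat` as above).
library(lavaan)

model <- '
  sdo    ~ b1 * treatment + liking
  sexism ~ b2 * sdo + b4 * treatment + liking
  stigma ~ b3 * sexism + b5 * sdo + b6 * treatment + liking
  ind := b1 * b2 * b3          # sequential indirect effect
'

fit <- sem(model, data = dat, se = "bootstrap", bootstrap = 5000)

# Percentile bootstrap 95% CI for the defined parameter `ind`
parameterEstimates(fit, boot.ci.type = "perc", level = 0.95)
```

A profile-likelihood interval for the same product of coefficients can be requested in OpenMx via mxCI(); changing boot.ci.type to "bca.simple" in lavaan would instead yield a bias-corrected interval, which we do not recommend here.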

Table 4.

Estimates for the Two-Mediator Sequential Model of the Empirical Example (N=100)

Dependent          Predictor           Estimate    SE     z      p       95% CI LL   95% CI UL   Semi-partial R2
Perceived SDO      Treatment           1.51 (b1)   0.27   5.50   <.001   0.97        2.04        0.162
Perceived SDO      Liking              −0.71       0.15   −4.88  <.001   −1.00       −0.43       0.128
Perceived Sexism   Perceived SDO       0.26 (b2)   0.06   4.64   <.001   0.15        0.37        0.086
Perceived Sexism   Treatment           0.11 (b4)   0.17   0.66   0.51    −0.23       0.45        0.002
Perceived Sexism   Liking              −0.52       0.09   −5.80  <.001   −0.69       −0.34       0.135
Gender Stigma      Perceived Sexism    1.31 (b3)   0.18   7.48   <.001   0.97        1.65        0.194
Gender Stigma      Perceived SDO       0.05 (b5)   0.11   0.51   0.61    −0.16       0.26        0.001
Gender Stigma      Treatment           0.18 (b6)   0.30   0.59   0.56    −0.41       0.77        0.001
Gender Stigma      Liking              −0.17       0.18   −0.94  0.35    −0.52       0.18        0.003

Note. SDO = social dominance orientation; LL = lower limit; UL = upper limit; Treatment = 1 (Racism), 0 (Control).

Discussion

We conducted a large-scale, comprehensive simulation study to evaluate the Type I error rate, statistical power, and coverage of 10 emerging and existing confidence/credible intervals for testing an indirect effect in a two-mediator sequential model: (a) Bayesian credible interval with a flat prior (Bayes-Flat), (b) Bayesian credible interval with a weakly informative prior (Bayes-Weak), (c) Monte Carlo CI with ML standard errors (MC-ML), (d) Monte Carlo CI with robust Huber-White (Huber, 1967; White, 1980) standard errors (MC-Robust), (e) Bollen and Stine (BS) bootstrap, (f) Yuan, Hayashi, and Yanagihara (YHY) bootstrap, (g) profile likelihood, (h) percentile bootstrap, (i) bias-corrected (BC) bootstrap, and (j) bias-corrected and accelerated (BCa) bootstrap. A wide range of conditions, including sample sizes, regression coefficients, and multivariate normal and non-normal data, chosen based on our survey of the published literature, was examined in the Monte Carlo simulation study.

Under ideal conditions, when the data had a multivariate normal distribution, one key finding was that for N = 50 the profile likelihood and both Bayesian methods showed the best coverage and accurate Type I error rates, while for N ≥ 100 all methods except the BC and BCa bootstrap showed comparable performance. The popular BC and BCa bootstrap methods frequently showed under-coverage and inflated Type I error rates for tests of indirect effects, especially for smaller sample sizes. The BCa bootstrap reached a maximum Type I error rate of 12.6% when β1 = β2 = 0.48, β3 = 0, and N = 50, while the BC bootstrap showed a maximum Type I error rate of 9.8% when β1 = 0.48, β2 = 0.62, β3 = 0, and N = 50. The lowest coverage for BC and BCa was 87.4%, which occurred when β1 = 0.36, β2 = 0.62, β3 = 0.14, and N = 50 for BC, and when β1 = β2 = β3 = 0.14 and N = 100 for BCa. All methods except BC and BCa showed comparable power across conditions. We also considered two multivariate non-normality conditions: moderate multivariate non-normality (skewness = 2, kurtosis = 7) and extreme multivariate non-normality (skewness = 3, kurtosis = 7). For both multivariate non-normality conditions, the profile-likelihood, MC-ML, Bayes-Flat, and Bayes-Weak methods showed the most accurate Type I error rates, followed by MC-Robust, the percentile bootstrap, YHY, and BC, which showed multiple instances of inflated Type I error rates; power for all methods was comparable across conditions. In terms of coverage, however, the best performing methods were the percentile bootstrap, YHY, and BS, followed by BC, BCa, and MC-Robust. The profile-likelihood, MC-ML, and both Bayesian methods showed under-coverage for the moderate multivariate non-normality condition that worsened for the extreme multivariate non-normality condition.

The results of our simulation study complement previous studies. Because of the inflated Type I error rates and under-coverage, we do not recommend BC and BCa. Our recommendation is consistent with recent studies of BC and BCa for a single-mediator model (Biesanz et al., 2010; Falk, 2018; Falk & Biesanz, 2014; Hayes & Scharkow, 2013). For example, Hayes and Scharkow (2013) recommended the percentile bootstrap as a compromise between the liberal BC bootstrap when mediation occurs and the MC-ML or the distribution-of-the-product approach (MacKinnon, Lockwood, Hoffman, West, & Sheets, 2002) when mediation does not occur; of course, knowing whether mediation occurs is not straightforward. However, our recommendation is inconsistent with the recommendations from previous studies of the sequential two-mediator model. Williams and MacKinnon (2008) and Taylor et al. (2008) studied the BC bootstrap CI for normally distributed data and recommended the BC bootstrap; however, these two studies averaged the Type I error rate and coverage across several conditions. The Type I error rates and coverage for specific combinations of factors (that were not averaged or aggregated), such as the maximum Type I error rate and the minimum coverage rate, were not reported. Thus, the severity of the inflated Type I error rates and under-coverage was not fully explored. For example, for N = 50, Taylor et al. reported the highest mean (averaged across conditions) Type I error rate for BC to be .074, which is within Bradley's interval, while in our simulation study the maximum Type I error rate for N = 50 was .098, which is outside of Bradley's interval. Similarly, when only one of the coefficients was zero, Williams and MacKinnon reported mean Type I error rates, averaged across coefficient values, of .08093 (N = 50), .07820 (N = 100), and .07860 (N = 200). By comparison, the maximum Type I error rates in our simulation study were .098 (N = 50), .092 (N = 100), and .089 (N = 200). Had we used the average Type I error rate, BC and BCa would have shown accurate Type I error rates across sample sizes. However, average Type I error rates mask both the severity and the frequency of the inflation of the Type I error rates, as shown in Figure 2.

One important consideration regarding the Bayesian methods in the current study is that both the likelihood and the priors were based on multivariate/univariate normal distributions. Although not in the context of mediation models, Zhang (2016) suggests that this is one possible reason the Bayesian methods did not perform well under the multivariate non-normality conditions. Zhang showed that using a non-normal distribution to model error terms in latent growth curve models improved the efficiency of the standard error estimates of the model parameters. Future studies are needed to examine the effect of using univariate/multivariate non-normal distributions on the performance of Bayesian credible intervals for the indirect effect.

Unlike previous studies of tests of an indirect effect in mediation models, our recommendation is based on the purpose of the mediation analysis (significance testing, reporting an interval estimate, or reporting model fit, although model fit was not included in our simulation study) and on the assumption about the distribution of the data. If multivariate normality can be assumed, then one may use the (a) profile likelihood, (b) MC-ML, (c) Bayes-Flat, or (d) Bayes-Weak method to compute a CI and to conduct significance testing. For multivariate non-normality conditions, however, we recommend that researchers use different methods for significance testing and for reporting a CI. For significance testing, we recommend the (a) profile likelihood, (b) MC-ML, (c) Bayes-Flat, or (d) Bayes-Weak method. If the goal is to construct a CI without testing model fit, then we recommend the percentile bootstrap. Moreover, if one prefers the practical compromise of a single method for both significance testing and CI construction, regardless of the distribution of the data, we recommend the percentile bootstrap CI, which offers overall accurate (though not the most accurate) Type I error rates, comparable power, and good coverage across conditions. If the researcher's goal is to build a CI for fit indices that are based on likelihood-ratio chi-squared tests, we recommend the YHY bootstrap or, with a caveat, the BS bootstrap. Our caveat is that for smaller sample sizes (N < 200), BS has appeared to produce large standard errors for the coefficients under non-normality conditions, and many fitted models have had convergence problems (Nevitt & Hancock, 2001); note, however, that our simulation study did not encounter these problems with the two-mediator sequential mediation model. The YHY bootstrap, in turn, has been recommended for testing model fit (X. Zhang & Savalei, 2016). To illustrate the application of the recommended methods, we presented an empirical example from a study by Sanchez et al. (2017), whose materials and data are publicly available at https://osf.io/g5fvw/. We provided code for the mediation analysis of the example in the supplemental materials.
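To clarify what the MC-ML recommendation involves in practice, the following is a minimal sketch of the Monte Carlo (MC-ML) CI logic, not a definitive implementation: coefficient estimates are drawn from a multivariate normal approximation defined by the ML estimates and their ML covariance matrix, and the product of the three path coefficients is formed for each draw. The sketch assumes fit_ml is a hypothetical lavaan fit of the Figure 6 model with default ML standard errors and labeled coefficients b1, b2, and b3.

```r
# A minimal sketch of the Monte Carlo (MC-ML) CI for the sequential indirect
# effect, assuming `fit_ml` is a lavaan fit with default ML standard errors
# and parameter labels b1, b2, b3 (hypothetical object and labels).
library(MASS)

est <- coef(fit_ml)[c("b1", "b2", "b3")]                        # ML point estimates
V   <- vcov(fit_ml)[c("b1", "b2", "b3"), c("b1", "b2", "b3")]   # ML covariance of estimates

# Draw coefficient vectors from the multivariate normal approximation and
# form the product of the three coefficients for each draw
set.seed(1234)
draws <- MASS::mvrnorm(n = 1e5, mu = est, Sigma = V)
ind   <- draws[, "b1"] * draws[, "b2"] * draws[, "b3"]

# 95% Monte Carlo confidence interval for the sequential indirect effect
quantile(ind, probs = c(0.025, 0.975))
```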

We made several simplifying assumptions in designing our simulation studies. First, we considered a single sequential mediation chain. However, a sequential mediation chain could be part of a larger model that includes covariates. Inclusion of additional covariates may improve the estimates of the parameters and of the standard errors needed to calculate a CI for an indirect effect. The results of our simulation study are still applicable to such models, that is, models with covariates and with non-zero β4, β5, and β6 paths in Figure 1. For such models one could, for example, use the semi-partial R2 for the endogenous variables and then look up the corresponding power in the tabulated power results. We also assumed that the variables, most importantly the mediators, were measured without error. Although this is a common assumption, in practice it will often be violated. In the two-mediator sequential models considered here, measurement error could attenuate (decrease) the magnitude of indirect effects and inflate (increase) their standard errors (Cohen et al., 2003). We surmise that the results of our simulation studies hold for structural equation models with latent variables used to model measurement error, assuming the model is correctly specified and fits appropriately. In addition, we used Vale and Maurelli's (1983) approach to generate multivariate non-normal data. This approach has been criticized for underestimating values of skewness and kurtosis in smaller sample sizes (Astivia & Zumbo, 2015), and thus our recommendations regarding the values of skewness and kurtosis should be considered with some degree of caution, particularly in small samples. Nevertheless, the Vale and Maurelli approach has been widely used in methodological research, and to the extent that the method may not produce exactly the specified levels of non-normality, we are confident that our conclusions hold. Finally, we assumed that the population model in Figure 1 is correctly specified. This assumption implies that the model correctly represents the true causal order of the variables, that there are no omitted confounders, and that the functional form of the causal relationships is linear. The validity of such assumptions in practice is unknown. Researchers should attempt to evaluate the effects of violations of these assumptions on their results (M. G. Cox et al., 2013; Holland, 1988; Imai et al., 2010; MacKinnon & Pirlott, 2015; Tofighi et al., 2019; Tofighi & Kelley, 2016).

A strength of our simulation studies is that we studied both frequentist intervals and two Bayesian intervals (Bayes-Flat and Bayes-Weak) across a wide range of effect sizes and multivariate normality and non-normality conditions for a mediation model with two sequential mediators, which is more complex than the single-mediator model. We hope our work helps researchers make more informed decisions regarding how to test for sequential mediation, a type of analysis that is becoming more important in psychology and related disciplines.

Supplementary Material

Supplemental

Acknowledgments

This work was partially supported by NIAAA (R01AA025539, D. Tofighi and K. Witkiewitz, PIs). The authors would like to thank Benjamin B. Dunford, Krannert School of Management, Purdue University, and Katie Witkiewitz, Department of Psychology, University of New Mexico, for helpful comments on earlier versions of this work.

Footnotes

1

We will use the MC-Robust CI with Huber-White (Huber, 1967; White, 1980) standard errors (and robust covariance of the parameter estimates) to adjust for the potential non-normality of data; however, Falk (2018) used MC-Robust with robust Satorra-Bentler (2010) standard error correction.

2

Although the terms "diffuse" or "uninformative" might be more appropriate in referring to a noninformative prior in our context, we use the term "flat" prior to be consistent with the terminology used in the rstanarm package. In our context, a flat prior for a regression coefficient does not mean a uniform prior; rather, it is a normal distribution with a mean of 0 and a standard deviation of 10.

3

To our knowledge, there is no established guideline for the number of Monte Carlo samples in mediation analysis. We used RMediation to calculate the desired precision of the estimates of the standard errors of the indirect effect. We then conducted preliminary analyses to decide on the number of Monte Carlo samples, conservatively choosing 100,000 Monte Carlo samples to ensure stable results.

4

To our knowledge, software packages such as OpenMx and lavaan do not have built-in functions to produce case residuals. Instead, these packages compute a variety of residuals that are a function of the difference between the sample and model-implied covariances among the dependent (endogenous) variables in the model.

5

Generally, we do not recommend removing outliers when robust estimators that downweight outliers are available. To date, OpenMx and lavaan do not have an estimator that is robust to outliers.

Contributor Information

Davood Tofighi, Department of Psychology, University of New Mexico.

Ken Kelley, Mendoza College of Business, University of Notre Dame.

References

  1. Adamczyk K (2018). Direct and indirect effects of relationship status through unmet need to belong and fear of being single on young adults' romantic loneliness. Personality and Individual Differences, 124, 124–129. 10.1016/j.paid.2017.12.011
  2. Andreassen TW, Lorentzen BG, & Olsson UH (2006). The impact of non-normality and estimation methods in SEM on satisfaction research in marketing. Quality & Quantity, 40, 39–58. 10.1007/s11135-005-4510-y
  3. Astivia OLO, & Zumbo BD (2015). A cautionary note on the use of the Vale and Maurelli method to generate multivariate, nonnormal data for simulation purposes. Educational and Psychological Measurement, 75(4), 541–567. 10.1177/0013164414548894
  4. Ato García M, Vallejo Seco G, & Ato Lozano E (2014). Classical and causal inference approaches to statistical mediation analysis. Psicothema, 26(2), 252–259. 10.7334/psicothema2013.314
  5. Bernier A, McMahon CA, & Perrier R (2017). Maternal mind-mindedness and children's school readiness: A longitudinal study of developmental processes. Developmental Psychology, 53(2), 210–221. 10.1037/dev0000225
  6. Biesanz JC, Falk CF, & Savalei V (2010). Assessing mediational models: Testing and interval estimation for indirect effects. Multivariate Behavioral Research, 45, 661–701. 10.1080/00273171.2010.498292
  7. Bollen KA, & Stine R (1990). Direct and indirect effects: Classical and bootstrap estimates of variability. Sociological Methodology, 20, 115–140. 10.2307/271084
  8. Bollen KA, & Stine RA (1992). Bootstrapping goodness-of-fit measures in structural equation models. Sociological Methods & Research, 21, 205–229. 10.1177/0049124192021002004
  9. Bradley JV (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152. 10.1111/j.2044-8317.1978.tb00581.x
  10. Cain MK, Zhang Z, & Yuan K-H (2017). Univariate and multivariate skewness and kurtosis for measuring nonnormality: Prevalence, influence and estimation. Behavior Research Methods, 49, 1716–1735. 10.3758/s13428-016-0814-1
  11. Canty A, & Ripley BD (2017). boot: Bootstrap R (S-Plus) functions (Version 1.3–20).
  12. Chen J, Choi J, Weiss BA, & Stapleton L (2014). An empirical evaluation of mediation effect analysis with manifest and latent variables using Markov Chain Monte Carlo and alternative estimation methods. Structural Equation Modeling: A Multidisciplinary Journal, 21, 253–262. 10.1080/10705511.2014.882688
  13. Chernick MR, & LaBudde RA (2011). An introduction to bootstrap methods with applications to R. Hoboken, NJ: Wiley.
  14. Cheung MWL (2007). Comparison of approaches to constructing confidence intervals for mediating effects using structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 14, 227–246. 10.1080/10705510709336745
  15. Cohen J (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
  16. Cohen J, Cohen P, West SG, & Aiken LS (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
  17. Cox DR, & Hinkley DV (2000). Theoretical statistics. Boca Raton, FL: Chapman & Hall/CRC.
  18. Cox MG, Kisbu-Sakarya Y, Miočević M, & MacKinnon DP (2013). Sensitivity plots for confounder bias in the single mediator model. Evaluation Review, 37, 405–431. 10.1177/0193841X14524576
  19. Craig CC (1936). On the frequency function of xy. The Annals of Mathematical Statistics, 7, 1–15. 10.1214/aoms/1177732541
  20. Curran PJ, West SG, & Finch JF (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29. 10.1037/1082-989X.1.1.16
  21. Davison AC (2003). Statistical models. Cambridge, UK: Cambridge University Press.
  22. Davison AC, & Hinkley DV (1997). Bootstrap methods and their application. New York, NY: Cambridge University Press.
  23. Deković M, Asscher JJ, Manders WA, Prins PJM, & van der Laan P (2012). Within-intervention change: Mediators of intervention effects during multisystemic therapy. Journal of Consulting and Clinical Psychology, 80, 574–587. 10.1037/a0028482
  24. Duane S, Kennedy AD, Pendleton BJ, & Roweth D (1987). Hybrid Monte Carlo. Physics Letters B, 195(2), 216–222. 10.1016/0370-2693(87)91197-X
  25. Efron B (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82, 171–185. 10.2307/2289144
  26. Efron B, & Tibshirani RJ (1993). An introduction to the bootstrap. New York, NY: Chapman and Hall.
  27. Falk CF (2018). Are robust standard errors the best approach for interval estimation with nonnormal data in structural equation modeling? Structural Equation Modeling: A Multidisciplinary Journal, 25, 244–266. 10.1080/10705511.2017.1367254
  28. Falk CF, & Biesanz JC (2014). Inference and interval estimation methods for indirect effects with latent variable models. Structural Equation Modeling: A Multidisciplinary Journal, 1–15. 10.1080/10705511.2014.935266
  29. Finch JF, West SG, & MacKinnon DP (1997). Effects of sample size and nonnormality on the estimation of mediated effects in latent variable models. Structural Equation Modeling: A Multidisciplinary Journal, 4, 87–107. 10.1080/10705519709540063
  30. Fox J (2016). Applied regression analysis and generalized linear models (3rd ed.). Los Angeles, CA: SAGE.
  31. Freedman DA (2006). On the so-called "Huber Sandwich Estimator" and "Robust Standard Errors." The American Statistician, 60(4), 299–302. 10.1198/000313006X152207
  32. Fritz MS, Taylor AB, & MacKinnon DP (2012). Explanation of two anomalous results in statistical mediation analysis. Multivariate Behavioral Research, 47, 61–87. 10.1080/00273171.2012.640596
  33. Gabry J, & Goodrich B (2018, April 13). Prior distributions for rstanarm models. Retrieved September 26, 2018, from http://mc-stan.org/rstanarm/articles/priors.html#how-to-specify-flat-priors-and-why-you-typically-shouldnt
  34. Geman S, & Geman D (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 721–741. 10.1109/TPAMI.1984.4767596
  35. Gilks WR, Richardson S, & Spiegelhalter DJ (Eds.). (1998). Markov chain Monte Carlo in practice. Boca Raton, FL: Chapman & Hall.
  36. Graham JD, Martin Ginis KA, & Bray SR (2017). Exertion of self-control increases fatigue, reduces task self-efficacy, and impairs performance of resistance exercise. Sport, Exercise, and Performance Psychology, 6, 70–88. 10.1037/spy0000074
  37. Hastings WK (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.
  38. Hayes AF (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. New York, NY: The Guilford Press.
  39. Hayes AF, & Scharkow M (2013). The relative trustworthiness of inferential tests of the indirect effect in statistical mediation analysis: Does method really matter? Psychological Science, 24, 1918–1927. 10.1177/0956797613480187
  40. Henze N, & Zirkler B (1990). A class of invariant consistent tests for multivariate normality. Communications in Statistics - Theory and Methods, 19(10), 3595–3617. https://doi.org/10/bwng9v
  41. Holland PW (1988). Causal inference, path analysis, and recursive structural equations models. Sociological Methodology, 18, 449–484. 10.2307/271055
  42. Huber PJ (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, I, 221–233. Berkeley, CA: University of California Press.
  43. Huertas-Valdivia I, Llorens-Montes FJ, & Ruiz-Moreno A (2017). Achieving engagement among hospitality employees: A serial mediation model. International Journal of Contemporary Hospitality Management, 30, 217–241. 10.1108/IJCHM-09-2016-0538
  44. Imai K, Keele L, & Tingley D (2010). A general approach to causal mediation analysis. Psychological Methods, 15, 309–334. 10.1037/a0020761
  45. Jaccard J, & Wan CK (1995). Measurement error in the analysis of interaction effects between continuous predictors using multiple regression: Multiple indicator and structural equation approaches. Psychological Bulletin, 117, 348–357. 10.1037/0033-2909.117.2.348
  46. Judd CM, & Kenny DA (1981). Process analysis. Evaluation Review, 5, 602–619. 10.1177/0193841X8100500502
  47. Kline RB (2016). Principles and practice of structural equation modeling (4th ed.). New York, NY: Guilford.
  48. Koning IM, Maric M, MacKinnon D, & Vollebergh WAM (2015). Effects of a combined parent–student alcohol prevention program on intermediate factors and adolescents' drinking behavior: A sequential mediation model. Journal of Consulting and Clinical Psychology, 83(4), 719–727. 10.1037/a0039197
  49. Lehmann EL, & Romano JP (2005). Testing statistical hypotheses (3rd ed.). New York, NY: Springer.
  50. MacCallum RC, & Austin JT (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201–226. 10.1146/annurev.psych.51.1.201
  51. MacKinnon DP (2008). Introduction to statistical mediation analysis. New York, NY: Erlbaum.
  52. MacKinnon DP, Lockwood CM, Hoffman JM, West SG, & Sheets V (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7, 83–104. 10.1037//1082-989X.7.1.83
  53. MacKinnon DP, Lockwood CM, & Williams J (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39, 99–128. 10.1207/s15327906mbr3901_4
  54. MacKinnon DP, & Pirlott AG (2015). Statistical approaches for enhancing causal interpretation of the M to Y relation in mediation analysis. Personality and Social Psychology Review, 19, 30–43. 10.1177/1088868314542878
  55. Mardia KV (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530. 10.1093/biomet/57.3.519
  56. Maxwell SE, & Cole DA (2007). Bias in cross-sectional analyses of longitudinal mediation. Psychological Methods, 12, 23–44. 10.1037/1082-989X.12.1.23
  57. McConahay JB (1986). Modern racism, ambivalence, and the Modern Racism Scale. In Dovidio JF & Gaertner SL (Eds.), Prejudice, discrimination, and racism (pp. 91–126). Orlando, FL: Academic Press.
  58. McElreath R (2016). Statistical rethinking: A Bayesian course with examples in R and Stan. Boca Raton, FL: CRC Press.
  59. Mecklin CJ, & Mundfrom DJ (2005). A Monte Carlo comparison of the Type I and Type II error rates of tests of multivariate normality. Journal of Statistical Computation and Simulation, 75(2), 93–107. https://doi.org/10/c4x9gm
  60. Meeker WQ, & Escobar LA (1995). Teaching about approximate confidence regions based on maximum likelihood estimation. The American Statistician, 49, 48. 10.2307/2684811
  61. Metropolis N, & Ulam S (1949). The Monte Carlo method. Journal of the American Statistical Association, 44(247), 335–341.
  62. Micceri T (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166. 10.1037/0033-2909.105.1.156
  63. Muth C, Oravecz Z, & Gabry J (2018). User-friendly Bayesian regression modeling: A tutorial with rstanarm and shinystan. The Quantitative Methods for Psychology, 14(2), 99–119. 10.20982/tqmp.14.2.p099
  64. Muthén BO, & Asparouhov T (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17, 313–335. 10.1037/a0026802
  65. Neal RM (2011). MCMC using Hamiltonian dynamics. In Brooks S, Gelman A, Jones GL, & Meng X-L (Eds.), Handbook of Markov Chain Monte Carlo (pp. 113–162). Boca Raton, FL: Chapman & Hall/CRC.
  66. Neale MC, Hunter MD, Pritikin JN, Zahery M, Brick TR, Kirkpatrick RM, … Boker SM (2016). OpenMx 2.0: Extended structural equation and statistical modeling. Psychometrika, 81, 535–549. 10.1007/s11336-014-9435-8
  67. Neale MC, & Miller MB (1997). The use of likelihood-based confidence intervals in genetic models. Behavior Genetics, 27, 113–120.
  68. Nevitt J, & Hancock G (2001). Performance of bootstrapping approaches to model test statistics and parameter standard error estimation in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 8, 353–377. 10.1207/S15328007SEM0803_2
  69. Olsson UH, Foss T, Troye SV, & Howell RD (2000). The performance of ML, GLS, and WLS estimation in structural equation modeling under conditions of misspecification and nonnormality. Structural Equation Modeling: A Multidisciplinary Journal, 7, 557–595. 10.1207/S15328007SEM0704_3
  70. Pawitan Y (2001). In all likelihood: Statistical modelling and inference using likelihood. New York, NY: Oxford University Press.
  71. Pearl J (2014). Interpretation and identification of causal mediation. Psychological Methods, 19, 459–481. 10.1037/a0036434
  72. Pek J, & Wu H (2015). Profile likelihood-based confidence intervals and regions for structural equation models. Psychometrika, 80, 1123–1145. 10.1007/s11336-015-9461-1
  73. Pratto F, Sidanius J, Stallworth LM, & Malle BF (1994). Social dominance orientation: A personality variable predicting social and political attitudes. Journal of Personality and Social Psychology, 67(4), 741–763. https://doi.org/10/fq8zch
  74. Preacher KJ, & Hayes AF (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40, 879–891. 10.3758/BRM.40.3.879
  75. Preacher KJ, & Selig JP (2012). Advantages of Monte Carlo confidence intervals for indirect effects. Communication Methods and Measures, 6(2), 77–98. 10.1080/19312458.2012.679848
  76. Reh S, Tröster C, & Van Quaquebeke N (2018). Keeping (future) rivals down: Temporal social comparison predicts coworker social undermining via future status threat and envy. Journal of Applied Psychology, 103, 399–415. 10.1037/apl0000281
  77. Rosseel Y (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1–36.
  78. Sanchez DT, Chaney KE, Manuel SK, Wilton LS, & Remedios JD (2017). Stigma by prejudice transfer: Racism threatens White women and sexism threatens men of color. Psychological Science, 28, 445–461. https://doi.org/10/f942p7
  79. Satorra A, & Bentler PM (2010). Ensuring positiveness of the scaled difference chi-square test statistic. Psychometrika, 75, 243–248. 10.1007/s11336-009-9135-y
  80. Savalei V (2014). Understanding robust corrections in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 21, 149–160. 10.1080/10705511.2013.824793
  81. Shrout PE, & Bolger N (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7, 422–445.
  82. Springer MD, & Thompson WE (1966). The distribution of products of independent random variables. SIAM Journal on Applied Mathematics, 14, 511–526.
  83. Stan Development Team. (2018). rstanarm: Bayesian applied regression modeling via Stan (Version 2.17.4). Retrieved from http://mc-stan.org/
  84. Taylor AB, MacKinnon DP, & Tein J-Y (2008). Tests of the three-path mediated effect. Organizational Research Methods, 11, 241–269. 10.1177/1094428107300344
  85. Thoemmes F, MacKinnon DP, & Reiser MR (2010). Power analysis for complex mediational designs using Monte Carlo methods. Structural Equation Modeling: A Multidisciplinary Journal, 17, 510–534. 10.1080/10705511.2010.489379
  86. Tofighi D, Hsiao Y-Y, Kruger ES, MacKinnon DP, Van Horn ML, & Witkiewitz K (2019). Sensitivity analysis of the no-omitted confounder assumption in latent growth curve mediation models. Structural Equation Modeling: A Multidisciplinary Journal, 26, 94–109. 10.1080/10705511.2018.1506925
  87. Tofighi D, & Kelley K (2016). Assessing omitted confounder bias in multilevel mediation models. Multivariate Behavioral Research, 51, 86–105. 10.1080/00273171.2015.1105736
  88. Tofighi D, & MacKinnon DP (2016). Monte Carlo confidence intervals for complex functions of indirect effects. Structural Equation Modeling: A Multidisciplinary Journal, 23(2), 194–205. 10.1080/10705511.2015.1057284
  89. Vale CD, & Maurelli VA (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465–471. 10.1007/BF02293687
  90. VanderWeele TJ (2015). Explanation in causal inference: Methods for mediation and interaction. New York, NY: Oxford University Press.
  91. West SG (2011). Editorial: Introduction to the special section on causal inference in cross sectional and longitudinal mediational models. Multivariate Behavioral Research, 46, 812–815. 10.1080/00273171.2011.606710
  92. West SG, Finch JF, & Curran PJ (1995). Structural equation models with nonnormal variables: Problems and remedies. In Hoyle RH (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56–75). Thousand Oaks, CA: SAGE.
  93. White H (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817–838. 10.2307/1912934
  94. Williams J, & MacKinnon DP (2008). Resampling and distribution of the product methods for testing indirect effects in complex models. Structural Equation Modeling: A Multidisciplinary Journal, 15, 23–51. 10.1080/10705510701758166
  95. Yuan K-H, Hayashi K, & Yanagihara H (2007). A class of population covariance matrices in the bootstrap approach to covariance structure analysis. Multivariate Behavioral Research, 42, 261–281. 10.1080/00273170701360662
  96. Yuan Y, & MacKinnon DP (2009). Bayesian mediation analysis. Psychological Methods, 14, 301–322. 10.1037/a0016972
  97. Zhang X, & Savalei V (2016). Bootstrapping confidence intervals for fit indexes in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 23, 392–408. 10.1080/10705511.2015.1118692
  98. Zhang Z (2016). Modeling error distributions of growth curve models through Bayesian methods. Behavior Research Methods, 48, 427–444. https://doi.org/10/f8vzvp
