Abstract
Analytical solutions for point and variance estimators of the mediated effect, the ratio of the mediated to the direct effect, and the proportion of the total effect that is mediated were studied with statistical simulations. We compared several approximate solutions based on the multivariate delta method and second order Taylor series expansions to the empirical standard deviation of each estimator and to the theoretical standard error when available. The simulations consisted of 500 replications of three normally distributed variables for eight sample sizes (N = 10, 25, 50, 100, 200, 500, 1000, and 5000) and 64 parameter value combinations. The different solutions for the standard error of the indirect effect were very similar for sample sizes of at least 50, except when the independent variable was dichotomized. A sample size of at least 500 was needed for accurate point and variance estimates of the proportion mediated. The point and variance estimates of the ratio of the mediated to nonmediated effect did not stabilize until the sample size was 2,000 when all variables were continuous. Implications for the estimation of mediated effects in experimental and nonexperimental studies are discussed.
A mediator is a variable that accounts for all or part of the relation between a predictor and an outcome (Baron & Kenny, 1986). More formally, mediation occurs when an independent variable causes a mediator, which in turn causes a dependent variable (see Sobel, 1990). Hypotheses regarding mediated or indirect effects have a long and important history in social science research (Alwin & Hauser, 1975; West & Wicklund, 1980). A well-known example of mediation in psychology is the extent to which intentions mediate the effects of attitudes on behavior (Ajzen & Fishbein, 1980). The effect of father’s socioeconomic status on son’s socioeconomic status that is mediated by son’s educational achievement has received considerable attention in sociology (Duncan, Featherman, & Duncan, 1972). Similar mediational hypotheses are present in other social science and psychological research (Baron & Kenny, 1986; James & Brett, 1984).
Mediational analyses are especially useful in studies where several mediators are targeted by an experimental manipulation, for example, mediators of the effect of threat on protective behavior (Breznitz, 1984) and mediators of the effects of different types of influence on persuasion (Cialdini, 1984). If the mediator is the only construct targeted by the manipulation, then the manipulation potentially allows for examination of the causal relationship between the mediator and the outcome. Often the manipulation changes other mediators in addition to the targeted mediator, and in many studies the experimental manipulation is designed to change many mediators rather than one of them. More of these types of studies are appearing in the research literature, and statistical tests of the mediated effect are conducted in some of them (Bierman, 1986; Guerra & Slaby, 1990; Hansen & Graham, 1991; MacKinnon et al., 1991).
One of the best examples of a manipulation targeting several mediators is the estimation of mediated effects in the experimental evaluation of health related prevention programs. Prevention and intervention programs are designed to change critical mediating constructs thought to be causally related to health outcomes (MacKinnon & Dwyer, 1993). For example, the Multiple Risk Factor Intervention Trial (MRFIT) was designed to reduce smoking, lower cholesterol, and lower blood pressure to prevent heart disease (Multiple Risk Factor Intervention Trial Research Group, 1990). Drug prevention programs seek to reduce drug use by increasing skills to resist drug offers and engendering norms less tolerant of drug use (Flay, 1985). AIDS and sexually transmitted disease prevention programs are designed to increase knowledge about early detection of disease, reduce barriers to screening, and change norms regarding screening (Murray et al., 1986; Shapiro, 1976). In the prevention case, mediation analysis assesses the extent to which the program changed the mediator which in turn changed the outcome variable. Mediational analysis in prevention studies is important because the processes that lead to behavior change are studied (MacKinnon, 1994). Estimation of mediated effects in these experimental prevention studies may differ from correlational studies because the independent variable is typically binary (0 = control group and 1 = treated group) rather than continuous.
The three variable mediation model, shown in Figure 1, is studied in this report. This one mediator model was studied because it facilitates the analytical and simulation tasks necessary for this research. The path from the program to the mediator to the outcome is the process of mediation. The indirect or mediated effect is equal to αβ. Other effects in the model include the direct effect, τ′, and the total effect τ = τ′ + αβ. These mediation effect measures can be supplemented with two measures of relative effect when the direct effect is nonzero. The proportion of the total effect that is mediated (αβ/(τ′ + αβ)) and the ratio of the mediated effect to the nonmediated effect (αβ/τ′) provide important information on the relative magnitude of the mediated effect. With these measures, for example, a researcher could state that half of an effect was mediated or that 30% of the total effect was explained by a particular mediator.
Until Sobel (1982, 1986) and Folmer (1981) derived the standard error of the mediated effect using the multivariate delta method, researchers had used a series of hypothesis tests to provide evidence for mediation (Baron & Kenny, 1986; Judd & Kenny, 1981) or had calculated mediated effects without a confidence interval for the effect. Stone and Sobel (1990) conducted a simulation study of the variance of the indirect effect assuming multivariate normal continuous measures and found that a sample size of at least 200 was required for adequate mediated effect variance estimation in the recursive model studied. Stone and Sobel did not study the case of mediation in experimental studies with categorical independent variables defining treatment conditions. MacKinnon and Dwyer (1993) conducted a simulation study for both binary and continuous independent variables, but for one set of parameter values only, and obtained results similar to those of Stone and Sobel. The point and variance estimators of the ratio of the mediated to the nonmediated effect and of the proportion mediated have not been studied, although Sobel (1982) provides the first order multivariate delta solution for their variance. There has, however, been prior work on the densities of functions of random variables such as products and ratios (Lomnicki, 1967; Springer, 1979).
The purpose of this article is to describe the results of a simulation study investigating two major aspects of estimating mediated effects. First, the performance of point and variance estimators of mediated effect measures when the independent variable is binary (0 for the control group or 1 for the prevention program group) was investigated, as this is the typical case in the analysis of mediation in experimental studies. Second, simulation studies were conducted to compare point and variance estimators of mediated effect measures to be described in the Method section. For the mediated effect, these estimators were: (a) the multivariate delta method approximation to the standard error of the mediated effect (Sobel 1982, 1986), (b) second order or exact variance of a product, (c) unbiased estimator of the exact variance of a product (Goodman, 1960), and (d) a method based on an alternative point estimator of mediation (McGuigan & Langholtz, 1988). First and second order Taylor series solutions for the variance estimators of the proportion mediated and the ratio of the indirect to the direct effect were studied. The behavior of all estimators was examined in a wide range of sample sizes and combinations of parameter values.
Method
Point Estimation
The mediated effect can be calculated in two ways. In the first method, the following two regression equations are estimated.
$$Y_O = \tau X_P + \varepsilon_1 \qquad \text{(Model 1)}$$
$$Y_O = \tau' X_P + \beta X_M + \varepsilon_2 \qquad \text{(Model 2)}$$
$Y_O$ is the outcome variable, $X_P$ is the program or independent variable, $X_M$ is the mediator, τ codes the relationship of the program to the outcome in the first equation, τ′ is the coefficient relating the program to the outcome adjusted for the effects of the mediator, ε1 and ε2 code unexplained variability, and the intercept is assumed to be zero.
In the first regression, the outcome variable is regressed on the independent variable. In the second regression, the outcome is regressed on the independent variable and the mediator. The value of the mediated or indirect effect equals the difference in the program coefficients (τ − τ′) in the two regression models (Judd & Kenny, 1981). If the treatment coefficient (τ′) is zero when the mediator is included in the model, then the program effect is entirely mediated by the mediating variable.
A second method also involves estimation of two regression equations, and is illustrated in Figure 1. First, the coefficient in the model relating the mediator to the outcome is estimated (β) in Model 2 above. Second, the coefficient (α) relating the program to the mediating variable is estimated.
$$X_M = \alpha X_P + \varepsilon_3 \qquad \text{(Model 3)}$$
The product of these two parameters (αβ) is the mediated or indirect effect. The coefficient relating the treatment variable to the outcome adjusted for the mediator (τ′) is the nonmediated or direct effect. The rationale behind this method is that mediation depends on the extent to which the program changes the mediator (α) and the extent to which the mediator affects the outcome variable (β).
Equivalence of the αβ and τ − τ′ Measures
As described above, the τ − τ′ and αβ methods are alternative computational approaches to the estimation of the mediated effect. As shown in the derivation below, the two methods yield identical estimates of mediation when the dependent variable is continuous and ordinary regression is used to estimate Model 2 and Model 3 above (τ′ = direct effect; τ = total effect; τ − τ′ is the difference in the total and direct effects; αβ = indirect or mediated effect). Although the derivation is based on population values, in every sample we have examined, c − c′ = ab, where a, b, c and c′ are the estimators of α, β, τ and τ′ respectively.
(1) |
(2) |
(3) |
(4) |
(5) |
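The sample identity c − c′ = ab can be checked numerically. The following is a minimal sketch, assuming zero-intercept least squares for Models 1 to 3 as stated above; the function name `fit_mediation`, the seed, and the parameter values are illustrative choices, not part of the original study.

```python
import random

def fit_mediation(x, m, y):
    """Zero-intercept least squares for Models 1-3; returns (a, b, c, c_prime)."""
    sxx = sum(xi * xi for xi in x)
    smm = sum(mi * mi for mi in m)
    sxm = sum(xi * mi for xi, mi in zip(x, m))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    smy = sum(mi * yi for mi, yi in zip(m, y))
    c = sxy / sxx                      # Model 1: Y regressed on X
    a = sxm / sxx                      # Model 3: M regressed on X
    det = sxx * smm - sxm * sxm        # Model 2: solve the 2x2 normal equations
    c_prime = (sxy * smm - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return a, b, c, c_prime

# Hypothetical data: alpha = .3, beta = .7, tau' = .5, unit error variances.
rng = random.Random(0)
n = 200
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
m = [0.3 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [0.5 * xi + 0.7 * mi + rng.gauss(0.0, 1.0) for xi, mi in zip(x, m)]

a, b, c, c_prime = fit_mediation(x, m, y)
# The two computations of the mediated effect agree up to rounding error.
print(abs((c - c_prime) - a * b) < 1e-9)
```

The agreement follows from the first normal equation of Model 2: dividing it by the sum of squares of X gives c = c′ + ab in every sample, not only in the population.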
Interval Estimation
The variance of the product of two independent random variables such as estimators a and b of coefficients α and β is discussed in several mathematical statistics texts (Mood, Graybill, & Boes, 1974; Rice, 1988). The estimators a and b are independent as described in Sobel (1982), a result that can be derived from the asymptotic covariance matrix among parameter estimators described in Bollen (1989, pp. 107–109). The variance of the product of two independent random variables a and b is equal to $\sigma_{ab}^2 = \mu_a^2\sigma_b^2 + \mu_b^2\sigma_a^2 + \sigma_a^2\sigma_b^2$, where μ denotes the mean and σ denotes the standard deviation of the random variable. The sample variance estimator is $s_{ab}^2 = \bar{a}^2 s_b^2 + \bar{b}^2 s_a^2 + s_a^2 s_b^2$, where $s_a$ is the sample standard deviation of a and $\bar{a}$ is the sample mean of a. The asymptotic variance of the indirect or mediated effect was derived by Sobel (1982, 1986) using the multivariate delta method. The multivariate delta method is a general method to determine the variance of functions of random variables that follow a multivariate normal distribution (Bishop, Fienberg, & Holland, 1975). The method consists of pre- and post-multiplying the covariance matrix of the relevant random variables by the partial derivatives of the new functions of the random variables. Sobel’s variance estimator based on the multivariate delta method does not include the term $s_a^2 s_b^2$ because it is claimed to be small compared to the other two terms and the analytic solution is based on first order derivatives. Goodman (1960) has shown that the unbiased estimator of the variance of the product of two random variables subtracts this value rather than adding it.
McGuigan and Langholtz (1988) derived the variance of the indirect effect for the τ − τ′ method of determining mediation. The statistical test of mediation is given in the formula $t = (c - c')/\sqrt{s_c^2 + s_{c'}^2 - 2 r s_c s_{c'}}$, where c and c′ are the estimators of τ and τ′, s is the sample standard deviation, r is the correlation between c and c′, and the covariance between c and c′ ($r s_c s_{c'}$) is the Mean Square Error in Model 2 divided by the sum of squares of the independent variable.
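As a sketch of the pre- and post-multiplication step for the product ab, assume a and b are uncorrelated with sampling variances $\sigma_a^2$ and $\sigma_b^2$ (the matrix layout here is an illustrative choice of notation):

```latex
\operatorname{Var}(ab) \approx
\begin{pmatrix} \dfrac{\partial (ab)}{\partial a} & \dfrac{\partial (ab)}{\partial b} \end{pmatrix}
\begin{pmatrix} \sigma_a^2 & 0 \\ 0 & \sigma_b^2 \end{pmatrix}
\begin{pmatrix} \dfrac{\partial (ab)}{\partial a} \\[6pt] \dfrac{\partial (ab)}{\partial b} \end{pmatrix}
=
\begin{pmatrix} b & a \end{pmatrix}
\begin{pmatrix} \sigma_a^2 & 0 \\ 0 & \sigma_b^2 \end{pmatrix}
\begin{pmatrix} b \\ a \end{pmatrix}
= b^2 \sigma_a^2 + a^2 \sigma_b^2
```

Because only first derivatives enter, the product of the two variances never appears, which is why the first order solution differs from the exact variance under independence by that single term.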
Mediated Effect Estimators
Four analytical solutions for the variance of the mediated or indirect effect were studied:
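- First order Taylor series or multivariate delta method estimator (Sobel, 1982, 1986): $s_{ab} = \sqrt{a^2 s_b^2 + b^2 s_a^2}$.
- Second order Taylor series estimator, which is also the exact variance under independence (Goodman, 1960; Mood, Graybill, & Boes, 1974): $s_{ab} = \sqrt{a^2 s_b^2 + b^2 s_a^2 + s_a^2 s_b^2}$.
- Unbiased variance estimator (Goodman, 1960): $s_{ab} = \sqrt{a^2 s_b^2 + b^2 s_a^2 - s_a^2 s_b^2}$.
- Variance of the τ − τ′ method estimator (McGuigan & Langholtz, 1988): $s_{c-c'} = \sqrt{s_c^2 + s_{c'}^2 - 2 r s_c s_{c'}}$,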
where r is the correlation between c and c′, and the covariance between c and c′ ($r s_c s_{c'}$) is the Mean Square Error in Model 2 divided by the sum of squares of the independent variable. Each of the four estimators was compared to the analytical variance based on the true variances of a and b (Hanushek & Jackson, 1977).
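The four standard error estimators can be sketched in a few lines. This assumes the usual forms of the first order, second order, and unbiased product variances and of the variance of a difference; the function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math

def se_first_order(a, b, sa, sb):
    """First order (multivariate delta method) standard error of ab."""
    return math.sqrt(a**2 * sb**2 + b**2 * sa**2)

def se_second_order(a, b, sa, sb):
    """Second order standard error: adds the product of the two variances."""
    return math.sqrt(a**2 * sb**2 + b**2 * sa**2 + sa**2 * sb**2)

def se_unbiased(a, b, sa, sb):
    """Unbiased version: subtracts the product of the two variances.

    The radicand can be negative when a and b are both small relative to
    their standard errors, in which case math.sqrt raises ValueError.
    """
    return math.sqrt(a**2 * sb**2 + b**2 * sa**2 - sa**2 * sb**2)

def se_difference(sc, sc_prime, r):
    """Standard error of c - c' given the correlation r between c and c'."""
    return math.sqrt(sc**2 + sc_prime**2 - 2.0 * r * sc * sc_prime)

# Hypothetical estimates and standard errors.
a, sa, b, sb = 0.5, 0.1, 0.3, 0.1
first = se_first_order(a, b, sa, sb)
second = se_second_order(a, b, sa, sb)
unbiased = se_unbiased(a, b, sa, sb)
print(unbiased < first < second)
```

The three product-based estimators differ only in the sign and presence of the term $s_a^2 s_b^2$, so they converge as the standard errors shrink with increasing sample size.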
Ratio and Proportion Mediated Estimators
The ratio of indirect to the direct effect is defined as αβ/τ′ and the proportion mediated is αβ/(τ′ + αβ). The estimators of the ratio of indirect to the direct effect and the proportion mediated are ab/c′ and ab/(c′ + ab), respectively. Four analytical solutions for the variance of the estimators of the proportion mediated and the ratio of the indirect to the direct effect were compared. No exact solutions are available so we use approximations in MacKinnon and Warsi (1991).
- First order Taylor series expansion, b and c′ uncorrelated.
- First order Taylor series expansion, b and c′ correlated.
- Second order Taylor series expansion, b and c′ uncorrelated.
- Second order Taylor series expansion, b and c′ correlated.
The second order Taylor series expansions for the ratio and the proportion are available by writing to the first author.
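For orientation, a generic first order delta method sketch for the two relative measures is given here. It assumes a is independent of b and c′, writes $s_{bc'}$ for the covariance between b and c′, and is not necessarily identical to the expressions in MacKinnon and Warsi (1991); setting $s_{bc'} = 0$ gives the uncorrelated versions.

```latex
% Ratio of the mediated to the direct effect, g = ab / c'
\operatorname{Var}\!\left(\frac{ab}{c'}\right) \approx
  \frac{b^2}{(c')^2}\, s_a^2
  + \frac{a^2}{(c')^2}\, s_b^2
  + \frac{a^2 b^2}{(c')^4}\, s_{c'}^2
  - \frac{2 a^2 b}{(c')^3}\, s_{bc'}

% Proportion mediated, h = ab / D with D = c' + ab
\operatorname{Var}\!\left(\frac{ab}{D}\right) \approx
  \frac{b^2 (c')^2}{D^4}\, s_a^2
  + \frac{a^2 (c')^2}{D^4}\, s_b^2
  + \frac{a^2 b^2}{D^4}\, s_{c'}^2
  - \frac{2 a^2 b\, c'}{D^4}\, s_{bc'}
```

Both expressions have powers of c′ or of c′ + ab in the denominator, so estimates of c′ near zero inflate the approximated variance sharply.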
Simulation Description
The SAS© (Statistical Analysis System) programming language was used to conduct the statistical simulations. The data were generated from the normal distribution using the Box and Muller (1958) transformation in the RANNOR function. The current time was used as the seed for each simulation. Eight different sample sizes of 10, 25, 50, 100, 200, 500, 1000 and 5000 were chosen to reflect sample sizes in social science studies. For each of the parameters α, β and τ′, four different values, .1, .3, .5 and .7, were used for a total of 4³ = 64 different parameter combinations. The parameter values were chosen to reflect a variety of commonly observed relationships. Eight sample sizes for each of the 64 parameter combinations yields 512 simulations. The population value of the error variances was equal to 1. Each simulation was replicated 500 times. Simulations for the binary and continuous independent variable were identical except that the independent variable was dichotomized prior to the regression analysis in the binary independent variable case.
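The design can be sketched in Python rather than SAS. This is a minimal illustration under stated assumptions: the cut point for dichotomizing X (zero) and the choice to generate M and Y from the 0/1 variable are not given in the text, the seed is fixed here for reproducibility whereas the study seeded from the clock, and `simulate_sample` is a hypothetical helper name.

```python
import random
from itertools import product

SAMPLE_SIZES = [10, 25, 50, 100, 200, 500, 1000, 5000]
PARAM_VALUES = [0.1, 0.3, 0.5, 0.7]

def simulate_sample(n, alpha, beta, tau_prime, binary=False, rng=random):
    """Draw one sample of (X, M, Y) with zero intercepts and unit error variances."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if binary:
        # Assumption: cut X at 0 and generate M and Y from the 0/1 variable.
        x = [1.0 if xi > 0.0 else 0.0 for xi in x]
    m = [alpha * xi + rng.gauss(0.0, 1.0) for xi in x]
    y = [tau_prime * xi + beta * mi + rng.gauss(0.0, 1.0)
         for xi, mi in zip(x, m)]
    return x, m, y

# One simulation condition per (sample size, alpha, beta, tau') combination.
conditions = list(product(SAMPLE_SIZES, PARAM_VALUES, PARAM_VALUES, PARAM_VALUES))

rng = random.Random(1995)
x, m, y = simulate_sample(50, 0.3, 0.5, 0.1, binary=True, rng=rng)
print(len(conditions))  # 512
```

Each of the 512 conditions would then be replicated 500 times, giving 64 × 500 = 32,000 replications per sample size.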
Simulation Outcome Measures
The performance of each analytical solution was assessed with measures of bias and relative bias (Stone & Sobel, 1990), Mean Square Error, and empirical confidence limits. We use the same measures of approximate bias (βi) and relative bias (Rβi) used in Stone and Sobel to compare estimates of the mediated effect (ŵ) to true values (w), or to the approximate true value when the exact true value is not known,
where ŵi and wi are the estimate and approximate true value at each replication. The Mean Squared Error of the estimators was obtained by squaring the bias measure. The same measures were used to evaluate estimates of the proportion mediated and ratio of mediated to direct effect and their standard errors.
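These outcome measures can be sketched as follows, assuming the usual definitions: bias as the mean deviation from the true value, relative bias as the mean deviation divided by the true value, and Mean Squared Error as the mean squared deviation. The exact expressions in Stone and Sobel (1990) may differ in detail, and the estimates below are hypothetical.

```python
def bias(estimates, true_value):
    """Mean deviation of the estimates from the true value."""
    return sum(e - true_value for e in estimates) / len(estimates)

def relative_bias(estimates, true_value):
    """Mean deviation expressed as a fraction of the true value."""
    return sum((e - true_value) / true_value for e in estimates) / len(estimates)

def mean_squared_error(estimates, true_value):
    """Mean of the squared deviations from the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# Hypothetical replications of the mediated effect with true value .16.
estimates = [0.15, 0.17, 0.16, 0.18]
true_value = 0.16
print(bias(estimates, true_value))
print(relative_bias(estimates, true_value))
print(mean_squared_error(estimates, true_value))
```

For these four values the bias is about .005, the relative bias about .031, and the Mean Squared Error about .00015.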
Confidence intervals were examined by determining the proportion of times confidence intervals were to the left or right of the value of the mediated effect. The large sample 95% confidence limits were constructed using the mediated effect estimate plus and minus 1.96 times the estimate of the standard error of the mediated effect. It is expected that 2.5% of the confidence intervals will be to the left of the true value of the mediated effect and 2.5% will be to the right of the true value for a total of 5% of the confidence limits that will not include the true value.
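The tally of intervals falling on either side of the true value can be sketched as below. The function name `tally_misses` and the inputs are hypothetical; "left" here means the whole interval lies below the true value and "right" means it lies above.

```python
def tally_misses(estimates, std_errors, true_value, z=1.96):
    """Return the proportions of intervals entirely left and entirely right
    of the true value, using estimate +/- z * standard error."""
    left = right = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if upper < true_value:
            left += 1      # interval lies entirely below the true value
        elif lower > true_value:
            right += 1     # interval lies entirely above the true value
    n = len(estimates)
    return left / n, right / n

# Hypothetical replications: one interval misses low, one covers, one misses high.
left, right = tally_misses([0.10, 0.16, 0.30], [0.02, 0.02, 0.02], 0.16)
print(left, right)
```

With a well-calibrated standard error and a symmetric sampling distribution, both proportions should be close to .025 over many replications.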
The simulation outcome measures were computed for point estimators of the mediated effect (αβ), the ratio (αβ/τ′) of mediated to nonmediated effect and the proportion of mediated to the total effect (αβ/(τ′ + αβ)). These measures were also computed for four different estimators of the standard error of the mediated effect and two estimators for the proportion mediated and the ratio of the mediated to nonmediated effect.
Following recommendations that simulations be treated like any other experimental study (Hauck & Anderson, 1984), we conducted an analysis of variance with sample size, binary or continuous independent variable, parameter values, and interactions to model the variability in the simulation outcome measures. ANOVA effects on bias and relative bias measures were estimated. The difference between the number of confidence intervals to the left minus the number of confidence intervals to the right of the value of the effect was also determined.
Results
Study 1
We first conducted the entire simulation described above with 100 replications, and then increased the number to 500 to see whether the small number of replications had altered the results. There were no major differences between the results for 100 and 500 replications, so we present only the results for 500 replications.
Study 2
Tables 1–9 pool the results across the 64 parameter value combinations so the mean for αβ = .16, αβ/τ′ = .67, αβ/(τ′ + αβ) = .30. Differences for specific parameter values are discussed below when the complete simulation results are analyzed with ANOVA.
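The pooled true values can be reproduced by averaging each measure over the 64 combinations of α, β and τ′. This short check assumes all three parameters range over .1, .3, .5 and .7 with every combination weighted equally.

```python
from itertools import product

values = [0.1, 0.3, 0.5, 0.7]
combos = list(product(values, repeat=3))  # (alpha, beta, tau') triples

# Average each effect measure over all parameter combinations.
mean_ab = sum(a * b for a, b, t in combos) / len(combos)
mean_ratio = sum(a * b / t for a, b, t in combos) / len(combos)
mean_prop = sum(a * b / (t + a * b) for a, b, t in combos) / len(combos)

print(round(mean_ab, 2), round(mean_ratio, 2), round(mean_prop, 3))
# 0.16 0.67 0.297
```

The proportion averages to about .297, which is the value rounded to .30 in the text.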
Table 1.
Sample Size | 10 | 25 | 50 | 100 | 200 | 500 | 1000 | 5000 |
---|---|---|---|---|---|---|---|---|
Continuous Independent Variable | ||||||||
Mean | 0.15740 | 0.15936 | 0.15944 | 0.15983 | 0.16019 | 0.15997 | 0.15998 | 0.15999 |
Bias | −0.00259 | −0.00064 | −0.00056 | −0.00016 | 0.00019 | −0.00003 | −0.00002 | −0.00001 |
Relative bias | −0.04821 | −0.00635 | −0.00875 | −0.00330 | −0.00103 | 0.00039 | −0.00004 | −6.84E-6 |
MSE | 0.08782 | 0.02154 | 0.00942 | 0.00443 | 0.00213 | 0.00084 | 0.00041 | 0.00008 |
Binary Independent Variable | ||||||||
Mean | 0.16451 | 0.15998 | 0.15972 | 0.16011 | 0.16013 | 0.16013 | 0.15998 | 0.16016 |
Bias | 0.00451 | −0.00001 | −0.00027 | 0.00011 | 0.00013 | 0.00013 | −0.00002 | 0.00016 |
Relative bias | 0.08711 | −0.00511 | −0.01595 | 0.00410 | 0.00392 | −0.00211 | 0.00153 | 0.00152 |
MSE | 0.20641 | 0.05297 | 0.02323 | 0.01101 | 0.00536 | 0.00212 | 0.00105 | 0.00021 |
Note. MSE is the mean square error. The average value of the mediated effect = 0.16 across 64 parameter combinations.
Table 9.
Sample Size | 10 (L) | 10 (R) | 25 (L) | 25 (R) | 50 (L) | 50 (R) | 100 (L) | 100 (R) |
---|---|---|---|---|---|---|---|---|
Continuous Independent Variable | ||||||||
First order | .0054 | .0481 | .0056 | .0496 | .0073 | .0446 | .0098 | .0448 |
Sc−c′ | .0054 | .0839 | .0051 | .0627 | .0069 | .0476 | .0094 | .0458 |
Second order | .0048 | .0190 | .0051 | .0322 | .0067 | .0326 | .0095 | .0362 |

Sample Size | 200 (L) | 200 (R) | 500 (L) | 500 (R) | 1000 (L) | 1000 (R) | 5000 (L) | 5000 (R) |
---|---|---|---|---|---|---|---|---|
Continuous Independent Variable | ||||||||
First order | .0128 | .0401 | .0169 | .0361 | .0192 | .0334 | .0216 | .0278 |
Sc−c′ | .0126 | .0389 | .0169 | .0350 | .0189 | .0329 | .0216 | .0276 |
Second order | .0126 | .0335 | .0168 | .0342 | .0189 | .0326 | .0216 | .0277 |

Sample Size | 10 (L) | 10 (R) | 25 (L) | 25 (R) | 50 (L) | 50 (R) | 100 (L) | 100 (R) |
---|---|---|---|---|---|---|---|---|
Binary Independent Variable | ||||||||
First order | .0035 | .0291 | .0034 | .0415 | .0056 | .0433 | .0086 | .0440 |
Sc−c′ | .0017 | .0025 | .0004 | .0002 | .0003 | .0002 | .0000 | .0002 |
Second order | .0031 | .0067 | .0030 | .0191 | .0051 | .0273 | .0081 | .0320 |

Sample Size | 200 (L) | 200 (R) | 500 (L) | 500 (R) | 1000 (L) | 1000 (R) | 5000 (L) | 5000 (R) |
---|---|---|---|---|---|---|---|---|
Binary Independent Variable | ||||||||
First order | .0122 | .0421 | .0157 | .0374 | .0176 | .0347 | .0222 | .0266 |
Sc−c′ | .0001 | .0000 | .0002 | .0000 | .0001 | .0000 | .0002 | .0001 |
Second order | .0116 | .0346 | .0155 | .0329 | .0174 | .0333 | .0222 | .0265 |
Note. For a given type and sample size, L and R indicate the proportion of C.I.s that were to the right and to the left of the true value of the mediated effect, respectively. For example, for sample size 10 and C.I.s based on the first order approximation of the standard error, 175/32000 = .0054 of the C.I.s were to the right and 1540/32000 = .0481 were to the left of the true value, for a total of .0054 + .0481 = .0535 (1715 out of 32000) of the C.I.s that did not contain the true value.
Point Estimates of Effects and Their Standard Error
As shown in Table 1, the point estimates of αβ had very little bias for any sample size. In contrast, as shown in Tables 2 and 3, the estimates of the ratio and the proportion were quite different from the true values, and often did not even have the correct sign for small sample sizes (10, 25, 50 and 100). These sample sizes are deleted from Tables 2 and 3. The bias of these estimates is slightly higher when the independent variable is dichotomized except for sample size 200 for the ratio case. The proportion point estimator ab/(c′+ ab) appears to stabilize at a sample size of 500, but is less stable for the case of a dichotomous independent variable. The point estimator of the ratio, ab/c′, does not appear to stabilize until the sample size is 5000. The stability of the estimator ab/(c′ + ab) does depend on values of α and β.
Table 2.
Sample Size | 10 | 25 | 50 | 100 | 200 | 500 | 1000 | 5000 |
---|---|---|---|---|---|---|---|---|
Continuous Independent Variable | ||||||||
Mean | −1.84786 | −47.26400 | 0.47744 | 0.58065 | 2.32224 | 0.65510 | 0.76244 | 0.68457 |
Bias | −2.51834 | −47.93400 | −0.19304 | −0.09893 | 1.65176 | −0.01537 | 0.09196 | 0.01409 |
Relative bias | −8.95143 | −17.78700 | 0.00219 | −2.78400 | 1.32980 | 0.11650 | 0.06294 | 0.00945 |
MSE | 68786.0 | 71569579 | 757.950 | 232392.0 | 93982.0 | 2840.100 | 23.9970 | 0.05135 |
Binary Independent Variable | ||||||||
Mean | 2.95030 | 0.98031 | 0.68516 | 0.58759 | 0.83640 | 0.12761 | 1.00776 | 0.71409 |
Bias | 2.27983 | 0.30983 | 0.01468 | −0.08289 | 0.16592 | −0.54287 | 0.33782 | 0.04362 |
Relative bias | 7.80617 | 0.28871 | 2.20663 | −0.17272 | 0.28804 | −0.17998 | 0.26837 | 0.03570 |
MSE | 252828 | 12256.00 | 6432.200 | 2185.700 | 16245.00 | 2796.60 | 2669.80 | 8.84381 |
Note. MSE is mean squared error. The average true value of the ratio = 0.67 across 64 parameter combinations.
Table 3.
Sample Size | 10 | 25 | 50 | 100 | 200 | 500 | 1000 | 5000 |
---|---|---|---|---|---|---|---|---|
Continuous Independent Variable | ||||||||
Mean | −0.46703 | 0.17914 | 0.26629 | 0.31823 | 0.31024 | 0.30010 | 0.29983 | 0.29767 |
Bias | −0.76424 | −0.11807 | −0.03092 | 0.02102 | 0.01303 | 0.00289 | 0.00262 | 0.00046 |
Relative bias | −7.21160 | −0.31934 | 0.00082 | 0.00300 | 0.02329 | 0.00165 | 0.00745 | 0.00145 |
MSE | 14097 | 245.1400 | 199.170 | 9.61714 | 1.51032 | 0.26244 | 0.00372 | 0.00063 |
Binary Independent Variable | ||||||||
Mean | 0.06099 | 0.21904 | 0.37595 | 0.25459 | 0.40435 | 0.32461 | 0.34249 | 0.29926 |
Bias | −0.23621 | −0.07817 | 0.07874 | −0.04261 | 0.10714 | 0.02740 | 0.04528 | 0.00206 |
Relative bias | 4.73063 | −0.28335 | 0.55254 | 0.29025 | 0.59644 | 0.05205 | 0.22638 | 0.00797 |
MSE | 2952.500 | 114.840 | 521.4900 | 261.250 | 186.430 | 8.17000 | 210.280 | 0.00190 |
Note. MSE is mean squared error. The average true value of the proportion = 0.297 across 64 parameter combinations.
Although the standard error estimators of the mediated effect were quite similar for all sample sizes (see Table 4), there was one major difference. For the case of the binary independent variable, the McGuigan and Langholtz (1988) estimator was approximately two to three times larger than the true standard error. There were several small differences among the estimators, as shown in Tables 4 and 5. Generally, the first order Taylor series estimator in Sobel (1982) performs the best. As in Stone and Sobel (1990), the standard error of the mediated effect is overestimated for smaller sample sizes.
Table 4.
Sample Size | 10 | 25 | 50 | 100 | 200 | 500 | 1000 | 5000 |
---|---|---|---|---|---|---|---|---|
Continuous Independent Variable | ||||||||
Analytical | 0.26411 | 0.13718 | 0.09132 | 0.06269 | 0.04368 | 0.02738 | 0.01923 | 0.00861 |
Sc−c′ | 0.30997 | 0.14722 | 0.09497 | 0.06385 | 0.04410 | 0.02748 | 0.01934 | 0.00862 |
First Order | 0.27926 | 0.13958 | 0.09222 | 0.06287 | 0.04376 | 0.02738 | 0.01930 | 0.00862 |
Second Order | 0.31299 | 0.14789 | 0.09516 | 0.06392 | 0.04413 | 0.02748 | 0.01933 | 0.00862 |
Unbiased | 0.26952 | 0.13441 | 0.08991 | 0.06198 | 0.04341 | 0.02729 | 0.01927 | 0.00861 |
Sampling | 0.28701 | 0.14026 | 0.09187 | 0.06275 | 0.04352 | 0.02726 | 0.01913 | 0.00859 |
Binary Independent Variable | ||||||||
Analytical | 0.44206 | 0.22057 | 0.14447 | 0.09826 | 0.06812 | 0.04257 | 0.02998 | 0.01336 |
Sc−c′ | 0.77373 | 0.46102 | 0.32112 | 0.22558 | 0.15900 | 0.10033 | 0.07093 | 0.03169 |
First Order | 0.44844 | 0.22141 | 0.14427 | 0.09800 | 0.06811 | 0.04254 | 0.02998 | 0.01336 |
Second Order | 0.51772 | 0.24097 | 0.15164 | 0.10073 | 0.06909 | 0.04279 | 0.03007 | 0.01337 |
Unbiased | 0.43697 | 0.21256 | 0.13995 | 0.09597 | 0.06727 | 0.04229 | 0.02989 | 0.01335 |
Sampling | 0.44309 | 0.21867 | 0.14282 | 0.09768 | 0.06786 | 0.04245 | 0.02996 | 0.01334 |
Table 5.
Sample Size | 10 | 25 | 50 | 100 | 200 | 500 | 1000 | 5000 |
---|---|---|---|---|---|---|---|---|
Continuous Independent Variable | ||||||||
Mean Square | ||||||||
sc−c′ | .03123 | .00280 | .00059 | .00013 | .000032 | 4.94E-6 | 1.22E-6 | 4.86E-8 |
First Order | .02323 | .00236 | .00053 | .00013 | .000031 | 4.87E-6 | 1.21E-6 | 4.84E-8 |
Second Order | .02418 | .00231 | .00052 | .00012 | .000031 | 4.84E-6 | 1.20E-6 | 4.84E-8 |
Unbiased | .02304 | .00244 | .00055 | .00013 | .000031 | 4.92E-6 | 1.22E-6 | 4.85E-8 |
Bias | ||||||||
sc−c′ | .04566 | .01004 | .00364 | .00116 | .000425 | 0.000095 | 0.000032 | 4.98E-6 |
First Order | .01514 | .00239 | .00089 | .00018 | .000079 | 1.94E-6 | −4.36E-6 | 2.23E-6 |
Second Order | .04887 | .01072 | .00384 | .00123 | .000453 | 0.000096 | 0.000033 | 5.18E-6 |
Unbiased | .00540 | −.00277 | .00142 | −.00071 | −.000263 | −0.000091 | −0.000034 | −7.22E-6 |
Relative Bias | ||||||||
sc−c′ | .20113 | .10244 | .06029 | .03085 | .015479 | 0.006722 | 0.002565 | 0.000866 |
First Order | .06685 | .02611 | .01417 | .00495 | .001993 | 0.000606 | −0.000467 | 0.000269 |
Second Order | .21218 | .10601 | .06193 | .03245 | .017021 | 0.006899 | 0.002738 | 0.000908 |
Unbiased | .03435 | −.01189 | −.01472 | −.01333 | −.010098 | −0.005480 | −0.003733 | −0.000372 |
Binary Independent Variable | ||||||||
Mean Square | ||||||||
sc−c′ | .17679 | .06614 | .03360 | .01707 | .00861 | .003459 | 0.00173 | 0.00035 |
First Order | .05571 | .00687 | .00165 | .00043 | .00011 | .000017 | 4.31E-6 | 1.74E-6 |
Second Order | .05404 | .00641 | .00155 | .00041 | .00010 | .000017 | 4.28E-6 | 1.74E-9 |
Unbiased | .05690 | .00724 | .00176 | .00045 | .00011 | .000018 | 4.36E-6 | 1.74E-6 |
Bias | ||||||||
sc−c′ | .33167 | .24044 | .17666 | .12732 | .09087 | .05776 | .04095 | .01833 |
First Order | .00637 | .00083 | −.00019 | −.00026 | −.00001 | −.00003 | 8.35E-6 | 1.08E-6 |
Second Order | .07565 | .02039 | .00717 | .00247 | .00097 | .00022 | .00009 | 8.99E-6 |
Unbiased | −.00509 | −.00801 | −.00491 | −.00229 | −.00085 | −.00027 | −.00008 | −7.84E-7 |
Relative Bias | ||||||||
sc−c′ | .82268 | 1.31105 | 1.56660 | 1.74664 | 1.86352 | 1.94681 | 1.98129 | 2.00740 |
First Order | .02418 | .01615 | .00577 | .00112 | .00233 | −.00111 | .00087 | .00038 |
Second Order | .19698 | .12882 | .07922 | .04592 | .02777 | .01011 | .00656 | .00153 |
Unbiased | .00379 | −.02122 | −.02881 | −.02375 | −.01557 | −.011147 | −.00484 | −.00077 |
As shown in Table 6, for both ab/c′ and ab/(c′ + ab), variance estimators obtained from first order Taylor series approximations were quite large, especially for small sample sizes. Reasonably accurate estimates were obtained at sample sizes of 500 or higher for the proportion mediated, and 5000 for the ratio. The second order Taylor series estimator was superior to the first order solution and estimators incorporating a possible correlation between the b and c′ estimates were better than estimators that assumed no correlation between b and c′.
Table 6.
Sample Size | 200 | 500 | 1000 | 5000 |
---|---|---|---|---|
Continuous Independent Variable | ||||
Ratio | ||||
First Ind. | 47700 | 716.04 | 3.16199 | .09543 |
First Cor. | 47700 | 716.06 | 3.18207 | .10344 |
Second Ind. | .26134 | .24360 | .21682 | .09423 |
Second Cor. | .28251 | .26272 | .23320 | .10212 |
Sampling Variability | 50.110 | 15.272 | 1.47196 | .10634 |
Proportion | ||||
First Ind. | 5.19027 | .43727 | .04345 | .01857 |
First Cor. | 4.68443 | .43747 | .04722 | .02031 |
Second Ind. | .09855 | .06234 | .04294 | .01856 |
Second Cor. | .10791 | .06787 | .04675 | .02029 |
Sampling Variability | .41096 | .15952 | .04843 | .02047 |
Binary Independent Variable | ||||
Ratio | ||||
First Ind. | 5851.8 | 747.16 | 1291.1 | .75952 |
First Cor. | 5851.0 | 747.16 | 1291.1 | .76480 |
Second Ind. | .28013 | .25812 | .23750 | .17819 |
Second Cor. | .28916 | .26049 | .24520 | .18307 |
Sampling Variability | 34.4960 | 14.2519 | 13.8050 | .67327 |
Proportion | ||||
First Ind. | 205.41 | 18.9648 | 375.57 | .03284 |
First Cor. | 199.03 | 18.6359 | 373.96 | .03400 |
Second Ind. | .14747 | .10198 | .07437 | .03250 |
Second Cor. | .15438 | .10596 | .07713 | .03367 |
Sampling Variability | 3.6507 | .77386 | 2.5521 | .03442 |
To identify more clearly when the point and standard error estimators of the ratio and proportion stabilized, we conducted the same simulations with 100 replications and sample sizes of 2000, 3000, and 4000, as shown in Tables 7 and 8. The point and variance estimators of the ratio appear to stabilize around a sample size of 2,000 for the case of continuous measures. For the binary independent variable case, the ratio stabilizes at 4,000. As described earlier, the proportion measure stabilized at a sample size of 500 and becomes more accurate at larger sample sizes.
Table 7.
Sample Size | 2000 | 3000 | 4000 |
---|---|---|---|
Continuous Independent Variable | |||
Ratio | .71403 | .69417 | .69201 |
Bias | .04355 | .02369 | .02154 |
Relative Bias | .02928 | .01634 | .01354 |
Proportion | .29879 | .29754 | .29816 |
Bias | .00157 | .00033 | .00095 |
Relative Bias | .00551 | .00289 | .00192 |
Binary Independent Variable | |||
Ratio | .77014 | .86134 | .73139 |
Bias | .09967 | .19087 | .06091 |
Relative Bias | .09492 | .07656 | .04369 |
Proportion | .30209 | .30084 | .29998 |
Bias | .00487 | .00362 | .00277 |
Relative Bias | .01311 | .01281 | .00623 |
Table 8.
Sample Size | 2000 | 3000 | 4000 |
---|---|---|---|
Continuous Independent Variable | |||
Ratio | |||
First Ind. | .18759 | .13189 | .11059 |
First Cor. | .20098 | .14243 | .11969 |
Second Ind. | .16235 | .12575 | .10864 |
Second Cor. | .17484 | .13596 | .11756 |
Sampling | .22168 | .15660 | .12584 |
Proportion | |||
First Ind. | .02979 | .02410 | .02088 |
First Cor. | .03252 | .02634 | .02281 |
Second Ind. | .02974 | .02407 | .02086 |
Second Cor. | .03247 | .02632 | .02280 |
Sampling | .03191 | .02628 | .02311 |
Binary Independent Variable | |||
Ratio | |||
First Ind. | 11.92990 | 3.37655 | .55326 |
First Cor. | 11.93859 | 3.38465 | .55934 |
Second Ind. | .22602 | .21098 | .20275 |
Second Cor. | .23261 | .21681 | .20809 |
Sampling | 2.60843 | 1.37892 | .50584 |
Proportion | | | |
First Ind. | .05705 | .04712 | .03709 |
First Cor. | .05881 | .04858 | .03838 |
Second Ind. | .05323 | .04279 | .03663 |
Second Cor. | .05506 | .04431 | .03794 |
Sampling | .05957 | .04909 | .03917 |
Table 9 shows the proportion of confidence intervals, based on the three estimators of the standard error of the mediated effect, that fell to the left and to the right of the true value. At all sample sizes, the percentage of confidence intervals to the left and right of the true value is approximately 5%. As in Stone and Sobel (1990), however, at smaller sample sizes more confidence intervals fell to the left of the true value. As sample size increased, the proportions on either side became more similar.
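The left and right tallies can be illustrated with a small simulation. The sketch below is not the authors' program: it assumes standard normal X and errors, ordinary least squares estimates of a and b, the first order standard error of ab, and 95% limits at ±1.96 standard errors. An interval counts as "left" when its upper limit is below the true ab and "right" when its lower limit is above it. All function names and parameter values are hypothetical.

```python
import math
import random


def simple_ols(x, y):
    """Slope and its standard error for y = i + slope*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(sse / (n - 2) / sxx)


def mediator_coef(x, m, y):
    """Coefficient of m and its standard error in y = i + c*x + b*m + e."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    smm = sum((v - mm) ** 2 for v in m)
    sxm = sum((u - mx) * (v - mm) for u, v in zip(x, m))
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    smy = sum((u - mm) * (v - my) for u, v in zip(m, y))
    det = sxx * smm - sxm ** 2
    c = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    intercept = my - c * mx - b * mm
    sse = sum((yi - intercept - c * xi - b * mi) ** 2
              for xi, mi, yi in zip(x, m, y))
    return b, math.sqrt(sse / (n - 3) * sxx / det)


def tail_rates(n, a, b, c_prime, reps, seed):
    """Share of intervals lying entirely left or right of the true ab."""
    rng = random.Random(seed)
    true_ab = a * b
    left = right = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        m = [a * xi + rng.gauss(0, 1) for xi in x]
        y = [c_prime * xi + b * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
        a_hat, se_a = simple_ols(x, m)
        b_hat, se_b = mediator_coef(x, m, y)
        ab = a_hat * b_hat
        se = math.sqrt(a_hat ** 2 * se_b ** 2 + b_hat ** 2 * se_a ** 2)
        if ab + 1.96 * se < true_ab:
            left += 1
        elif ab - 1.96 * se > true_ab:
            right += 1
    return left / reps, right / reps


# Hypothetical parameter values, small sample.
print(tail_rates(n=50, a=0.3, b=0.3, c_prime=0.3, reps=200, seed=7))
```

Running the function at several values of n gives a rough picture of how the imbalance between the two sides changes with sample size.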
An ANOVA on the relative bias dependent measure was conducted to determine whether the estimators were affected by sample size, binary or continuous independent variable, and the value of parameters α, β and τ′. Relative bias decreased with increasing sample size. The binary independent variable was associated with significantly more relative bias in point estimates and standard errors. In several cases, relative bias was a function of parameter values. The relative bias in the mediated effect point and variance estimators decreased as the α and β parameters increased.
The relative bias in point and variance estimators of the ratio and proportion mediated decreased as τ′ increased. There was also evidence that relative bias in the standard error of the ratio increased when the α parameter increased. The relative bias in the standard error of the proportion mediated was dependent on all the parameter values and their interactions. The relative bias of the standard error of the proportion decreased as the α, β, and τ′ parameters increased, but the pattern of interaction effects suggested a complicated relationship among the parameter values. For example, for α = .1 and .3, relative bias increased or stayed the same as β increased, but at α = .5 and .7, relative bias decreased with larger values of β.
To determine whether sample size, the binary or continuous independent variable, and the parameter values affected the asymmetry in the percentage of confidence intervals to the left and right of the true value of the mediated effect, an ANOVA was performed on the difference between the left and right values shown in Table 9. The parameter τ′ had no significant effect, whereas sample size, the parameters α and β, and their interaction, αβ, had significant effects. In general, the asymmetry decreased with increasing sample size. For the McGuigan and Langholtz (1988) estimator, the scale of the independent variable (binary or continuous) had an effect. The statistically significant effects of the parameter values and their interaction suggest that the results of the simulation might have been different if only one or a few sets of parameter values had been used.
Discussion
Point estimates for the mediated effect had very little bias for all sample sizes studied, and all estimators of the standard errors for the mediated effect are quite close for sample sizes greater than 50. The first and second order Taylor series estimators were quite similar. When the independent variable is coded 0 or 1 (control versus intervention), the standard errors of the mediated effect are larger. Even when the independent variable is binary, the standard errors are quite close to the theoretical standard error for all but the τ − τ′ standard error. The standard error estimator described in McGuigan and Langholtz (1988) should not be used in the analysis of studies with binary coding of the independent variable such as experimental studies.
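The closeness of the product-based estimators is easy to see from the forms usually given in the literature (Sobel, 1982; Goodman, 1960). The sketch below assumes independent estimates a and b with standard errors se_a and se_b; whether these expressions match this article's equations term for term is an assumption, and the input values are hypothetical.

```python
import math


def se_first_order(a, se_a, b, se_b):
    """First order Taylor series (delta method) standard error of ab."""
    return math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)


def se_second_order(a, se_a, b, se_b):
    """Second order solution: adds the product of the two variances."""
    return math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2
                     + se_a ** 2 * se_b ** 2)


def se_unbiased(a, se_a, b, se_b):
    """Unbiased form: subtracts the product of the two variances.

    The variance estimate can be negative in small samples, so it is
    clipped at zero here; the clipping is a choice made for this sketch.
    """
    var = a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2 - se_a ** 2 * se_b ** 2
    return math.sqrt(max(var, 0.0))


# Hypothetical estimates and standard errors.
a, se_a, b, se_b = 0.5, 0.1, 0.4, 0.1
for name, fn in [("first order", se_first_order),
                 ("second order", se_second_order),
                 ("unbiased", se_unbiased)]:
    print(name, round(fn(a, se_a, b, se_b), 5))
```

The three forms differ only by the term se_a² · se_b², which shrinks quickly as sample size grows; this is one way to see why the estimators converge in larger samples.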
Point estimates did not stabilize until a sample size of 500 for the proportion and 5000 (for a binary independent variable) for the ratio. The required sample size also varied with the magnitude of the effects. The standard error of the proportion likewise did not stabilize until a sample size of 500. For both measures, the second order Taylor series solutions performed slightly better than the first order solutions. The standard errors of both ab/c′ and ab/(c′ + ab) are quite large and unpredictable, except at very large sample sizes and for certain parameter values. Because the bias in the point and variance estimates is quite large, researchers should be very cautious in interpreting the ratio. For the proportion, a sample size of at least 500 appears to be necessary, although the required size was a function of the parameter values. The proportion and ratio are likely to be unstable because they are ratios of random variables rather than a product, as ab is. The proportion mediated is probably more stable than the ratio because its denominator, c′ + ab, will be larger than c′, the denominator of the ratio.
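The argument about ratios of random variables can be illustrated numerically. The sketch below is a simplification, not the simulation design used in this study: it draws the three coefficient estimates directly as independent normal deviates around hypothetical true values with a common hypothetical standard error, then compares the spread of ab, ab/c′, and ab/(c′ + ab).

```python
import random
import statistics


def simulate_spread(a, b, c_prime, se, reps=20000, seed=1):
    """Empirical standard deviations of ab, ab/c', and ab/(c' + ab)."""
    rng = random.Random(seed)
    products, ratios, proportions = [], [], []
    for _ in range(reps):
        # Independent normal draws stand in for the regression estimates.
        a_hat = rng.gauss(a, se)
        b_hat = rng.gauss(b, se)
        c_hat = rng.gauss(c_prime, se)
        ab = a_hat * b_hat
        products.append(ab)
        ratios.append(ab / c_hat)
        proportions.append(ab / (c_hat + ab))
    return {
        "ab": statistics.stdev(products),
        "ratio": statistics.stdev(ratios),
        "proportion": statistics.stdev(proportions),
    }


# Hypothetical true values and standard error.
spread = simulate_spread(a=0.5, b=0.5, c_prime=0.4, se=0.05)
for name, sd in spread.items():
    print(name, round(sd, 4))
```

Increasing se toward the size of c_prime lets the denominator of the ratio approach zero in some draws, so the spread of the ratio grows sharply; the larger denominator of the proportion delays that behavior.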
A simple, completely specified mediation model with no latent variables was studied here. The normal distribution was assumed for all variables except Xp; under normality, the least squares and maximum likelihood estimates are identical. We are now examining mediation in more complicated models, including non-normal distributions, multiple mediators, latent variables, logistic and probit regression, and longitudinal models (MacKinnon, Dwyer & Warsi, 1992). The preliminary results of these studies are generally consistent with those presented here for the three variable mediator model. The results for the three variable model are important because the model provides information for deciding how an experimental manipulation, such as a prevention program, achieved its effects.
Acknowledgments
This research was supported by a Public Health Service grant (DA06211). Part of this work was presented at the 1991 Psychometric Society Meeting. We thank Michele Nowling for manuscript preparation and Michael Sobel, Leona Aiken, Sanford Braver, and Steve West for comments on the manuscript. The following derivations can be obtained by writing to the first author: the independence of the estimators a and b, true variances of a and b, and the first and second order Taylor series solutions for the proportion and ratio measures. The first author may be reached at the Department of Psychology, Arizona State University, Tempe, Arizona 85287-1104.
Footnotes
Statistical Analysis System is a registered trademark of SAS Institute, Inc., Cary, North Carolina.
Contributor Information
David P. MacKinnon, Arizona State University
Ghulam Warsi, Arizona State University
James H. Dwyer, University of Southern California
References
- Ajzen I, Fishbein M. Understanding attitudes and predicting social behavior. Englewood Cliffs, NJ: Prentice Hall; 1980.
- Alwin DF, Hauser RM. The decomposition of effects in path analysis. American Sociological Review. 1975;40:37–47.
- Baron RM, Kenny DA. The moderator-mediator distinction in social psychological research: Conceptual, Strategic, and statistical considerations. Journal of Personality and Social Psychology. 1986;51(6):1173–1182. doi: 10.1037//0022-3514.51.6.1173.
- Bierman KL. Process of change during social skills training with preadolescents and its relation to treatment outcome. Child Development. 1986;57:230–240. doi: 10.1111/j.1467-8624.1986.tb00023.x.
- Bishop YM, Fienberg SE, Holland PW. Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press; 1975.
- Bollen KA. Structural equations with latent variables. New York: John Wiley & Sons; 1989. pp. 107–109.
- Box GEP, Muller ME. A note on the generation of random normal deviates. Annals of Mathematical Statistics. 1958;29:610–611.
- Brenitz S. Cry Wolf: The psychology of false alarms. New York: Erlbaum; 1984.
- Cialdini RB. Influence: The new psychology of modern persuasion. New York: Quill; 1984.
- Duncan OD, Featherman DL, Duncan B. Socioeconomic background and achievement. New York: Seminar Press; 1972.
- Flay BR. Psychosocial approaches to smoking prevention: A review of findings. Health Psychology. 1985;4(5):449–488. doi: 10.1037//0278-6133.4.5.449.
- Folmer H. Measurement of the effects of regional policy instruments by means of linear structural equation models and panel data. Environment and Planning A. 1981;13:1435–1448.
- Goodman LA. On the exact variance of products. Journal of the American Statistical Association. 1960;55:708–713.
- Guerra NG, Slaby RG. Cognitive mediators of aggression in adolescent offenders: 2. Intervention. Developmental Psychology. 1990;26(2):269–277.
- Hansen WB, Graham JG. Preventing, alcohol, marijuana, and cigarette use among adolescents: Peer pressure resistance training versus establishing conservative norms. Preventive Medicine. 1991;20:414–430. doi: 10.1016/0091-7435(91)90039-7.
- Hanushek EA, Jackson JE. Statistical methods for social scientists. New York: Academic Press; 1977.
- Hauck WW, Anderson S. A survey regarding the reporting of simulation studies. American Statistician. 1984;38:214–216.
- James LR, Brett JM. Mediators, moderators and tests for mediation. Journal of Applied Psychology. 1984;69(20):307–321.
- Judd CM, Kenny DA. Process Analysis: Estimating mediation in treatment evaluations. Evaluation Review. 1981;5(5):602–619.
- Lominicki ZA. On the distribution of the products of random variables. Journal of the Royal Statistical Society Series B. 1967;29:513–524.
- MacKinnon DP. Analysis of mediating variables in prevention and intervention studies. In: Beatty L, Cezares A, editors. Scientific methods in prevention research. Washington, DC: U.S. Government Printing Office; 1994. pp. 127–153. National Institute on Drug Abuse, Monograph #139. DHHS Publication No. 94-3631.
- MacKinnon DP, Dwyer JH. Estimating mediating effects in prevention studies. Evaluation Review. 1993;17(2):144–158.
- MacKinnon DP, Dwyer JH, Warsi G. Estimating mediating effects in logistic and probit regression. Paper presented at the annual meeting of the Psychometric Society; Columbus, Ohio. Jul, 1992.
- MacKinnon DP, Johnson CA, Pentz MA, Dwyer JH, Hansen WB, Flay BR, Wang E. Mediating mechanisms in a school-based drug prevention program: First year effects of the Midwestern Prevention Project. Health Psychology. 1991;10(3):164–172. doi: 10.1037//0278-6133.10.3.164.
- MacKinnon DP, Warsi G. Mediator Project Technical Report. Tempe, AZ: Arizona State University; 1991. On the variance of measures of mediation.
- McGuigan K, Langholtz B. A note on testing mediation paths using ordinary least-squares regression. 1988. Unpublished note.
- Mood A, Graybill FA, Boes DC. Introduction to the theory of statistics. New York: McGraw-Hill; 1974.
- The Multiple Risk Factor Intervention Trial Research Group. Mortality rates after 10.5 years for participants in the Multiple Risk Factor Intervention Trial: Findings relate to apriori hypotheses of the trial. Journal of the American Medical Association. 1990;263(13):1795–1801. doi: 10.1001/jama.1990.03440130083030.
- Murray DM, Luepker RV, Pirie PL, Grimm RH, Bloom E, Davis MA, Blackburn H. Systematic risk factor screening and education: A community-wide approach to prevention of coronary heart disease. Preventive Medicine. 1986;15:661–672. doi: 10.1016/0091-7435(86)90071-x.
- Rice JA. Mathematical statistics and data analysis. Monterey, CA: Brooks/Cole; 1988.
- Shapiro S. Statistical evidence for mass screening for breast cancer and some remaining issues. Cancer Detection and Prevention. 1976;1:347–363.
- Sobel ME. Asymptotic confidence intervals for indirect effects in structural equation models. In: Leinhardt S, editor. Sociological Methodology 1982. Washington, DC: American Sociological Association; 1982. pp. 290–312.
- Sobel ME. Some new results on indirect effects and their standard errors in covariance structure models. In: Tuma N, editor. Sociological Methodology 1986. Washington, DC: American Sociological Association; 1986. pp. 159–186.
- Sobel ME. Effect analysis and causation in linear structural equation models. Psychometrika. 1990;55(3):495–515.
- Springer MD. The algebra of random variables. New York: John Wiley & Sons; 1979.
- Stone CA, Sobel ME. The robustness of estimates of total indirect effects in covariance structure models estimated by maximum likelihood. Psychometrika. 1990;55(2):337–352.
- West SG, Wicklund RA. A primer of social psychological theories. Monterey, CA: Brooks/Cole; 1980.