A Simple Statistic for Comparing Moderation of Slopes and Correlations

Michael Smithson

doi:10.3389/fpsyg.2012.00231

2012 Jul 9;3:231. doi: 10.3389/fpsyg.2012.00231


PMCID: PMC3408110 PMID: 22866042

Abstract

Given a linear relationship between two continuous random variables X and Y that may be moderated by a third, Z, the extent to which the correlation ρ is (un)moderated by Z is equivalent to the extent to which the regression coefficients β_y and β_x are (un)moderated by Z iff the variance ratio $σ_{y}^{2} ∕ σ_{x}^{2}$ is constant over the range or states of Z. Otherwise, moderation of slopes and of correlations must diverge. Most of the literature on this issue focuses on tests for heterogeneity of variance in Y, and a test for this ratio has not been investigated. Given that regression coefficients are proportional to ρ via this ratio, accurate tests and estimates of it would have several uses. This paper presents such a test for both a discrete and a continuous moderator and evaluates its Type I error rate and power under unequal sample sizes and departures from normality. It also provides a unified approach to modeling moderated slopes and correlations with categorical moderators via structural equations models.

Keywords: moderator effects, interaction effects, heteroscedasticity, regression, correlation

Introduction

Let X and Y have a bivariate normal distribution, $X ~ N (μ_{x}, σ_{x}^{2}),$ and $Y ~ N (μ_{y}, σ_{y}^{2}) .$ Suppose also that the correlation between X and Y is a function of a moderator variable Z. Under homogeneity of variance (HoV), moderation of correlations implies moderation of regression coefficients (or means, in ANOVA), and vice versa. For example, establishing the existence of a moderator effect from Z in a linear regression model with X and Z predicting Y by finding a significant regression coefficient for the product term X × Z suffices to infer a corresponding moderator effect of Z on the correlation between X and Y.

Heterogeneity of variance (HeV) due to Z, however, can alter moderator effects so that correlation and regression coefficients are not equivalently moderated. We may have moderation of slopes, for instance, without moderation of correlations, moderation of correlations with no moderation of slopes, moderation of slopes and correlations in opposite directions, or even moderation of regression coefficients in opposite directions (e.g., what appears to be a positive moderator effect when X predicts Y becomes a negative effect when Y predicts X).

Although some scholars have warned about the impacts of heteroscedasticity on the analysis of variance (e.g., Grissom, 2000) and linear regression, most contemporary textbook advice and published evidence on this matter comforts researchers with the notion that ANOVA and regression are fairly robust against it. Howell (2007, p. 316), for instance, states that despite Grissom’s pessimistic outlook “the homogeneity of variance assumption can be violated without terrible consequences” and advises that for symmetrically distributed populations and equal numbers of participants in each cell, the validity of ANOVA is likely if the ratio of the largest to the smallest variance is no greater than 4. Tabachnick and Fidell (2007, pp. 121–123) are even more relaxed, recommending an upper limit on this ratio of 10 before raising an alarm. A recent investigation into the robustness of one-way ANOVA against violations of normality (Schmider et al., 2010) also is relatively reassuring on that count. A fairly recent comparison of several tests of homogeneity of variance (Correa et al., 2006) generally finds in favor of the Levene test but leaves the issue of the impact of HoV on moderator effects unexamined.

Nevertheless, this problem is well-known. Arnold (1982) drew a distinction between the “form” and “degree” of moderator effects, whereby the “form” is indexed by moderation of slopes (or means, in ANOVA) whereas the “degree” is indexed by moderation of correlations. He argued from first principles and demonstrated empirically that it is possible to find a significant difference between correlations from two independent samples but fail to find a corresponding significant regression interaction term, and vice versa. A related treatment was presented independently by Sharma et al. (1981), who referred to “degree” moderators as “homologizers” (a term taken from Zedeck, 1971). They pointed out that homologizers act through the error-term in a regression instead of through the predictor itself.

Stone and Hollenbeck (1984) dissented from Arnold (1982), arguing that only moderated regression is needed to assess moderating effects, regardless of whether they are of form or degree. Their primary claims were that moderated slopes also can be interpreted as differing strengths of relationship, and that the subgrouping method advocated by Arnold raises concerns about how subgroups are created if the moderator is not categorical. Arnold (1984) rebutted their claim regarding the slope as a measure of relationship strength, reiterating the position that slopes and correlations convey different types of information about such relationships. He also declared that both moderated regression and tests of differences between correlation coefficients are essentially “subgroup” methods. At the time there was no way to unify the examination of moderation of correlations and slopes. The present paper describes and demonstrates such an approach for categorical moderators, via structural equations models.

In a later paper, Stone and Hollenbeck (1989) reprised this debate and recommended variance-stabilizing and homogenizing transformations as a way to eliminate the apparent disagreement between moderation of correlations and moderation of slopes. These include not only transformations of the dependent variable, but also within-groups standardization and/or normalization. They also, again, recommended abandoning the distinction between degree and form moderation and focusing solely on form (i.e., moderated regression). The usual cautions against routinely transforming variables and objections to applying different transformations to subsamples aside, we shall see that transforming the dependent variable is unlikely to eliminate the non-equivalence between moderation of slopes and correlations. Moreover, other investigators of this issue do not arrive at the same recommendation as Stone and Hollenbeck when it comes to a “best” test.

Apparently independently of the aforementioned work, and extending the earlier work of Dretzke et al. (1982), Alexander and DeShon (1994) demonstrated severe effects from heterogeneity of error-variance (HeEV) on power and Type I error rates for the F-test of equality of regression slopes. In contrast to Stone and Hollenbeck (1989), they concluded that for a categorical moderator, the “test of choice” is the test for equality of correlations across the moderator categories, provided that the hypotheses of equal correlations and equal slopes are approximately identical.

These hypotheses are equivalent if and only if the ratio of the variance in X to the variance in Y is equal across moderator categories (Arnold, 1982; Alexander and DeShon, 1994). The reason for this is clear from the textbook equation between correlations and unstandardized regression coefficients. For the ith category of the moderator,

β_{y i} = ρ_{i} \frac{σ_{y i}}{σ_{x i}}

(1)

For example, a simple algebraic argument shows that if the σ_yi/σ_xi ratio is not constant for, say, i = 1 and i = 2 then β₁ = β₂ ⇒ ρ₁ ≠ ρ₂, and likewise ρ₁ = ρ₂ ⇒ β₁ ≠ β₂. More generally,

\frac{σ_{y 1} σ_{x 2}}{σ_{x 1} σ_{y 2}} > (<) 1 \Leftrightarrow |\frac{β_{1}}{β_{2}}| > (<) |\frac{ρ_{1}}{ρ_{2}}| .

(2)

The condition for correlations and slopes to be moderated in opposite directions follows immediately. Suppose ρ₂ > ρ₁ > 0. Then β₁ > β₂ even though ρ₂ > ρ₁ if it is also true that

\frac{σ_{y 1} σ_{x 2}}{σ_{x 1} σ_{y 2}} > \frac{ρ_{2}}{ρ_{1}} .

The same implication holds if the inequalities are changed from > to <.
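As a purely illustrative check of equations (1) and (2) and of the opposite-direction condition, the following minimal Python sketch uses hypothetical standard deviations and correlations for two moderator categories. None of the numbers come from the paper, and the helper name `slope` is an assumption introduced only for this illustration.

```python
def slope(rho, sd_x, sd_y):
    """Unstandardized slope of Y on X, following equation (1)."""
    return rho * sd_y / sd_x


# Hypothetical values for two moderator categories (not taken from the paper).
rho1, sd_x1, sd_y1 = 0.40, 1.0, 3.0
rho2, sd_x2, sd_y2 = 0.60, 1.0, 1.0

beta1 = slope(rho1, sd_x1, sd_y1)
beta2 = slope(rho2, sd_x2, sd_y2)

# Left-hand side of equation (2) and of the inequality that follows it.
sd_ratio = (sd_y1 * sd_x2) / (sd_x1 * sd_y2)

print("slopes:", beta1, beta2)
print("correlations:", rho1, rho2)
print("equation (2) direction agrees:",
      (sd_ratio > 1) == (abs(beta1 / beta2) > abs(rho1 / rho2)))
print("opposite-direction condition met:", sd_ratio > rho2 / rho1)
```

With these hypothetical inputs the second category has the larger correlation while the first has the steeper slope, which is the pattern that the inequality describes.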

The position taken in this paper is that in multiple linear regression there are three distinct and valid types of moderator effects. First, in multiple regression equation (1) generalizes to a version where standardized regression coefficients replace correlation coefficients:

β_{y i} = B_{y i} \frac{σ_{y i}}{σ_{x i}}

(3)

where B_yi is a standardized regression coefficient. Thus, we have moderation of unstandardized versus standardized regression coefficients (or correlations when there is only one predictor), which are equivalent if and only if the aforementioned variance ratio is equal across moderator categories. Otherwise, the assumption that moderation of one implies equivalent moderation of the other is mistaken. This is a simple generalization of Arnold’s (1982) and Sharma et al.’s (1981) distinction.

Second, the semi-partial correlation coefficient, ν_xi, is a simple function of B_yi and tolerance. In the ith moderator category, the tolerance of a predictor, X, is $T_{x i} = 1 - R_{x i}^{2},$ where $R_{x i}^{2}$ is the squared multiple correlation for X regressed on the other predictors included in the multiple regression model. The standardized regression coefficient, semi-partial correlation, and tolerance are related by

ν_{x i} = B_{y i} \sqrt{T_{x i}} .

Equation (3) therefore may be rewritten as

β_{y i} = ν_{x i} \frac{σ_{y i}}{σ_{x i} \sqrt{T_{x i}}} .

(4)

Thus, we have a distinction between the moderation of the unique contribution of a predictor to the explained variance of a dependent variable and moderation of regression coefficients (whether standardized or not). Equivalence with moderation of standardized coefficients (or simple correlations) hinges on whether tolerance is constant across moderator categories (an issue not dealt with in this paper), while equivalence with moderation of unstandardized coefficients depends on both constant tolerance and constant variance ratios.
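The relation between the standardized coefficient, the semi-partial correlation, and tolerance can be checked numerically. The sketch below assumes a model with only two predictors (X and a second predictor, here called W), for which the textbook closed-form expressions apply. The correlations and standard deviations are hypothetical.

```python
import math

# Hypothetical correlations among Y, the focal predictor X, and a second predictor W.
r_yx, r_yw, r_xw = 0.50, 0.30, 0.40
sd_x, sd_y = 2.0, 3.0  # hypothetical standard deviations

# With one other predictor, R_x^2 is simply r_xw^2, so tolerance is 1 - r_xw^2.
tolerance = 1.0 - r_xw ** 2

# Textbook two-predictor formulas for the standardized coefficient
# and the semi-partial correlation of X.
B_x = (r_yx - r_yw * r_xw) / (1.0 - r_xw ** 2)
nu_x = (r_yx - r_yw * r_xw) / math.sqrt(1.0 - r_xw ** 2)

# Unstandardized slope computed two ways: equation (3) and equation (4).
beta_eq3 = B_x * sd_y / sd_x
beta_eq4 = nu_x * sd_y / (sd_x * math.sqrt(tolerance))

print("nu_x:", nu_x, " B_x * sqrt(T):", B_x * math.sqrt(tolerance))
print("beta via (3):", beta_eq3, " beta via (4):", beta_eq4)
```

The two routes to the unstandardized slope agree up to floating-point error, so any difference in tolerance or in the variance ratio across moderator categories changes how the three kinds of coefficient relate to one another.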

In a later paper, DeShon and Alexander (1996) proposed alternative procedures for testing equality of regression slopes under HeEV, but they and both earlier and subsequent researchers appear to have neglected the idea of testing for equal variance ratios (EVR) across moderator categories. This is understandable, given that HeEV is a more general concern in some respects and the primary object of most regression (and ANOVA) models is prediction.

Nevertheless, it is possible for homogeneity of error-variance to be satisfied when EVR is not. An obvious example is when there is HoV for Y and equality of correlations across moderator categories but HeV for X. These conditions entail homogeneous error-variances, because $σ_{e i}^{2} = σ_{y i}^{2} (1 - ρ_{i}^{2})$, but also imply that slopes cannot be equal across categories. This case seems to have been largely overlooked in the literature on moderators. More generally, error-variances are homogeneous when, for all i and j,

\frac{σ_{y i}^{2}}{σ_{y j}^{2}} = \frac{1 - ρ_{j}^{2}}{1 - ρ_{i}^{2}},

(5)

which clearly has no bearing on whether EVR holds or not.
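To make the independence of condition (5) from EVR concrete, the following sketch uses hypothetical variances and correlations for two categories. It relies only on the standard identity that the error-variance of Y regressed on X equals the variance of Y times one minus the squared correlation. The function name and all numeric values are assumptions for illustration.

```python
import math


def error_variance(var_y, rho):
    """Residual variance of Y regressed on X: var_y * (1 - rho^2)."""
    return var_y * (1.0 - rho ** 2)


# Hypothetical two-category setup chosen so that condition (5) holds.
var_y1, rho1 = 2.0, 0.5
var_y2, rho2 = 3.0, math.sqrt(0.5)

lhs = var_y1 / var_y2
rhs = (1.0 - rho2 ** 2) / (1.0 - rho1 ** 2)
condition_5_holds = math.isclose(lhs, rhs)

# The variance of X does not enter condition (5), so EVR may hold or fail.
results = []
for var_x1, var_x2 in [(1.0, 1.5), (1.0, 1.0)]:
    evr_holds = math.isclose(var_y1 / var_x1, var_y2 / var_x2)
    results.append(evr_holds)
    print("var_x =", (var_x1, var_x2),
          "| condition (5):", condition_5_holds,
          "| EVR:", evr_holds)
```

Both choices of the X variances leave the error-variances equal, yet only the first keeps the variance ratio constant across categories.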

Thus, a test of EVR would provide a guide for determining when equality of slopes and equality of correlations are equivalent null hypotheses and when not. Given that it is not uncommon for researchers to be interested in both moderation of slopes (or means) and moderation of correlations, this test could be a useful addition to data screening procedures.

It might seem that if researchers are going to test for both moderation of slopes and correlations, a test of EVR is superfluous. However, the joint outcome of the tests of equal correlations and equal slopes does not render the question of EVR moot or irrelevant. The reason this should interest applied researchers is that the tests of equal correlations and equal slopes will not inform them of whether the moderation of slopes is equivalent to the moderation of correlations, whereas a test of EVR would do exactly that. Suppose, for example, the test for equality of slopes yields p = 0.04 (so we reject the null hypothesis) whereas the corresponding test for correlations yields p = 0.06 (so we fail to reject). An EVR test would tell us whether these two outcomes are genuinely unequal or whether their apparent difference may be illusory. Thus, an EVR test logically should take place before tests of equality of slopes or correlations, because it will indicate whether both of the latter tests need to be conducted or just one will suffice.

Furthermore, an estimate of the ratio of the variance ratios along with its standard error provides an estimate of (and potentially a confidence interval for) a ratio comparison between moderation of slopes and moderation of correlations. From equations (1) and (2), for the ith and jth moderator categories, we immediately have

\frac{σ_{y i} ∕ σ_{x i}}{σ_{y j} ∕ σ_{x j}} = \frac{β_{y i} ∕ β_{y j}}{ρ_{i} ∕ ρ_{j}} .

(6)

Finally, equation (3) tells us that an EVR test can be used to assess the equivalence between the moderation of standardized and unstandardized regression coefficients, thereby expanding its domain of application into multiple regression.

All said and done, it is concerning that numerous articles in the foremost journals in psychology routinely report tests of interactions in ANOVAs, ANCOVAs, and regressions with no mention of prior testing for either HeV or HeEV. Moreover, reviews of the literature on metric invariance by Vandenberg and Lance (2000) and DeShon (2004) indicated considerable disagreement on the importance of HeEV for assessments of measurement invariance across samples in structural equations models. Researchers are unlikely to be strongly motivated to use a test for EVR unless it is simple, readily available in familiar computing environments, robust, and powerful. We investigate such a test with these criteria in mind.

A Test of EVR for Categorical Moderators

An obvious candidate for a test of EVR is a parametric test based on the log-likelihood of a bivariate normal distribution for X and Y conditional on a categorical moderator Z. We employ submodels for the standard deviations using the log link. Using the first category of the moderator as the “base” category, the submodels may be written as

\begin{array}{l} σ_{x i} = \exp (\sum_{i} z_{i} δ_{x i}), \\ σ_{y i} = \exp (\sum_{i} z_{i} δ_{y i}), \end{array}

(7)

where z₁ = 1 and for i > 1 z_i is an indicator variable for the ith category of Z, and the δ parameters are regression coefficients. Under the hypothesis that EVR holds between the ith and first categories, the relevant test statistic is

θ_{i} = δ_{y i} - δ_{x i},

(8)

for i > 1, with

var (θ_{i}) = var (δ_{y i}) + var (δ_{x i}) - 2 cov (δ_{y i}, δ_{x i}),

(9)

and the assumption that δ_yi and δ_xi are asymptotically bivariate normally distributed. Immediately we have a confidence interval for θ_i, namely ${\hat{θ}}_{i} ± t_{α ∕ 2} \sqrt{\widehat{var} (θ_{i})},$ where t_α/2 is the 1 − α/2 quantile of the t distribution with the appropriate degrees of freedom for an independent-samples test. We also have

\exp (θ_{i}) = \frac{β_{y i} / β_{y 1}}{ρ_{i} / ρ_{1}},

(10)

and we may exponentiate the limits of this confidence interval to obtain a confidence interval for the right-hand expression in this equation, i.e., for the ratio comparison between the ratio of moderated regression coefficients and the ratio of moderated correlations.

The hypothesis that θ_i = 0 is equivalent to a restricted model in which, for i > 1, δ_xi = δ_yi. The modeling approaches outlined later in this paper make use of this equivalence. More complex EVR hypotheses may require different design matrices from the setup proposed in this introductory treatment. First, however, we shall examine the properties of θ, including Type I error rates and power under unequal sample sizes, and the effects of departures from normality for X and Y.
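For a two-category moderator, the statistic in equation (8) can be approximated directly from subgroup sample moments. The sketch below is not the maximum likelihood scripts referred to in this paper. It assumes a delta-method variance for the log ratio of standard deviations under bivariate normality, namely (1 − r²)/(n − 1) per group, and it uses a normal quantile in place of the t quantile. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def log_sd_ratio(x, y):
    """Return log(s_y / s_x) and an approximate sampling variance for one group."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r_squared = sxy ** 2 / (sxx * syy)
    # Assumed delta-method approximation under bivariate normality:
    # var(log s_y) + var(log s_x) - 2 cov(log s_y, log s_x) ~ (1 - r^2) / (n - 1)
    return 0.5 * math.log(syy / sxx), (1.0 - r_squared) / (n - 1)


def evr_wald_test(x1, y1, x2, y2, alpha=0.05):
    """Wald-type test of theta = 0, with category 1 as the base category."""
    log_ratio_1, var_1 = log_sd_ratio(x1, y1)
    log_ratio_2, var_2 = log_sd_ratio(x2, y2)
    theta = log_ratio_2 - log_ratio_1          # analogue of equation (8)
    se = math.sqrt(var_1 + var_2)              # groups are independent
    z = theta / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower, upper = theta - crit * se, theta + crit * se
    return {
        "theta": theta, "se": se, "z": z, "p": p,
        # Exponentiated estimate and limits, as in equation (10).
        "ratio": math.exp(theta),
        "ratio_ci": (math.exp(lower), math.exp(upper)),
    }


# Tiny hypothetical data set: category 2 is category 1 with Y doubled.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y1 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x2 = list(x1)
y2 = [2.0 * v for v in y1]

result = evr_wald_test(x1, y1, x2, y2)
print(result)
```

Because Y is exactly doubled in the second category while X is unchanged, the exponentiated estimate equals the ratio of the two standard deviation ratios. Real applications would need far larger samples than this toy example for the asymptotic approximation to be credible.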

Assessing Type I Error Accuracy and Power

We begin with simulations testing null hypothesis rejection rates for EVR when the null hypotheses of EVR and unmoderated correlations and slopes are true. Simulations using a two-category moderator (20,000 runs for each condition) were based on DeShon and Alexander (1996, Table 1), with a constant variance ratio of 2, $ρ_{x y} = 1 ∕ \sqrt{2}$, and β_y = 1 for both categories. Three pairs of sample sizes were used (again based on DeShon and Alexander, 1996): 70 for both samples, 45 for one and 155 for the second, and 90 for one and 180 for the second. Three pairs of variances also were used, to ascertain any impact from the sizes of the variances. All runs used a Type I error criterion of α = 0.05.

Table 1.

Type I error: two-groups simulations.

Skew	N₁	N₂	σ_xi = 1,1	σ_xi = 1,2	σ_xi = 1,4
			σ_yi = 2,2	σ_yi = 2,4	σ_yi = 2,8
NORMAL
0	70	70	0.0518	0.0532	0.0511
0	45	155	0.0578	0.0553	0.0547
0	90	180	0.0513	0.0531	0.0503
SKEWED
2	70	70	0.0710	0.0704	0.0680
4	70	70	0.0768	0.0767	0.0713
2	45	155	0.0681	0.0686	0.0687
4	45	155	0.0778	0.0776	0.0774
2	90	180	0.0679	0.0708	0.0714
4	90	180	0.0755	0.0728	0.0723

The top half of Table 1 shows the EVR rejection rates for random samples from normally distributed X and Y. Unequal sample sizes have little impact on rejection rates, with the effect appearing to diminish in the larger-sample (90–180) condition. The rates are slightly higher than 0.05, but are unaffected by the sizes of the variances.
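A scaled-down version of this kind of Type I error simulation can be sketched as follows. It is not the simulation code used for Table 1: it uses 2,000 runs instead of 20,000, a moment-based Wald statistic with an assumed delta-method variance in place of the maximum likelihood estimator, and a normal critical value. It also assumes that the tabled σ values of 1 and 2 for X (2 and 4 for Y) are variances, which is what a variance ratio of 2 with β_y = 1 implies. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def draw_group(n, var_x, var_y, rho, rng):
    """Draw n bivariate normal (x, y) pairs with the given variances and correlation."""
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(sd_x * z1)
        ys.append(sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return xs, ys


def log_sd_ratio(xs, ys):
    """log(s_y / s_x) and its assumed delta-method variance (1 - r^2) / (n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return 0.5 * math.log(syy / sxx), (1.0 - sxy ** 2 / (sxx * syy)) / (n - 1)


def rejection_rate(n1, n2, vars1, vars2, rho, runs=2000, alpha=0.05, seed=2012):
    """Proportion of runs in which the EVR null hypothesis is rejected."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(runs):
        l1, v1 = log_sd_ratio(*draw_group(n1, vars1[0], vars1[1], rho, rng))
        l2, v2 = log_sd_ratio(*draw_group(n2, vars2[0], vars2[1], rho, rng))
        if abs(l2 - l1) / math.sqrt(v1 + v2) > crit:
            rejections += 1
    return rejections / runs


# EVR holds: the variance ratio is 2 in both groups, with rho = 1 / sqrt(2).
rate = rejection_rate(70, 70, (1.0, 2.0), (2.0, 4.0), 1.0 / math.sqrt(2.0))
print("empirical Type I error rate:", rate)
```

With only 2,000 runs the Monte Carlo standard error is roughly 0.005, so the printed rate should be read as a rough indication only and not as a replication of the tabled values.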

The lower half of Table 1 shows simulations under the same conditions, but this time with X and Y sampled from the Azzalini skew-normal distribution (Azzalini, 1985). The standard skew-normal pdf is

f (x, λ) = \frac{e^{- x^{2} / 2}}{\sqrt{2 π}} (1 + \operatorname{erf} [\frac{λ x}{\sqrt{2}}]) .

The simulations had the skew parameter λ set to 2 and 4, the pdfs for which are shown in Figure 1. Skew increased the rejection rates to 0.068–0.078, rendering the test liberal but not dramatically so.
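The standard skew-normal density and one common way of drawing from it can be sketched in Python as below. The sampler uses the well-known representation of a skew-normal variate as δ|U₀| + √(1 − δ²)U₁ with δ = λ/√(1 + λ²) and independent standard normal U₀ and U₁. Whether the simulations reported here generated their samples this way is not stated, so this is an assumption, as are the function names.

```python
import math
import random


def skew_normal_pdf(x, lam):
    """Standard skew-normal density with skew parameter lam."""
    return (math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
            * (1.0 + math.erf(lam * x / math.sqrt(2.0))))


def skew_normal_draw(lam, rng):
    """One draw via the representation delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    u0, u1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1


rng = random.Random(7)
for lam in (0.0, 2.0, 4.0):
    draws = [skew_normal_draw(lam, rng) for _ in range(20000)]
    sample_mean = sum(draws) / len(draws)
    # Theoretical mean of the standard skew-normal: delta * sqrt(2 / pi).
    theory_mean = lam / math.sqrt(1.0 + lam * lam) * math.sqrt(2.0 / math.pi)
    print(lam, round(sample_mean, 3), round(theory_mean, 3))
```

Setting λ = 0 recovers the standard normal distribution, and larger λ shifts mass to the right, which is the departure from normality examined in the lower half of Table 1.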

**Figure 1. Azzalini skew-normal distributions with λ = 0, 2, 4**.

We now turn to investigating the power of the EVR test. Simulations testing its power were conducted for two situations: moderated slopes but unmoderated correlations, and moderated correlations but unmoderated slopes. Both batches of simulations were run with four combinations of sample sizes (70–70, 40–100, 140–140, and 80–200) and three variance ratio combinations (1–1.5, 1–2, 1–4). In the unmoderated correlations setup ρ = 0.5 for all conditions, and in the unmoderated slopes setup β_y = 0.5 for all conditions. These tests also require modeling the moderation of correlations. The correlation submodel uses the Fisher link, i.e.

log (\frac{1 + ρ_{i}}{1 - ρ_{i}}) = \sum_{i} w_{i} δ_{r i} .

(11)

Note that we allow a different set of predictors for the correlation from those in equation (7). However, in this paper we will impose the restriction w_i = z_i.

Table 2 shows the simulation results for unequal variance ratios with unmoderated correlations. The table contains rejection rates of the EVR and moderation of correlation null hypotheses. The resultant moderated slopes and error-variances are displayed for each condition. Note that HeV and HeEV do not have discernible effects on either of the rejection rates. As in the preceding simulations, the rejection rates for the unmoderated correlations are only slightly above the 0.05 criterion. The rejection rates for the EVR test land in the 0.85–1.0 range in the conditions where the combined sample sizes are 280 and the ratio of the variance ratios is 2:1, or for both combined sizes when the ratio is 4:1.

Table 2.

Power: moderated slopes and unmoderated correlations.

N₁	N₂	$σ_{x}^{2} = 2$	$σ_{x}^{2} = 2$	$σ_{x}^{2} = 2$	$σ_{x}^{2} = 2$	$σ_{x}^{2} = 2$	$σ_{x}^{2} = 2$
		$σ_{y}^{2} = 2$	$σ_{y}^{2} = 3$	$σ_{y}^{2} = 2$	$σ_{y}^{2} = 4$	$σ_{y}^{2} = 2$	$σ_{y}^{2} = 8$
		σ_xy = 1	$σ_{x y} = \sqrt{3 ∕ 2}$	σ_xy = 1	$σ_{x y} = \sqrt{2}$	σ_xy = 1	σ_xy = 2
		$σ_{e}^{2} = 1.5$	$σ_{e}^{2} = 2.25$	$σ_{e}^{2} = 1.5$	$σ_{e}^{2} = 3$	$σ_{e}^{2} = 1.5$	$σ_{e}^{2} = 6$
		β_y = 0.5	$β_{y} = \sqrt{3 ∕ 8}$	β_y = 0.5	$β_{y} = \sqrt{2} ∕ 2$	β_y = 0.5	β_y = 1
		δ_r	θ	δ_r	θ	δ_r	θ
70	70	0.0556	0.2810	0.0603	0.6321	0.0576	0.9939
40	100	0.0566	0.2478	0.0569	0.5706	0.0566	0.9875
140	140	0.0549	0.4841	0.0532	0.9032	0.0537	1.000
80	200	0.0529	0.4311	0.0497	0.8524	0.0522	0.9999

Table 3 shows the rejection rates of the EVR and moderation of correlation null hypotheses when there are unequal variance ratios and moderated correlations. The resultant moderated correlations and error-variances are displayed for each condition. As before, HeV and HeEV do not affect either of the rejection rates. Likewise, as expected, the EVR rejection rates are very similar to those in Table 2. It is noteworthy that rejection rates for the unmoderated correlations hypothesis are considerably smaller than those for the EVR hypothesis, even though the correlations differ fairly substantially. It is well-known that tests for moderation of slopes and correlations have rather low power. These results, and the fact that the ratios of the variance ratios do not exceed Howell’s benchmark of 4:1, suggest that the EVR test has relatively high power.

Table 3.

Power: unmoderated slopes and moderated correlations.

N₁	N₂	$σ_{x}^{2} = 2$	$σ_{x}^{2} = 2$	$σ_{x}^{2} = 2$	$σ_{x}^{2} = 2$	$σ_{x}^{2} = 2$	$σ_{x}^{2} = 2$
		$σ_{y}^{2} = 2$	$σ_{y}^{2} = 3$	$σ_{y}^{2} = 2$	$σ_{y}^{2} = 4$	$σ_{y}^{2} = 2$	$σ_{y}^{2} = 8$
		σ_xy = 1	σ_xy = 1	σ_xy = 1	σ_xy = 1	σ_xy = 1	σ_xy = 1
		$σ_{e}^{2} = 1.5$	$σ_{e}^{2} = 2.5$	$σ_{e}^{2} = 1.5$	$σ_{e}^{2} = 3.5$	$σ_{e}^{2} = 1.5$	$σ_{e}^{2} = 7.5$
		ρ_xy = 0.5	$ρ_{x y} = 1 / \sqrt{6}$	ρ_xy = 0.5	$ρ_{x y} = 1 / \sqrt{8}$	ρ_xy = 0.5	ρ_xy = 0.25
		δ_r	θ	δ_r	θ	δ_r	θ
70	70	0.0944	0.2635	0.1525	0.5925	0.3426	0.9864
40	100	0.0992	0.2394	0.1700	0.5476	0.3558	0.9826
140	140	0.1296	0.4575	0.2483	0.8771	0.5844	1.000
80	200	0.1444	0.4201	0.2878	0.8326	0.6036	0.9999

Structural Equations Model Approach

When the moderator variable is categorical, the EVR test can be incorporated in a structural equations model (SEM) approach that permits researchers not only to compare an EVR model against one that relaxes this assumption, but also to test simultaneously for HeV, HeEV, moderation of correlations and moderation of slopes. Figure 2 shows the regression (left-hand side) and correlation (right-hand side) versions of this model. The latter follows Preacher’s (2006) strategy for a multi-group SEM for correlations. The regression version models the error-variances $σ_{e i}^{2}$ rather than the variances $σ_{y i}^{2} .$ Instead, $σ_{y i}^{2}$ is modeled in the correlation version. The only addition to the correlation SEM required for incorporating EVR tests is to explicitly model variance ratios for each of the moderator variable categories. Two SEM packages that can do so are lavaan (Rosseel, 2012) in R and Mplus (Muthén and Muthén, 2010). Examples in lavaan and Mplus are available at http://dl.dropbox.com/u/1857674/EVR_moderator/EVR.html, as are EVR test scripts in SPSS and SAS.

**Figure 2. Moderated regression and correlation structural equations models**.

Simulations were run using lavaan in model comparisons for samples with moderated slopes but unmoderated correlations, and samples with moderated correlations but unmoderated slopes. As before, each simulation had 20,000 runs.

Simulations from bivariate normal distributions with ρ_xy = 0.5 for both groups (Table 4) indicated that moderately large samples and slope differences are needed for reasonable power. However, there was little impact on power from unequal group sizes. Rejection rates for the unmoderated correlations hypothesis were at appropriate levels, 0.0493–0.0559.

Table 4.

Moderated regression coefficients.

N₁	N₂	β_y = 0.50	β_y = 0.50	β_y = 0.50
		β_y = 0.61	β_y = 0.71	β_y = 1.00
70	70	0.1086	0.2030	0.5659
40	100	0.1012	0.2026	0.6031
140	140	0.1633	0.3668	0.8566
80	200	0.1499	0.3549	0.8875

Likewise, simulations from bivariate normal distributions with β_y = 0.5 for both groups (Table 5) indicated that moderately large samples and correlation differences are needed for reasonable power. There was a slight to moderate impact from unequal group sizes, somewhat greater than the impact in Table 4. Rejection rates for the unmoderated slopes hypothesis were at appropriate levels, 0.0484–0.0538.

Table 5.

Moderated correlations.

N₁	N₂	ρ_xy = 0.50	ρ_xy = 0.50	ρ_xy = 0.50	ρ_xy = 0.50
		ρ_xy = 0.41	ρ_xy = 0.35	ρ_xy = 0.25	ρ_xy = 0.17
70	70	0.1159	0.1984	0.4406	0.6390
40	100	0.1031	0.1728	0.3610	0.5487
140	140	0.1760	0.3521	0.7215	0.9008
80	200	0.1541	0.2960	0.6312	0.8395

SEM Example

Consider a population with two normally distributed variables X, political liberalism, and Y, degree of belief in global warming. Suppose that they are measured on scales with means of 0 and standard deviations of 1, and the correlation between these two scales is ρ = 0.45. Suppose also that if members of this population are exposed to a video debate highlighting the arguments for and against the reality of global warming, it polarizes belief in global warming by increasing the degree of belief of those who already tend to believe it and decreasing the degree of belief of those who already are skeptical. Thus, the standard deviation doubles from 1 to 2. However, the mean remains at 0 and the correlation between belief in global warming and political liberalism also is unchanged, remaining at 0.45.

In a two-condition experiment with half the participants from this population assigned to a condition where they watch the video and half to a “no-video” condition, the experimental conditions may be regarded as a two-category moderator variable Z. We have ρ = 0.45 and σ_x = 1 regardless of Z, and σ_y1 = 2 whereas σ_y2 = 1. It is also noteworthy that when X predicts Y homogeneity of error-variance is violated, whereas when Y predicts X it is not.
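The population quantities implied by this setup follow from equation (1) and the identity that the error-variance equals the criterion variance times 1 − ρ². The short sketch below simply evaluates them for the two conditions. The dictionary layout and labels are illustrative assumptions, and the values are population figures, not the sample estimates reported later.

```python
rho = 0.45
conditions = {
    "video (Z = 1)": {"sd_x": 1.0, "sd_y": 2.0},
    "no video (Z = -1)": {"sd_x": 1.0, "sd_y": 1.0},
}

population = {}
for name, sd in conditions.items():
    population[name] = {
        # Slopes in each direction, from equation (1).
        "beta_y": rho * sd["sd_y"] / sd["sd_x"],   # X predicts Y
        "beta_x": rho * sd["sd_x"] / sd["sd_y"],   # Y predicts X
        # Error-variances in each direction.
        "err_var_y": sd["sd_y"] ** 2 * (1.0 - rho ** 2),
        "err_var_x": sd["sd_x"] ** 2 * (1.0 - rho ** 2),
    }
    print(name, population[name])
```

The slope for X predicting Y is larger in the video condition while the slope for Y predicting X is smaller there, and only the X-predicts-Y direction has unequal error-variances, even though the correlation is identical in both conditions.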

We randomly sample 600 people from this population and randomly assign 300 to each condition, representing the video condition with Z = 1 and the no-video condition with Z = −1. As expected, the sample correlations in each subsample do not differ significantly: r₁ = 0.458, r₂ = 0.463, and Fisher’s test yields z = 0.168 (p = 0.433). However, a linear regression with Y predicted by X and Z that includes an interaction term (Z × X) finds a significant positive interaction coefficient (z = 3.987, p < 0.0001). Taking the regression at face value could mislead us into believing that because the slope between X and Y differs significantly between the two categories of Z, Z also moderates the association between X and Y. Of course, it does not. Seemingly more puzzling is the fact that linear regression with Y predicting X yields a significant negative interaction term (Z × Y) with z = −3.859 (p = 0.0001). So the regression coefficient is moderated in opposite directions, depending on whether we predict Y or X.

The scatter plots in Figure 3 provide an intuitive idea of what is going on. Clearly the slope for Y (belief in global warming) predicted by X (liberalism) appears steeper when Z = 1 than when Z = −1. Just as clearly, the slope for X predicted by Y appears less steep when Z = 1 than when Z = −1. The oval shapes of the data distribution in both conditions appear similar to one another, giving the impression that the correlations are similar.

**Figure 3. Scatter plots for the two-condition experiment**.

We now demonstrate that the SEM approach can clarify and validate these impressions, using Mplus 6.12. We begin with the moderation of slopes models. Because σ_x1 = σ_x2 (i.e., X has HoV) we may move from the saturated model to one that restricts those parameters to be equal. The model fit is χ²(1) = 0.370 (p = 0.543). This baseline model also reproduces the slopes estimates in OLS regression. Now, a model removing HoV for X and imposing the EVR restriction yields χ²(1) = 82.246 (p < 0.0001), so clearly we can reject the EVR hypothesis. Fitting another model with HoV in X and HeV in Y but where we set β_y1 = β_y2, the fit is χ²(2) = 15.779 (p = 0.0004), and the model comparison test is χ²(1) = 15.779 − 0.370 = 15.409 (p < 0.0001). We conclude there is moderation of slopes but EVR does not hold, so we expect that the moderation of correlations will differ from that of the slopes, and the moderation of slopes will differ when X predicts Y versus when Y predicts X. Indeed, if we fit models with Y predicting X we also can reject the equal slopes model, and the slopes differ in opposite directions across the categories of Z. When X predicts Y β_y1 = 0.496 and β_y2 = 0.978, whereas when Y predicts X β_x1 = 0.219 and β_x2 = 0.423.

Turning to correlations, we start with a model that sets σ_x1 = σ_x2 (i.e., assuming that X has HoV) and leaves all other parameters free. The fit is χ²(1) = 0.370 (p = 0.543), identical to the equivalent baseline model described above. This model closely reproduces the sample correlations (the parameter estimates are 0.452 and 0.469, versus the sample correlations 0.458 and 0.463). Moreover, a model adding the EVR restriction yields χ²(1) = 82.246, again identical to the equivalent regression model. Now if we set ρ₁ = ρ₂, the fit is χ²(2) = 0.453 (p = 0.797) and the model comparison test is χ²(1) = 0.083 (p = 0.773). Thus, there is moderation of slopes but not of correlations.

Continuous Moderators

Continuous moderators pose considerably greater challenges than categorical ones, because of the many forms that HeV and HeEV can take. Arnold (1982) sketched out a treatment of this problem that is not satisfactory, namely correlating correlations between X and Y with values of the continuous moderator Z. In an innovative paper, Allison et al. (1992) extended a standard approach to assessing heteroscedasticity to test for homologizers when the moderator variable, Z, is continuous. Their technique is simply to compute the correlation between Z and the absolute value of the residuals from the regression equation that already includes both the main effect for Z and the interaction term. This is a model of moderated error, akin to modeling error-variance, which is useful in itself but not equivalent to testing for EVR. In their approach and the simulations that tested it, Allison et al. assumed HoV for their predictor, thereby ignoring the fact that EVR can be violated even when homogeneity of error-variance is satisfied.

The approach proposed here generalizes the model defined by equations (7) and (11), with the z_i now permitted to be continuous. This model is

\begin{array}{l} \log (σ_{x}) = \sum_{i} z_{i} δ_{x i}, \\ \log (σ_{y}) = \sum_{i} z_{i} δ_{y i}, \\ \log (\frac{1 + ρ_{x y}}{1 - ρ_{x y}}) = \sum_{i} z_{i} δ_{r i}, \end{array}

(12)

where z₁ = 1 and for i > 1 the z_i are continuous random variables. The δ_xi, δ_yi, and δ_ri coefficients can be simultaneously estimated via maximum likelihood, using the likelihood function of a bivariate normal distribution conditioned by the z_i. Scripts for maximum likelihood estimation in R, SPSS, and SAS are available via the link cited earlier. This model can be made more flexible by introducing polynomial terms in the z_i, but we do not undertake that extension here.
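The likelihood behind model (12) can be written down compactly for the single-moderator case. The sketch below is not one of the scripts mentioned above. It assumes constant means for X and Y (the mean structure is not specified in equation (12)), a single continuous moderator z, and a hypothetical parameter ordering. The resulting function could be passed to any general-purpose numerical optimizer.

```python
import math


def neg_log_lik(params, x, y, z):
    """Negative log-likelihood of a bivariate normal with moderated SDs and correlation.

    params = (mu_x, mu_y, dx1, dx2, dy1, dy2, dr1, dr2), where the "1" terms are
    intercepts and the "2" terms are coefficients of the moderator z.
    """
    mu_x, mu_y, dx1, dx2, dy1, dy2, dr1, dr2 = params
    total = 0.0
    for xi, yi, zi in zip(x, y, z):
        sd_x = math.exp(dx1 + dx2 * zi)            # log link for sigma_x
        sd_y = math.exp(dy1 + dy2 * zi)            # log link for sigma_y
        rho = math.tanh(0.5 * (dr1 + dr2 * zi))    # inverse of log((1+rho)/(1-rho))
        ux, uy = (xi - mu_x) / sd_x, (yi - mu_y) / sd_y
        one_minus_r2 = 1.0 - rho ** 2
        quad = (ux * ux - 2.0 * rho * ux * uy + uy * uy) / one_minus_r2
        total += (math.log(2.0 * math.pi) + math.log(sd_x) + math.log(sd_y)
                  + 0.5 * math.log(one_minus_r2) + 0.5 * quad)
    return total


# Hypothetical toy data, used only to show how the function is called.
x = [0.2, -1.1, 0.7, 1.5, -0.4]
y = [0.5, -0.9, 1.8, 2.2, -0.1]
z = [-1.0, -0.5, 0.0, 0.5, 1.0]

null_fit = neg_log_lik((0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), x, y, z)
alt_fit = neg_log_lik((0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.5, 0.0), x, y, z)
print(null_fit, alt_fit)
```

In this parameterization the EVR statistic for the moderator is dy2 − dx2, so the restricted EVR model corresponds to forcing those two coefficients to be equal.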

To begin, simulations (20,000 runs each) for a single-moderator model took samples for Z from a N(0, 1) population. X and Y were sampled from bivariate normal distributions with δ_r1 = 0, δ_x1 = δ_y1 = {0, 0.5, 1.0}, and δ_r0 = {0, 0.5, 1.0}. Table 6 displays their results. Rejection rates are somewhat too high for δ_r1 and slightly too high for θ₁ until sample sizes exceed 200 or so.

Table 6.

Unmoderated continuous moderator simulations.

N	δ_r0 = 0.0	δ_r0 = 0.5	δ_r0 = 1.0
δ_r1
70	0.0715	0.0712	0.0685
140	0.0619	0.0589	0.0571
280	0.0543	0.0554	0.0545
θ₁
70	0.0610	0.0627	0.0616
140	0.0548	0.0564	0.0556
280	0.0533	0.0536	0.0528

Simulations also were run under the same conditions as Table 6 but with samples from a skew-normal distribution with skew parameter λ = 2. These results are shown in Table 7. There, it can be seen that Type I error rates are inflated by skew almost independently of sample size, much more so for δ_r1 than θ₁. Both are affected by size of the correlation’s moderation effect.

Table 7.

Simulations from the Azzalini (1985) skew-normal distribution with λ = 2.

N	δ_r0 = 0.0	δ_r0 = 0.5	δ_r0 = 1.0
δ_r1
70	0.0724	0.0874	0.1001
140	0.0682	0.0801	0.0925
280	0.0665	0.0767	0.0930
θ₁
70	0.0554	0.0673	0.0679
140	0.0519	0.0689	0.0645
280	0.0514	0.0676	0.0636


To investigate power, simulations were run with δ_r1 = {0, 0.2007, 0.6190, 1.0986, 1.7346} (correlation differences of {0, 0.1, 0.3, 0.5, 0.7} when z = 1) and θ₁ = {0.1116, 0.2027, 0.3466, 0.5493, 0.6931, 0.8047} (variance ratios of {1.25, 1.5, 2, 3, 4, 5} when z = 1). Thus, there were 30 simulations for each of three sample sizes (70, 140, and 280). The results are displayed in Figure 4. Power for θ₁ attains high levels even for moderate sample sizes when the variance ratio is 2 or more. However, power for θ₁ also is higher the more strongly correlations are moderated, whereas power for δ_r1 is unaffected by moderation of the variance ratio. Power for δ_r1 does not become high unless correlations differ by at least 0.3, and the results for a correlation difference of 0.1 are in line with those for categorical moderators (see Table 3).
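The parameter values quoted above follow directly from the two link functions. As a quick check, the sketch below applies the Fisher link log((1 + ρ)/(1 − ρ)) to the correlation differences and takes half the log of each variance ratio, which is the log of the ratio of standard deviations. It assumes that the correlation and the log variance ratio are both zero at z = 0, so that the stated differences and ratios apply at z = 1; the variable names are chosen for the example.

```python
import numpy as np

def fisher_link(rho):
    """Link used for the correlation submodel: log((1 + rho) / (1 - rho))."""
    rho = np.asarray(rho, dtype=float)
    return np.log((1.0 + rho) / (1.0 - rho))

# Correlation differences at z = 1, assuming a correlation of zero at z = 0.
corr_diffs = np.array([0.0, 0.1, 0.3, 0.5, 0.7])
delta_r1 = fisher_link(corr_diffs)

# Variance ratios at z = 1; theta_1 is on the log standard-deviation scale,
# hence half the log of the variance ratio.
variance_ratios = np.array([1.25, 1.5, 2.0, 3.0, 4.0, 5.0])
theta1 = 0.5 * np.log(variance_ratios)

# Crossing the two sets gives the number of power conditions per sample size.
n_conditions = delta_r1.size * theta1.size
```

Rounded to four decimals, `delta_r1` and `theta1` reproduce the values listed in the text, and `n_conditions` equals 30.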

The simulation results were examined for evidence of estimation bias. Both ${\hat{δ}}_{r 1}$ and ${\hat{θ}}_{1}$ were slightly biased upward, most strongly for smaller samples and larger effect-sizes. The maximum average bias for ${\hat{δ}}_{r 1}$ and ${\hat{θ}}_{1}$ was 0.04 and 0.03 respectively. For both estimators, doubling the sample size approximately halved the bias.

Discussion

This paper has introduced a simple test of equal variance ratios (EVR), whose purpose is to determine when moderation of correlations and slopes are not equivalent. The test can be inverted to produce an approximate confidence interval for the ratio comparison of these two kinds of moderator effects. This test also may be extended easily to assessing whether the moderation of standardized and unstandardized regression coefficients are unequal.

Simulation results indicated that when EVR holds, Type I error rates are reasonably accurate but slightly high. Skew inflates Type I error rates somewhat, but not dramatically. When EVR does not hold, moderately large samples and effect-sizes are needed for high power, but HeV, HeEV, and unequal group sizes are not problematic for testing EVR or modeling the moderation of variance ratios. There is evidence that the EVR test has fairly high power, relative to the power to detect moderator effects.

Variance ratios for continuous moderators can be modeled via maximum likelihood methods, although no single model can deal with all forms of variance ratio moderation or HeV. The model presented here uses the log link for the standard deviation submodel and the Fisher link for the correlation submodel, with possibly different predictors in each submodel and, potentially, polynomial terms for the predictors. Bayesian estimation methods also may be used, but that extension is beyond the scope of this paper. When EVR holds and correlations are unmoderated, Type I error rates are somewhat too high for δ_r1 and slightly too high for θ₁ unless sample sizes are over 200 or so. Skew inflates Type I error rates for δ_r1 but only slightly for θ₁. For moderated variance ratios and correlations, maximum likelihood estimates are only slightly upward-biased for both δ_r1 and θ₁, and in the usual fashion this bias decreases with increasing sample size. Moderately large samples and effect-sizes are needed for high power, but apparently no more so than for categorical moderators.

Tests of EVR for categorical moderators can be entirely dealt with using multi-groups SEM, and Mplus and the lavaan package in R are able to incorporate these tests via appropriate model comparisons. It also is possible to fit such models via scripts in computing environments such as SAS and SPSS possessing appropriate inbuilt optimizers. The SEM approach makes it possible to test complex hypotheses regarding the (non)equivalence of moderation of slopes and correlations, and to obtain a clear picture of both kinds of moderator effects. The online supplementary material for this paper includes a four-category moderator example where EVR holds for two pairs of categories but not for all four. In fact, the SEM approach elaborates conventional moderated regression into a combination of models for moderated slopes and moderated correlations. In principle it may be extended to incorporate tests for equality of tolerance across groups, which would enable modeling the moderation of semi-partial correlations.

All told, for categorical moderators the EVR test comes reasonably close to fulfilling the criteria of simplicity, availability, robustness and power. Considerable work remains to be done before the same can be said for continuous moderators. Nevertheless, the EVR test proposed here is highly relevant for both experimental and non-experimental research in mainstream psychology, and would seem to be a worthy addition to the researcher’s toolkit.

Conflict of Interest Statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

I thank the two anonymous reviewers for their informative and helpful criticisms and suggestions regarding this paper. Any remaining errors or lacunae are my responsibility.

References

Alexander R. A., DeShon R. P. (1994). The effect of error variance heterogeneity on the power of tests for regression slope differences. Psychol. Bull. 115, 308–314.
Allison D. B., Heshka S., Pierson R. N. J., Wang J., Heymsfield S. B. (1992). The analysis and identification of homologizer/moderator variables when the moderator is continuous: an illustration with anthropometric data. Am. J. Hum. Biol. 4, 775–782. doi: 10.1002/ajhb.1310040609
Arnold H. J. (1982). Moderator variables: a clarification of conceptual, analytic, and psychometric issues. Organ. Behav. Hum. Perform. 29, 143–174. doi: 10.1016/0030-5073(82)90254-9
Arnold H. J. (1984). Testing moderator variable hypotheses: a reply to Stone and Hollenbeck. Organ. Behav. Hum. Perform. 34, 214–224. doi: 10.1016/0030-5073(84)90004-7
Azzalini A. (1985). A class of distributions which includes the normal ones. Scand. J. Stat. 12, 171–178.
Correa J. C., Iral R., Rojas L. (2006). Estudio de potencias de prueba de homogeneidad de varianza (a study on homogeneity of variance tests). Revista Colombiana de Estadistica 29, 57–76.
DeShon R. P. (2004). Measures are not invariant across groups without error variance homogeneity. Psychol. Sci. 46, 137–149.
DeShon R. P., Alexander R. A. (1996). Alternative procedures for testing regression slope homogeneity when group error variances are unequal. Psychol. Methods 1, 261–277. doi: 10.1037/1082-989X.1.3.261
Dretzke B. J., Levin J. R., Serlin R. C. (1982). Testing for regression homogeneity under variance heterogeneity. Psychol. Bull. 91, 376–383. doi: 10.1037/0033-2909.91.2.376
Grissom R. (2000). Heterogeneity of variance in clinical data. J. Consult. Clin. Psychol. 68, 155–165. doi: 10.1037/0022-006X.68.1.155
Howell D. (2007). Statistical Methods for Psychology, 6th Edn. Belmont, CA: Thomson Wadsworth.
Muthén L., Muthén B. (2010). Mplus User’s Guide, 6th Edn. Los Angeles, CA: Muthén and Muthén.
Preacher K. (2006). Testing complex correlational hypotheses with structural equations models. Struct. Equ. Modeling 13, 520–543. doi: 10.1207/s15328007sem1304_2
Rosseel Y. (2012). lavaan: Latent Variable Analysis. R package version 0.4–13. Available at: http://CRAN.R-project.org/package=lavaan
Schmider E., Ziegler M., Danay E., Beyer L., Bühner M. (2010). Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology (Gott) 6, 147–151.
Sharma S., Durand R., Gur-Arie O. (1981). Identification and analysis of moderator variables. J. Mark. Res. 18, 291–300. doi: 10.2307/3150977
Stone E. F., Hollenbeck J. R. (1984). Some issues associated with the use of moderated regression. Organ. Behav. Hum. Perform. 34, 195–213. doi: 10.1016/0030-5073(84)90003-5
Stone E. F., Hollenbeck J. R. (1989). Clarifying some controversial issues surrounding statistical procedures for detecting moderator variables: empirical evidence and related matters. J. Appl. Psychol. 74, 3–10. doi: 10.1037/0021-9010.74.1.3
Tabachnick B., Fidell L. (2007). Experimental Design Using ANOVA. Belmont, CA: Duxbury.
Vandenberg R. J., Lance C. E. (2000). A review and synthesis of the measurement invariance literature: suggestions, practices, and recommendations for organizational research. Organ. Res. Meth. 3, 4–69. doi: 10.1177/109442810031002
Zedeck S. (1971). Problems with the use of “moderator” variables. Psychol. Bull. 76, 295–310. doi: 10.1037/h0031543

