Abstract
Fit indices are descriptive measures that can help evaluate how well a confirmatory factor analysis (CFA) model fits a researcher’s data. In multigroup models, before between-group comparisons are made, fit indices may be used to evaluate measurement invariance by assessing the degree to which multiple groups’ data are consistent with increasingly constrained nested models. One such fit index is an adaptation of the root mean square error of approximation (RMSEA) called RMSEAD. This index embeds the chi-square and degree-of-freedom differences into a modified RMSEA formula. The present study comprehensively compared RMSEAD to ΔRMSEA, the difference between two RMSEA values associated with a comparison of nested models. The comparison consisted of both derivations as well as a population analysis using one-factor CFA models with features common to those found in practical research. The findings demonstrated that for the same model, RMSEAD will always have increased sensitivity relative to ΔRMSEA with an increasing number of indicator variables. The study also indicated that RMSEAD had increased ability to detect noninvariance relative to ΔRMSEA in one-factor models. For these reasons, when evaluating measurement invariance, RMSEAD is recommended instead of ΔRMSEA.
Keywords: measurement invariance, confirmatory factor analysis, RMSEA, fit index
When researchers are interested in assessing how observed variables are associated with hypothesized latent constructs, they may invoke a confirmatory factor analysis (CFA) model to specify the relations of interest. Before the magnitude of those associations can be investigated, however, it is necessary to evaluate how well the identified CFA model will fit the researcher’s data. In this vein, there have been numerous descriptive measures, known as fit indices, proposed to evaluate the correspondence between model and data. One of the most common is the root mean square error of approximation (RMSEA; Steiger & Lind, 1980), but there are numerous others, including, but not limited to, the comparative fit index (CFI; Bentler, 1990), the standardized root mean squared residual (SRMR; Bentler, 1995), and the Tucker–Lewis index (TLI; Bentler & Bonett, 1980; Tucker & Lewis, 1973). All of these indices evaluate model fit from complementary perspectives, incorporating aspects of parsimony and performance relative to competing models.
CFA models can also be extended to a multigroup (MG) framework, whereby a CFA model of similar, if not identical, structure can be fit simultaneously to data from participants from different populations. Such models may be analyzed for a variety of purposes, including testing differences in factor variances, covariances, and/or means. Before any between-group comparisons are made, however, it is important to determine the degree of measurement invariance across all groups. Indeed, doing so might even be the focus of the investigation, such as evaluating the extent to which an instrument's items measure a given construct comparably across different populations. Whatever the ultimate purpose, measurement invariance is usually assessed using a multistep process in which each subsequent step introduces additional parameter (e.g., loading) equality constraints between groups (Millsap, 2012) and assesses the resulting changes in overall model fit. Following Meredith (1993), the steps in this process typically include evaluating configural invariance (equivalent factor structure across groups), weak/metric invariance (equivalent loading magnitudes across groups), and strong/scalar invariance (equivalent intercept magnitudes across groups); less commonly in practice, strict/residual invariance (equivalent residual variances across groups) may also be assessed. As such, measures of model fit specifically for MG models are required.
Fortunately, versions of fit indices in single-group CFA models exist in the multiple group context as well, where they can be used to evaluate the extent to which the multiple groups’ data are consistent with the increasingly restrictive models associated with typical steps in the measurement invariance assessment process. Perhaps most common are fit indices based upon RMSEA and CFI, and with a specific focus on MG model fit, the respective change-based indices ΔRMSEA and ΔCFI (Cheung & Rensvold, 2002). ΔRMSEA and ΔCFI are, not surprisingly, the differences between two RMSEA values and two CFI values, respectively, arising from the comparison of two nested models, as in the steps of the measurement invariance process. One critical concern with using ΔRMSEA and ΔCFI is that both indices can be insensitive to model misspecification, thus failing to detect noninvariance (Savalei et al., 2023). That is, if the two values being compared are similar, the difference between them could be excessively small and thus potentially mask misspecification introduced in a more restricted model.
An alternative fit index that can be used for evaluating measurement invariance is RMSEAD (Browne & du Toit, 1992; Savalei et al., 2023). As described in detail below, instead of taking the difference between RMSEA values from two models being compared, RMSEAD arises from an adaptation of a single RMSEA formula in which the chi-square and degree-of-freedom differences are embedded within a single index. RMSEAD was initially known as RDR (root deterioration per restriction; Browne & du Toit, 1992), although Browne, one of the index’s original creators, later recommended against its use due to its high sensitivity (MacCallum et al., 2006). Accordingly, the utilization of RMSEAD has been inconsistent in both the applied and methodological literatures (see Savalei et al., 2023). Recently, however, it has been reintroduced and even recommended in place of ΔRMSEA (Savalei et al., 2023) precisely because of its increased sensitivity (relative to ΔRMSEA), especially in assessing measurement invariance, as illustrated using a series of real data invariance testing examples. Finally, while other analogs to RMSEAD exist, such as CFID as an alternative to ΔCFI, Savalei et al. (2023) noted that these lack the increased sensitivity to misspecification offered by RMSEAD. For this reason, the current work will focus solely on RMSEAD.
The reintroduction of RMSEAD into the literature (see Savalei et al., 2023) has primarily included examples that compare the performance of RMSEAD and ΔRMSEA in the assessment of measurement invariance in previously published articles. To build upon these empirical examples and assess the performance of RMSEAD and its relation to the respective change-based index ΔRMSEA more systematically, the present study consists of both derivations and population analyses for one-factor CFA models; these serve as the basis for, and generalize to, multifactor CFA models. In the first part of the study, the derivations allow for the expression of one fit index in terms of the other for any potential MG one-factor CFA model, helping to understand the mechanics of performance differences between RMSEAD and ΔRMSEA as a function of number of indicators, pattern of noninvariance, and groups’ relative sample size. In the second part, population analyses serve to make the derivations more concrete by illustrating specific examples of MG one-factor CFA models that researchers might encounter in practice.
RMSEA for Single Groups and Multiple Groups
ΔRMSEA and RMSEAD are both based upon the RMSEA fit index. Returning to the single-group context, RMSEA estimates the degree of model misfit within a population (Steiger & Lind, 1980). When a model is not a perfect representation of the population, the model test statistic follows a noncentral χ2 distribution (given that other data assumptions are met), where the noncentrality parameter λ captures the degree of model misfit in the population. When an SEM model is estimated with maximum likelihood (ML), the population RMSEA, RMSEApop, is
(1) $\text{RMSEA}_{\text{pop}} = \sqrt{\dfrac{F_{ML}}{df}}$
where df refers to the model degrees of freedom and $F_{ML}$ is the value of the ML discrepancy function when the model is fit to the population data. For covariance structure models specifically,
(2) $F_{ML} = \ln\lvert\widehat{\Sigma}\rvert - \ln\lvert\Sigma\rvert + \operatorname{tr}\!\left(\Sigma\,\widehat{\Sigma}^{-1}\right) - p$
where $\widehat{\Sigma}$ denotes the model-implied variance/covariance matrix, $\Sigma$ denotes the observed variance/covariance matrix, and p corresponds to the number of observed variables. $F_{ML}$ relates directly to the noncentrality of the expected χ2 distribution for fitting a model based on sample size N, such that $(N-1)F_{ML} = \lambda$, and in turn $F_{ML} = \lambda/(N-1)$. Thus, the greater the model misfit, the higher the noncentrality in the population, and hence the higher the value of RMSEApop. For random samples from that population, the sample estimate for RMSEA in single-group models may be computed as
(3) $\widehat{\text{RMSEA}} = \sqrt{\max\!\left(\dfrac{T - df}{(N-1)\,df},\ 0\right)}$
where $T = (N-1)\hat{F}_{ML}$ is the model test statistic for the sample (following a χ2 distribution under standard assumptions), $\hat{F}_{ML}$ is the sample-based estimate of the ML discrepancy function, and $T - df$ is an estimate of the noncentrality parameter λ. When the degrees of freedom exceed the test statistic, RMSEA is set equal to zero.
Of direct relevance here, the formula for RMSEA can be extended to accommodate MG models. In the MG context, RMSEAMG estimates the degree of model misfit across all groups being modeled. At the population level, for G groups the RMSEAMG based on Steiger (1998) may be expressed as
(4) $\text{RMSEA}_{\text{pop,MG}} = \sqrt{G}\,\sqrt{\dfrac{\lambda}{(N-1)\,df}}$
where N corresponds to the total sample size (i.e., for two groups, $N = n_1 + n_2$). This formula may also be adapted to estimate a sample RMSEAMG for MG models with
(5) $\widehat{\text{RMSEA}}_{\text{MG}} = \sqrt{G}\,\sqrt{\max\!\left(\dfrac{T - df}{(N-1)\,df},\ 0\right)}$
As will be described below, RMSEAMG can be adapted to assess measurement invariance, the degree to which measurement properties of an instrument with respect to a latent trait are the same across different populations (Millsap, 2012).
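To make these formulas concrete, consider the following minimal R sketch (our illustration, not code from any cited source), which implements the discrepancy function of Equation 2 and the sample RMSEA of Equations 3 and 5; the test statistics, degrees of freedom, and sample sizes supplied at the end are hypothetical values.

```r
# Minimal sketch of Equations 2, 3, and 5 (illustrative values only)

# Equation 2: ML discrepancy between observed (Sigma) and model-implied
# (Sigma_hat) covariance matrices
f_ml <- function(Sigma, Sigma_hat) {
  p <- ncol(Sigma)
  log(det(Sigma_hat)) - log(det(Sigma)) +
    sum(diag(Sigma %*% solve(Sigma_hat))) - p
}

# Equations 3 and 5: sample RMSEA; G = 1 gives the single-group version,
# G > 1 applies Steiger's (1998) sqrt(G) multigroup adjustment
rmsea_sample <- function(T_stat, df, N, G = 1) {
  sqrt(G) * sqrt(max((T_stat - df) / (df * (N - 1)), 0))
}

rmsea_sample(T_stat = 85.3, df = 35, N = 400)          # single group, hypothetical values
rmsea_sample(T_stat = 120.8, df = 70, N = 750, G = 2)  # two groups, hypothetical values
```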
Measurement Invariance
The multiple-group models of interest in the current study are those used for assessing measurement invariance. As mentioned earlier, the multistep invariance testing process typically involves the introduction of additional constraints at each step of the procedure, with each new set of constraints reflecting a more restrictive degree of parameter equality across groups. Models associated with each stage are then able to be compared. To start, as outlined by Meredith (1993), configural invariance is assessed in which all groups must hold the same pattern of free and fixed/zero factor loadings. Unacceptable fit in this stage precludes further assessment, given the failure to support a common model configuration within which further invariance would be evaluated. Given adequate fit, the next step assesses weak/metric invariance, in which factor loadings must be the same across groups, followed by the strong/scalar invariance step that adds equivalent intercepts across groups. Finally, albeit less common in practice, to assess strict/residual invariance, residual variances for each indicator must also be equal across groups. Below we discuss ways in which these models are compared from step to step to assess the degree of invariance for a given factor model.
χ2 Difference Test
The χ2 difference test (also known as the likelihood ratio test) computes the difference between two nested models’ T statistics, ΔT. Under standard data assumptions, this difference itself follows a χ2 distribution, with degrees of freedom equal to the difference between the nested models’ degrees of freedom. The χ2 difference test assesses whether a more constrained model fits the data statistically significantly worse than a model without the additional constraints. For example, a model with weak/metric invariance may be compared with a model with configural invariance to test whether the constraints imposed by equating the corresponding factor loadings across groups will fit statistically significantly worse than a model with configural invariance only.
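For illustration, the following lavaan sketch (assuming a hypothetical data frame mydata containing indicators x1–x4 and a grouping variable group) fits nested configural and metric models and applies the χ2 difference test to them:

```r
# Sketch: chi-square (likelihood ratio) difference test for nested invariance
# models in lavaan; the data frame and grouping variable are hypothetical
library(lavaan)

model <- 'f =~ x1 + x2 + x3 + x4'

fit_configural <- cfa(model, data = mydata, group = "group")
fit_metric     <- cfa(model, data = mydata, group = "group",
                      group.equal = "loadings")

lavTestLRT(fit_configural, fit_metric)  # reports the chi-square difference, df difference, and p value
```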
The problems of using the χ2 difference test for evaluating measurement invariance have been well noted in the literature (e.g., Cheung & Rensvold, 2002; Counsell et al., 2020; Kelloway, 1995), drawing from both the test’s logic and its sensitivity. First, as researchers proceed through the steps of testing for measurement invariance, they are looking for evidence of whether to select the more constrained model (i.e., selecting the model with more equality constraints over the model with fewer constraints). However, because the χ2 difference test detects whether the more constrained model is of statistically worse fit than the less constrained model, researchers often use a failure to reject the test’s null hypothesis to infer that the more constrained model is actually true in the population. The problem here is one of logic, as one can only state that the more constrained model did not show evidence of significantly degrading fit, but cannot state that there is evidence that the more constrained model fits the data equally well. On the other hand, when sample sizes are large, the problem is one of oversensitivity: even substantively trivial deviations from perfect fit can lead the χ2 difference test to be statistically significant. For these reasons, which mirror those arising when assessing single-group model fit, other indices are often sought to expand the evaluation of MG model fit.
ΔRMSEA
An alternate approach for assessing measurement invariance involves the use of fit index comparisons such as ΔRMSEA. To compute ΔRMSEA, RMSEA is calculated using the sample formula defined above for each of the nested models (e.g., a model with configural invariance is compared with a model with metric invariance),
(6) $\text{RMSEA}_m = \sqrt{G}\,\sqrt{\max\!\left(\dfrac{T_m - df_m}{(N-1)\,df_m},\ 0\right)}, \quad m = 1, 2$

(7) $\Delta\text{RMSEA} = \text{RMSEA}_2 - \text{RMSEA}_1$
where $\text{RMSEA}_1$, with its associated $T_1$ and $df_1$, corresponds to the less restricted model, and $\text{RMSEA}_2$, with its associated $T_2$ and $df_2$, to the more constrained model. And just as there have been cutoffs proposed for evaluating whether an RMSEA value is evidence of good fit in a single-group SEM model (e.g., Hu & Bentler, 1999), so too have there been cutoffs proposed for ΔRMSEA in the context of measurement invariance. Chen (2007) presented recommendations for ΔRMSEA that depend on total sample size, pattern of noninvariance, and group sample size equality. Specifically, .010 was recommended for testing metric invariance with N ≤ 300, with all factor loadings in one group established to be higher than in the other, and unequal group sample sizes. A cutoff of .015 was recommended for evaluating metric invariance with N > 300, with approximately half of the factor loadings established to be higher in one group and the other half higher in the other group, and equal sample sizes. Unfortunately, one of the problems with the use of ΔRMSEA is that a model with high initial degrees of freedom (i.e., reflected as a large denominator in the RMSEA formula) will mask misspecification when it is compared with a more constrained nested model (Savalei et al., 2023). Specifically, RMSEA values from the two models will be quite similar due to the dilution of misfit by a high number of df, causing ΔRMSEA to be overly small. Such large-df situations are indeed quite common, in particular when a latent variable has relatively many indicators.
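Continuing the lavaan sketch above, ΔRMSEA is simply the difference between the two nested fits' (multigroup) RMSEA values:

```r
# Sketch: Delta RMSEA as the difference between the nested fits' RMSEA values,
# to be judged against cutoffs such as Chen's (2007)
unname(fitMeasures(fit_metric, "rmsea") - fitMeasures(fit_configural, "rmsea"))
```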
RMSEAD
An alternative to ΔRMSEA, RMSEAD, was initially presented by Browne and du Toit (1992), who introduced it as the RDR. They suggested that it be obtained by adapting the sample RMSEA formula presented above, replacing the χ2 statistic and degrees of freedom with their between-model differences, that is,
(8) $\text{RMSEA}_D = \sqrt{\max\!\left(\dfrac{\Delta T - \Delta df}{(N-1)\,\Delta df},\ 0\right)}$
where $\Delta df$ is the difference in the degrees of freedom between the nested models and $\Delta T$ is the difference between the test statistics of the nested models. RMSEAD has also been extended to the MG framework, where the sample-based RMSEAD formula adapted by Dudgeon (2004) is
(9) $\text{RMSEA}_{D,\text{MG}} = \sqrt{G}\,\sqrt{\max\!\left(\dfrac{\Delta T - \Delta df}{(N-1)\,\Delta df},\ 0\right)}$
Importantly, RMSEAD (for single and multiple groups) can perform differently from ΔRMSEA in many contexts. For example, because RMSEAD is not calculated as a difference between two (nested) models’ RMSEA values, it will not become excessively small when those models have similar RMSEA values. Furthermore, although large model degrees of freedom can mask sensitivity to misspecification in ΔRMSEA, they do not have the same masking effect on RMSEAD. As noted by Savalei et al. (2023), because RMSEAD is an adaptation of the RMSEA formula, it can be interpreted in RMSEA units. In contrast, because ΔRMSEA is simply a difference between two (nested) models’ RMSEA values, it does not share this interpretational advantage (indeed, it is for this reason that distinct cutoff values have been developed for ΔRMSEA and other change-based fit indices). This feature of RMSEAD also allows for the construction of confidence intervals around the index, for both single-group and MG models.
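A corresponding sketch, again reusing the nested fits from the earlier example, computes RMSEAD according to Equation 9 (this illustrates the formula and is not an official implementation):

```r
# Sketch of Equation 9: RMSEA_D computed from two nested multigroup lavaan fits
rmsea_d <- function(fit_less, fit_more) {
  d_t  <- fitMeasures(fit_more, "chisq") - fitMeasures(fit_less, "chisq")
  d_df <- fitMeasures(fit_more, "df")    - fitMeasures(fit_less, "df")
  N <- lavInspect(fit_less, "ntotal")   # total sample size across groups
  G <- lavInspect(fit_less, "ngroups")
  unname(sqrt(G) * sqrt(max((d_t - d_df) / (d_df * (N - 1)), 0)))
}

rmsea_d(fit_configural, fit_metric)
```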
Relation Between ΔRMSEA and RMSEAD
To illustrate how these two important fit indices relate to one another, consider a scenario in which measurement invariance is being assessed for a one-factor CFA model across G = 2 populations. Assuming configural invariance holds, metric invariance must next be evaluated by comparing the fit of the configural (C) model to a metric (M) model in which the two populations are constrained to have the same factor loadings. For ΔRMSEA, the formula in this case would be
(10) $\Delta\text{RMSEA}_{\text{pop},M\text{-}C} = \text{RMSEA}_{\text{pop},M} - \text{RMSEA}_{\text{pop},C}$
Substituting the original RMSEApop formulas, and recognizing that because we have assumed configural invariance the fit of a configural model in the populations is perfect (i.e., $\lambda_C = 0$), we get
(11) $\Delta\text{RMSEA}_{\text{pop},M\text{-}C} = \sqrt{G}\,\sqrt{\dfrac{\lambda_M}{(N-1)\,df_M}} - \sqrt{G}\,\sqrt{\dfrac{0}{(N-1)\,df_C}} = \sqrt{G}\,\sqrt{\dfrac{\lambda_M}{(N-1)\,df_M}}$
Rearranging the ΔRMSEApop,M-C formula to solve for $\lambda_M$,
(12) $\lambda_M = \dfrac{(N-1)\,df_M\left(\Delta\text{RMSEA}_{\text{pop},M\text{-}C}\right)^2}{G}$
Next, the fit index RMSEAD for establishing metric invariance in the G = 2 population context can be defined as
(13) $\text{RMSEA}_{\text{pop},D,M\text{-}C} = \sqrt{G}\,\sqrt{\dfrac{\Delta\lambda_{M\text{-}C}}{(N-1)\,\Delta df_{M\text{-}C}}}$
where $\Delta\lambda_{M\text{-}C} = \lambda_M - \lambda_C$ and $\Delta df_{M\text{-}C} = df_M - df_C$ are the differences in the noncentrality parameters and degrees of freedom for population models assuming metric and configural invariance, respectively. However, because $\lambda_C = 0$, $\Delta\lambda_{M\text{-}C}$ reduces to $\lambda_M$ and thus RMSEApop,D,M-C can be simplified to
(14) $\text{RMSEA}_{\text{pop},D,M\text{-}C} = \sqrt{G}\,\sqrt{\dfrac{\lambda_M}{(N-1)\,\Delta df_{M\text{-}C}}}$
It now follows that RMSEApop,D,M-C can be expressed in terms of ΔRMSEApop,M-C by substituting the above expression for $\lambda_M$:
(15) $\text{RMSEA}_{\text{pop},D,M\text{-}C} = \sqrt{G}\,\sqrt{\dfrac{(N-1)\,df_M\left(\Delta\text{RMSEA}_{\text{pop},M\text{-}C}\right)^2/\,G}{(N-1)\,\Delta df_{M\text{-}C}}}$

(16) $\text{RMSEA}_{\text{pop},D,M\text{-}C} = \Delta\text{RMSEA}_{\text{pop},M\text{-}C}\,\sqrt{\dfrac{df_M}{\Delta df_{M\text{-}C}}}$
The formula above demonstrates that ΔRMSEApop,M-C (which equals RMSEApop,M) may be converted to RMSEApop,D,M-C simply through the square root of a ratio of degrees of freedom, assuming that the configural model is correct (and thus has a perfect fit in the population). Accordingly, the largest discrepancy between RMSEApop,D,M-C and ΔRMSEApop,M-C occurs when there is a large difference between $\Delta df_{M\text{-}C}$ and $df_M$.
The difference between $df_M$ and $\Delta df_{M\text{-}C}$ can be magnified in numerous situations. For instance, in two-group one-factor models with p indicators, $\Delta df_{M\text{-}C}$ is equal to $p - 1$ (i.e., the number of constraints on non-scale-referent loadings). In contrast, as the reader can easily derive, $df_M$ is $p^2 - 2p - 1$. Thus, with an increasing number of indicators, $df_M$ scales quadratically, while $\Delta df_{M\text{-}C}$ scales only linearly; as such, larger values of p result in larger expected values of RMSEAD relative to ΔRMSEAM-C for the same model. (Note that these principles can also extend to other invariance testing phases as well, such as the comparison between models with metric and scalar invariance.)
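As a brief numeric illustration of Equation 16, the conversion factor $\sqrt{df_M/\Delta df_{M\text{-}C}}$ can be computed directly for the numbers of indicators used in the population analyses that follow:

```r
# Conversion factor from Equation 16 for two-group one-factor models,
# where df_M = p^2 - 2p - 1 and the constraint difference is p - 1
p <- c(4, 8, 12)
round(sqrt((p^2 - 2 * p - 1) / (p - 1)), 2)
#> 1.53 2.59 3.29
```

Under a perfectly fitting configural model, RMSEApop,D,M-C is thus roughly 1.5 to 3.3 times ΔRMSEApop,M-C for p = 4 to 12, matching (up to rounding) the ratios observable in the tables below.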
Although the derivation above can be useful in demonstrating the differences between ΔRMSEA and RMSEAD generally, it may also be valuable to demonstrate actual differences between the two indices by employing examples similar to those found in practical research contexts. For this reason, we use the following population analyses to present several examples, showcasing how expected RMSEAD and ΔRMSEAM-C values are affected when different features of CFA models are manipulated (specifically, number of indicators, factor loading strength, and pattern of noninvariance). For the current focus on metric/weak invariance (i.e., not with intercepts as in scalar/strong invariance), population analysis allows for the specification of population covariance matrices along with group sample sizes, to evaluate some outcome of interest. In this case, we evaluated the values of ΔRMSEApop,M-C and RMSEApop,D,M-C across various patterns of noninvariance in two different populations. A population analysis is appropriate when sampling variability is not a focus of the study (Bandalos & Leite, 2013), when the manipulation of distributions is not required, and/or when researchers are not directly interested in power and Type I error rates.
Method
Table 1 presents all the conditions explored in the population analysis. For the one-factor model examined, the primary features manipulated were number of indicators (p), sample size ratio between groups, and pattern of measurement noninvariance. We chose to keep the context focused on two groups because, in practice, social science studies typically compare one reference group to one focal group (see Putnick & Bornstein, 2016). The one-factor models in our investigation had p = 4, 8, or 12 indicator variables; we started with p = 4 given that it is the minimum number required for a one-factor model to be over-identified (e.g., Bollen, 1989). The ratio of sample sizes in the population analyses was either 1:1 or 2:1. We incorporated a condition in which sample size was unequal across groups because, similar to the simulation study conducted by French and Finch (2006), we aimed to reflect the real-world situation in which fewer data can be collected for a focal group of interest than for a reference group.
Table 1.
Measurement Invariance Conditions for Population Analysis
| p | IC | Group 1 standardized factor loadings | Group 2 standardized factor loadings |
|---|---|---|---|
| 4 | 1 | .5, .5, .5, .5 | .5, .5, .5, **.4** |
| 4 | 1 | .5, .5, .5, .5 | .5, .5, .5, **.2** |
| 4 | 1 | .5, .5, .5, .5 | .5, .5, .5, **.6** |
| 4 | 1 | .5, .5, .5, .5 | .5, .5, .5, **.8** |
| 4 | 2 | .5, .5, .5, .5 | .5, **.4**, **.4**, **.4** |
| 4 | 2 | .5, .5, .5, .5 | .5, **.2**, **.2**, **.2** |
| 4 | 2 | .5, .5, .5, .5 | .5, **.6**, **.6**, **.6** |
| 4 | 2 | .5, .5, .5, .5 | .5, **.8**, **.8**, **.8** |
| 4 | 3 | .5, .5, .5, .5 | .5, **.2**, **.2**, **.8** |
| 4 | 3 | .5, .5, .5, .5 | .5, **.4**, **.4**, **.6** |
| 4 | 3 | .5, .5, .5, .5 | .5, **.2**, **.8**, **.8** |
| 4 | 3 | .5, .5, .5, .5 | .5, **.4**, **.6**, **.6** |
| 8 | 1 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, .5, .5, .5, .5, .5, .5, **.4** |
| 8 | 1 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, .5, .5, .5, .5, .5, .5, **.2** |
| 8 | 1 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, .5, .5, .5, .5, .5, .5, **.6** |
| 8 | 1 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, .5, .5, .5, .5, .5, .5, **.8** |
| 8 | 2 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.4**, **.4**, **.4**, **.4**, **.4**, **.4**, **.4** |
| 8 | 2 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.2**, **.2**, **.2**, **.2**, **.2**, **.2**, **.2** |
| 8 | 2 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.6**, **.6**, **.6**, **.6**, **.6**, **.6**, **.6** |
| 8 | 2 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.8**, **.8**, **.8**, **.8**, **.8**, **.8**, **.8** |
| 8 | 3 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.2**, **.2**, **.2**, **.2**, **.8**, **.8**, **.8** |
| 8 | 3 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.4**, **.4**, **.4**, **.4**, **.6**, **.6**, **.6** |
| 8 | 3 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.2**, **.2**, **.2**, **.8**, **.8**, **.8**, **.8** |
| 8 | 3 | .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.4**, **.4**, **.4**, **.6**, **.6**, **.6**, **.6** |
| 12 | 1 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, **.4** |
| 12 | 1 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, **.2** |
| 12 | 1 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, **.6** |
| 12 | 1 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, **.8** |
| 12 | 2 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.4**, **.4**, **.4**, **.4**, **.4**, **.4**, **.4**, **.4**, **.4**, **.4**, **.4** |
| 12 | 2 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.2**, **.2**, **.2**, **.2**, **.2**, **.2**, **.2**, **.2**, **.2**, **.2**, **.2** |
| 12 | 2 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.6**, **.6**, **.6**, **.6**, **.6**, **.6**, **.6**, **.6**, **.6**, **.6**, **.6** |
| 12 | 2 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.8**, **.8**, **.8**, **.8**, **.8**, **.8**, **.8**, **.8**, **.8**, **.8**, **.8** |
| 12 | 3 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.2**, **.2**, **.2**, **.2**, **.2**, **.2**, **.8**, **.8**, **.8**, **.8**, **.8** |
| 12 | 3 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.4**, **.4**, **.4**, **.4**, **.4**, **.4**, **.6**, **.6**, **.6**, **.6**, **.6** |
| 12 | 3 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.2**, **.2**, **.2**, **.2**, **.2**, **.8**, **.8**, **.8**, **.8**, **.8**, **.8** |
| 12 | 3 | .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5, .5 | .5, **.4**, **.4**, **.4**, **.4**, **.4**, **.6**, **.6**, **.6**, **.6**, **.6**, **.6** |
Note. p = number of observed variables; IC = measurement invariance condition where 1 = only the last loading is noninvariant; 2 = all but the first (scaling indicator) loading are noninvariant and homogeneous; 3 = all but the first (scaling indicator) loading are noninvariant and heterogeneous. Bold type denotes that the factor loading in group 2 is different from the factor loading in group 1.
Finally, we investigated three different patterns of noninvariance by manipulating the size of the loadings in the second group (the focal group), while keeping the loadings in the first group (the reference group) unchanged across all conditions. For each pattern of noninvariance examined, the first loading in both groups (with the factor and all variables in standardized metric) was always of identical magnitude (a loading of 0.50) for scaling purposes. For the first pattern of noninvariance, all loadings except that of the last indicator (which had a loading of 0.20, 0.40, 0.60, or 0.80) were identical across focal and reference groups. For the second pattern of noninvariance, beyond the invariant scaling indicator, the magnitude of the remaining loadings in the focal group differed from the magnitude of the loadings in the reference group, with all nonscaling loadings in the focal group having the same magnitude (namely, 0.20, 0.40, 0.60, or 0.80). Finally, for the third pattern of noninvariance, the magnitude of the nonscaling loadings in the focal group also differed from the magnitude of those in the reference group, taking one of two possible magnitudes (one higher than the loadings in the reference group, with loadings of 0.60 or 0.80, and one lower than the loadings in the reference group, with loadings of 0.40 or 0.20). These three different patterns of noninvariance allowed us to use population analysis to determine both ΔRMSEApop,M-C and RMSEApop,D,M-C by imposing models with configural and metric invariance.
The population analysis was conducted using the lavaan package (Rosseel, 2012). When fitting all of the one-factor models, starting values were set at the true population parameters of the reference group.
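As a minimal sketch of this workflow (consistent with, though not necessarily identical to, the scripts used for the study), the following code constructs population covariance matrices for one Pattern 1 condition, fits the configural and metric models, and computes the two population indices from the minimized discrepancy values:

```r
# Sketch of one population-analysis condition (p = 4, Pattern 1, focal loading .2)
library(lavaan)

# Standardized one-factor population covariance: lambda lambda' with unit diagonal
pop_cov <- function(loadings) {
  S <- tcrossprod(loadings)
  diag(S) <- 1
  dimnames(S) <- rep(list(paste0("x", seq_along(loadings))), 2)
  S
}

covs <- list(pop_cov(c(.5, .5, .5, .5)),   # group 1: reference
             pop_cov(c(.5, .5, .5, .2)))   # group 2: focal
nobs <- c(500, 500)  # arbitrary proxy sizes; irrelevant to the population values

model <- 'f =~ x1 + x2 + x3 + x4'
fit_c <- cfa(model, sample.cov = covs, sample.nobs = nobs,
             sample.cov.rescale = FALSE)
fit_m <- cfa(model, sample.cov = covs, sample.nobs = nobs,
             sample.cov.rescale = FALSE, group.equal = "loadings")

# Population indices from the total ML discrepancy (lavaan's fmin equals F/2)
G    <- 2
F_c  <- 2 * fitMeasures(fit_c, "fmin")  # ~0: configural invariance holds exactly
F_m  <- 2 * fitMeasures(fit_m, "fmin")
d_df <- fitMeasures(fit_m, "df") - fitMeasures(fit_c, "df")

sqrt(G * F_m / fitMeasures(fit_m, "df")) -
  sqrt(G * F_c / fitMeasures(fit_c, "df"))  # Delta RMSEA (population)
sqrt(G * (F_m - F_c) / d_df)                # RMSEA_D (population)
# These should be close to the corresponding Table 2 entries (.056 and .086)
```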
Results
In the sections below, results are first described by the three different patterns of noninvariance examined where group sample sizes are equal. The last section describes the association between ΔRMSEApop,M-C and RMSEApop,D,M-C and relative sample size, discussing the difference in the results when the sample size ratio is 1:1 versus 2:1. Accordingly, Figure 1 and Table 2 present the results for the population analyses when the sample size ratio is 1:1, that is, n1 = n2 (note that proxy sample sizes of n1 = n2 = 500 for the 1:1 scenario and n1 = 500 and n2 = 250 for the 2:1 scenario were chosen in order for the software to run; however, these values were arbitrary and irrelevant to the computation of all population RMSEA-based values).
Figure 1.
Population Analysis Results With Sample Size Ratio of 1:1
Note. Pattern 1 = only the last loading is noninvariant; Pattern 2 = all but the first (scaling indicator) loading are noninvariant and homogeneous; Pattern 3 = all but the first (scaling indicator) loading are noninvariant and heterogeneous. RMSEA = root mean square error of approximation.
Table 2.
Population Analysis Results for 1:1 Sample Size Ratio
Pattern 1: Only the last loading is noninvariant

| p | RMSEApop,D,M-C | ΔRMSEApop,M-C | RMSEApop,D,M-C − ΔRMSEApop,M-C | Last loading |
|---|---|---|---|---|
| 4 | .086 | .056 | .030 | 0.2 |
| 4 | .027 | .018 | .009 | 0.4 |
| 4 | .025 | .016 | .009 | 0.6 |
| 4 | .068 | .044 | .023 | 0.8 |
| 8 | .073 | .028 | .045 | 0.2 |
| 8 | .024 | .009 | .015 | 0.4 |
| 8 | .023 | .009 | .014 | 0.6 |
| 8 | .067 | .026 | .041 | 0.8 |
| 12 | .063 | .019 | .044 | 0.2 |
| 12 | .021 | .006 | .015 | 0.4 |
| 12 | .020 | .006 | .014 | 0.6 |
| 12 | .060 | .018 | .041 | 0.8 |

Pattern 2: All but the first (scaling indicator) loading are noninvariant and homogeneous

| p | RMSEApop,D,M-C | ΔRMSEApop,M-C | RMSEApop,D,M-C − ΔRMSEApop,M-C | Loadings |
|---|---|---|---|---|
| 4 | .049 | .032 | .017 | 0.2 |
| 4 | .025 | .016 | .009 | 0.4 |
| 4 | .025 | .017 | .009 | 0.6 |
| 4 | .067 | .044 | .023 | 0.8 |
| 8 | .054 | .021 | .033 | 0.2 |
| 8 | .024 | .009 | .015 | 0.4 |
| 8 | .022 | .009 | .014 | 0.6 |
| 8 | .056 | .022 | .034 | 0.8 |
| 12 | .053 | .016 | .037 | 0.2 |
| 12 | .022 | .007 | .015 | 0.4 |
| 12 | .019 | .006 | .013 | 0.6 |
| 12 | .048 | .015 | .033 | 0.8 |

Pattern 3: All but the first (scaling indicator) loading are noninvariant and heterogeneous

| p | RMSEApop,D,M-C | ΔRMSEApop,M-C | RMSEApop,D,M-C − ΔRMSEApop,M-C | Combination |
|---|---|---|---|---|
| 4 | .138 | .090 | .048 | 0.2(2), 0.8(1) |
| 4 | .048 | .032 | .017 | 0.4(2), 0.6(1) |
| 4 | .143 | .094 | .049 | 0.2(1), 0.8(2) |
| 4 | .050 | .033 | .017 | 0.4(1), 0.6(2) |
| 8 | .197 | .076 | .121 | 0.2(4), 0.8(3) |
| 8 | .066 | .026 | .041 | 0.4(4), 0.6(3) |
| 8 | .185 | .071 | .113 | 0.2(3), 0.8(4) |
| 8 | .065 | .025 | .040 | 0.4(3), 0.6(4) |
| 12 | .209 | .064 | .146 | 0.2(6), 0.8(5) |
| 12 | .071 | .022 | .050 | 0.4(6), 0.6(5) |
| 12 | .200 | .061 | .139 | 0.2(5), 0.8(6) |
| 12 | .071 | .021 | .049 | 0.4(5), 0.6(6) |
Note. RMSEA = root mean square error of approximation. In the Combination column for Pattern 3, parenthesized values give the number of focal-group loadings at each magnitude (e.g., 0.2(2), 0.8(1) denotes two loadings of 0.2 and one loading of 0.8).
Pattern 1: Only the Last Loading Is Noninvariant
For this pattern of noninvariance, the first p-1 loadings were identical between the reference and focal groups (i.e., 0.50), while the pth loading in the focal group was 0.20, 0.40, 0.60, or 0.80 (vs. the reference group loading of 0.50). In all instances, RMSEApop,D,M-C was larger than ΔRMSEApop,M-C, with the difference being largest when the last loading of the focal group was most noninvariant (i.e., 0.20 or 0.80 as opposed to 0.40 or 0.60). With p = 4 indicators, for example, when the last loading in the focal group was 0.20 the difference between the indices was .030 (with ΔRMSEApop,M-C = .056 and RMSEApop,D,M-C = .086), whereas when the last focal group loading was 0.40, the difference between the two indices was .009 (with ΔRMSEApop,M-C = .018 and RMSEApop,D,M-C = .027).
As the number of indicators p increased, ΔRMSEApop,M-C was observed to decrease in magnitude. RMSEApop,D,M-C would also decrease with an increasing number of indicators, but its decreases were smaller in magnitude than those of ΔRMSEApop,M-C. For instance, with a loading size of 0.80 and p = 4 indicators, ΔRMSEApop,M-C and RMSEApop,D,M-C were .044 and .068, respectively, whereas with p = 8, ΔRMSEApop,M-C dropped to .026 and RMSEApop,D,M-C held fairly steady at .067. Overall, this pattern of larger decreases in magnitude for ΔRMSEApop,M-C relative to RMSEApop,D,M-C led to the difference between the two indices being amplified with increasing number of indicators. This result is also illustrated in the first column of Figure 1. Moving down the column (i.e., increasing the number of indicators) highlights the different patterns between the two indices. Specifically, the line corresponding to RMSEApop,D,M-C maintains a V-shape with an increasing number of indicators. In contrast, the line corresponding to ΔRMSEApop,M-C has a V-shape that is flatter than the one associated with RMSEApop,D,M-C at p = 4, and becomes flatter still with an increasing number of variables. That is, as p increased, ΔRMSEApop,M-C became less sensitive to detecting the single indicator’s noninvariance amid the larger number of invariant loadings, whereas RMSEApop,D,M-C tended to remain more sensitive, with this differentiation generally being most prominent when the last loading of the focal group was more noninvariant (0.20 or 0.80) rather than less so (0.40 or 0.60).
Pattern 2: All But the First (Scaling Indicator) Loading Are Noninvariant and Homogeneous
In this pattern of noninvariance, beyond the invariant scaling indicator (which was again set to 0.50), the remaining loadings in the focal group differed identically from the loadings in the reference group, all being 0.20, 0.40, 0.60, or 0.80. When there was a greater difference between the reference group’s loadings of 0.50 and the loadings of the focal group (i.e., focal loadings of 0.20 or 0.80 rather than 0.40 or 0.60), values of both ΔRMSEApop,M-C and RMSEApop,D,M-C were generally larger, as expected. Furthermore, as in the first pattern of noninvariance, RMSEApop,D,M-C was always larger than ΔRMSEApop,M-C, with the difference being more prominent when there was a greater difference in the size of the loadings between the reference and focal groups. For instance, with p = 4 indicators and a focal loading size of 0.20, ΔRMSEApop,M-C and RMSEApop,D,M-C were .032 and .049, respectively (a difference of .017), whereas when the loading size was 0.40, ΔRMSEApop,M-C dropped to .016 and RMSEApop,D,M-C dropped to .025 (a difference of .009). Furthermore, similar to the first pattern, ΔRMSEApop,M-C decreased as the number of indicators increased, while RMSEApop,D,M-C would often, but not always, decrease with an increasing number of indicators, and those decreases were often smaller in magnitude than those of ΔRMSEApop,M-C. This pattern can be viewed in the second column of Figure 1, where, as in the first pattern, the line associated with RMSEApop,D,M-C maintains a V-shape as the number of indicators increases. In contrast, ΔRMSEApop,M-C has a flatter V-shape than RMSEApop,D,M-C given p = 4, and continues to flatten with an increasing number of indicators. For example, when the focal group’s noninvariant loadings were 0.40, as p increased from 4 to 8, ΔRMSEApop,M-C decreased from .016 to .009, while RMSEApop,D,M-C decreased from .025 to .024. Accordingly, with an increasing number of indicators, the difference between the two indices was generally magnified.
Pattern 3: All But the First (Scaling Indicator) Loading Are Noninvariant and Heterogeneous
For this pattern of noninvariance, like in Pattern 2, the value of the loadings in the focal group (other than the first loading) differed from the value of the loadings in the reference group. However, for this pattern, the nonscaling loadings had one of two possible magnitude combinations: (a) 0.20 and 0.80 or (b) 0.40 and 0.60. In all instances, the highest values for both ΔRMSEApop,M-C and RMSEApop,D,M-C resulted when the focal group’s loadings were a combination of 0.20 and 0.80 (rather than 0.40 and 0.60). This result is anticipated given that 0.20 and 0.80 are farther in magnitude from 0.50 (the loadings of the reference group) than 0.40 and 0.60.
Like the two other patterns of noninvariance examined, values of RMSEApop,D,M-C were always higher than values of ΔRMSEApop,M-C. This difference was again more prominent when the loadings between the reference and focal groups were farther apart (here, in loading combinations of 0.20 and 0.80 rather than combinations of 0.40 and 0.60). For example, with p = 4, and two loadings set to 0.20 and one loading set to 0.80, the difference between ΔRMSEApop,M-C and RMSEApop,D,M-C was .048, with the values of the fit indices being .090 and .138, respectively. Changing the loading configuration such that two loadings were set to 0.40 and one loading was set to 0.60 (with p = 4) resulted in a .017 difference between the indices, with ΔRMSEApop,M-C and RMSEApop,D,M-C equal to .032 and .048, respectively.
Once again, ΔRMSEApop,M-C decreased as the number of indicators increased. In contrast, RMSEApop,D,M-C increased with an increasing number of indicators, leading to the difference between both indices being greatly amplified with an increasing number of indicators. For example, with p = 4 and the combination of standardized factor loadings of two 0.20s and one 0.80, ΔRMSEApop,M-C and RMSEApop,D,M-C were equal to .090 and .138, respectively. When p increased to 8, and the combination of factor loadings was four loadings of 0.20 and three loadings of 0.80, ΔRMSEApop,M-C and RMSEApop,D,M-C were equal to .076 and .197, respectively. The difference between the fit indices was most obvious when the focal group had loading combinations of 0.20 and 0.80, as opposed to combinations of 0.40 and 0.60.
Effect of Sample Size Ratio on Findings
Regardless of the specific sample size selected for a population analysis, when group sample sizes are equal (i.e., 1:1), RMSEA-based fit indices like those studied here will not change. This is because, although each group’s unique noncentrality value can be obtained by multiplying the group’s associated discrepancy function value by its sample size, and the total noncentrality associated with the MG model is obtained by summing the groups’ values, both ΔRMSEApop,M-C and RMSEApop,D,M-C require that the multisample noncentrality be divided by the total sample size. Thus, in the aggregate represented by the MG fit indices, groups’ contributions are weighted not by sample size, but by sample size proportion (see the code sketch following Table 3). As such, all 1:1 sample size ratios will yield the same population fit, all 2:1 ratios will yield the same population fit (although typically different from 1:1 ratios), and so on. Table 3 illustrates the values of both fit indices when the sample size ratio was 2:1. Generally, the findings for the 2:1 sample size ratio showed patterns similar to the results when groups were of equal size. In all conditions, RMSEApop,D,M-C was larger in magnitude than ΔRMSEApop,M-C. For all patterns of noninvariance explored, the difference between the two fit indices was most evident when the loading differences between the reference and focal groups were greater (e.g., 0.20 and 0.80, rather than 0.40 and 0.60). Also, across all patterns of noninvariance, with an increasing number of indicators, values of ΔRMSEApop,M-C decreased; that is, it became less sensitive to detecting the target loadings’ noninvariance in the context of increasing numbers of invariant loadings. Meanwhile, with an increasing number of indicators, values of RMSEApop,D,M-C consistently increased under the third pattern of noninvariance (all but the first loading are noninvariant and heterogeneous), and largely decreased across the other two patterns (with decreases tending to be smaller in magnitude than those of ΔRMSEApop,M-C). That is, with increasing numbers of invariant indicators, RMSEApop,D,M-C did not show the consistent pattern of decreased sensitivity that ΔRMSEApop,M-C so clearly did.
Table 3.
Population Analysis Results for 2:1 Sample Size Ratio
Pattern 1: Only the last loading is noninvariant

| p | RMSEApop,D,M-C | ΔRMSEApop,M-C | RMSEApop,D,M-C − ΔRMSEApop,M-C | Last loading |
|---|---|---|---|---|
| 4 | .079 | .052 | .027 | 0.2 |
| 4 | .025 | .016 | .009 | 0.4 |
| 4 | .024 | .015 | .008 | 0.6 |
| 4 | .066 | .043 | .023 | 0.8 |
| 8 | .069 | .027 | .042 | 0.2 |
| 8 | .023 | .009 | .014 | 0.4 |
| 8 | .022 | .008 | .013 | 0.6 |
| 8 | .063 | .024 | .039 | 0.8 |
| 12 | .060 | .018 | .041 | 0.2 |
| 12 | .020 | .006 | .014 | 0.4 |
| 12 | .019 | .006 | .013 | 0.6 |
| 12 | .056 | .017 | .039 | 0.8 |

Pattern 2: All but the first (scaling indicator) loading are noninvariant and homogeneous

| p | RMSEApop,D,M-C | ΔRMSEApop,M-C | RMSEApop,D,M-C − ΔRMSEApop,M-C | Loadings |
|---|---|---|---|---|
| 4 | .041 | .027 | .014 | 0.2 |
| 4 | .023 | .015 | .008 | 0.4 |
| 4 | .025 | .016 | .009 | 0.6 |
| 4 | .070 | .046 | .024 | 0.8 |
| 8 | .045 | .017 | .028 | 0.2 |
| 8 | .022 | .008 | .013 | 0.4 |
| 8 | .022 | .008 | .013 | 0.6 |
| 8 | .058 | .022 | .036 | 0.8 |
| 12 | .044 | .013 | .031 | 0.2 |
| 12 | .020 | .006 | .014 | 0.4 |
| 12 | .019 | .006 | .013 | 0.6 |
| 12 | .050 | .015 | .034 | 0.8 |

Pattern 3: All but the first (scaling indicator) loading are noninvariant and heterogeneous

| p | RMSEApop,D,M-C | ΔRMSEApop,M-C | RMSEApop,D,M-C − ΔRMSEApop,M-C | Combination |
|---|---|---|---|---|
| 4 | .122 | .080 | .042 | 0.2(2), 0.8(1) |
| 4 | .045 | .029 | .016 | 0.4(2), 0.6(1) |
| 4 | .144 | .094 | .050 | 0.2(1), 0.8(2) |
| 4 | .048 | .031 | .017 | 0.4(1), 0.6(2) |
| 8 | .186 | .072 | .114 | 0.2(4), 0.8(3) |
| 8 | .062 | .024 | .038 | 0.4(4), 0.6(3) |
| 8 | .184 | .071 | .113 | 0.2(3), 0.8(4) |
| 8 | .062 | .024 | .038 | 0.4(3), 0.6(4) |
| 12 | .201 | .061 | .140 | 0.2(6), 0.8(5) |
| 12 | .067 | .020 | .047 | 0.4(6), 0.6(5) |
| 12 | .197 | .060 | .137 | 0.2(5), 0.8(6) |
| 12 | .067 | .020 | .047 | 0.4(5), 0.6(6) |
Note. RMSEA = root mean square error of approximation. In the Combination column for Pattern 3, parenthesized values give the number of focal-group loadings at each magnitude (e.g., 0.2(2), 0.8(1) denotes two loadings of 0.2 and one loading of 0.8).
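To make the proportion-weighting argument above concrete, the following sketch (which reuses the model, covs, and fit_m objects from the code in the Method section) refits the metric model with different but equally proportioned proxy sizes:

```r
# Sketch (continuing the Method section code): any 1:1 proxy sizes yield the
# same minimized total discrepancy, because groups contribute by sample-size
# proportion rather than by absolute sample size
fit_m_alt <- cfa(model, sample.cov = covs, sample.nobs = c(50, 50),
                 sample.cov.rescale = FALSE, group.equal = "loadings")
c(fitMeasures(fit_m, "fmin"), fitMeasures(fit_m_alt, "fmin"))  # identical (up to convergence)
```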
Summary and Conclusion
There are various fit indices that exist to evaluate measurement invariance in MG models. One such example is ΔRMSEA, which takes the difference between RMSEA values corresponding to two nested models. ΔRMSEA has been critiqued for its lack of sensitivity which can occur, for example, when a model with high initial degrees of freedom is compared with a nested, more constrained model (Savalei et al., 2023). An alternative to ΔRMSEA is RMSEAD. RMSEAD inserts the differences between the chi-squares and degrees of freedom associated with both nested models into an adapted RMSEA formula (Browne & du Toit, 1992; Savalei et al., 2023). RMSEAD was recently reintroduced into the literature and recommended in place of ΔRMSEA, due to its increased sensitivity (Savalei et al., 2023).
To evaluate the performance of RMSEAD and compare it with ΔRMSEA, the present study employed both derivations and a population analysis of one-factor models. The derivations illustrated how, given two nested models (one imposing configural invariance and one imposing metric invariance), an increasing number of indicators always leads to greater values of RMSEAD relative to ΔRMSEAM-C for the same model. The population analysis illustrated how RMSEApop,D,M-C had increased sensitivity relative to ΔRMSEApop,M-C in one-factor models with features similar to those found in practical research contexts. Specifically, values of RMSEApop,D,M-C were always greater than ΔRMSEApop,M-C values, with the former being more effective at detecting misspecification, especially when patterns of noninvariance were more extreme (i.e., greater differences between the groups’ loadings) and when there was a greater number of indicators. Accordingly, RMSEApop,D,M-C was better attuned than ΔRMSEApop,M-C to detecting greater misspecifications due to noninvariance.
To aid researchers in gauging whether a given value of ΔRMSEA is consistent with noninvariance, cutoffs such as .010 or .015 have been recommended for ΔRMSEA (the particular choice of cutoff depends on relative group size, total sample size, and pattern of noninvariance; see Chen, 2007). As can be seen from the results of the population analysis, there were many instances where ΔRMSEApop,M-C fell below the .010 threshold. This occurred in tandem with RMSEApop,D,M-C being greater in magnitude and always more sensitive in detecting misspecification across the MG models (see Footnote 1).
The increased sensitivity of RMSEAD relative to ΔRMSEA can be explained by the relative dilution of misfit across degrees of freedom in the nested models being compared. For example, in a scenario where a model has a high number of initial degrees of freedom, the degrees of freedom associated with RMSEAD will still be consistent with how many equality constraints are imposed by invoking a more restricted model. In contrast, ΔRMSEA will have misspecification diluted across a greater number of degrees of freedom, leading to its value being lower than that of RMSEAD.
Both the derivations and the population analysis focused upon metric invariance, keeping equality constraints at the level of the factor loadings without extending constraints to the level of the intercepts. This was done not because imposing scalar invariance is unimportant, but because the relative sensitivity of the two indices can be illustrated without needing to evaluate measurement invariance at the level of the intercepts. In other words, it is possible to demonstrate the increased sensitivity of RMSEAD relative to ΔRMSEA by focusing solely on assessing metric invariance, given that the relative dilution of misspecification across degrees of freedom will lead to greater values of RMSEAD relative to ΔRMSEA regardless of whether loadings, intercepts, or residual variances are being evaluated for invariance. Thus, these findings should be directly applicable not only to scenarios where researchers are interested solely in evaluating metric invariance, but to all steps of invariance evaluation, providing a foundation upon which subsequent stages of invariance testing will only magnify the differences between the indices. Furthermore, although the population analysis used only one-factor models, the same principles can be extended to multifactor models, including longitudinal models in which the factors occur at different time points.
In sum, RMSEAD was reintroduced into the literature due to its increased sensitivity to detect patterns of misspecification across MG models (see Savalei et al., 2023). This study employed both derivations and a population analysis to illustrate how the index is better at distinguishing various kinds of noninvariance than ΔRMSEA in different contexts. Due to its increased sensitivity, RMSEAD is recommended over ΔRMSEA for evaluating measurement invariance.
Footnotes

1. In both the derivation and the population analysis, we have assumed that the fit of the configural model was perfect (i.e., $\lambda_C = 0$). When the configural model is not of perfect fit, there are two scenarios that may result. First, $\lambda_C$ may be small and have little effect on the results. Second, $\lambda_C$ may be large, in which case the assessment of measurement invariance would end at the configural model, making further evaluation unnecessary.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The authors received no financial support for the research, authorship, and/or publication of this article.
ORCID iD: Nataly Beribisky
https://orcid.org/0000-0002-1081-0125
References
- Bandalos D. L., Leite W. (2013). The use of Monte Carlo studies in structural equation modeling research. In Hancock G. R., Mueller R. O. (Eds.), Structural equation modeling: A second course (2nd ed., pp. 625–666). Information Age.
- Bentler P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107(2), 238–246. 10.1037/0033-2909.107.2.238
- Bentler P. M. (1995). EQS structural equations program manual. Multivariate Software.
- Bentler P. M., Bonett D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88(3), 588–606. 10.1037/0033-2909.88.3.588
- Bollen K. A. (1989). Structural equations with latent variables. Wiley.
- Browne M. W., du Toit S. H. (1992). Automated fitting of nonstandard models. Multivariate Behavioral Research, 27(2), 269–300. 10.1207/s15327906mbr2702_13
- Chen F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14(3), 464–504. 10.1080/10705510701301834
- Cheung G. W., Rensvold R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9(2), 233–255. 10.1207/S15328007SEM0902_5
- Counsell A., Cribbie R. A., Flora D. B. (2020). Evaluating equivalence testing methods for measurement invariance. Multivariate Behavioral Research, 55(2), 312–328. 10.1080/00273171.2019.1633617
- Dudgeon P. (2004). A note on extending Steiger’s (1998) multiple sample RMSEA adjustment to other noncentrality parameter-based statistics. Structural Equation Modeling: A Multidisciplinary Journal, 11(3), 305–319. 10.1207/s15328007sem1103_1
- French B. F., Finch W. H. (2006). Confirmatory factor analytic procedures for the determination of measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 13(3), 378–402. 10.1207/s15328007sem1303_3
- Hu L. T., Bentler P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. 10.1080/10705519909540118
- Kelloway E. K. (1995). Structural equation modelling in perspective. Journal of Organizational Behavior, 16(3), 215–224. 10.1002/job.4030160304
- MacCallum R. C., Browne M. W., Cai L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19–35. 10.1037/1082-989X.11.1.19
- Meredith W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543. 10.1007/BF02294825
- Millsap R. E. (2012). Statistical approaches to measurement invariance. Routledge.
- Putnick D. L., Bornstein M. H. (2016). Measurement invariance conventions and reporting: The state of the art and future directions for psychological research. Developmental Review, 41, 71–90. 10.1016/j.dr.2016.06.004
- Rosseel Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. 10.18637/jss.v048.i02
- Savalei V., Brace J. C., Fouladi R. T. (2023). We need to change how we compute RMSEA for nested model comparisons in structural equation modeling. Psychological Methods. Advance online publication. 10.1037/met0000537
- Steiger J. H. (1998). A note on multiple sample extensions of the RMSEA fit index. Structural Equation Modeling, 5, 411–419. 10.1080/10705519809540115
- Steiger J. H., Lind J. C. (1980, May). Statistically-based tests for the number of common factors [Paper presentation]. Annual Meeting of the Psychometric Society, Iowa City, IA, United States.
- Tucker L. R., Lewis C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38(1), 1–10. 10.1007/BF02291170