Educational and Psychological Measurement
2020 May 28;81(1):61–89. doi: 10.1177/0013164420925122

Testing Measurement Invariance Across Unobserved Groups: The Role of Covariates in Factor Mixture Modeling

Yan Wang, Eunsook Kim, John M. Ferron, Robert F. Dedrick, Tony X. Tan, Stephen Stark
PMCID: PMC7797957  PMID: 33456062

Abstract

Factor mixture modeling (FMM) has been increasingly used to investigate unobserved population heterogeneity. This study examined the issue of covariate effects with FMM in the context of measurement invariance testing. Specifically, the impact of excluding and misspecifying covariate effects on measurement invariance testing and class enumeration was investigated via Monte Carlo simulations. Data were generated based on FMM models with (1) a zero covariate effect, (2) a covariate effect on the latent class variable, and (3) covariate effects on both the latent class variable and the factor. For each population model, different analysis models that excluded or misspecified covariate effects were fitted. Results highlighted the importance of including proper covariates in measurement invariance testing and evidenced the utility of a model comparison approach in searching for the correct specification of covariate effects and the level of measurement invariance. This approach was demonstrated using an empirical data set. Implications for methodological and applied research are discussed.

Keywords: factor mixture modeling, measurement invariance, covariate effect, class enumeration, model selection

Introduction

Factor mixture modeling (FMM) has been increasingly used in the health and social sciences in recent years. FMM incorporates both categorical and continuous latent variables and allows for the examination of unobserved population heterogeneity in the measurement model as well as in structural parameters. The categorical latent variable in FMM is often called the latent class variable. Recent applications of FMM include the identification of latent classes based on anxiety sensitivity (Allan et al., 2014), posttraumatic stress disorder symptoms (Frost et al., 2019), and job stressors and resources (Keller et al., 2016), to name a few. For example, Allan et al. (2014) used FMM to identify latent classes of anxiety sensitivity with three factors: cognitive concerns, physical concerns, and social concerns. Three latent classes were identified based on factor scores (high, moderate, and low anxiety sensitivity), where the high anxiety sensitivity class had higher means on all three factors than the other two classes.

Although the interpretation of latent classes in applied research using FMM is often based on factor means, the importance of considering and testing measurement invariance (MI) across classes has been highlighted from two perspectives. First, as with observed groups (e.g., males and females), statistical comparison of factor means across classes is not valid when MI does not hold, because factors measured differently across classes are not meaningfully comparable. Specifically, scalar invariance, which entails the equivalence of both factor loadings and intercepts across classes, is often considered a sufficient prerequisite for valid factor mean comparison. Therefore, MI testing is required whenever the comparison of factor means across classes is of focal interest in applied research (e.g., E. S. Kim et al., 2017; Lubke & Muthén, 2005). Second, the specification of measurement parameters, including factor loadings and intercepts, is intrinsic to the model building process of FMM (Clark et al., 2013). When one works with FMM, decisions have to be made regarding the level of restriction that is suitable in the measurement model, because different levels of restriction have implications for determining the optimal number of classes (Bauer & Curran, 2004). For instance, spurious classes might emerge when factor loadings are falsely constrained to be equal across classes. Therefore, from the perspective of model specification as well as factor mean comparison, MI testing is indispensable to FMM.

This study investigated the performance of MI testing in FMM under a specific scenario, the presence of covariate effects. The inclusion of covariates in FMM and mixture modeling in general has drawn the attention of many researchers. Specifically, given the nature of mixture modeling that classes are unobserved, predictors of the latent class variable are often included to better understand the formation and characterization of classes. For instance, teachers with prior teaching experience were more likely to be in a high-performing class than those without prior teaching experience (Dimitrov et al., 2015). In addition, the covariate effect on the factor can be included in FMM to explain within-class variations in the factor (e.g., students’ gender and urban status explained the variations in their math and science achievements within each class; Lubke & Muthén, 2005).

Despite the usefulness of covariates in mixture modeling, opinions differ about whether covariate(s) should be included in class enumeration. One general recommendation is to exclude covariates from the class enumeration of mixture modeling (E. S. Kim et al., 2017; E. S. Kim & Wang, 2018; M. Kim et al., 2016; Nylund-Gibson & Masyn, 2016; Peugh & Fan, 2014; Stegmann & Grimm, 2018; Tofighi & Enders, 2008). This unconditional approach has been shown to lead to accurate class enumeration and/or class assignment. An additional advantage of this approach is that it prevents the classification from changing when different covariates are included or covariate effects are specified in different ways (Nylund-Gibson & Masyn, 2016; Vermunt, 2010). However, the advantage of including the covariate effect on the latent class variable has been highlighted by other simulation studies. For example, Lubke and Muthén (2007) observed that including the covariate effect improved correct class assignment in FMM. Maij-de Meij et al. (2010), through Monte Carlo simulations with the mixture Rasch model, also recommended including the covariate effect on the latent class variable even when the covariate was only weakly correlated with the latent class variable.

Despite the considerable discussion of covariate inclusion in mixture modeling, little is known about the role of covariates in FMM when the focus is to test MI across classes. That is, it is unclear whether covariates should be included in MI testing and, if so, which covariate effects should be included. To investigate the impact of covariate effects on MI testing in FMM, a Monte Carlo simulation study was conducted in which the identification of the correct level of MI was examined under different specifications of covariate effects: excluding covariates, including a covariate effect on the latent class variable, and including covariate effects on the latent class variable and the factor.

The rest of the article is organized as follows. FMM and MI testing across classes are introduced first, followed by a discussion of different covariate effects and their implications in FMM. Next, the extant literature on the inclusion of covariate effects in FMM is reviewed. After that, we investigate the impact of various specifications of covariate effects on MI testing with simulated data. MI testing across classes in FMM with different specifications of covariate effects is then demonstrated using an empirical data set. The article concludes with recommendations for practitioners and implications for future methodological research.

Factor Mixture Modeling and Measurement Invariance Testing

FMM (see Figure 1) combines latent class analysis and a measurement model (Lubke & Muthén, 2005). The measurement model can be represented as

Figure 1. An example of a factor mixture model (FMM). C is the latent class variable and η is the latent factor measured by six continuous items Y1 to Y6. Error terms associated with η and Y1 to Y6 are left out for simplicity.

\[
\mathbf{Y}_{ik} = \boldsymbol{\tau}_k + \boldsymbol{\Lambda}_k \eta_{ik} + \boldsymbol{\varepsilon}_{ik}, \tag{1}
\]

where $\mathbf{Y}_{ik}$ is a $J \times 1$ vector of item responses for individual i in class k (k = 1, 2, ..., K), and J denotes the number of items. $\mathbf{Y}_{ik}$ is a function of the intercept vector $\boldsymbol{\tau}_k$ (dimension $J \times 1$), the factor loading vector $\boldsymbol{\Lambda}_k$ (dimension $J \times 1$, assuming a single factor), the individual's factor score $\eta_{ik}$, and the residual vector $\boldsymbol{\varepsilon}_{ik}$ (dimension $J \times 1$). The subscript k associated with these parameters indicates that they can vary across latent classes. Homogeneity within each class is assumed, so that the same measurement parameters, such as intercepts and factor loadings, apply to all individuals within the class. Residuals ($\boldsymbol{\varepsilon}_{ik}$) are assumed to be normally distributed within class with a mean of zero and covariance matrix $\boldsymbol{\Theta}_k$. It is also assumed that $\eta_{ik} \sim N(\alpha_k, \Phi_k)$. Thus, the class-specific mean vector and variance–covariance matrix can be expressed as

\[
\boldsymbol{\mu}_k = \boldsymbol{\tau}_k + \boldsymbol{\Lambda}_k \alpha_k, \tag{2}
\]
\[
\boldsymbol{\Sigma}_k = \boldsymbol{\Lambda}_k \Phi_k \boldsymbol{\Lambda}_k' + \boldsymbol{\Theta}_k. \tag{3}
\]

In FMM, MI is tested across latent classes by constructing and comparing models with increasing equality restrictions on measurement parameters. Specifically, in the configural invariance model the same factor structure is fitted across latent classes but factor loadings and intercepts are free to vary across classes, as expressed by Equation 1. In the metric invariance model, factor loadings are constrained equal across latent classes, giving $\mathbf{Y}_{ik} = \boldsymbol{\tau}_k + \boldsymbol{\Lambda} \eta_{ik} + \boldsymbol{\varepsilon}_{ik}$. An additional constraint of intercept equality across latent classes is imposed in the scalar invariance model, $\mathbf{Y}_{ik} = \boldsymbol{\tau} + \boldsymbol{\Lambda} \eta_{ik} + \boldsymbol{\varepsilon}_{ik}$. Note that residual variances ($\boldsymbol{\Theta}_k$) are free to vary across classes in the configural, metric, and scalar invariance models.
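To make these levels of invariance concrete, the following is a minimal Mplus sketch of a two-class scalar invariance FMM (the syntax actually used in this study appears in Supplemental Appendix B; the data file and variable names here are placeholders). Equality labels are used so that the cross-class constraints are explicit rather than left to program defaults.

  TITLE:    Two-class scalar invariance FMM (illustrative sketch);
  DATA:     FILE = fmm.dat;        ! placeholder file name
  VARIABLE: NAMES = y1-y6;
            CLASSES = c(2);
  ANALYSIS: TYPE = MIXTURE;
            ESTIMATOR = MLR;
  MODEL:
    %OVERALL%
    f BY y1-y6;                    ! y1 loading fixed at 1 for scaling
    %c#1%
    f BY y2-y6 (L2-L6);            ! loadings equated across classes
    [y1-y6] (I1-I6);               ! intercepts equated: scalar invariance
    [f*];                          ! Class 1 factor mean estimated
    f; y1-y6;                      ! factor/residual variances class-specific
    %c#2%
    f BY y2-y6 (L2-L6);
    [y1-y6] (I1-I6);
    [f@0];                         ! reference-class factor mean fixed at 0
    f; y1-y6;

Dropping the I1-I6 labels (and fixing the factor mean at zero in both classes for identification) relaxes the model to metric invariance; additionally dropping the L2-L6 labels yields the configural model.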

One issue particular to MI testing in FMM is that, because classes are not defined a priori, the task of determining the optimal number of classes is entangled with MI testing. Two approaches have been adopted in the methodological and substantive literature to address this issue. With a sequential approach, the number of classes is first determined with measurement (loadings, intercepts/thresholds) and structural parameters (factor means, factor variances/covariances) relaxed across classes, and MI is then tested across classes (E. S. Kim et al., 2016; Tay et al., 2011). Alternatively, the number of classes and the level of invariance can be determined simultaneously (Clark et al., 2013; E. S. Kim et al., 2017; Lubke & Muthén, 2005; Lubke & Neale, 2008). With this simultaneous approach, a series of models are compared, including one-class, two-class configural, two-class metric, two-class scalar, three-class configural, three-class metric, three-class scalar, and so on. The simultaneous approach was adopted in this study because it evaluates model fit more comprehensively and does not rely on the convergence of relaxed models as much as the sequential approach does. However, methodological studies are warranted to systematically evaluate the performance of these two approaches to MI testing.

Fitted models can be compared using likelihood-based tests, such as the Lo–Mendell–Rubin test (LMR; Lo et al., 2001), the adjusted LMR (aLMR; Lo et al., 2001), and the bootstrap likelihood ratio test (BLRT; McLachlan & Peel, 2000). These tests compare the fit of models with K and K + 1 classes; a nonsignificant p value (i.e., p > .05) indicates that the model with K classes fits better. Information criteria (ICs) are also commonly used in model comparisons, including the Akaike information criterion (AIC; Akaike, 1974), consistent AIC (cAIC; Bozdogan, 1987), Bayesian information criterion (BIC; Schwarz, 1978), and sample size–adjusted BIC (saBIC; Sclove, 1987). Smaller IC values indicate better model fit. Alternatively, entropy, which indexes the degree of class separation (ranging from 0 to 1), can be used in model selection such that models with higher entropy values are more desirable. Additionally, a k-fold cross-validation procedure for model selection has been proposed (Grimm et al., 2016); interested readers are referred there for details.
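For reference, the ICs named above are simple functions of the maximized log likelihood $\log L$, the number of free parameters $p$, and the sample size $N$ (these are the standard definitions from the works cited):

\[
\begin{aligned}
\mathrm{AIC} &= -2\log L + 2p, \\
\mathrm{BIC} &= -2\log L + p\,\ln N, \\
\mathrm{saBIC} &= -2\log L + p\,\ln\!\left(\frac{N+2}{24}\right), \\
\mathrm{cAIC} &= -2\log L + p\,(\ln N + 1).
\end{aligned}
\]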

Factor Mixture Modeling With Covariate Effects

Different covariate effects can be modeled in FMM as shown in Figure 2, including the covariate effects on the latent class variable and the factor. This section introduces each covariate effect and discusses its rationale and implications.

Figure 2. A generic factor mixture model (FMM) with covariate effects. X denotes the covariate; C is the latent class variable; η is the factor measured by six continuous items Y1 to Y6. Error terms associated with η and Y1 to Y6 are left out for simplicity.

Covariate Effect on Latent Class

Covariate(s) can be included in FMM to explain latent class membership (see Path 1 in Figure 2; e.g., Bernstein et al., 2013; Elhai et al., 2011). The log-odds of belonging to latent class k relative to the reference class r are thus modeled through a multinomial regression with a covariate X:

\[
\ln\!\left[\frac{P(C_i = k \mid X_i)}{P(C_i = r \mid X_i)}\right] = \nu_k + \Gamma_k X_i, \tag{4}
\]

where $\nu_k$ and $\Gamma_k$ denote the class-specific intercept and regression coefficient, respectively. The effect of the covariate on the latent class variable is typically included in applications of FMM and other mixture models (e.g., Allan et al., 2014; Bernstein et al., 2013; Elhai et al., 2011). The exploration of such covariate effects can help researchers better understand the connections between latent class membership and observed characteristics and behaviors of individuals. Particularly in FMM, if latent classes are formed due to measurement noninvariance (MNI) in factor loadings, intercepts, or both, including the covariate effect on the latent class variable can help researchers identify the source of MNI (E. S. Kim & Wang, 2018).
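In Mplus, this effect amounts to a single ON statement in the overall part of the mixture model. A minimal fragment, using the same placeholder variable names as the earlier sketch, is:

    %OVERALL%
    f BY y1-y6;
    c ON x;        ! multinomial regression of the latent class
                   ! variable on covariate x (Equation 4)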

Covariate Effect on Factor

Path 2 ($\Gamma_k^{\eta}$) in Figure 2 indicates the impact of the covariate on the factor scores within each latent class. The factor scores can thus be expressed as

\[
\eta_{ik} = A_k + \Gamma_k^{\eta} X_{ik} + \zeta_{ik}. \tag{5}
\]

Assuming a single factor, $A_k$ refers to the intercept of the factor scores for latent class k, the class to which individual i belongs. Alternatively, $A_k$ can be considered the factor mean for class k when there is no covariate X. $\Gamma_k^{\eta}$ is the effect of the covariate ($X_{ik}$) on the factor. The superscript $\eta$ distinguishes this covariate effect on the factor ($\Gamma_k^{\eta}$) from the covariate effect on the latent class variable ($\Gamma_k$ in Equation 4). Note that the subscript k indicates that the covariate effect $\Gamma_k^{\eta}$ can be class-specific, meaning that the covariate impacts the factor differently across classes. $\zeta_{ik}$ is the residual factor score, which varies across individuals and classes.
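In Mplus terms, a class-invariant effect on the factor is specified once in the overall model, whereas restating the regression within the class-specific sections frees $\Gamma_k^{\eta}$ to differ across classes. A sketch under the same placeholder names:

    %OVERALL%
    f BY y1-y6;
    c ON x;        ! Path 1: covariate effect on the latent class variable
    f ON x;        ! Path 2: class-invariant effect on the factor
    %c#1%
    f ON x;        ! restating f ON x within each class makes the
    %c#2%          ! effect class-specific (Equation 5)
    f ON x;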

Lubke and Muthén (2005) demonstrated the inclusion of covariates to explain within-class variations in the factor scores. In their demonstration, both the latent class variable and the factors (i.e., math and science ability) were regressed on two covariates, gender and urban status. They specifically illustrated the class-specific covariate effects on the factors. For instance, in the five-class solution, the two covariates explained 18% of the variance in the math factor scores for Class 4, but only accounted for 1% to 5% of the variance in the factor scores in the other classes. Though in a different context (i.e., growth mixture modeling, or GMM), Bauer (2007) also discussed the decomposition of individual differences in growth factors (i.e., intercept and slope) into a between-class component (differences in growth factors across classes) and a within-class component (variations in growth factors within classes); both components should be considered when covariates are added to the model. To summarize, the within-class variations in the factor are of substantive significance, and including the covariate effect on the factor helps explain such variations.

It is worth pointing out that, in addition to covariate effects on the latent class variable and the factor, covariates can have a direct impact on items (see Paths 3 and 4 in Figure 2), which indicates MNI with regard to the covariates within classes. This type of covariate effect has been discussed in the literature, and interested readers are referred to De Ayala et al. (2002), Tay et al. (2011), Lubke and Muthén (2005), and Lee and Beretvas (2014).

Inclusion of Covariates in Factor Mixture Modeling

Inclusion of covariates has been discussed in various types of mixture modeling, such as LCA (Nylund-Gibson & Masyn, 2016), regression mixture modeling (M. Kim et al., 2016), FMM (Lubke & Muthén, 2005, 2007), GMM (e.g., Asparouhov & Muthén, 2014; Stegmann & Grimm, 2018; Tofighi & Enders, 2008), and mixture item response theory models (Lee & Beretvas, 2014; Maij-de Meij et al., 2010; Tay et al., 2011). This section briefly reviews results of previous studies regarding the inclusion of covariates in FMM, as well as GMM and mixture item response theory models where there is a measurement model comparable to that in FMM.

Simulation studies have examined the impact of including the covariate effect on the latent class variable versus excluding the covariate in FMM and GMM. For example, Lubke and Muthén (2007) observed the benefit of including the covariate effect on the latent class variable in FMM. That is, when the covariate had a medium to large effect on the latent class variable, including the covariate effect improved the correct class assignment and the coverage of factor mean differences even with small class separation. Maij-de Meij et al. (2010) included the covariate effect on the latent class variable in the mixture Rasch model to detect differential item functioning (DIF). They observed that when the covariate was uncorrelated with the latent class variable, excluding it from the mixture Rasch model resulted in more accurate DIF detection. However, DIF detection was improved with the inclusion of covariate effect even when the covariate correlated as low as .2 with the latent class variable. Conversely, an unconditional FMM without any covariate effects was found to perform well in identifying the correct level of invariance, though in the multilevel context with between-level classes (E. S. Kim et al., 2017; E. S. Kim et al., 2016; E. S. Kim & Wang, 2018). Stegmann and Grimm (2018) found that including covariate effect on the latent class variable improved the correct class assignment in GMM only when the covariate effects were strong and class separation was large. When the covariate effects became weak or classes were less separated, the unconditional GMM performed better than the GMM with covariates in terms of correct class assignment.

In addition to the covariate effect on the latent class variable, a few studies have considered the covariate effect on the factor(s). For instance, as mentioned earlier, Lubke and Muthén (2005) demonstrated including the covariate effects on two factors in FMM. B. O. Muthén (2004) suggested that the covariate effects on the latent class variable and the growth factors (i.e., intercept and slope) should be included in GMM to identify the correct number of classes and estimate class membership. Tofighi and Enders (2008) compared the class enumeration of the unconditional GMM and the GMM with covariate effects on the latent class variable as well as the growth factors, when the data generation model was the GMM with those covariate effects. Results showed that class enumeration was worse for the GMM with covariate effects than the unconditional GMM, especially when sample size was less than 1,000. However, Hu et al. (2017) observed that the GMM with covariate effects outperformed the unconditional GMM when sample size was small (i.e., 400) and class separation was small, although the correct class enumeration rates were relatively low across the models.

Purposes of the Study

Although the issue of covariate inclusion has been studied in prior research, no simulation study has been specifically designed to evaluate how the inclusion of covariate effects impacts MI testing across classes in FMM. To address this gap in the literature, a Monte Carlo simulation study was conducted to investigate how the detection of the correct level of MI would be affected by different specifications of covariate effects, such as excluding covariate effects, including covariate effects on the latent class variable, and including covariate effects on the latent class variable as well as the factor.

Method

Data Generation

Data were generated based on FMM in which the measurement model consisted of a single factor and six continuous items. Item scores were normally distributed with means of zero and variances of one. The residual variance for each item was specified in data generation, and the values depended on the factor loadings used; that is, for each item the sum of the squared loading and the residual variance equaled one. The number of latent classes was fixed at two. Classes were separated based on the factor mean difference for MI conditions. For MNI (or DIF) conditions, either factor loadings or intercepts differed across classes, as detailed in the following section. Factor means were simulated to be 0 and .5 (corresponding to an effect size of .5) for Classes 1 and 2, respectively. Covariate effects were generated based on the FMM population model, which will be introduced shortly. A continuous and normally distributed covariate was simulated with a mean of zero and a variance of one. Data were generated and analyzed using Mplus 7.3 (L. K. Muthén & Muthén, 1998-2014). Two hundred replications were simulated for each condition. The default maximum likelihood estimation with robust standard errors (MLR) was employed. Example data-generation code is provided in Supplemental Appendix A (available online).
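The generation code used in this study is provided in Supplemental Appendix A; purely as an illustration of the setup just described, an Mplus MONTECARLO input for one condition (a covariate effect on class of 1, small intercept DIF on one item, balanced classes; the loading and residual values shown are hypothetical but fall within the stated ranges) might look like the following.

  MONTECARLO:
    NAMES = y1-y6 x;
    NOBSERVATIONS = 500;
    NREPS = 200;
    SEED = 2020;                     ! arbitrary seed
    GENCLASSES = c(2);
    CLASSES = c(2);
  ANALYSIS:
    TYPE = MIXTURE;
    ESTIMATOR = MLR;
  MODEL POPULATION:
    %OVERALL%
    [x@0]; x@1;                      ! covariate ~ N(0, 1)
    f BY y1*.8 y2*.7 y3*.7 y4*.6 y5*.6 y6*.5;
    y1*.36 y2*.51 y3*.51 y4*.64 y5*.64 y6*.75;  ! residual = 1 - loading^2
    c#1 ON x*1;                      ! covariate effect on class
    [c#1*0];                         ! roughly balanced classes
    %c#1%
    [f*0]; f*1;
    [y1-y6*0];                       ! no DIF in Class 1
    %c#2%
    [f*.5]; f*1;                     ! factor mean difference of .5
    [y1-y5*0]; [y6*.4];              ! small intercept DIF on item 6
  MODEL:                             ! correctly specified analysis model:
    %OVERALL%                        ! two-class metric, covariate on class
    f BY y1-y6;
    c ON x;
    %c#1%
    f BY y2-y6 (L2-L6);              ! loadings equated across classes
    [y1-y6]; [f@0];                  ! class-specific intercepts; factor
    f; y1-y6;                        ! means fixed for identification
    %c#2%
    f BY y2-y6 (L2-L6);
    [y1-y6]; [f@0];
    f; y1-y6;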

Simulation Factors

Manipulated factors included three population models, each representing one type of covariate effect; location of DIF (factor loadings and intercepts); magnitude of DIF (zero, small, medium, and large); number of DIF items (1 and 2); strength of covariate effects (1 and 2 for the effect on the latent class variable, .4 and .8 for the effect on the factor); sample size (500 and 2,000); and latent class proportions (balanced 50%-50%, and unbalanced 30%-70%).

Population Models

Three population models were considered: (a) FMM with a zero covariate effect on the latent class variable (see Note 1), (b) FMM with only the covariate effect on the latent class variable, and (c) FMM with covariate effects on both the latent class variable and the factor.

Location of DIF

DIF was generated in factor loadings or intercepts in order to evaluate the performance of FMM in identifying the correct level of MI under each location of DIF.

Magnitude of DIF

For MI conditions, both classes had zero intercepts for all items, and factor loadings ranged from .5 to .8. In the majority of simulation studies on DIF, the magnitude of intercept DIF has ranged from 0.3 to 1.5 (e.g., Jackman, 2012; E. S. Kim et al., 2017; Maij-de Meij et al., 2010). Therefore, three levels of intercept DIF magnitude were selected: 0.4, 0.8, and 1.2, representing small, medium, and large DIF, respectively. That is, across population models, all item intercepts in Class 1 were fixed at zero, whereas intercepts in Class 2 were 0.4, 0.8, or 1.2 for the DIF item(s) and zero for the invariant items. Given that factor loadings could plausibly range only from 0 to 1, the magnitude of loading DIF varied at just two levels, small and large. That is, loadings for DIF items in Class 2 were .2 and .4 lower than those in Class 1 for small and large DIF, respectively.

Number of DIF Items

For DIF conditions, one or two items out of six had DIF across classes, which corresponds to about 17% and 33% DIF contamination. These two percentages are aligned with both applied and methodological studies on DIF detection or MI testing (e.g., Cho & Cohen, 2010; Davidov et al., 2012; Jak et al., 2013; E. S. Kim et al., 2017).

Strength of Covariate Effects

For population models that included covariate effects, the covariate effect on the latent class variable varied at two levels, 1 and 2, which is consistent with previous simulation studies on mixture modeling with covariates (Lubke & Muthén, 2007; Maij-de Meij et al., 2010; Nylund-Gibson & Masyn, 2016). These levels of effects correspond to odds ratios of 2.72 (i.e., e1) and 7.39 (i.e., e2) for membership in Class 1 compared with Class 2 for 1 unit of increase in covariate X. The covariate effect on the factor varied at .4 and .8, representing that 16% and 64% of the variances in the factor could be explained by the covariate. The choice of 16% of the variance explained by the covariate is consistent with other simulation studies that had covariate effects on factors (Hu et al., 2017; Lubke & Muthén, 2005; Tofighi & Enders, 2008). Conditions with 64% of the variance explained by the covariate were selected purposefully to mimic a strong covariate effect. Note that all covariate effects were class-invariant.

Sample Sizes

Sample sizes of 500 and 2,000 were chosen to represent the range of sample sizes in applied studies using FMM (e.g., Babusa et al., 2015, had an N of 304; Cassady & Finch, 2015, had an N of 619; Asmundson et al., 2012, used an N of 1,768). Although there were extremely large sample sizes in applied studies (Subramaniam et al., 2014, utilized an N of 6,616; Dimitrov et al., 2015, had an N of 15,962), they are relatively rare. In addition, 500 and 2,000 are consistent with the sample sizes in simulation studies on mixture modeling (Hu et al., 2017; Lee & Beretvas, 2014; Tofighi & Enders, 2008), which might facilitate the interpretation of study findings in relation to previous studies.

Latent Class Proportions

Balanced (50%-50%) and unbalanced (30%-70%) class proportions were considered, because both are common in applied studies using FMM (e.g., Cassady & Finch, 2015; Dimitrov et al., 2015). Although more unbalanced proportions are possible (e.g., 9%-91% in Allan et al., 2014; 18%-82% in Asmundson et al., 2012), we chose 30%-70% to ensure sufficient sample size in each latent class across sample size conditions. That is, when combining sample sizes and latent class proportions, the numbers of individuals in Classes 1 and 2 were 250 and 250, or 1,000 and 1,000, for balanced proportions; and 150 and 350, or 600 and 1,400, for unbalanced proportions.

It should be noted that simulation factors are not fully crossed in this study, given that population models involved different covariate effects and thus different simulation factors. In addition, only selected conditions were included for loading DIF based on preliminary analysis to make the simulation study manageable. Table 1 summarizes simulation factors and the number of conditions by population model and location of DIF. There were a total of 208 simulation conditions.

Table 1.

Simulation Factors by Population Model.

Population model 1: zero covariate effect
  Manipulated factors: number of DIF items (1, 2); level of DIF magnitude (small, medium, large); sample size (500, 2,000); latent class proportions (50-50, 30-70)
  Conditions: MI, 2 × 2 = 4; intercept DIF, 2 × 3 × 2 × 2 = 24; loading DIF(a), 4

Population model 2: covariate effect on class
  Manipulated factors: covariate effect on class (1, 2); number of DIF items (1, 2); level of DIF magnitude (small, medium, large); sample size (500, 2,000); latent class proportions (50-50, 30-70)
  Conditions: MI, 2 × 2 × 2 = 8; intercept DIF, 2 × 2 × 3 × 2 × 2 = 48; loading DIF(a), 4

Population model 3: covariate effects on class and factor
  Manipulated factors: covariate effect on class (1, 2); covariate effect on factor (.4, .8); number of DIF items (1, 2); level of DIF magnitude (small, medium, large); sample size (500, 2,000); latent class proportions (50-50, 30-70)
  Conditions: MI, 2 × 2 × 2 × 2 = 16; intercept DIF, 2 × 2 × 2 × 3 × 2 × 2 = 96; loading DIF(a), 4

Total conditions: MI = 28; intercept DIF = 168; loading DIF = 12.

Note. MI = measurement invariance. The number of DIF items and DIF magnitude are not applicable to MI conditions. The three population models were depicted as diagrams in the original table.

(a) The four loading DIF conditions per population model were 1 DIF item with small DIF magnitude and 2 DIF items with large DIF magnitude, each at sample sizes 500 and 2,000, all with equal class proportions, a covariate effect on class of 1 if present, and a covariate effect on factor of .4 if present.

Fitted Models and Simulation Outcomes

For data generated based on each population model, three specifications of covariate effects were considered. Figure 3 lists these models: (a) FMM with no covariate effect, (b) FMM with only the covariate effect on the latent class variable, and (c) FMM with covariate effects on the latent class variable and the factor. For each specification of covariate effects, models with different numbers of classes and different levels of MI were constructed, including one-class, two-class configural, two-class metric, two-class scalar, three-class configural, three-class metric, and three-class scalar (see Supplemental Appendix B, available online, for example Mplus code used for data analysis). Note that the one-class model was excluded for analysis models that had covariate effect(s), as the covariate effect on the latent class variable requires at least two classes. All fitted models were compared to search for a best-fitting model for each condition. Note that we also compared across different specifications of covariate effects to mimic the procedure one might take in applied practice, where not only the level of MI and the number of classes are unknown but also the type of covariate effects present in the FMM.

Figure 3. Analysis models: (a) an unconditional model excluding the covariate; (b) the covariate is included in the model but only its effect on the latent class variable is modeled; and (c) both covariate effects, on the latent class variable and the factor, are included. Error terms associated with η and Y1 to Y6 are left out for simplicity.

Although it might be ideal to compare model fit using a variety of indices and procedures, only ICs (i.e., AIC, BIC, and saBIC) were used for model comparisons in this study. This is because the adequate performance of ICs in selecting the correct model has been documented in simulation studies on mixture modeling (e.g., Lubke & Neale, 2008; Nylund et al., 2007; Tein et al., 2013). Although AIC has not been strongly recommended across simulation studies, we decided to include it given its better performance than BIC and saBIC when class separation or sample size is small, as noted in some studies (E. S. Kim et al., 2016; Lukočiene et al., 2010; Lukočiene & Vermunt, 2010; Wang et al., 2020). Entropy was not considered given its unreliable performance as documented in the literature (e.g., E. S. Kim et al., 2016; Tein et al., 2013). The BLRT was excluded as well because its long execution time would have prevented completion of the simulations within a reasonable time frame (Nylund et al., 2007). LMR and aLMR were not considered because simulation studies have shown that they do not outperform BIC or saBIC (e.g., Henson et al., 2007; Nylund et al., 2007; Tein et al., 2013).

Simulation outcomes included the class enumeration rate, that is, the number of replications that supported each analysis model over the total number of replications. In particular, we focused on the correct class enumeration rate: the proportion of replications that selected the correct model (i.e., the correct level of invariance, number of classes, and specification of covariate effects). Another simulation outcome was the MI detection rate, the proportion of replications in which the level of MI was correctly detected (i.e., two-class scalar under MI, two-class configural under loading DIF, and two-class metric under intercept DIF) regardless of the number of classes or the specification of covariate effects. However, only class enumeration rates are reported, given that MI detection rates were almost identical to correct class enumeration rates across conditions, indicating that when the level of MI was correctly detected, the correct number of classes and the correct specification of covariate effects were identified as well.

To examine the impact of simulation factors on correct class enumeration rates, a between-subjects analysis of variance (ANOVA) was conducted in which the correct class enumeration rate was the dependent variable and the simulation factors and their first-order interactions were the independent variables. Class enumeration rates are reported for factors and interaction terms with η² values greater than .0588, the cutoff for a medium effect size (Cohen, 1988). Nonconvergence and inadmissible solutions (e.g., negative residual variances) were first checked and reported; only converged models with proper solutions were compared for model selection.

Results

Nonconvergence and Inadmissible Solutions Check

Convergence was examined for all fitted models under each simulation condition for each population model. Overall convergence rates were high, ranging from .73 to 1.00 for the population model with a zero covariate effect, .77 to 1.00 for the population model with the covariate effect on class, and .73 to 1.00 for the population model with covariate effects on both class and factor. Across population models, it was observed that (a) convergence rates were slightly lower for the analysis model that did not include any covariate effects and (b) the three-class models, especially the three-class configural model, had lower convergence rates than other models. Overall, the nonconvergence observed for the three-class models was not of major concern, because those models were misspecified and nonconvergence was expected. The rates of inadmissible solutions were near zero across simulation conditions.

Class Enumeration Rates

Overall, AIC did not perform well across conditions, as it tended to select a more complex model over the correct model; that is, it could support a larger number of classes and/or a more relaxed model than the correct one. For example, the three-class configural model was supported instead of the correct two-class metric model when intercept DIF was simulated. Therefore, results based on AIC are not reported but are available on request; only BIC and saBIC are reported in this study. In addition, note that ANOVA was conducted for intercept DIF conditions to identify significant factors that affected correct class enumeration rates for each population model, but not for other conditions, for two reasons. First, correct class enumeration rates were overall low for MI conditions. Second, only four loading DIF conditions were included for each population model, so the impact of simulation factors and their interactions could be observed in a straightforward manner.

MI Conditions

For the population model with a zero covariate effect, the one-class model without any covariate effect was almost always selected by BIC across MI conditions. This unconditional one-class model was also supported, incorrectly, by saBIC when the sample size was 2,000. When the sample size was 500, no analysis model was dominant in class enumeration. For the population model with the covariate effect on class, the unconditional one-class model was incorrectly selected by BIC, whereas saBIC was able to identify the correct model when the sample size was 2,000 and the covariate effect on class was strong (i.e., 2), although the correct class enumeration rates were not high (e.g., .32 and .25 for balanced and unbalanced conditions, respectively). For the population model with covariate effects on class and factor, correct class enumeration rates of BIC were high except when the sample size was 500 and the covariate effect on the factor was .4 (the unconditional one-class model was selected instead). Under this population model, the performance of saBIC was not as good as that of BIC, with relatively lower correct class enumeration rates across conditions.

Intercept DIF Conditions

Tables 2 to 4 present class enumeration rates by significant factors (see Note 2) for BIC or saBIC (i.e., η² values greater than .0588) under each population model. Models with class enumeration rates less than .20 for both BIC and saBIC are not shown in the tables due to space limits; complete class enumeration results including all models and all simulation factors are available on request. Overall, correct class enumeration rates depended on the population model, class separation (i.e., the number of DIF items and DIF magnitude), and sample size. That is, correct class enumeration rates were overall lower for the population model with a zero covariate effect than for the population models with substantial covariate effect(s). For example, as shown in Table 2, the one-class model was usually selected rather than the true two-class metric invariance model when the population model had a zero covariate effect. Larger class separation and larger sample size were associated with higher correct class enumeration rates. For instance, correct class enumeration rates of BIC reached .98 for two DIF items with medium DIF magnitude when the sample size was 2,000 and the population model had the covariate effect on class (see Table 3). The saBIC performed better in identifying the correct model when class separation and/or sample size were smaller. When BIC or saBIC failed to select the correct model, they tended to select the unconditional one-class model.

Table 2.

Class Enumeration Rates for Intercept DIF Conditions Under the Population Model With a Zero Covariate Effect on Class.

(The population model and the analysis models were identified by diagrams in the original table; the printed column labels are retained below. The two columns labeled two-class metric differ in a feature the original conveyed only graphically, namely their specification of covariate effects; see Figure 3.)

                                          One-class     Two-class      Two-class
                                                        metric         metric
Sample size  DIF items  DIF size  Prop.   BIC    saBIC  BIC    saBIC   BIC    saBIC
500          1          0.4       B       1.00   .26    .00    .05     .00    .02
500          1          0.4       U        .98   .21    .00    .09     .00    .01
500          1          0.8       B       1.00   .27    .00    .05     .00    .03
500          1          0.8       U       1.00   .24    .00    .05     .00    .02
500          1          1.2       B        .99   .25    .00    .07     .00    .02
500          1          1.2       U        .99   .26    .00    .05     .00    .01
500          2          0.4       B        .99   .25    .00    .08     .00    .01
500          2          0.4       U        .99   .22    .00    .06     .00    .03
500          2          0.8       B        .98   .17    .00    .13     .00    .04
500          2          0.8       U       1.00   .19    .00    .10     .00    .03
500          2          1.2       B        .59   .00    .39    .60     .00    .03
500          2          1.2       U        .70   .00    .28    .60     .01    .04
2,000        1          0.4       B       1.00   .84    .00    .01     .00    .00
2,000        1          0.4       U       1.00   .82    .00    .01     .00    .00
2,000        1          0.8       B       1.00   .81    .00    .02     .00    .01
2,000        1          0.8       U       1.00   .80    .00    .02     .00    .00
2,000        1          1.2       B       1.00   .77    .00    .02     .00    .01
2,000        1          1.2       U       1.00   .83    .00    .01     .00    .00
2,000        2          0.4       B       1.00   .77    .00    .01     .00    .00
2,000        2          0.4       U       1.00   .82    .00    .01     .00    .00
2,000        2          0.8       B        .89   .08    .12    .86     .00    .01
2,000        2          0.8       U        .97   .13    .03    .78     .00    .02
2,000        2          1.2       B        .00   .00    .00    .00    1.00   1.00
2,000        2          1.2       U        .00   .00    .99    .93     .00    .01

Note. Correct class enumeration rates appeared in boldface in the original. Models with class enumeration rates less than .20 for both BIC and saBIC are not shown due to space limits. B = balanced proportions; U = unbalanced proportions; BIC = Bayesian information criterion; saBIC = sample size–adjusted BIC.

Table 4.

Class Enumeration Rates for Intercept DIF Conditions Under the Population Model With Covariate Effects on Class and Factor.

(The population model and the analysis models were identified by diagrams in the original table; the printed column labels are retained below.)

                                  One-class     Two-class      Two-class
                                                metric         scalar
Sample size  DIF items  DIF size  BIC    saBIC  BIC    saBIC   BIC    saBIC
500          1          0.4       .43    .00    .02    .23     .52    .31
500          1          0.8       .41    .00    .18    .55     .40    .12
500          1          1.2       .23    .00    .62    .74     .13    .02
500          2          0.4       .47    .01    .05    .38     .46    .17
500          2          0.8       .20    .00    .74    .74     .01    .00
500          2          1.2       .00    .00    .99    .82     .00    .00
2,000        1          0.4       .00    .00    .23    .66     .75    .30
2,000        1          0.8       .00    .00    .97    .99     .03    .00
2,000        1          1.2       .00    .00   1.00    .99     .00    .00
2,000        2          0.4       .00    .00    .85    .96     .13    .03
2,000        2          0.8       .00    .00   1.00    .99     .00    .00
2,000        2          1.2       .00    .00   1.00    .99     .00    .00

Note. Correct class enumeration rates appeared in boldface in the original. Models with class enumeration rates less than .20 for both BIC and saBIC are not shown due to space limits. BIC = Bayesian information criterion; saBIC = sample size–adjusted BIC.

Table 3.

Class Enumeration Rates for Intercept DIF Conditions Under the Population Model With Covariate Effect on Class.

(The population model and the analysis models were identified by diagrams in the original table; the printed column labels are retained below.)

                                               One-class     Two-class
                                                             metric
Sample size  DIF items  DIF size  Cov. effect  BIC    saBIC  BIC    saBIC
500          1          0.4       1             .99   .24    .00    .04
500          1          0.4       2             .98   .14    .00    .09
500          1          0.8       1             .98   .17    .00    .09
500          1          0.8       2             .93   .03    .04    .36
500          1          1.2       1             .99   .08    .00    .27
500          1          1.2       2             .46   .00    .51    .58
500          2          0.4       1             .98   .16    .00    .08
500          2          0.4       2             .97   .05    .01    .24
500          2          0.8       1             .90   .01    .07    .42
500          2          0.8       2             .13   .00    .81    .66
500          2          1.2       1             .01   .00    .96    .70
500          2          1.2       2             .00   .00    .97    .66
2,000        1          0.4       1            1.00   .66    .00    .05
2,000        1          0.4       2             .83   .03    .11    .70
2,000        1          0.8       1             .91   .08    .08    .73
2,000        1          0.8       2             .00   .00    .98    .90
2,000        1          1.2       1             .09   .00    .88    .92
2,000        1          1.2       2             .00   .00    .98    .93
2,000        2          0.4       1             .97   .15    .01    .50
2,000        2          0.4       2             .03   .00    .95    .93
2,000        2          0.8       1             .00   .00    .98    .95
2,000        2          0.8       2             .00   .00    .99    .95
2,000        2          1.2       1             .00   .00    .99    .96
2,000        2          1.2       2             .00   .00    .99    .93

Note. Correct class enumeration rates appeared in boldface in the original. Models with class enumeration rates less than .20 for both BIC and saBIC are not shown due to space limits. Cov. effect = covariate effect on class; BIC = Bayesian information criterion; saBIC = sample size–adjusted BIC.

Loading DIF Conditions

When there was a zero covariate effect in the population, the unconditional one-class model was supported by BIC and saBIC. When covariate effect(s) was simulated in the population, saBIC could identify the correct model only when sample size was 2,000 and class separation was large (i.e., two DIF items with large DIF magnitude). For this condition, correct class enumeration rates were .76 and .80 for population models with covariate effect on class and on both class and factor, respectively. In other conditions where the correct class enumeration rates were low, saBIC tended to support the unconditional one-class model except that when the population model had both covariate effects on class and factor, the two-class scalar model with the correct covariate effects was selected. Correct class enumeration rates were overall very low for BIC across all loading DIF conditions. Instead, the unconditional one-class model tended to be selected as the best-fitting model.

Demonstration

Sample and Measures

We demonstrated MI testing across classes in FMM under different specifications of covariate effects using the Programme for International Student Assessment (PISA) 2015 data. PISA evaluates 15-year-old students' performance in science, mathematics, and reading in all Organisation for Economic Co-operation and Development countries and economies. This demonstration used four items measuring school principals' curricular leadership (see Supplemental Appendix D for a full list of items), a subscale of school management. Principals were asked to report how frequently leadership actions to communicate the school's educational goals to teachers occurred during the previous academic year. Items were measured on a 1 (did not occur) to 6 (more than once a week) Likert-type scale. In accordance with the sample sizes considered in the simulation, the Canada sample was used, which consisted of 759 school principals. There were 122 cases with missing responses on all four items; these were deleted from the analysis, resulting in a final sample size of 637.

Covariates that might impact latent classes of curricular leadership were considered in the demonstration. Covariates included the size of the community where the school was located, measured on a 5-point Likert-type scale with higher numbers indicating more populous communities; the average class size, measured on a 9-point Likert-type scale (1 = 15 students or fewer, and 9 = more than 50 students); and school type (public or private). Two covariates measuring the school's policy on organizing instruction were also considered: the extent to which students were grouped by ability into different classes and the extent to which students were grouped by ability within their classes, both measured on a 3-point Likert-type scale (1 = for all subjects, 2 = for some subjects, and 3 = not for any subjects).

Statistical Analysis

Three specifications of covariate effects were considered when testing MI across classes in FMM: no covariate effects, a covariate effect on the latent class variable, and covariate effects on both class and factor, consistent with the analysis models in the simulation. Under each specification, a series of models was constructed to identify the correct number of classes and the correct level of MI. The models included one-class; two-class scalar, metric, and configural; and three-class scalar, metric, and configural, except that the one-class model was excluded for the two FMMs that included covariate effect(s). Note that factor variances and item residual variances were freely estimated across classes, and the factor mean of the last class was fixed at zero by default in Mplus (L. K. Muthén & Muthén, 1998-2014). Model fit comparisons were based on AIC, BIC, and saBIC. A sketch of the specification that was ultimately retained is given after this paragraph.
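For transparency, the following is a sketch of the kind of Mplus input used for the specification that was ultimately retained (two-class metric invariance with the covariate effect on the latent class variable; see Results). The item and covariate names are invented for illustration, and data-handling details (e.g., missing-data coding) are omitted.

  DATA:     FILE = pisa2015_canada.dat;   ! placeholder file name
  VARIABLE: NAMES = lead1-lead4 commsize classsz private abilbtw abilwthn;
            CLASSES = c(2);
  ANALYSIS: TYPE = MIXTURE;
            ESTIMATOR = MLR;
  MODEL:
    %OVERALL%
    f BY lead1-lead4;
    c ON commsize classsz private abilbtw abilwthn;
    %c#1%
    f BY lead2-lead4 (L2-L4);     ! loadings equated across classes (metric)
    [lead1-lead4];                ! intercepts free to differ across classes
    [f@0]; f; lead1-lead4;        ! factor mean fixed for identification
    %c#2%
    f BY lead2-lead4 (L2-L4);
    [lead1-lead4];
    [f@0]; f; lead1-lead4;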

Results

Table 5 presents the model fit information for each model. None of the three-class models converged, so they are not reported in the table. Among the converged models, BIC supported the two-class scalar model with the covariate effect on class, whereas AIC and saBIC selected the two-class metric model with the same covariate effect. Given that saBIC outperformed BIC with the smaller sample size (i.e., 500) in the simulation, we relied on saBIC and selected the two-class metric model with the covariate effect on class. Class 1 (69%) had higher intercepts on the items "ensure professional development aligned with teaching goals" and "discuss academic goals with teachers," and a lower intercept on the item "ensure teachers work towards school's educational goals," than Class 2 (31%). Note that item-level DIF analyses would be needed to further identify significant differences in intercepts across classes, which is beyond the scope of this study. Among the five covariates, only one, the extent to which students were grouped by ability within their classes, had a significant relationship with the latent class variable. The structural coefficient of .48 (odds ratio = 1.62) indicates that schools that grouped students by ability within their classes for fewer subjects were more likely to be assigned to Class 1. In other words, the extent to which students were grouped by ability within their classes was a source of noninvariance in intercepts. Of note, given the lack of scalar invariance, the factor means were not comparable across classes.

Table 5.

Model Fit Comparisons of Factor Mixture Models (FMM) for Demonstration Data.

FMM                                    Free parameters  Log likelihood  AIC      BIC      saBIC
Unconditional FMM
  One-class                            12               −3407.50        6839.01  6892.49  6854.39
  Two-class scalar                     Nonconvergence
  Two-class metric                     Nonconvergence
  Two-class configural                 Nonconvergence
FMM with covariate effect on class
  Two-class scalar                     24               −3224.79        6497.58  6603.78  6527.59
  Two-class metric                     27               −3217.24        6488.47  6607.94  6522.22
  Two-class configural                 Nonconvergence
FMM with covariate effects on class and factor
  Two-class scalar                     29               −3217.86        6493.71  6622.03  6529.96
  Two-class metric                     Nonconvergence
  Two-class configural                 Nonconvergence

Note. None of the three-class models converged; they are therefore not included in the table. The smallest AIC, BIC, and saBIC values appeared in boldface in the original (AIC and saBIC: two-class metric with covariate effect on class; BIC: two-class scalar with covariate effect on class). AIC = Akaike information criterion; BIC = Bayesian information criterion; saBIC = sample size–adjusted BIC.

Discussion

Prior research has examined the inclusion of covariate effects and its impact on the class enumeration, class assignment, and/or parameter estimation in LCA and GMM. This simulation study extended the prior research by considering multiple covariate effects in FMM and investigating the performance of MI testing across classes under different specifications of covariate effects. Three population models were considered in this study, including an FMM with a zero covariate effect, an FMM with covariate effect on the latent class variable, and an FMM with covariate effects on both the latent class variable and the factor. A series of analysis models that varied in the level of MI, the number of classes, and the specification of covariate effects was fitted and compared in terms of model fit.

A major finding of this study is that correct class enumeration rates were overall much higher for the population models that had covariate effect(s) than for the population model without any covariate effect. That is, when a zero covariate effect on the latent class variable was simulated in the population model, comparing analysis models that excluded or included such an improper covariate did not lead to the selection of the correct model; instead, the unconditional one-class model was supported by BIC and saBIC across simulation conditions. By contrast, when covariate effect(s) were truly present in the population model, the correct model was more likely to be identified by comparing a series of analysis models. The discrepancy in correct class enumeration rates between the population models with and without covariate effect(s) highlights the importance of choosing proper covariates that can explain the heterogeneity in FMM (Lubke & Muthén, 2005).

After proper covariates are selected, another critical issue is how to specify covariate effects in FMM when MI is tested. A promising finding of this study is that the specification of covariate effects as well as the level of MI and the number of classes can be correctly identified simultaneously through the comprehensive model selection procedure. That is, analysis models that varied in the number of classes, level of MI, and specification of covariate effects (i.e., excluding covariate, including covariate effect on class, and including both effects on class and factor) were compared. All three components of the FMM were correctly identified in the best-fitting model, which is an encouraging finding given the complexity of the FMM.

However, it is important to note that correct class enumeration rates depended on class separation, sample size, and the location of DIF across all population models. For instance, when class separation was larger (i.e., more DIF items and/or larger DIF magnitude) or the sample size was larger (2,000 as opposed to 500), correct class enumeration rates were higher. Overall, MI conditions had lower correct class enumeration rates than DIF conditions, given that classes were distinguished only by factor mean differences. Such effects of class separation and sample size on class enumeration have been well documented in the literature (Dias, 2004; E. S. Kim et al., 2016; Lubke & Neale, 2006, 2008). Of note, it was overall more challenging to identify the correct model under loading DIF than under intercept DIF, which is aligned with the literature (e.g., E. S. Kim & Yoon, 2011). For example, the correct model was identified by saBIC under loading DIF only when the sample size was 2,000 with two DIF items and large DIF magnitude (0.8), whereas correct class enumeration rates were high regardless of the number of DIF items or DIF magnitude under intercept DIF when the sample size was 2,000.

Additionally, class enumeration depended on the IC used. Although the impact of class separation and sample size was observed for both BIC and saBIC, saBIC outperformed BIC when class separation or sample size was smaller, which is consistent with previous simulation studies (e.g., E. S. Kim et al., 2016). The discrepancy between BIC and saBIC at the smaller sample size can be explained by the fact that saBIC imposes a smaller sample-size penalty than BIC; that is, saBIC substitutes (N + 2)/24 for the sample size N in the BIC formula. This adjustment of the penalty tends to have considerable effects on model selection when the sample size is small (Yang, 2006). When the sample size was larger (i.e., 2,000 in this study), the performance of BIC and saBIC became much more comparable. AIC tended to select a model more complicated than the correct one (i.e., overextraction of classes and/or selection of the configural model over the true metric or scalar model), which is consistent with previous simulation studies (e.g., Cho & Cohen, 2010; Henson et al., 2007; Nylund et al., 2007).

Of note, when covariate effect(s) were simulated in the population, the correct level of MI was almost never selected with the unconditional model. Further analyses were conducted to compare analysis models with different numbers of classes and levels of MI within the unconditional specification, and the results also showed that the correct level of MI could not be identified; the one-class model was selected instead. The poor performance of the unconditional model relative to the models that included covariate effects suggests that including covariate effects might help improve class separation and thus MI testing and class enumeration in FMM (Lubke & Muthén, 2007; Maij-de Meij et al., 2010). Note that the poor performance of the unconditional model was not observed in previous simulation studies using LCA (Nylund-Gibson & Masyn, 2016) or GMM (Tofighi & Enders, 2008), which found that the unconditional model performed well in terms of class enumeration. This discrepancy might be due to model differences between this study and those studies. That is, this study focused on FMM, which combines a measurement model and LCA and can therefore be considered more complex than LCA or GMM, whose measurement models have fewer parameters to estimate. It might be more difficult to distinguish latent classes because simulated differences across classes might be absorbed by other model parameters; if this happens, including covariates might help improve class separation and class enumeration.

Limitations and Future Research Directions

Some caution is needed in interpreting or generalizing the results of this study. First, findings are limited to the simulation conditions considered here. Simulation conditions were carefully selected not only to present a complete picture of the role of covariates in MI testing but also to keep the study manageable, given the number of analysis models fitted to each replication and the relatively long execution time of mixture modeling in general, particularly when the number of classes is larger (e.g., 3). Therefore, we limited the simulation design to two latent classes, a single-factor model with six continuous items, and one continuous covariate. Future research is warranted to further examine MI testing in FMM with covariates under more latent classes, more complex measurement models (e.g., two or more factors or a bifactor structure), categorical items or covariates, or more than one covariate.

When these more complex designs are considered in future research, an issue that needs particular attention is sample size. For instance, MI testing across a larger number of classes might involve more complex model specifications and thus require larger samples than those examined in this study (500 and 2,000). It would be worthwhile to conduct future simulation studies that provide recommendations regarding the sample size needed to reach sufficient statistical power in detecting noninvariance. In these future endeavors, a variety of model selection indices and procedures can be included, such as the recently proposed k-fold cross-validation approach (Grimm et al., 2016).

Another important line of future research is the selection of proper covariates. Although this study highlighted the importance of choosing covariates that have substantial effects on the latent classes and the latent factor, how to select such covariates is a critical issue that warrants investigation. We suggest two possible approaches. First, applied researchers' knowledge of substantive theories and of the research setting can be valuable in selecting covariates and understanding how covariates might contribute to the heterogeneity in FMM. It is well understood that researchers might not be able to hypothesize the possible classes before conducting FMM analyses (which is precisely the rationale for using this model), but they might consider covariates that are associated with the construct being examined. For instance, if researchers are interested in latent classes of anxiety, covariates that might affect individuals' anxiety levels (e.g., stress at work, personal relationships, financial security, medical conditions) are potential proper covariates in FMM. Note that when these hypothesized covariates are included in FMM, the proper covariates and the specification of covariate effects can be correctly identified along with the level of MI and the number of classes, which is a major promising finding of the study. Second, a machine learning approach (specifically, structural equation model trees, or SEM Trees) might also contribute to the search for relevant covariates that explain population heterogeneity (Brandmaier et al., 2013; Jacobucci et al., 2017). SEM Trees combine SEM with the decision tree paradigm: the data set is recursively partitioned into subsets based on covariate values so that differences in SEM parameter estimates are maximized across subsets at each partition. These two approaches might complement each other.

When interpreting the results of an FMM with covariate effects, researchers should be cognizant that the formation of latent classes (including the level of MI) is model dependent and that different latent class solutions might emerge if different covariates are entered into the model (Asparouhov & Muthén, 2012; Lubke & Muthén, 2005). For example, including a different set of covariates might change the number of classes selected and the level of MI supported, as well as individuals' assignment to latent classes and the parameter estimates within classes. Because the formation and interpretation of latent classes is subject to the covariates included in FMM, selecting proper covariates that are well grounded in substantive theories is all the more important.

An interesting observation in this study is that although nonconvergence was not a concern in the simulation, severe convergence problems were encountered in the demonstration with an empirical data set. Specifically, nonconvergence occurred for the unconditional models, the more relaxed models (i.e., configural invariance models), and the three-class models. Although the demonstration illustrated different approaches to specifying covariate effects in MI testing, future research is needed to examine the problem of nonconvergence in FMM and to explore viable solutions (e.g., Bayesian estimation).

Recommendations for Applied Researchers

Despite the limitations discussed above, this study offers several recommendations for researchers who use or are considering FMM to explore unobserved population heterogeneity. We recommend that researchers select covariates that might explain the heterogeneity in FMM prior to data analysis. With proper covariates in hand, researchers can conduct the FMM analysis by comparing models with different levels of MI, different numbers of classes, and different specifications of covariate effects (i.e., excluding covariates, including a covariate effect on class, and including covariate effects on both class and factor). Through model fit comparisons, a best-fitting model can be selected such that the level of MI, the number of classes, and the covariate effects are all identified; a minimal sketch of this workflow follows. Of note, a larger sample size (e.g., 2,000) can greatly help applied researchers arrive at the correct solution. For model selection, saBIC is preferred over BIC given its better performance under lower class separation or with smaller samples.
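The sketch below shows how the recommended comparison might be organized once each candidate model has been fitted. The log-likelihoods and parameter counts are placeholders (in practice they would be taken from software output, e.g., Mplus); only the BIC and saBIC formulas are fixed (Schwarz, 1978; Sclove, 1987).

```python
# A minimal sketch of the recommended model-comparison workflow.
# Candidate models cross the number of classes, the MI level, and
# the covariate specification; (loglik, n_params) are hypothetical.
import math

def bic(loglik, n_params, n):
    return -2 * loglik + n_params * math.log(n)

def sabic(loglik, n_params, n):
    # Sample-size-adjusted BIC (Sclove, 1987) replaces log(n)
    # with log((n + 2) / 24).
    return -2 * loglik + n_params * math.log((n + 2) / 24)

n = 2000
candidates = {
    ("2-class", "scalar", "cov on class"):        (-15420.3, 21),
    ("2-class", "scalar", "cov on class+factor"): (-15395.8, 22),
    ("2-class", "configural", "cov on class"):    (-15392.1, 33),
    ("3-class", "scalar", "cov on class"):        (-15401.6, 29),
}

best = min(candidates, key=lambda m: sabic(*candidates[m], n))
for model, (ll, p) in candidates.items():
    print(model, round(sabic(ll, p, n), 1))
print("Selected by saBIC:", best)
```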

Supplemental Material

Supplemental material (Appendix) for "Testing Measurement Invariance Across Unobserved Groups: The Role of Covariates in Factor Mixture Modeling" by Yan Wang, Eunsook Kim, John M. Ferron, Robert F. Dedrick, Tony X. Tan, and Stephen Stark is available online in Educational and Psychological Measurement.

Notes

1. Note that we generated a zero covariate effect rather than generating data without a covariate, because a covariate was needed in two of the analysis models. Including a covariate effect that is truly zero in the analysis models represents a scenario that might occur in applied practice, namely, that an improper covariate is included.

2. See Supplemental Appendix C for eta-squared values of factors and first-order interaction terms for BIC and saBIC under each population model.

Footnotes

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material: Supplemental material for this article is available online.

References

  1. Akaike H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723. 10.1109/TAC.1974.1100705 [DOI] [Google Scholar]
  2. Allan N. P., MacPherson L., Young K. C., Lejuez C. W., Schmidt N. B. (2014). Examining the latent structure of anxiety sensitivity in adolescents using factor mixture modeling. Psychological Assessment, 26(3), 741-751. 10.1037/a0036744 [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Allan N. P., Raines A. M., Capron D. W., Norr A. M., Zvolensky M. J., Schmidt N. B. (2014). Identification of anxiety sensitivity classes and clinical cut-scores in a sample of adult smokers: Results from a factor mixture model. Journal of Anxiety Disorders, 28(7), 696-703. 10.1016/j.janxdis.2014.07.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Asmundson G. J. G., Taylor S., Carleton R. N., Weeks J. W., Hadjstavropoulos H. D. (2012). Should health anxiety be carved at the joint? A look at the health anxiety construct using factor mixture modeling in a non-clinical sample. Journal of Anxiety Disorders, 26(1), 246-251. 10.1016/j.janxdis.2011.11.009 [DOI] [PubMed] [Google Scholar]
  5. Asparouhov T., Muthén B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling, 21(3), 329-341. 10.1080/10705511.2014.915181 [DOI] [Google Scholar]
  6. Babusa B., Czeglédi E., Túry F., Mayville S. B., Urbán R. (2015). Differentiating the levels of risk for muscle dysmorphia among Hungarian male weightlifters: A factor mixture modeling approach. Body Image, 12, 14-21. 10.1016/j.bodyim.2014.09.001 [DOI] [PubMed] [Google Scholar]
  7. Bauer D. J. (2007). Observations on the use of growth mixture models in psychological research. Multivariate Behavioral Research, 42(4), 757-786. 10.1080/00273170701710338 [DOI] [Google Scholar]
  8. Bauer D. J., Curran P. J. (2004). The integration of continuous and discrete latent variable models: Potential problems and promising opportunities. Psychological Methods, 9(1), 3-29. 10.1037/1082-989X.9.1.3 [DOI] [PubMed] [Google Scholar]
  9. Bernstein A., Stickle T. R., Schmidt N. B. (2013). Factor mixture model of anxiety sensitivity and anxiety psychopathology vulnerability. Journal of Affective Disorders, 149(1-3), 406-417. 10.1016/j.jad.2012.11.024 [DOI] [PubMed] [Google Scholar]
  10. Bozdogan H. (1987). Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52(3), 345-370. 10.1007/BF02294361 [DOI] [Google Scholar]
  11. Brandmaier A. M., von Oertzen T., McArdle J. J., Lindenberger U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86. 10.1037/a0030001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Cassady J. C., Finch W. H. (2015). Using factor mixture modeling to identify dimensions of cognitive test anxiety. Learning and Individual Differences, 41, 14-20. 10.1016/j.lindif.2015.06.002 [DOI] [Google Scholar]
  13. Cho S., Cohen A. S. (2010). A multilevel mixture IRT model with an application to DIF. Journal of Educational and Behavioral Statistics, 35(3), 336-370. 10.3102/1076998609353111 [DOI] [Google Scholar]
  14. Clark S. L., Muthén B. O., Kaprio J., D’Onofrio B. M., Viken R., Rose R. J. (2013). Models and strategies for factor mixture analysis: An example concerning the structure underlying psychological disorders. Structural Equation Modeling, 20(4), 681-703. 10.1080/10705511.2013.824786 [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Academic Press. [Google Scholar]
  16. Davidov E., Dülmer H., Schlüter E., Schmidt P., Meuleman B. (2012). Using a multilevel structural equation modeling approach to explain cross-cultural measurement noninvariance. Journal of Cross-Cultural Psychology, 43(4), 558-575. 10.1177/0022022112438397 [DOI] [Google Scholar]
  17. De Ayala R. J., Kim S.-H., Stapleton L. M., Dayton C. M. (2002). Differential item functioning: A mixture distribution conceptualization. International Journal of Testing, 2(3-4), 243-276. 10.1080/15305058.2002.9669495 [DOI] [Google Scholar]
  18. Dias J. G. (2004). Finite mixture models: Review, applications, and computer-intensive methods. Ridderprint. [Google Scholar]
  19. Dimitrov D. M., Al-Saud F. A. A.-M., Alsadaawi A. S. (2015). Investigating population heterogeneity and interaction effects of covariates: The case of a large-scale assessment for teacher licensure in Saudi Arabia. Journal of Psychoeducational Assessment, 33(7), 674-686. 10.1177/0734282914562121 [DOI] [Google Scholar]
  20. Elhai J. D., Naifeh J. A., Forbes D., Ractliffe K. C., Tamburrino M. (2011). Heterogeneity in clinical presentations of posttraumatic stress disorder among medical patients: Testing factor structure variation using factor mixture modeling. Journal of Traumatic Stress, 24(4), 435-443. 10.1002/jts.20653 [DOI] [PubMed] [Google Scholar]
  21. Frost R., Hyland P., McCarthy A., Halpin R., Shevlin M., Murphy J. (2019). The complexity of trauma exposure and response: Profiling PTSD and CPTSD among a refugee sample. Psychological Trauma: Theory, Research, Practice, and Policy, 11(2), 165-175. 10.1037/tra0000408 [DOI] [PubMed] [Google Scholar]
  22. Grimm K. J., Mazza G. L., Davoudzadeh P. (2016). Model selection in finite mixture models: A k-fold cross-validation approach. Structural Equation Modeling, 24(2), 246-256. 10.1080/10705511.2016.1250638 [DOI] [Google Scholar]
  23. Henson J. M., Reise S. P., Kim K. H. (2007). Detecting mixtures from structural model differences using latent variable mixture modeling: A comparison of relative model fit statistics. Structural Equation Modeling, 14(2), 202-226. 10.1080/10705510709336744 [DOI] [Google Scholar]
  24. Hu J., Leite W. L., Gao M. (2017). An evaluation of the use of covariates to assist in class enumeration in linear growth mixture modeling. Behavior Research Methods, 49(3), 1179-1190. 10.3758/s13428-016-0778-1 [DOI] [PubMed] [Google Scholar]
  25. Jackman M. G.-A. (2012). A Monte Carlo investigation of the performance of factor mixture modeling in the detection of differential item functioning (Publication No. 3480453) [Doctoral dissertation]. ProQuest Dissertation & Theses A&I. [Google Scholar]
  26. Jacobucci R., Grimm K. J., McArdle J. J. (2017). A comparison of methods for uncovering sample heterogeneity: Structural equation model trees and finite mixture models. Structural Equation Modeling, 24(2), 270-282. 10.1080/10705511.2016.1250637 [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Jak S., Oort F. J., Dolan C. V. (2013). A test for cluster bias: Detecting violations of measurement invariance across clusters in multilevel data. Structural Equation Modeling, 20(2), 265-282. 10.1080/10705511.2013.769392 [DOI] [Google Scholar]
  28. Keller A. C., Igic I., Meier L. L., Semmer N. K., Schaubroeck J. M., Brunner B., Elfering A. (2016). Testing job typologies and identifying at-risk subpopulations using factor mixture models. Journal of Occupational Health Psychology, 22(4), 503-517. 10.1037/ocp0000038 [DOI] [PubMed] [Google Scholar]
  29. Kim E. S., Cao C., Wang Y., Nguyen D. (2017). Measurement invariance testing with many groups: A comparison of five approaches. Structural Equation Modeling, 24(4), 524-544. 10.1080/10705511.2017.1304822 [DOI] [Google Scholar]
  30. Kim E. S., Joo S.-H., Lee P., Wang Y., Stark S. (2016). Measurement invariance testing across between-level latent classes using multilevel factor mixture modeling. Structural Equation Modeling, 23(6), 870-887. 10.1080/10705511.2016.1196108 [DOI] [Google Scholar]
  31. Kim E. S., Wang Y. (2018). Investigating sources of heterogeneity with 3-step multilevel factor mixture modeling: Beyond testing measurement invariance in cross-national studies. Structural Equation Modeling, 26(2), 165-181. 10.1080/10705511.2018.1521284 [DOI] [Google Scholar]
  32. Kim E. S., Yoon M. (2011). Testing measurement invariance: A comparison of multiple-group categorical CFA and IRT. Structural Equation Modeling, 18(2), 212-228. 10.1080/10705511.2011.557337 [DOI] [Google Scholar]
  33. Kim M., Vermunt J., Bakk Z., Jaki T., Van Horn M. L. (2016). Modeling predictors of latent classes in regression mixture models. Structural Equation Modeling, 23(4), 601-614. 10.1080/10705511.2016.1158655 [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Lee H., Beretvas S. N. (2014). Evaluation of two types of differential item functioning in factor mixture models with binary outcomes. Educational and Psychological Measurement, 74(5), 831-858. 10.1177/0013164414526881 [DOI] [Google Scholar]
  35. Lo Y., Mendell N. R., Rubin D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767-778. 10.1093/biomet/88.3.767 [DOI] [Google Scholar]
  36. Lubke G., Muthén B. O. (2007). Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters. Structural Equation Modeling, 14(1), 26-47. 10.1080/10705510709336735 [DOI] [Google Scholar]
  37. Lubke G. H., Muthén B. (2005). Investigating population heterogeneity with factor mixture models. Psychological Methods, 10(1), 21-39. 10.1037/1082-989X.10.1.21 [DOI] [PubMed] [Google Scholar]
  38. Lubke G., Neale M. C. (2006). Distinguishing between latent classes and continuous factors: Resolution by maximum likelihood? Multivariate Behavioral Research, 41(4), 499-532. 10.1207/s15327906mbr4104_4 [DOI] [PubMed] [Google Scholar]
  39. Lubke G., Neale M. C. (2008). Distinguishing between latent classes and continuous factors with categorical outcomes: Class invariance of parameters of factor mixture models. Multivariate Behavioral Research, 43(4), 592-620. 10.1080/00273170802490673 [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Lukočienė O., Varriale R., Vermunt J. K. (2010). The simultaneous decision(s) about the number of lower- and higher-level classes in multilevel latent class analysis. Sociological Methodology, 40(1), 247-283. 10.1111/j.1467-9531.2010.01231.x [DOI] [Google Scholar]
  41. Lukočienė O., Vermunt J. K. (2010). Determining the number of components in mixture models for hierarchical data. In Fink A., Berthold L., Seidel W., Ultsch A. (Eds.), Advances in data analysis, data handling and business intelligence (pp. 241-249). Springer. 10.1007/978-3-642-01044-6_22 [DOI] [Google Scholar]
  42. Maij-de Meij A. M., Kelderman H., Van der Flier H. (2010). Improvement in detection of differential item functioning using a mixture item response theory model. Multivariate Behavioral Research, 45(6), 975-999. 10.1080/00273171.2010.533047 [DOI] [PubMed] [Google Scholar]
  43. McLachlan G., Peel D. (2000). Finite mixture models. Wiley. 10.1002/0471721182 [DOI] [Google Scholar]
  44. Muthén B. O. (2004). Latent variable analysis: Growth mixture modeling and related techniques. In Kaplan D. (Ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Sage. [Google Scholar]
  45. Muthén L. K., Muthén B. O. (1998-2014). Mplus user’s guide (7th ed.). Muthén & Muthén. [Google Scholar]
  46. Nylund K. L., Asparouhov T., Muthén B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14(4), 535-569. 10.1080/10705510701575396 [DOI] [Google Scholar]
  47. Nylund-Gibson K., Masyn K. E. (2016). Covariates and mixture modeling: Results of a simulation study exploring the impact of misspecified effects on class enumeration. Structural Equation Modeling, 23(6), 782-797. 10.1080/10705511.2016.1221313 [DOI] [Google Scholar]
  48. Peugh J., Fan X. (2014). Enumeration index performance in generalized growth mixture models: A Monte Carlo test of Muthén’s (2003) hypothesis. Structural Equation Modeling, 22(1), 115-131. [Google Scholar]
  49. Schwarz G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461-464. 10.1214/aos/1176344136 [DOI] [Google Scholar]
  50. Sclove S. L. (1987). Application of model-selection criteria to some problems in multivariate analysis. Psychometrika, 52(3), 333-343. 10.1007/BF02294360 [DOI] [Google Scholar]
  51. Stegmann G., Grimm K. J. (2018). A new perspective on the effects of covariates in mixture models. Structural Equation Modeling, 25(2), 167-178. 10.1080/10705511.2017.1318070 [DOI] [Google Scholar]
  52. Subramaniam M., Abdin E., Vaingankar J. A., Verma S., Chong S. A. (2014). Latent structure of psychosis in the general population: Results from the Singapore Mental Health Study. Psychological Medicine, 44(1), 51-60. 10.1017/S0033291713000688 [DOI] [PubMed] [Google Scholar]
  53. Tay L., Newman D. A., Vermunt J. K. (2011). Using mixed-measurement item response theory with covariates (MM-IRT-C) to ascertain observed and unobserved measurement equivalence. Organizational Research Methods, 14(1), 147-176. 10.1177/1094428110366037 [DOI] [Google Scholar]
  54. Tein J., Coxe S., Cham H. (2013). Statistical power to detect the correct number of classes in latent profile analysis. Structural Equation Modeling, 20(4), 640-657. 10.1080/10705511.2013.824781 [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Tofighi D., Enders C. (2008). Identifying the correct number of classes in a growth mixture model. In Hancock G. (Ed.), Mixture models in latent variable research (pp. 317-341). Information Age. [Google Scholar]
  56. Vermunt J. K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4), 450-469. 10.1093/pan/mpq025 [DOI] [Google Scholar]
  57. Wang Y., Kim E., Joo S.-H., Chun S., Alamri A., Lee P., Stark S. (2020). Reconsidering multilevel latent class models: Can level-2 latent classes affect item response probabilities? Journal of Experimental Education. Advance online publication. 10.1080/00220973.2020.1737913 [DOI] [Google Scholar]
  58. Yang C. C. (2006). Evaluating latent class analysis models in qualitative phenotype identification. Computational Statistics & Data Analysis, 50(4), 1090-1104. 10.1016/j.csda.2004.11.004 [DOI] [Google Scholar]
