Published in final edited form as: Multivariate Behav Res. 2019 Jul 2;54(6):882–905. doi: 10.1080/00273171.2019.1596781

Assessing the Robustness of Mixture Models to Measurement Non-Invariance

VT Cole 1, DJ Bauer 1, AM Hussong 1

Abstract

Recent work reframes direct effects of covariates on items in mixture models as differential item functioning (DIF) and shows that, when present in the data but omitted from the fitted latent class model, DIF can lead to overextraction of classes. However, less is known about the effects of DIF on model performance – including parameter bias, classification accuracy, and distortion of class-specific response profiles – once the correct number of classes is chosen. First, we replicate and extend prior findings relating DIF to class enumeration using a comprehensive simulation study. In a second simulation study using the same parameters, we show that, while the performance of LCA is robust to the misspecification of DIF effects, it is degraded when DIF is omitted entirely. Moreover, the robustness of LCA to omitted DIF differs widely based on the degree of class separation. Finally, simulation results are contextualized by an empirical example.

Keywords: Latent class analysis < Other Topics, Differential item functioning < Test Theory, Mixture modeling < Other Topics, Measurement models < Factor Analysis

Introduction

In behavioral research, it is often of interest to form homogeneous groupings of people based on some pattern of variables. Mixture models, a broad class of models which decompose a population into homogeneous categories, have frequently been used to form empirically derived subgroups in the service of this goal (McLachlan & Peel, 2004). Commonly used models include latent class analysis (LCA), latent profile analysis (LPA), and factor mixture models (FMM), each of which imposes a different within-class model on the items (Lazarsfeld & Henry, 1968; Gibson, 1959; Lubke & Muthén, 2005, 2007).

Much of the appeal of mixture models lies in their ability to incorporate exogenous variables, which may serve as predictors of the patterns of behavior that the classes represent. Exogenous predictors may be incorporated into mixture models in two primary ways. First, predictors may impact class membership through a logistic regression equation for the class membership probability. Second, predictors may directly affect indicators beyond their effects on class membership. In the vast majority of applications, exogenous variables are considered exclusively as predictors of class membership. Correspondingly, the set of techniques available for incorporating predictors of class membership has grown rapidly in recent years. Though predictors can be directly included in the model for latent class membership (Huang & Bandeen-Roche, 2004), a number of three-step approaches, which allow for the prediction of class membership after the latent class model has already been estimated, give researchers the flexibility to consider predictors of class membership without altering the model for the latent classes themselves (Bolck, Croon, & Hagenaars, 2004; Vermunt, 2010; Bakk, Tekle, & Vermunt, 2013; Lanza, Tan, & Bray, 2013).

The second way of incorporating predictors, through direct effects on indicators, has seen much less frequent application. Perhaps the lack of attention paid to direct effects in mixture models owes to their relative lack of interpretability. In a recent paper, Masyn (2017) addressed this problem by drawing on the concordance between continuous and categorical latent variable models. In models for continuous latent variables, direct effects of predictors on indicators represent measurement non-invariance or differential item functioning (DIF; Meredith, 1993; Millsap, 2012; Holland & Wainer, 1993). DIF is directly accounted for using continuous latent variable models which allow for direct effects, such as the multiple indicator multiple cause (MIMIC; Muthén, 1989; Jöreskog & Goldberger, 1975) and moderated nonlinear factor analysis (MNLFA; Bauer & Hussong, 2009; Bauer, 2017) models. Likewise, Masyn (2017) noted that direct effects of predictors in mixture models constitute DIF, and their presence indicates that the definition of latent classes is not constant across levels of the DIF-generating predictor.

A growing body of simulation work suggests that omission of DIF may lead to a number of potentially serious consequences. Recent work has indicated that, in the presence of omitted direct effects from a predictor to an indicator, estimates of the relationship between the predictor and the latent class are biased (Kim, Vermunt, Bakk, Jaki, & Van Horn, 2016; Masyn, 2017; Asparouhov & Muthén, 2014). Moreover, Asparouhov and Muthén (2014) found that three-step estimation of the relationship between the predictor and the latent class did not mitigate this bias (Vermunt, 2010). Additionally, omitted direct effects are linked with overestimation of the number of classes in the class enumeration process in latent class models, growth mixture models, and regression mixture models when predictors are erroneously included in the model as predictors of latent class membership (Nylund-Gibson & Masyn, 2016; Diallo, Morin, & Lu, 2017; Kim et al., 2016). However, class enumeration procedures have been shown to yield the correct number of classes even in the presence of unmodeled DIF when covariate effects on class membership are excluded from the model (Nylund-Gibson & Masyn, 2016; Kim et al., 2016). It is therefore critical to determine whether, even when the correct number of classes is chosen, mixture model parameter estimates are still biased.

In the current report, we add to the growing literature on DIF in mixture models by systematically extending the exploration of the effects of omitted DIF beyond class enumeration. In addition to class enumeration, we focus on bias due to omitted DIF in parameter estimates as well as in heretofore unexamined quantities such as class-specific endorsement probabilities and individual classifications. We adopt the definition of direct effects as DIF in mixture models, and use this definition both for interpretation and to make predictions about omitted direct effects by drawing on the continuous latent variable literature. After introducing the LCA model and defining DIF within it, we present two simulation studies of DIF in LCA. The first study extends prior work examining the implications of omitted DIF for the class enumeration process. The second study uses the same conditions to examine the effects of omitted DIF on the accuracy of parameter estimates, model-implied class-specific endorsement probabilities, and individual classifications in a 2-class latent class analysis. Finally, Study 3 consists of an empirical example examining the consequences of modeling DIF in an LCA of alcohol use disorder (AUD) diagnostic criteria.

Defining measurement invariance in mixture models

Finite mixture models express observed variables as a function of a given subject’s membership in one of K latent subgroups, each of which is governed by its own subgroup-specific set of parameters (McLachlan & Peel, 2004). Specifically, mixture models measure K latent classes (k = 1, … , K) using a set of J indicators (j = 1, … , J) observed on N subjects (i = 1, … , N). Define yi as a subject-specific J × 1 vector of observed variables, with individual elements yij, and define ηi as a subject-specific K × 1 vector of latent variables ηik which take a value of 1 if subject i is in class k and 0 otherwise.

Class membership ηi is distributed according to a multinomial distribution with class probabilities given by the mixing probability vector π, with individual elements πk, where $\sum_{k=1}^{K} \pi_k = 1$. Each class is governed by its own set of parameters, which produce a class-specific implied distribution of yi, f(yi|ηik = 1). These distributions are weighted by the mixing probabilities π to yield the marginal distribution of the observed variables:

$$f(\mathbf{y}_i) = \sum_{k=1}^{K} \pi_k \, f(\mathbf{y}_i \mid \eta_{ik} = 1) \tag{1}$$

The class-specific distribution function f(yi|ηik = 1) may be expressed in terms of a generalized linear model (GLM; McCullagh & Nelder, 1989; Agresti, 2015). Though any distribution in the exponential family (e.g., normal, binomial, Poisson, negative binomial, or gamma) may be used, here we focus on the simple case of binary observed variables – i.e., latent class analysis.

Typically, in a latent class analysis, all items are considered conditionally independent of one another after accounting for class membership. Thus, the class-specific distribution function for the full vector yi is given by the product of the item-specific distribution functions:

$$f(\mathbf{y}_i \mid \eta_{ik} = 1) = \prod_{j=1}^{J} f(y_{ij} \mid \eta_{ik} = 1) \tag{2}$$

In the case of an LCA for binary observed variables, the item-specific function, f(yij|ηik = 1), is the probability of observing a given yij, with yij ∈ {0, 1}. The endorsement probabilities of each item yij, here denoted µjk, are given by a logistic regression equation:

$$\mu_{jk} = P(y_{ij} = 1 \mid \eta_{ik} = 1) = \frac{\exp(\delta_{jk})}{1 + \exp(\delta_{jk})} \tag{3}$$
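To make Equations 1–3 concrete, the R sketch below (R being the language used for data generation in the simulation studies reported later) evaluates the marginal likelihood of one subject’s binary response vector under an unconditional LCA; all parameter values are hypothetical, chosen only for illustration.

```r
# Marginal LCA likelihood for one subject's binary responses (Equations 1-3).
# All parameter values are hypothetical, chosen only for illustration.
lca_marginal <- function(y, pi_k, delta) {
  # y:     J-vector of 0/1 responses
  # pi_k:  K-vector of mixing probabilities (sums to 1)
  # delta: K x J matrix of class-specific endorsement logits
  mu <- plogis(delta)                   # K x J endorsement probabilities (Eq. 3)
  # Class-specific likelihoods: product over conditionally independent items (Eq. 2)
  class_lik <- apply(mu, 1, function(m) prod(m^y * (1 - m)^(1 - y)))
  sum(pi_k * class_lik)                 # mixture over classes (Eq. 1)
}

pi_k  <- c(0.5, 0.5)
delta <- rbind(rep(1, 8), rep(-1, 8))   # class 1 high, class 2 low endorsement
lca_marginal(y = c(1, 1, 0, 1, 0, 1, 1, 0), pi_k, delta)
```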

The above formulation may be easily altered to include covariate effects on both class membership and the within-class expected value of yij. Denote the P × 1 vector of covariates for the ith individual xi, with individual elements xip. These covariates are included by considering class membership probabilities and class-specific linear predictors to be deterministic functions of covariates. The probability of membership to class k conditional on xi, denoted πk(xi), is given by a typical multinomial logistic regression equation:

$$\pi_k(\mathbf{x}_i) = P(\eta_{ik} = 1 \mid \mathbf{x}_i) = \frac{\exp\left(\alpha_{0k} + \sum_{p=1}^{P} \alpha_{pk} x_{ip}\right)}{\sum_{k'=1}^{K} \exp\left(\alpha_{0k'} + \sum_{p=1}^{P} \alpha_{pk'} x_{ip}\right)} \tag{4}$$

Here, α0k is an intercept representing the baseline log-odds of membership in class k, and αpk is a coefficient transmitting the effect of xip on πk(xi). Constraints on the values of α0k and αpk are generally required to identify the model. In the current application, these coefficients are constrained to zero for a reference class, although other parameterizations are possible (Huang & Bandeen-Roche, 2004).

Similarly, the class-specific endorsement probability may be considered as a function of covariates:

$$\mu_{jk}(\mathbf{x}_i) = P(y_{ij} = 1 \mid \eta_{ik} = 1, \mathbf{x}_i) = \frac{\exp\left(\beta_{0jk} + \sum_{p=1}^{P} \beta_{pjk} x_{ip}\right)}{1 + \exp\left(\beta_{0jk} + \sum_{p=1}^{P} \beta_{pjk} x_{ip}\right)} \tag{5}$$

where β0jk represents the baseline value of the linear predictor when all covariates are zero, and βpjk, p ≠ 0, is the coefficient relating the covariate xip to the probability of endorsing item yij, conditional on class. Note that, while we use a logistic regression formulation here because we consider binary variables, a wide variety of link functions could be used for mixture models with other class-specific distributions.
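As an illustration of Equations 4 and 5, the following R sketch computes the covariate-conditional class probabilities and endorsement probabilities for a hypothetical two-class, four-covariate parameterization (with class 2 as the reference class); all parameter values are assumptions for demonstration only.

```r
# Covariate-conditional class probabilities (Eq. 4) and
# endorsement probabilities (Eq. 5); all values are hypothetical.
class_probs <- function(x, alpha0, alpha) {
  # alpha0: K-vector of intercepts; alpha: K x P matrix of slopes
  # (reference-class row fixed at zero for identification)
  eta <- alpha0 + alpha %*% x           # K multinomial logits
  as.vector(exp(eta) / sum(exp(eta)))   # Eq. 4
}

endorse_prob <- function(x, beta0, beta) {
  # beta0: K x J matrix of item intercepts
  # beta:  K x J x P array of DIF effects (zero where there is no DIF)
  eta <- beta0 + apply(beta, c(1, 2), function(b) sum(b * x))
  plogis(eta)                           # K x J matrix of mu_jk(x) (Eq. 5)
}

x      <- c(0.5, -1, 0, 2)                        # P = 4 covariate values
alpha0 <- c(0, 0)
alpha  <- rbind(c(0.7, 0.7, 0.7, 0), rep(0, 4))   # class 2 = reference
class_probs(x, alpha0, alpha)

beta0 <- rbind(rep(1, 8), rep(-1, 8))
beta  <- array(0, dim = c(2, 8, 4))
beta[, 3, 1] <- 0.3                     # uniform DIF of x1 on item 3
endorse_prob(x, beta0, beta)
```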

A key insight gleaned from Masyn (2017) is that the above regression-based formulation provides a framework in which covariate effects in mixture models can be interpreted as DIF. On the basis of the definition of uniform DIF as an effect of xip on yij that is constant over the levels of ηik, and of nonuniform DIF as an effect of xip on yij that varies over the levels of ηik, uniform and nonuniform DIF can be defined in mixture models in terms of the parameters βpjk. If there are no nonzero direct effects of covariates, i.e., if

$$\beta_{pjk} = 0 \quad \forall\, k, \; p \neq 0 \tag{6}$$

then there is no DIF. When nonzero values of covariate effects are constant across classes – i.e., when

$$\beta_{pjk} = \beta_{pj} \quad \forall\, k, \; p \neq 0 \tag{7}$$

uniform DIF exists on the basis of xip in the measurement of latent classes. By contrast, when nonzero values of covariate effects vary across classes – i.e., when

$$\beta_{pjk} \neq \beta_{pj} \quad \text{for some } k, \; p \neq 0 \tag{8}$$

nonuniform DIF exists on the basis of xip in the measurement of latent classes.

Based on Angoff’s (1993) matching principle, uniform and nonuniform DIF are defined in the same intuitive way in mixture models as in factor models: uniform DIF is a covariate effect on yi which is constant across levels of ηi, and nonuniform DIF exists when covariate effects vary across levels of ηi. Given the conceptual concordance of DIF across continuous and categorical latent variable models, we can draw on the literature of the former to inform hypotheses concerning the latter.

The current paper

In the current report, we extend prior findings to examine the effects of relatively low levels of DIF on class enumeration, parameter estimates, and individual classification values in latent class analysis. In two simulation studies, we generate data under realistic conditions in which DIF is of fairly small magnitude, spread over multiple predictors and items, and in different, potentially canceling directions. In Study 1, we examine the effects of relatively small DIF on class enumeration, in order to determine whether omitting even minor DIF during class enumeration leads to the same level of overextraction of classes observed in previous studies. Study 2 focuses on bias in parameters, class profiles, and individual classifications assuming classes have been correctly enumerated. The logic of Study 2 is to determine whether, even in cases in which DIF is not sufficiently severe to interfere with class enumeration, researchers may still draw incorrect inferences due to unmodeled DIF. Finally, Study 3 provides an empirical example of the effects of DIF on conclusions drawn from an LCA of diagnostic criteria for alcohol use disorder (AUD). Taken together, these three studies aim to extend the growing line of research into the biasing effects of DIF in mixture models to a broader set of inferences and a wider range of conditions.

Predictions based on prior work in categorical and continuous latent variable models.

Investigations of measurement invariance are motivated by the need to distinguish differences in measurement from differences in means or covariance structure of the latent variable itself. We draw on findings from the growing literature assessing omitted DIF in mixture models, as well as a number of findings from factor analysis and IRT, to make four predictions.

Prediction 1:

In Study 1, the exclusion of DIF effects will not interfere with class enumeration if an unconditional model is used in the class enumeration process, but classes will be overextracted if covariate effects on class membership are included. This prediction is informed by studies of class enumeration with direct effects of covariates, which generally find that class enumeration is unaffected by unmodeled DIF if covariate effects on class membership are excluded altogether, but severely compromised if these covariate effects are included but DIF is not (Nylund-Gibson & Masyn, 2016; Diallo, Morin, & Lu, 2017; Kim, Vermunt, Bakk, Jaki, & Van Horn, 2016). As described below, the DIF in this study is relatively minor by comparison to most other studies of class enumeration. We therefore expect the biasing effects of DIF to be somewhat less severe than in prior work.

Prediction 2:

In Study 2, omitting DIF will lead to biased estimates of covariate effects on class membership. This prediction primarily arises from recent studies which have shown such effects to be biased when DIF is present in the data-generating model but not included in the fitted model (Kim et al., 2016; Asparouhov & Muthén, 2014). In particular, this finding would replicate the results of Kim and colleagues (2016), in which the magnitude of covariate effects on class membership was inflated. More tentative but specific predictions arise from the literature on continuous latent variable models. With some exceptions (Kaplan & George, 1995), studies of continuous latent variables have found that effect size estimates of between-group differences in factor means are biased when DIF is omitted (Chen, 2008; De Beuckelaer & Swinnen, 2011; Kuha & Moustaki, 2015). Particularly in the case of omitted intercept DIF, between-group differences in measurement parameters may manifest as spurious group differences on the underlying latent variable (De Beuckelaer & Swinnen, 2011).

Prediction 3:

In Study 2, the estimated class membership of individuals (i.e., class membership estimates $\hat{\eta}_{ik}$) will be relatively robust to the omission of DIF. The logic of this prediction is that, though there may be bias in both covariate effects on class membership and within-class regression effects (as was observed in regression mixture modeling by Kim et al., 2016), these effects may combine to cancel one another out. In continuous latent variable models, $\hat{\eta}_{ik}$ most often takes the form of modal a posteriori (MAP) or expected a posteriori (EAP) factor scores (Bock & Aitkin, 1981; Mislevy & Bock, 1982). While inaccuracy in factor scores may result from omitting covariate effects entirely, factor scores are often fairly close to true values if small amounts of DIF are omitted, so long as covariate effects on the latent variable are included (Mislevy, 1983; Curran et al., 2016). The biasing effects of DIF on scores are also mitigated if DIF is compensatory – i.e., if DIF effects on multiple items in the same scale cancel one another out (Raju, van der Linden, & Fleer, 1995; Chalmers, Counsell, & Flora, 2016).

Prediction 4:

In Study 2, estimates of class profiles (i.e., µjk(xi)) will generally be biased only in conditions in which misclassification is most severe – that is, when estimates of class membership $\hat{\eta}_{ik}$ are most biased. In mixture models, class profiles are critical to defining and interpreting the latent classes. However, conditional predicted values are not as central in continuous latent variable models, and there are few results pertaining to conditional predicted values of yij given omitted DIF in IRT and factor analysis. Therefore, this prediction is based not on continuous latent variable models but on the fact that the class-specific function f(yij|ηik, xi) and class membership are estimated simultaneously and depend on one another. Accordingly, it has been found that misspecifying f(yij|ηik, xi) may lead to bias in class membership ηik and vice versa (Bauer & Curran, 2003, 2004; Van Horn et al., 2012).

We now test the above predictions through two simulation studies and apply DIF modeling strategies to an empirical example.

Study 1

A number of recent studies have provided evidence that omitted DIF may adversely affect the class enumeration process, particularly when models with covariate effects on class membership are used to select the number of classes (Nylund-Gibson & Masyn, 2016; Kim et al., 2016; Diallo, Morin, & Lu, 2017). Given that these other studies have demonstrated class enumeration problems under relatively large DIF, the goal here is to determine if fit indices more reliably select the correct number of classes under a few heretofore unexamined, highly realistic conditions. In particular, we examine the effects of relatively small DIF which is spread across multiple items and covariates, a condition which is often encountered in practice.

Data generation

All data generation was conducted using R, version 3.4.0. Data were generated from a latent class model with K = 2 classes according to Equations 4 and 5. Data generation followed three steps. First, covariates (p = 1, … , P) were generated. Second, these covariates were used to generate class membership variables ηik (k = 1, 2) by using Equation 4 to generate class membership probabilities and sampling from a binomial distribution with these probabilities. Finally, covariates and class membership were then used to generate items yij according to Equation 5. Factors held constant across cells included the configuration and magnitude of covariate effects on class membership, the configuration of covariate effects on items (i.e., DIF effects), and overall sample size. Four factors varied across cells: baseline class separation, class membership probabilities, magnitude of DIF effects, and uniformity of DIF effects.

Factors held constant across cells.

Covariate effects on class membership.

The same configuration of covariate effects held across cells. In all cells, there were P = 4 normally distributed covariates. Across all cells, covariates had unit variance and were weakly correlated with one another at .3. Covariates xi1, xi2, and xi3 increased the probability of membership in class 1, with α11 = α21 = α31 = .7, and covariate xi4 was unrelated to class membership, with α41 = 0. Though categorical grouping variables are often of interest in DIF research, continuous predictors were simulated to facilitate the generation of correlated covariates.

Configuration of DIF effects.

Though baseline class membership logits β0jk and the magnitude and uniformity of DIF effects βpjk all varied by cell as described below, items were generated with the same general pattern of DIF across cells. The location and sign of all DIF effects are shown in Table 1. The placement and signs of DIF effects are meant to simulate realistic patterns of DIF with respect to both covariates xip and indicators yij. Covariates xi1, xi2, and xi4 all exert DIF effects on items, each affecting two items. Thus, whereas covariate xi3 has class membership effects but no DIF, covariate xi4 has DIF but no class membership effects. For covariates xi1 and xi4, both DIF effects are positive; for covariate xi2, both DIF effects are negative. Thus, xi1 and xi4 only ever increase, and xi2 only ever decreases, the overall item endorsement probability.

Table 1.

Study 1: Configuration of endorsement probability parameters across all conditions

| Item | Baseline logit, Class 1 (β0j1) | Baseline logit, Class 2 (β0j2) | β1jk | β2jk | β3jk | β4jk |
|---|---|---|---|---|---|---|
| Low class separation | | | | | | |
| Item 1 | 0.33 | −0.33 | | | | |
| Item 2 | 0.50 | −0.50 | | | | |
| Item 3 | 0.67 | −0.67 | + | | | + |
| Item 4 | 0.50 | −0.50 | | | | |
| Item 5 | 0.33 | −0.33 | + | − | | |
| Item 6 | 0.50 | −0.50 | | | | |
| Item 7 | 0.67 | −0.67 | | | | |
| Item 8 | 0.50 | −0.50 | | − | | + |
| Medium class separation | | | | | | |
| Item 1 | 0.67 | −0.67 | | | | |
| Item 2 | 1.00 | −1.00 | | | | |
| Item 3 | 1.33 | −1.33 | + | | | + |
| Item 4 | 1.00 | −1.00 | | | | |
| Item 5 | 0.67 | −0.67 | + | − | | |
| Item 6 | 1.00 | −1.00 | | | | |
| Item 7 | 1.33 | −1.33 | | | | |
| Item 8 | 1.00 | −1.00 | | − | | + |
| High class separation | | | | | | |
| Item 1 | 1.00 | −1.00 | | | | |
| Item 2 | 1.50 | −1.50 | | | | |
| Item 3 | 2.00 | −2.00 | + | | | + |
| Item 4 | 1.50 | −1.50 | | | | |
| Item 5 | 1.00 | −1.00 | + | − | | |
| Item 6 | 1.50 | −1.50 | | | | |
| Item 7 | 2.00 | −2.00 | | | | |
| Item 8 | 1.50 | −1.50 | | − | | + |

Note: Columns β1jk–β4jk give the sign of the DIF effect of covariates xi1–xi4, which is the same in both classes; blank cells indicate no DIF effect.
Similarly, DIF was spread over the indicators yij such that 3 of the 8 items have DIF, and each DIF item is affected by 2 of the 4 covariates. For one DIF indicator (yi3), both DIF effects were positive. For the other DIF indicators (yi5 and yi8), the two DIF effects went in opposite directions, with one positive effect and one negative effect. This was done in order to create the realistic, compensatory pattern of DIF often seen in practice, in which DIF effects in opposite directions may cancel one another out. In continuous latent variable models, increasing evidence has shown that the biasing effects of DIF on factor scores may be mitigated when DIF effects are spread across items, especially if effects are both positive and negative, regardless of the magnitude of any one DIF effect (Raju, van der Linden, & Fleer, 1995; Chalmers, Counsell, & Flora, 2016; Curran et al., 2016).

Number of items.

The total number of items was held constant at J = 8.

Total sample size.

Total sample size was held constant across cells at N = 500, a sample size consistent with generally good power in mixture models (Nylund, Asparouhov, and Muthén, 2007).

Factors varying across cells.

Three factors were manipulated: overall class membership probabilities (2 levels), baseline class separation (3 levels), and type and magnitude of DIF (6 levels). This yields 36 unique cells. For each cell, R = 100 replications were evaluated.

Mixing proportion.

The mixing proportion is manipulated between cells with 2 levels: 50%/50% and 20%/80%. These proportions correspond to values of α01 = 0 and α01 = −1.8 respectively. These values are identical to the ones chosen by Nylund-Gibson and Masyn (2016), who showed that the effects of omitted DIF on class enumeration were greater in the case of unequally-sized classes.

Baseline class separation.

Baseline class separation is manipulated between cells, with 3 levels: low, medium, and high class separation. Baseline logits β0jk for each class are shown in Table 1. In classes 1 and 2 respectively, the baseline logits β0j1 and β0j2 averaged +1.5 and −1.5 in the high class separation condition, +1.0 and −1.0 in the medium class separation condition, and +0.5 and −0.5 in the low class separation condition, with item-specific values varying around these averages as shown in Table 1. In models with no DIF, the low, medium, and high class separation conditions correspond to entropy values of .47, .79, and .94 respectively when classes are equally sized, and .58, .83, and .95 when classes are unequally sized. The medium and high class separation conditions were chosen to correspond to those used in previous studies of class enumeration (Nylund et al., 2007; Nylund-Gibson & Masyn, 2016). The low class separation condition is meant to correspond to a difficult but realistic case a researcher might encounter in practice using self-report items. The logic of including this condition is to see whether the effects of DIF on class enumeration and parameter bias are exacerbated by overall low levels of class separation.
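The entropy values above summarize how cleanly the posterior probabilities separate the classes. For reference, a minimal R sketch of the standard relative entropy computation, assuming an N × K matrix tau of posterior class probabilities, is:

```r
# Relative entropy of a classification, computed from an N x K matrix of
# posterior class probabilities tau; 1 = perfect separation, 0 = none.
relative_entropy <- function(tau) {
  eps <- 1e-12                      # guard against log(0)
  1 - sum(-tau * log(tau + eps)) / (nrow(tau) * log(ncol(tau)))
}
```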

Type and magnitude of DIF.

There were six conditions encompassing six combinations of DIF magnitude and uniformity, with three uniform DIF conditions (small uniform, medium uniform, and large uniform DIF) and three nonuniform DIF conditions (nonuniform DIF with small DIF in class 1 and medium DIF in class 2; small DIF in class 1 and large DIF in class 2; and medium DIF in class 1 and large DIF in class 2). The values of βpjk corresponding to small, medium, and large DIF were equal to ±0.10, ±0.30, and ±0.50, respectively. These values were chosen to be of smaller magnitude than the DIF in comparable studies using LCA, which used values ranging from βpjk = .5 to βpjk = 1.0 (Asparouhov & Muthén, 2014; Nylund-Gibson & Masyn, 2016). The logic of this choice was to generate DIF that is generally not large enough to drastically affect class enumeration, in order to determine whether DIF still biases results even when the correct number of classes was chosen.
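Putting the pieces together, a condensed R sketch of the data-generating procedure for one cell (medium class separation, equally-sized classes, medium uniform DIF) is shown below. The baseline logits and DIF placement follow Table 1; the pairing of covariates xi1 and xi4 with items 5 and 8 is a plausible assumption, since the exact assignment is not fully determined by the text.

```r
# Data generation for one Study 1 cell (medium class separation,
# equal classes, medium uniform DIF). Baseline logits and DIF placement
# follow Table 1; covariate-to-item pairing for items 5/8 is illustrative.
library(MASS)
set.seed(1)
N <- 500; P <- 4; J <- 8

# Step 1: correlated covariates (unit variance, r = .3)
Sigma <- matrix(0.3, P, P); diag(Sigma) <- 1
x <- mvrnorm(N, mu = rep(0, P), Sigma = Sigma)

# Step 2: class membership from Equation 4 (alpha_01 = 0 -> 50/50 mixing)
alpha <- c(0.7, 0.7, 0.7, 0)
p1    <- plogis(0 + x %*% alpha)    # P(class 1 | x) in the 2-class case
eta1  <- rbinom(N, 1, p1)           # 1 = class 1, 0 = class 2

# Step 3: items from Equation 5
b0    <- c(0.67, 1, 1.33, 1, 0.67, 1, 1.33, 1)
beta0 <- rbind(b0, -b0)             # class-specific baseline logits
beta  <- array(0, dim = c(2, J, P)) # uniform DIF: same value in both classes
beta[, 3, c(1, 4)] <- 0.3           # item 3: positive DIF from x1 and x4
beta[, 5, 1] <-  0.3; beta[, 5, 2] <- -0.3   # item 5: +x1, -x2
beta[, 8, 4] <-  0.3; beta[, 8, 2] <- -0.3   # item 8: +x4, -x2

y <- matrix(NA, N, J)
for (i in 1:N) {
  k  <- ifelse(eta1[i] == 1, 1, 2)
  mu <- plogis(beta0[k, ] + as.vector(beta[k, , ] %*% x[i, ]))
  y[i, ] <- rbinom(J, 1, mu)
}
```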

Fitted models

All models were fitted using the expectation maximization (EM) algorithm as implemented in Mplus software, with 1000 and 100 random starts in the first and second stages of estimation, respectively.

Both models on the left panel of Figure 1 were fitted to the data, with between K = 1 and K = 4 classes. This includes Model A, sometimes referred to as the unconditional model, in which there are no covariate effects on class membership; and Model B, sometimes referred to as the conditional model, in which there are covariate effects on class membership but no DIF effects. Importantly, for each value of K, the model with the correct specification of DIF was also fit to the data to confirm that a full-DIF model chooses the correct number of classes under all conditions. These results are not tabulated here because, as expected, virtually all indices chose the correct number of classes when the DIF model was properly specified.

Figure 1. Study 1: Summary of all models fitted.

For each cell, the values of three widely used fit indices were recorded under each value of K. The Bayesian Information Criterion (BIC; Schwarz, 1978) weighs the loglikelihood of a given model against the number of parameters, balancing fit and parsimony; lower values of the BIC represent better correspondence between model and data. Additionally, two likelihood ratio tests, the Lo-Mendell-Rubin likelihood ratio test (LMR; Lo, Mendell, & Rubin, 2001; Vuong, 1989) and the bootstrap likelihood ratio test (BLRT; McLachlan & Peel, 2000), were consulted. These two tests compare a model with K classes to one with K − 1 classes, and differ in how they approximate the distribution of the test statistic. The LMR and BLRT are applied to models with increasing numbers of classes, and the chosen solution is the one with the highest value of K for which the fit of a K − 1-class model is significantly worse than that of a K-class model.
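For concreteness, the BIC (and the AIC reported in Study 3) is a simple function of the maximized loglikelihood; the R sketch below reproduces, up to rounding of the loglikelihood, the one-class row of Table 5 in Study 3.

```r
# Information criteria from a maximized loglikelihood LL:
# the BIC penalizes each of the n_par parameters by log(N).
bic <- function(LL, n_par, N) -2 * LL + n_par * log(N)
aic <- function(LL, n_par)    -2 * LL + 2 * n_par

bic(-1754.6, 12, 411)   # ~3581.4 (Table 5 reports 3581.33 from the unrounded LL)
aic(-1754.6, 12)        # ~3533.2 (Table 5 reports 3533.1)
```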

Results

Class enumeration under Model A.

All replications of Model A, the unconditional model with K between 1 and 4, converged. The left panel of Table 2 summarizes the proportion of times the correct number of classes was chosen by each of the three fit statistics for the unconditional model, Model A. Results for the medium class separation condition were generally intermediate between the low and high class separation extremes. Additionally, Online Supplement 1 contains a table of the fit indices’ performance in terms of the mean number of classes chosen by each fit index across replications, which gives a sense of whether each index over- or under-extracted classes. Three primary findings emerged. First, the performance of all indices varied most by class separation and mixing proportion, with underextraction of classes observed only under low class separation, particularly when the classes were unequally sized. Second, DIF exerted virtually no effect on class enumeration, with few if any differences in class enumeration observed on the basis of DIF. Finally, the BIC correctly chose the two-class solution in all but the low class separation condition; in that condition, the BIC erroneously favored a one-class solution, but the BLRT reliably chose two classes.

Table 2.

Study 1: Proportion of correct class enumeration processes

| | Model A: BIC | Model A: LMR | Model A: BLRT | Proper solutions with K = 3 | Model B: BIC | Model B: LMR | Model B: BLRT |
|---|---|---|---|---|---|---|---|
| Equally-Sized Classes | | | | | | | |
| Low class separation | | | | | | | |
| Uniform Small DIF | 0.51 | 0.63 | 0.97 | 0.33 | | | |
| Uniform Medium DIF | 0.58 | 0.71 | 0.95 | 0.56 | 1 | 0.93 | 0.80 |
| Uniform Large DIF | 0.77 | 0.83 | 0.93 | 0.83 | 1 | 0.85 | 0.17 |
| Non-Uniform Small/Medium DIF | 0.63 | 0.72 | 0.94 | 0.36 | | | |
| Non-Uniform Small/Large DIF | 0.6 | 0.76 | 0.93 | 0.55 | 1 | 0.91 | 0.69 |
| Non-Uniform Medium/Large DIF | 0.63 | 0.77 | 0.91 | 0.67 | 1 | 0.99 | 0.43 |
| Medium class separation | | | | | | | |
| Uniform Small DIF | 1 | 0.81 | 0.92 | 0.47 | | | |
| Uniform Medium DIF | 1 | 0.83 | 0.91 | 0.74 | 1 | 0.92 | 0.76 |
| Uniform Large DIF | 1 | 0.86 | 0.96 | 0.93 | 1 | 0.81 | 0.34 |
| Non-Uniform Small/Medium DIF | 1 | 0.86 | 0.95 | 0.58 | 1 | 0.86 | 0.81 |
| Non-Uniform Small/Large DIF | 1 | 0.82 | 0.92 | 0.84 | 1 | 0.86 | 0.52 |
| Non-Uniform Medium/Large DIF | 1 | 0.84 | 0.93 | 0.84 | 1 | 0.83 | 0.38 |
| High class separation | | | | | | | |
| Uniform Small DIF | 1 | 0.68 | 0.95 | 0.58 | 1 | 0.98 | 0.98 |
| Uniform Medium DIF | 1 | 0.75 | 0.9 | 0.82 | 1 | 0.87 | 0.82 |
| Uniform Large DIF | 1 | 0.75 | 0.98 | 0.93 | 1 | 0.83 | 0.40 |
| Non-Uniform Small/Medium DIF | 1 | 0.74 | 0.94 | 0.77 | 1 | 0.88 | 0.88 |
| Non-Uniform Small/Large DIF | 1 | 0.72 | 0.89 | 0.83 | 1 | 0.71 | 0.48 |
| Non-Uniform Medium/Large DIF | 1 | 0.77 | 0.96 | 0.92 | 1 | 0.87 | 0.55 |
| Unequally-Sized Classes | | | | | | | |
| Low class separation | | | | | | | |
| Uniform Small DIF | 0.06 | 0.3 | 0.87 | 0.2 | | | |
| Uniform Medium DIF | 0.08 | 0.36 | 0.91 | 0.36 | | | |
| Uniform Large DIF | 0.17 | 0.41 | 0.88 | 0.81 | 1 | 0.81 | 0.25 |
| Non-Uniform Small/Medium DIF | 0.06 | 0.37 | 0.87 | 0.31 | 1 | 0.84 | 0.46 |
| Non-Uniform Small/Large DIF | 0.11 | 0.42 | 0.82 | 0.59 | | | |
| Non-Uniform Medium/Large DIF | 0.11 | 0.47 | 0.86 | 0.7 | 1 | 0.85 | 0.37 |
| Medium class separation | | | | | | | |
| Uniform Small DIF | 1 | 0.86 | 0.94 | 0.3 | | | |
| Uniform Medium DIF | 1 | 0.84 | 0.97 | 0.72 | 1 | 0.85 | 0.68 |
| Uniform Large DIF | 1 | 0.78 | 0.9 | 0.93 | 1 | 0.73 | 0.03 |
| Non-Uniform Small/Medium DIF | 1 | 0.92 | 0.95 | 0.63 | 1 | 0.87 | 0.73 |
| Non-Uniform Small/Large DIF | 1 | 0.77 | 0.93 | 0.96 | 1 | 0.78 | 0.14 |
| Non-Uniform Medium/Large DIF | 1 | 0.76 | 0.94 | 0.99 | 1 | 0.66 | 0.13 |
| High class separation | | | | | | | |
| Uniform Small DIF | 1 | 0.64 | 0.92 | 0.39 | | | |
| Uniform Medium DIF | 1 | 0.68 | 0.99 | 0.69 | 1 | 0.81 | 0.72 |
| Uniform Large DIF | 1 | 0.78 | 0.98 | 0.97 | 1 | 0.71 | 0.19 |
| Non-Uniform Small/Medium DIF | 1 | 0.74 | 0.95 | 0.64 | 1 | 0.83 | 0.78 |
| Non-Uniform Small/Large DIF | 1 | 0.77 | 0.98 | 0.96 | 1 | 0.71 | 0.22 |
| Non-Uniform Medium/Large DIF | 1 | 0.72 | 0.94 | 0.94 | 1 | 0.70 | 0.13 |

Note: Blank Model B cells indicate conditions in which Model B fit indices were not tabulated due to high rates of improper solutions.

One possibility, particularly given that DIF did not affect the BIC’s performance, is that the BIC’s underperformance under low class separation simply occurred as a byproduct of the relatively low sample size. To test this hypothesis, we assessed the performance of the BIC in all six conditions (i.e., all six combinations of DIF) with poorly separated, unequally sized classes but a sample size of N = 2000. Across all these conditions, in which the BIC had shown the poorest performance at N = 500, the BIC correctly chose the 2-class solution 100% of the time. This finding provides preliminary support for the hypothesis that the BIC’s underextraction of classes was due to low sample size.

Class enumeration under Model B.

Improper solutions under Model B were quite frequent when the 3- or 4-class models were fit, and were more frequent in the case of low class separation and smaller amounts of DIF. In particular, 3- or 4-class solutions were often associated with non-positive definite first-order product derivative matrices, indicating empirical underidentification. In applied settings, researchers often take estimation problems in a K-class model as evidence that no more than K − 1 classes are empirically supported. Therefore, cases in which these problems occurred may be ones in which researchers might correctly choose a 2-class solution, even though evidence from fit indices is unavailable. The right side of Table 2 presents class enumeration results for Model B. This includes the rate of proper solutions in the 3-class case, as well as the percentage of times each fit index chose the correct number of classes in all cells with rates of proper solutions greater than 50%. Here, both the LMR and BLRT overextracted classes considerably more frequently than in the unconditional model. The BIC consistently chose the two-class model, regardless of the magnitude of the unmodeled DIF. More generally, as in the unconditional model, DIF did not make a strong difference in class enumeration, as over- and under-extraction were not any more severe in the larger DIF conditions.

Summary

In Study 1, DIF generally did not interfere with class enumeration. As has been observed elsewhere, the BIC generally underextracted classes under Model A when class separation was poor (Diallo, Morin, & Lu, 2017), in which case the BLRT showed adequate performance. Importantly, the underextraction of classes in these cases had little to do with the magnitude of DIF, and appears to be a byproduct of sample size. Model B experienced a variety of convergence problems when solutions with more than 2 classes were fit. In a broader sense, findings mirrored the wider class enumeration literature in that the BIC was the index least prone to overextraction of classes when Model B was fit (e.g., Nylund, Asparouhov, & Muthén, 2007; Li & Hser, 2011; Nylund-Gibson & Masyn, 2016).

Study 2

Study 1 replicated and extended previous findings on class enumeration in the presence of unmodeled DIF in LCA, indicating that DIF of small magnitude exerts few effects on class enumeration and confirming that the correct number of classes may be chosen when an unconditional model is used for class enumeration by either the BIC or BLRT. Thus, Study 2’s primary focus was on the next question researchers might face in interpreting mixture model results: given that the correct number of classes has been chosen, are there still consequences of leaving DIF unmodeled? First, we assessed the accuracy of model parameter estimates, including covariate effects on class membership and covariate effects on endorsement probabilities. Second, we assessed the accuracy of quantities pertaining to class membership, including the prevalence of Class 1, as well as the classification accuracy of individuals. Third, we assessed the accuracy of average item endorsement probabilities within a given class.

Data generation

Data-generating conditions were the same as in Study 1. Due to the difference in computational burden between Study 1 and Study 2 (i.e., in Study 1, four models with K = 1 to K = 4 had to be fit for every one model fit to the data assuming the correct number of classes in Study 2), we examined more replications, R = 500, in Study 2.

Fitted models

Because our focus was on parameter bias assuming class enumeration had already been done correctly, we assume the correct number of classes (K = 2) for all fitted models. Figure 1 shows the three models which were fit to the data. These included: the conditional model, which assumes covariate effects on the latent class variable but not the items (Model B in Figure 1); the uniform DIF fitted model, which estimates DIF effects βpjk on all DIF-containing items (i.e., all items which truly have either uniform or nonuniform DIF) but holds them to be class-invariant (Model C in Figure 1); and the nonuniform DIF fitted model, which is the full model containing class-varying DIF effects βpjk (Model D in Figure 1). Note that in data generating cases with only uniform DIF, Model C is the correct model, whereas Model D contains redundant class-varying effects; therefore, Model D is correct but overparameterized. By contrast, for data-generating cases with nonuniform DIF, Model D is the correct model.

It is of interest to establish whether the inclusion of any covariates, even those whose effects have been misspecified (as they are in Model B for all cells, and in the uniform DIF fitted model for all cells with nonuniform DIF), may absorb omitted DIF effects to produce relatively unbiased estimates of model parameters and scores. Curran et al. (2016) found some estimates of scores to be highly accurate in the analogous case (i.e., covariate effects on class membership included but DIF omitted) in continuous latent variable models. The uniform DIF fitted model was included in order to determine whether simply including uniform DIF would absorb the biasing effects of covariates xip, even if the nature of DIF was misspecified. This is particularly important given that the model with uniform DIF only is easier to identify (Huang & Bandeen-Roche, 2004), and that researchers might favor this model, despite its being misspecified, for ease of estimation.

Importantly, mixture models are only identified up to the permutation of class labels, which makes label switching a potential issue in this and all other simulations of mixture models. However, correct interpretation of all of the quantities of interest – i.e., covariate effects on class membership, individual classifications, and class-specific profiles – is contingent on correctly identifying each class. Decision rules similar to those presented by Tueller, Drotar, and Lubke (2011) were used to defend against potential class label switching.

Results

Parameter bias.

Table 3 shows relative bias for covariate effects on class membership in Model B and Model C. Relative bias is shown for the effects of covariates xi1, xi2, and xi3 (α11, α12, and α13, respectively). Rows index data-generating conditions, and columns show the relative bias for each coefficient under Model B and Model C. Relative bias was almost identical for Models C and D, even in cases in which Model C was misspecified (i.e., in the presence of nonuniform DIF in the data-generating model). Therefore, bias is only presented for the two potentially misspecified models, Model B and Model C.
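Throughout, relative bias has its usual definition: the deviation of the average estimate from the true parameter value, expressed as a proportion of the true value. A one-line R sketch (with theta_hat denoting a vector of estimates across replications) is:

```r
# Relative bias of a parameter across replications:
# (average estimate - true value) / true value.
relative_bias <- function(theta_hat, theta) (mean(theta_hat) - theta) / theta
```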

Table 3.

Study 2: Relative bias in covariate effects on class membership

| | α11: Model B | α11: Model C | α12: Model B | α12: Model C | α13: Model B | α13: Model C |
|---|---|---|---|---|---|---|
| Equally-Sized Classes | | | | | | |
| Low Class Separation | | | | | | |
| Uniform Small DIF | 0.282 | 0.068 | −0.005 | 0.124 | 0.099 | 0.108 |
| Uniform Medium DIF | 0.715 | 0.115 | −0.279 | 0.13 | 0.034 | 0.095 |
| Uniform Large DIF | 1.147 | 0.108 | −0.592 | 0.083 | −0.092 | 0.123 |
| Non-Uniform Small/Medium DIF | 0.494 | 0.112 | −0.169 | 0.088 | 0.076 | 0.088 |
| Non-Uniform Small/Large DIF | 0.744 | 0.186 | −0.329 | 0.106 | 0.027 | 0.109 |
| Non-Uniform Medium/Large DIF | 0.925 | 0.108 | −0.457 | 0.081 | −0.035 | 0.104 |
| Medium Class Separation | | | | | | |
| Uniform Small DIF | 0.071 | 0.017 | −0.014 | 0.021 | 0.044 | 0.043 |
| Uniform Medium DIF | 0.196 | 0.03 | −0.09 | 0.01 | 0.038 | 0.034 |
| Uniform Large DIF | 0.311 | 0.001 | −0.137 | 0.024 | 0.045 | 0.042 |
| Non-Uniform Small/Medium DIF | 0.132 | 0.024 | −0.035 | 0.033 | 0.007 | 0.005 |
| Non-Uniform Small/Large DIF | 0.19 | 0.026 | −0.091 | 0.015 | 0.018 | 0.017 |
| Non-Uniform Medium/Large DIF | 0.271 | 0.04 | −0.102 | 0.03 | 0.033 | 0.031 |
| High Class Separation | | | | | | |
| Uniform Small DIF | 0.042 | 0.027 | 0.009 | 0.018 | 0.012 | 0.012 |
| Uniform Medium DIF | 0.051 | 0.006 | −0.002 | 0.027 | 0.025 | 0.024 |
| Uniform Large DIF | 0.098 | 0.015 | −0.022 | 0.025 | 0.016 | 0.013 |
| Non-Uniform Small/Medium DIF | 0.053 | 0.023 | 0.000 | 0.018 | 0.022 | 0.02 |
| Non-Uniform Small/Large DIF | 0.070 | 0.023 | −0.023 | 0.009 | 0.023 | 0.022 |
| Non-Uniform Medium/Large DIF | 0.082 | 0.018 | −0.027 | 0.013 | 0.032 | 0.031 |
| Unequally-Sized Classes | | | | | | |
| Low Class Separation | | | | | | |
| Uniform Small DIF | 0.450 | 0.309 | 0.045 | 0.269 | 0.229 | 0.351 |
| Uniform Medium DIF | 0.892 | 0.204 | −0.440 | 0.173 | −0.055 | 0.114 |
| Uniform Large DIF | 1.230 | 0.208 | −0.910 | 0.221 | −0.261 | 0.246 |
| Non-Uniform Small/Medium DIF | 0.741 | 0.213 | −0.378 | 0.273 | −0.021 | 0.161 |
| Non-Uniform Small/Large DIF | 1.145 | 0.032 | −0.884 | 0.29 | −0.219 | 0.193 |
| Non-Uniform Medium/Large DIF | 1.198 | 0.127 | −0.923 | 0.292 | −0.235 | 0.255 |
| Medium Class Separation | | | | | | |
| Uniform Small DIF | 0.080 | 0.023 | 0.012 | 0.052 | 0.000 | 0.001 |
| Uniform Medium DIF | 0.201 | 0.013 | −0.092 | 0.030 | 0.026 | 0.030 |
| Uniform Large DIF | 0.375 | 0.033 | −0.171 | 0.041 | −0.008 | 0.011 |
| Non-Uniform Small/Medium DIF | 0.158 | 0.002 | −0.07 | 0.044 | 0.014 | 0.018 |
| Non-Uniform Small/Large DIF | 0.277 | −0.008 | −0.111 | 0.103 | −0.004 | 0.016 |
| Non-Uniform Medium/Large DIF | 0.319 | 0.004 | −0.156 | 0.055 | 0.019 | 0.033 |
| High Class Separation | | | | | | |
| Uniform Small DIF | 0.037 | 0.022 | 0.006 | 0.016 | 0.041 | 0.041 |
| Uniform Medium DIF | 0.082 | 0.03 | −0.007 | 0.025 | 0.019 | 0.018 |
| Uniform Large DIF | 0.130 | 0.036 | −0.015 | 0.038 | 0.035 | 0.034 |
| Non-Uniform Small/Medium DIF | 0.066 | 0.023 | −0.01 | 0.022 | 0.002 | 0.002 |
| Non-Uniform Small/Large DIF | 0.079 | 0.006 | −0.022 | 0.031 | 0.018 | 0.019 |
| Non-Uniform Medium/Large DIF | 0.106 | 0.025 | −0.043 | 0.01 | 0.019 | 0.020 |

The pattern of relative bias shown in Table 3 demonstrates three primary findings. First, low class separation was associated with considerably more bias across all other conditions. Second, bias was greatest under Model B, whereas bias was fairly small for most cells under Model C. Finally, with the exception of the low class separation condition with unequally-sized classes, bias was generally less severe in the effect of xi3, which has no DIF, on class membership (α13), even when Model B was fit. A more complex picture emerges with respect to differences between covariates under Model B. Whereas relative bias was uniformly negative for the effect of covariate xi2 (which has negative DIF effects in the population) when Model B was fit, it was uniformly positive for the effect of covariate xi1 (which has positive DIF effects in the population).

Null hypothesis rejection rates for each covariate effect on class membership generally mirrored the findings for relative bias; these rates are shown in Online Supplement 2. When class separation was medium or high, power to detect the effects of xi1, xi2, and xi3 was above .9 for all data-generating conditions under both Models B and C. However, when class separation was low, power to detect these effects dropped off considerably for both models. DIF did not strongly affect either model’s power to detect these effects. For cases with low and medium class separation, differences between Model B and Model C were most noticeable in the null hypothesis rejection rate for the null effect of xi4. Model C detected a spurious effect of xi4 less than 5% of the time in most cases, even when class separation was low and DIF was severe. However, Model B detected a spurious effect of xi4 an average of 37.5% of the time under low class separation and 11.4% of the time under medium class separation. The range within these conditions was considerable, with between 5.6% and 72.0% of effects detected in the low class separation condition, and between 5.0% and 23.4% of effects detected in the medium class separation condition. Within these conditions, the magnitude of DIF accounted for much of this range, with more spurious effects being detected when the data-generating model contained medium or large uniform DIF, or medium/large nonuniform DIF.

In general, recovery of covariate effects on classes was strong in the uniform DIF fitted model, even when this model was misspecified (i.e., when nonuniform DIF truly existed in the population). However, covariate effects on items were biased in this case, as the value of βpjk, which truly varies across classes, was erroneously estimated to be invariant across the two classes. Thus, as expected, estimated values of βpjk under the uniform DIF fitted model were generally intermediate between the two classes’ true values of βpjk when nonuniform DIF was present in the data-generating model. As one example, we consider bias in the estimated DIF parameter for covariate xi1 on item yi3, denoted β131 and β132 for classes 1 and 2 respectively, collapsing across class separation and mixing proportion. In the small/medium nonuniform DIF condition, average relative bias in $\hat{\beta}_{131}$ and $\hat{\beta}_{132}$ was −0.21 and 0.84, respectively. Bias was predictably more severe in the small/large nonuniform DIF condition, where the true values of β131 and β132 are further apart (relative bias = −0.28 in $\hat{\beta}_{131}$ and 1.41 in $\hat{\beta}_{132}$).

Class membership.

We next assessed the accuracy of individual classifications. Though class membership may be estimated in a number of ways, estimates are typically based on the posterior probability of class membership, which is calculated as:

$$\tau_{i1} = \frac{P(\mathbf{y}_i \mid \eta_{i1} = 1, \mathbf{x}_i)\, P(\eta_{i1} = 1 \mid \mathbf{x}_i)}{P(\mathbf{y}_i \mid \eta_{i1} = 1, \mathbf{x}_i)\, P(\eta_{i1} = 1 \mid \mathbf{x}_i) + P(\mathbf{y}_i \mid \eta_{i2} = 1, \mathbf{x}_i)\, P(\eta_{i2} = 1 \mid \mathbf{x}_i)} \tag{9}$$

Here, τi1 is the probability that subject i is a member of class 1, and τi2 = 1 − τi1. Values of this posterior probability were calculated using estimated model parameters under each cell, as well as under the true model parameters. Estimated and true posterior probabilities are denoted $\hat{\tau}_{i1}$ and τi1, respectively.
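A direct R translation of Equation 9 for the two-class case, with the class-specific endorsement probabilities and prior probability supplied as inputs (a sketch under assumed inputs, not the estimation routine itself), is:

```r
# Posterior probability of membership in class 1 (Equation 9), K = 2.
# mu:  2 x J matrix of endorsement probabilities mu_jk(x_i) for subject i;
# pi1: prior P(class 1 | x_i) from Equation 4.
posterior_class1 <- function(y, mu, pi1) {
  lik <- apply(mu, 1, function(m) prod(m^y * (1 - m)^(1 - y)))
  (lik[1] * pi1) / (lik[1] * pi1 + lik[2] * (1 - pi1))
}
```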

Individuals were given modal class assignments (Nagin, 1999) on the basis of $\hat{\tau}_{ik}$. These binary variables, denoted $\tilde{\eta}_{ik}$, take a value of 1 if subject i’s highest posterior probability is for class k and 0 otherwise. These estimated modal assignments $\tilde{\eta}_{ik}$ were compared to the true classifications ηik using the adjusted Rand index (ARI; Hubert & Arabie, 1985). The ARI is a measure of the concordance between two hard partitions which adjusts for chance, with a value of 0 representing chance-level agreement and a value of 1 representing perfect agreement.
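The ARI is implemented in several R packages (e.g., mclust’s adjustedRandIndex); a minimal from-scratch sketch based on the pair-counting formula of Hubert and Arabie (1985) is:

```r
# Adjusted Rand index between two hard partitions x and y,
# computed from their contingency table (Hubert & Arabie, 1985).
adjusted_rand <- function(x, y) {
  tab <- table(x, y); n <- sum(tab)
  s_cells <- sum(choose(tab, 2))            # agreeing pairs within cells
  s_rows  <- sum(choose(rowSums(tab), 2))
  s_cols  <- sum(choose(colSums(tab), 2))
  expected <- s_rows * s_cols / choose(n, 2)
  (s_cells - expected) / ((s_rows + s_cols) / 2 - expected)
}
```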

ARI values are shown for each fitted model in Figure 2. Lines shown in red (with circular points), green (with square points), and blue (with triangular points) indicate values of ARI under Models B, C, and D, respectively. Three trends emerge when examining Figure 2. First, misclassification was far more severe in the case of low class separation; for all cells with medium or high levels of class separation, values of ARI were over .8 for most cells even under significant model misspecification. Second, among cells with low class separation, misclassification was greatest in the presence of larger DIF, and was exacerbated by unequally-sized classes. Finally, even if nonuniform DIF was present in the data, classification accuracy mainly increased between Model B and Model C, with no noticeable difference in accuracy between Model C and Model D.

Figure 2. Study 2: Adjusted Rand indices comparing true and estimated classifications under all fitted models.

Class-specific item endorsement probabilities.

DIF in mixture models presents researchers with a problem from the standpoint of interpretation: because individual covariates affect the expected values of indicators, class-specific expected values are not dependent exclusively on class membership. Therefore, when DIF is modeled, there is no one class-specific profile that represents the prototypical responses for that class.

One possible solution is to obtain a weighted mean of each class’ expected value, using estimated posterior probabilities τik as weights. After computing estimated class-specific endorsement probabilities and posterior probabilities of class membership using Equations 5 and 9 respectively, one may compute the marginal endorsement probability for a given class, summing over individuals, as:

$$E(y_{ij} \mid \eta_{ik} = 1) = \frac{\sum_{i=1}^{N} \tau_{ik}\, \mu_{jk}(\mathbf{x}_i)}{\sum_{i=1}^{N} \tau_{ik}} \tag{10}$$

This marginal endorsement probability represents the average predicted probability of endorsing item yij, weighted by the probability that subject i is a member of class k given their observed data. When Model B is fit, class-specific endorsement probabilities do not depend on covariates at all. Therefore, the model-implied endorsement probability is the same for all members of a given class. Thus, when Model B is fit, Equation 5 above reduces to Equation 3.
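In R, given an N × K matrix tau of estimated posterior probabilities and an N × J × K array mu of covariate-conditional endorsement probabilities (both assumed already computed via Equations 9 and 5), Equation 10 reduces to a weighted average:

```r
# Marginal endorsement probability of item j in class k (Equation 10):
# a posterior-weighted average of mu_jk(x_i) over subjects.
marginal_endorsement <- function(tau, mu, j, k) {
  sum(tau[, k] * mu[, j, k]) / sum(tau[, k])
}
```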

The class-specific probabilities of endorsing items yij were calculated for both classes under each fitted model as well as the true model. Relative bias of each item’s endorsement probability, compared with the corresponding true value, was calculated. Figure 3 shows values of relative bias for items with DIF (solid lines) and without DIF (dotted lines) in Class 2. The lines shown in red (with circular points), green (with square points), and blue (with triangular points) in this figure indicate values of relative bias under Models B, C, and D, respectively. Bias was largely confined to Model B; under no circumstances did relative bias exceed 10% under the uniform DIF or nonuniform DIF fitted models. As with individual classifications, marginal endorsement probabilities were most compromised with poorly separated classes and, to a lesser extent, unequally-sized classes. Interestingly, in a few of the cells with the highest level of misclassification – cases with poor class separation and large DIF in one or both classes – bias spread even to items with no DIF.

Figure 3. Study 2: Relative bias in class-specific endorsement probabilities for DIF and non-DIF items in Class 2 under all fitted models.

Although we have so far considered bias pooled across items, researchers more typically examine whole class profiles. The average model-implied marginal endorsement profile is thus shown in Figure 4 for four representative cases: low class separation with large uniform DIF and high class separation with large uniform DIF, each with equally-sized versus unequally-sized classes. As shown, both Model C and Model D yielded generally unbiased estimates of the class profiles, but, when class separation was low, Model B yielded profiles in which between-class differences on item 3 were inflated and differences on the other items were attenuated. When class separation was high, all fitted models produced unbiased estimates of the class profiles.

Figure 4. Study 2: Two example profile plots under all fitted models for the medium uniform DIF/low class separation/8 item condition.

Summary

The goal of Study 2 was to determine the consequences of omitting DIF in latent class analysis. Estimates of covariate effects on class membership were generally biased in the direction of the omitted DIF effect; for example, if a covariate increased both the probability of class membership and the item endorsement probabilities, the class membership effect was positively biased when DIF was omitted. In general, misclassification and inaccuracy of class-specific endorsement probabilities were both worst when classes were poorly separated. The close correspondence between these two sets of results underscores the fact that the estimation of class-specific parameters and class membership are closely related, such that inaccuracy in one necessarily leads to inaccuracy in the other. Perhaps most surprising is the finding that fitting a uniform DIF model in the presence of nonuniform DIF generally yielded results that were no more biased than those of the correct nonuniform DIF model. The predictable exception to this finding is that the DIF effects themselves are biased when nonuniform DIF is erroneously constrained to be uniform; these parameters are seldom of substantive interest. Taken together, the findings offer the general impression that properly specifying class-varying (i.e., nonuniform) DIF is not as important as including DIF in the first place. Including uniform DIF, even if the uniform DIF model is misspecified, is generally sufficient to produce unbiased results.

Study 3

The dual goals of this analysis are to demonstrate the use of LCA with DIF in a real dataset, and to use mixture models to assess potential DIF in the measurement of alcohol use disorder (AUD). The two most commonly used sets of diagnostic criteria for AUD come from the International Statistical Classification of Diseases (ICD-10; WHO, 1992) and the Diagnostic and Statistical Manual (DSM-5; APA, 2013). These criteria may be considered binary items intended to measure some categorical latent variable representing AUD diagnosis. Here we fit an LCA to obtain discrete groups of individuals based on patterns of AUD diagnostic criteria, and assess DIF in the measurement of these criteria on the basis of a number of predictors.

Methods

Study design.

Data come from the Real Experiences and Lives in the University (REAL-U) study, a study of the measurement of alcohol use disorder (AUD) and related constructs in an undergraduate sample. One of the principal goals of the REAL-U study is to test the biasing effects of subtle differences across studies in measurement, and to assess how well data harmonization methods such as integrative data analysis (IDA; Bauer & Hussong 2009; Curran & Hussong, 2009; Hussong, Curran, & Bauer, 2013) can detect and mitigate this bias. The study consisted of two visits, separated by a period of two weeks. At each of the visits, subjects completed one of two test batteries, each consisting of a number of different measures of constructs related to alcohol and substance use, as well as associated psychosocial constructs.

Sample.

Participants (N = 411; 46.7% male) were undergraduates at a large southeastern research university, recruited by email. The sample was relatively ethnically diverse, with 58.7%, 22.1%, 10.5%, and 0.5% of participants identifying as White, Black, Asian, and American Indian/Native American, respectively; 6.1% and 2.9% of participants identified as more than one race or some other race, respectively. Of these participants, 3.0% identified as Hispanic or Latino. The mean age of alcohol initiation was 17.26 years (SD = 1.81). In addition, 28.6% of the participants were first-year students, 20.5% were sophomores, 20% were juniors, 28.9% were seniors, and 2% were non-students, graduate students, or did not specify. Participants were required to have consumed alcohol in the past year, as assessed by a screener.

Measures.

In both batteries, lifetime AUD was measured using the 12 DSM criteria listed in Table 4. This list of criteria includes all items used in either DSM-IV or DSM-5, and thus both legal trouble (Item 3; discarded in DSM-5) and craving (Item 12; new to DSM-5) were initially included. However, Items 3 (legal trouble) and 8 (gave up activities for drinking) showed sufficiently low endorsement probabilities that they were not retained, given the sensitivity of LCA to extremely low-endorsement items at small sample sizes. Thus, the final dataset consisted of only ten items.

Table 4.

Study 3: Alcohol Use Disorder (AUD) criteria used in the current study.

| Item | Criterion | Notes |
|---|---|---|
| 1 | Role impairment | |
| 2 | Used in dangerous situations | |
| 3 | Legal problems | Dropped from DSM-5 |
| 4 | Drinking despite problems with family and friends | |
| 5 | Uncontrolled drinking | |
| 6 | Unsuccessful quit attempts | |
| 7 | Spent a lot of time drinking | |
| 8 | Gave up activities for drinking | |
| 9 | Continued use despite health or psychological problems | |
| 10 | Tolerance | |
| 11 | Withdrawal symptoms | |
| 12 | Craving | New to DSM-5 |

Data analytic strategy.

A number of separate LCAs, described in greater detail below, were fit. First, based on prior results indicating that omitted DIF effects lead to overextraction of classes when class enumeration is performed with a conditional model, class enumeration was performed in models without covariates (Nylund-Gibson & Masyn, 2016; Kim et al., 2016; Diallo, Morin, & Lu, 2017). Then, an impact-only model, containing only the effects of covariates on class membership, was fit using the number of classes chosen at the previous step. The covariates included were age (centered at age 21), gender (dummy coded; 1 = male, 0 = female), race (dummy coded; 1 = white, 0 = all other races), and study visit (dummy coded; 1 = visit 2, 0 = visit 1).

Finally, the presence and location of DIF was assessed using the MIMIC-LCA method of Masyn (2017). The MIMIC-LCA procedure used to locate DIF is described in Online Supplement 3.

Results

Class enumeration.

Models with K = 1 through K = 5 classes were fit to the data; however, the 5-class solution was empirically underidentified. Table 5 shows fit indices for each number of classes. The bootstrap likelihood ratio test (BLRT; McLachlan & Peel, 2004) favored a 4-class model, which included one class with fewer than 10 cases. Both the BIC and the LMR favored a 2-class model. Based on the BIC's well-documented ability to recover the correct number of classes when DIF is omitted, demonstrated both in this study and elsewhere (Nylund-Gibson & Masyn, 2016), it was given extra weight in adjudicating between solutions. For this reason, combined with the interpretability and stability of the 2-class solution, we retained the 2-class model.

Table 5.

Study 3: Class enumeration statistics for unconditional models.

K #Param LL BIC AIC LMR LMR p.val BLRT BLRT p.val

1 12 −1754.6 3581.33 3533.1
2 25 −1530.8 3212.13 3111.67 441.787 < .001 447.434 < .001
3 38 −1505.9 3240.6 3087.89 49.152 0.0145 49.78 < .001
4 51 −1487.6 3282.08 3077.13 36.296 0.2015 36.76 < .001
5 64 −1476.6 3338.4 3081.21 21.641 0.362 21.917 0.3333
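
As a check on Table 5, the information criteria can be recomputed directly from the log-likelihoods and parameter counts; a minimal sketch follows (small discrepancies from the tabled values reflect rounding of the reported log-likelihoods).

```python
# Recomputing the information criteria in Table 5 from the reported
# log-likelihoods and parameter counts (N = 411).
import numpy as np

N = 411
n_params = np.array([12, 25, 38, 51, 64])
loglik = np.array([-1754.6, -1530.8, -1505.9, -1487.6, -1476.6])

bic = -2 * loglik + n_params * np.log(N)   # Schwarz (1978)
aic = -2 * loglik + 2 * n_params
for k, (b, a) in enumerate(zip(bic, aic), start=1):
    print(f"K = {k}: BIC = {b:.2f}, AIC = {a:.2f}")
```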

Initial model.

Following the fitting of the unconditional model, which favored a 2-class solution, a conditional model (i.e., a model with covariate effects on class membership but no DIF) with K = 2 was fit. Item endorsement patterns in the two-class solution are shown in Figure 5 for this initial model. The majority of the sample (71.3%) fell into a class (the "low-symptoms" class) characterized by low levels of AUD symptom endorsement. The remainder of the sample (28.7%) fell into a class (the "high-symptoms" class) characterized by higher levels of AUD symptom endorsement. In particular, a majority of individuals in the high-symptoms class endorsed Items 5 (uncontrolled drinking) and 10 (tolerance), which were endorsed by roughly 95% and 70% of individuals in this class respectively. Additionally, roughly half the members of this class endorsed Items 1 (role impairment), 2 (drinking in dangerous situations), and 9 (continued drinking despite health or psychological problems), and between one quarter and one third of members in this class endorsed Items 6 (unsuccessful quit attempts) and 7 (spent a lot of time drinking). Given cutoffs of 2, 5, and 8 item endorsements required for mild, moderate, and severe AUD diagnoses respectively, the modal member of this class met criteria for at least mild AUD.

Figure 5. Study 3: Marginal endorsement probabilities under the impact-only and DIF models.

All four covariate effects on class membership were significantly different from zero. Male participants were more likely to be in the high-symptoms class (OR = 1.91, 95% CI = [1.13, 3.22]), as were white participants (OR = 2.04, 95% CI = [1.06, 3.91]) and older participants (OR = 1.36, 95% CI = [1.12, 1.66]). Finally, participants were more likely to be classified into the high-symptoms class at Visit 1 than Visit 2 (OR = 3.08, 95% CI = [1.76, 5.38]).
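
For reference, each odds ratio and confidence interval above is a simple exponential transformation of the corresponding logit coefficient and its standard error. A minimal sketch follows; the coefficient and SE shown are back-calculated from the reported gender effect, not values taken from the fitted model.

```python
# Converting a logit coefficient to an odds ratio with a 95% CI. The
# coefficient and SE are approximations implied by the reported gender
# effect (OR = 1.91, CI = [1.13, 3.22]), not actual model output.
import numpy as np

b, se = np.log(1.91), 0.267
or_est = np.exp(b)
ci = np.exp([b - 1.96 * se, b + 1.96 * se])
print(f"OR = {or_est:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```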

Full model.

The MIMIC-LCA procedure detailed by Masyn (2017) was applied separately to each of the four covariates to identify DIF and class membership effects. A final model containing all covariate effects found to be significantly different from zero by the MIMIC-LCA algorithm was then fit. Marginal class-specific item endorsement probabilities for this model, given by Equation 8, are shown in Figure 5. The effects of all four covariates on class membership remained significantly different from zero. As in the initial model, the probability of membership in the high-symptoms class was higher for male participants (OR = 1.96, 95% CI = [1.17, 3.31]), white participants (OR = 2.00, 95% CI = [1.12, 3.55]), older participants (OR = 1.39, 95% CI = [1.15, 1.68]), and participants seen at Visit 1 (OR = 2.11, 95% CI = [1.25, 3.58]).

In the final model determined by the MIMIC-LCA procedure, there were two uniform DIF effects from Visit, such that participants seen at Visit 1 were more likely than those at Visit 2 to endorse Item 5 (uncontrolled drinking; βVisit.5 = 0.78, z = 3.15, p = 0.002) and Item 9 (continued drinking despite health or psychological problems; βVisit.9 = 2.36, z = 5.69, p < 0.001). No other DIF effects from any covariate were identified. Model-implied class-specific endorsement profiles for participants seen at Visit 1 and Visit 2 are shown in Figure 6.
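
To see what a uniform DIF effect of this size implies, the sketch below converts logits to endorsement probabilities for Item 9 under hypothetical class-specific intercepts; the intercepts are illustrative assumptions, and only the 2.36 logit shift comes from the fitted model.

```python
# How a uniform DIF effect moves an item's endorsement probability: the
# visit effect shifts the logit by the same amount in both classes. The
# class intercepts below are hypothetical; beta is the reported value.
import numpy as np
from scipy.special import expit

nu = {"low-symptoms": -3.0, "high-symptoms": -0.5}   # hypothetical intercepts
beta_visit = 2.36                                    # reported DIF effect, Item 9
for cls, v in nu.items():
    p_visit2, p_visit1 = expit(v), expit(v + beta_visit)
    print(f"{cls}: P(endorse) = {p_visit1:.2f} at Visit 1, {p_visit2:.2f} at Visit 2")
```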

Figure 6. Study 3: Profiles of endorsement probabilities for both classes at Visit 1 and Visit 2.

Even though the likelihood of endorsing Items 5 and 9 changes from Visit 1 to Visit 2, visual inspection of Figure 5 suggests that class prevalences and model-implied marginal endorsement probabilities are quite similar between the initial and final models. It was therefore of interest to assess the agreement between the modal classifications η̃_ik generated by the initial and final models using the Adjusted Rand Index (ARI), as in Study 2. The ARI comparing modal classifications under the two models was 0.92, suggesting high concordance between the individual classifications they generated. The posterior probabilities of class membership were also quite similar between the initial and final models, as shown in the figure in Online Supplement 3.
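
For reference, the ARI is straightforward to compute from two vectors of modal class labels; a minimal sketch with made-up labels:

```python
# Agreement between two modal classifications via the Adjusted Rand
# Index (Hubert & Arabie, 1985); the label vectors here are made up.
from sklearn.metrics import adjusted_rand_score

labels_initial = [0, 0, 1, 1, 0, 1, 0]   # modal classes, initial model
labels_final   = [0, 0, 1, 1, 1, 1, 0]   # modal classes, final model
print(adjusted_rand_score(labels_initial, labels_final))
```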

Summary

Study 3 assessed the effects of demographic covariates on diagnostic criteria for AUD using LCA with DIF. In the full model, DIF effects were fewer but generally larger in magnitude than those considered in Studies 1 and 2. In both the initial and final models, roughly three quarters of the sample fell into a class (the "low-symptoms" class) with low levels of most symptoms, with the remainder falling into a class (the "high-symptoms" class) in which most members were likely to meet criteria for at least mild AUD. Only a minority of individual-level class assignments and posterior class membership probabilities changed between the two models. However, two fairly sizeable DIF effects were found on the basis of study visit, and the effect of visit on class membership changed substantially when this DIF was included. These findings suggest that, even if individual diagnostic classifications for AUD do not change drastically in the presence of DIF, omitted DIF may masquerade as true differences in AUD liability across levels of covariates.

Discussion

In the current study, we examined the consequences of omitting DIF in latent class analysis (LCA). In two simulation studies, data were generated under a range of conditions conducive to a complex pattern of bias in LCA results. Though DIF generally did not interfere with class enumeration in Study 1, large DIF did lead to bias in parameter estimates, class profiles, and individual classifications in Study 2. This discrepancy suggests that, even when researchers choose the correct number of classes, inferences based on mixture models with unmodeled DIF may still be incorrect. Meanwhile, in Study 3, DIF that was large relative to the DIF in Studies 1 and 2 made relatively little difference to estimated classifications of individuals and to overall endorsement profiles, but did affect covariate effects on class membership.

Evaluating Predictions about DIF

This study was motivated by four major predictions about the effects of unmodeled DIF on LCA results. The first, Prediction 1, was that class enumeration in Study 1 would be most strongly affected by unmodeled DIF when a conditional model was used to decide the number of classes. This prediction was difficult to evaluate given the frequent convergence problems for the conditional model, but in general DIF did not affect the ability of the BIC to extract the correct number of classes under this model. Evaluation was further complicated by the fact that the BIC tended to underextract classes when the unconditional model was used, although assessing its performance under a few conditions with N = 2000 provided evidence that this was merely a product of sample size. The relatively weak effects of DIF on class enumeration under both fitted models, relative to previous work (Nylund-Gibson & Masyn, 2016; Kim et al., 2014; Diallo, Morin, & Lu, 2017), likely owe to the small magnitude of DIF chosen here. The conditions examined thus generally represent conditions under which a researcher is unlikely to overextract classes due to unmodeled DIF. However, particularly in light of the convergence problems in the conditional model with K = 3 or K = 4, these results should not be taken as evidence that the conditional model is appropriate for class enumeration.

Study 2 presented a complicated set of issues a researcher may encounter after having chosen the correct number of classes. As predicted (Prediction 2), covariate effects on class membership were biased when DIF was omitted, as in the case of covariate effects on continuous latent variables (Chen, 2008; De Beuckelaer & Swinnen, 2011; Kuha & Moustaki, 2015). In this regard, we extend prior results suggesting bias in covariate effects on class membership given omitted DIF (Kim et al., 2016; Masyn, 2017) to a wider range of conditions, showing that class separation largely supersedes all other factors in its effect on bias.

The effects of omitted DIF on individual classifications (Prediction 3) were less pronounced than hypothesized. On the one hand, the finding that class membership was recovered accurately even when the nature of DIF was misspecified is concordant with growing evidence that DIF must be pervasive in order to adversely affect factor score recovery (Chalmers, Counsell, & Flora, 2016; Curran et al., 2016). On the other hand, in contrast to the results of Curran et al. (2016), simply including the impact of covariates on class membership did not mitigate the effects of omitted DIF on scores.

Finally, as predicted based on the well-known relationship between class membership and class-specific functions for indicators (Prediction 4), class-specific endorsement profiles were most biased in cases where class separation was lowest and misclassification was most severe. Though the absolute magnitude of bias in predicted endorsement probabilities was quite small, the cumulative differences across items generated by unmodeled DIF may present a distorted picture of the classes’ overall endorsement profiles.

The uniform and nonuniform DIF fitted models

One of the key findings of Study 2 was that, even when there was nonuniform DIF in the data-generating model, the nonuniform DIF fitted model (Model D) did not improve the accuracy of covariate effects on class membership, model-implied class profiles, or individual classifications. This result motivates the question of whether the nonuniform DIF model is worth fitting, particularly given that a model with class-varying direct effects often faces problems with identification (Huang & Bandeen-Roche, 2004). Nonuniform DIF also poses interpretational challenges, given that it can be conceptualized as an interaction effect between the covariate and the latent variable, thereby conveying some of the "true" effect of the latent variable (Swaminathan & Rogers, 1990). Given these challenges, as well as the tentative evidence that the nonuniform DIF model does not improve the accuracy of the quantities explored here, researchers may opt to fit the simpler uniform DIF model. The important caveat is that, even if many parameters and model-implied scores are not compromised by erroneously assuming DIF to be uniform, the DIF effects themselves will be biased. Therefore, if interpreting the direct effects themselves is of interest, correctly modeling the uniformity of DIF is key.
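
Schematically, in our notation (a sketch of the two specifications, not a verbatim reproduction of the fitted models' parameterization), the uniform and nonuniform models differ only in whether the direct effect carries a class subscript:

```latex
% Uniform DIF: one direct effect \beta_j for item j, shared across classes.
\mathrm{Uniform:}\qquad \operatorname{logit} P(Y_{ij}=1 \mid C_i = k,\, x_i) = \nu_{jk} + \beta_j x_i
% Nonuniform DIF: class-specific direct effects \beta_{jk}, i.e., a
% covariate-by-class interaction.
\mathrm{Nonuniform:}\quad \operatorname{logit} P(Y_{ij}=1 \mid C_i = k,\, x_i) = \nu_{jk} + \beta_{jk} x_i
```

The nonuniform specification is thus a covariate-by-class interaction, which is why it absorbs part of the "true" effect of the latent class variable.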

Jointly considering class enumeration and parameter accuracy

Though DIF was arguably not large enough to strongly impact class enumeration in this study, it is clear that it may be an impediment to obtaining the correct number of classes, particularly when the conditional model is used (Nylund-Gibson & Masyn, 2016). Thus, while the unconditional model is recommended for class enumeration, the conditional model may be a useful comparator.4 If a researcher finds different numbers or configurations of classes between the conditional and unconditional models, this may serve as a signal that DIF is present in the data (Petras & Masyn, 2010; Masyn, 2013). Such differences indicate that a conditional model is misspecified, prompting a formal exploration of DIF once the number of classes has been determined using the unconditional model.

The results of Study 2, in which the accuracy of individual classifications and class profiles was compromised only when class separation was low, must be considered in light of the results of Study 1. In particular, the cases with the greatest degree of inaccuracy in person-level estimates in Study 2 were the cases in which class underextraction by the BIC was most severe in Study 1. This raises a critical question: are the only truly problematic cases of DIF ones which would also affect class enumeration? That is, might researchers never even reach the issues of bias assessed in Study 2, having already failed to extract the correct number of classes?

In considering this question, we are guided by the finding that DIF did not actually adversely affect class enumeration in these cases; rather, results with N = 2000 indicate that sample size was the greater issue. As has been observed elsewhere (Diallo, Morin, & Lu, 2016; Diallo, Morin, & Lu, 2017), the BIC underextracted classes only when class separation was low, a condition under which other indices such as the BLRT extracted the correct number of classes. This occurred regardless of the level of DIF. It therefore stands to reason that, even in the presence of poorly separated classes, a researcher could clear the hurdle of class enumeration and still find that, with the correct number of classes, DIF adversely affects group- and person-level inferences.

Thus, one recommendation that emerges from this study is to be particularly cautious about DIF effects if the entropy of the chosen K-class solution is low. The current analyses suggest that correctly modeling the uniformity or nonuniformity of DIF is not critical to obtaining unbiased results, but omitting DIF entirely will likely lead to biased estimates of covariate effects on class membership and class-specific endorsement probabilities, as well as potentially severe misclassification of individuals into latent classes.
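
For reference, the relative entropy statistic referred to here can be computed directly from the matrix of posterior class probabilities; a minimal sketch follows (the simulated posteriors are purely illustrative).

```python
# Relative entropy of a K-class solution from the n x K posterior class
# probabilities (rows sum to 1); values near 1 indicate well-separated
# classes, values near 0 indicate heavy classification uncertainty.
import numpy as np

def relative_entropy(P, eps=1e-12):
    n, K = P.shape
    total_uncertainty = -(P * np.log(P + eps)).sum()
    return 1 - total_uncertainty / (n * np.log(K))

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=411)    # illustrative posteriors, K = 2
print(round(relative_entropy(P), 3))
```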

However, whether this bias is worth considering in cases in which classes are reliably overextracted remains an open question. Given that models with different numbers of classes may produce nearly equivalent estimates of individual predicted values (e.g., Sterba & Bauer, 2014; Lubke & Neale, 2006; 2008), it is conceivable that, even if classes are overextracted due to unmodeled DIF, individual-level quantities such as predicted endorsement probabilities may be surprisingly accurate. By definition, however, some quantities, such as estimates of covariate effects on class membership and individual class assignments, would necessarily be incorrect if spurious classes were extracted due to DIF. As such, considerably more research is needed to establish the reasonable limits of parameter interpretation in mixture models when the class enumeration process is uncertain, with or without DIF.

Limitations

The current study has a number of limitations and should be extended in several ways. Most importantly, a wider range of data-generating conditions should be considered in order to determine the full scope of the biasing effects of DIF in mixture models. There are at least four obvious extensions. First, only two-class models were considered here, and future work should consider models with three or more classes. Models with more than two classes are of interest because the differences between classes may be considerably more complicated than they were here; for instance, item parameter values may differ in one class relative to all other classes but be similar among the remaining classes. The consequences of such "uneven" patterns of DIF remain to be seen and cannot be addressed in a simulation with only two classes. Second, only binary responses were considered, and it is critical to determine whether the patterns of bias associated with omitted DIF generalize to indicators on other scales (e.g., count, ordinal, or continuous). Third, the simulation studies considered a data-generating model with complete local independence between indicators. However, a wide variety of covariance relationships may exist between indicators in finite mixture models, from LCA models with only incidental local dependence to factor mixture models in which continuous latent variables govern the distribution of indicators within a given class (Reboussin et al., 2008; Lubke & Muthén, 2005; Lubke & Muthén, 2007). It remains to be seen how omitted DIF biases model parameters, and how DIF effects should be interpreted, in the presence of dependence between indicators. Finally, DIF on the basis of categorical covariates should be considered, especially given the sensitivity of mixture model results to the distribution of predictors (Asparouhov & Muthén, 2014).

Additionally, the process of measurement invariance testing was not investigated here; in Studies 1 and 2 the location of all DIF effects was known, and in Study 3 the MIMIC-LCA algorithm was used to locate covariate-item pairs with DIF. Typical empirical studies likely feature complex patterns of DIF from a number of covariates, and it is thus crucial to establish guidelines for empirically determining whether and where DIF exists in a given dataset. In contrast to the continuous latent variable literature, in which a wide variety of specification searches are available to help researchers find the DIF effects that truly exist (e.g., Thissen, 2001; Woods, 2009; Woods, Cai, & Wang, 2013; Shi et al., 2017), the MIMIC-LCA algorithm (Masyn, 2017) used here is the only published specification search for DIF in mixture models. As such, future work must focus on determining how well this and other search algorithms perform at finding DIF so that it may be modeled correctly. Although the scope of the present research precludes investigating all of these additional issues, the current results nevertheless provide a clear picture of the consequences of omitting DIF in mixture models.

Supplementary Material

SM1
SM2
SM3

Footnotes

1. We thank an anonymous reviewer for suggesting this possibility.

2. In addition to modal classifications, the accuracy of posterior probabilities was assessed using an extension of the ARI which uses cosine similarity to assess the agreement between two fuzzy partitions (Brouwer, 2009). However, no meaningful differences were found between these results and the ARI examining hard partitions, and so the simpler ARI results are presented.

3. Class 2 is shown because, when the classes are unequally sized, it is the larger class. Bias was worst in Class 1 under the same conditions as Class 2; however, in that class non-DIF items showed the most severe bias.

4. We thank an anonymous reviewer for suggesting this strategy.

References

1. Agresti A. (2015). Foundations of linear and generalized linear models. John Wiley & Sons.
2. Angoff WH. (1993). Perspectives on differential item functioning methodology. In Holland PW, & Wainer H (Eds.), Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum.
3. Asparouhov T, & Muthén B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21(3), 329–341. doi: 10.1080/10705511.2014.915181
4. Bakk Z, Tekle FB, & Vermunt JK. (2013). Estimating the association between latent class membership and external variables using bias-adjusted three-step approaches. Sociological Methodology, 43(1), 272–311. doi: 10.1177/0081175012470644
5. Bauer DJ. (2017). A more general model for testing measurement invariance and differential item functioning. Psychological Methods, 22(3), 507–526. doi: 10.1037/met0000077
6. Bauer DJ, & Curran PJ. (2003). Distributional assumptions of growth mixture models: Implications for overextraction of latent trajectory classes. Psychological Methods, 8(3), 338–363. doi: 10.1037/1082-989x.8.3.338
7. Bauer DJ, & Curran PJ. (2004). The integration of continuous and discrete latent variable models: Potential problems and promising opportunities. Psychological Methods, 9(1), 3–29. doi: 10.1037/1082-989x.9.1.3
8. Bauer DJ, & Hussong AM. (2009). Psychometric approaches for developing commensurate measures across independent studies: Traditional and new models. Psychological Methods, 14(2), 101–125. doi: 10.1037/a0015583
9. De Beuckelaer A, & Swinnen G. (2011). Biased latent variable mean comparisons due to measurement noninvariance: A simulation study. In Cross-cultural analysis: Methods and applications (pp. 117–148). New York, NY: Taylor and Francis.
10. Bock RD, & Aitkin M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. doi: 10.1007/bf02293801
11. Bolck A, Croon M, & Hagenaars J. (2004). Estimating latent structure models with categorical variables: One-step versus three-step estimators. Political Analysis, 12(1), 3–27. doi: 10.1093/pan/mph001
12. Brouwer RK. (2009). Extending the rand, adjusted rand and jaccard indices to fuzzy partitions. Journal of Intelligent Information Systems, 32(3), 213–235. doi: 10.1007/s10844-008-0054-7
13. Chalmers RP, Counsell A, & Flora DB. (2016). It might not make a big DIF: Improved differential test functioning statistics that account for sampling variability. Educational and Psychological Measurement, 76(1), 114–140. doi: 10.1177/0013164415584576
14. Chen FF. (2008). What happens if we compare chopsticks with forks? The impact of making inappropriate comparisons in cross-cultural research. Journal of Personality and Social Psychology, 95(5), 1005–1018. doi: 10.1037/a0013193
15. Curran PJ, Cole V, Bauer DJ, Hussong AM, & Gottfredson N. (2016). Improving factor score estimation through the use of observed background characteristics. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 827–844. doi: 10.1080/10705511.2016.1220839
16. Curran PJ, & Hussong AM. (2009). Integrative data analysis: The simultaneous analysis of multiple data sets. Psychological Methods, 14(2), 81–100. doi: 10.1037/a0015914
17. American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Arlington, VA: American Psychiatric Publishing.
18. Diallo TMO, Morin AJS, & Lu H. (2016). Impact of misspecifications of the latent variance–covariance and residual matrices on the class enumeration accuracy of growth mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 507–531. doi: 10.1080/10705511.2016.1169188
19. Diallo TMO, Morin AJS, & Lu H. (2017). The impact of total and partial inclusion or exclusion of active and inactive time invariant covariates in growth mixture models. Psychological Methods, 22(1), 166–190. doi: 10.1037/met0000084
20. Gibson WA. (1959). Three multivariate models: Factor analysis, latent structure analysis, and latent profile analysis. Psychometrika, 24(3), 229–252.
21. Holland PW, & Wainer H (Eds.). (2012). Differential item functioning. Routledge. doi: 10.4324/9780203357811
22. Horn MLV, Smith J, Fagan AA, Jaki T, Feaster DJ, Masyn K, … Howe G. (2012). Not quite normal: Consequences of violating the assumption of normality in regression mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 19(2), 227–249. doi: 10.1080/10705511.2012.659622
23. Huang G-H, & Bandeen-Roche K. (2004). Building an identifiable latent class model with covariate effects on underlying and measured variables. Psychometrika, 69(1), 5–32. doi: 10.1007/bf02295837
24. Hubert L, & Arabie P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218. doi: 10.1007/bf01908075
25. Hussong AM, Curran PJ, & Bauer DJ. (2013). Integrative data analysis in clinical psychology research. Annual Review of Clinical Psychology, 9(1), 61–89. doi: 10.1146/annurev-clinpsy-050212-185522
26. World Health Organization. (1992). The ICD-10 classification of mental and behavioural disorders: Clinical descriptions and diagnostic guidelines. Geneva: World Health Organization.
27. Joreskog KG, & Goldberger AS. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70(351), 631. doi: 10.2307/2285946
28. Kaplan D, & George R. (1995). A study of the power associated with testing factor mean differences under violations of factorial invariance. Structural Equation Modeling: A Multidisciplinary Journal, 2(2), 101–118. doi: 10.1080/10705519509539999
29. Kim M, Vermunt J, Bakk Z, Jaki T, & Horn MLV. (2016). Modeling predictors of latent classes in regression mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 601–614. doi: 10.1080/10705511.2016.1158655
30. Lanza ST, Tan X, & Bray BC. (2013). Latent class analysis with distal outcomes: A flexible model-based approach. Structural Equation Modeling: A Multidisciplinary Journal, 20(1), 1–26. doi: 10.1080/10705511.2013.742377
31. Lazarsfeld P, & Henry N. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
32. Li L, & Hser Y-I. (2011). On inclusion of covariates for class enumeration of growth mixture models. Multivariate Behavioral Research, 46(2), 266–302. doi: 10.1080/00273171.2011.556549
33. Lo Y, Mendell NR, & Rubin DB. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767–778. doi: 10.1093/biomet/88.3.767
34. Lubke G, & Muthén BO. (2007). Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters. Structural Equation Modeling: A Multidisciplinary Journal, 14(1), 26–47. doi: 10.1207/s15328007sem1401_2
35. Lubke G, & Neale M. (2008). Distinguishing between latent classes and continuous factors with categorical outcomes: Class invariance of parameters of factor mixture models. Multivariate Behavioral Research, 43(4), 592–620. doi: 10.1080/00273170802490673
36. Lubke G, & Neale MC. (2006). Distinguishing between latent classes and continuous factors: Resolution by maximum likelihood? Multivariate Behavioral Research, 41(4), 499–532. doi: 10.1207/s15327906mbr4104_4
37. Lubke GH, & Muthén B. (2005). Investigating population heterogeneity with factor mixture models. Psychological Methods, 10(1), 21–39. doi: 10.1037/1082-989x.10.1.21
38. Masyn KE. (2017). Measurement invariance and differential item functioning in latent class analysis with stepwise multiple indicator multiple cause modeling. Structural Equation Modeling: A Multidisciplinary Journal, 24(2), 180–197. doi: 10.1080/10705511.2016.1254049
39. McCullagh P, & Nelder JA. (1989). Binary data. In Generalized linear models (pp. 1–20). Springer US. doi: 10.1007/978-1-4899-3242-6_1
40. McLachlan G, & Peel D. (2004). Finite mixture models. John Wiley & Sons. doi: 10.1002/0471721182
41. Meredith W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543. doi: 10.1007/bf02294825
42. Millsap RE. (2012). Statistical approaches to measurement invariance. Routledge. doi: 10.4324/9780203821961
43. Mislevy RJ. (1983). Item response models for grouped data. Journal of Educational Statistics, 8(4), 271. doi: 10.2307/1164913
44. Mislevy RJ, & Bock RD. (1982). Biweight estimates of latent ability. Educational and Psychological Measurement, 42(3), 725–737. doi: 10.1177/001316448204200302
45. Muthén BO. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54(4), 557–585. doi: 10.1007/bf02296397
46. Nagin DS. (1999). Analyzing developmental trajectories: A semiparametric, group-based approach. Psychological Methods, 4(2), 139–157. doi: 10.1037/1082-989x.4.2.139
47. Nylund KL, Asparouhov T, & Muthén BO. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14(4), 535–569. doi: 10.1080/10705510701575396
48. Nylund-Gibson K, & Masyn KE. (2016). Covariates and mixture modeling: Results of a simulation study exploring the impact of misspecified effects on class enumeration. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 782–797. doi: 10.1080/10705511.2016.1221313
49. Raju NS, van der Linden WJ, & Fleer PF. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19(4), 353–368. doi: 10.1177/014662169501900405
50. Schwarz G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. doi: 10.1214/aos/1176344136
51. Sterba SK, & Bauer DJ. (2014). Predictions of individual change recovered with latent class or random coefficient growth models. Structural Equation Modeling: A Multidisciplinary Journal, 21(3), 342–360. doi: 10.1080/10705511.2014.915189
52. Kuha J, & Moustaki I. (2015). Nonequivalence of measurement in latent variable modeling of multigroup data: A sensitivity analysis. Psychological Methods, 20(4). doi: 10.1037/met0000031
53. Swaminathan H, & Rogers HJ. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370. doi: 10.1111/j.1745-3984.1990.tb00754.x
54. Tueller SJ, Drotar S, & Lubke GH. (2011). Addressing the problem of switched class labels in latent variable mixture model simulation studies. Structural Equation Modeling: A Multidisciplinary Journal, 18(1), 110–131. doi: 10.1080/10705511.2011.534695
55. Vermunt JK. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4), 450–469. doi: 10.1093/pan/mpq025
56. Vuong QH. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57(2), 307–333. doi: 10.2307/1912557
57. Woods CM. (2008). Empirical selection of anchors for tests of differential item functioning. Applied Psychological Measurement, 33(1), 42–57. doi: 10.1177/0146621607314044
58. Woods CM. (2009). Evaluation of MIMIC-model methods for DIF testing with comparison to two-group analysis. Multivariate Behavioral Research, 44(1), 1–27. doi: 10.1080/00273170802620121
