Educational and Psychological Measurement
2020 Nov 16;81(4):668–697. doi: 10.1177/0013164420970614

A Comparison of Label Switching Algorithms in the Context of Growth Mixture Models

Kristina R. Cassiday, Youngmi Cho, Jeffrey R. Harring
PMCID: PMC8243206  PMID: 34267396

Abstract

Simulation studies involving mixture models inevitably aggregate parameter estimates and other output across numerous replications. A primary issue that arises in these methodological investigations is label switching. The current study compares several label switching corrections that are commonly used when dealing with mixture models. A growth mixture model is used in this simulation study, and the design crosses three manipulated variables—number of latent classes, latent class probabilities, and class separation, yielding a total of 18 conditions. Within each of these conditions, the accuracy of a priori identifiability constraints, a priori training of the algorithm, and four post hoc algorithms developed by Tueller et al.; Cho; Stephens; and Rodriguez and Walker are tested to determine their classification accuracy. Findings reveal that, of all a priori methods, training of the algorithm leads to the most accurate classification under all conditions. In a case where an a priori algorithm is not selected, Rodriguez and Walker’s algorithm is an excellent choice if interested specifically in aggregating class output without consideration as to whether the classes are accurately ordered. Using any of the post hoc algorithms tested yields improvement over baseline accuracy and is most effective under two-class models when class separation is high. This study found that if the class constraint algorithm was used a priori, it should be combined with a post hoc algorithm for accurate classification.

Keywords: label switching, mixture modeling, growth mixture model, simulation, training data set


Mixture models are not new. In fact, it has been over 130 years since Newcomb (1886) applied normal mixtures as models for outliers and over 120 years since Pearson (1894) wrote his classic article on the decomposition of mixtures of normal distributions by the method of moments. In Pearson’s article, he analyzed a data set with measurements from Naples crabs, including the forehead width to body length ratio. He found that, while the rest of the attributes all followed a single normal distribution, the forehead width to body length ratio measurement deviated from this pattern. Pearson hypothesized that the “asymmetry may arise from the fact that the units grouped together in the measured material are not really homogenous” (p. 72). During the 1800s, this work was all done without the aid of computers. With the rise of new technologies and advancement in computational methods, estimation of mixture models has become easier, but the models themselves have become more complex with larger data sets, more unknown parameters, and more complicated estimation algorithms.

Finite Mixture Models

Finite mixture models are regularly employed in situations in which latent heterogeneity in the population is to be investigated. In particular, it may be reasonable to assume that a sample of observations, y = (y_1, …, y_n), arises from a composite of underlying subpopulations in which class membership for each observation is unobserved and where the relative proportions of the subpopulations are unknown (Everitt & Hand, 1981; Titterington et al., 1985). As Kohli et al. (2015) point out, the central goals of many mixture analyses are to (1) decompose the sample into its mixture components and (2) estimate the mixing proportions and unknown model parameters of each component distribution. The general form of a finite mixture model follows that given by McLachlan and Peel (2000),

p(y \mid \theta) = \varphi_1 f_1(y \mid \theta_1) + \cdots + \varphi_K f_K(y \mid \theta_K),

where p(y | θ) is the composite distribution over all k = 1, …, K components; f_k(y | θ_k) is a single component distribution of the composite that depends on the component-specific parameter vector, θ_k. The parameters φ_1, …, φ_K are called the mixing proportions and are assumed to be nonnegative quantities that sum to one. Then, the likelihood function can be defined as

L(\theta \mid y) = \prod_{i=1}^{n} p(y_i \mid \theta) = \prod_{i=1}^{n} \left\{ \varphi_1 f_1(y_i \mid \theta_1) + \cdots + \varphi_K f_K(y_i \mid \theta_K) \right\},

where the parameters to be estimated are θ = (φ_1, …, φ_{K−1}, θ_1, …, θ_K).

Label Switching

Methodological investigations into estimating mixture models have revealed a number of issues that must still be resolved, including the potential of choosing a local solution over a global one as detailed by Hipp and Bauer (2006), as well as the challenge of label switching. When using maximum likelihood estimation, label switching arises when working with large simulation studies involving mixture models. It can also occur when fitting the same model to a single data set using different starting values. Label switching refers to the possible switching of class labels from replication to replication (Stephens, 2000). Since the likelihood function is the same for any permutation of the class labels, the labels of the components or classes can switch (Yao, 2015). To be specific, for any permutation λ = (λ(1), …, λ(K)) of the identity permutation (1, …, K), denote the corresponding permutation of the parameter vector θ by θ_λ = (φ_{λ(1)}, …, φ_{λ(K−1)}, θ_{λ(1)}, …, θ_{λ(K)}). Then the likelihood function L(θ_λ | y) is identical to L(θ | y) for any permutation λ; that is, L(θ | y) is invariant to permutation of the component labels of θ.
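To make this invariance concrete, the following base R sketch (our own illustration with hypothetical values, not code or data from the study) evaluates the log-likelihood of a univariate two-component normal mixture and shows that permuting the component labels leaves its value unchanged.

# Log-likelihood of a univariate K = 2 normal mixture; 'phi' are mixing
# proportions, 'mu' and 'sigma' are the component means and standard deviations.
mixture_loglik <- function(y, phi, mu, sigma) {
  comp <- sapply(seq_along(phi), function(k) phi[k] * dnorm(y, mu[k], sigma[k]))
  sum(log(rowSums(comp)))   # log L(theta | y)
}

set.seed(1)
y <- c(rnorm(300, 0, 1), rnorm(200, 3, 1))   # toy data from a 60/40 two-component mixture
mixture_loglik(y, phi = c(.6, .4), mu = c(0, 3), sigma = c(1, 1))
mixture_loglik(y, phi = c(.4, .6), mu = c(3, 0), sigma = c(1, 1))   # labels permuted: identical value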

In these situations, parameter estimates and other output are inevitably aggregated across numerous replications, and because the labels of the enumerated classes are arbitrarily assigned, there is no guarantee that the same class would have the same label from replication to replication. There are K! possible permutations of the labels, with K being the number of latent classes. As the number of classes increases, the combinations of labels to be switched increase as well. When switched labels are ignored, the aggregation of parameters across replications is untenable.

Label switching is a well-known and fundamental problem in Bayesian estimation as well (see, e.g., Papastamoulis & Iliopoulos, 2010; Rodriguez & Walker, 2014; Stephens, 2000). In the common case where the prior distribution of the model parameters is the same for all latent classes, both the likelihood and the posterior distribution are invariant to permutations of the parameters (Papastamoulis, 2014). As a consequence of this property, Markov chain Monte Carlo (MCMC) samples simulated from the posterior distribution are not identifiable. Analogously to the simulation scenario in a frequentist estimation framework, it is not at all straightforward to draw inferences for any parametric function that depends on the labels of the components. In addition, label switching in Bayesian estimation is complicated by the fact that the problem occurs both within and between Markov chains. Thus, when Bayesian inference is used for mixture models, label switching can occur for a single data set, unlike in cases using maximum likelihood estimation.

Although the issue of label switching is more complicated in Bayesian estimation schemes, frequentist mixture models also must deal with the label switching problem between replications when simulating mixture models, using bootstrapping to estimate parameters, or when employing multiple imputation when data are missing in simulation studies (D. Y. Lee, 2019).

To more clearly illustrate the label switching issue, Tueller et al. (2011) provided an easy to understand example of manual switching methods using a one-factor, two-class mixture model fit to data, which is reproduced in Table 1. In this example, the population variances were set so that Class 1 had a factor variance of 1.0, and Class 2 had a factor variance of 3.0. After simulating four data sets, three replications had incompatible labels. Not correcting for this issue yields meaningless summary statistics (see left side of Table 1). Since the correct class for each participant is known, parameter estimates in this instance could be inspected visually to determine the correct class (see right side of Table 1).

Table 1.

Consequences of Ignoring Switched Labels.

Ignored Corrected
Replication ψ1 ψ2 ψ1 ψ2
Data 1 3.1 0.9 0.9 3.1
Data 2 1.1 3.2 1.1 3.2
Data 3 2.8 1.2 1.2 2.8
Data 4 2.9 0.8 0.8 2.9
M 2.475 1.525 1.0 3.0

Note. ψ1 and ψ2 are the factor variances in Classes 1 and 2.

However, simulation studies tend to require copious replications, making manual correction of switched labels impractical. Additionally, depending on the parameter estimates used to make the classification decision, it is possible to arrive at different—and wrong—conclusions, based on incorrectly switched labels.
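As an illustration of what a simple automated correction might look like, the base R sketch below (our own illustration, not a method from the article) matches each replication's estimates in Table 1 to the known population factor variances and reproduces the contrast between the ignored and corrected summaries.

# Relabel the Table 1 replications by matching each pair of estimates to the
# known population factor variances (1.0 for Class 1, 3.0 for Class 2).
est  <- rbind(c(3.1, 0.9), c(1.1, 3.2), c(2.8, 1.2), c(2.9, 0.8))   # psi1, psi2 per replication
true <- c(1.0, 3.0)

relabel_one <- function(e, true) {
  perms <- list(c(1, 2), c(2, 1))                        # the K! = 2 possible labelings
  d <- sapply(perms, function(p) sum((e[p] - true)^2))   # squared distance to population values
  e[perms[[which.min(d)]]]
}

corrected <- t(apply(est, 1, relabel_one, true = true))
colMeans(est)        # ignoring switched labels: 2.475, 1.525 (meaningless)
colMeans(corrected)  # after correction: 1.0, 3.0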

Label Switching Corrections

One may attempt to correct the label switching problem a priori by using identifiability constraints or a training set. These options are available in commonly used statistical software such as Mplus (Muthén & Muthén, 2017), OpenMx (Neale et al., 2016), and Latent GOLD 5.0 (Vermunt & Magidson, 2013). Identifiability constraints can be placed on various parameters (e.g., intercepts, factor means) to break the symmetry in the likelihood (Diebolt & Robert, 1994). For example, if Class 1 is known to have the smallest factor mean and Class 3 has the largest factor mean, inequality constraints could be specified in the computer software’s syntax so that means of the class reflect this supposition (i.e., α1 < α2 < α3 , using structural equation modeling notation). For each replication, the specified constraint will be satisfied so that the labels are appropriate. A constraint could alternatively be placed on the class probabilities in a similar fashion. Identifiability constraints can be implemented in Mplus through use of the MODEL CONSTRAINT command (see, e.g., Jasra et al., 2005, for use in a Bayesian framework and Li et al., 2014, for advice about starting values, convergence criteria, and other inputs under which growth mixture model (GMM) parameters might be successfully estimated in the software).

One may also establish latent class membership of a small number of cases by use of a training set. When using the TRAINING command in Mplus, variables containing information about latent class membership are identified and corresponding participants (cases) are assigned values ahead of time (Muthén & Muthén, 2017). In Latent GOLD (Vermunt & Magidson, 2013) and OpenMx (Neale et al., 2016), the KNOWN CLASS option can be used to incorporate training data. In the same spirit as the use of training data in supervised machine learning algorithms, directly assigning some participants to membership in the correct latent class can make it easier for the program to identify cases with characteristics akin to those in the training set, which should remedy incorrect labels.

One theoretical downside of imposing artificial constraints on a model a priori is that they are beyond those needed for model identification. Additionally, not all choices of constraints are guaranteed to eliminate switching, and different choices may yield different permutations of the class labels (Asparouhov & Muthén, 2010). When using these a priori options, it is also difficult to determine whether data were correctly classified, because a researcher has no knowledge of the algorithm's internal mechanisms. Given this inherent opacity, this article also considers four post hoc alternatives for correcting label switching: two engineered for frequentist estimation of mixture models and two adapted from estimating mixture models within a Bayesian framework using an MCMC algorithm.

Tueller et al. (2011) devised an algorithm 1 that automatically detects switched class labels and provides information so that labels can be corrected as a byproduct of estimating the model in Mplus using the expectation-maximization (EM) algorithm. The algorithm is not model-specific, but it needs a class assignment matrix for each replication. This class assignment matrix is a K×K matrix where the columns represent true class labels and the rows represent assigned class labels. The left half of Table 2 illustrates a class assignment matrix that would easily be corrected with the Tueller et al. (2011) algorithm. In this case, there is only one maximum value in each column, and classification accuracy is much greater than chance. The algorithm keeps the column order fixed and only moves rows so that the larger numbers fall on the diagonal. In this example, rows 1 and 2 would be switched. The algorithm can handle cases where the proportion of correct cases assigned exceeds 1/K in K−1 classes. When classification accuracy is not above chance, however, it is possible that the algorithm will not correctly label the classes. In Table 3, for example, it is not clear what the maximum values in each column are. Even though the largest numbers are on the diagonals, the off-diagonal values are so similar that one cannot be confident that the algorithm correctly relabeled the classes.

Table 2.

Class Assignment Matrix.

Original Label True 1 True 2 True 3 New Label True 1 True 2 True 3
Assignment 1 4 147 2 Assignment 1 142 2 3
Assignment 2 142 2 3 Assignment 2 4 147 2
Assignment 3 4 1 145 Assignment 3 4 1 145

Table 3.

Example of a Spurious Class Correction Using Tueller et al.'s Algorithm.

Original Label True 1 True 2 True 3 New Label True 1 True 2 True 3

Apart from classification accuracy, the algorithm requires that classes do not collapse into other classes. Collapsing is said to happen when most of the participants in a class have been incorrectly assigned to one or more other classes (Tueller et al., 2011). This results in a class assignment matrix in which a single row contains the maximum value of more than one column. When this occurs, the values cannot be switched so that the largest values fall on the diagonal. The example in Table 4 shows a class assignment matrix where Class 2 has collapsed into Class 1. In cases like these, the Tueller et al. (2011) algorithm does not relabel.

Table 4.

Example of Collapsed Classes That Tueller et al.'s Algorithm Cannot Relabel.

Label True 1 True 2 True 3
Assignment 1 132 146 2
Assignment 2 15 3 5
Assignment 3 3 1 143

Although the requirements of high classification accuracy and no collapsed classes can be considered limitations, the Tueller et al. (2011) algorithm is advantageous in that the process is comparably more transparent than the use of either of the a priori methods. A researcher does not have to be concerned that different choices of constraints could yield different permutations of the class labels. Additionally, any software choice that provides posterior probabilities and classification based on likeliest latent class (e.g., Latent Gold, Mx, SAS) can be used to obtain class assignment matrices and switch incorrect labels.
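The underlying idea can be sketched in a few lines of base R (our own illustration, not Tueller et al.'s actual LBLinfo.R script): search the K! row permutations of the class assignment matrix for the one that maximizes the diagonal, and flag possible collapsing when one row holds the maximum of more than one column.

# Relabel a K x K class assignment matrix (rows = assigned labels, columns = true labels)
# by finding the row permutation that puts the largest counts on the diagonal.
all_perms <- function(k) {
  if (k == 1) return(matrix(1, 1, 1))
  sub <- all_perms(k - 1)
  do.call(rbind, lapply(1:k, function(i)
    cbind(i, matrix((1:k)[-i][sub], nrow(sub)))))
}

relabel_assignment <- function(A) {
  K <- nrow(A)
  P <- all_perms(K)
  diag_sums <- apply(P, 1, function(p) sum(diag(A[p, , drop = FALSE])))
  best <- P[which.max(diag_sums), ]
  # crude collapsing check: one row holds the maximum of more than one column
  collapsed <- any(table(apply(A, 2, which.max)) > 1)
  list(permutation = best, relabeled = A[best, ], collapsed = collapsed)
}

A <- rbind(c(  4, 147,   2),   # class assignment matrix from Table 2 (left panel)
           c(142,   2,   3),
           c(  4,   1, 145))
relabel_assignment(A)          # rows 1 and 2 are switched; collapsing is not flagged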

In her dissertation on respondent endorsing styles, Cho (2013) used the Tueller et al. (2011) algorithm in conjunction with one she developed for her study. Her research employed a mixture item response theory (IRT) model in an attempt to examine the accuracy of participant classification into one of four different response styles. Cho's algorithm 2 exploited the distinctive order of the thresholds that characterized each of her response styles. To provide a simple example of how her algorithm works, consider a two-class mixture model with eight items, five response categories, and a sample size of 1,000. For each replication, the mean values of the four thresholds are calculated over all eight items. One replication with the means of the thresholds for the two classes is shown in Table 5.

Table 5.

Example of Means of Thresholds for One Replication in a Mixture Item Response Theory Model.

Class 1 Class 2
Replication δ1 δ2 δ3 δ4 δ1 δ2 δ3 δ4
Mean 0.73 0.40 −0.42 −0.76 −1.25 −0.25 0.35 1.20

If the researcher knows that latent Class A has ordered thresholds where δ1 < δ2 < δ3 < δ4 and latent Class B has thresholds where δ1 > 0 and δ4 < 0, she can put this information into the algorithm. In doing so, the algorithm will relabel Class 1 in Table 5 as Class B and Class 2 as Class A. For this algorithm to work properly, it is necessary that the classes have unique orderings of parameter values and that the order does not change across replications. While Cho (2013) used a mixture IRT model and looked at item thresholds, her algorithm could be extended to other mixture models with a different choice of parameters (e.g., factor means).
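This logic can be sketched in base R (an illustration of the idea only, not Cho's original code), using the threshold means from Table 5 and the two ordering rules just described.

# Assign labels from the known ordering of the threshold means in each estimated class.
thresholds <- rbind(c( 0.73,  0.40, -0.42, -0.76),   # estimated Class 1 threshold means (Table 5)
                    c(-1.25, -0.25,  0.35,  1.20))   # estimated Class 2 threshold means (Table 5)

label_class <- function(d) {
  if (all(diff(d) > 0)) return("A")       # Class A: delta1 < delta2 < delta3 < delta4
  if (d[1] > 0 && d[4] < 0) return("B")   # Class B: delta1 > 0 and delta4 < 0
  NA_character_                           # ordering pattern not recognized
}

apply(thresholds, 1, label_class)   # "B" "A": estimated Class 1 -> B, Class 2 -> A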

Although many label switching algorithms exist to address this issue by working with successive MCMC draws within a Bayesian analysis, the label switching methods proposed by Stephens (2000) and Papastamoulis and Iliopoulos (2010) 3 can also be used across simulation replications, making them flexible options that translate to the frequentist context. The main idea behind Stephens's algorithm is to make the permuted latent classes across replications agree on the person × latent class matrix of classification (posterior) probabilities. Stephens proposed iteratively minimizing the Kullback–Leibler divergence between each replication's matrix of classification probabilities and a matrix of classification probabilities averaged across replications. Papastamoulis (2016) reported that Stephens's algorithm was very efficient in terms of finding the correct relabeling, but requires extensive storage of the posterior probabilities—especially as the number of latent classes and replications increase.

At the heart of the equivalence class representatives (ECR; Papastamoulis & Iliopoulos, 2010) algorithm is the notion of equivalent allocation vectors. Two allocation vectors are called equivalent if the first arises from the second by simply permuting its labels. The ECR algorithm partitions the set of allocation vectors into equivalence classes and selects a representative from each class. Then, for a particular replication, the algorithm finds the permutation that reorders the corresponding allocation vector so that it matches the representative of its class. Here, we implement an iterative version of the ECR algorithm (Rodriguez & Walker, 2014) that, like Stephens's algorithm, requires knowledge of the classification probabilities across replications within a cell of the simulation. Papastamoulis (2016) described this algorithm as an allocation-vectors version of Stephens's algorithm, yet notes that it is more computationally efficient than the iterative Kullback–Leibler distance algorithm by Stephens.
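The core ECR matching step can be sketched as follows in base R (an illustration of the idea, not the implementation in the label.switching package): for a given replication, find the label permutation whose relabeled allocation vector agrees most with a pivot allocation.

all_perms <- function(k) {   # same helper as in the earlier relabeling sketch
  if (k == 1) return(matrix(1, 1, 1))
  sub <- all_perms(k - 1)
  do.call(rbind, lapply(1:k, function(i)
    cbind(i, matrix((1:k)[-i][sub], nrow(sub)))))
}

match_to_pivot <- function(z, pivot, K) {
  P <- all_perms(K)
  agreement <- apply(P, 1, function(p) sum(p[z] == pivot))   # agreement with the pivot allocation
  p_best <- P[which.max(agreement), ]
  list(permutation = p_best, relabeled = p_best[z])
}

pivot <- c(1, 1, 2, 2, 3, 3, 3)   # pivot allocation vector (e.g., from a reference replication)
z_rep <- c(2, 2, 1, 1, 3, 3, 3)   # a replication whose labels 1 and 2 are switched
match_to_pivot(z_rep, pivot, K = 3)$relabeled   # 1 1 2 2 3 3 3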

The four post hoc algorithms use different sources of information, and it is possible that in some cases, one algorithm will detect switched labels when the others do not (see Cho (2013) for an example of this situation). The difference in the algorithms’ performance is the catalyst for the current research, which seeks to determine best methods for detecting label switching under various conditions.

Primary Research Question and Hypotheses

The questions that the current research seeks to answer include

  1. What is the classification accuracy of each of the different algorithms/combinations of these algorithms under the various conditions?

  2. Under what model conditions do the algorithms have the highest accuracy?

Classification accuracy will be calculated in two different ways:

  1. A percentage of correctly labeled data sets out of the total number of converged data sets will be calculated for each condition so the percentages can be compared across the different algorithms. For each condition, a baseline percentage of correctly labeled data sets will be calculated as well since it is expected that some of the data sets will have been correctly labeled without the need for an algorithm.

  2. The percentage of correctly labeled individuals within each of the converged data sets will also be calculated for the baseline conditions, as well as for each of the a priori algorithms.

We hypothesize that a combination of algorithms would provide superior classification accuracy compared to solo algorithms because it is assumed that an additional post hoc algorithm would be able to correct additional labels. For conditions where class separation is larger, we expect that all of the algorithms will perform better than in conditions where the latent classes are not as well-separated.

Method

Given that none of the algorithms require a specific type of mixture model to be effectively implemented, the current study uses a linear latent GMM (see Enders & Tofighi, 2008, for an in-depth explanation of this model). GMMs are a direct extension of the latent growth model (McArdle & Epstein, 1987; Meredith & Tisak, 1990), which resembles a restricted confirmatory factor analytic model with structured mean vector and covariance matrix. The latter structure is decomposed into between-individual variability via the inclusion of random effects and within-individual variability, allowing an individual’s trajectory to not fit their repeated measure data exactly. Borrowing notation from McNeish and Harring (2017), the linear GMM can be written as

y_i = \Lambda \eta_i + \varepsilon_i, \qquad \eta_i = \alpha_k + \zeta_i,

where, for the ith subject in the kth class (k = 1, …, K), y_i is the set of n repeated measures, Λ is a fixed matrix of factor loadings, η_i is a vector of linear growth factors (i.e., intercept and slope), and ε_i is a vector of residuals with ε_i|k ~ N(0, Θ_ik). The linear growth factors can be decomposed into a vector of class-specific means, α_k, and random effects, ζ_i, where ζ_i|k ~ N(0, Ψ_k).

In GMMs, each observation receives a probability of membership in each of the estimated latent classes. Assuming multivariate normality, the composite density of a vector of continuous outcome variables for the ith individual, yi , can be written as

f(y_i \mid \varphi, \mu_i, \Sigma_i) = \sum_{k=1}^{K} \varphi_k f_k(y_i \mid \mu_{ik}, \Sigma_{ik}),

where K is the number of latent classes, typically specified by the researcher; f_k is the component density for the kth class; μ_ik = E(y_i) = Λα_k is the model-implied mean vector for the kth class; Σ_ik = var(y_i) = ΛΨ_kΛ′ + Θ_ik is the model-implied covariance matrix for the kth class; and φ_k is the mixing proportion for the kth class, where 0 ≤ φ_k ≤ 1 and φ_K = 1 − ∑_{h=1}^{K−1} φ_h.
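As a worked example of these model-implied moments, the base R sketch below computes μ_k and Σ_k for one class, using the population values from the simulation described in the Data Generation section (Class 1 under the MD = 1.0 condition); it is an illustration only, not code from the study.

# Model-implied mean vector and covariance matrix for one latent class of the linear GMM.
Lambda <- cbind(1, 0:4)                          # loadings for intercept and linear slope
Psi    <- matrix(c(1.5, 0.2, 0.2, 0.4), 2, 2)    # growth factor covariance matrix
Theta  <- diag(0.75, 5)                          # residual variances
alpha1 <- c(1.23, 1.00)                          # Class 1 intercept and slope means

mu1    <- Lambda %*% alpha1                      # E(y_i | class 1)
Sigma1 <- Lambda %*% Psi %*% t(Lambda) + Theta   # var(y_i | class 1)
mu1
Sigma1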

Estimation of GMMs

GMM parameters to be estimated include the growth factor means, α_k; the growth factor variances and covariances in Ψ_k; the residual variances in Θ_k; and the latent class proportions, φ. A standard frequentist method of estimating parameters in these models is maximum likelihood via the EM algorithm (Dempster et al., 1977; McLachlan & Krishnan, 1997). In this iterative optimization procedure, the E-step consists of taking the expectation of the complete-data loglikelihood function, which translates into computing the posterior probability of belonging to each latent class for each individual, given the parameter estimates at the current iteration. In the M-step, the conditional expected loglikelihood function (i.e., the expectation taken in the E-step) is maximized to obtain an updated set of GMM parameters. The algorithm toggles back and forth between these two steps until some convergence criterion is satisfied. These longitudinal mixture models can also be estimated in a Bayesian framework (see, e.g., Depaoli, 2013; S. Y. Lee, 2007; Wang & McArdle, 2008) using an MCMC algorithm such as the Gibbs sampler (Casella & George, 1992; Gelfand & Smith, 1990; Kohli et al., 2015).
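A minimal sketch of the E-step computation for a multivariate normal mixture is shown below (illustrative only, assuming the mvtnorm package for the multivariate normal density; the toy data and parameter values are hypothetical).

library(mvtnorm)   # assumed available for dmvnorm() and rmvnorm()

# Posterior probability that individual i belongs to class k, given current parameter values.
e_step <- function(Y, phi, mu_list, Sigma_list) {
  K <- length(phi)
  dens <- sapply(1:K, function(k)
    phi[k] * dmvnorm(Y, mean = mu_list[[k]], sigma = Sigma_list[[k]]))
  dens / rowSums(dens)   # n x K matrix of posterior class probabilities
}

set.seed(11)
Y <- rmvnorm(10, mean = rep(0, 5), sigma = diag(5))   # toy data: 10 cases, 5 time points
round(e_step(Y, phi = c(.5, .5),
             mu_list = list(rep(0, 5), rep(1, 5)),
             Sigma_list = list(diag(5), diag(5))), 3)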

Data Generation

Population values used for the simulation resemble those used in a number of other simulation studies involving GMM (e.g., Enders & Tofighi, 2008; Li et al., 2014; Nylund et al., 2007; Nylund-Gibson & Masyn, 2016; Peugh & Fan, 2012, 2015; Tolvanen, 2008). Five repeated measures were created for each subject with the following growth modeling components:

\Lambda = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \quad \Theta_{ik} = \Theta_i = \mathrm{diag}(0.75, 0.75, 0.75, 0.75, 0.75), \quad \text{and} \quad \Psi_k = \Psi = \begin{bmatrix} 1.5 & 0.2 \\ 0.2 & 0.4 \end{bmatrix}.

The sample size was fixed at 900 under all conditions, regardless of the number of latent classes. The design of this simulation study crosses three manipulated variables in a 3 (number of latent classes) × 2 (latent class probabilities) × 3 (class separation) design, yielding a total of 18 conditions. Mplus (Muthén & Muthén, 2017) was used to generate 500 data sets for each of the conditions. The number of latent classes is either two, three, or four. As mentioned above, the more latent classes, the more possible permutations of the class labels. The class probabilities are manipulated in the current study because it is expected that it will be difficult for some algorithms requiring high classification accuracy to properly classify participants when the class proportions are equal. Class probabilities are either equal or unequal (with a 70/30 split in a two-class model, a 50/25/25 split in a three-class model, and a 35/25/20/20 split in a four-class model). Class separation in this study refers to the distance between the distributions for the latent growth factors. One way to calculate this distance is by using Mahalanobis distance (MD), which is calculated using the formula

\mathrm{MD} = \Delta = \sqrt{(\alpha_1 - \alpha_2)^{T} \Psi^{-1} (\alpha_1 - \alpha_2)},

where Ψ⁻¹ represents the inverse of the covariance matrix of the intercept and slope random effects, which remains fixed at [1.5, 0.2; 0.2, 0.4] in the current study. The components of α_2 are fixed at 0.25 and 0.5, and the slope of α_1 is fixed at 1, while the intercept of α_1 is computed at different values conforming to the number of classes and the MD values of 1.0, 1.5, and 2.0, which denote poor to high levels of separation. These values were based on those chosen by Depaoli (2013), who focused on class separation in a GMM. When there were three classes, the values for the first two classes were calculated and the values for the second class were then used to calculate values for the third class. For the four-class models, the first two classes were calculated, the values for the third class were determined using the values from the second class, and the fourth class was determined using the values of the third class. See Table 6 for a list of the 18 condition combinations.
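As a quick check on this formula, the short base R sketch below computes the MD between the Class 1 and Class 2 growth factor means for the MD = 1.0 conditions in Table 6 (a worked example, not code from the study).

# Mahalanobis distance between the Class 1 and Class 2 growth factor means (MD = 1.0 condition).
Psi    <- matrix(c(1.5, 0.2, 0.2, 0.4), 2, 2)
alpha1 <- c(1.23, 1.00)   # Class 1 intercept and slope means
alpha2 <- c(0.25, 0.50)   # Class 2 intercept and slope means

d  <- alpha1 - alpha2
MD <- sqrt(drop(t(d) %*% solve(Psi) %*% d))
MD                        # approximately 1.0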

Table 6.

Condition Combinations.

Condition No. of classes Class probabilities Class separation Class number Class intercept Class slope
Condition 1 2 50/50 1.0 1 1.23 1.00
2 0.25 0.50
Condition 2 2 70/30 1.0 1 1.23 1.00
2 0.25 0.50
Condition 3 3 33/33/33 1.0 1 1.23 1.00
2 0.25 0.50
3 −0.97 0.25
Condition 4 3 50/25/25 1.0 1 1.23 1.00
2 0.25 0.50
3 −0.97 0.25
Condition 5 2 50/50 1.5 1 2.01 1.00
2 0.25 0.50
Condition 6 2 70/30 1.5 1 2.01 1.00
2 0.25 0.50
Condition 7 3 33/33/33 1.5 1 2.01 1.00
2 0.25 0.50
3 −1.59 0.25
Condition 8 3 50/25/25 1.5 1 2.01 1.00
2 0.25 0.50
3 −1.59 0.25
Condition 9 2 50/50 2.0 1 2.68 1.00
2 0.25 0.50
Condition 10 2 70/30 2.0 1 2.68 1.00
2 0.25 0.50
Condition 11 3 33/33/33 2.0 1 2.68 1.00
2 0.25 0.50
3 −2.20 0.25
Condition 12 3 50/25/25 2.0 1 2.68 1.00
2 0.25 0.50
3 −2.20 0.25
Condition 13 4 25/25/25/25 1.0 1 1.23 1.00
2 0.25 0.50
3 −0.97 0.25
4 0.01 0.75
Condition 14 4 35/25/20/20 1.0 1 1.23 1.00
2 0.25 0.50
3 −0.97 0.25
4 0.01 0.75
Condition 15 4 25/25/25/25 1.5 1 2.01 1.00
2 0.25 0.50
3 −1.59 0.25
4 0.17 0.75
Condition 16 4 35/25/20/20 1.5 1 2.01 1.00
2 0.25 0.50
3 −1.59 0.25
4 0.17 0.75
Condition 17 4 25/25/25/25 2.0 1 2.68 1.00
2 0.25 0.50
3 −2.20 0.25
4 0.23 0.75
Condition 18 4 35/25/20/20 2.0 1 2.68 1.00
2 0.25 0.50
3 −2.20 0.25
4 0.23 0.75

For each of these 18 condition combinations, different label switching corrections were tested as shown in Table 7, for a total of 342 unique combinations. It is important to note that the identifiability constraints in Mplus (Muthén & Muthén, 2017) were used in two ways: constraining the factor mean values and constraining the class probabilities. Mplus code for the identifiability constraints and the training set is provided in the appendix.

Table 7.

Label Switching Algorithm Combinations Tested in the Study.

Combination number
Algorithm Name 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Mean constraints
Class probability constraints
Training set
Tueller et al.
Cho
Stephens
Equivalence class representative

Note. All 19 of these combinations are examined for each of the 18 models.

Accuracy of the label switching methods was then calculated to determine which algorithm or combination of algorithms is best to use under each condition. Although 500 data sets were generated for each of the 18 conditions, the algorithms did not always converge, so the total number of cases is not always 500. To provide a fair comparison, within each of the 18 conditions, the classification rate for each algorithm or combination of algorithms was compared to a baseline classification rate where no correction is implemented. Examining classification rates within each cell helps to determine which algorithm functioned best under those particular conditions. Classification rates are also compared across conditions to see whether certain algorithms do well regardless of the condition and to see which conditions are the most challenging in terms of providing accurate class labels.

The number of correctly classified cases out of the total number of cases was determined for the a priori algorithms through the use of Tueller et al.'s (2011) LBLinfo.R script, which gave a summary of the number of data sets that were properly labeled. Since the post hoc algorithms are applied manually, their classification rates were calculated in a postprocessing step, simply by adding up the number of data sets that the algorithm correctly labeled out of the total number of data sets. More specifically, the LBLinfo.R script provided the classification rate in the summary statistics for Tueller et al.'s algorithm. After Cho's algorithm was run, the LBLinfo.R script was run again to obtain the same summary statistics for comparison. This script could not be run for Stephens's or the ECR algorithm, so a few lines of code were written that considered a data set correctly classified if the order of the classes matched the true classes that the authors designated during the data simulation process.
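A minimal base R sketch of this postprocessing count (our own illustration of the logic described above, not the actual study code, with made-up replication orders) is given below.

# A replication is "correctly labeled" when its relabeled class order matches the true order.
true_order <- 1:3
rep_orders <- list(c(1, 2, 3), c(2, 1, 3), c(1, 2, 3), c(1, 2, 3), c(3, 2, 1))   # five converged replications

correct <- sapply(rep_orders, function(o) all(o == true_order))
100 * sum(correct) / length(correct)   # percentage of correctly labeled data sets (here 60%)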

In addition to the accuracy of the algorithms in classifying data sets, classification of the individuals within a data set was also calculated for the a priori methods by comparing the expected label for each individual and the label given under the various algorithms. After determining the number of individuals correctly classified within all data sets, the average percentage was determined and used as a comparison.

Simulation Results

Once the conditions were generated and the a priori algorithms were run, R (R Development Core Team, 2017) was used to review the analyses and determine the convergence rates and the classification rates. For all two-class models, all data sets for all conditions converged. For the three-class models, all baseline and training set conditions converged. The majority of data sets converged under the mean and class constraints algorithms for each condition. For the four-class models, all training set conditions converged, as did most baseline conditions. Details can be found in Table 8.

Table 8.

Convergence Rates.

Condition Baseline (%) Class constraints (%) Mean constraints (%) Training set (%)
Condition 1 100 100 100 100
Condition 2 100 100 100 100
Condition 3 100 97.8 99 100
Condition 4 100 99 99.8 100
Condition 5 100 100 100 100
Condition 6 100 100 100 100
Condition 7 100 100 99.6 100
Condition 8 100 97.8 99.6 100
Condition 9 100 100 100 100
Condition 10 100 100 100 100
Condition 11 100 100 100 100
Condition 12 100 99.4 100 100
Condition 13 99.8 82.6 90.8 100
Condition 14 100 84.8 90.6 100
Condition 15 99.8 89 89.8 100
Condition 16 100 93.6 93.2 100
Condition 17 100 92.6 95.2 100
Condition 18 100 95.8 95 100

Classification accuracy was calculated in two different ways. For all algorithms, overall accuracy was calculated by counting the number of data sets with correctly ordered labels and those with incorrectly ordered labels. The percentage accuracy is (x/n) × 100, where x is the number of correctly labeled data sets and n is the total number of data sets that converged in that condition. The LBLinfo.R script (Tueller et al., 2011) was used to count the number of data sets that had correctly ordered labels and those that had incorrect labels for all a priori conditions and conditions where Tueller's algorithm was used. Accuracy for Cho's and Stephens's algorithms, and for combinations involving them, was calculated in the same way as for Tueller's: a correctly classified data set is one in which the algorithm assigns the order of the classes in the same way as they were set up in the simulation process. A baseline accuracy percentage was also calculated since some of the data sets will be correctly labeled at the start, before any algorithms are employed. A visual summary of the overall classification accuracy for each algorithm and combination of algorithms under each condition can be found in Figure 1a-1f.

Figure 1.


(a) Two-class model classification accuracy for single algorithms. (b) Two-class model classification accuracy for combinations of algorithms. (c) Three-class model classification accuracy for single algorithms. (d) Three-class model classification accuracy for combinations of algorithms. (e) Four-class model classification accuracy for single algorithms. (f) Four-class model classification accuracy for combinations of algorithms.

The second way that classification accuracy was calculated was the percentage of correctly labeled individuals within each of the converged data sets. This calculation applies specifically to the a priori algorithms. To calculate classification accuracy in this way, the correct label was indicated within each data set. In the Mplus (Muthén & Muthén, 2017) .pro files, the assigned labels were given for each individual within each data set. For each a priori algorithm, an average classification accuracy was calculated so that the percentage of correct labels was calculated over all individuals over all data sets within each condition. A baseline was also calculated for comparison purposes. A visual summary for the baseline and a priori algorithms percent accuracy under each condition can be found in Figure 2a-2c.

Figure 2.


(a) Two-class model average classification accuracy for individuals by condition and algorithm. (b) Three-class model average classification accuracy for individuals by condition and algorithm. (c) Four-class model average classification accuracy for individuals by condition and algorithm.

When reviewing the overall classification accuracy for all algorithms in Figures 1a-1f, the algorithms performed better for the two-class models in general. This finding is unsurprising, given that the number of possible label permutations increases as the number of classes increases, making accurate assignment less likely for three- and four-class models compared to two-class models. When class separation was greatest, the algorithms were also more accurate, which is to be expected because a larger amount of separation between classes makes detection easier. Whether the class probabilities were equal or unequal did not appear to be as large a factor in classification accuracy as class separation. It is important to note that Rodriguez and Walker's (2014) algorithm either correctly ordered the labels for all data sets or incorrectly ordered the labels for all data sets. No pattern emerged to indicate that the algorithm correctly ordered data sets under particular conditions. Since we are determining accuracy based on the percentage of correctly ordered data sets, the algorithm appears to correctly classify data sets 0% or 100% of the time. Nevertheless, this algorithm is quite powerful if a researcher is looking to aggregate class output and is unconcerned with the order of the labels; in this case, the algorithm consistently performed better than all the other algorithms tested.

When examining the performance of single algorithms under the two-class models, all the algorithms accurately classified labels when there was a large amount of class separation and unequal proportions, except for Stephens’s algorithm, which classified only 72% of data sets and Rodriguez and Walker’s algorithm, which switched the order of all labels. When there was a large amount of class separation and equal proportions, the class constraints algorithm, Stephens’s algorithm, and Rodriguez and Walker’s algorithm did not accurately classify over 95% of the data sets. In fact, the class constraints algorithm actually performed worse than baseline in this case, only correctly classifying 47% of data sets compared to the baseline accuracy of 72%. Stephens’s algorithm performed close to baseline, only accurately classifying 73% of the data sets. Rodriguez and Walker’s algorithm switched the order of all the labels. When there was minimal class separation, the training set algorithm was the most accurate. The training set algorithm correctly classified 93% of data sets when the proportions were equal and 94% when proportions were unequal.

Combining the class constraints algorithm with Stephens's algorithm resulted in a 100% classification rate for the data sets across all conditions. However, Tueller et al.'s and Cho's algorithms are also quite effective under the two-class models when proportions were equal and class separation (MD) was equal to or greater than 1.5. When MD was equal to 1.5, combining the class constraints algorithm with either Tueller et al.'s or Cho's algorithm improved accuracy from 37% to 75% when proportions were equal, and Stephens's algorithm was able to classify 100% of the data sets. When MD was equal to 2, combining the class constraints algorithm with any of the post hoc algorithms increased accuracy from 47% to over 95% when proportions were equal. Combining Tueller et al.'s or Cho's algorithm with the class constraints algorithm did not improve classification accuracy when proportions were unequal. Stephens's algorithm was also the only post hoc algorithm used in conjunction with the mean constraints algorithm that resulted in improved accuracy for all two-class conditions; however, the classification rates were only 51% (equal class proportions) and 75% (unequal class proportions) when class separation was equal to 1.0. When paired with the mean constraints algorithm, Rodriguez and Walker's (2014) algorithm correctly ordered 100% of the data sets where class proportions were unequal and separation was small or MD was equal to 1.5. The training set algorithm was very effective across all conditions, but combining it with Stephens's algorithm when class separation was small resulted in 100% of data sets being correctly classified.

Under the three-class models, the training set algorithm performed well across the board. Accuracy was 82% when class separation was smallest and class proportions were unequal, and 98% when class separation was smallest and class proportions were equal. Under all other conditions, the training set algorithm correctly classified 100% of the data sets. Rodriguez and Walker’s (2014) algorithm also performed well, correctly classifying 100% of the data sets for all conditions, except when MD was equal to 1.5 and class proportions were unequal. As with the two class models, the class constraints algorithm did not perform well. In fact, it performed worse than baseline when class separation was largest and proportions were equal, just like for the two-class models. While baseline accuracy was 32% under this condition, the class constraints algorithm had only 19% accuracy. When MD was equal to 1 or 1.5, the only algorithms that performed well were the training set and Rodriguez and Walker’s (2014) algorithm. The other tested algorithms were not reliable across these conditions. The class constraints algorithm accuracy ranged from 14% to 35% and the mean constraints algorithm ranged from 6% when class separation was smallest and proportions were unequal to 54% when class separation was 1.5 and proportions were equal. Apart from Rodriguez and Walker’s (2014) algorithm, the post hoc algorithms also struggled to accurately classify data sets when class separation was smallest and proportions were unequal (3% under Cho’s algorithm, 7% under Tueller et al.’s, and 26% under Stephens’s). The mean constraints and post hoc algorithms performed better when class separation was high, but their performance was not accurate enough to recommend their use under these conditions.

The only post hoc algorithm that consistently improved the mean constraints and training set algorithms was Stephens’s, although Rodriguez and Walker’s algorithm was very effective when class separation was low. Combining Stephens’s algorithm with the mean constraints algorithm was also particularly helpful when class separation was low, with increases in accuracy from 34% to 69% when class proportions were equal and 6% to 72% when class proportions were unequal. While performance was not accurate enough to recommend their use when other options are available, Stephens’s algorithm is a good post hoc choice to correct labels if mean constraints were used a priori. As with the two-class models, the training set was effective across all conditions, classifying 100% of data sets accurately when class separation was equal to 1.5 or 2. Combining Stephens’s algorithm with the training set algorithm when class separation was equal to 1.0 resulted in a jump from 82% to 100% of data sets being correctly classified when class proportions were unequal and an increase from 98% to 100% of data sets being correctly classified when class proportions were equal.

Tueller et al.’s and Cho’s algorithms consistently improved the classification accuracy of the class constraints algorithm. Tueller et al.’s (2011) algorithm combined with the class constraints algorithm more accurately classified data sets than when the class constraints algorithm was combined with Cho’s (2013) algorithm. In fact, when combining the class constraints algorithm with Tueller et al.’s (2011) algorithm, classification of data sets jumped from 23% to 96% when class separation was equal to 1.5 and class proportions were equal and from 19% to 100% when class separation was the largest and class proportions were equal. Using Cho’s (2013) algorithm, under the same conditions, the percent accuracy increased from 23% to 60% and 19% to 57%, respectively. It is, therefore, recommended that the class constraints algorithm be combined with Tueller et al.’s (2011) algorithm for more accurate results. As with the other conditions, Rodriguez and Walker’s (2014) algorithm either accurately ordered the data sets 100% of the time or not at all. When combined with the class constraints, the algorithm performed accurately when class separation was highest, as well as when class separation was lowest and class proportions were equal.

Under the four-class models, the only algorithm that was effective on its own in classifying data sets was the training set. Classification accuracy ranged from 74% of data sets when class separation was smallest and proportions were unequal to 92% and 93% when class separation was larger and proportions were equal. All the other algorithms failed to classify more than 20% of data sets correctly in any condition, except for Rodriguez and Walker's algorithm, which performed well when MD was equal to 1.5 and class proportions were equal. The mean constraints algorithm was not effective in classifying data sets, even when combined with a post hoc algorithm. Of the four post hoc algorithms, Tueller et al.'s was the most effective in classifying data sets when combined with the class constraints, particularly when class proportions were equal. When class separation was highest and class proportions were equal, classification accuracy improved from 4% using the class constraints algorithm alone to 76% when it was combined with Tueller et al.'s algorithm. Despite this marked improvement, the class constraints and mean constraints did not perform well under the four-class models, even when combined with post hoc algorithms. While the training set algorithm was effective in classifying data sets on its own, combining it with Stephens's or Rodriguez and Walker's algorithm resulted in 100% data set classification accuracy across all conditions.

In summary, the training set was the most accurate across two-, three-, and four-class models when considering overall classification accuracy. Classification accuracy of the mean constraints algorithm was improved by combining it with Stephens's algorithm, but mean constraints do not appear to be an effective option compared to other available ones, unless class separation is high and there are only two or three classes. Overall, the class constraints algorithm was not a reliable solution to the label switching problem; however, combining the class constraints algorithm with a post hoc algorithm did improve classification accuracy. Rodriguez and Walker's (2014) algorithm either correctly ordered all data sets in a condition or did not correctly order any. While the algorithm is inconsistent in determining the correct order, a researcher interested in aggregating the results without an interest in whether the order is "correct" would be well advised to use it under all conditions tested. However, given its inconsistency in accurate ordering, a summary of its performance under the various models is not described below.

In the case of two-class models, combining the class constraints algorithm with Stephens’s algorithm improved performance substantially, while performance was much improved when adding Tueller et al.’s (2011) algorithm to the class constraints algorithm under the three-class models. Under the four-class models, Tueller et al.’s algorithm performed better than the other post hoc algorithms when combined with the class constraints, but not well enough to be recommended if other options are available. If a researcher is able to use the training set algorithm, it is the best choice under all conditions. It is also the only reliable algorithm when class separation is very small. If an a priori algorithm was not used, a post hoc algorithm does improve baseline accuracy. However, the post hoc algorithms only perform well on their own under two-class models. If a two-class model is used with little separation between classes, Stephens’s algorithm is the best choice of the four post hoc algorithms, while Tueller et al.’s and Cho’s algorithms are a better choice when class separation is high.

Examining Figure 2a-2c, which highlights the average percentage of correctly labeled individuals within each converged data set, shows that the training set classifies most accurately among the a priori methods. When class proportions are equal, the class constraints algorithm has a lower classification accuracy of individuals within each data set than the baseline. For example, when class separation was highest and class proportions were equal, the baseline accuracy for the two-class models was 65%, while accuracy dropped to 49% using the class constraints algorithm. Under the same conditions for a three-class model, the baseline accuracy was 50%, which dropped to 36% using the class constraints algorithm, and under a four-class model, the baseline accuracy dropped from 31% to 26%. This finding is similar to what was found when looking at overall data set accuracy.

Under the two-class models, when class proportions were equal, the class constraints, mean constraints, and training set were relatively similar with regard to accuracy. When class separation was smallest, accuracy ranged from 63% to 65%. When class separation was at 1.5, accuracy ranged from 75% to 77%. When class separation was largest, accuracy ranged from 83% to 84%. This finding is interesting because the training set did a much better job classifying data sets, despite it doing a similar job to the other algorithms classifying individuals within the data sets.

As with overall accuracy, under the three-class models, the training set algorithm does a much better job than the other algorithms classifying individuals. The training set algorithm correctly classifies 58% to 60% of individuals when class separation is lowest, 68% to 70% when class separation is equal to 1.5, and 76% to 79% when class separation is greatest. The mean constraints algorithm performs well when class separation is highest, with accuracy of 69% when class proportions are equal and 71% when they are unequal. When looking at individual accuracy, the results are similar to the overall data set accuracy findings, with the training set algorithm performing better overall compared to the other algorithms.

Under the four-class models, the training set algorithm does the best job of classifying individuals, correctly classifying 47% to 48% of individuals when class separation is lowest, 54% to 56% when class separation is equal to 1.5, and 60% to 62% when class separation is largest. The mean constraints algorithm performs better than the class constraints algorithm, ranging from classification accuracy of 34% to 50% when labeling individuals. As with the three-class models, the a priori algorithms that perform well when classifying overall data sets also perform well when classifying the individuals within the data sets.

Discussion

In this study, a number of algorithms and combinations were tested under two-, three-, and four-class models with different class separation and class probabilities using maximum likelihood estimation. The results have yielded a number of recommendations when simulating linear latent GMMs:

  1. When planning to incorporate an a priori label correction procedure, the training set option is the most accurate under all tested conditions. It is the only a priori algorithm that accurately orders class labels when class separation is small.

  2. While Rodriguez and Walker’s (2014) algorithm is inconsistent in its ability to correctly order data sets, it is an incredibly powerful algorithm if being used to aggregate the output without interest in the accuracy of the class order. If a researcher is solely interested in aggregating the output at the class level, this algorithm is highly recommended under all conditions and serves as a very attractive post hoc option.

  3. If the training set option is used, combining Stephens’s (2000) algorithm yields better results, particularly when class separation is small or when there are more than three classes. If the researcher is interested in aggregating the output without regard for the correct ordering of classes, combining Rodriguez and Walker’s (2014) algorithm is also a great choice.

  4. If an a priori label correction procedure is not used and Rodriguez and Walker's (2014) algorithm cannot be used, Stephens's (2000) algorithm is a reasonable post hoc option when class separation is small, while Tueller et al.'s (2011) and Cho's (2013) algorithms are better options when class separation is large or in other cases when less correction will likely be needed. All post hoc algorithms perform reasonably well on their own under a two-class model, and Tueller et al.'s (2011) algorithm performs better than Stephens's (2000) and Cho's (2013) algorithms under three-class models. Again, if the order of the labels is not important, Rodriguez and Walker's (2014) algorithm is the only post hoc option we recommend be used on its own for four-class models. Of the remaining three, Stephens's (2000) and Tueller et al.'s (2011) algorithms appeared to perform better than Cho's (2013) algorithm under these conditions.

  5. The class constraints option is not an optimal choice for label correction, especially when class separation is minimal. However, if the class constraints algorithm was used a priori, combining it with Rodriguez and Walker’s (2014) or Stephens’s (2000) algorithm under a two-class model or Rodriguez and Walker’s (2014) or Tueller et al.’s (2011) algorithm under a three- or four-class model is recommended.

  6. When class separation is large, the mean constraints option performs as well as one of the post hoc algorithms on its own, but it should be used with caution when class separation is less than MD = 2.0 or when there are more than three classes.

This study provides an overview of several algorithms under a wide variety of conditions. Future studies focusing on the generalizability of these findings to other mixture models would be interesting. Since Tueller et al.’s (2011) algorithm was originally created for a latent variable mixture model and Cho’s (2013) algorithm was used for a mixture IRT model, it would be worthwhile to examine how the algorithms perform under these models in particular.

Additionally, while this study selected the values of the manipulated conditions based on what previous studies have suggested are commonly observed, looking at different levels of the manipulated conditions could also provide insight into the label switching issue. For example, when considering Cho’s (2013) algorithm, it appears to do a reasonable job of properly labeling two- and three-class models but struggles to do so for four-class models. While a larger number of classes results in more possible permutations of the class labels, which can lead to lower classification rates, it is likely that Cho’s algorithm was additionally challenged by the fact that the slope and intercept values that were chosen for the simulation were not ordered (see Table 6 for slope and intercept values). Tueller et al.’s (2011) algorithm uses a class assignment matrix, while Stephens’s (2000) and Rodriguez and Walker’s (2014) algorithms both use posterior class probabilities, in part, to determine how many cases were correctly classified. Cho’s (2013) method, on the other hand, relies on slope and intercept values to identify latent class membership. It is possible that Cho’s (2013) algorithm may have performed much better with ordered class slopes and intercepts, similar to those of a mixture IRT model.

Other promising frequentist label switching algorithms (Yao, 2015) were not used in the current study, but would be worthwhile to investigate in future studies. In one algorithm, Yao (2015) proposes to use the complete data log-likelihood which is not invariant to the permutation of component labels, since the class indicator variable carries the label information. This method requires specialized coding for its implementation. Similar in spirit to Stephens’s algorithm, Yao’s second algorithm does the labeling by minimizing the Euclidean distance between the classification probabilities and the true latent class labels.

This study looked at both frequentist and Bayesian label switching algorithms, where each algorithm has a different criterion for judging whether the labels should be switched. In MCMC, the number of samples is often in the thousands, whereas the current study used 500 replications within each condition. Additionally, the output from the label.switching package (Papastamoulis, 2016) was not the same as the output provided by Tueller et al.'s (2011) or Cho's (2013) algorithm, which made a fair comparison difficult. To make the comparison as fair as possible, the correct ordering of labels was taken into consideration in our calculation of accurate label switching. Doing so resulted in Rodriguez and Walker's (2014) algorithm appearing either to correctly order all data sets or none of them. However, this algorithm is quite promising if a researcher is focused on aggregating class labels without regard to the proper ordering of the classes. While Rodriguez and Walker's (2014) algorithm is meant to correct label switching in a Bayesian context with MCMC iterations, it appears to perform well within a Monte Carlo simulation framework as well.

While the training set algorithm performed very well in the context of this study, one challenge to using it is determining the appropriate amount of data to allocate for training. Using training data can help “teach” the software what characteristics to look for to determine latent class membership, but if the training data are insufficient or not fully representative of the data set, it is possible that the algorithm could mislabel classes. Allocating too much of the data set for training, on the other hand, does not mimic statistical practice. Determining strategies for apportioning a sufficient amount of training data would merit future investigation.

Class enumeration in mixture models is not well understood. As Hipp and Bauer's (2006) study discusses, the underlying question of whether a mixture exists in one's data remains unresolved. Hipp and Bauer (2006) demonstrate that placing invariance constraints to minimize local optima does not completely exclude the possibility that they exist. Additionally, constraints placed on mixture models need to be consistent with theory because they have a large impact on model estimates. The current study contributes to this issue through its investigation of a number of a priori and post hoc algorithms, including mean and class identifiability constraints, and their combinations, to determine which algorithms accurately identify classes within the data.

Acknowledgments

The authors would like to thank Dr. Stephen Tueller who graciously provided his assistance in thinking through modifications to the Tueller et al. R scripts so they would work with a growth mixture model.

Appendix

Mplus Sample Script for Constraining Means

Data Set 1 Condition 3: Three Classes With Equal Probabilities and MD = 1

DATA:
  FILE IS dat-1.dat;
VARIABLE:
  NAMES ARE id y1-y5 cls c1 c2 c3;
  AUXILIARY IS cls;
  USEVAR ARE y1-y5;
  CLASSES = c(3);
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 50 10;
  STITERATIONS = 50;
  ITERATIONS = 1000;
  SDITERATIONS = 250;
  MITERATIONS = 500;
  MCONVERGENCE = 1E-5;
MODEL:
  %overall%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  i*1.5; s*.4; i WITH s*.2;
  y1-y5*(vare);
  %c#1%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  [i*1.23](i1);
  [s*1](s1);
  %c#2%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  [i*0.25](i2);
  [s*0.5](s2);
  %c#3%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  [i*-0.97](i3);
  [s*0.25](s3);
MODEL CONSTRAINT:
  i1 > i2;
  s1 > s2;
  i2 > i3;
  s2 > s3;
SAVEDATA:
  RESULTS ARE cond3_dat1.par;
  FILE IS class_cond3_dat1.pro;
  SAVE IS CPROBABILITIES;
  SAVE IS FSCORES;
OUTPUT:
  TECH1 TECH4 TECH11;

Mplus Sample Script for Constraining Class Probabilities

Data Set 1 Condition 3: Three Classes With Equal Probabilities and MD = 1

DATA:
  FILE IS dat-1.dat;
VARIABLE:
  NAMES ARE id y1-y5 cls c1 c2 c3;
  AUXILIARY IS cls;
  USEVAR ARE y1-y5;
  CLASSES = c(3);
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 50 10;
  STITERATIONS = 50;
  ITERATIONS = 1000;
  SDITERATIONS = 250;
  MITERATIONS = 500;
  MCONVERGENCE = 1E-5;
MODEL:
  %overall%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  i*1.5; s*.4; i WITH s*.2;
  y1-y5*(vare);
  [c#1](c1prop);
  [c#2](c2prop);
  %c#1%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  [i*1.23];
  [s*1];
  %c#2%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  [i*0.25];
  [s*0.5];
  %c#3%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  [i*-0.97];
  [s*0.25];
MODEL CONSTRAINT:
  c1prop = 0.33;
  c2prop = 0.33;
SAVEDATA:
  RESULTS ARE cond3_dat1.par;
  FILE IS class_cond3_dat1.pro;
  SAVE IS CPROBABILITIES;
  SAVE IS FSCORES;
OUTPUT:
  TECH1 TECH4 TECH11;
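
A brief note on the parameterization: [c#1] and [c#2] are multinomial logit parameters of the latent class variable, with the last class as the reference (its logit fixed at zero), so the constrained quantities c1prop and c2prop operate on the logit rather than the probability scale. The implied class proportions follow P(c = k) = exp(a_k) / [exp(a_1) + exp(a_2) + exp(a_3)] with a_3 = 0, so constraining the two labeled logits to a common value forces the first two class proportions to be equal, with the third proportion determined by the reference class.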

Mplus Sample Script for the Training Set

Data Set 1 Condition 3: Three Classes With Equal Probabilities and MD = 1

DATA:
  FILE IS dat-1.dat;
VARIABLE:
  NAMES ARE id y1-y5 cls c1 c2 c3;
  AUXILIARY IS cls;
  USEVAR ARE y1-y5 c1 c2 c3;
  CLASSES = c(3);
  TRAINING = c1 c2 c3 (MEMBERSHIP);
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 50 10;
  STITERATIONS = 50;
  ITERATIONS = 1000;
  SDITERATIONS = 250;
  MITERATIONS = 500;
  MCONVERGENCE = 1E-5;
MODEL:
  %overall%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  i*1.5; s*.4; i WITH s*.2;
  y1-y5*(vare);
  %c#1%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  [i*1.23];
  [s*1];
  %c#2%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  [i*0.25];
  [s*0.5];
  %c#3%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4;
  [i*-0.97];
  [s*0.25];
SAVEDATA:
  RESULTS ARE cond3_dat1.par;
  FILE IS class_cond3_dat1.pro;
  SAVE IS CPROBABILITIES;
  SAVE IS FSCORES;
OUTPUT:
  TECH1 TECH4 TECH11;

Notes

1. Tueller et al.'s (2011) algorithm and information about how to use it can be found in their article. Copies of the R code that the authors of this study used to run the algorithm can be provided by the first author on request.

2. Cho's algorithm can be made available by the second author on request.

3. Stephens's (2000) and versions of Papastamoulis and Iliopoulos's (2010) algorithms can be executed using the label.switching package (Papastamoulis, 2016) in R. A copy of the R code that the authors of the study wrote to compile the posterior probabilities and calculate the classification rate can be provided by the first author on request.

Footnotes

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD: Kristina R. Cassiday https://orcid.org/0000-0003-3491-3775

References

1. Casella G., George E. I. (1992). Explaining the Gibbs sampler. The American Statistician, 46(3), 167-174. 10.1080/00031305.1992.10475878
2. Cho Y. (2013). The mixture distribution polytomous Rasch model used to account for response styles on rating scales: A simulation study of parameter recovery and classification accuracy [Unpublished doctoral dissertation]. University of Maryland.
3. Dempster A. P., Laird N. M., Rubin D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1-22. 10.1111/j.2517-6161.1977.tb01600.x
4. Depaoli S. (2013). Mixture class recovery in GMM under varying degrees of class separation: Frequentist versus Bayesian estimation. Psychological Methods, 18(2), 186-219. 10.1037/a0031609
5. Diebolt J., Robert C. P. (1994). Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society, Series B, 56, 363-375. 10.1111/j.2517-6161.1994.tb01985.x
6. Enders C. K., Tofighi D. (2008). The impact of misspecifying class-specific residual variances in growth mixture models. Structural Equation Modeling, 15(1), 75-95. 10.1080/10705510701758281
7. Everitt B. S., Hand D. J. (1981). Finite mixture distributions. Springer. 10.1007/978-94-009-5897-5
8. Gelfand A., Smith A. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410), 398-409. 10.1080/01621459.1990.10476213
9. Hipp J. R., Bauer D. J. (2006). Local solutions in the estimation of growth mixture models: Correction to Hipp and Bauer (2006). Psychological Methods, 11(3), 305. 10.1037/1082-989X.11.3.305
10. Jasra A., Holmes C. C., Stephens D. A. (2005). Markov Chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science, 20, 50-67. 10.1214/088342305000000016
11. Kohli N., Hughes J., Wang C., Zopluoglu C., Davison M. L. (2015). Fitting a linear–linear piecewise growth mixture model with unknown knots: A comparison of two common approaches to inference. Psychological Methods, 20(2), 259-275. 10.1037/met0000034
12. Lee D. Y. (2019). Handling of missing data with growth mixture models [Unpublished doctoral dissertation]. University of Maryland.
13. Lee S. Y. (2007). Structural equation modeling: A Bayesian approach. Wiley. 10.1002/9780470024737
14. Li M., Harring J. R., Macready G. B. (2014). Investigating the feasibility of using Mplus in the estimation of growth mixture models. Journal of Modern Applied Statistical Methods, 13(1), 484-513. 10.22237/jmasm/1398918600
15. McArdle J. J., Epstein D. (1987). Latent growth curves within developmental structural equation models. Child Development, 58(1), 110-133. 10.2307/1130295
16. McLachlan G. J., Krishnan T. (2007). The EM algorithm and extensions (2nd ed.). Wiley. 10.1002/9780470191613
17. McLachlan G., Peel D. (2000). Finite mixture models (Wiley Series in Probability and Statistics). Wiley. 10.1002/0471721182
18. McNeish D., Harring J. R. (2017). The effect of model misspecification on growth mixture model class enumeration. Journal of Classification, 34(2), 223-248. 10.1007/s00357-017-9233-y
19. Meredith W., Tisak J. (1990). Latent curve analysis. Psychometrika, 55(1), 107-122. 10.1007/BF02294746
20. Muthén B., Asparouhov T. (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17(3), 313-335.
21. Muthén L. K., Muthén B. O. (2017). Mplus (Version 8). Muthén & Muthén.
22. Neale M. C., Hunter M. D., Pritikin J. N., Zahery M., Brick T. R., Kirkpatrick R. M., Estabrook R., Bates T. C., Maes H. H., Boker S. M. (2016). OpenMx 2.0: Extended structural equation and statistical modeling. Psychometrika, 81, 535-549. 10.1007/s11336-014-9435-8
23. Newcomb S. (1886). A generalized theory of the combinations of observations so as to obtain the best result. American Journal of Mathematics, 8(4), 343-366. 10.2307/2369392
24. Nylund K. L., Asparouhov T., Muthén B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14(4), 535-569. 10.1080/10705510701575396
25. Nylund-Gibson K., Masyn K. E. (2016). Covariates and mixture modeling: Results of a simulation study exploring the impact of misspecified effects on enumeration. Structural Equation Modeling, 23(6), 782-797. 10.1080/10705511.2016.1221313
26. Papastamoulis P. (2014). Handling the label switching problem in latent class models via the ECR algorithm. Communications in Statistics, 43(4), 913-927. 10.1080/03610918.2012.718840
27. Papastamoulis P. (2016). label.switching: An R package for dealing with the label switching problem in MCMC outputs. Journal of Statistical Software, 69, 1-24. 10.18637/jss.v069.c01
28. Papastamoulis P., Iliopoulos G. (2010). An artificial allocations based solution to the label switching problem in Bayesian analysis of mixtures of distributions. Journal of Computational and Graphical Statistics, 19(2), 313-331. 10.1198/jcgs.2010.09008
29. Pearson K. (1894). Contribution to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London A, 185, 71-110. 10.1098/rsta.1894.0003
30. Peugh J. L., Fan X. (2012). How well does growth mixture modeling identify heterogeneous growth trajectories? A simulation study examining GMM's performance characteristics. Structural Equation Modeling, 19(2), 204-226. 10.1080/10705511.2012.659618
31. Peugh J., Fan X. (2015). Enumeration index performance in generalized growth mixture models: A Monte Carlo test of Muthén's (2003) hypothesis. Structural Equation Modeling, 22(1), 115-131. 10.1080/10705511.2014.919823
32. R Development Core Team. (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing. http://www.R-project.org/
33. Rodriguez C., Walker S. (2014). Label switching in Bayesian mixture models: Deterministic relabeling strategies. Journal of Computational and Graphical Statistics, 23(1), 25-45. 10.1080/10618600.2012.735624
34. Stephens M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society, Series B, 62(4), 795-809. 10.1111/1467-9868.00265
35. Titterington D. M., Smith A. F. M., Makov U. E. (1985). Statistical analysis of finite mixture distributions. Wiley.
36. Tolvanen A. (2008). Latent growth mixture modeling: A simulation study [Unpublished doctoral dissertation]. University of Jyväskylä.
37. Tueller S. J., Drotar S., Lubke G. H. (2011). Addressing the problem of switched class labels in latent variable mixture model simulation studies. Structural Equation Modeling, 18, 110-131. 10.1080/10705511.2011.534695
38. Vermunt J. K., Magidson J. (2013). Latent GOLD 5.0 upgrade manual. Statistical Innovations.
39. Wang L., McArdle J. J. (2008). A simulation study comparison of Bayesian estimation with conventional methods for estimating unknown change points. Structural Equation Modeling, 15(1), 52-74. 10.1080/10705510701758265
40. Yao W. (2015). Label switching and its solutions for frequentist mixture models. Journal of Statistical Computation and Simulation, 85(5), 1000-1012. 10.1080/00949655.2013.859259
