Author manuscript; available in PMC: 2011 Aug 1.
Published in final edited form as: Sociol Methodol. 2010 Aug;40(1):191–245. doi: 10.1111/j.1467-9531.2010.01224.x

Finite Normal Mixture SEM Analysis by Fitting Multiple Conventional SEM Models*

Ke-Hai Yuan 1, Peter M Bentler 2
PMCID: PMC3002113  NIHMSID: NIHMS187824  PMID: 21170153

Abstract

This paper proposes a two-stage maximum likelihood (ML) approach to normal mixture structural equation modeling (SEM), and develops statistical inference that allows distributional misspecification. Saturated means and covariances are estimated at stage-1 together with a sandwich-type covariance matrix. These are used to evaluate structural models at stage-2. Techniques accumulated in the conventional SEM literature for model diagnosis and evaluation can be used to study the model structure for each component. Examples show that the two-stage ML approach leads to correct or nearly correct models even when the normal mixture assumptions are violated and initial models are misspecified. Compared to single-stage ML, two-stage ML avoids the confounding effect of model specification and the number of components, and is computationally more efficient. Monte Carlo results indicate that two-stage ML loses only minimal efficiency under the conditions in which single-stage ML performs best. Monte Carlo results also indicate that the commonly used model selection criterion BIC is more robust to distribution violations with the saturated model than with a structural model at moderate sample sizes. The proposed two-stage ML approach is also extremely flexible in modeling different components with different models. Potential new developments in the mixture modeling literature can be easily adapted to study issues with normal mixture SEM.

Keywords: Asymptotics, efficiency, distribution violation, model misspecification, model modification, model evaluation, sandwich-type covariance matrix

1. Introduction

In many disciplines, practical data may come from heterogeneous populations and the membership of each observation is unknown. Ignoring the heterogeneous nature of the sample can lead to false conclusions. As a methodology to account for heterogeneous populations with categorical data, latent class models have been well developed and widely used in social sciences (e.g., Clogg, 1995; Goodman, 1974a,b; Hagenaars & McCutcheon, 2002). Mixture models with continuous variables have also found wide applications in various disciplines (e.g., Everitt & Hand, 1981; McLachlan & Peel, 2000; Titterington, Smith & Makov, 1985).

An early study of mixture confirmatory factor analysis was conducted by Blåfield (1980) using single-stage maximum likelihood (ML). Yung (1994, 1997) proposed three approaches to estimate the parameters in normal mixture factor analysis: two single-stage ML methods by expectation-maximization (EM) and approximate scoring algorithms, and one ad hoc two-stage approach. Single-stage ML for mixture structural equation modeling (SEM) was studied by Jedidi, Jagpal, and DeSarbo (1997) and Dolan and van der Maas (1998). Muthén and Shedden (1999) extended the mixture model to include continuous and categorical variables. Arminger and Stein (1997), and Arminger, Stein and Wittenberg (1999) developed three procedures to model mixtures with conditional means and covariance structures: two single-stage ML procedures and one two-stage procedure. In the first stage of their two-stage procedure, (conditional) saturated means and covariances are estimated together with their asymptotic covariance matrix. The generalized least squares (GLS) or asymptotically distribution free (ADF) approach is used in the second stage. Hoshino (2001) and Zhu and Lee (2001) developed Bayesian approaches to mixture SEM. None of these approaches allows standard model evaluation as is done with conventional SEM models. In this paper, we propose a two-stage ML approach to normal mixture SEM. In the first stage, saturated model MLEs and their asymptotic covariance matrices are obtained. Rather than using an approximation to the information matrix to obtain the asymptotic covariance matrix as in Arminger et al. (1999), for greater robustness we propose using a sandwich-type covariance matrix as the asymptotic covariance matrix. In the second stage, we propose fitting conventional SEM models to the means and covariances for each component obtained at stage-1. The paper will develop statistical inference for the proposed two-stage ML, including consistent standard errors (SE) and proper statistics for overall model evaluation.

In the literature on mixture models, one of the criticisms of single-stage ML is that misspecified structural models tend to yield more components than really exist in the population (Bauer & Curran, 2004; Biernacki, Celeux & Govaert, 2000; Lubke & Neale, 2006). Another drawback of single-stage ML, seldom discussed in the literature, is that even if the number of components is correctly identified, the problem remains of judging how good the model is when fitting the overall model. It is common knowledge with conventional SEM that any interesting model is only an approximation to the real world, and that substantively and statistically acceptable models are hard to specify. This problem becomes even more pronounced in mixture SEM, where a heterogeneous population will require multiple acceptable models, one for each component. AIC, BIC and other model selection criteria only offer us relative information. Although global fit indices might be constructed when fitting a normal mixture SEM (Bauer & Curran, 2004; Jedidi et al., 1997; Tofighi & Enders, 2008), it is not clear whether we can use established guidelines for the corresponding fit index in a conventional SEM model (Hu & Bentler, 1999) to judge the quality of the mixture model. An unsatisfactory fit index also does not tell us whether one component has a gross misspecification or whether several models are moderately/grossly misspecified. Furthermore, in single-stage ML, a misspecification of one component of the model will affect the estimation of the rest. There might also be a masking effect when there are multiple misspecifications. Because the number of components is confounded with model misspecification, it is close to impossible to develop a satisfactory procedure for SEM diagnosis.

In contrast to single-stage ML, the two-stage approaches advocated by Yung (1994) and Arminger et al. (1999) avoid the confounding effect of the number of components and misspecified structural models. In their second stage, however, the ADF or GLS approach still simultaneously evaluates all the structural models, which does not permit us to judge the quality of model fit for each component. With multiple components, the ADF/GLS approach will have a much higher dimension than its conventional counterpart, and one can expect that neither the test statistic for overall model evaluation nor the SEs of parameter estimates would allow us to reliably evaluate the quality of the overall model or an individual parameter (Hu, Bentler & Kano, 1992; Yuan & Bentler, 1997).

The idea of the first stage proposed here is the same as in Yung (1994) and Arminger et al. (1999), where the saturated model is estimated by ML using the EM algorithm to avoid the confounding of model misspecification and the number of components. In estimating the asymptotic covariance matrix of the obtained MLEs for the saturated means and covariances, we propose to use a sandwich-type covariance matrix that will account for possible violation of distribution conditions. For example, one or all of the component distributions may have heavier tails than those of the normal distribution. In that case, inverting the normal distribution based information matrix will not yield consistent model evaluation. The sandwich-type covariance matrix still permits obtaining consistent SEs and reliable overall model evaluation in the second stage. Details on the sandwich-type covariance matrix will be provided in section 2.

After obtaining the MLEs of means and covariances for all the components and consistent estimates of their asymptotic covariance matrices in the first stage, the problem turns into conventional SEM in the second stage. In particular, we can regard the obtained means and covariances for each component as the sample means and sample covariances from a distribution that may not be normally distributed. Thus, when fitting the means and covariances by the proposed structural models using the normal distribution based discrepancy function, corrections are needed using the obtained asymptotic covariance matrices from stage-1 to obtain consistent SEs and reliable statistics for overall model evaluation at stage-2. The number of models fitted at stage-2 is just the number of components identified at stage-1. Such a practice enjoys several merits. (I) The size of the estimation problem in the second stage is much smaller; thus the procedure should yield more stable estimates than the GLS/ADF approach proposed in Arminger et al. (1999). (II) Existing diagnostic techniques in conventional SEM, such as model modification or Lagrange multiplier (LM) tests, are directly applicable to each of the models fitted at stage-2. (III) Model evaluation procedures (e.g., statistics and fit indices) developed for conventional SEM can be used directly to evaluate the model for each component. Although the commonly used likelihood ratio statistic is no longer valid, there are still several statistics that have been shown both analytically and empirically to work reliably for overall model evaluation.

Mixture models for continuous and categorical data have been implemented in Mplus (Muthén & Muthén, 2007). The option “ESTIMATOR=MLR;” provides consistent standard errors when the distributional assumption is violated. Mplus also provides model modification to guide the specification of a mixture model. Because no statistic exists to judge how good the overall model is in single-stage ML, it is difficult to know whether the modified model is statistically sound. We will illustrate this through an example.

Under a set of idealized conditions the two-stage ML approach is not as desirable as single-stage ML. That is, when (1) the sample is from a truly normal mixture; (2) the number of components is correctly specified; (3) the sample size is very large; and (4) we have a correct structural model for each component, then two-stage ML will not generate parameter estimates as efficient as the single-stage ML estimates. But the above conditions are too strong to be satisfied in most real applications of normal mixture SEM. When any of the above conditions is violated, whether or not single-stage ML still maintains an advantage over two-stage ML cannot be established analytically. We will compare the two approaches empirically, using examples and Monte Carlo studies, both when these conditions are violated and when the normal mixture assumption holds perfectly.

In the first stage of two-stage ML, we still need to determine the number of components. Following the recommendation of the literature on model selection (e.g., Bauer & Curran, 2004; McLachlan & Peel, 2000), we will use BIC for this purpose. Because, with distribution violations, BIC tends to choose more components than necessary with structured models (Bauer & Curran, 2003), we will study its performance with the saturated model. Although the statistics for overall model evaluation at stage-2 allow us to judge whether substantively interesting models fit the data adequately, and it is unlikely that these models will fit the data well when the number of components is wrong, it is always reassuring when BIC for the saturated model performs better. Such a study will be conducted by Monte Carlo.

In contrast to the existing literature on normal mixture SEM, we will emphasize methods for valid statistical inference when conditions are violated. In section 2 we will provide the statistical development for our two-stage ML, including the way to obtain consistent SEs and test statistics for overall model evaluation. In section 3 we will illustrate the proposed procedure using examples with real and simulated data. In section 4, we will use Monte Carlo to compare efficiency and biases of parameter estimates obtained with single-stage and two-stage ML. We will also compare the performance of the model selection criterion BIC with these two approaches when conditions of normal mixture modeling are violated. Section 5 offers some conclusions and discussion. Technical details for obtaining the sandwich-type covariance matrix are given in an appendix.

2. Model Inference

In this section we provide statistics for model inference with two-stage ML. This includes obtaining the MLEs and the sandwich-type covariance matrices for the saturated means and covariances at stage-1, and test statistics for overall model evaluation and consistent SEs for the structural parameter estimates at stage-2.

2.1 The first stage

Let x1, x2, …, xn be a sample from a p-variate population x. Under the normal mixture model the density function of x can be written as

f(x) = \sum_{j=1}^{m} \pi_j f_j(x), \qquad (1)

where $\pi_j$ is the proportion of the jth component, satisfying $\sum_{j=1}^{m}\pi_j = 1$, and

f_j(x) = \frac{1}{(2\pi)^{p/2}|\Sigma_j|^{1/2}} \exp\left[-(x - \mu_j)'\Sigma_j^{-1}(x - \mu_j)/2\right]

is the density function of the p-variate normal distribution Np(μj, Σj) for the jth component. Let σj = vech(Σj) be the vector obtained by stacking the columns of the lower-triangular part of Σj, and let $\beta_j = (\mu_j', \sigma_j')'$. Then the vector of parameters for the saturated model is

\beta = (\pi_1, \pi_2, \ldots, \pi_{m-1};\ \beta_1', \beta_2', \ldots, \beta_m')'.

Let

l_i(\beta) = \log\Big[\sum_{j=1}^{m}\pi_j f_j(x_i)\Big] = \log\Big\{\sum_{j=1}^{m-1}\pi_j\left[f_j(x_i) - f_m(x_i)\right] + f_m(x_i)\Big\}. \qquad (2)

The observed log likelihood function is

l(\beta) = \sum_{i=1}^{n} l_i(\beta).

Let β̂ be the proper value that maximizes l(β); it can be obtained by the EM algorithm when the component memberships together with the xi are treated as the augmented complete data (see chapter 3 of McLachlan & Peel, 2000). To be self-contained, appendix A provides the formulas for obtaining β̂ using the EM algorithm. Unlike the conventional normal distribution based likelihood function, l(β) usually has multiple local maxima. In particular, there exist singular Σj at which l(β) = ∞. In order to get the proper MLE, multiple starting values are needed. At the end of each M-step, it is necessary to check whether the determinant of each Σj is close to zero. The iteration process of the EM algorithm needs to restart with a new set of starting values whenever any of the Σj (j = 1, 2, …, m) is close to being singular.
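To make the stage-1 estimation concrete, the following Python sketch implements one EM run for the saturated normal mixture, including the singularity check just described. It is our own minimal illustration, not the authors' code or the formulas of appendix A; the name em_normal_mixture and the tuning constants (tol, det_floor) are ours, and a caller is expected to rerun it with new starting values whenever the singularity check fires.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_normal_mixture(x, m, n_iter=500, tol=1e-6, det_floor=1e-10, seed=None):
    """One EM run for a saturated m-component normal mixture (stage-1 sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    pi = np.full(m, 1.0 / m)                      # starting proportions
    mu = x[rng.choice(n, size=m, replace=False)]  # random rows as starting means
    sigma = np.array([np.cov(x, rowvar=False)] * m)
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior membership probabilities tau_{ij}
        dens = np.column_stack(
            [pi[j] * multivariate_normal.pdf(x, mu[j], sigma[j]) for j in range(m)])
        ll = np.log(dens.sum(axis=1)).sum()       # observed log likelihood l(beta)
        tau = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update proportions, means, and covariances
        nj = tau.sum(axis=0)
        pi = nj / n
        for j in range(m):
            mu[j] = tau[:, j] @ x / nj[j]
            xc = x - mu[j]
            sigma[j] = (tau[:, j] * xc.T) @ xc / nj[j]
            if np.linalg.det(sigma[j]) < det_floor:
                # near-singular component: caller should restart with new values
                raise RuntimeError("near-singular covariance; restart EM")
        if ll - ll_old < tol:
            break
        ll_old = ll
    return pi, mu, sigma, ll
```

With multiple random starts, one retains the proper solution with the largest log likelihood among the runs that pass the singularity check.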

Let $\dot{l}_i(\beta) = \partial l_i(\beta)/\partial\beta$ and $\ddot{l}_i(\beta) = \partial^2 l_i(\beta)/\partial\beta\,\partial\beta'$. For a proper MLE β̂, there exists a β0 satisfying

E[\dot{l}_i(\beta_0)] = 0

such that β̂ is consistent for β0. Actually, β̂ satisfies the estimating equation

\sum_{i=1}^{n} \dot{l}_i(\hat\beta) = 0.

When n is sufficiently large, β̂ will be in a neighborhood of β0. With a random sample xi, i = 1, 2, …, n, the condition for the consistency of β̂ is β ∈ ℬ, which is a compact subset of the Euclidean space $\mathcal{R}^{m-1+mp(p+3)/2}$. Under the additional regularity condition that β0 is an interior point of ℬ, we have (Yuan & Jennrich, 1998)

\sqrt{n}\,(\hat\beta - \beta_0) \to N(0, \Gamma_{sw}), \qquad (3)

where

\Gamma_{sw} = A^{-1} B A^{-1}

with $A = -E[\ddot{l}_i(\beta_0)]$ and $B = E[\dot{l}_i(\beta_0)\dot{l}_i'(\beta_0)]$. The result in (3) does not need the f(x) in (1) to truly describe the population distribution. When f(x) only approximately describes the population and m is correctly chosen, the means and covariances in β0 only approximate the population means and covariances. When the number of components m is not correct, β0 still represents the best summary of the underlying population using m mean vectors and covariance matrices, as characterized by the Kullback–Leibler discrepancy criterion. When β0 represents a good summary of the underlying population, we obtain a better understanding of the heterogeneous nature of the population than by treating the sample as coming from a single homogeneous population. Note that β0 being an interior point of ℬ implies that the result in (3) does not hold when either a certain Σj is singular or a certain πj = 0. So we need to avoid nearly singular Σ̂j and/or π̂j close to 0 in the MLE β̂.

When the f(x) in (1) truly describes the population distribution of x, A is called the information matrix. We have A = B and

\Gamma_{sw} = \Gamma_{in} = A^{-1}

in such a case. When all the components have heavier tails than those of the normal distribution, B − A will be a nonnegative definite or positive definite matrix. Thus, SEs based on Γsw tend to be greater than those based on Γin. Similarly, when all the components have lighter tails than those of the normal distribution, SEs based on Γsw tend to be smaller than those based on Γin. Whether B > A or not, when f(x) only approximately describes the distribution of x, consistent estimates of A and B are given by

\hat{A} = -\frac{1}{n}\sum_{i=1}^{n} \ddot{l}_i(\hat\beta) \qquad\text{and}\qquad \hat{B} = \frac{1}{n}\sum_{i=1}^{n} \dot{l}_i(\hat\beta)\,\dot{l}_i'(\hat\beta).

The explicit formulas for $\dot{l}_i(\beta)$ and $\ddot{l}_i(\beta)$ are rather complicated and are given in appendix B. These formulas are straightforward to evaluate and are only needed once at the proper MLE β̂.
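As a rough illustration of how Γ̂sw can be assembled, the sketch below substitutes central-difference derivatives for the analytic formulas of appendix B. It is our own helper (the names loglik_i and sandwich_cov are assumptions); it only requires a function returning the vector of casewise log likelihoods l_i(β).

```python
import numpy as np

def sandwich_cov(loglik_i, beta_hat, eps=1e-5):
    """Gamma_sw = A^{-1} B A^{-1} via numerical derivatives (illustration only).

    loglik_i(beta) must return the length-n vector (l_1(beta), ..., l_n(beta));
    the analytic formulas of appendix B would replace the finite differences."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    q = beta_hat.size
    n = loglik_i(beta_hat).size
    scores = np.empty((n, q))                    # rows: dl_i/dbeta'
    for k in range(q):
        e = np.zeros(q); e[k] = eps
        scores[:, k] = (loglik_i(beta_hat + e) - loglik_i(beta_hat - e)) / (2 * eps)
    B = scores.T @ scores / n
    lbar = lambda b: loglik_i(b).mean()          # average log likelihood
    H = np.empty((q, q))                         # numerical Hessian of lbar
    for k in range(q):
        for l in range(q):
            ek = np.zeros(q); ek[k] = eps
            el = np.zeros(q); el[l] = eps
            H[k, l] = (lbar(beta_hat + ek + el) - lbar(beta_hat + ek - el)
                       - lbar(beta_hat - ek + el) + lbar(beta_hat - ek - el)) / (4 * eps ** 2)
    A_inv = np.linalg.inv(-H)                    # A = -E[Hessian of l_i]
    return A_inv @ B @ A_inv                     # Gamma_in would be A_inv alone
```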

Arminger et al. (1999) used the inverse of the information matrix, A⁻¹, to describe the asymptotic behavior of β̂. They used the cross-product of the first derivatives for the so-called “complete data” to approximate the information matrix, which equals neither Â nor B̂.

The result in (3) implies that, for the estimates of the means and covariances of the jth component,

\sqrt{n}\,(\hat\beta_j - \beta_{j0}) \to N_{p+p^*}(0, \Gamma_{jj}), \qquad (4)

where Γjj is a (p + p*) × (p + p*) submatrix of Γ = Γsw or Γin corresponding to β̂j, with p* = p(p + 1)/2. The result in (4) parallels the following result for the sample mean vector ȳ and sample covariance matrix S based on a sample of size n from a homogeneous population y:

\sqrt{n}\begin{pmatrix} \bar{y} - \mu \\ \mathrm{vech}(S) - \mathrm{vech}(\Sigma) \end{pmatrix} \to N_{p+p^*}(0, \Gamma_y), \qquad (5)

where μ = E(y), Σ = Cov(y) and Γy = Cov(y*) with y* = [y′, vech′{(yμ)(yμ)′}]′. Comparing (4) with (5), we have the same amount of information to evaluate any particular structural model μj0 = μ(θj0) and Σj0 = Σj(θj0) for the jth component in the mixture modeling context as in conventional SEM or SEM with one component.
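To make the parallel concrete, in the single-component case Γy in (5) can be estimated directly from raw data; the following small sketch (our own helper functions vech and gamma_y_hat, not from the paper) does exactly that.

```python
import numpy as np

def vech(a):
    """vech: column-stacked lower triangle; for a symmetric matrix this
    equals the row-major upper triangle used here."""
    return a[np.triu_indices_from(a)]

def gamma_y_hat(y):
    """Sample estimate of Gamma_y = Cov(y*) in (5); y has shape (n, p)."""
    y = np.asarray(y, dtype=float)
    yc = y - y.mean(axis=0)
    # y*_i = [y_i', vech'{(y_i - ybar)(y_i - ybar)'}]'
    ystar = np.array([np.concatenate([yi, vech(np.outer(ci, ci))])
                      for yi, ci in zip(y, yc)])
    return np.cov(ystar, rowvar=False)
```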

So we may just treat μ̂j as the sample mean vector and Σ̂j as the sample covariance matrix of the jth component for the next-stage analysis. Here our interest is in SEM. The second-stage analysis could also be principal components, exploratory factor analysis, or any multivariate analysis involving the sample mean vector and sample covariance matrix. Actually, multilevel SEM has been developed using essentially the same idea (Yuan & Bentler, 2007). Compared to single-stage ML, the two-stage approach is more crucial with normal mixture SEM than with multilevel modeling because it allows us to segregate various effects and to use the accumulated knowledge of conventional SEM to study the more challenging problems associated with normal mixture SEM (see Bauer, 2007).

2.2 The second stage

We need to choose a method to fit the structural model (μj(θj), Σj(θj)) to (μ̂j, Σ̂j). We will use the normal distribution based discrepancy function1

F_{ML}(\theta_j) = \mathrm{tr}[\hat\Sigma_j\Sigma_j^{-1}(\theta_j)] - \log|\hat\Sigma_j\Sigma_j^{-1}(\theta_j)| + [\hat\mu_j - \mu_j(\theta_j)]'\Sigma_j^{-1}(\theta_j)[\hat\mu_j - \mu_j(\theta_j)] - p \qquad (6)

for mean and covariance structure analysis and

F_{MLc}(\theta_{cj}) = \mathrm{tr}[\hat\Sigma_j\Sigma_j^{-1}(\theta_{cj})] - \log|\hat\Sigma_j\Sigma_j^{-1}(\theta_{cj})| - p \qquad (7)

for just covariance structure analysis. Let θ̂j minimize FML(θj) and θ̂cj minimize FMLc(θcj); we will discuss the properties of these estimators shortly. We choose (6) and (7) because (i) minimizing these functions generates more efficient parameter estimates than the GLS/ADF procedure in the context of conventional SEM even when y does not follow a normal distribution (Yuan & Bentler, 1997); (ii) there exist several statistics for overall model evaluation with nice properties; and (iii) these discrepancy functions are the most widely used in practice and are the default in essentially all SEM software. Of course, one may choose another discrepancy function if needed. Actually, with (4), we are back to conventional SEM. Any existing method in conventional SEM can be applied if deemed necessary. For example, it is totally legitimate to be interested in mean and covariance structures for the jth component and only the covariance structure for the kth component. This poses no extra difficulty for the proposed two-stage ML.
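Written out in code, the discrepancy function (6) is only a few lines; the sketch below (our own helper f_ml, not part of the paper) evaluates it for given saturated and model-implied moments, and dropping the mean term gives (7). Minimizing it over θj with a general-purpose optimizer yields θ̂j.

```python
import numpy as np

def f_ml(mu_hat, sigma_hat, mu_model, sigma_model):
    """Normal-theory discrepancy F_ML of (6); drop the mean term to get (7)."""
    p = len(mu_hat)
    s_inv = np.linalg.inv(sigma_model)
    prod = sigma_hat @ s_inv
    diff = np.asarray(mu_hat) - np.asarray(mu_model)
    sign, logdet = np.linalg.slogdet(prod)   # numerically stable log-determinant
    return np.trace(prod) - logdet + diff @ s_inv @ diff - p
```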

Let

\beta_j(\theta_j) = \begin{pmatrix} \mu_j(\theta_j) \\ \sigma_j(\theta_j) \end{pmatrix}, \quad \dot\beta_j(\theta_j) = \partial\beta_j(\theta_j)/\partial\theta_j', \quad \dot\sigma_j(\theta_{cj}) = \partial\sigma_j(\theta_{cj})/\partial\theta_{cj}';
W_{cj}(\theta_{cj}) = \frac{1}{2}D_p'\left[\Sigma_j^{-1}(\theta_{cj}) \otimes \Sigma_j^{-1}(\theta_{cj})\right]D_p, \quad\text{and}\quad W_j(\theta_j) = \mathrm{diag}\left[\Sigma_j^{-1}(\theta_j), W_{cj}(\theta_j)\right],
where D_p is the duplication matrix.

We will omit the arguments of the above functions when evaluated at θj0 or θcj0. With the above notation, we have

\sqrt{n}\,(\hat\theta_j - \theta_{j0}) \to N_{q_j}(0, \Omega_j), \qquad (8a)

where

\Omega_j = (\dot\beta_j' W_j \dot\beta_j)^{-1}(\dot\beta_j' W_j \Gamma_{jj} W_j \dot\beta_j)(\dot\beta_j' W_j \dot\beta_j)^{-1}; \qquad (8b)

and

\sqrt{n}\,(\hat\theta_{cj} - \theta_{cj0}) \to N_{q_{cj}}(0, \Omega_{cj}), \qquad (9a)

where

\Omega_{cj} = (\dot\sigma_j' W_{cj} \dot\sigma_j)^{-1}(\dot\sigma_j' W_{cj} \Gamma_{cjj} W_{cj} \dot\sigma_j)(\dot\sigma_j' W_{cj} \dot\sigma_j)^{-1}, \qquad (9b)

with Γcjj being the submatrix of Γjj corresponding to σ̂j. Consistent estimators of Ωj and Ωcj are obtained by replacing θj0 with θ̂j and Γjj with Γ̂jj in (8), and by replacing θcj0 with θ̂cj and Γcjj with Γ̂cjj in (9). Notice that Γjj does not equal $W_j^{-1}$ even when x truly follows a normal mixture model. So the SEs of θ̂j or Wald statistics regarding elements of θj always need to be based on the above sandwich-type covariance matrices Ωj or Ωcj.
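For example, (8b) can be computed directly once the derivative matrix, the weight matrix, and Γ̂jj are available; the following sketch (our own helper omega_j) shows the sandwich computation.

```python
import numpy as np

def omega_j(beta_dot, w, gamma_jj):
    """Sandwich covariance (8b) for the stage-2 estimator; all inputs are
    evaluated at theta_hat_j.

    beta_dot: (p + p*, q_j) derivative matrix of the structural model
    w:        (p + p*, p + p*) normal-theory weight matrix W_j
    gamma_jj: (p + p*, p + p*) stage-1 asymptotic covariance of beta_hat_j
    """
    bread = np.linalg.inv(beta_dot.T @ w @ beta_dot)
    meat = beta_dot.T @ w @ gamma_jj @ w @ beta_dot
    return bread @ meat @ bread   # SE(theta_hat_jk) = sqrt(Omega_j[k, k] / n)
```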

We next turn to overall model evaluation. Let us denote TMLj = nFML(θ̂j). For a sample from a normally distributed population in conventional SEM, and under the null hypothesis, TML asymptotically follows a chi-square distribution with p + p* − qj degrees of freedom, where qj is the number of parameters in θj. In the above two-stage ML approach for the mixture model, TMLj does not follow $\chi^2_{p+p^*-q_j}$ even asymptotically. We will introduce four statistics that have been shown to perform well when (5) holds. They are proposed here to test the structural model βj(θj) because of the parallel nature of (4) and (5). Let

T_{RADFj} = n\,\hat{e}_j'\left\{\hat\Gamma_{jj}^{-1} - \hat\Gamma_{jj}^{-1}\dot\beta_j(\hat\theta_j)\left[\dot\beta_j'(\hat\theta_j)\hat\Gamma_{jj}^{-1}\dot\beta_j(\hat\theta_j)\right]^{-1}\dot\beta_j'(\hat\theta_j)\hat\Gamma_{jj}^{-1}\right\}\hat{e}_j, \qquad (10)

be the residual-based ADF statistic (Browne, 1984), where $\hat{e}_j = \hat\beta_j - \beta_j(\hat\theta_j)$. The first proposed statistic is the corrected residual-based ADF statistic (Yuan & Bentler, 1998)

T_{CRADFj} = \frac{T_{RADFj}}{1 + T_{RADFj}/n}. \qquad (11)

Notice that, according to (4), both TRADFj and TCRADFj asymptotically follow $\chi^2_{p+p^*-q_j}$. But TCRADFj has been shown to possess better finite sample properties. The second proposed statistic is the residual-based F-statistic (Yuan & Bentler, 1998)

F_{Rj} = \frac{\left[n - (p + p^* - q_j)\right]T_{RADFj}}{(n - 1)(p + p^* - q_j)}, \qquad (12)

which is also asymptotically distribution free when referred to the F-distribution with degrees of freedom p + p* − qj and n − (p + p* − qj).
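In code, (10) to (12) are direct matrix computations; the sketch below (our own helper residual_adf_tests, with the reference distributions from (11) and (12)) illustrates them.

```python
import numpy as np
from scipy.stats import chi2, f as f_dist

def residual_adf_tests(e_hat, beta_dot, gamma_hat, n):
    """T_RADF of (10), T_CRADF of (11), and F_R of (12), with p-values.

    e_hat:     residual beta_hat_j - beta_j(theta_hat_j), length d = p + p*
    beta_dot:  (d, q_j) derivative matrix at theta_hat_j
    gamma_hat: (d, d) estimate of Gamma_jj
    """
    d, q = beta_dot.shape
    df = d - q
    g_inv = np.linalg.inv(gamma_hat)
    mid = g_inv - g_inv @ beta_dot @ np.linalg.inv(
        beta_dot.T @ g_inv @ beta_dot) @ beta_dot.T @ g_inv
    t_radf = n * e_hat @ mid @ e_hat
    t_cradf = t_radf / (1.0 + t_radf / n)
    f_r = (n - df) * t_radf / ((n - 1) * df)
    return {"T_RADF": (t_radf, chi2.sf(t_radf, df)),
            "T_CRADF": (t_cradf, chi2.sf(t_cradf, df)),
            "F_R": (f_r, f_dist.sf(f_r, df, n - df))}
```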

Let

U_j = W_j - W_j\dot\beta_j(\dot\beta_j' W_j \dot\beta_j)^{-1}\dot\beta_j' W_j.

The third one is the rescaled statistic (Satorra & Bentler, 1994)

T_{RMLj} = \frac{p + p^* - q_j}{\mathrm{tr}(\hat{U}_j\hat\Gamma_{jj})}\,T_{MLj}. \qquad (13)

The statistic TRMLj is not asymptotically distribution free; it approaches a distribution with mean equal to p + p* − qj.

Let

\hat{a}_j = \mathrm{tr}\left[(\hat{U}_j\hat\Gamma_{jj})^2\right]/\mathrm{tr}(\hat{U}_j\hat\Gamma_{jj}), \quad\text{and}\quad \hat{b}_j = \left[\mathrm{tr}(\hat{U}_j\hat\Gamma_{jj})\right]^2/\mathrm{tr}\left[(\hat{U}_j\hat\Gamma_{jj})^2\right].

The fourth proposed statistic is the adjusted statistic

T_{MLaj} = T_{MLj}/\hat{a}_j, \qquad (14)

using $T_{MLaj} \sim \chi^2_{\hat{b}_j}$ for inference. Like TRMLj, TMLaj does not follow $\chi^2_{b_j}$ in general, where bj is the population counterpart of b̂j. The first and second moments of TMLaj asymptotically equal those of $\chi^2_{b_j}$. Note that the degrees of freedom in $\chi^2_{\hat{b}_j}$ are estimated rather than fixed. Recent simulation results show that $\chi^2_{\hat{b}_j}$ describes the behavior of TMLaj better than $\chi^2_{p+p^*-q_j}$ describes that of TRMLj when Uj and Γjj are controlled (Yuan & Bentler, 2009).
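The rescaled and adjusted statistics of (13) and (14) follow directly from the traces of $\hat{U}_j\hat\Gamma_{jj}$; the sketch below (our own helper rescaled_adjusted) shows the computation.

```python
import numpy as np
from scipy.stats import chi2

def rescaled_adjusted(t_ml, u_hat, gamma_hat, d, q):
    """T_RML of (13) and T_MLa of (14); d = p + p*, q = q_j."""
    ug = u_hat @ gamma_hat
    tr1 = np.trace(ug)
    tr2 = np.trace(ug @ ug)
    df = d - q
    t_rml = df / tr1 * t_ml            # refer to chi2 with df degrees of freedom
    a_hat = tr2 / tr1
    b_hat = tr1 ** 2 / tr2             # estimated degrees of freedom b_hat
    t_mla = t_ml / a_hat               # refer to chi2 with b_hat degrees of freedom
    return {"T_RML": (t_rml, chi2.sf(t_rml, df)),
            "T_MLa": (t_mla, chi2.sf(t_mla, b_hat))}
```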

For each of the proposed statistics, we have only given its explicit formulation with mean and covariance structure analysis. Parallel asymptotic distribution free statistics in just covariance structure analysis will be obtained when β̂j, βj(θj), θ̂j, Γ̂jj, p+p* and qj in (10) to (12) are replaced by σ̂j, σj(θcj), θ̂cj, Γ̂cjj, p* and qcj, respectively. Similarly, rescaled and adjusted statistics are obtained when TMLj, Wj, β̇j and Γjj in the formulation of (13) and (14) are replaced by TMLcj = nFMLc(θ̂cj), Wcj, σ̇j and Γcjj, respectively. The degrees of freedom for the reference chi-square distribution in covariance structure analysis are p* − qcj, and for the reference F-distribution are p* − qcj and n − (p* − qcj).

In conventional SEM, TCRADFj, FRj, TRMLj and TMLaj have been shown to perform quite well (see Bentler & Yuan, 1999; Fouladi, 2000; Yuan & Bentler, 1998). We expect that they will perform equally well in the context of mixture normal SEM due to the parallelism of (4) and (5). Because fitting βj(θj) to β̂j is just conventional SEM, fit indices such as CFI, RMSEA, etc. can be computed in the usual way (see e.g., Hu & Bentler, 1999).

All four proposed statistics are available in the current version of EQS (Bentler, in press). For the adjusted statistic TMLaj, EQS prints out the integer part of the estimated degrees of freedom b̂j and uses it to compute the p-value. TRMLj and TMLaj are available in Mplus (Muthén & Muthén, 2007). The statistic TRMLj also exists in LISREL (Jöreskog et al., 2000, Ch. 4).

3. Illustrations

In this section we use three examples to illustrate the applications of the proposed two-stage ML when the structural model and/or distribution is misspecified. We will use

\mathrm{BIC} = -2\,l(\hat\beta) + q\log(n)

to determine the proper number of components at stage-1, where q is the number of parameters and n is the sample size. Although we recommend that our methodology primarily be used in a confirmatory context, to recreate the typical practice of empirical model modification that is often used when SEM is applied in a partially exploratory way (Jöreskog, 1993), we also show that the proposed procedures are capable of recovering the generating model in the more difficult context of mixture modeling.
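For reference, the criterion is trivial to compute once the maximized log likelihood is available; in the small sketch below (our own helper), q for a saturated m-component model with p variables is (m − 1) + mp(p + 3)/2, the dimension noted in section 2.1.

```python
import numpy as np

def bic(loglik, q, n):
    """BIC = -2 l(beta_hat) + q log(n); smaller values are preferred."""
    return -2.0 * loglik + q * np.log(n)

# For a saturated m-component model with p variables,
# q = (m - 1) + m * p * (p + 3) / 2 free parameters.
```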

Example 1

Holzinger and Swineford (1939) contains test scores of n = 145 students on the following subtests or variables: Visual Perception, Cubes, Lozenges, Paragraph Comprehension, Sentence Completion, Word Meaning, Addition, Counting Dots, Straight-Curved Capitals. The first three variables were designed to measure “spatial ability”, the next three variables were designed to measure “verbal ability”, and the last three variables were administered under a time limit and were designed to measure a “speed” factor in performing the tasks. Thus, a three-factor model should reflect the original design well. One may also assume that all nine variables measure “general intelligence” and use a one-factor normal mixture model to fit the sample. Of course, we can also fit the sample by a normal mixture with saturated means and covariances. Parallel to the set-up of Yung (1997), who used a normal mixture factor model to fit a different set of variables from the same population, at m = 2 we set the factor loadings and factor covariances for both the one- and three-factor models equal across the components while the intercepts and error variances are free to vary. Table 1 contains the log likelihood and the BIC for the model with saturated means and covariances, the three-factor model, and the one-factor model at m = 1 and 2, respectively.

Table 1.

Fitting statistics at stage-1 for Example 1 (Holzinger and Swineford, 1939), with m = 1 and 2 components.

(a) saturated model

m l(β̂) BIC π̂
1 −4522.561 9313.866 π̂1 = 1.000
2 −4461.783 9466.030 π̂1 = .816, π̂2 = .184

(b) three-factor model

m l(θ̂) BIC π̂
1 −4548.332 9245.967 π̂1 = 1.000
2 −4507.782 9259.424 π̂1 = 0.483, π̂2 = 0.517

(c) one-factor model

m l(θ̂) BIC π̂
1 −4613.768 9361.908 π̂1 = 1.000
2 −4558.568 9346.066 π̂1 = 0.673, π̂2 = 0.327

(d) model modification with two-component one-factor model

model l(θ̂) BIC

M1 −4558.568 9346.066
M2 (ψ45(1), ψ78(2)) −4539.393 9317.669
M3 (ψ78(1), ψ89(2)) −4523.860 9296.556
M4 (ψ56(1), λ6(1) ≠ λ6(2)) −4515.359 9289.509

Each log likelihood at m = 2 is the proper maximum obtained with 50 starting values. For the saturated model, the BIC corresponding to one normal component is smaller. BIC also suggests one component if we use the three-factor model. However, it chooses two components if we fit the data using the one-factor model. Obviously, model specification and the number of components are closely related in mixture modeling (Bauer & Curran, 2004; Biernacki, Celeux & Govaert, 2000). Actually, we do not know the true model for this sample; the three-factor model at m = 1 does not fit the sample well: none of the p-values corresponding to the four proposed statistics is above .01. The LM test or model modification index suggests that allowing the 9th variable to load on the 1st factor will greatly improve the model fit. This parameter is also supported by the substantive meaning of the variables (see Sörbom, 1989). After adding this parameter, all the p-values corresponding to the four proposed statistics are above .20, which gives us the needed confidence that the model fits the data well.

In the context of mixture modeling by single-stage ML, existing procedures do not allow us to judge the fit of the model as with a conventional single-component SEM model. We will further illustrate this with the two-component one-factor model using the software Mplus (Muthén & Muthén, 2007). The estimation method is specified by “ESTIMATOR=MLR;”. Fifty starting values were used, and each run included up to 30 iterations in the initial maximization stage. The five starting values corresponding to the 5 largest log likelihoods from the initial 50 entered the final-stage maximization. Model modification indices are requested by “STANDARDIZED MODINDICES (3.84);”. Let M1 denote the two-component 1-factor model with equal factor loadings across the components. The output contains l(θ̂) = −4558.568 and BIC = 9346.066 together with AIC and the sample size adjusted BIC. The modification indices suggested 3 significant across-component constraints of factor loadings, 13 correlated errors for the first component, and 5 correlated errors for the second component. The parameters corresponding to the most significant reduction of the likelihood function are ψ45 in component 1 and ψ78 in component 2. Let M2 be the model after adding these two parameters; the modification indices continue to suggest extra parameters that will significantly improve model fit. The l(θ̂) and BIC corresponding to M2 are reported in Table 1(d), where the parameters in the parentheses are the extra parameters of the model over the previous one. Because BIC for M2 still does not tell us how good the model fit is, we proceed with model modification until model M4. Further model modification after M4 leads to a model identification problem. We may stop at M4 because it corresponds to the smallest BIC. But we still do not know whether model M4 fits the data statistically. As discussed in the introduction, such a problem is inherent in mixture modeling, where the likelihood ratio statistic does not asymptotically follow a known distribution; it is not a limitation of the Mplus software.

We may notice that in Table 1(a) to (c), the two BICs at m = 1 and 2 differ most for the saturated model. This may imply that BIC with a correctly specified model is more effective in distinguishing the number of components than with misspecified models.

We would also like to note that this sample is not normally distributed. Its multivariate skewness is 283.544, with a p-value of 2.672 × 10−8 when referred to $\chi^2_{165}$. Its standardized multivariate kurtosis is 3.037, with a p-value of 0.001 when referred to N(0, 1) (Mardia, 1970). Thus, at n = 145, moderate nonnormality of the sample does not make BIC choose a 2-component model. We will study the effect of sample size and distribution violations on BIC in section 4.

For the sample in this example, few researchers would fit a one-factor model because, since Jöreskog (1969), the data set has been studied by many researchers using various methods. The example just illustrates a typical application of mixture modeling when the structure of the population is not well understood and the number of components is not clear. When both of these are clear, what is left in applying a mixture model is just a standard ML procedure for generating parameter estimates and their SEs. Then there is no need for the proposed two-stage ML.

Example 2

This example contrasts the single-stage and two-stage ML methods when the model is misspecified. A sample2 of size n = 400 is generated from the population

f(x) = \pi_1 N_9(\mu_1, \Sigma_1) + \pi_2 N_9(\mu_2, \Sigma_2),

where π1 = 0.5, π2 = 0.5; μ1 = 0, μ2 = (3, 3, …, 3)′,

\Sigma_1 = \Lambda_1\Phi_1\Lambda_1' + \Psi_1 \quad\text{and}\quad \Sigma_2 = \Lambda_2\Phi_2\Lambda_2' + \Psi_2,

with

\Lambda_1' = \begin{pmatrix} 1.0 & 1.0 & 1.0 & 0 & 0 & 0 & 1.0 & 0 & 0 \\ 1.0 & 0 & 0 & 1.0 & 1.0 & 1.0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1.0 & 0 & 0 & 1.0 & 1.0 & 1.0 \end{pmatrix}, \quad \Phi_1 = \begin{pmatrix} 1.0 & 0.5 & 0.5 \\ 0.5 & 1.0 & 0.5 \\ 0.5 & 0.5 & 1.0 \end{pmatrix},

Ψ1 = I9;

\Lambda_2' = \begin{pmatrix} 1.0 & 1.0 & 1.0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1.0 & 1.0 & 1.0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1.0 & 1.0 & 1.0 \end{pmatrix}, \quad \Phi_2 = \begin{pmatrix} 1.0 & 0.5 & 0.5 \\ 0.5 & 1.0 & 0.5 \\ 0.5 & 0.5 & 1.0 \end{pmatrix},

all the diagonal elements of Ψ2 are 1.0, and the nonzero off-diagonal elements of Ψ2 are ψ15 = ψ51 = 0.8 and ψ68 = ψ86 = 0.8. It is easy to see that the nine marginals in each component have the same variance of 2.0, and the marginal means of the two components are $3/\sqrt{2} \approx 2.121$ standard deviations apart.
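A sample from this population could be generated as follows. This is our own sketch (the seed and draws will not reproduce the authors' sample2), with the loading matrices transcribed from the display above.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed; illustration only
n = 400

# component 1: three factors with three cross-loadings, uncorrelated errors
lam1 = np.zeros((9, 3))
lam1[0:3, 0] = 1.0; lam1[3:6, 1] = 1.0; lam1[6:9, 2] = 1.0   # unidimensional part
lam1[0, 1] = 1.0; lam1[3, 2] = 1.0; lam1[6, 0] = 1.0          # cross-loadings
phi = np.full((3, 3), 0.5); np.fill_diagonal(phi, 1.0)
sigma1 = lam1 @ phi @ lam1.T + np.eye(9)

# component 2: unidimensional loadings, two correlated errors
lam2 = np.zeros((9, 3))
lam2[0:3, 0] = 1.0; lam2[3:6, 1] = 1.0; lam2[6:9, 2] = 1.0
psi2 = np.eye(9)
psi2[0, 4] = psi2[4, 0] = 0.8    # psi_15
psi2[5, 7] = psi2[7, 5] = 0.8    # psi_68
sigma2 = lam2 @ phi @ lam2.T + psi2

mu1, mu2 = np.zeros(9), np.full(9, 3.0)
z = rng.random(n) < 0.5          # component memberships, pi1 = pi2 = 0.5
x = np.where(z[:, None],
             rng.multivariate_normal(mu1, sigma1, n),
             rng.multivariate_normal(mu2, sigma2, n))
```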

We fit the sample by m = 1-, 2- and 3-component normal mixtures with a saturated mean vector for each component; the variance-covariance matrix for each component is a three-factor model,

\Sigma_j(\theta_{cj}) = \Lambda_j\Phi_j\Lambda_j' + \Psi_j, \quad j = 1, 2, 3,

where each Λj is a 9 × 3 matrix such that each factor is measured by three unidimensional indicators; each Φj is a free 3 × 3 matrix; and each Ψj is a diagonal matrix, so that the errors in each component are specified as uncorrelated. The first factor loading for each factor (λ11, λ42 and λ93) is fixed at 1.0 for model identification. Thus, the factor loading matrix for the first component is misspecified and the error variance-covariance matrix for the second component is misspecified. Such misspecified structural models better reflect the practice of normal mixture SEM, where it is unlikely to have a model that perfectly follows the unknown population3. With 50 random starting values for the two- and three-component normal mixture models using the EM algorithm, the obtained values of the log likelihood and BIC are given in Table 2(a). Although BIC still chooses two normal components with a misspecified structural model, the estimates π̂1 and π̂2 contain substantial biases. If we stop at this three-factor model with unidimensional indicators and start to elaborate on the parameter estimates, then we fall short of the truth. Even when the model for each component is correctly specified, BIC or other model selection criteria with single-stage ML still do not allow us to endorse the model, as has been illustrated. The two-stage ML allows us to evaluate the goodness of model fit and to locate which model causes the lack of fit.

Table 2.

Fitting statistics at stage-1 for Example 2 (a simulated sample from a mixture of two multivariate normal distributions), with m = 1, 2 and 3 components.

(a) structural model

m l(θ̂) BIC π̂
1 −6299.607 12778.958 π̂1 = 1.000
2 −6201.834 12769.148 π̂1 = .356, π̂2 = .644
3 −6137.616 12826.447 π̂1 = .268, π̂2 = .475, π̂3 = .257

(b) saturated model

m l(β̂) BIC π̂
1 −6225.866 12775.272 π̂1 = 1.000
2 −6045.523 12744.116 π̂1 = .517, π̂2 = .483
3 −5966.523 12915.646 π̂1 = .479, π̂2 = .471, π̂3 = .053

Table 2(b) contains the model fitting statistics for m = 1, 2 and 3 components with saturated means and covariances; those for m = 2 and 3 are obtained with 50 random starting values. As expected, BIC also chooses m = 2 as the proper number of components, and the estimates π̂1, π̂2 are now more appropriate. Let Σ̂1 and Σ̂2 be the estimates of the two saturated covariance matrices; let Γ̂cin11/n and Γ̂csw11/n be the estimates of the asymptotic covariance matrix of σ̂1 = vech(Σ̂1) using Γ̂in and Γ̂sw, respectively; and let Γ̂cin22/n and Γ̂csw22/n be the estimates of the asymptotic covariance matrix of σ̂2 = vech(Σ̂2) using Γ̂in and Γ̂sw, respectively. We next fit the two covariance matrices using EQS, starting with the unidimensional three-factor model and the default LM test.

Appendix C contains the EQS syntax for fitting Σ̂1 using Γ̂cin11 to obtain consistent SEs and the proposed statistics, where the asymptotic covariance matrix Γ̂cin11 is within the file “Gamma_cin11.dat”. Because the population distribution is truly a normal mixture, we expect Γ̂cin11 to provide a better estimate of Γc11 than Γ̂csw11. Notice that the sample size is set at n = 400 according to the result in (4). Table 3(a) contains the proposed statistics for overall model evaluation; these are part of the output of the EQS program. We also list the likelihood ratio statistic TML1 in the table to show that it does not work well in the second stage of the two-stage ML. The results under the LM test in the last column of Table 3(a) are also from the EQS output; only the one parameter corresponding to the most significant LM statistic is reported4. Model M1 is the three-factor model with unidimensional indicators; M2 to M5 are the modified models obtained by adding the parameter suggested by the LM test based on running the previous model. Below each of the six statistics is the p-value associated with the proposed distribution for the statistic5; those below .001 are not reported. According to these statistics, model M1 does not fit Σ̂1 well. The default LM test in EQS suggests that allowing the first variable to load on the second factor would reduce the chi-square value (TML1) by approximately 68.508. Although M2 fits Σ̂1 better than M1, all the statistics are still highly significant. The first model with a good fit is M4, which is identical to the population structure that generated the data. At M4, the LM test continues to suggest adding the parameter λ61; adding this parameter leads to model M5 with an even better fit, although both the absolute value of the estimate λ̂61 = 0.224 and the corresponding z-score (= 2.410) are the smallest among all the parameter estimates. The LM test then further suggests adding λ82; following the suggestion leads to an estimate λ̂82 = −0.353 and a corresponding z = −1.1866. So we may regard model M4 or M5 as our final model.

Table 3.

Table 3(a). Test statistics for overall model evaluation of the first component at stage-2 for Example 2 (a simulated sample from a mixture of two multivariate normal distributions), Γ̂c11 = Γ̂cin11, p-values below 0.001 are not reported.

TML1 TRML1 TMLa1 TCRADF1 FR1 LM
M1 205.164 90.085 73.833(20) 52.780 2.389 (24,376) 68.508 (λ12)
p 0.001
M2 151.213 60.081 57.463 (20) 44.150 2.040 (23,377) 65.341 (λ43)
p 0.005 0.004
M3 93.362 40.857 37.366 (20) 30.929 1.444 (22,378) 47.390 (λ71)
p 0.009 0.011 0.098 0.090
M4 42.347 18.590 18.090 (20) 17.193 0.813 (21,379) 7.714 (λ61)
p 0.004 0.611 0.581 0.699 0.704 0.005
M5 33.980 14.930 14.566 (20) 13.507 0.666 (20,380) 4.657 (λ82)
p 0.026 0.780 0.801 0.855 0.860 0.031
Table 3(b). Test statistics for overall model evaluation of the first component at stage-2 for Example 2 (a simulated sample from a mixture of two multivariate normal distributions), Γ̂c11 = Γ̂csw11, p-values below 0.001 are not reported.

TML1 TRML1 TMLa1 TCRADF1 FR1 LM
M1 205.164 87.740 61.698 (17) 62.109 2.890 (24,376) 68.508 (λ12)
p
M2 151.213 63.430 49.945 (17) 48.194 2.253 (23,377) 65.341 (λ43)
p 0.002 0.001
M3 93.362 38.465 29.921 (17) 36.224 1.716 (22,378) 47.390 (λ71)
p 0.016 0.027 0.029 0.024
M4 42.347 17.321 14.175 (17) 16.979 0.802 (21,379) 7.714 (λ61)
p 0.004 0.691 0.655 0.712 0.718 0.005
M5 33.980 13.689 11.203 (16) 14.442 0.714 (20,380) 4.657 (λ82)
p 0.026 0.846 0.797 0.807 0.813 0.031
Table 3(c). Structural parameter estimates by the single-stage and two-stage ML for the first component of Example 2 (a simulated sample from a mixture of two multivariate normal distributions), the models M2 to M5 under the two-stage ML follow from the results of the univariate LM test.

single-stage ML 2-stage ML

θc1 M1 M1 M2 M3 M4 M5
λ21 0.489 0.506 1.342 0.986 0.974 0.824
λ31 0.410 0.470 1.274 0.929 0.946 0.804
λ52 0.811 0.463 0.495 1.130 0.998 1.057
λ62 0.617 0.458 0.491 1.107 0.962 0.765
λ83 0.356 0.486 0.469 0.508 1.130 1.135
λ93 0.317 0.446 0.426 0.458 1.039 1.040
ϕ11 2.471 3.224 0.587 1.072 1.033 1.413
ϕ21 1.926 2.557 0.776 0.379 0.479 0.447
ϕ31 1.844 2.600 1.064 1.405 0.510 0.618
ϕ22 2.017 2.775 2.657 0.639 0.805 0.806
ϕ32 1.225 2.367 2.217 0.778 0.364 0.306
ϕ33 2.240 3.042 3.161 2.982 0.753 0.738
ψ11 1.021 1.030 1.195 0.973 0.916 0.932
ψ22 0.904 1.070 0.840 0.854 0.917 0.937
ψ33 1.179 1.176 0.935 0.962 0.961 0.973
ψ44 1.229 1.166 1.284 1.383 1.051 0.945
ψ55 0.755 1.177 1.120 0.955 0.969 0.872
ψ66 0.956 1.026 0.967 0.824 0.863 0.911
ψ77 0.874 .917 0.798 0.977 1.144 1.133
ψ88 1.436 1.256 1.278 1.205 1.012 1.022
ψ99 1.586 1.485 1.518 1.465 1.277 1.292
λ12 0.716 1.359 1.199 1.081
λ43 0.583 1.249 1.358
λ71 1.003 0.854
λ61 0.224

Parameter estimates by the single-stage ML for M1 and by the two-stage ML for M1 to M5 are reported in Table 3(c). Due to model misspecification, the three estimates of factor variances under the single-stage ML have biases over 1.0; four of the six factor loading estimates have biases over .50. The estimates by the two-stage ML under M1 have even larger biases, mainly because a smaller model has less freedom to buffer the discrepancy between the data and the model. The biases in parameter estimates by the two-stage ML become smaller as the model moves from M1 to M4. Model M5 is still correctly specified, although it is over-parameterized. Parameter biases under M5 are still much smaller than those under M1 by either single-stage or two-stage ML.

Parallel to those in Table 3(a), Table 3(b) contains the statistics for overall model evaluation when Γ̂csw11 is used to obtain the proposed statistics. Because both Γ̂cin11 and Γ̂csw11 are consistent, the results in Tables 3(a) and (b) are very comparable, though there exist some differences due to greater sampling error in Γ̂csw11. In particular, the proposed statistics do not support the structural model until M4. The LM test continues to suggest M5 although M4 is the true model. Notice that the parameter estimates depend only on Σ̂1; using Γ̂cin11 or Γ̂csw11 generates the same set of θ̂c1, as reported in Table 3(c). The SEs of θ̂c1 (not reported to save space) associated with Γ̂cin11 and Γ̂csw11 have only a small difference. For example, the z-score (= 2.541) for λ̂61 = 0.224 in M5 is the smallest among all the parameter estimates; the z-score (= −1.166) for λ̂82 = −0.353 following the modification of M5 is not statistically significant at the .05 level.

Table 4(a) contains the results of fitting Σ̂2 using Γ̂cin22 to obtain the proposed statistics, starting from the three-factor model with unidimensional indicators. As expected, model M1 does not fit Σ̂2 well; none of the statistics has a p-value reaching .001. Following the default LM test in EQS, we fit the models M2 (adding λ63), M3 (further adding λ12) and M4 (further adding λ82) in sequence. None of the proposed statistics endorses any of the 4 models. EQS reports a convergence problem, due to linearly dependent parameters, when the parameter λ51 is further included. Notice that the default LM test in EQS only identifies needed factor loadings; other needed parameters may have to be requested specifically. Starting from model M1, we can tell the LM test to search for correlated errors by adding “SET=PEE;” immediately below the default LM test. The sequence of models identified by the new LM test is M5 (adding ψ86), M6 (further adding ψ51), and M7 (further adding ψ98). It is clear from the lower portion of Table 4(a) that model M5 is not statistically supported by any of the test statistics. Model M6 is endorsed by all the statistics and is also the true model. When evaluating the over-parameterized model M7, both the estimate ψ̂98 (= 0.297) and its z-score (= 2.638) are the smallest among all the estimates. At M7, the LM test continues to suggest adding ψ95. When evaluated, the z-score corresponding to ψ̂95 equals 1.721, not statistically significant at the .05 level, so we stop the model modification process.

Table 4.

Table 4(a). Test statistics for overall model evaluation of the second component at stage-2 for Example 2 (a simulated sample from a mixture of two multivariate normal distributions), Γ̂c22 = Γ̂cin22, p-values below 0.001 are not reported.

TML2 TRML2 TMLa2 TCRADF2 FR2 LM
M1 529.987 244.070 178.544 (18) 99.290 5.194 (24,376) 50.431 (λ63)
p
M2 409.758 185.175 127.495 (16) 83.885 4.366 (23,377) 102.959 (λ12)
p
M3 300.651 137.581 110.341(18) 73.889 3.907 (22,378) 54.816 (λ82)
p
M4 248.304 115.962 95.531 (17) 68.655 3.753 (21, 379) 48.848 (λ51)
p

M1 529.987 244.070 178.544 (18) 99.290 5.194 (24,376) 199.101 (ψ86)
p
M5 299.251 138.697 114.132 (19) 62.400 3.040 (23,377) 215.712 (ψ51)
p
M6 44.879 21.004 20.226 (21) 18.200 0.821 (22,378) 12.900 (ψ98)
p 0.003 0.520 0.507 0.694 0.699
M7 31.712 14.903 14.491 (20) 14.578 0.684 (21,379) 6.677 (ψ95)
p 0.063 0.828 0.805 0.844 0.849 0.010
Table 4(b). Test statistics for overall model evaluation of the second component at stage-2 for Example 2 (a simulated sample from a mixture of two multivariate normal distributions), Γ̂c22 = Γ̂csw22, p-values below 0.001 are not reported.

TML2 TRML2 TMLa2 TCRADF2 FR2 LM
M1 529.987 217.602 139.175 (15) 105.997 5.673 (24,376) 50.431 (λ63)
p
M2 409.758 165.943 86.979 (12) 99.612 5.458 (23,377) 102.959 (λ12)
p
M3 300.651 121.131 76.612 (14) 84.312 4.606 (22,378) 54.816 (λ82)
p
M4 248.304 101.004 67.355 (14) 74.164 4.123 (21,379) 48.848 (λ51)
p

M1 529.987 217.602 139.175 (15) 105.997 5.673 (24,376) 199.101 (ψ86)
p
M5 299.251 123.979 85.974 (16) 68.418 3.394 (23, 377) 215.712 (ψ51)
p
M6 44.879 18.498 14.152 (17) 19.557 0.886 (22, 378) 12.900 (ψ98)
p 0.003 0.676 0.656 0.611 0.614
M7 31.712 13.586 11.034 (17) 16.570 0.782 (21,379) 6.677 (ψ95)
p 0.063 0.887 0.855 0.737 0.742 0.010
Table 4(c). Structural parameter estimates by the single-stage and two-stage ML for the second component of Example 2 (a simulated sample from a mixture of two multivariate normal distributions), the models M2 to M7 under the two-stage ML follow from the results of the univariate LM test.

single-stage ML 2-stage ML

θc2 M1 M1 M2 M3 M4 M5 M6 M7
λ21 1.119 1.075 0.548 1.393 1.355 1.013 1.147 1.163
λ31 1.122 1.074 0.545 1.472 1.427 0.996 1.162 1.176
λ52 1.701 1.255 2.989 4.310 5.089 1.375 1.274 1.273
λ62 1.511 1.137 0.731 .714 0.281 0.985 1.062 1.041
λ83 1.253 0.996 2.018 1.715 1.830 0.865 0.892 0.668
λ93 1.323 1.001 1.075 1.023 1.094 1.090 1.106 0.822
ϕ11 1.335 1.142 2.205 0.682 0.726 1.243 1.049 1.022
ϕ21 0.737 0.594 0.538 0.096 0.073 0.657 0.507 0.491
ϕ31 0.747 0.587 0.246 0.244 0.274 0.647 0.553 0.699
ϕ22 0.711 0.672 0.304 0.193 0.146 0.743 0.767 0.757
ϕ32 0.756 0.615 0.066 0.062 0.123 0.501 0.504 0.602
ϕ33 1.024 1.078 0.493 0.608 0.615 1.070 1.075 1.409
ψ11 1.400 1.228 0.165 1.007 1.014 1.127 1.359 1.373
ψ22 1.146 1.086 1.741 1.079 1.073 1.129 1.025 1.022
ψ33 1.124 1.198 1.859 1.036 1.038 1.281 1.099 1.102
ψ44 1.437 1.288 1.656 1.767 1.813 1.217 1.193 1.202
ψ55 1.313 1.226 −0.432 −1.298 −1.500 0.879 1.120 1.134
ψ66 0.690 0.693 0.676 0.660 0.642 0.901 0.782 .784
ψ77 1.168 1.157 1.741 1.627 1.619 1.164 1.160 .826
ψ88 0.787 0.870 −0.071 0.151 0.180 1.085 1.071 1.282
ψ99 1.140 1.122 1.631 1.565 1.466 0.931 0.888 1.250
λ63 1.116 1.079 1.160
λ12 1.448 1.638
λ82 −0.980
ψ86 0.801 0.783 0.795
ψ51 1.022 1.034
ψ98 0.297

The parameter estimates of θc2 by the single-stage ML for model M1 and by the two-stage ML for M1 to M7 are reported in Table 4(c). As expected, those under M6 have the least overall bias. Models M2 to M4 all have negative error variance estimates, suggesting possible model misspecification, although none of them is statistically significant at the .05 level. Compared to Table 3(c), we may find that omitting factor loadings leads to more bias in parameter estimates than omitting error covariances.

When Γ̂csw22 is used to obtain the proposed statistics, the results are reported in Table 4(b). Similar to those for modeling the first component, the results in Tables 4(a) and 4(b) are very comparable. In particular, none of the statistics supports models M1 to M5. The estimate ψ̂98 has the smallest z-score of 2.150 in model M7. Further adding ψ95 leads to a nonsignificant z-score of 1.723.

This example shows that all the knowledge accumulated in conventional SEM can be used to analyze the model structure for an individual component in stage-2 of the two-stage ML. In particular, the statistics that have been shown to work well in conventional SEM models allow us to judge whether a covariance matrix (mean vector as well) for an individual component is adequately explained. In the context of mixture analysis, when we have little knowledge about the population structure, these statistics offer us more guidance than is possible with single-stage ML.

Example 3

This example examines the robustness of two-stage ML to moderate distribution violations combined with model misspecification. Single-stage ML will not be studied in this example because it does not allow us to judge the goodness of fit in each component. Instead of sampling from a mixture of normal distributions, a sample7 of size n = 400 is generated from the population

f(x) = \pi_1 Mt_9(\mu_1, \Sigma_1; 8) + \pi_2 Mt_9(\mu_2, \Sigma_2; 8),

where π1 = 0.5, π2 = 0.5, Mt9(μ1, Σ1; 8) and Mt9(μ2, Σ2; 8) represent two 9-variate t-distributions, each with 8 degrees of freedom; the population means μ1 and μ2, and covariance matrices Σ1 and Σ2 are identical to those in Example 2.
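Such a sample can be drawn by scaling normal vectors with a chi-square mixing variable; the sketch below (our own helper rmvt) assumes Σ enters as the scale matrix of the t-distribution, so the actual covariance is (8/6)Σ. If the Σj are instead meant to be the covariance matrices, they should be rescaled by 6/8 before drawing.

```python
import numpy as np

def rmvt(rng, mu, sigma, df, n):
    """n draws from a multivariate t with location mu and scale matrix sigma.

    Note: with sigma as the scale matrix, Cov = df / (df - 2) * sigma; rescale
    sigma by (df - 2) / df beforehand if sigma is meant to be the covariance."""
    z = rng.multivariate_normal(np.zeros(len(mu)), sigma, n)
    w = rng.chisquare(df, n) / df            # chi-square mixing variable
    return mu + z / np.sqrt(w)[:, None]

# mixture with pi1 = pi2 = 0.5 and 8 degrees of freedom, reusing mu1, sigma1,
# mu2, sigma2 from the Example 2 sketch:
# rng = np.random.default_rng(0)
# mask = rng.random(400) < 0.5
# x = np.where(mask[:, None], rmvt(rng, mu1, sigma1, 8, 400),
#              rmvt(rng, mu2, sigma2, 8, 400))
```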

As in Example 2, we first fit the sample by saturated models at stage-1, using BIC to choose the number of components. With 50 random starting values for m = 2 and 3 components, the log likelihood l(β̂) and BIC are reported in Table 5. Obviously, using BIC leads to the correct number of components. The two estimates π̂1 and π̂2 are also reasonable. The two estimated covariance matrices Σ̂1 and Σ̂2, and the asymptotic covariance matrices Γ̂cin11, Γ̂csw11, Γ̂cin22 and Γ̂csw22, will be used for the second-stage analysis below. Note that Γ̂cin11 is not consistent for Γc11, nor is Γ̂cin22 for Γc22. We include them in the study to see the difference between using a consistent estimator and an inconsistent one.

Table 5.

Fitting statistics at stage-1 for Example 3 (a simulated sample from a mixture of two multivariate t-distributions, each with 8 degrees of freedom), saturated model with m = 1, 2 and 3 components.

m l(β̂) BIC π̂
1 −6674.715 13672.969 π̂1 = 1.000
2 −6501.935 13656.939 π̂1 = .524, π̂2 = .476
3 −6383.058 13748.716 π̂1 = .174, π̂2 = .394, π̂3 = .432

Parallel to Table 3(a) for Example 2, the results of fitting Σ̂1 starting with the three-factor unidimensional-indicator model (M1) and using $\widehat{\mathrm{Cov}}(\hat\sigma_1) = \hat\Gamma_{cin11}/n$ are reported in Table 6(a). The model M4 fits the sample adequately by all standards, which is expected because it is the true model. At M4, the LM test continues to suggest adding the parameter λ82, but the further modified model leads to a z-score of −1.261 for the estimate λ̂82. So we stop at model M4. The parameter estimates θ̂c1 for models M1 to M4 are reported in Table 6(c). Most of the parameter estimates under M4 are very close to their population values, but two error variance estimates have biases close to 1.0. This is because the multivariate t-distribution has heavier tails than those of the corresponding normal distribution. The estimates Σ̂1 and Σ̂2 are not efficient due to the heavier tails. Any discrepancy between Σ̂1 and the population Σ1 will be inherited by the structural parameter estimates. Estimates by single-stage ML would suffer from the same lack of efficiency.

Table 6.

Table 6(a). Test statistics for overall model evaluation of the first component at stage-2 for Example 3 (a simulated sample from a mixture of two multivariate t-distributions, each with 8 degrees of freedom), Γ̂c11 = Γ̂cin11, p-values below 0.001 are not reported.

TML1 TRML1 TMLa1 TCRADF1 FR1 LM
M1 139.734 67.511 58.317 (21) 45.527 2.018 (24,376) 58.300 (λ12)
p 0.005 0.003
M2 93.548 45.104 41.191 (21) 37.042 1.678 (23,377) 35.479 (λ43)
p 0.004 0.005 0.032 0.027
M3 60.282 29.112 27.613 (21) 27.498 1.272 (22,378) 8.840 (λ71)
p 0.142 0.151 0.193 0.186
M4 47.721 23.018 22.097 (20) 21.615 1.034 (21,379) 9.386 (λ82)
p 0.001 0.343 0.335 0.422 0.421 0.002
Table 6(b). Test statistics for overall model evaluation of the first component at stage-2 for Example 3 (a simulated sample from a mixture of two multivariate t-distributions, each with 8 degrees of freedom), Γ̂c11 = Γ̂csw11, p-values below 0.001 are not reported.

TML1 TRML1 TMLa1 TCRADF1 FR1 LM
M1 139.734 49.741 28.799 (14) 44.990 1.992 (24,376) 58.300 (λ12)
p 0.002 0.011 0.006 0.004
M2 93.548 33.173 20.644 (14) 36.777 1.665 (23,377) 35.479 (λ43)
p 0.078 0.111 0.034 0.029
M3 60.282 20.419 12.772 (14) 25.893 1.193 (22,378) 8.840 (λ71)
p 0.557 0.545 0.256 0.250
M4 47.721 15.649 9.608 (13) 17.249 0.816 (21,379) 9.386 (λ82)
p 0.001 0.789 0.726 0.696 0.701 0.002
Table 6(c). Structural parameter estimates by the two-stage ML for the first component of Example 3 (a simulated sample from a mixture of two multivariate t-distributions, each with 8 degrees of freedom), the models M2 to M4 follow from the results of the univariate LM test.

θc1 M1 M2 M3 M4
λ21 0.517 1.426 1.039 1.050
λ31 0.515 1.436 1.059 1.064
λ52 0.498 0.520 1.181 1.068
λ62 0.455 0.470 1.042 0.935
λ83 0.518 0.511 .540 1.108
λ93 0.557 0.547 .571 1.156
ϕ11 4.473 0.729 1.351 1.304
ϕ21 3.573 1.122 0.568 0.673
ϕ31 3.233 1.259 1.687 0.756
ϕ22 3.644 3.600 0.863 1.029
ϕ32 2.930 2.811 3.083 0.553
ϕ33 3.180 3.231 1.066 0.860
ψ11 1.332 1.578 1.344 1.281
ψ22 1.387 1.101 1.124 1.147
ψ33 1.487 1.172 1.159 1.199
ψ44 1.353 1.397 1.580 1.347
ψ55 1.815 1.745 1.513 1.543
ψ66 1.344 1.303 1.160 1.197
ψ77 1.713 1.662 1.810 2.006
ψ88 1.906 1.915 1.860 1.703
ψ99 2.204 2.222 2.185 2.039
λ12 0.722 1.351 1.232
λ43 0.628 1.217
λ71 0.795

Table 6(b) contains the results of fitting the same set of models as in Table 6(a), but with the matrix Γ̂csw11 used to obtain the proposed statistics. The results in Table 6(b) are comparable to those in Table 6(a), except that all the models are judged more favorably by the four proposed statistics. This is because the sandwich-type covariance matrix takes the heavy tails of the underlying distribution into account, which means that B̂ > Â in the sense of positive definiteness. Thus, part of the discrepancy between Σ̂1 and the fitted model is accounted for by the heavy tails. The rest is due to the systematic difference between the data “Σ̂1” and the model Σ1(θc1). Although B̂ > Â and model M3 fits the data pretty well, the z-score corresponding to λ̂71 = 0.795 is 2.793. So the parameter λ71 is still needed statistically. Due to greater SEs using the sandwich-type covariance matrices, the z-score associated with λ̂82 is −1.131, smaller in magnitude than that based on Γ̂cin11. Thus, λ82 is not needed statistically. The parameter estimates corresponding to Table 6(b) are identical to those for Table 6(a), as reported in Table 6(c).

Both Γ̂cin11 and Γ̂csw11 thus allow us to evaluate the model for component 1, and using Γ̂csw11 gives us more confidence in the results due to larger p-values associated with the four proposed statistics at model M4 and a smaller z-score for the unnecessary parameter λ82.

Turning to the second component, the results of fitting Σ̂2 using $\widehat{\mathrm{Cov}}(\hat\sigma_2) = \hat\Gamma_{cin22}/n$ are reported in Table 7(a). The program has a convergence problem, due to linearly dependent parameters, when λ63 is further added after M4. The LM test for correlated errors, starting from M1, suggests a sequence of new models (M5 to M8). The model M6 (containing the covariance parameters ψ51 and ψ86) is evaluated favorably by all the proposed statistics, which is expected because it is the true model. Continuing the model modification after M8 leads to a z-score of 1.774 for ψ̂98 = 0.318. Both ψ̂42 in M7 and ψ̂95 in M8 are statistically significant at the .05 level, although the estimates and their associated z-scores are the smallest among all the parameter estimates.

Table 7.

Table 7(a). Test statistics for overall model evaluation of the second component at stage-2 for Example 3 (a simulated sample from a mixture of two multivariate t-distributions, each with 8 degrees of freedom), Γ̂c22 = Γ̂cin22, p-values below 0.001 are not reported.

TML2 TRML2 TMLa2 TCRADF2 FR2 LM
M1 590.086 237.789 151.992 (15) 94.356 4.856 (24,376) 118.992 (λ51)
p
M2 416.523 167.754 114.830 (16) 78.919 4.044 (23,377) 92.486 (λ82)
p
M3 311.259 126.846 96.711 (17) 65.531 3.378 (22,378) 36.537 (λ12)
p
M4 277.596 116.447 92.005 (17) 65.258 3.531 (22,378) 32.904 (λ63)
p

M1 590.086 237.789 151.992 (15) 94.356 4.856 (24,376) 328.927 (ψ51)
p
M5 324.094 132.261 106.502 (19) 61.017 2.961 (23,377) 220.416 (ψ86)
p
M6 67.981 28.710 27.492 (21) 25.061 1.152 (22,378) 18.708 (ψ42)
p 0.153 0.155 0.294 0.289
M7 49.247 20.737 20.082 (20) 18.254 0.865 (21,379) 13.577 (ψ95)
p 0.475 0.453 0.633 0.637
M8 35.747 15.109 14.747 (20) 13.799 0.681 (20,380) 6.610 (ψ98)
p 0.016 0.770 0.791 0.841 0.846 0.010
Table 7(b). Test statistics for overall model evaluation of the second component at stage-2 for Example 3 (a simulated sample from a mixture of two multivariate t-distributions, each with 8 degrees of freedom), Γ̂c22 = Γ̂csw22, p-values below 0.001 are not reported.

TML1 TRML1 TMLa1 TCRADF1 FR1 LM
M1 590.086 174.560 93.277 (13) 84.495 4.212 (24,376) 118.992 (λ51)
p
M2 416.523 122.576 67.067 (13) 75.632 3.836 (23,377) 92.486 (λ82)
p
M3 311.259 94.870 56.546 (13) 65.884 3.400 (22,378) 36.537 (λ12)
p
M4 277.596 85.447 51.877 (13) 67.752 3.693 (21,379) 32.904 (λ63)
p

M1 590.086 174.560 93.277 (13) 84.495 4.212 (24,376) 328.927 (ψ51)
p
M5 324.094 92.484 52.516 (13) 56.300 2.694 (23,377) 220.416 (ψ86)
p
M6 67.981 20.135 12.292 (13) 25.247 1.161 (22,378) 18.708 (ψ42)
p 0.575 0.504 0.285 0.280
M7 49.247 14.806 9.954 (14) 17.095 0.808 (21,379) 13.577 (ψ95)
p 0.833 0.766 0.705 0.710
Table 7(c). Structural parameter estimates by the two-stage ML for the second component of Example 3 (a simulated sample from a mixture of two multivariate t-distributions, each with 8 degrees of freedom), the models M2 to M8 follow from the results of the univariate LM test.

θc2 M1 M2 M3 M4 M5 M6 M7 M8
λ21 0.507 0.398 0.473 0.462 1.136 1.185 1.205 1.223
λ31 0.488 0.381 0.451 0.483 1.157 1.157 1.246 1.257
λ52 2.480 0.847 0.838 0.342 1.101 1.117 1.114 1.060
λ62 0.994 2.655 4.426 4.992 1.131 0.976 1.056 1.040
λ83 1.056 1.885 0.989 0.987 1.166 1.078 1.096 1.048
λ93 1.192 1.191 1.265 1.256 1.122 1.260 1.259 1.178
ϕ11 2.618 3.199 2.780 2.970 1.066 1.070 0.978 0.945
ϕ21 0.794 0.176 0.110 0.240 0.549 0.568 0.505 0.521
ϕ31 0.742 0.389 0.679 0.789 0.670 0.583 0.560 0.607
ϕ22 0.434 0.346 0.186 0.147 0.933 1.081 0.994 1.009
ϕ32 0.317 0.295 0.105 0.087 0.704 0.599 0.577 0.592
ϕ33 1.350 0.753 1.262 1.276 1.303 1.269 1.274 1.350
ψ11 0.251 −0.329 0.089 0.227 1.755 1.765 1.785 1.797
ψ22 2.169 2.336 2.221 2.208 1.469 1.341 1.443 1.463
ψ33 2.520 2.679 2.578 2.452 1.718 1.713 1.626 1.652
ψ44 2.229 2.317 2.477 2.517 1.730 1.582 1.674 1.661
ψ55 −0.005 1.179 1.115 1.041 1.601 1.407 1.467 1.561
ψ66 1.393 −0.619 −1.832 −1.832 0.628 0.891 0.817 0.759
ψ77 1.221 1.817 1.308 1.294 1.267 1.302 1.297 1.221
ψ88 1.025 −0.146 0.807 0.805 0.758 1.055 1.043 1.010
ψ99 0.915 1.765 0.812 0.820 1.193 0.818 0.814 0.953
λ51 0.577 0.682 0.708
λ82 1.158 1.317
λ12 −0.972
ψ51 1.400 1.396 1.393 1.400
ψ86 0.850 0.835 0.773
ψ42 0.414 0.416
ψ95 0.212

The parameter estimates for M1 to M8 are reported in Table 7(c). There is a negative estimate of error variance in M1; including λ51 (M2) results in three negative estimates of error variances; and ψ̂66 remains negative through M4. Although none of the z-scores corresponding to these negative estimates is statistically significant, the literature on conventional SEM suggests that negative estimates of error variances are associated with model misspecification. There are no negative error variance estimates in models M5 to M8, and the parameter estimates from M5 to M8 are very comparable even though M5 is, strictly speaking, a misspecified model. Such a phenomenon was also observed in Example 2, where omitting error covariances causes less bias than omitting relevant factor loadings.

Table 7(b) contains the results of fitting Σ̂2 when Γ̂csw22 is used to obtain the proposed statistics. Similar to Table 7(a), all the p-values are below .001 until model M6. Different from Table 7(a), however, the z-score for ψ̂95 is 1.817, so ψ95 is not statistically significant at the .05 level. Actually, the z-score for ψ̂42 = 0.414 in M7 equals 1.970, only marginally significant at the .05 level. If one thinks that an error covariance with such a significance level should not be included in the model, then we end up with M6, which is the model that generated the population. For the second component, both Γ̂cin22 and Γ̂csw22 lead us to the true model, but using Γ̂csw22 allows us to identify the true model more easily because the z-scores for unnecessary parameters are less significant.

This example shows that, when the component distributions and the structural models are misspecified, two-stage ML with its diagnostic tools still leads us to correct or nearly correct models. It also shows that, when the underlying component distributions have heavier tails than those of the normal distribution, using the sandwich-type covariance matrices to obtain the proposed statistics and SEs leads to better model evaluation.

4. Efficiency of Parameter Estimates and Robustness of BIC

This section contains two Monte Carlo studies. The first aims to see how much better single-stage ML is than two-stage ML in parameter estimation under the conditions that single-stage ML is designed for. The second study compares the performance of BIC in selecting the correct number of components when distributional assumptions are violated. Our purpose is to see whether BIC for the saturated model performs any better than BIC for a structured model.

4.1 Efficiency of parameter estimates under idealized conditions

Since the efficiency of parameter estimates in a mixture model is closely related to the separation of its components, two normal mixture models are used for the study. Both have two components but differ in the separation between them.

For the first mixture population, π1 = π2 = 1/2; μ1 = 0, μ2 = (3, 3, …, 3)′; the two covariance matrices are equal, and each is specified through a confirmatory factor model with

$$\Sigma = \Lambda\Phi\Lambda' + \Psi,$$

where

$$\Lambda' = \begin{pmatrix} 1.0 & 1.0 & 1.0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1.0 & 1.0 & 1.0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1.0 & 1.0 & 1.0 \end{pmatrix}, \qquad \Phi = \begin{pmatrix} 1.0 & 0.5 & 0.5 \\ 0.5 & 1.0 & 0.5 \\ 0.5 & 0.5 & 1.0 \end{pmatrix}, \tag{15}$$

Ψ = I9. As in Examples 2 and 3, the marginal means of the two components are 3/√2 ≈ 2.121 standard deviations apart. Parameters in the second mixture population are the same as in the first except μ2 = (1.5, 1.5, …, 1.5)′; thus the marginal means of the two components are approximately 1.061 standard deviations apart. The sample size is n = 400, and 500 replications are generated from each of the two populations.

The same model is used when fitting the two populations: the means are saturated for both components; the covariance structures for the two components are specified correctly and estimated independently; that is, each nonzero entry of the population matrices Λ, Φ and Ψ is specified as a free parameter in both components' models, except λ11, λ42 and λ73, which are fixed at 1.0 for model identification. We apply single-stage ML and two-stage ML to each sample. EM algorithms are used for single-stage ML and for the first stage of two-stage ML; the Fisher-scoring algorithm is used for each component in the second stage of two-stage ML. Because the true population values of each component are known, they are used as the starting values, instead of multiple random starting values, in both the EM and Fisher-scoring algorithms. Upon convergence for each sample we obtain, for each component, the estimates of the six free factor loadings, six factor variances-covariances, and nine error variances. We also obtain 9 saturated means for each component, but we focus only on the structural parameters of the covariance matrices.

Let θ̂i be the estimate of parameter θ at the ith replication and θ0 be its population value;

$$\text{mean} = \frac{1}{500}\sum_{i=1}^{500}\hat{\theta}_i, \qquad \text{bias} = \frac{1}{500}\sum_{i=1}^{500}\hat{\theta}_i - \theta_0,$$

and

$$\text{SE} = \left[\frac{1}{500}\sum_{i=1}^{500}\left(\hat{\theta}_i - \text{mean}\right)^2\right]^{1/2}.$$

Table 8(a) and (b) contain the results of fitting the first and second components of the first population, respectively. At the bottom of each table are the average absolute bias, average variance, and average mean square error (MSE) across the q = 21 covariance parameters. These empirical results indicate that, under ideal conditions with component means about 2 standard deviations apart, single-stage ML does yield more accurate parameter estimates. But the differences in biases, variances, and MSEs between the two procedures are in the third decimal place.

Table 8.

Efficiency and accuracy of parameter estimates, µ1 = (0, …, 0)′ and µ2 = (3, …, 3)′; three-factor model with unidimensional measurement in the population; the model is correctly specified in the number of components and in the covariance structures.

(a) Component 1

2-stage ML single-stage ML


θc1 Mean Bias SE Mean Bias SE
λ21 1.014 0.014 0.174 1.009 0.009 0.160
λ31 1.028 0.028 0.170 1.025 0.025 0.161
λ52 1.041 0.041 0.180 1.030 0.030 0.171
λ62 1.033 0.033 0.190 1.022 0.022 0.158
λ83 1.025 0.025 0.179 1.024 0.024 0.169
λ93 1.027 0.027 0.179 1.027 0.027 0.172
ϕ11 1.037 0.037 0.300 1.020 0.020 0.268
ϕ21 0.525 0.025 0.200 0.509 0.009 0.168
ϕ31 0.535 0.035 0.208 0.513 0.013 0.175
ϕ22 1.013 0.013 0.299 1.007 0.007 0.268
ϕ32 0.523 0.023 0.205 0.506 0.006 0.173
ϕ33 1.037 0.037 0.297 1.010 0.010 0.260
ψ11 0.988 −0.012 0.167 0.986 −0.014 0.165
ψ22 0.988 −0.012 0.154 0.991 −0.009 0.151
ψ33 0.979 −0.021 0.152 0.982 −0.018 0.149
ψ44 0.994 −0.006 0.156 0.990 −0.010 0.154
ψ55 0.974 −0.026 0.163 0.977 −0.023 0.156
ψ66 0.984 −0.016 0.168 0.987 −0.013 0.161
ψ77 0.995 −0.005 0.165 0.995 −0.005 0.163
ψ88 0.970 −0.030 0.176 0.973 −0.027 0.169
ψ99 0.976 −0.024 0.157 0.976 −0.024 0.154

Average bias Var MSE bias Var MSE
0.023 0.039 0.040 0.016 0.033 0.033
Table 8(b) Component 2

2-stage ML single-stage ML


θc2 Mean Bias SE Mean Bias SE
λ21 1.003 0.003 0.172 1.002 0.002 0.160
λ31 1.016 0.016 0.185 1.014 0.014 0.170
λ52 1.024 0.024 0.174 1.015 0.015 0.162
λ62 1.023 0.023 0.175 1.018 0.018 0.170
λ83 1.023 0.023 0.178 1.022 0.022 0.166
λ93 1.019 0.019 0.176 1.021 0.021 0.171
ϕ11 1.041 0.041 0.301 1.029 0.029 0.269
ϕ21 0.512 0.012 0.201 0.504 0.004 0.172
ϕ31 0.518 0.018 0.210 0.504 0.004 0.177
ϕ22 1.009 0.009 0.282 1.009 0.009 0.259
ϕ32 0.507 0.007 0.195 0.496 −0.004 0.169
ϕ33 1.012 0.012 0.281 0.996 −0.004 0.253
ψ11 0.976 −0.024 0.165 0.979 −0.021 0.161
ψ22 1.003 0.003 0.161 1.004 0.004 0.159
ψ33 0.981 −0.019 0.182 0.982 −0.018 0.170
ψ44 0.985 −0.015 0.154 0.985 −0.015 0.153
ψ55 0.975 −0.025 0.155 0.980 −0.020 0.149
ψ66 0.977 −0.023 0.171 0.977 −0.023 0.164
ψ77 0.981 −0.019 0.165 0.983 −0.017 0.159
ψ88 0.976 −0.024 0.171 0.982 −0.018 0.163
ψ99 0.980 −0.020 0.162 0.981 −0.019 0.160

Average bias Var MSE bias Var MSE
0.018 0.038 0.039 0.014 0.033 0.033

Table 9(a) and (b) contain the parallel results for the second population. With component means approximately one standard deviation apart, both single-stage ML and two-stage ML lead to factor loadings and error variances with little bias. All the biases for these two sets of parameters under two-stage ML are in the second decimal place, as are most of those under single-stage ML. But there exist substantial positive biases in the estimates of factor variances and covariances. On average, two-stage ML results in slightly more bias than single-stage ML in Table 9(a), while the two methods have comparable biases in Table 9(b). With respect to efficiency and accuracy, single-stage ML performs better in Table 9(a) and worse in Table 9(b). Overall, two-stage ML leads to slightly more efficient and accurate parameter estimates, though with slightly more bias. Although we might expect single-stage ML to perform uniformly better, the results in Table 9(b) are not a surprise, because statistical theory only says that single-stage ML generates the most efficient estimator asymptotically.

Table 9.

Efficiency and accuracy of parameter estimates, µ1 = (0, …, 0)′ and µ2 = (1.5, …, 1.5)′; three-factor model with unidimensional measurement in the population; the model is correctly specified in the number of components and in the covariance structures.

(a) Component 1 (based on 491 converged replications)

2-stage ML single-stage ML


θc1 Mean Bias SE Mean Bias SE
λ21 1.069 0.069 0.746 1.040 0.040 0.447
λ31 1.076 0.076 0.800 1.089 0.089 0.587
λ52 1.069 0.069 0.556 1.074 0.074 0.437
λ62 1.065 0.065 0.644 1.081 0.081 0.766
λ83 1.041 0.041 0.363 1.042 0.042 0.409
λ93 1.040 0.040 0.346 1.067 0.067 0.486
ϕ11 1.316 0.316 0.565 1.217 0.217 0.577
ϕ21 0.784 0.284 0.336 0.699 0.199 0.400
ϕ31 0.784 0.284 0.332 0.713 0.213 0.411
ϕ22 1.319 0.319 0.513 1.228 0.228 0.545
ϕ32 0.796 0.296 0.344 0.695 0.195 0.380
ϕ33 1.331 0.331 0.515 1.248 0.248 0.531
ψ11 0.931 −0.069 0.473 0.924 −0.076 0.345
ψ22 0.936 −0.064 0.412 0.895 −0.105 0.331
ψ33 0.967 −0.033 0.334 0.901 −0.099 0.348
ψ44 0.944 −0.056 0.346 0.932 −0.068 0.338
ψ55 0.956 −0.044 0.354 0.916 −0.084 0.350
ψ66 0.939 −0.061 0.382 0.907 −0.093 0.323
ψ77 0.958 −0.042 0.367 0.957 −0.043 0.381
ψ88 0.918 −0.082 0.406 0.902 −0.098 0.341
ψ99 0.935 −0.065 0.413 0.892 −0.108 0.363

Average bias Var MSE bias Var MSE
0.129 0.225 0.254 0.118 0.200 0.218
Table 9(b) Component 2 (based on 492 converged replications)

2-stage ML single-stage ML


θc2 Mean Bias SE Mean Bias SE
λ21 1.024 0.024 0.279 1.014 0.014 0.453
λ31 1.056 0.056 0.361 1.031 0.031 0.405
λ52 1.045 0.045 0.406 1.020 0.020 0.334
λ62 1.049 0.049 0.406 1.058 0.058 0.498
λ83 1.044 0.044 0.480 1.040 0.040 0.585
λ93 1.052 0.052 0.396 1.108 0.108 1.633
ϕ11 1.333 0.333 0.568 1.274 0.274 0.535
ϕ21 0.800 0.300 0.344 0.732 0.232 0.371
ϕ31 0.802 0.302 0.361 0.737 0.237 0.386
ϕ22 1.323 0.323 0.508 1.278 0.278 0.501
ϕ32 0.786 0.286 0.350 0.735 0.235 0.374
ϕ33 1.319 0.319 0.578 1.248 0.248 0.518
ψ11 0.946 −0.054 0.417 0.884 −0.116 0.322
ψ22 0.967 −0.033 0.317 0.927 −0.073 0.304
ψ33 0.933 −0.067 0.397 0.901 −0.099 0.306
ψ44 0.968 −0.032 0.380 0.915 −0.085 0.303
ψ55 0.937 −0.063 0.363 0.901 −0.099 0.290
ψ66 0.931 −0.069 0.376 0.926 −0.074 0.307
ψ77 0.934 −0.066 0.430 0.915 −0.085 0.322
ψ88 0.951 −0.049 0.378 0.923 −0.077 0.316
ψ99 0.948 −0.052 0.344 0.921 −0.079 0.326

Average bias Var MSE bias Var MSE
0.125 0.167 0.196 0.122 0.278 0.300

We would like to note that the results in Table 9(a) and (b) are based on 491 and 492 converged replications, respectively. All the nonconvergences occurred with the Fisher-scoring algorithm at the second stage of two-stage ML, which is expected because nonconvergences have been repeatedly reported when fitting conventional SEM models (see, e.g., Boomsma, 1985; Anderson & Gerbing, 1984). Neither single-stage ML nor the first stage of two-stage ML has any convergence problem, due to the fact that the EM algorithm always converges to a local or global maximum (Wu, 1983). The results for single-stage ML in Table 9 are based on the same samples for which the Fisher-scoring algorithm converged. Although the EM algorithm always converges, its rate of convergence with the structured model is slow. On a Pentium(R) 4 CPU 3.40GHz desktop computer, for the second population, single-stage ML8 used 43 minutes to complete the simulation while two-stage ML took only 9 minutes.

Comparing Table 8 and Table 9, we notice that component separation has a strong effect on the efficiency of all the parameter estimates by both two-stage ML and single-stage ML. With respect to bias, its effect is mainly on the estimates of factor variances-covariances, not much on those of factor loadings or error variances. This implies that, when the component means are only one standard deviation apart, the estimates of the structural model in finite normal mixture SEM may not be reliable by either method, even when the underlying population distribution is truly a normal mixture.

The obtained empirical results on efficiency and MSE are informative but limited. A comprehensive Monte Carlo study might include conditions in which the distribution of each component is nonnormal, as well as conditions in which the proportions of the components, the separation of the components, the number of components, and the sample size vary.

4.2 Robustness of BIC against distribution violations

The population in this subsection has one component while its distribution varies. We study the performance of BIC when comparing the one-component saturated model against the two-component saturated model, and when comparing a one-component structured model against a two-component structured model. The model corresponding to the smaller BIC is empirically preferred; ideally, the empirically preferred model is always the one-component model. Because the saturated model is a special case of a structured model, and BIC with structured models has been shown not to work well under distribution violations (Bauer & Curran, 2003), we do not expect BIC with the saturated model to perform ideally.

Let

$$x = \mu + \Lambda f + e, \tag{16}$$

where μ is a 9 × 1 vector of 1.0’s, Λ is as given in (15), f is a 3 × 1 random vector with E(f) = 0 and Cov(f) = Φ = (ϕjk), also as given in (15), and e = (e1, e2, …, e9)′ is a 9 × 1 random vector with E(e) = 0 and Cov(e) = Ψ = I9. Let Φ^{1/2} be the positive definite symmetric matrix that satisfies Φ^{1/2}Φ^{1/2} = Φ and

$$f = \Phi^{1/2} z_3, \tag{17}$$

where z3 = (z1, z2, z3)′ and z1, z2, z3 are independent standardized random variables. Eleven distribution conditions are listed in the first column of Table 10, where f ~ χ²ₘ(0, Φ) implies that f is generated according to (17) with z1 to z3 each following an independent and standardized9 χ²ₘ; the notation e ~ χ²ₘ(0, Ψ) with Ψ = I implies that e1 to e9 are independent and each follows a standardized χ²ₘ. Similarly, the notation f ~ log N(0, Φ) implies that f is generated according to (17) with z1 to z3 each following an independent and standardized10 log N(0, 1); and e ~ log N(0, Ψ) with Ψ = I implies that the ek are independent and each follows a standardized log N(0, 1).

Table 10.

Model selections using BIC: Saturated model vs. structured model.

distribution condition  saturated model (n = 500)  structured model (n = 500)  saturated model (n = 1000)  structured model (n = 1000)
x ~ N(µ, Σ)  496(4)  499(1)  491(9)  496(4)
x ~ Mt(µ, Σ, 8)  340  6  0  0
x ~ Mt(µ, Σ, 5)  0  0(1)  0  0(2)
f ~ χ²₃(0, Φ), e ~ N(0, Ψ)  500  463  487  27
f ~ N(0, Φ), e ~ χ²₃(0, Ψ)  376  0  1  0
f ~ χ²₃(0, Φ), e ~ χ²₃(0, Ψ)  83  0  0  0
f ~ χ²₁(0, Φ), e ~ N(0, Ψ)  230  1  0  0
f ~ N(0, Φ), e ~ χ²₁(0, Ψ)  0  0  0  0
f ~ χ²₁(0, Φ), e ~ χ²₁(0, Ψ)  0  0  0  0
f ~ log N(0, Φ), e ~ N(0, Ψ)  30(4)  0(2)  0(1)  0(2)
f ~ N(0, Φ), e ~ log N(0, Ψ)  0  0(1)  0  0(14)

Two sample sizes, n = 500 and 1000, are chosen to see the effect of n on BIC. Each sample is fitted by four models: (I) a one-component model with saturated means and covariances; (II) a two-component model with two saturated mean vectors and covariance matrices; (III) a one-component factor model as in (16) with a saturated mean vector, Λ containing 6 free factor loadings together with λ11 = λ42 = λ73 = 1, corresponding to the nine nonzero elements in (15), Φ being a free covariance matrix, and Ψ being a diagonal matrix; (IV) a two-component model with two saturated mean vectors, two factor loading matrices each containing 6 free factor loadings, two free factor covariance matrices, and two diagonal error covariance matrices. Obviously, both the one-component saturated and structured models are correct, while both the two-component models are over-parameterized or incorrect.

The EM algorithm was used for both the saturated and structured two-component models; the Fisher-scoring algorithm was used for estimating the one-component structured model. The starting values for the one-component structured models are set at the population values that generate the data. For the starting values in the two-component models, the sample was first split by comparing the simple sum of the 9 observed variables to the sum of the population means (which is 9 here): cases with xi1 + xi2 + … + xi9 > 9 are in one group and the rest are in another group. Let x̄1 and x̄2 be the vectors of sample means of the two groups. Ten starting values for the means of the two components are randomly drawn from N9(x̄1, S) and N9(x̄2, S), respectively, where S is the sample covariance matrix of the whole sample. The starting values for the covariance matrices of the two-component saturated model are both set at S. The starting values of the factor loading matrices, factor covariance matrices, and unique variances for the two-component structured model are both set at the population values that generated the data. Because the log likelihood function never decreases in the EM algorithm, we define convergence as the point at which the log likelihood increases by less than 0.0001 from the previous iteration. Let τ1 ≥ τ2 ≥ … ≥ τp be the eigenvalues of the covariance matrix of either of the two components. To avoid converging to near-singular covariance matrices, we declare the current replication unable to reach convergence whenever τp/τ1 < 0.001 for either of the components at the end of an M-step, as sketched below.

With 500 replications, the frequencies with which BIC chooses the one-component models under the different distribution conditions are reported in Table 10. The numbers of nonconverged replications (after all 10 starting values) are reported in parentheses, all of which occurred with the two-component models. When the population is normally distributed, BIC always chooses the correct number of components when evaluated at either the saturated or the structured model for both n = 500 and 1000. At n = 500, when f ~ χ²₃(0, Φ) and e ~ N(0, Ψ), BIC with the saturated model identifies the correct number of components 100% of the time while that with the structured model does so 92% of the time. At n = 500, BIC also performs better for the saturated model than for the structured model under the conditions with moderate distribution violations. In particular, when x ~ Mt(μ, Σ, 8), BIC with the saturated model chooses the correct number of components 340 times while that with the structured model does so only 6 times; when f ~ N(0, Φ) and e ~ χ²₃(0, I), BIC with the structured model always chooses the wrong number of components while BIC with the saturated model still chooses the correct number about 75% of the time; when f ~ χ²₁(0, Φ) and e ~ N(0, Ψ), BIC with the saturated model selects the correct number of components 46% of the time while BIC with the structured model does so only once. At n = 1000, BIC with the saturated model obviously performed better when f ~ χ²₃(0, Φ) and e ~ N(0, Ψ). For the rest of the nonnormal distribution conditions at n = 1000, neither of the BICs could identify the correct number of components.

Notice that the skewness and kurtosis values of the normal distribution, the t-distribution with 8 degrees of freedom, the t-distribution with 5 degrees of freedom, the chi-square distribution with 3 degrees of freedom, the chi-square distribution with 1 degree of freedom, and the lognormal distribution are, respectively, (0, 0), (0, 1.5), (0, 6), (1.633, 4), (2.828, 12), and (6.185, 110.936). Obviously, the ability of BIC to choose the one-component model decreases as either f or e departs from normality and as n increases. At n = 500, when e ~ N(0, Ψ), BIC for both the saturated and structured models can endure mild nonnormality of f. At n = 1000 and when e ~ N(0, Ψ), BIC for the saturated model also tolerates mild nonnormality of f while BIC for the structured model is much less tolerant. The results for the structured model in Table 10 agree well with what Bauer and Curran (2003) found.

It is nice to see that BIC with the saturated model performs better than that with the structured model, especially for moderate n. But BIC with the saturated model is still strongly affected by distribution violations at large n. This is because BIC itself is constructed using the normal distribution assumption; as the sample size increases, even a slight departure from normality is magnified at the sample level. When the sample size is huge, a slight distribution violation may cause BIC to select more components than are needed.

In this section we only studied the performances of the two BICs when the underlying population distribution has one component. A more thorough Monte Carlo study might include conditions when the underlying population has more than one component and the separation of the components varies. Different components may have different distributions as well as different model structures. The proportions of different components might also vary.

5. Conclusion and Discussion

In this paper we developed a two-stage ML approach to normal mixture SEM. The most important feature of two-stage ML is that all the techniques accumulated in the conventional SEM literature can be used to study the model structure for each component. In particular, four statistics can be used to judge whether a particular component is adequately fitted by the structural model. Another feature is that the components are segregated at stage-2, so that model misspecification in one component does not affect the estimation and evaluation of models for other components. The third feature is its flexibility in modeling different components with different models. One component may be fitted by an SEM model suggested by a firm substantive theory while the structure of another component can be explored using principal components or exploratory factor analysis. Also associated with two-stage ML is its readiness to adopt any advances in the mixture modeling literature. A new statistical development in mixture modeling is typically directed first towards the model with saturated means and covariances; once a more reliable method is available, we can immediately apply it to the first stage of two-stage ML. Compared to single-stage ML, two-stage ML is also computationally more efficient11. Arminger et al. (1999) noted that, with the second stage being a simultaneous GLS procedure, their two-stage approach is a lot faster than single-stage ML. Our limited experience also indicates that the proposed two-stage ML is a lot faster than single-stage ML. This is because, in the second stage of two-stage ML, the dimension of the problem is much smaller and the discrepancy function in (6) or (7) is easier to minimize than the simultaneous ADF/GLS function.

Considering that any interesting model is at best only an approximation to the real world, throughout the paper we have emphasized valid statistical inference when conditions are not ideal. When there is a possibility that a population is heterogeneous, we may use a normal mixture to approximate the distribution. If (μ̂j, Σ̂j) is fitted well by a theoretical model (μj(θj), Σj(θj)), then the mixture model confirms our theory for the jth component. Most likely, the initial model will have to be modified. If the modified model can be explained via the substantive meaning of the variables, then the mixture model allows us to better understand the relationships among the variables for the jth component. It is possible that a well-fitted model is fundamentally different from an established theory; then one may need to reexamine the theory, or possibly a mixture model is not a proper statistical technique for extracting information from the data. The proposed statistics facilitate the validation of a theory with the two-stage ML approach to normal mixture SEM. We illustrated the application of the LM test for model modification because of its standard use in statistics (e.g., Buse, 1982) and its appropriate results when carried out in an a priori manner. We do not recommend empirical model modification without a theoretical rationale as a general practice, since it is well known that this can lead to capitalization on chance (e.g., MacCallum, Roznowski, & Necowitz, 1992). In a purely exploratory context, other search methods can also be consulted (e.g., tabu search; Marcoulides, Drezner, & Schumacker, 1998).

The four proposed statistics for overall model evaluation have been shown to work well in conventional SEM. They also perform consistently in the examples in section 3. In practice, they should agree for most data sets. If they do not agree and n is relatively large, we should trust FR or TCRADF, because their reference distributions are asymptotically correct. If n is relatively small, we should trust TMLa or TRML more, because they depend less on the asymptotic covariance matrix Γsw. With a really small n, mixture modeling may not be the best way to analyze the data, since any of the statistics should then be considered a rough description rather than a test that controls type I or II errors. Although tempting in current practice, the likelihood ratio statistic TML should never be used to evaluate the overall model structure at stage-2. With practical data, all the proposed statistics should be evaluated using the sandwich-type covariance matrix Γ̂swjj or Γ̂cswjj; SEs should be estimated using (8) or (9) together with Γ̂swjj or Γ̂cswjj. Of course, when one is 100% confident that the population distribution is mixture normal, using Γ̂injj or Γ̂cinjj will lead to more accurate inference due to less sampling error. In this paper we did not report any of the popular fit indices. Obviously, they can be used to evaluate the overall model structure at stage-2; because TML is no longer valid for model inference, these fit indices should be defined using TRML, TMLa or TCRADF. We also would like to note that, although these statistics worked well with conventional SEM, further study of their behavior with mixture models is still valuable. In particular, the input for the second-stage analysis depends on the quality of the first-stage ML, which has been shown not to work well when the components are not well separated and the sample size is not large enough (see, e.g., Hosmer, 1973). A Monte Carlo study of the behavior of these statistics when varying the sample size, the proportions of the components, and the separation among the components would be informative.

The Monte Carlo results in section 4 indicated that distribution violations affect the ability of BIC to identify the correct number of components, although BIC with the saturated model performs better than BIC with a structured model. Since this phenomenon is rooted in the distributional assumption underlying BIC's formulation, it is wise always to cast doubt on the number of components corresponding to the smallest BIC. The four statistics at stage-2 of two-stage ML are especially valuable in providing alternative evaluations of the overall model structure for each component. When an individual component cannot be fitted adequately by a substantively meaningful model, it is likely that the number of components has not been correctly identified. One may then study the individual components of a mixture model with a smaller m, even though its BIC is greater. If the components of the model with a smaller m can be adequately fitted by substantively meaningful models, then the mixture model with the smaller m is preferred. If they still cannot be well fitted by substantively interesting models, it is most likely that a finite normal mixture model is not a good description of the underlying population.

Arminger et al. (1999) mainly considered single-stage ML and a two-stage approach for models with covariates. The two-stage ML approach developed here can also be extended to models with covariates: one just needs to change the first-stage ML to conditional ML, using a sandwich-type covariance matrix, parallel to (3), to estimate the asymptotic covariance matrix of the conditional means and variances-covariances. At the second stage, the conditional means and covariances are fitted by the model with the same covariates as at stage-1 through minimizing the normal-distribution-based discrepancy functions.

In the first stage of two-stage ML, we did not consider any constraints across covariances or means. These can also be easily incorporated into two-stage ML. For example, if there is a strong theory to support that the covariances are homogeneous across the mixture components, then it is necessary and straightforward to incorporate such a constraint into the first stage; the second stage remains the same once the MLEs of the means and covariance matrices from the first stage are obtained together with Γ̂sw. In this paper, we also did not consider any across-component constraints on parameters at stage-2. If such constraints are deemed necessary, then the m discrepancy functions at stage-2 need to be minimized simultaneously under the constraints. Standard errors and test statistics for overall model evaluation can be developed, similar to multiple-group analysis, except that the mean vectors and covariance matrices of the different groups are correlated through the estimate of Γsw in (3). Because misspecification in such constraints is confounded with misspecification of the overall model in each component, it is wise to evaluate the adequacy of the model for each component first, parallel to the advised configural invariance test in multiple-group analysis.

We have only considered finite normal mixture SEM with continuous variables. The developments in Muthén and Shedden (1999) and Muthén et al. (2002) allow the probabilities of categorical outcomes to be accounted for by covariates. Because that procedure is single-stage ML, the limitation that no test statistics exist for overall model evaluation also applies. Any future development to effectively evaluate the adequacy of the overall model with categorical outcome variables would be an important contribution to mixture modeling.

A final note is that the two-stage ML developed here will be available in the next version of EQS. The LM test at stage-2 will be based on the sandwich-type covariance matrix Γ̂sw, not on the inverse of the information matrix as currently formulated when running the program in Appendix C.

Acknowledgment

We would like to thank three reviewers and the editor for comments that helped in improving the paper.

Appendix A

This appendix contains the EM algorithm for obtaining the MLE of a mixture normal distribution with saturated means and covariances. For an m-component normal mixture, let the starting values be $\pi_1^{(0)}, \pi_2^{(0)}, \ldots, \pi_{m-1}^{(0)}$; $(\mu_1^{(0)}, \Sigma_1^{(0)}), (\mu_2^{(0)}, \Sigma_2^{(0)}), \ldots, (\mu_m^{(0)}, \Sigma_m^{(0)})$. The normal density function evaluated at $x_i$ and $(\mu_j^{(0)}, \Sigma_j^{(0)})$ is denoted $f_{ij}^{(0)}$. Let

$$\pi_m^{(0)} = 1 - \sum_{j=1}^{m-1} \pi_j^{(0)}, \qquad f_i^{(0)} = \sum_{j=1}^{m} \pi_j^{(0)} f_{ij}^{(0)},$$

and

$$w_{ij}^{(0)} = \frac{\pi_j^{(0)} f_{ij}^{(0)}}{f_i^{(0)}}.$$

Then the updated values are

$$\pi_j^{(1)} = \frac{1}{n}\sum_{i=1}^{n} w_{ij}^{(0)}, \quad j = 1, 2, \ldots, (m-1);$$
$$\mu_j^{(1)} = \frac{\sum_{i=1}^{n} w_{ij}^{(0)} x_i}{\sum_{i=1}^{n} w_{ij}^{(0)}}, \quad j = 1, 2, \ldots, m;$$
$$\Sigma_j^{(1)} = \frac{\sum_{i=1}^{n} w_{ij}^{(0)} x_i x_i'}{\sum_{i=1}^{n} w_{ij}^{(0)}} - \mu_j^{(1)} \mu_j^{(1)\prime}, \quad j = 1, 2, \ldots, m.$$

A β̂ is obtained by using the updated values as new starting values and iterating the above equations until convergence. At the end of each cycle, one needs to check that no $\Sigma_j^{(1)}$ is close to being singular. This can be done by specifying a small number ε and redoing the above procedure with a new set of starting values whenever $|\Sigma_j^{(1)}| < \varepsilon|S|$, where S is the sample covariance matrix of the whole sample. With many different (random) starting values, we obtain the MLE by choosing the β̂ that maximizes l(β̂). At the final estimates, any tiny π̂j may indicate that m is overspecified, as in Table 2(b). A minimal sketch of one EM cycle follows.

Appendix B

This appendix contains the score vectors $\dot l_i(\beta)$ and Hessian matrices $\ddot l_i(\beta)$. These are used to obtain the matrices $\hat A$ and $\hat B$ in section 2. Following the notation of section 2, let $f_j(x_i)$ be the density function of $x_i \sim N_p(\mu_j, \Sigma_j)$ and

$$l_{Nij}(\beta_j) = \log f_j(x_i) = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_j| - \frac{1}{2}(x_i - \mu_j)'\Sigma_j^{-1}(x_i - \mu_j),$$

where the subscript N denotes the likelihood of a normal distribution to distinguish it from the li(β) in equation (2). Let Dp be the duplication matrix such that vec(Σj) = Dpvech(Σj),

$$W_{cj} = \frac{1}{2} D_p'\left(\Sigma_j^{-1} \otimes \Sigma_j^{-1}\right) D_p,$$

and

$$V_j(x_i) = \Sigma_j^{-1}(x_i - \mu_j)(x_i - \mu_j)'\Sigma_j^{-1} - \frac{1}{2}\Sigma_j^{-1}.$$

Then standard differential rules lead to

$$\frac{\partial l_{Nij}(\beta_j)}{\partial \mu_j} = \Sigma_j^{-1}(x_i - \mu_j),$$
$$\frac{\partial l_{Nij}(\beta_j)}{\partial \sigma_j} = W_{cj}\,\mathrm{vech}\!\left[(x_i - \mu_j)(x_i - \mu_j)' - \Sigma_j\right],$$
$$\frac{\partial^2 l_{Nij}(\beta_j)}{\partial \mu_j \partial \mu_j'} = -\Sigma_j^{-1},$$
$$\frac{\partial^2 l_{Nij}(\beta_j)}{\partial \mu_j \partial \sigma_j'} = -\left\{\left[(x_i - \mu_j)'\Sigma_j^{-1}\right] \otimes \Sigma_j^{-1}\right\} D_p,$$
$$\frac{\partial^2 l_{Nij}(\beta_j)}{\partial \sigma_j \partial \mu_j'} = \left[\frac{\partial^2 l_{Nij}(\beta_j)}{\partial \mu_j \partial \sigma_j'}\right]',$$
$$\frac{\partial^2 l_{Nij}(\beta_j)}{\partial \sigma_j \partial \sigma_j'} = -D_p'\left[V_j(x_i) \otimes \Sigma_j^{-1}\right] D_p.$$

Applying the above notation to the differential of li(β) and noticing that

$$\frac{\partial f_j(x_i)}{\partial \theta} = f_j(x_i)\frac{\partial \log f_j(x_i)}{\partial \theta} = f_j(x_i)\frac{\partial l_{Nij}(\beta_j)}{\partial \theta},$$

we have

$$\frac{\partial l_i(\beta)}{\partial \pi_j} = \frac{f_j(x_i) - f_m(x_i)}{f(x_i)}, \quad j = 1, 2, \ldots, (m-1),$$
$$\frac{\partial l_i(\beta)}{\partial \mu_j} = \frac{\pi_j f_j(x_i)}{f(x_i)}\frac{\partial l_{Nij}(\beta_j)}{\partial \mu_j}, \quad j = 1, 2, \ldots, m,$$
$$\frac{\partial l_i(\beta)}{\partial \sigma_j} = \frac{\pi_j f_j(x_i)}{f(x_i)}\frac{\partial l_{Nij}(\beta_j)}{\partial \sigma_j}, \quad j = 1, 2, \ldots, m.$$

The vector $\dot l_i(\hat\beta)$ for evaluating the $\hat B$ in section 2 is obtained by stacking the above three sets of derivatives into a long vector.

Similarly, we give the elements of $\ddot l_i(\beta)$ using the second derivatives of $l_i(\beta)$ with respect to the mixing proportions and the mean vector and covariance matrix of each component distribution:

$$\frac{\partial^2 l_i(\beta)}{\partial \pi_j \partial \pi_k} = -\frac{[f_j(x_i) - f_m(x_i)][f_k(x_i) - f_m(x_i)]}{f^2(x_i)}, \quad j, k = 1, 2, \ldots, (m-1);$$
$$\frac{\partial^2 l_i(\beta)}{\partial \pi_j \partial \mu_j'} = \frac{f_j(x_i)}{f(x_i)}\left\{1 - \frac{\pi_j[f_j(x_i) - f_m(x_i)]}{f(x_i)}\right\}\frac{\partial l_{Nij}(\beta_j)}{\partial \mu_j'}, \quad 1 \le j \le (m-1);$$
$$\frac{\partial^2 l_i(\beta)}{\partial \pi_j \partial \sigma_j'} = \frac{f_j(x_i)}{f(x_i)}\left\{1 - \frac{\pi_j[f_j(x_i) - f_m(x_i)]}{f(x_i)}\right\}\frac{\partial l_{Nij}(\beta_j)}{\partial \sigma_j'}, \quad 1 \le j \le (m-1);$$
$$\frac{\partial^2 l_i(\beta)}{\partial \pi_j \partial \mu_k'} = -\frac{\pi_k f_k(x_i)[f_j(x_i) - f_m(x_i)]}{f^2(x_i)}\frac{\partial l_{Nik}(\beta_k)}{\partial \mu_k'}, \quad 1 \le j \ne k \le (m-1);$$
$$\frac{\partial^2 l_i(\beta)}{\partial \pi_j \partial \sigma_k'} = -\frac{\pi_k f_k(x_i)[f_j(x_i) - f_m(x_i)]}{f^2(x_i)}\frac{\partial l_{Nik}(\beta_k)}{\partial \sigma_k'}, \quad 1 \le j \ne k \le (m-1);$$
$$\frac{\partial^2 l_i(\beta)}{\partial \pi_j \partial \mu_m'} = -\frac{f_m(x_i)}{f(x_i)}\left\{1 + \frac{\pi_m[f_j(x_i) - f_m(x_i)]}{f(x_i)}\right\}\frac{\partial l_{Nim}(\beta_m)}{\partial \mu_m'}, \quad 1 \le j \le (m-1);$$
$$\frac{\partial^2 l_i(\beta)}{\partial \pi_j \partial \sigma_m'} = -\frac{f_m(x_i)}{f(x_i)}\left\{1 + \frac{\pi_m[f_j(x_i) - f_m(x_i)]}{f(x_i)}\right\}\frac{\partial l_{Nim}(\beta_m)}{\partial \sigma_m'}, \quad 1 \le j \le (m-1);$$
$$\frac{\partial^2 l_i(\beta)}{\partial \mu_j \partial \mu_j'} = \frac{\pi_j f_j(x_i)}{f(x_i)}\left\{1 - \frac{\pi_j f_j(x_i)}{f(x_i)}\right\}\frac{\partial l_{Nij}(\beta_j)}{\partial \mu_j}\frac{\partial l_{Nij}(\beta_j)}{\partial \mu_j'} + \frac{\pi_j f_j(x_i)}{f(x_i)}\frac{\partial^2 l_{Nij}(\beta_j)}{\partial \mu_j \partial \mu_j'}, \quad 1 \le j \le m;$$
$$\frac{\partial^2 l_i(\beta)}{\partial \mu_j \partial \sigma_j'} = \frac{\pi_j f_j(x_i)}{f(x_i)}\left\{1 - \frac{\pi_j f_j(x_i)}{f(x_i)}\right\}\frac{\partial l_{Nij}(\beta_j)}{\partial \mu_j}\frac{\partial l_{Nij}(\beta_j)}{\partial \sigma_j'} + \frac{\pi_j f_j(x_i)}{f(x_i)}\frac{\partial^2 l_{Nij}(\beta_j)}{\partial \mu_j \partial \sigma_j'}, \quad 1 \le j \le m;$$
$$\frac{\partial^2 l_i(\beta)}{\partial \mu_j \partial \mu_k'} = -\frac{[\pi_j f_j(x_i)][\pi_k f_k(x_i)]}{f^2(x_i)}\frac{\partial l_{Nij}(\beta_j)}{\partial \mu_j}\frac{\partial l_{Nik}(\beta_k)}{\partial \mu_k'}, \quad 1 \le j \ne k \le m;$$
$$\frac{\partial^2 l_i(\beta)}{\partial \mu_j \partial \sigma_k'} = -\frac{[\pi_j f_j(x_i)][\pi_k f_k(x_i)]}{f^2(x_i)}\frac{\partial l_{Nij}(\beta_j)}{\partial \mu_j}\frac{\partial l_{Nik}(\beta_k)}{\partial \sigma_k'}, \quad 1 \le j \ne k \le m;$$
$$\frac{\partial^2 l_i(\beta)}{\partial \sigma_j \partial \sigma_j'} = \frac{\pi_j f_j(x_i)}{f(x_i)}\left\{1 - \frac{\pi_j f_j(x_i)}{f(x_i)}\right\}\frac{\partial l_{Nij}(\beta_j)}{\partial \sigma_j}\frac{\partial l_{Nij}(\beta_j)}{\partial \sigma_j'} + \frac{\pi_j f_j(x_i)}{f(x_i)}\frac{\partial^2 l_{Nij}(\beta_j)}{\partial \sigma_j \partial \sigma_j'}, \quad 1 \le j \le m;$$
$$\frac{\partial^2 l_i(\beta)}{\partial \sigma_j \partial \sigma_k'} = -\frac{[\pi_j f_j(x_i)][\pi_k f_k(x_i)]}{f^2(x_i)}\frac{\partial l_{Nij}(\beta_j)}{\partial \sigma_j}\frac{\partial l_{Nik}(\beta_k)}{\partial \sigma_k'}, \quad 1 \le j \ne k \le m.$$

Appendix C


/TITLE
Fitting the covariance matrix of component 1 using a three-factor model
/SPECIFICATION
weight='d:\mixture\Gamma_cin11.dat';
cases=400; variables=9; matrix=covariance;
analysis=covariance; methods=ML, robust;
/EQUATION
V1=1F1 +E1;
V2=1*F1+E2;
V3=1*F1+E3;
V4=1F2 +E4;
V5=1*F2+E5;
V6=1*F2+E6;
V7=1F3 +E7;
V8=1*F3+E8;
V9=1*F3+E9;
/VARIANCES
E1-E9=*;
F1=1.0*;
F2=1.0*;
F3=1.0*;
/COVARIANCES
F2,F1=0.5*;
F3,F1=0.5*;
F3,F2=0.5*;
/LMtest
/MATRIX
4.25392 1.60997 1.44642 2.62957 1.33318 1.47782 2.58713 1.01342 0.92652
1.60997 1.89629 0.95962 1.12648 0.41540 0.48102 1.45105 0.66431 0.49418
1.44642 0.95962 1.88683 0.99374 0.31901 0.45803 1.51408 0.57168 0.45506
2.62957 1.12648 0.99374 3.94126 1.33554 1.13264 2.41744 1.43989 1.41107
1.33318 0.41540 0.31901 1.33554 1.77134 0.78891 0.79523 0.31037 0.36376
1.47782 0.48102 0.45803 1.13264 0.78891 1.60733 1.02288 0.33154 0.53856
2.58713 1.45105 1.51408 2.41744 0.79523 1.02288 3.95897 1.46828 1.29994
1.01342 0.66431 0.57168 1.43989 0.31037 0.33154 1.46828 1.97394 0.85362
0.92652 0.49418 0.45506 1.41107 0.36376 0.53856 1.29994 0.85362 2.09065
/END

Footnotes

*

This research was supported by NSF grant DMS04-37167, grants DA01070 and DA00017 from the National Institute on Drug Abuse, and a grant from the National Natural Science Foundation of China (30870784).

1

Although we choose the discrepancy function in (6), we would like to note that Σ̂j cannot be regarded as the sample covariance matrix from a normal distribution. Thus, the second stage is not strictly ML. The first stage or the so-called single-stage ML is not strictly ML either with most practical data. Following the typical use of ML methodology in practice, we call the procedure of minimizing FML(θj) for parameter estimates “ML” rather than “pseudo ML”.

2

For readers who want to replicate the study, the sample was generated using the following SAS IML code:

proc IML;
seed=1111111111; x=j(400,9,0);
do i=1 to 400;
ui=uniform(seed);
if ui<0.5 then do; yi1=sigh1*normal(j(9,1,seed))+mu1; x[i,]=yi1`; end;
else do; yi2=sigh2*normal(j(9,1,seed))+mu2; x[i,]=yi2`; end;
end;

In the above notation, mu1 is the mean vector of the first population and sigh1 = Σ₁^{1/2} is the 9 × 9 symmetric matrix that satisfies Σ₁^{1/2}Σ₁^{1/2} = Σ₁; mu2 is the mean vector of the second population and sigh2 = Σ₂^{1/2} is the 9 × 9 symmetric matrix that satisfies Σ₂^{1/2}Σ₂^{1/2} = Σ₂.

3

Although most researchers agree that models are at best only approximations to the real world, there is no agreement on how much difference between the population and the initial theoretical model would best represent reality. We may think that the variables in Example 1 are well designed and the sample is well collected; there is still a significant gap between the initial unidimensional three-factor model and the sample/population. Actually, with λ11 = λ42 = λ73 = 1 for model identification, the loading estimates are λ̂21 = .442 and λ̂62 = 2.242; λ̂31, λ̂52, λ̂83, λ̂93 are around 1.0. The extra loading identified by the LM test, λ̂91 = 3.368, is the greatest and is almost triple the average of the loadings in the initial model. If the LM test is allowed to search for correlated errors in the initial model, three error covariances are identified as significant: ψ̂78 = 172.957, ψ̂23 = 5.451, and ψ̂17 = −20.624. All are larger in absolute value than the smallest error variance ψ̂44 = 2.832.

4

Under the LM test in EQS output, there are other multivariate sequential statistics for improving model-fit. Here we report only the univariate LM test, which is equivalent to the model modification index in LISREL and Mplus.

5

The p-value under an LM statistic is obtained by comparing the LM statistic to χ²₁.

6

The LM test in EQS is based on the normal distribution assumption; the z-score reported here uses SEs based on the sandwich-type covariance matrix in equation (9).

7

For readers who want to replicate the study, the sample was generated using the following SAS IML code:

proc IML;
alpha=4; sight1=sigh1*sqrt(3/4); sight2=sigh2*sqrt(3/4);
seed=1111111111; x=j(400,9,0);
do i=1 to 400;
ui=uniform(seed);
if ui<0.5 then do; ui=rangam(seed,alpha)/alpha; sighi1=sight1/sqrt(ui); yi1=sighi1*normal(j(9,1,seed))+mu1; x[i,]=yi1`; end;
else do; ui=rangam(seed,alpha)/alpha; sighi2=sight2/sqrt(ui); yi2=sighi2*normal(j(9,1,seed))+mu2; x[i,]=yi2`; end;
end;

In the above notation, mu1 is the mean vector of the first population and sigh1 = Σ₁^{1/2} is the 9 × 9 symmetric matrix that satisfies Σ₁^{1/2}Σ₁^{1/2} = Σ₁; mu2 is the mean vector of the second population and sigh2 = Σ₂^{1/2} is the 9 × 9 symmetric matrix that satisfies Σ₂^{1/2}Σ₂^{1/2} = Σ₂.

8

The convergence criterion for single-stage ML is set as l(j+1) − l(j) < .0001, where l(j) is the log likelihood function evaluated after the jth iteration. The convergence criterion for the first stage of two-stage ML is also set as l(j+1) − l(j) < .0001; for the second stage, convergence is declared when the sum of squared differences between θ(j+1) and θ(j) is less than .0001, where θ(j) is the vector of parameters after the jth iteration.

9

For a given random variable x, its standardized version is obtained by z_x = [x − E(x)]/{Var(x)}^{1/2}.

10

A random variable x following the log-normal distribution log N(0, 1) is obtained by x = exp(z) with z ~ N(0, 1).

11

A reviewer conjectured that, with many variables and many components, single-stage ML may be computationally more efficient than two-stage ML. Considering that with 9 variables and 2 components single-stage ML took 4.7 times longer than two-stage ML, we cannot endorse the conjecture without further research.

Contributor Information

Ke-Hai Yuan, University of Notre Dame.

Peter M. Bentler, University of California, Los Angeles

REFERENCES

1. Anderson James C, Gerbing David W. The Effects of Sampling Error on Convergence, Improper Solutions and Goodness-of-Fit Indices for Maximum Likelihood Confirmatory Factor Analysis. Psychometrika. 1984;49:155–173.
2. Arminger Gerhard, Wittenberg Jörg. Finite Mixtures of Covariance Structure Models with Regressors. Sociological Methods & Research. 1997;26:148–182.
3. Arminger Gerhard, Stein Petra, Wittenberg Jörg. Mixtures of Conditional Mean- and Covariance Structure Models. Psychometrika. 1999;64:475–494.
4. Bauer Daniel J. Observations on the Use of Growth Mixture Models in Psychological Research. Multivariate Behavioral Research. 2007;42:757–786.
5. Bauer Daniel J, Curran Patrick J. Distributional Assumptions of Growth Mixture Models: Implications for Overextraction of Latent Trajectory Classes. Psychological Methods. 2003;8:338–363. doi: 10.1037/1082-989X.8.3.338.
6. Bauer Daniel J, Curran Patrick J. The Integration of Continuous and Discrete Latent Variable Models: Potential Problems and Promising Opportunities. Psychological Methods. 2004;9:3–29. doi: 10.1037/1082-989X.9.1.3.
7. Bentler Peter M. EQS 6 Structural Equations Program Manual. Encino, CA: Multivariate Software; in press.
8. Bentler Peter M, Yuan Ke-Hai. Structural Equation Modeling with Small Samples: Test Statistics. Multivariate Behavioral Research. 1999;34:181–197. doi: 10.1207/S15327906Mb340203.
9. Biernacki Christophe, Celeux Gilles, Govaert Gérard. Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2000;22:719–725.
10. Blåfield Eero. Clustering of Observations from Finite Mixtures with Structural Information. Jyvaskyla Studies in Computer Science, Economics, and Statistics, 2. Finland: Jyvaskyla University; 1980.
11. Boomsma Anne. Nonconvergence, Improper Solutions, and Starting Values in LISREL Maximum Likelihood Estimation. Psychometrika. 1985;50:229–242.
12. Browne Michael W. Asymptotic Distribution-Free Methods for the Analysis of Covariance Structures. British Journal of Mathematical and Statistical Psychology. 1984;37:62–83. doi: 10.1111/j.2044-8317.1984.tb00789.x.
13. Buse Adolf. The Likelihood Ratio, Wald and Lagrange Multiplier Tests: An Expository Note. American Statistician. 1982;36:153–157.
14. Clogg Clifford C. Latent Class Models. In: Arminger Gerhard, Clogg Clifford C, Sobel Michael E, editors. Handbook of Statistical Modeling for the Social and Behavioral Sciences. New York: Plenum; 1995. pp. 311–359.
15. Dolan Conor V, van der Maas Han L J. Fitting Multivariate Normal Finite Mixtures Subject to Structural Equation Modeling. Psychometrika. 1998;63:227–253.
16. Everitt Brian S, Hand David J. Finite Mixture Distributions. London: Chapman & Hall; 1981.
17. Fouladi Rachel T. Performance of Modified Test Statistics in Covariance and Correlation Structure Analysis under Conditions of Multivariate Nonnormality. Structural Equation Modeling. 2000;7:356–410.
18. Goodman Leo A. Exploratory Latent Structure Analysis Using Both Identifiable and Unidentifiable Models. Biometrika. 1974a;61:215–231.
19. Goodman Leo A. The Analysis of Systems of Qualitative Variables when Some of the Variables Are Unobservable: Part I—A Modified Latent Structure Approach. American Journal of Sociology. 1974b;79:1179–1259.
20. Hagenaars Jacques A, McCutcheon Allan L, editors. Applied Latent Class Analysis. Cambridge, UK: Cambridge University Press; 2002.
21. Holzinger Karl J, Swineford Frances. A Study in Factor Analysis: The Stability of a Bi-Factor Solution. Supplementary Educational Monographs, No. 48. Chicago: University of Chicago; 1939.
22. Hoshino Takahiro. Bayesian Inference for Finite Mixtures in Confirmatory Factor Analysis. Behaviormetrika. 2001;28:37–63.
23. Hosmer David W. On MLE of the Parameters of a Mixture of Two Normal Distributions when the Sample Size is Small. Communication in Statistics. 1973;1:217–227.
24. Hu Li-tze, Bentler Peter M. Cutoff Criterion for Fit Indices in Covariance Structure Analysis: Conventional Criteria versus New Alternatives. Structural Equation Modeling. 1999;6:1–55.
25. Hu Li-tze, Bentler Peter M, Kano Yutaka. Can Test Statistics in Covariance Structure Analysis be Trusted? Psychological Bulletin. 1992;112:351–362. doi: 10.1037/0033-2909.112.2.351.
26. Jedidi Kamel, Jagpal Harsharanjeet S, DeSarbo Wayne S. Finite-Mixture Structural Equation Models for Response-Based Segmentation and Unobserved Heterogeneity. Marketing Science. 1997;16:39–59.
27. Jöreskog Karl G. A General Approach to Confirmatory Maximum Likelihood Factor Analysis. Psychometrika. 1969;34:183–202.
28. Jöreskog Karl G. In: Bollen Kenneth A, Long Scott J, editors. Testing Structural Equation Models. Newbury Park, CA: Sage; 1993. pp. 294–316.
29. Jöreskog Karl G, Sörbom Dag, du Toit Stephen, du Toit Mathilda. LISREL 8: New Statistical Features. Lincolnwood, IL: Scientific Software International; 2000.
30. Lubke Gitta, Neale Michael C. Distinguishing between Latent Classes and Continuous Factors: Resolution by Maximum Likelihood? Multivariate Behavioral Research. 2006;41:499–532. doi: 10.1207/s15327906mbr4104_4.
31. MacCallum Robert C, Roznowski Mary, Necowitz Lawrence B. Model Modification in Covariance Structure Analysis: The Problem of Capitalization on Chance. Psychological Bulletin. 1992;111:490–504. doi: 10.1037/0033-2909.111.3.490.
32. Marcoulides George A, Drezner Zvi, Schumacker Randall E. Model Specification Searches in Structural Equation Modeling using Tabu Search. Structural Equation Modeling. 1998;5:365–376.
33. Mardia Kanti V. Measures of Multivariate Skewness and Kurtosis with Applications. Biometrika. 1970;57:519–530.
34. McLachlan Geoffrey, Peel David. Finite Mixture Models. New York: Wiley; 2000.
35. Muthén Bengt C, Brown Hendricks, Masyn Katherine, Jo Booil, Khoo Siek-Toon, Yang Chih-Chien, Wang Chen-Pin, Kellam Sheppard G, Carlin John B, Liao Jason. General Growth Mixture Modeling for Randomized Preventive Interventions. Biostatistics. 2002;3:459–475. doi: 10.1093/biostatistics/3.4.459.
36. Muthén Bengt, Shedden Kerby. Finite Mixture Modeling with Mixture Outcomes Using the EM Algorithm. Biometrics. 1999;55:463–469. doi: 10.1111/j.0006-341x.1999.00463.x.
37. Muthén Linda K, Muthén Bengt O. Mplus User’s Guide. 5th ed. Los Angeles, CA: Muthén & Muthén; 2007.
38. Satorra Albert, Bentler Peter M. Corrections to Test Statistics and Standard Errors in Covariance Structure Analysis. In: von Eye Alexander, Clogg Clifford C, editors. Latent Variables Analysis: Applications for Developmental Research. Thousand Oaks, CA: Sage; 1994. pp. 399–419.
39. Sörbom Dag. Model Modification. Psychometrika. 1989;54:371–384.
40. Titterington David M, Smith Adrian FM, Makov UE. Statistical Analysis of Finite Mixture Distributions. New York: Wiley; 1985.
41. Tofighi Davood, Enders Craig K. Identifying the Correct Number of Classes in Growth Mixture Models. In: Hancock Gregory R, Samuelsen Karen M, editors. Advances in Latent Variable Mixture Models. Charlotte, NC: IAP; 2008. pp. 317–341.
42. Wu C F Jeff. On the Convergence Properties of the EM Algorithm. Annals of Statistics. 1983;11:95–103.
43. Yuan Ke-Hai, Bentler Peter M. Improving Parameter Tests in Covariance Structure Analysis. Computational Statistics and Data Analysis. 1997;26:177–198.
44. Yuan Ke-Hai, Bentler Peter M. Normal Theory Based Test Statistics in Structural Equation Modeling. British Journal of Mathematical and Statistical Psychology. 1998;51:289–309. doi: 10.1111/j.2044-8317.1998.tb00682.x.
45. Yuan Ke-Hai, Bentler Peter M. Multilevel Covariance Structure Analysis by Fitting Multiple Single-Level Models. Sociological Methodology. 2007;37:53–82.
46. Yuan Ke-Hai, Bentler Peter M. Two Simple Approximations to the Distributions of Quadratic Forms. British Journal of Mathematical and Statistical Psychology. 2009. doi: 10.1348/000711009X449771.
47. Yuan Ke-Hai, Jennrich Robert I. Asymptotics of Estimating Equations under Natural Conditions. Journal of Multivariate Analysis. 1998;65:245–260.
48. Yung Yiu-Fai. Finite Mixtures in Confirmatory Factor-Analytic Models. PhD Dissertation: UCLA; 1994.
49. Yung Yiu-Fai. Finite Mixtures in Confirmatory Factor-Analytic Models. Psychometrika. 1997;62:297–330.
50. Zhu Hong-Tu, Lee Sik-Yum. A Bayesian Analysis of Finite Mixtures in the LISREL Model. Psychometrika. 2001;66:133–152.
