Published in final edited form as: Educ Psychol Meas. 2011 Mar 22;71(2):325–345. doi: 10.1177/0013164410381272

Constrained Maximum Likelihood Estimation for Two-level Mean and Covariance Structure Models

Peter M. Bentler, Jiajuan Liang, Man-Lai Tang, Ke-Hai Yuan

Abstract

Maximum likelihood is commonly used for estimating model parameters in the analysis of two-level structural equation models. Constraints on model parameters may be encountered in some situations, such as equal factor loadings for different factors. Linear constraints are the most common and are relatively easy to handle in maximum likelihood analysis; nonlinear constraints may be encountered in complicated applications. In this paper we develop an EM-type algorithm for estimating model parameters under both linear and nonlinear constraints. The empirical performance of the algorithm is demonstrated by a Monte Carlo study. Application of the algorithm with linear constraints is illustrated by setting up a two-level mean and covariance structure model for a real two-level data set and running an EQS program.

Keywords: EM algorithm, maximum likelihood estimation, mean and covariance structure, linear and nonlinear constraints, two-level structural equation model


Two-level mean and covariance structure models (or structural equation models, SEM for simplicity) have been applied to data analysis in various fields such as education, medicine, psychology, and sociology. In many practical situations, the hierarchical structure implicit in the collected data should not be ignored, because various potentially important sources of variance can be identified. For example, SAT (Scholastic Aptitude Test) scores from students nested in different high schools can be expected not only to reveal individual student differences but also to reflect differences among the implicit variable “schools”. Schools can be expected to vary in the extent of student preparation, their socioeconomic environments, their teaching facilities, and their teacher training. Hence, the SAT scores will reflect school differences as well as individual differences and, as a result of the dependence introduced by school differences, are best not treated as independent observations. In this case, variation occurs at two levels, with the students called the level-1 units (or individuals) and the schools called the level-2 units (groups or clusters). Furthermore, when considering latent sources of variation giving rise to the SAT scores, the influences at the two levels are probably different. For instance, factors such as “general math ability” and “general writing ability” may explain performance at level 1, while factors such as “school quality” or “teacher background” may explain school differences at level 2. It is now widely recognized that to understand how the level-1 factors influence one another and affect the data, a level-1 model must be used to analyze the within (level-1) covariance structure; such a model may include factor loadings, variances and covariances of factors, and prediction errors. Similarly, to understand how the level-2 factors influence one another and affect the data, a level-2 model has to be set up to analyze the between (level-2) covariance structure. Observing these two different types of effects on two-level data, many researchers have proposed various formulations for two-level covariance structure models; see, for example, McDonald and Goldstein (1989), Muthén (1994), Lee and Poon (1998), and Liang and Bentler (2004).

When a linear two-level SEM has been set up for analysis of the within and between covariance structures, the first problem to solve is the estimation of model parameters. There are two basic estimation methods for two-level SEM in the literature: the maximum likelihood (ML) method and the generalized least squares (GLS) method. The ML method is closely tied to the normal distributional assumption on the model and is the most popular in the analysis of two-level SEM because of its easy implementation and some nice properties. Different ML methods have been proposed for the analysis of two-level SEM, and they are available in many standard statistical packages such as EQS (Bentler, 2006), LISREL (cf. du Toit & du Toit, 2008), and Mplus (Muthén & Muthén, 2004). Unfortunately, many existing algorithms for ML estimation do not allow general parameter constraints, although they may allow simple linear constraints on parameters. Parameter constraints in ML analysis of statistical models are usually imposed as a result of substantive prior knowledge about the relationships among model parameters. For example, in a factor analysis model, if a factor is measured by indicator variables that prior knowledge implies are equally important, an equality constraint on the factor loadings may be imposed when implementing the ML estimation. Linear constraints are the simplest in ML analysis of statistical models with parameter constraints, and simple analytical solutions are usually available; see, for example, Kim and Taylor (1995) and Jamshidian (2004). Nonlinear functional constraints on parameters may appear in more complicated problems, and this type of constraint has been less thoroughly studied. A detailed discussion of the necessity of constrained ML estimation dates back to Aitchison and Silvey (1958). Lee (1979) proposed an algorithm for both ML and weighted least squares estimation with general (linear and nonlinear) constraints for conventional (one-level) SEM. Lee and Tsang (1999) developed EM-type algorithms for constrained ML estimation of two-level SEM with covariance structures only.

The importance of considering a nonsaturated mean structure in conventional SEM has been discussed by Yung and Bentler (1999). Liang and Bentler (2004) proposed a new formulation for two-level SEM and developed an EM algorithm that allows both mean and covariance structures in ML estimation without constraints. The study in this paper is a generalization of Liang and Bentler (2004) and an extension of Lee and Tsang (1999). The paper is organized as follows. Section 2 gives a brief review of the model in Liang and Bentler (2004). Section 3 presents the details of the algorithm associated with the Lagrange multiplier. Section 4 provides a limited Monte Carlo study of the performance of the proposed algorithm using an artificial two-level SEM. Section 5 gives an illustration of the proposed algorithm with linear constraints by implementing a two-level SEM in EQS for a practical data set. Some concluding remarks are given in the last section.

A REVIEW OF THE EXISTING MODEL

Let {zg : g = 1,…,G} be a set of level-2 observations from G level-2 units (such as financial sources for schools), and {ygi : i = 1,…,Ng; g = 1,…,G} a set of level-1 observations. For example, ygi may stand for the observation from the i-th level-1 unit (e.g., student) nested in the g-th level-2 unit (e.g., school). Liang and Bentler (2004) proposed the following data formulation of a two-level SEM

$$\begin{pmatrix} z_g \\ y_{gi} \end{pmatrix} = \begin{pmatrix} z_g \\ \upsilon_g \end{pmatrix} + \begin{pmatrix} 0 \\ \upsilon_{gi} \end{pmatrix} \qquad (1)$$

associated with the assumptions:

  • A1) υgi contains latent variables capturing level-1 effects. For each fixed g, {υgi : i = 1,…,Ng} are i.i.d. (independent and identically distributed) with υgi ~ Np(0, ΣW), the p-dimensional normal distribution with ΣW > 0 (positive definite), for g = 1,…,G;

  • A2) υg contains latent variables capturing level-2 effects. {υg : g = 1,…,G} are i.i.d. with υg ~ Np(0, ΣB) and ΣB > 0;

  • A3) {zg : g = 1,…,G} are i.i.d. level-2 observations with zg ~ Nq(μz, Σzz) and Σzz > 0;

  • A4) the random vector (zg′, υg′)′ ((p + q) × 1) has a joint nonsingular multivariate normal distribution Np+q(μ, Σ̃B) with Σ̃B > 0 and
$$\mu(\theta)=\begin{pmatrix}\mu_z\\ \mu_y\end{pmatrix},\qquad \tilde{\Sigma}_B(\theta)=\mathrm{cov}\begin{pmatrix}z_g\\ \upsilon_g\end{pmatrix}=\begin{pmatrix}\Sigma_{zz} & \Sigma_{zy}\\ \Sigma_{yz} & \Sigma_B\end{pmatrix}, \qquad (2)$$
where Σzy = Σyz′ = cov(zg, υg);

  • A5) {zg, υg} is uncorrelated with {υgi : i = 1,…,Ng}.

The mean and covariance structures in (2) may be characterized by a common model parameter vector θ (r × 1) with r functionally independent model parameters. A nonsaturated mean structure in (2) implies that the means μz and μy may also be characterized by the common model parameter vector θ. Existing algorithms for ML analysis of two-level SEM usually treat the means of manifest variables as individual parameters separate from the covariance parameters; see, for example, McDonald and Goldstein (1989), Muthén (1994), Raudenbush (1995), and du Toit and du Toit (2008). When the number r of functionally independent model parameters in θ in (2) is less than the total number of means of all manifest (observable) variables in zg and ygi in formulation (1) plus all variances and nonduplicated covariances, a model with formulation (1) is called a nonsaturated model. The purpose of ML analysis of a two-level SEM under formulation (1) is to estimate the common parameter vector θ and to validate the nonsaturated mean and covariance structures (2) (the null hypothesis) versus the saturated model (the alternative hypothesis), which treats all means, variances, and nonduplicated covariances as independent model parameters. Under the null hypothesis, Liang and Bentler (2004) developed an EM algorithm for ML estimation of θ and validated the nonsaturated model.

Under assumptions A1)–A5) on model formulation (1), and considering {ygi, zg : g = 1,…,G} together with the missing values {υg : g = 1,…,G} as the complete data, Liang and Bentler (2004) obtained the E-step function

$$M(\theta_*|\theta)=N\{\log|\Sigma_W(\theta_*)|+\mathrm{tr}[\Sigma_W^{-1}(\theta_*)S_W(\theta)]\}+G\{\log|\tilde{\Sigma}(\theta_*)|+\mathrm{tr}[\tilde{\Sigma}^{-1}(\theta_*)\tilde{S}(\theta)]\}, \qquad (3)$$

where θ* and θ are two arbitrarily specified values of the same parameter vector, and

$$\tilde{\Sigma}(\theta_*)=\begin{pmatrix}\tilde{\Sigma}_{B*}+\mu_*\mu_*' & \mu_*\\ \mu_*' & 1\end{pmatrix},\quad \tilde{S}(\theta)=\begin{pmatrix}\tilde{S}_B+dd' & d\\ d' & 1\end{pmatrix},\quad N=\sum_{g=1}^{G}N_g,\quad \tilde{\Sigma}_{B*}=\tilde{\Sigma}_B(\theta_*),\quad \mu_*=\mu(\theta_*), \qquad (4)$$

where Σ̃B(θ*) and μ(θ*) are defined in (2) by taking θ = θ*. Simplified formulas for computing the terms SW, d, and S̃B in (3) and (4) can be found in Liang and Bentler (2004).

THE ALGORITHM ASSOCIATED WITH THE LAGRANGE MULTIPLIER

Using the simple E-step function (3), we can give an algorithm for ML estimation with general constraints for model formulation (1) with mean and covariance structures (2). Assume that there are s general parameter constraints defined by

$$h_i(\theta)=0,\quad i=1,\ldots,s,\qquad \text{or}\qquad h(\theta)=0,\quad 0:s\times 1, \qquad (5)$$

where h(θ) = (h1(θ),…,hs(θ))′, s < r, and r is the dimension of the parameter vector θ (r × 1) in the mean and covariance structures (2). Each hi(θ) is a scalar function possessing partial derivatives up to second order with respect to θ. The standard steps for applying the Lagrange multiplier method can be summarized as follows (cf. Bertsekas, 1976; Lee & Tsang, 1999).

  • Step 1. Construct the augmented Lagrangian function
$$M(\theta_*|\theta)+\xi' h(\theta_*)+c\sum_{i=1}^{s}\phi[h_i(\theta_*)], \qquad (6)$$
    where ξ = (ξ(1),…,ξ(s))′ contains s multipliers, c is a positive scalar constant, and ϕ(·) is a nonnegative penalty function with ϕ(x) = 0 if and only if x = 0, e.g., ϕ(t) = t²/2;
  • Step 2. For the current values θ = θj, c = cj > 0, and ξ = ξj, search for a minimizer θ* = θj+1, say, such that the function
$$M_j(\theta_*|\theta_j)=M(\theta_*|\theta_j)+\xi_j' h(\theta_*)+c_j\sum_{i=1}^{s}\phi[h_i(\theta_*)] \qquad (7)$$
    is minimized at θ* = θj+1;
  • Step 3. Increase cj to another value cj+1 > cj > 0 (e.g., cj+1 = 1.5cj) and update ξj = (ξj(1),…,ξj(s))′ to ξj+1 = (ξj+1(1),…,ξj+1(s))′ by ξj+1(i) = ξj(i) + cj+1ϕ̇[hi(θj+1)] (i = 1,…,s), where ϕ̇ denotes the derivative of ϕ(·). Update j to j + 1 and go to Step 2. The process terminates at the (j + 1)-th iteration when the maximum absolute difference between θj and θj+1 is less than a preassigned small value ε > 0.

The choice of the increasing constant sequence {cj > 0} in Step 3 above is somewhat open. It controls the magnitude of the penalty from the penalty function ϕ(·), which is related to the convergence speed of the algorithm; there is no optimal or definite rule for the choice. A rapidly increasing sequence {cj > 0} will generally result in faster convergence, but may also cause the algorithm to break down by generating non-positive definite estimates of the covariance matrices used in the algorithm. A convenient updating rule is cj+1 = γcj with a constant γ > 1. The role of {cj > 0} resembles that of the step-halving constant in iterative algorithms, which is chosen as a decreasing positive sequence when a step is too long and would otherwise make the algorithm break down; by contrast, the purpose of {cj > 0} is to increase the penalty step by step so that the iteration converges faster. For further discussion of the multiplier method, see Bertsekas (1976).
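To make Steps 1–3 concrete, the following sketch illustrates the outer multiplier loop with the quadratic penalty ϕ(t) = t²/2, for which ϕ̇(t) = t. It is only an illustration under stated assumptions: the inner minimizer minimize_M (which carries out Step 2, e.g., by the EM gradient iteration described below) and the constraint function h are user-supplied callables, and all names and default values are ours rather than part of the published algorithm.

import numpy as np

def augmented_lagrangian(theta0, minimize_M, h, gamma=1.5, c0=1.0,
                         eps=1e-6, max_outer=200):
    # Outer multiplier loop of Steps 1-3 with phi(t) = t**2/2.
    theta = theta0
    xi = np.ones(h(theta0).size)   # initial multipliers (1,...,1), as in the simulation
    c = c0
    for _ in range(max_outer):
        # Step 2: minimize M(.|theta) + xi'h(.) + c*sum_i h_i(.)**2/2 over theta*
        theta_new = minimize_M(theta, xi, c)
        # Step 3: raise the penalty constant, then update the multipliers;
        # for phi(t) = t**2/2, phi_dot(t) = t, so xi_(j+1) = xi_j + c_(j+1)*h
        c = gamma * c
        xi = xi + c * h(theta_new)
        if np.max(np.abs(theta_new - theta)) < eps:   # stopping rule of Step 3
            return theta_new
        theta = theta_new
    return theta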

The above iteration process was proved to be convergent for sufficiently large cj under mild conditions (Bertsekas, 1976). The key idea for minimizing Mj(θ*|θj) with respect to θ* at the current value θj is to find an updated value θ* = θj+1 by

$$\theta_{j+1}=\theta_j+\rho\,\Delta\theta_j, \qquad (8)$$

where ρ is the step-halving parameter (Kim & Taylor, 1995) that takes values such as 1, 1/2, 1/4,…, so that

$$M_j(\theta_{j+1}|\theta_j)\le M_j(\theta_j|\theta_j),\qquad j=1,2,\ldots \qquad (9)$$

This can be realized by using the EM gradient algorithm (Lange, 1995a, b) through choosing Δθj in (8) along the gradient direction at θ* = θj. This is related to the calculation of the first and the second derivatives of the function Mj(θ*|θj) in (7). The details on the calculation of the increment Δθj in (8) are given as follows. Let

$$\dot{M}_j(\theta_j|\theta_j)=\frac{\partial M_j(\theta_*|\theta_j)}{\partial\theta_*}\bigg|_{\theta_*=\theta_j},\quad \dot{h}(\theta_j)=\frac{\partial h'(\theta_*)}{\partial\theta_*}\bigg|_{\theta_*=\theta_j},\quad \dot{M}(\theta_j|\theta_j)=\frac{\partial M(\theta_*|\theta_j)}{\partial\theta_*}\bigg|_{\theta_*=\theta_j},\quad \dot{\phi}[h_i(\theta_j)]=\frac{\partial\phi[h_i(\theta_*)]}{\partial\theta_*}\bigg|_{\theta_*=\theta_j}. \qquad (10)$$

The following first order derivative was given by Liang and Bentler (2004):

$$\dot{M}(\theta|\theta)=N\,\Delta_W'(\Sigma_W^{-1}\otimes\Sigma_W^{-1})\,\mathrm{vec}(\Sigma_W-S_W)+G\,\tilde{\Delta}'(\tilde{\Sigma}^{-1}\otimes\tilde{\Sigma}^{-1})\,\mathrm{vec}(\tilde{\Sigma}-\tilde{S}), \qquad (11)$$

where

$$\Delta_W=\frac{\partial\,\mathrm{vec}\,\Sigma_W}{\partial\theta'},\qquad \tilde{\Delta}=\frac{\partial\,\mathrm{vec}\,\tilde{\Sigma}}{\partial\theta'},\qquad \tilde{\Sigma}=\tilde{\Sigma}(\theta_*)\big|_{\theta_*=\theta}, \qquad (12)$$

and the terms SW, Σ̃, and S̃ are the same as in (3). Hence the first derivative of the function in (7) at the current value θ* = θj is calculated by

$$\dot{M}_j(\theta_j|\theta_j)=\dot{M}(\theta_j|\theta_j)+\dot{h}(\theta_j)\,\xi_j+c_j\sum_{i=1}^{s}\dot{\phi}[h_i(\theta_j)], \qquad (13)$$

with the terms given in (10) and (11).
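As an illustration of how (11) and (13) are assembled numerically, the sketch below (ours, not code from the paper) assumes the derivative matrices ΔW and Δ̃, the current covariance matrices, the constraint values h(θj), and the Jacobian ḣ(θj) of dimension r × s have been precomputed, and uses the quadratic penalty so that the penalty term of (13) equals ḣ(θj)h(θj) by (20) below:

import numpy as np

def vec(A):
    # the vec operator in (11): stack the columns of A
    return A.reshape(-1, 1, order='F')

def gradient_13(N, G, Delta_W, Sig_W, S_W, Delta_t, Sig_t, S_t,
                h_val, h_jac, xi, c):
    Wi = np.linalg.inv(Sig_W)
    Ti = np.linalg.inv(Sig_t)
    # (11): first derivative of the E-step function
    M_dot = (N * Delta_W.T @ np.kron(Wi, Wi) @ vec(Sig_W - S_W)
             + G * Delta_t.T @ np.kron(Ti, Ti) @ vec(Sig_t - S_t))
    # (13): add the multiplier term and, via (20), the penalty term
    return M_dot + h_jac @ xi.reshape(-1, 1) + c * h_jac @ h_val.reshape(-1, 1)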

Next, we need to compute the expected value of the Hessian matrix (the second derivative) of the function in (7) at the current value θ* = θj. Let

$$\ddot{M}_j(\theta_j|\theta_j)=\frac{\partial^2 M_j(\theta_*|\theta)}{\partial\theta_*\partial\theta_*'}\bigg|_{\theta_*=\theta=\theta_j},\quad \ddot{h}_i(\theta_j)=\frac{\partial^2 h_i(\theta)}{\partial\theta\partial\theta'}\bigg|_{\theta=\theta_j},\quad \ddot{M}(\theta_j|\theta_j)=\frac{\partial^2 M(\theta_*|\theta)}{\partial\theta_*\partial\theta_*'}\bigg|_{\theta_*=\theta=\theta_j},\quad \ddot{\phi}[h_i(\theta_j)]=\frac{\partial^2\phi[h_i(\theta)]}{\partial\theta\partial\theta'}\bigg|_{\theta=\theta_j}. \qquad (14)$$

The following approximated expected Hessian matrix was given by Liang and Bentler (2004):

$$I(\theta)=E\bigg\{\frac{\partial^2}{\partial\theta_*\partial\theta_*'}M(\theta_*|\theta)\bigg|_{\theta_*=\theta}\bigg\}\approx N\,\Delta_W'\{2(\Sigma_W^{-1}S_{ew}\Sigma_W^{-1})\otimes\Sigma_W^{-1}-\Sigma_W^{-1}\otimes\Sigma_W^{-1}\}\Delta_W+G\,\tilde{\Delta}'\{2(\tilde{\Sigma}^{-1}\tilde{S}_{eb}\tilde{\Sigma}^{-1})\otimes\tilde{\Sigma}^{-1}-\tilde{\Sigma}^{-1}\otimes\tilde{\Sigma}^{-1}\}\tilde{\Delta}, \qquad (15)$$

where ΔW and Δ̃ are given by (12),

$$S_{ew}=2\Sigma_B+\Sigma_W-\sum_{g=1}^{G}\frac{N_g}{N}(A_{gw}+A_{gw}'),\qquad A_{gw}=\Sigma_{2g}\Omega_g^{-1}\Sigma_1,\qquad \Sigma_{2g}=\Big(\Sigma_{yz},\ \tfrac{1}{N_g}\Sigma_g\Big), \qquad (16)$$

with

$$\Omega_g=\begin{pmatrix}\Sigma_{zz}&\Sigma_{zy}\\ \Sigma_{yz}&\tfrac{1}{N_g}\Sigma_g\end{pmatrix},\qquad \Sigma_g=\Sigma_W+N_g\Sigma_B,\qquad \Sigma_1=(\Sigma_{yz}',\ \Sigma_B)', \qquad (17)$$

for g = 1,…,G, and

$$S_{eb}=\begin{pmatrix}\Sigma_{zz}&A_b\\ A_b'&\Sigma_B\end{pmatrix},\qquad A_b=(\Sigma_{zz},\ \Sigma_{zy})\,\bar{\Omega}^{-1}\Sigma_1,\qquad \bar{\Omega}^{-1}=\frac{1}{G}\sum_{g=1}^{G}\Omega_g^{-1}. \qquad (18)$$
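Once S_ew and S̃_eb have been formed from (16)–(18), the right-hand side of (15) is a matter of Kronecker algebra; a rough sketch (ours, with all matrices assumed conformable and precomputed):

import numpy as np

def information_15(N, G, Delta_W, Sig_W, S_ew, Delta_t, Sig_t, S_eb):
    Wi = np.linalg.inv(Sig_W)
    Ti = np.linalg.inv(Sig_t)
    within = 2.0 * np.kron(Wi @ S_ew @ Wi, Wi) - np.kron(Wi, Wi)
    between = 2.0 * np.kron(Ti @ S_eb @ Ti, Ti) - np.kron(Ti, Ti)
    return N * Delta_W.T @ within @ Delta_W + G * Delta_t.T @ between @ Delta_t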

Referring to (14) and (15), we obtain the approximated expected Hessian matrix of the function in (7):

$$H_j\overset{\mathrm{def}}{=}E[\ddot{M}_j(\theta_j|\theta_j)]=I(\theta_j)+\sum_{i=1}^{s}\xi_j^{(i)}\ddot{h}_i(\theta_j)+c_j\sum_{i=1}^{s}\ddot{\phi}[h_i(\theta_j)], \qquad (19)$$

where ξj = (ξj(1),…,ξj(s))′ contains the current values of the components of the multiplier in (7).

When taking ϕ(t) = t²/2 as the quadratic penalty function in (7), the terms ϕ̇[hi(θj)] in (10) and ϕ̈[hi(θj)] in (14) reduce to

$$\dot{\phi}[h_i(\theta_j)]=h_i(\theta_j)\,\frac{\partial h_i(\theta)}{\partial\theta}\bigg|_{\theta=\theta_j}, \qquad (20)$$

and

$$\ddot{\phi}[h_i(\theta_j)]=\bigg\{h_i(\theta)\,\frac{\partial^2 h_i(\theta)}{\partial\theta\partial\theta'}+\frac{\partial h_i(\theta)}{\partial\theta}\,\frac{\partial h_i(\theta)}{\partial\theta'}\bigg\}\bigg|_{\theta=\theta_j}, \qquad (21)$$

respectively.
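With the quadratic penalty, therefore, (20) and (21) require only hi, its gradient, and its Hessian; a minimal sketch in our notation:

import numpy as np

def penalty_grad_20(h_i, g_i):
    # (20): phi_dot[h_i] = h_i(theta) * (dh_i/dtheta)
    return h_i * g_i

def penalty_hess_21(h_i, g_i, H_i):
    # (21): phi_ddot[h_i] = h_i(theta) * (d2 h_i/dtheta dtheta') + (dh_i)(dh_i)'
    return h_i * H_i + np.outer(g_i, g_i)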

Following the EM gradient algorithm in Lange (1995a, b) and Lee and Tsang (1999), we can summarize the EM algorithm for computing the ML estimate of θ in the mean and covariance structures (2) subject to the general constraints (5) as follows.

  • Step 1. Given a current value θ = θj, a current value of the multiplier ξj = (ξj(1),…,ξj(s))′, and a constant cj > 0, construct the Lagrangian function (6);
  • Step 2. Update θj to θj+1 by (8) with
$$\Delta\theta_j=-H_j^{-1}\dot{M}_j(\theta_j|\theta_j), \qquad (22)$$
    where Ṁj(θj|θj) is computed by (13) and Hj by (19);
  • Step 3. Continue updating θj+1 as in Step 2 by (8) and (22); the iteration stops when RMSE(θj+1, θj) < ε (e.g., ε = 10^{-6}), where RMSE stands for the root mean square error defined by
$$\mathrm{RMSE}(\theta_{j+1},\theta_j)=\bigg\{\frac{1}{r}\,\|\theta_{j+1}-\theta_j\|^2\bigg\}^{1/2}, \qquad (23)$$
    in which ‖·‖ denotes the usual Euclidean norm and r is the dimension of θ (r × 1).
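The inner iteration of Steps 2–3 can be sketched as follows (a sketch only, assuming callables Mj, grad, and hess that evaluate (7), (13), and (19) at the current multipliers and penalty constant; names and tolerances are ours):

import numpy as np

def em_gradient(theta, Mj, grad, hess, eps=1e-6, max_iter=500):
    r = theta.size
    for _ in range(max_iter):
        delta = -np.linalg.solve(hess(theta), grad(theta))    # (22)
        rho, f0 = 1.0, Mj(theta)
        theta_new = theta + rho * delta
        while Mj(theta_new) > f0 and rho > 1e-8:              # enforce (9)
            rho *= 0.5                                        # step-halving
            theta_new = theta + rho * delta
        rmse = np.sqrt(np.sum((theta_new - theta) ** 2) / r)  # (23)
        theta = theta_new
        if rmse < eps:
            break
    return theta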

When the model parameters in the mean and covariance structures (2) are subject to linear constraints, the above steps can be further simplified. General linear constraints can be expressed as

$$h(\theta)=C\theta-a=0, \qquad (24)$$

where C is a known m × r matrix with rank(C) = m < r (θ : r × 1) and a is an m × 1 constant vector. Because the Lagrange multiplier method for ML estimation with linear constraints in the two-level SEM formulation (1) with mean and covariance structures (2) has been coded into EQS Version 6.1 (Bentler, 2006), we do not present the details here; instead we direct readers to the practical application of the method (see the Appendix for a sample EQS program for ML estimation with linear constraints). Note that when the general constraint (5) becomes the linear constraint (24) with ϕ(t) = t²/2 in (7), as in EQS, the first order derivative (13) reduces to

$$\dot{M}_j(\theta_j|\theta_j)=\dot{M}(\theta_j|\theta_j)+C'\xi_j+c_jC'(C\theta_j-a), \qquad (25)$$

where Ṁ(θj|θj) is given by (11) with θ = θj, and the expected Hessian matrix (19) reduces to

$$H_j=I(\theta_j)+c_jC'C, \qquad (26)$$

where I(θj) is given by (15) with θ = θj.
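In the linear case, then, nothing beyond the constraint matrix C enters (25) and (26); a two-line sketch (ours), assuming Ṁ(θj|θj) and I(θj) are available as arrays:

import numpy as np

def gradient_25(M_dot, C, a, theta, xi, c):
    return M_dot + C.T @ xi + c * C.T @ (C @ theta - a)   # (25)

def hessian_26(I_theta, C, c):
    return I_theta + c * C.T @ C                          # (26)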

A MONTE CARLO STUDY

In this section, we construct an artificial two-level SEM according to formulation (1) and generate empirical two-level data from the model under assumptions A1)–A5). We then use the algorithm in Section 3 to estimate the model parameters. The model is illustrated by the path diagram in Figure 1.

FIGURE 1. Path diagram of the model in the simulation study.

In the two-level SEM presented by Figure 1, the level-1 model is a factor analysis model with a single within factor FW that generates four indicator variables, Y1, Y2, Y3, and Y4, with independent residual errors. The level-2 model is not a pure factor analysis model: it has a single between factor FB that generates the same four indicator variables Yi (i = 1, 2, 3, 4) with independent residual errors, and FB is itself affected by two correlated observable level-2 variables, Z1 and Z2, with means μ1 and μ2, respectively. The path diagram uses the standard conventions for presenting a two-level SEM: a one-way arrow pointing to a variable indicates that the variable is influenced by another variable; a double-arrowed curve denotes a covariance (or correlation, if standardized) between two variables; and an arrow pointing to a variable without starting from another variable indicates an influence by a random residual or error. The six rectangles corresponding to Yi (i = 1, 2, 3, 4) and Zj (j = 1, 2) are the data variables to be analyzed. The notation V999 in Figure 1 is the constant unit vector, and its effects on Z1 and Z2 represent the intercepts of these two variables. The constant factor loadings “1” on the path from FW to Y1 in the within model and on the path from FB to Y1 in the between model are fixed as a reference for model identification.

There are six observable variables, Yi (i = 1, 2, 3, 4) and Zj (j = 1, 2), but only two free mean parameters, μ1 = E(Z1) and μ2 = E(Z2), so the model given by Figure 1 has a nonsaturated mean structure. Indeed, the model has r = 23 free parameters (θ1–θ8, ϕ1–ϕ13, μ1, and μ2), fewer than the 37 parameters of the saturated model (6 means, 10 within variances and nonduplicated covariances, and 21 between variances and nonduplicated covariances). The exact mean and covariance structures corresponding to those given by (2) can be expressed as

$$\begin{aligned}
&\mu=\begin{pmatrix}\mu_z\\ \mu_y\end{pmatrix},\quad \mu_z=\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix},\quad \mu_y=(\mu_b,\ \phi_1\mu_b,\ \phi_2\mu_b,\ \phi_3\mu_b)',\quad \mu_b=E(F_B)=\phi_{11}\mu_1+\phi_{12}\mu_2,\\
&\Sigma_W=\Lambda_W\Phi_W\Lambda_W'+\Psi_W,\quad \Lambda_W=(1,\theta_1,\theta_2,\theta_3)',\quad \Psi_W=\mathrm{diag}(\theta_5,\theta_6,\theta_7,\theta_8),\quad \Phi_W=\mathrm{var}(F_W)=(\theta_4),\\
&\tilde{\Sigma}_B=\mathrm{cov}\begin{pmatrix}z_g\\ \upsilon_g\end{pmatrix}=\begin{pmatrix}\Sigma_{zz}&\Sigma_{zy}\\ \Sigma_{yz}&\Sigma_B\end{pmatrix},\quad \Sigma_B=\Lambda_B\Phi_B\Lambda_B'+\Psi_B,\quad \Lambda_B=(1,\phi_1,\phi_2,\phi_3)',\\
&\Phi_B(\phi)=\mathrm{var}(F_B),\quad \Psi_B(\phi)=\mathrm{diag}(\phi_4,\phi_5,\phi_6,\phi_7),\quad \Sigma_{zz}=\begin{pmatrix}\phi_8&\phi_9\\ \phi_9&\phi_{10}\end{pmatrix},\quad \Sigma_{zy}=\Sigma_{yz}'=\big(\mathrm{cov}(Z_j,Y_i)\big),\ i=1,2,3,4;\ j=1,2, \qquad (27)
\end{aligned}$$

where

$$\begin{aligned}
&\mathrm{var}(F_B)=\upsilon_b=\phi_8\phi_{11}^2+\phi_{10}\phi_{12}^2+2\phi_9\phi_{11}\phi_{12}+\phi_{13},\\
&\mathrm{cov}(Z_1,Y_1)=\upsilon_1=\phi_8\phi_{11}+\phi_9\phi_{12},\quad \mathrm{cov}(Z_1,Y_2)=\phi_1\upsilon_1,\quad \mathrm{cov}(Z_1,Y_3)=\phi_2\upsilon_1,\quad \mathrm{cov}(Z_1,Y_4)=\phi_3\upsilon_1,\\
&\mathrm{cov}(Z_2,Y_1)=\upsilon_2=\phi_9\phi_{11}+\phi_{10}\phi_{12},\quad \mathrm{cov}(Z_2,Y_2)=\phi_1\upsilon_2,\quad \mathrm{cov}(Z_2,Y_3)=\phi_2\upsilon_2,\quad \mathrm{cov}(Z_2,Y_4)=\phi_3\upsilon_2.
\end{aligned}$$
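To make (27) concrete, the following sketch builds ΣW, var(FB), ΣB, and Σzz from the true parameter values used in the first experiment (Table 1); the variable names and zero-based indexing are ours:

import numpy as np

th = [1.0, 2.0, 2.0, 0.16, 0.25, 0.25, 0.25, 0.25]        # theta1..theta8
ph = [0.5, 0.5, 1.25, 0.36, 0.36, 0.36, 0.36,
      0.49, 0.30, 0.49, 1.0, 0.5, 0.16]                   # phi1..phi13

Lam_W = np.array([[1.0], [th[0]], [th[1]], [th[2]]])      # Lambda_W = (1, th1, th2, th3)'
Sig_W = th[3] * Lam_W @ Lam_W.T + np.diag(th[4:8])        # Sigma_W in (27)

v_b = (ph[7] * ph[10]**2 + ph[9] * ph[11]**2
       + 2 * ph[8] * ph[10] * ph[11] + ph[12])            # var(F_B) above
Lam_B = np.array([[1.0], [ph[0]], [ph[1]], [ph[2]]])      # Lambda_B = (1, ph1, ph2, ph3)'
Sig_B = v_b * Lam_B @ Lam_B.T + np.diag(ph[3:7])          # Sigma_B in (27)
Sig_zz = np.array([[ph[7], ph[8]], [ph[8], ph[9]]])       # Sigma_zz in (27)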

The first Monte Carlo experiment examines the performance of the algorithm under nonlinear constraints on the parameters of the model given by Figure 1. These constraints are defined by:

$$2\theta_1+\theta_2-\theta_3^2=0,\qquad \phi_{11}-2\phi_{12}=0,\qquad \phi_1^2+2\phi_2-\phi_3=0,\qquad \theta_1^2-\phi_1-\phi_2=0. \qquad (28)$$
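For reference, the constraint set (28) can be coded as a vector function and checked against the true values in Table 1, which satisfy it exactly (a sketch in our notation):

import numpy as np

def h28(t1, t2, t3, p1, p2, p3, p11, p12):
    # the four nonlinear constraints (28)
    return np.array([2*t1 + t2 - t3**2,
                     p11 - 2*p12,
                     p1**2 + 2*p2 - p3,
                     t1**2 - p1 - p2])

# true values of Table 1: theta1=1, theta2=2, theta3=2, phi1=.5, phi2=.5,
# phi3=1.25, phi11=1, phi12=.5
assert np.allclose(h28(1.0, 2.0, 2.0, 0.5, 0.5, 1.25, 1.0, 0.5), 0.0)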

We use the following sampling designs to generate the two-level data.

  1. Design D1: level-2 sample size G = 60, level-1 sample sizes Ng = 4 for g = 1,…,20; Ng = 6 for g = 21,…,40; Ng = 8 for g = 41,…,60;

  2. Design D2: level-2 sample size G = 60, level-1 sample sizes Ng = 8 for g = 1,…,20; Ng = 12 for g = 21,…,40; Ng = 16 for g = 41,…,60;

  3. Design D3: level-2 sample size G = 120, level-1 sample sizes Ng = 4 for g = 1,…,40; Ng = 6 for g = 41,…,80; Ng = 8 for g = 81,…,120;

  4. Design D4: level-2 sample size G = 120, level-1 sample sizes Ng = 8 for g = 1,…,40; Ng = 12 for g = 41,…,80; Ng = 16 for g = 81,…,120;

  5. Design D5: level-2 sample size G = 240, level-1 sample sizes Ng = 4 for g = 1,…,80; Ng = 6 for g = 81,…,160; Ng = 8 for g = 161,…,240;

  6. Design D6: level-2 sample size G = 240, level-1 sample sizes Ng = 8 for g = 1,…,80; Ng = 12 for g = 81,…,160; Ng = 16 for g = 161,…,240.

For each of the sampling designs, we use formulation (1) with assumptions A1)–A5) to generate the two-level data. The level-1 data υgi in formulation (1) are generated according to the level-1 model in Figure 1, and the level-2 data υg and zg in formulation (1) are generated according to the level-2 model in Figure 1. Then the two-level data {ygi : i = 1,…,Ng; g = 1,…,G} are obtained by ygi = υg + υgi as defined by formulation (1). The true values of the model parameters are chosen as in Table 1 and satisfy the nonlinear constraints (28). The normal samples are generated by the internal normal random number generator in MATLAB. In the simulation, the sequences ξj and cj [see (7)] in the algorithm for nonlinear constraints are chosen as follows: the initial values are ξ1 = (ξ1(1), ξ1(2), ξ1(3), ξ1(4))′ = (1, 1, 1, 1)′ (there are four constraints) and c1 = 1 (j = 1); then cj+1 = 1.5cj for j = 1, 2, 3,…, and ξj = (ξj(1),…,ξj(4))′ is updated by

$$\xi_{j+1}^{(i)}=\xi_j^{(i)}+c_{j+1}\dot{\phi}[h_i(\theta_{j+1})]\qquad (i=1,\ldots,4)$$

for iteration j = 1, 2, 3,… with ϕ(t) = t²/2. For each of the six sampling designs, the simulation was carried out with 200 replications. We take the average of the ML estimates for each parameter over the 200 replications and compute the standard deviation (S.D.) to show the accuracy of estimation. The results are summarized in Table 1.
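For readers wishing to replicate the experiment, the data generation under A1)–A5) amounts to drawing (zg, υg) jointly and adding independent level-1 effects; a sketch (ours; μ, Σ̃B, and ΣW are assumed to be built from the true values as in (27), and Ng is a list of level-1 sample sizes):

import numpy as np

rng = np.random.default_rng(0)

def generate_two_level(G, Ng, mu, Sig_tilde_B, Sig_W, q):
    data = []
    for g in range(G):
        # A3)-A4): draw (z_g, v_g) jointly from N(mu, Sigma_tilde_B)
        zv = rng.multivariate_normal(mu, Sig_tilde_B)
        z_g, v_g = zv[:q], zv[q:]
        # A1): level-1 effects v_gi ~ N(0, Sigma_W), i.i.d. within group g
        v_gi = rng.multivariate_normal(np.zeros(Sig_W.shape[0]), Sig_W,
                                       size=Ng[g])
        data.append((z_g, v_g + v_gi))   # y_gi = v_g + v_gi, formulation (1)
    return data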

TABLE 1.

Average Estimates with Nonlinear Constraints

θ1 θ2 θ3 θ4 θ5 θ6 θ7 θ8
True Value 1.0000 2.0000 2.0000 0.1600 0.2500 0.2500 0.2500 0.2500
A.E.(D1) 0.9977 1.9819 1.9935 0.1640 0.2466 0.2518 0.2543 0.2454
A.E.(D2) 1.0036 2.0067 2.0033 0.1604 0.2485 0.2479 0.2481 0.2491
A.E.(D3) 0.9991 1.9890 1.9966 0.1601 0.2486 0.2480 0.2535 0.2493
A.E.(D4) 1.0004 2.0123 2.0032 0.1585 0.2494 0.2492 0.2488 0.2494
A.E.(D5) 1.0001 2.0015 2.0003 0.1593 0.2505 0.2507 0.2500 0.2496
A.E.(D6) 1.0006 2.0029 2.0009 0.1594 0.2480 0.2499 0.2508 0.2500

S.D.(D1) 0.0372 0.1687 0.0526 0.0201 0.0182 0.0228 0.0415 0.0348
S.D.(D2) 0.0218 0.0911 0.0273 0.0129 0.0161 0.0155 0.0266 0.0245
S.D.(D3) 0.0174 0.1037 0.0295 0.0139 0.0179 0.0157 0.0296 0.0266
S.D.(D4) 0.0159 0.0662 0.0200 0.0092 0.0117 0.0109 0.0168 0.0141
S.D.(D5) 0.0123 0.0686 0.0192 0.0082 0.0107 0.0106 0.0179 0.0166
S.D.(D6) 0.0135 0.0479 0.0154 0.0066 0.0082 0.0081 0.0124 0.0113
ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ϕ7 ϕ8
True Value 0.5000 0.5000 1.2500 0.3600 0.3600 0.3600 0.3600 0.4900
A.E.(D1) 0.4899 0.5068 1.2595 0.3621 0.3720 0.3760 0.3429 0.4851
A.E.(D2) 0.5074 0.5002 1.2596 0.3626 0.3500 0.3611 0.3461 0.4788
A.E.(D3) 0.5003 0.4981 1.2475 0.3578 0.3565 0.3753 0.3467 0.4887
A.E.(D4) 0.4982 0.5030 1.2550 0.3488 0.3643 0.3573 0.3676 0.4910
A.E.(D5) 0.5002 0.5001 1.2508 0.3614 0.3589 0.3662 0.3596 0.4885
A.E.(D6) 0.5011 0.5003 1.2522 0.3477 0.3621 0.3520 0.3622 0.4876

S.D.(D1) 0.0765 0.0608 0.1089 0.1119 0.1101 0.0985 0.1295 0.0862
S.D.(D2) 0.0434 0.0314 0.0628 0.0759 0.0766 0.0853 0.0982 0.0867
S.D.(D3) 0.0306 0.0246 0.0521 0.0584 0.0653 0.0551 0.0844 0.0572
S.D.(D4) 0.0273 0.0245 0.0497 0.0537 0.0489 0.0644 0.0741 0.0589
S.D.(D5) 0.0212 0.0167 0.0366 0.0478 0.0421 0.0440 0.0575 0.0460
S.D.(D6) 0.0238 0.0161 0.0374 0.0439 0.0387 0.0397 0.0490 0.0429
ϕ9 ϕ10 ϕ11 ϕ12 ϕ13 μ1 μ2
True Value 0.3000 0.4900 1.0000 0.5000 0.1600 1.0000 1.0000
A.E.(D1) 0.2925 0.4908 0.9965 0.4983 0.1500 1.0030 1.0131
A.E.(D2) 0.2899 0.4751 0.9911 0.4956 0.1470 1.0066 0.9977
A.E.(D3) 0.3028 0.4974 1.0052 0.5026 0.1599 0.9973 0.9910
A.E.(D4) 0.2985 0.4854 0.9918 0.4959 0.1594 1.0115 1.0037
A.E.(D5) 0.2994 0.4918 0.9994 0.4997 0.1563 0.9970 0.9960
A.E.(D6) 0.3020 0.4916 0.9982 0.4991 0.1621 1.0037 1.0008

S.D.(D1) 0.0746 0.0927 0.0698 0.0349 0.0660 0.0851 0.0924
S.D.(D2) 0.0747 0.0894 0.0491 0.0246 0.0629 0.0986 0.0872
S.D.(D3) 0.0509 0.0680 0.0376 0.0188 0.0516 0.0606 0.0562
S.D.(D4) 0.0523 0.0637 0.0369 0.0184 0.0445 0.0575 0.0592
S.D.(D5) 0.0362 0.0443 0.0307 0.0154 0.0345 0.0433 0.0435
S.D.(D6) 0.0383 0.0487 0.0284 0.0142 0.0287 0.0451 0.0502

Note: In Table 1, A.E.(Di) = average estimate from 200 simulation replications, i = 1,…,6; S.D.(Di) = standard deviation from 200 simulation replications, i = 1,…,6. The RMSE in (23) is ≤ 10^{-4}.

The second simulation experiment shows the performance of the algorithm under linear constraints on the parameters of the model given by Figure 1. The linear constraints are defined by:

$$\theta_1+2\theta_2-\theta_3=2,\qquad \theta_3-\phi_3=0.5,\qquad \theta_1-2\phi_1=0,\qquad \phi_1+\phi_2+\phi_3-\phi_{11}=0,\qquad \theta_2+\phi_2=2,\qquad \phi_{11}-2\phi_{12}=0.5. \qquad (29)$$
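In the form (24), the constraints (29) correspond to the following C and a, restricting attention for illustration to the sub-vector of parameters they involve (the ordering is ours):

import numpy as np

# theta = (theta1, theta2, theta3, phi1, phi2, phi3, phi11, phi12)'
C = np.array([[1, 2, -1,  0, 0,  0,  0,  0],   # theta1 + 2*theta2 - theta3 = 2
              [0, 0,  1,  0, 0, -1,  0,  0],   # theta3 - phi3 = 0.5
              [1, 0,  0, -2, 0,  0,  0,  0],   # theta1 - 2*phi1 = 0
              [0, 0,  0,  1, 1,  1, -1,  0],   # phi1 + phi2 + phi3 - phi11 = 0
              [0, 1,  0,  0, 1,  0,  0,  0],   # theta2 + phi2 = 2
              [0, 0,  0,  0, 0,  0,  1, -2]])  # phi11 - 2*phi12 = 0.5
a = np.array([2.0, 0.5, 0.0, 0.0, 2.0, 0.5])

# true values of Table 2 satisfy C @ theta = a
theta_true = np.array([1.0, 1.0, 1.0, 0.5, 1.0, 0.5, 2.0, 0.75])
assert np.allclose(C @ theta_true, a)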

The sampling designs Di (i = 1,…,6) are the same as in Table 1. The results from the EM algorithm under the linear constraints (29) are summarized in Table 2.

TABLE 2.

Average Estimates with Linear Constraints

θ1 θ2 θ3 θ4 θ5 θ6 θ7 θ8
True Value 1.0000 1.0000 1.0000 1.0000 0.2500 0.2500 0.2500 0.2500
A.E.(D1) 0.9997 0.9994 0.9986 1.0073 0.2459 0.2496 0.2508 0.2500
A.E.(D2) 1.0002 0.9993 0.9988 1.0008 0.2486 0.2495 0.2496 0.2495
A.E.(D3) 0.9981 0.9996 0.9974 1.0004 0.2495 0.2510 0.2504 0.2479
A.E.(D4) 0.9988 1.0000 0.9988 0.9980 0.2495 0.2490 0.2495 0.2498
A.E.(D5) 0.9978 1.0016 1.0011 1.0017 0.2499 0.2517 0.2515 0.2499
A.E.(D6) 0.9994 1.0003 0.9999 1.0023 0.2503 0.2503 0.2502 0.2491

S.D.(D1) 0.0212 0.0116 0.0179 0.0896 0.0220 0.0271 0.0261 0.0252
S.D.(D2) 0.0150 0.0092 0.0145 0.0598 0.0168 0.0191 0.0209 0.0172
S.D.(D3) 0.0145 0.0071 0.0125 0.0627 0.0184 0.0195 0.0185 0.0181
S.D.(D4) 0.0108 0.0059 0.0093 0.0450 0.0133 0.0121 0.0124 0.0113
S.D.(D5) 0.0102 0.0052 0.0091 0.0406 0.0134 0.0146 0.0142 0.0133
S.D.(D6) 0.0081 0.0042 0.0069 0.0316 0.0100 0.0089 0.0097 0.0095
ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ϕ7 ϕ8
True Value 0.5000 1.0000 0.5000 0.1600 0.1600 0.1600 0.1600 0.3600
A.E.(D1) 0.4999 1.0006 0.4986 0.1655 0.1651 0.1663 0.1561 0.3567
A.E.(D2) 0.5001 1.0007 0.4988 0.1614 0.1539 0.1621 0.1575 0.3549
A.E.(D3) 0.4990 1.0004 0.4974 0.1621 0.1586 0.1593 0.1573 0.3535
A.E.(D4) 0.4994 1.0000 0.4988 0.1623 0.1610 0.1553 0.1613 0.3575
A.E.(D5) 0.4989 0.9984 0.5011 0.1554 0.1605 0.1625 0.1624 0.3637
A.E.(D6) 0.4997 0.9997 0.4999 0.1603 0.1547 0.1619 0.1606 0.3482

S.D.(D1) 0.0106 0.0116 0.0179 0.0591 0.0481 0.0621 0.0488 0.0634
S.D.(D2) 0.0075 0.0092 0.0145 0.0501 0.0392 0.0495 0.0392 0.0655
S.D.(D3) 0.0073 0.0071 0.0125 0.0369 0.0345 0.0387 0.0324 0.0457
S.D.(D4) 0.0054 0.0059 0.0093 0.0378 0.0313 0.0394 0.0285 0.0417
S.D.(D5) 0.0051 0.0052 0.0091 0.0261 0.0231 0.0271 0.0222 0.0328
S.D.(D6) 0.0041 0.0042 0.0069 0.0266 0.0204 0.0231 0.0193 0.0338
ϕ9 ϕ10 ϕ11 ϕ12 ϕ13 μ1 μ2
True Value 0.2000 0.3600 2.0000 0.7500 0.2500 1.0000 1.0000
A.E.(D1) 0.1951 0.3580 1.9990 0.7495 0.2444 1.0006 1.0114
A.E.(D2) 0.2008 0.3569 1.9997 0.7498 0.2391 1.0057 1.0014
A.E.(D3) 0.1983 0.3589 1.9968 0.7484 0.2466 1.0001 0.9963
A.E.(D4) 0.2021 0.3646 1.9982 0.7491 0.2526 1.0067 1.0103
A.E.(D5) 0.2007 0.3540 1.9983 0.7492 0.2545 1.0000 1.0027
A.E.(D6) 0.1913 0.3539 1.9994 0.7497 0.2487 1.0055 1.0095

S.D.(D1) 0.0553 0.0700 0.0254 0.0127 0.0978 0.0769 0.0801
S.D.(D2) 0.0552 0.0695 0.0180 0.0090 0.0706 0.0790 0.0781
S.D.(D3) 0.0384 0.0484 0.0183 0.0091 0.0611 0.0538 0.0565
S.D.(D4) 0.0384 0.0469 0.0131 0.0065 0.0570 0.0492 0.0490
S.D.(D5) 0.0207 0.0291 0.0129 0.0064 0.0493 0.0380 0.0426
S.D.(D6) 0.0277 0.0372 0.0100 0.0050 0.0373 0.0345 0.0323

Note: In Table 2, A.E.(Di) = average estimate from 200 simulation replications, i = 1,…,6; S.D.(Di) = standard deviation from 200 simulation replications, i = 1,…,6. The RMSE in (23) is ≤ 10^{-4}.

The simulation results in Tables 1 and 2 show that the performance of the algorithm under both linear and nonlinear parameter constraints is generally acceptable. Comparing the standard deviations (S.D.) row by row among the six sampling designs Di (i = 1,…,6), we can draw the following two empirical conclusions from Tables 1 and 2:

  1. The algorithm performs better for a larger level-1 sample size than for a smaller one, in the sense that the S.D. becomes smaller for larger Ng with the same G. For example, for all cases of the within parameter estimates, S.D.(D2) < S.D.(D1), S.D.(D4) < S.D.(D3), and S.D.(D6) < S.D.(D5). The between parameter estimates are not generally improved (i.e., do not generally have smaller S.D.) when simply increasing the level-1 sample sizes Ng. This verifies the theoretical results in Yuan and Bentler (2006): the level-1 sample size mainly affects the standard errors of within parameter estimates;

  2. The algorithm performs better for a larger level-2 sample size than for a smaller one, in the sense that the S.D. becomes smaller for larger G with the same Ng. For example, for all cases of the within and between parameter estimates, S.D.(D5) < S.D.(D3) < S.D.(D1), and S.D.(D6) < S.D.(D4) < S.D.(D2). Both within and between parameter estimates are generally improved (i.e., have smaller S.D.) when increasing the level-2 sample size G. This also verifies the theoretical results in Yuan and Bentler (2006): the level-2 sample size affects the standard errors of both within and between parameter estimates.

PRACTICAL ILLUSTRATION WITH LINEAR CONSTRAINTS

In this section we illustrate the algorithm in Section 3 by setting up a two-level SEM with formulation (1) and the mean and covariance structures (2), and running the model with the proposed linear constraints in EQS 6.1 (Bentler, 2006). The real data set is from the National Education Longitudinal Study (NELS:88); the full data set is available from the authors upon request. The data set contains measurements of some variables for schools and for 5,198 students nested in 235 schools. There are 21 columns in the data set. We only take the data in columns 7–10, which record the students’ scores on four math tests (level-1 observations, denoted by indicator variables Mi, i = 1, 2, 3, 4); the data in columns 11–14, which record the students’ scores on four science tests (level-1 observations, denoted by indicator variables Si, i = 1, 2, 3, 4); and the data in columns 20–21, which record the school-level (level-2) observations on two observable variables: Z1 = minority and Z2 = school type. It is empirically known that the students’ scores are affected by three within-school (level-1) factors, Fw1 = general math ability, Fw2 = general science ability, and Fw3 = general writing ability, and two between-school (level-2) factors, Fb1 = general background and Fb2 = math background. The two-level SEM is set up as:

  1. The within (level-1) model is a factor model: both Fw1 and Fw3 generate (subject to independent measurement errors, not mentioned further below) the four math tests. In addition, Fw1 also influences the second science test S2. The factor loadings of M1 on Fw1 and Fw3 are fixed at the constant “1” as a reference for model identification. Both Fw2 and Fw3 influence all four science tests Si (i = 1, 2, 3, 4). All three level-1 factors have unknown variances as free parameters; the covariances cov(Fw1, Fw3) and cov(Fw2, Fw3) are free parameters, but cov(Fw1, Fw2) = 0;

  2. The between (level-2) model is not a factor model: Fb1 influences all eight tests Mi and Si (i = 1, 2, 3, 4). Fb2 influences the four math tests Mi (i = 1, 2, 3, 4) and the science test S2. The factor loadings of M1 on Fb1 and Fb2 are fixed at the constant “1” as a reference for model identification. In addition, the two factors Fb1 and Fb2 are predicted by the two school-level variables Z1 and Z2 with unknown intercepts and correlated disturbance errors. Additional unknown intercepts are also imposed on M3 and S1, respectively, according to some prior knowledge. The observations on both Z1 and Z2 are obtained subject to correlated random residuals. The following covariances are free parameters: the covariance between the two disturbance errors for Fb1 and Fb2, the covariance between the residuals for M3 and Z2, and the covariance between the residuals for Z1 and Z2.

The above measurement relationships can be visualized in Figure 2, using the same conventions as in Figure 1. Some nonzero covariances in the within and between models, as mentioned above, were released as free parameters based on preliminary EQS runs using the Lagrange Multiplier Test (command LMTEST in EQS) for adding parameters. Note that this test should not be confused with the use of augmented Lagrangians in optimization as discussed above. See, for example, Buse (1982) for a discussion of this test, which is asymptotically equivalent to the chi-square difference test and the Wald test.

FIGURE 2. Path diagram of the model for the selected school data.

In the model given by Figure 2, there are a total of ten indicator variables: Mi and Si (i = 1, 2, 3, 4), Z1, and Z2. But there are only six free mean parameters: the two means of Z1 and Z2, the two nonzero intercepts from the prediction equations of Fb1 and Fb2, and the two nonzero intercepts from the prediction equations of M3 and S1. Therefore we have a nonsaturated mean structure for this model. The mean and covariance structures for the model given by Figure 2 can be expressed as functions of the model parameters (indicated by the star sign “*” in Figure 2), like those in (27). Since some of the latent factors in the model have almost the same sets of indicator variables, equality of factor loadings on a given factor is used to help differentiate or identify the various factors. For example, the four variables loading on Fw1 are constrained to have equal loadings. These considerations, along with some preliminary EQS runs, yield 10 equality constraints overall. Thus we set up the model of Figure 2 in EQS incorporating these 10 restrictions, which are shown in the Appendix in the sections labeled /CONSTRAINTS. The EQS output provides information on the statistical adequacy of these constraints as follows.

As expected, the p-values in Table 3 show that all chi-square tests for the equality constraints are insignificant at the usual statistical levels (1%, 5%, and 10%), so none of the null hypotheses of equality should be rejected; that is, the constraints are suitable for the data. The ML estimates under the above ten constraints are provided by the following EQS output for the measurement equations, where a number followed by the star sign “*” is an estimate of a factor loading parameter, a number with “*” before the variable V999 is an estimate of an intercept or mean parameter, F1=Fw1, F2=Fw2, and F3=Fw3 in the within model, and F1=Fb1 and F2=Fb2 in the between model.

  • Measurement equations for the within model:
    M1 = V7 = 1.000 F1 + 1.000 F3 + 1.000 E1
    M2 = V8 = 1.039*F1 + 1.047*F3 + 1.000 E2
    M3 = V9 = 1.039*F1 + .682*F3 + 1.000 E3
    M4 = V10 = 1.039*F1 + 1.047*F3 + 1.000 E4
    S1 = V11 = 1.000 F2 + .036*F3 + 1.000 E5
    S2 = V12 = 1.039*F1 + .699*F2 + .180*F3 + 1.000 E6
    S3 = V13 = .699*F2 + .180*F3 + 1.000 E7
    S4 = V14 = .699*F2 + .180*F3 + 1.000 E8        (30)
  • Measurement equations for the between model:
    M1 = V7 = 1.000 F1 + 1.000 F2 + 1.000 E1
    M2 = V8 = 1.070*F1 + 1.276*F2 + 1.000 E2
    M3 = V9 = .596*F1 + .454*F2 + .655*V999 + 1.000 E3
    M4 = V10 = 1.123*F1 + .498*F2 + 1.000 E4
    S1 = V11 = .961*F1 + .324*V999 + 1.000 E5
    S2 = V12 = .855*F1 + .332*F2 + 1.000 E6
    S3 = V13 = .855*F1 + 1.000 E7
    S4 = V14 = .855*F1 + 1.000 E8
    Z1 = V20 = 4.630*V999 + 1.000 E9
    Z2 = V21 = 1.170*V999 + 1.000 E10
    F1 = .159*V20 + .019*V21 + 3.373*V999 + 1.000 D1
    F2 = .061*V20 - .048*V21 - .107*V999 + 1.000 D2        (31)

TABLE 3.

LMTEST for the Equality Constraints

CONSTRAINTS FROM GROUP 1 (within model)
CONSTRAINT # CONSTRAINT χ2(1)-statistic p-Value
CONSTR: 1 (M2,F1)−(M3,F1)=0; .001 .979
CONSTR: 2 (M2,F1)−(M4,F1)=0; .000 .985
CONSTR: 3 (M2,F1)−(S2,F1)=0; .288 .592
CONSTR: 4 (M2,F3)−(M4,F3)=0; .024 .876
CONSTR: 5 (S2,F3)−(S3,F3)=0; .033 .857
CONSTR: 6 (S2,F3)−(S4,F3)=0; .033 .857
CONSTR: 7 (S2,F2)−(S3,F2)=0; .001 .972
CONSTR: 8 (S2,F2)−(S4,F2)=0; .003 .956
CONSTRAINTS FROM GROUP 2 (between model)
CONSTR: 9 (S2,F1)−(S3,F1)=0; .849 .357
CONSTR: 10 (S2,F1)−(S4,F1)=0; .849 .357

Note: In Table 3, F1=Fw1, F2=Fw2, and F3=Fw3 in the within model; F1=Fb1, and F2=Fb2 in the between model. EQS notation such as (M2,F1) denotes the factor loading parameter of the second math test M2 on the first within-level factor F1=Fw1=general math ability.

As can be seen above, the equality constraints imposed on the model were implemented during estimation, that is, the relevant parameter estimates are in fact equal. For example, the value 1.039 describes four of the within factor loadings (on F1, or Fw1 in the diagram). The ML estimates for the variance-covariance parameters in the within and between models are summarized in Table 4.

TABLE 4.

ML Estimates for the Variance-Covariance Parameters

Within Model Variances-Covariances Parameter Estimates
E1 E2 E3 E4 E5 E6 E7 E8 F1 F2 F3
Variance .344 .257 1.328 2.463 .605 .629 .607 2.403 .153 .495 .887
Covariance cov(F1,F3)=−.146 cov(F2,F3)=.464

Between Model Variances-Covariances Parameter Estimates
E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 D1 D2
Variance .016 .004 .033 .037 .017 .028 .009 .025 4.437 .142 .114 .023
Covariance cov(D1,D2)=−.005 cov(E3,E10)=−.024 cov(E9,E10)=−.365

Note: In Table 4, the residuals E1, E2, …, E8, and the three level-1 factors F1, F2, and F3 in the within model are the same as those in the measurement equations given by (30); the residuals E1, E2, …, E10, and the two disturbances D1 and D2 in the between model are the same as those in the measurement equations given by (31). All nonzero covariances are statistically significant at level 5% based on the individual t-test provided by EQS.

In addition to the ML parameter estimates subject to the linear constraints in Table 3, EQS provides further information on the model in Figure 2. An important statistic is the model chi-square = 53.8 with 48 degrees of freedom and a p-value = .261, which implies that the model is suitable for the selected data. The model contains the nonzero covariances presented in Table 4, all of which are statistically significant at the 5% level. The LMTEST furthermore showed that all remaining covariances between any two residuals are statistically insignificant at the 5% level, so it is not necessary to release more covariance parameters. Finally, the overall necessity of using a two-level model rather than a one-level model can be evaluated by the model-based intraclass correlations, that is, the proportions σ²B/(σ²B + σ²W) of the total (within plus between) variance attributable to the between level. In this example, variables M1–M4 and S1–S4 have intraclass correlations of .200, .224, .067, .092, .167, .150, .150, and .064, respectively. The between-school effects on individual performance clearly should not be ignored. The above interpretation of the output may not be easy to follow for readers who are not familiar with structural equation models; a good reference is Bentler (2006).

CONCLUDING REMARKS

The algorithm in this paper is a mixture of the EM approach (Dempster, Laird, & Rubin, 1977) and the Lagrange multiplier method. There is no guarantee that the final solution is the global maximum of the likelihood function under the parameter constraints, and a theoretical discussion of the global convergence of this kind of mixed algorithm is beyond the scope of this paper. We are also unable to provide a comparison between our algorithm and possible alternatives, as we have not found parallel algorithms for handling the same model in the literature. However, the simulation experiments and the practical illustration provide some evidence that the current approach is feasible. Since complicated functional constraints with mean and covariance structures are not usually proposed in two-level statistical models, our evaluation was limited to simulation results. Linear constraints, especially equality constraints, are much more frequent in applications, and we were able to illustrate them with the real school data set using the publicly available EQS program (Bentler, 2006). As a result, anyone with a working knowledge of EQS can easily run the program given in the Appendix or, by referring to the EQS manual (Bentler, 2006), write their own setups for analysis of two-level SEM with both mean and covariance structures subject to general linear constraints on parameters.

This paper focused on developing an algorithm for estimating parameters subject to general constraints in more complicated two-level SEM. Asymptotic properties, such as the asymptotic normality and standard errors of the parameter estimates, and goodness-of-fit tests for justifying the imposed constraints, can also be developed. Due to space limitations, these topics are left to future research.

ACKNOWLEDGMENTS

The research was supported by National Institute on Drug Abuse grant DA01070, a grant from the Hong Kong Baptist University (project number FRG/07-08/II-35), and University of New Haven 2009 and 2010 Summer Research Grants.

APPENDIX

EQS Input Program for the model in Figure 2

/TITLE

Two-level analysis for the school data

(Two factors in the between model, three factors in the within model, 8 y-variables)

WITHIN MODEL FIRST

/SPECIFICATION

data=’school.dat’; case =5198; variable=21; method=ml;

matrix=raw; GROUP=2; analysis=covariance; MULTILEVEL=ML; CLUSTER=V19;

/LABELS

V7=M1; V8=M2; V9=M3; V10=M4; V11=S1; V12=S2; V13=S3; V14=S4;

V19=SCHOOL; V20=Z1; V21=Z2;

! F1=Fw1=math ability factor; F2=Fw2=science ability factor;

! F3=Fw3=general writing ability factor

/EQUATIONS

M1=1F1+1F3+E1; M2=*F1+*F3+E2; M3=*F1+*F3+E3; M4=*F1+*F3+E4;

S1=1F2+*F3+E5; S2=*F1+*F2+*F3+E6; S3=*F2+*F3+E7; S4=*F2+*F3+E8;

/VARIANCES

E1-E8=*; F1=*; F2=*; F3=*;

/COVARIANCES

F1,F2=0; F2,F3=0*; F1,F3=0*;

/CONSTRAINTS

(M2,F1)=(M3,F1)=(M4,F1)=(S2,F1); (M2,F3)=(M4,F3);

(S2,F3)=(S3,F3)=(S4,F3); (S2,F2)=(S3,F2)=(S4,F2);

/END

/TITLE

BETWEEN MODEL

/LABELS

V7=M1; V8=M2; V9=M3; V10=M4; V11=S1; V12=S2; V13=S3; V14=S4;

V19=SCHOOL; V20=Z1; V21=Z2;

! F1=Fb1=general background factor; F2=Fb2=math background factor;

/EQUATIONS

M1=1F1+1F2+E1; M2=*F1+*F2+E2; M3=*V999+*F1+*F2+E3; M4=*F1+*F2+E4;

S1=*V999+*F1+E5; S2=*F1+*F2+E6; S3=*F1+E7; S4=*F1+E8;

F1=*V999+*Z1+*Z2+D1; F2=*V999+*Z1+*Z2+D2; Z1=*V999+E9; Z2=*V999+E10;

/VARIANCES

E1-E10=0*; D1-D2=*;

/COVARIANCES

D1,D2=0*; E3,E10=0*; E9,E10=0*;

/CONSTRAINTS

(S2,F1)=(S3,F1)=(S4,F1);

/tech

itr=200; con=.000001;

/LMTEST

set=pee;

/END

Contributor Information

Peter M. Bentler, University of California, Los Angeles.

Jiajuan Liang, University of New Haven.

Man-Lai Tang, Hong Kong Baptist University.

Ke-Hai Yuan, University of Notre Dame.

REFERENCES

  1. Aitchison J, Silvey SD. Maximum likelihood estimation of parameters subject to restraints. The Annals of Mathematical Statistics. 1958;29:813–828.
  2. Bentler PM. EQS 6 Structural Equations Program Manual. Encino, CA: Multivariate Software, Inc.; 2006. ISBN 1-885889-03-7 (www.mvsoft.com).
  3. Bertsekas DP. Multiplier method: A survey. Automatica. 1976;12:133–145.
  4. Buse A. The likelihood ratio, Wald and Lagrange multiplier tests: An expository note. American Statistician. 1982;36:153–157.
  5. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society (Series B). 1977;39:1–38.
  6. du Toit S, du Toit M. Multilevel structural equation modeling. In: de Leeuw J, Meijer E, editors. Handbook of Multilevel Analysis (Chapter 12). New York: Springer Verlag; 2008.
  7. Jamshidian M. On algorithms for restricted maximum likelihood estimation. Computational Statistics & Data Analysis. 2004;45:137–157.
  8. Kim DK, Taylor JMG. The restricted EM algorithm for maximum likelihood estimation under linear restrictions on parameters. Journal of the American Statistical Association. 1995;90:708–716.
  9. Lange K. A gradient algorithm locally equivalent to the EM algorithm. Journal of the Royal Statistical Society (Series B). 1995a;57:425–437.
  10. Lange K. A quasi-Newton acceleration of the EM algorithm. Statistica Sinica. 1995b;5:1–8.
  11. Lee SY. Constrained estimation in covariance structure analysis. Biometrika. 1979;66:539–545.
  12. Lee SY, Poon WY. Analysis of two-level structural equation models via EM type algorithms. Statistica Sinica. 1998;8:749–766.
  13. Lee SY, Tsang SY. Constrained maximum likelihood estimation of two-level covariance structure model via EM type algorithms. Psychometrika. 1999;64:435–450.
  14. Liang J, Bentler PM. A new EM algorithm for fitting two-level structural equation models. Psychometrika. 2004;69:101–122.
  15. McDonald RP, Goldstein H. Balanced versus unbalanced designs for linear structural relations in two-level data. British Journal of Mathematical & Statistical Psychology. 1989;42:215–232.
  16. McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. New York: Wiley; 1997.
  17. Muthén BO. Multilevel covariance structure analysis. Sociological Methods & Research. 1994;22:376–398.
  18. Muthén LK, Muthén BO. Mplus Version 3 User’s Guide. Los Angeles: Muthén & Muthén; 2004.
  19. Raudenbush SW. Maximum likelihood estimation for unbalanced multilevel covariance structure models via the EM algorithm. British Journal of Mathematical & Statistical Psychology. 1995;48:359–370.
  20. Yuan K-H, Bentler PM. Asymptotic robustness of standard errors in multilevel structural equation models. Journal of Multivariate Analysis. 2006;97:1121–1141.
  21. Yung YF, Bentler PM. On added information for ML factor analysis with mean and covariance structures. Journal of Educational & Behavioral Statistics. 1999;24:1–20.
