Published in final edited form as: Educ Psychol Meas. 2011 Mar 22;71(2):325–345. doi: 10.1177/0013164410381272

Constrained Maximum Likelihood Estimation for Two-level Mean and Covariance Structure Models

Peter M. Bentler, Jiajuan Liang, Man-Lai Tang, Ke-Hai Yuan

Abstract

Maximum likelihood is commonly used for estimating model parameters in the analysis of two-level structural equation models. Constraints on model parameters may be encountered in some situations, such as equal factor loadings for different factors. Linear constraints are the most common and are relatively easy to handle in maximum likelihood analysis; nonlinear constraints may be encountered in complicated applications. In this paper we develop an EM-type algorithm for estimating model parameters under both linear and nonlinear constraints. The empirical performance of the algorithm is demonstrated by a Monte Carlo study. Application of the algorithm with linear constraints is illustrated by setting up a two-level mean and covariance structure model for a real two-level data set and running an EQS program.

Keywords: EM algorithm, maximum likelihood estimation, mean and covariance structure, linear and nonlinear constraints, two-level structural equation model


Two-level mean and covariance structure models (or structural equation models, SEM for simplicity) have been applied to data analysis in various fields such as education, medicine, psychology, and sociology. In many practical situations, the hierarchical structure implicit in the collected data should not be ignored, because various potentially important sources of variance can be identified. For example, SAT (Scholastic Aptitude Test) scores from students nested in different high schools can be expected not only to reveal individual student differences but also to reflect differences among the implicit variable “schools”. Schools can be expected to vary in the extent of student preparation, their socioeconomic environments, their teaching facilities, and their teacher training. Hence, the SAT scores will reflect school differences as well as individual differences and, as a result of the dependence introduced by school differences, are best not treated as independent observations. In this case, variation occurs at two levels, with the students called the level-1 units (or individuals) and the schools called the level-2 units (groups or clusters). Furthermore, when considering latent sources of variation giving rise to the SAT scores, the influences at the two levels are probably different. For instance, factors such as “general math ability” and “general writing ability” may explain performance at level 1, while factors such as “school quality” or “teacher background” may explain school differences at level 2. It is now widely recognized that to understand how the level-1 factors influence one another and affect the data, a level-1 model must be used to analyze the within (level-1) covariance structure; such a model may include factor loadings, variances and covariances of factors, and prediction errors. Similarly, to understand how the level-2 factors influence one another and affect the data, a level-2 model has to be set up to analyze the between (level-2) covariance structure. Observing these two different types of effects on two-level data, many researchers have proposed various formulations for two-level covariance structure models; see, for example, McDonald and Goldstein (1989), Muthén (1994), Lee and Poon (1998), and Liang and Bentler (2004).

When a linear two-level SEM has been set up for analysis of the within and between covariance structures, the first problem to solve is the estimation of model parameters. There are two basic estimation methods for two-level SEM in the literature: the maximum likelihood (ML) method and the generalized least squares (GLS) method. The ML method is closely tied to the normal distributional assumption on the model and is the most popular in the analysis of two-level SEM because of its easy implementation and some nice properties. Different ML methods have been proposed for the analysis of two-level SEM, and they are available in many standard statistical packages such as EQS (Bentler, 2006), LISREL (cf. du Toit & du Toit, 2008), and Mplus (Muthén & Muthén, 2004). Unfortunately, many existing algorithms for ML estimation do not allow general parameter constraints, although they may allow simple linear constraints on parameters. Parameter constraints in ML analysis of statistical models are usually imposed as a result of substantive prior knowledge about the relationships among model parameters. For example, in a factor analysis model, if a factor is measured by indicator variables that prior knowledge implies are equally important, an equality constraint on the factor loadings may be imposed when implementing the ML estimation. Linear constraints are the simplest in ML analysis of statistical models with parameter constraints, and simple analytical solutions are usually available; see, for example, Kim and Taylor (1995) and Jamshidian (2004). Nonlinear functional constraints on parameters may appear in more complicated problems, and this type of constraint has been less thoroughly studied. A detailed discussion of the necessity of constrained ML estimation dates back to Aitchison and Silvey (1958). Lee (1979) proposed an algorithm for both ML and weighted least squares estimation with general (linear and nonlinear) constraints for conventional (one-level) SEM. Lee and Tsang (1999) developed EM-type algorithms for constrained ML estimation of two-level SEM with covariance structures only.

The importance of considering a nonsaturated mean structure in conventional SEM has been discussed by Yung and Bentler (1999). Liang and Bentler (2004) proposed a new formulation for two-level SEM and developed an EM algorithm that allows both mean and covariance structures in ML estimation without constraints. The study in this paper is a generalization of Liang and Bentler (2004) and an extension of Lee and Tsang (1999). The paper is organized as follows. Section 2 gives a brief review of the model in Liang and Bentler (2004). Section 3 presents the details of the algorithm associated with the Lagrange multiplier. Section 4 provides a limited Monte Carlo study of the performance of the proposed algorithm using an artificial two-level SEM. Section 5 gives an illustration of the proposed algorithm with linear constraints by implementing a two-level SEM in EQS for a practical data set. Some concluding remarks are given in the last section.

A REVIEW OF THE EXISTING MODEL

Let {zg : g = 1,…,G} be a set of level-2 observations from G level-2 units (such as financial sources for schools), and {ygi : i = 1,…,Ng; g = 1,…,G} a set of level-1 observations. For example, ygi may stand for the observation from the i-th level-1 unit (e.g., student) nested in the g-th level-2 unit (e.g., school). Liang and Bentler (2004) proposed the following data formulation of a two-level SEM

$$\begin{pmatrix} z_g \\ y_{gi} \end{pmatrix} = \begin{pmatrix} z_g \\ \upsilon_g \end{pmatrix} + \begin{pmatrix} 0 \\ \upsilon_{gi} \end{pmatrix} \qquad (1)$$

associated with the assumptions:

  • A1) υgi contains latent variables capturing level-1 effects. For each fixed g, {υgi : i = 1,…,Ng} are i.i.d. (independent and identically distributed) with υgi ~ Np(0, ΣW), the p-dimensional normal distribution with ΣW > 0 (positive definite), for g = 1,…,G;

  • A2) υg contains latent variables capturing level-2 effects. {υg : g = 1,…,G} are i.i.d. with υg ~ Np(0, ΣB) and ΣB > 0;

  • A3) {zg : g = 1,…,G} are i.i.d. level-2 observations with zg ~ Nq(μz, Σzz) and Σzz > 0;

  • A4) the random vector (zg′, υg′)′ ((p + q) × 1) has a joint nonsingular multivariate normal distribution Np+q(μ, Σ̃B) with Σ̃B > 0 and
$$\mu(\theta)=\begin{pmatrix}\mu_z\\ \mu_y\end{pmatrix},\qquad \tilde{\Sigma}_B(\theta)=\mathrm{cov}\begin{pmatrix}z_g\\ \upsilon_g\end{pmatrix}=\begin{pmatrix}\Sigma_{zz} & \Sigma_{zy}\\ \Sigma_{yz} & \Sigma_B\end{pmatrix}, \qquad (2)$$
where Σzy = Σyz′ = cov(zg, υg);

  • A5) {zg, υg} is uncorrelated with {υgi : i = 1,…,Ng}.

The mean and covariance structures in (2) may be characterized by a common model parameter vector θ (r × 1) with r functionally independent model parameters. A nonsaturated mean structure in (2) implies that the means μz and μy may also be characterized by the common model parameter vector θ. Existing algorithms for ML analysis of two-level SEM usually treat the means of manifest variables as individual parameters separate from the covariance parameters; see, for example, McDonald and Goldstein (1989), Muthén (1994), Raudenbush (1995), and du Toit and du Toit (2008). When the number r of functionally independent model parameters in θ in (2) is less than the total number of means of all manifest (observable) variables in zg and ygi in formulation (1) plus all variances and nonduplicated covariances, a model with formulation (1) is called a nonsaturated model. The purpose of ML analysis of a two-level SEM under formulation (1) is to estimate the common parameter vector θ and to validate the nonsaturated mean and covariance structures (2) (the null hypothesis) versus the saturated model (the alternative hypothesis), which treats all means, variances, and nonduplicated covariances as independent model parameters. Under the null hypothesis, Liang and Bentler (2004) developed an EM algorithm for ML estimation of θ and validated the nonsaturated model.

Under assumptions A1)–A5) on model formulation (1), and considering {ygi, zg : g = 1,…,G} together with the missing values {υg : g = 1,…,G} as the complete data, Liang and Bentler (2004) obtained the E-step function

$$M(\theta_*|\theta)=N\{\log|\Sigma_W(\theta_*)|+\mathrm{tr}[\Sigma_W^{-1}(\theta_*)S_W(\theta)]\}+G\{\log|\tilde{\Sigma}(\theta_*)|+\mathrm{tr}[\tilde{\Sigma}^{-1}(\theta_*)\tilde{S}(\theta)]\}, \qquad (3)$$

where θ* and θ are two arbitrarily specified values of the same parameter vector, and

$$\tilde{\Sigma}(\theta_*)=\begin{pmatrix}\tilde{\Sigma}_{B*}+\mu_*\mu_*' & \mu_*\\ \mu_*' & 1\end{pmatrix},\quad \tilde{S}(\theta)=\begin{pmatrix}\tilde{S}_B+dd' & d\\ d' & 1\end{pmatrix},\quad N=\sum_{g=1}^{G}N_g,\quad \tilde{\Sigma}_{B*}=\tilde{\Sigma}_B(\theta_*),\quad \mu_*=\mu(\theta_*), \qquad (4)$$

where Σ̃B(θ*) and μ(θ*) are defined in (2) by taking θ = θ*. Simplified formulas for computing the terms SW, d, and S̃B in (3) and (4) can be found in Liang and Bentler (2004).

THE ALGORITHM ASSOCIATED WITH THE LAGRANGE MULTIPLIER

Using the simple E-step function (3), we can give an algorithm for ML estimation with general constraints for model formulation (1) with mean and covariance structures (2). Assume that there are s general parameter constraints defined by

$$h_i(\theta)=0,\quad i=1,\ldots,s,\qquad \text{or}\qquad h(\theta)=0,\quad 0:s\times 1, \qquad (5)$$

where h(θ) = (h1(θ),…,hs(θ))′, s < r, and r is the dimension of the parameter vector θ (r × 1) in the mean and covariance structures (2). Each hi(θ) is a scalar function possessing partial derivatives up to second order with respect to θ. The standard steps for applying the Lagrange multiplier method can be summarized as follows (cf. Bertsekas, 1976; Lee & Tsang, 1999).

  • Step 1. Construct the augmented Lagrangian function
$$M(\theta_*|\theta)+\xi' h(\theta_*)+c\sum_{i=1}^{s}\phi[h_i(\theta_*)], \qquad (6)$$
    where ξ = (ξ(1),…,ξ(s))′ contains s multipliers, c is a positive scalar constant, and ϕ(·) is a nonnegative penalty function with ϕ(x) = 0 if and only if x = 0, e.g., ϕ(t) = t²/2;
  • Step 2. For the current values θ = θj, c = cj > 0, and ξ = ξj, search for a minimizer θ* = θj+1, say, such that the function
$$M_j(\theta_*|\theta_j)=M(\theta_*|\theta_j)+\xi_j' h(\theta_*)+c_j\sum_{i=1}^{s}\phi[h_i(\theta_*)] \qquad (7)$$
    is minimized at θ* = θj+1;
  • Step 3. Increase cj to another value cj+1 > cj > 0 (e.g., cj+1 = 1.5cj) and update ξj = (ξj(1),…,ξj(s))′ to ξj+1 = (ξj+1(1),…,ξj+1(s))′ by ξj+1(i) = ξj(i) + cj+1ϕ̇[hi(θj+1)] (i = 1,…,s), where ϕ̇ denotes the derivative of ϕ(·). Update j to j + 1 and go to Step 2. The process terminates at the (j + 1)-th iteration when the maximum absolute difference between θj and θj+1 is less than a preassigned small value ε > 0.

The choice of the increasing constant sequence {cj > 0} in Step 3 above is somewhat open. It controls the magnitude of the penalty from the penalty function ϕ(·), which is related to the convergence speed of the algorithm; there is no optimal or definite rule for the choice. A rapidly increasing sequence {cj > 0} will generally result in faster convergence, but may also cause the algorithm to break down by generating non-positive definite estimates of the covariance matrices used in the algorithm. A convenient updating rule is cj+1 = γcj with a constant γ > 1. The role of {cj > 0} resembles that of the step-halving constant in iterative algorithms, which is chosen as a decreasing positive sequence when a step is too long and would otherwise make the algorithm break down; by contrast, the purpose of {cj > 0} is to increase the penalty step by step so that the iteration converges faster. For further discussion of the multiplier method, see Bertsekas (1976).
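To make Steps 1–3 concrete, the following sketch illustrates the outer multiplier loop with the quadratic penalty ϕ(t) = t²/2, for which ϕ̇(t) = t. It is only an illustration under stated assumptions: the inner minimizer minimize_M (which carries out Step 2, e.g., by the EM gradient iteration described below) and the constraint function h are user-supplied callables, and all names and default values are ours rather than part of the published algorithm.

import numpy as np

def augmented_lagrangian(theta0, minimize_M, h, gamma=1.5, c0=1.0,
                         eps=1e-6, max_outer=200):
    # Outer multiplier loop of Steps 1-3 with phi(t) = t**2/2.
    theta = theta0
    xi = np.ones(h(theta0).size)   # initial multipliers (1,...,1), as in the simulation
    c = c0
    for _ in range(max_outer):
        # Step 2: minimize M(.|theta) + xi'h(.) + c*sum_i h_i(.)**2/2 over theta*
        theta_new = minimize_M(theta, xi, c)
        # Step 3: raise the penalty constant, then update the multipliers;
        # for phi(t) = t**2/2, phi_dot(t) = t, so xi_(j+1) = xi_j + c_(j+1)*h
        c = gamma * c
        xi = xi + c * h(theta_new)
        if np.max(np.abs(theta_new - theta)) < eps:   # stopping rule of Step 3
            return theta_new
        theta = theta_new
    return theta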

The above iteration process was proved to be convergent for sufficiently large cj under mild conditions (Bertsekas, 1976). The key idea for minimizing Mj(θ*|θj) with respect to θ* at the current value θj is to find an updated value θ* = θj+1 by

$$\theta_{j+1}=\theta_j+\rho\,\Delta\theta_j, \qquad (8)$$

where ρ is the step-halving parameter (Kim & Taylor, 1995) that takes values such as 1, 1/2, 1/4,…, so that

$$M_j(\theta_{j+1}|\theta_j)\le M_j(\theta_j|\theta_j),\qquad j=1,2,\ldots \qquad (9)$$

This can be realized by using the EM gradient algorithm (Lange, 1995a, b) through choosing Δθj in (8) along the gradient direction at θ* = θj. This is related to the calculation of the first and the second derivatives of the function Mj(θ*|θj) in (7). The details on the calculation of the increment Δθj in (8) are given as follows. Let

$$\dot{M}_j(\theta_j|\theta_j)=\frac{\partial M_j(\theta_*|\theta_j)}{\partial\theta_*}\bigg|_{\theta_*=\theta_j},\quad \dot{h}(\theta_j)=\frac{\partial h'(\theta_*)}{\partial\theta_*}\bigg|_{\theta_*=\theta_j},\quad \dot{M}(\theta_j|\theta_j)=\frac{\partial M(\theta_*|\theta_j)}{\partial\theta_*}\bigg|_{\theta_*=\theta_j},\quad \dot{\phi}[h_i(\theta_j)]=\frac{\partial\phi[h_i(\theta_*)]}{\partial\theta_*}\bigg|_{\theta_*=\theta_j}. \qquad (10)$$

The following first order derivative was given by Liang and Bentler (2004):

$$\dot{M}(\theta|\theta)=N\,\Delta_W'(\Sigma_W^{-1}\otimes\Sigma_W^{-1})\,\mathrm{vec}(\Sigma_W-S_W)+G\,\tilde{\Delta}'(\tilde{\Sigma}^{-1}\otimes\tilde{\Sigma}^{-1})\,\mathrm{vec}(\tilde{\Sigma}-\tilde{S}), \qquad (11)$$

where

$$\Delta_W=\frac{\partial\,\mathrm{vec}\,\Sigma_W}{\partial\theta'},\qquad \tilde{\Delta}=\frac{\partial\,\mathrm{vec}\,\tilde{\Sigma}}{\partial\theta'},\qquad \tilde{\Sigma}=\tilde{\Sigma}(\theta_*)\big|_{\theta_*=\theta}, \qquad (12)$$

and the terms SW, Σ̃, and S̃ are the same as in (3). Hence the first derivative of the function in (7) at the current value θ* = θj is calculated by

$$\dot{M}_j(\theta_j|\theta_j)=\dot{M}(\theta_j|\theta_j)+\dot{h}(\theta_j)\,\xi_j+c_j\sum_{i=1}^{s}\dot{\phi}[h_i(\theta_j)], \qquad (13)$$

with the terms given in (10) and (11).
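As an illustration of how (11) and (13) are assembled numerically, the sketch below (ours, not code from the paper) assumes the derivative matrices ΔW and Δ̃, the current covariance matrices, the constraint values h(θj), and the Jacobian ḣ(θj) of dimension r × s have been precomputed, and uses the quadratic penalty so that the penalty term of (13) equals ḣ(θj)h(θj) by (20) below:

import numpy as np

def vec(A):
    # the vec operator in (11): stack the columns of A
    return A.reshape(-1, 1, order='F')

def gradient_13(N, G, Delta_W, Sig_W, S_W, Delta_t, Sig_t, S_t,
                h_val, h_jac, xi, c):
    Wi = np.linalg.inv(Sig_W)
    Ti = np.linalg.inv(Sig_t)
    # (11): first derivative of the E-step function
    M_dot = (N * Delta_W.T @ np.kron(Wi, Wi) @ vec(Sig_W - S_W)
             + G * Delta_t.T @ np.kron(Ti, Ti) @ vec(Sig_t - S_t))
    # (13): add the multiplier term and, via (20), the penalty term
    return M_dot + h_jac @ xi.reshape(-1, 1) + c * h_jac @ h_val.reshape(-1, 1)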

Next, we need to compute the expected value of the Hessian matrix (the second derivative) of the function in (7) at the current value θ* = θj. Let

$$\ddot{M}_j(\theta_j|\theta_j)=\frac{\partial^2 M_j(\theta_*|\theta)}{\partial\theta_*\partial\theta_*'}\bigg|_{\theta_*=\theta=\theta_j},\quad \ddot{h}_i(\theta_j)=\frac{\partial^2 h_i(\theta)}{\partial\theta\partial\theta'}\bigg|_{\theta=\theta_j},\quad \ddot{M}(\theta_j|\theta_j)=\frac{\partial^2 M(\theta_*|\theta)}{\partial\theta_*\partial\theta_*'}\bigg|_{\theta_*=\theta=\theta_j},\quad \ddot{\phi}[h_i(\theta_j)]=\frac{\partial^2\phi[h_i(\theta)]}{\partial\theta\partial\theta'}\bigg|_{\theta=\theta_j}. \qquad (14)$$

The following approximated expected Hessian matrix was given by Liang and Bentler (2004):

$$I(\theta)=E\bigg\{\frac{\partial^2}{\partial\theta_*\partial\theta_*'}M(\theta_*|\theta)\bigg|_{\theta_*=\theta}\bigg\}\approx N\,\Delta_W'\{2(\Sigma_W^{-1}S_{ew}\Sigma_W^{-1})\otimes\Sigma_W^{-1}-\Sigma_W^{-1}\otimes\Sigma_W^{-1}\}\Delta_W+G\,\tilde{\Delta}'\{2(\tilde{\Sigma}^{-1}\tilde{S}_{eb}\tilde{\Sigma}^{-1})\otimes\tilde{\Sigma}^{-1}-\tilde{\Sigma}^{-1}\otimes\tilde{\Sigma}^{-1}\}\tilde{\Delta}, \qquad (15)$$

where ΔW and Δ̃ are given by (12),

$$S_{ew}=2\Sigma_B+\Sigma_W-\sum_{g=1}^{G}\frac{N_g}{N}(A_{gw}+A_{gw}'),\qquad A_{gw}=\Sigma_{2g}\Omega_g^{-1}\Sigma_1,\qquad \Sigma_{2g}=\Big(\Sigma_{yz},\ \tfrac{1}{N_g}\Sigma_g\Big), \qquad (16)$$

with

$$\Omega_g=\begin{pmatrix}\Sigma_{zz}&\Sigma_{zy}\\ \Sigma_{yz}&\tfrac{1}{N_g}\Sigma_g\end{pmatrix},\qquad \Sigma_g=\Sigma_W+N_g\Sigma_B,\qquad \Sigma_1=(\Sigma_{yz}',\ \Sigma_B)', \qquad (17)$$

for g = 1,…,G, and

$$S_{eb}=\begin{pmatrix}\Sigma_{zz}&A_b\\ A_b'&\Sigma_B\end{pmatrix},\qquad A_b=(\Sigma_{zz},\ \Sigma_{zy})\,\bar{\Omega}^{-1}\Sigma_1,\qquad \bar{\Omega}^{-1}=\frac{1}{G}\sum_{g=1}^{G}\Omega_g^{-1}. \qquad (18)$$
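Once S_ew and S̃_eb have been formed from (16)–(18), the right-hand side of (15) is a matter of Kronecker algebra; a rough sketch (ours, with all matrices assumed conformable and precomputed):

import numpy as np

def information_15(N, G, Delta_W, Sig_W, S_ew, Delta_t, Sig_t, S_eb):
    Wi = np.linalg.inv(Sig_W)
    Ti = np.linalg.inv(Sig_t)
    within = 2.0 * np.kron(Wi @ S_ew @ Wi, Wi) - np.kron(Wi, Wi)
    between = 2.0 * np.kron(Ti @ S_eb @ Ti, Ti) - np.kron(Ti, Ti)
    return N * Delta_W.T @ within @ Delta_W + G * Delta_t.T @ between @ Delta_t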

Referring to (14) and (15), we obtain the approximated expected Hessian matrix of the function in (7):

$$H_j\overset{\mathrm{def}}{=}E[\ddot{M}_j(\theta_j|\theta_j)]=I(\theta_j)+\sum_{i=1}^{s}\xi_j^{(i)}\ddot{h}_i(\theta_j)+c_j\sum_{i=1}^{s}\ddot{\phi}[h_i(\theta_j)], \qquad (19)$$

where ξj = (ξj(1),…,ξj(s))′ contains the current values of the components of the multiplier in (7).

When taking ϕ(t) = t²/2 as the quadratic penalty function in (7), the terms ϕ̇[hi(θj)] in (10) and ϕ̈[hi(θj)] in (14) reduce to

$$\dot{\phi}[h_i(\theta_j)]=h_i(\theta_j)\,\frac{\partial h_i(\theta)}{\partial\theta}\bigg|_{\theta=\theta_j}, \qquad (20)$$

and

$$\ddot{\phi}[h_i(\theta_j)]=\bigg\{h_i(\theta)\,\frac{\partial^2 h_i(\theta)}{\partial\theta\partial\theta'}+\frac{\partial h_i(\theta)}{\partial\theta}\,\frac{\partial h_i(\theta)}{\partial\theta'}\bigg\}\bigg|_{\theta=\theta_j}, \qquad (21)$$

respectively.
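With the quadratic penalty, therefore, (20) and (21) require only hi, its gradient, and its Hessian; a minimal sketch in our notation:

import numpy as np

def penalty_grad_20(h_i, g_i):
    # (20): phi_dot[h_i] = h_i(theta) * (dh_i/dtheta)
    return h_i * g_i

def penalty_hess_21(h_i, g_i, H_i):
    # (21): phi_ddot[h_i] = h_i(theta) * (d2 h_i/dtheta dtheta') + (dh_i)(dh_i)'
    return h_i * H_i + np.outer(g_i, g_i)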

Following the EM gradient algorithm in Lange (1995a, b) and Lee and Tsang (1999), we can summarize the EM algorithm for computing the ML estimate of θ in the mean and covariance structures (2) subject to the general constraints (5) as follows.

  • Step 1. Given a current value θ = θj, a current value of the multiplier ξj = (ξj(1),…,ξj(s))′, and a constant cj > 0, construct the Lagrangian function (6);
  • Step 2. Update θj to θj+1 by (8) with
$$\Delta\theta_j=-H_j^{-1}\dot{M}_j(\theta_j|\theta_j), \qquad (22)$$
    where Ṁj(θj|θj) is computed by (13) and Hj by (19);
  • Step 3. Continue updating θj+1 as in Step 2 by (8) and (22); the iteration stops when RMSE(θj+1, θj) < ε (e.g., ε = 10^{-6}), where RMSE stands for the root mean square error defined by
$$\mathrm{RMSE}(\theta_{j+1},\theta_j)=\bigg\{\frac{1}{r}\,\|\theta_{j+1}-\theta_j\|^2\bigg\}^{1/2}, \qquad (23)$$
    in which ‖·‖ denotes the usual Euclidean norm and r is the dimension of θ (r × 1).
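The inner iteration of Steps 2–3 can be sketched as follows (a sketch only, assuming callables Mj, grad, and hess that evaluate (7), (13), and (19) at the current multipliers and penalty constant; names and tolerances are ours):

import numpy as np

def em_gradient(theta, Mj, grad, hess, eps=1e-6, max_iter=500):
    r = theta.size
    for _ in range(max_iter):
        delta = -np.linalg.solve(hess(theta), grad(theta))    # (22)
        rho, f0 = 1.0, Mj(theta)
        theta_new = theta + rho * delta
        while Mj(theta_new) > f0 and rho > 1e-8:              # enforce (9)
            rho *= 0.5                                        # step-halving
            theta_new = theta + rho * delta
        rmse = np.sqrt(np.sum((theta_new - theta) ** 2) / r)  # (23)
        theta = theta_new
        if rmse < eps:
            break
    return theta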

When the model parameters in the mean and covariance structures (2) are subject to linear constraints, the above steps can be further simplified. General linear constraints can be expressed as

$$h(\theta)=C\theta-a=0, \qquad (24)$$

where C is a known m × r matrix with rank(C) = m < r (θ : r × 1) and a is an m × 1 constant vector. Because the Lagrange multiplier method for ML estimation with linear constraints in the two-level SEM formulation (1) with mean and covariance structures (2) has been coded into EQS Version 6.1 (Bentler, 2006), we do not present the details here; instead we direct readers to the practical application of the method (see the Appendix for a sample EQS program for ML estimation with linear constraints). Note that when the general constraint (5) becomes the linear constraint (24) with ϕ(t) = t²/2 in (7), as in EQS, the first order derivative (13) reduces to

$$\dot{M}_j(\theta_j|\theta_j)=\dot{M}(\theta_j|\theta_j)+C'\xi_j+c_jC'(C\theta_j-a), \qquad (25)$$

where Ṁ(θj|θj) is given by (11) with θ = θj, and the expected Hessian matrix (19) reduces to

$$H_j=I(\theta_j)+c_jC'C, \qquad (26)$$

where I(θj) is given by (15) with θ = θj.
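In the linear case, then, nothing beyond the constraint matrix C enters (25) and (26); a two-line sketch (ours), assuming Ṁ(θj|θj) and I(θj) are available as arrays:

import numpy as np

def gradient_25(M_dot, C, a, theta, xi, c):
    return M_dot + C.T @ xi + c * C.T @ (C @ theta - a)   # (25)

def hessian_26(I_theta, C, c):
    return I_theta + c * C.T @ C                          # (26)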

A MONTE CARLO STUDY

In this section, we construct an artificial two-level SEM according to formulation (1) and generate empirical two-level data from the model under assumptions A1)–A5). We then use the algorithm in Section 3 to estimate the model parameters. The model is illustrated by the path diagram in Figure 1.

FIGURE 1. Path diagram of the model in the simulation study.

In the two-level SEM presented by Figure 1, the level-1 model is a factor analysis model with a single within factor FW that generates four indicator variables, Y1, Y2, Y3, and Y4, with independent residual errors. The level-2 model is not a pure factor analysis model: it has a single between factor FB that generates the same four indicator variables Yi (i = 1, 2, 3, 4) with independent residual errors, and FB is itself affected by two correlated observable level-2 variables, Z1 and Z2, with means μ1 and μ2, respectively. The path diagram uses the standard conventions for presenting a two-level SEM: a one-way arrow pointing to a variable indicates that the variable is influenced by another variable; a double-arrowed curve denotes a covariance (or correlation, if standardized) between two variables; and an arrow pointing to a variable without starting from another variable indicates an influence by a random residual or error. The six rectangles corresponding to Yi (i = 1, 2, 3, 4) and Zj (j = 1, 2) are the data variables to be analyzed. The notation V999 in Figure 1 is the constant unit vector, and its effects on Z1 and Z2 represent the intercepts of these two variables. The constant factor loadings “1” on the path from FW to Y1 in the within model and on the path from FB to Y1 in the between model are fixed as a reference for model identification.

There are six observable variables, Yi (i = 1, 2, 3, 4) and Zj (j = 1, 2), but only two free mean parameters, μ1 = E(Z1) and μ2 = E(Z2), so the model given by Figure 1 has a nonsaturated mean structure. Indeed, the model has r = 23 free parameters (θ1–θ8, ϕ1–ϕ13, μ1, and μ2), fewer than the 37 parameters of the saturated model (6 means, 10 within variances and nonduplicated covariances, and 21 between variances and nonduplicated covariances). The exact mean and covariance structures corresponding to those given by (2) can be expressed as

$$\begin{aligned}
&\mu=\begin{pmatrix}\mu_z\\ \mu_y\end{pmatrix},\quad \mu_z=\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix},\quad \mu_y=(\mu_b,\ \phi_1\mu_b,\ \phi_2\mu_b,\ \phi_3\mu_b)',\quad \mu_b=E(F_B)=\phi_{11}\mu_1+\phi_{12}\mu_2,\\
&\Sigma_W=\Lambda_W\Phi_W\Lambda_W'+\Psi_W,\quad \Lambda_W=(1,\theta_1,\theta_2,\theta_3)',\quad \Psi_W=\mathrm{diag}(\theta_5,\theta_6,\theta_7,\theta_8),\quad \Phi_W=\mathrm{var}(F_W)=(\theta_4),\\
&\tilde{\Sigma}_B=\mathrm{cov}\begin{pmatrix}z_g\\ \upsilon_g\end{pmatrix}=\begin{pmatrix}\Sigma_{zz}&\Sigma_{zy}\\ \Sigma_{yz}&\Sigma_B\end{pmatrix},\quad \Sigma_B=\Lambda_B\Phi_B\Lambda_B'+\Psi_B,\quad \Lambda_B=(1,\phi_1,\phi_2,\phi_3)',\\
&\Phi_B(\phi)=\mathrm{var}(F_B),\quad \Psi_B(\phi)=\mathrm{diag}(\phi_4,\phi_5,\phi_6,\phi_7),\quad \Sigma_{zz}=\begin{pmatrix}\phi_8&\phi_9\\ \phi_9&\phi_{10}\end{pmatrix},\quad \Sigma_{zy}=\Sigma_{yz}'=\big(\mathrm{cov}(Z_j,Y_i)\big),\ i=1,2,3,4;\ j=1,2, \qquad (27)
\end{aligned}$$

where

$$\begin{aligned}
&\mathrm{var}(F_B)=\upsilon_b=\phi_8\phi_{11}^2+\phi_{10}\phi_{12}^2+2\phi_9\phi_{11}\phi_{12}+\phi_{13},\\
&\mathrm{cov}(Z_1,Y_1)=\upsilon_1=\phi_8\phi_{11}+\phi_9\phi_{12},\quad \mathrm{cov}(Z_1,Y_2)=\phi_1\upsilon_1,\quad \mathrm{cov}(Z_1,Y_3)=\phi_2\upsilon_1,\quad \mathrm{cov}(Z_1,Y_4)=\phi_3\upsilon_1,\\
&\mathrm{cov}(Z_2,Y_1)=\upsilon_2=\phi_9\phi_{11}+\phi_{10}\phi_{12},\quad \mathrm{cov}(Z_2,Y_2)=\phi_1\upsilon_2,\quad \mathrm{cov}(Z_2,Y_3)=\phi_2\upsilon_2,\quad \mathrm{cov}(Z_2,Y_4)=\phi_3\upsilon_2.
\end{aligned}$$
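To make (27) concrete, the following sketch builds ΣW, var(FB), ΣB, and Σzz from the true parameter values used in the first experiment (Table 1); the variable names and zero-based indexing are ours:

import numpy as np

th = [1.0, 2.0, 2.0, 0.16, 0.25, 0.25, 0.25, 0.25]        # theta1..theta8
ph = [0.5, 0.5, 1.25, 0.36, 0.36, 0.36, 0.36,
      0.49, 0.30, 0.49, 1.0, 0.5, 0.16]                   # phi1..phi13

Lam_W = np.array([[1.0], [th[0]], [th[1]], [th[2]]])      # Lambda_W = (1, th1, th2, th3)'
Sig_W = th[3] * Lam_W @ Lam_W.T + np.diag(th[4:8])        # Sigma_W in (27)

v_b = (ph[7] * ph[10]**2 + ph[9] * ph[11]**2
       + 2 * ph[8] * ph[10] * ph[11] + ph[12])            # var(F_B) above
Lam_B = np.array([[1.0], [ph[0]], [ph[1]], [ph[2]]])      # Lambda_B = (1, ph1, ph2, ph3)'
Sig_B = v_b * Lam_B @ Lam_B.T + np.diag(ph[3:7])          # Sigma_B in (27)
Sig_zz = np.array([[ph[7], ph[8]], [ph[8], ph[9]]])       # Sigma_zz in (27)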

The first Monte Carlo experiment examines the performance of the algorithm under nonlinear constraints on the parameters of the model given by Figure 1. These constraints are defined by:

$$2\theta_1+\theta_2-\theta_3^2=0,\qquad \phi_{11}-2\phi_{12}=0,\qquad \phi_1^2+2\phi_2-\phi_3=0,\qquad \theta_1^2-\phi_1-\phi_2=0. \qquad (28)$$
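For reference, the constraint set (28) can be coded as a vector function and checked against the true values in Table 1, which satisfy it exactly (a sketch in our notation):

import numpy as np

def h28(t1, t2, t3, p1, p2, p3, p11, p12):
    # the four nonlinear constraints (28)
    return np.array([2*t1 + t2 - t3**2,
                     p11 - 2*p12,
                     p1**2 + 2*p2 - p3,
                     t1**2 - p1 - p2])

# true values of Table 1: theta1=1, theta2=2, theta3=2, phi1=.5, phi2=.5,
# phi3=1.25, phi11=1, phi12=.5
assert np.allclose(h28(1.0, 2.0, 2.0, 0.5, 0.5, 1.25, 1.0, 0.5), 0.0)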

We use the following sampling designs to generate the two-level data.

  1. Design D1: level-2 sample size G = 60, level-1 sample sizes Ng = 4 for g = 1,…,20; Ng = 6 for g = 21,…,40; Ng = 8 for g = 41,…,60;

  2. Design D2: level-2 sample size G = 60, level-1 sample sizes Ng = 8 for g = 1,…,20; Ng = 12 for g = 21,…,40; Ng = 16 for g = 41,…,60;

  3. Design D3: level-2 sample size G = 120, level-1 sample sizes Ng = 4 for g = 1,…,40; Ng = 6 for g = 41,…,80; Ng = 8 for g = 81,…,120;

  4. Design D4: level-2 sample size G = 120, level-1 sample sizes Ng = 8 for g = 1,…,40; Ng = 12 for g = 41,…,80; Ng = 16 for g = 81,…,120;

  5. Design D5: level-2 sample size G = 240, level-1 sample sizes Ng = 4 for g = 1,…,80; Ng = 6 for g = 81,…,160; Ng = 8 for g = 161,…,240;

  6. Design D6: level-2 sample size G = 240, level-1 sample sizes Ng = 8 for g = 1,…,80; Ng = 12 for g = 81,…,160; Ng = 16 for g = 161,…,240.

For each of the sampling designs, we use formulation (1) with assumptions A1)–A5) to generate the two-level data. The level-1 data υgi in formulation (1) are generated according to the level-1 model in Figure 1, and the level-2 data υg and zg in formulation (1) are generated according to the level-2 model in Figure 1. Then the two-level data {ygi : i = 1,…,Ng; g = 1,…,G} are obtained by ygi = υg + υgi as defined by formulation (1). The true values of the model parameters are chosen as in Table 1 and satisfy the nonlinear constraints (28). The normal samples are generated by the internal normal random number generator in MATLAB. In the simulation, the sequences ξj and cj [see (7)] in the algorithm for nonlinear constraints are chosen as follows: the initial values are ξ1 = (ξ1(1), ξ1(2), ξ1(3), ξ1(4))′ = (1, 1, 1, 1)′ (there are four constraints) and c1 = 1 (j = 1); then cj+1 = 1.5cj for j = 1, 2, 3,…, and ξj = (ξj(1),…,ξj(4))′ is updated by

$$\xi_{j+1}^{(i)}=\xi_j^{(i)}+c_{j+1}\dot{\phi}[h_i(\theta_{j+1})]\qquad (i=1,\ldots,4)$$

for iteration j = 1, 2, 3,… with ϕ(t) = t²/2. For each of the six sampling designs, the simulation was carried out with 200 replications. We take the average of the ML estimates for each parameter over the 200 replications and compute the standard deviation (S.D.) to show the accuracy of estimation. The results are summarized in Table 1.
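For readers wishing to replicate the experiment, the data generation under A1)–A5) amounts to drawing (zg, υg) jointly and adding independent level-1 effects; a sketch (ours; μ, Σ̃B, and ΣW are assumed to be built from the true values as in (27), and Ng is a list of level-1 sample sizes):

import numpy as np

rng = np.random.default_rng(0)

def generate_two_level(G, Ng, mu, Sig_tilde_B, Sig_W, q):
    data = []
    for g in range(G):
        # A3)-A4): draw (z_g, v_g) jointly from N(mu, Sigma_tilde_B)
        zv = rng.multivariate_normal(mu, Sig_tilde_B)
        z_g, v_g = zv[:q], zv[q:]
        # A1): level-1 effects v_gi ~ N(0, Sigma_W), i.i.d. within group g
        v_gi = rng.multivariate_normal(np.zeros(Sig_W.shape[0]), Sig_W,
                                       size=Ng[g])
        data.append((z_g, v_g + v_gi))   # y_gi = v_g + v_gi, formulation (1)
    return data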

TABLE 1.

Average Estimates with Nonlinear Constraints

θ1 θ2 θ3 θ4 θ5 θ6 θ7 θ8
True Value 1.0000 2.0000 2.0000 0.1600 0.2500 0.2500 0.2500 0.2500
A.E.(D1) 0.9977 1.9819 1.9935 0.1640 0.2466 0.2518 0.2543 0.2454
A.E.(D2) 1.0036 2.0067 2.0033 0.1604 0.2485 0.2479 0.2481 0.2491
A.E.(D3) 0.9991 1.9890 1.9966 0.1601 0.2486 0.2480 0.2535 0.2493
A.E.(D4) 1.0004 2.0123 2.0032 0.1585 0.2494 0.2492 0.2488 0.2494
A.E.(D5) 1.0001 2.0015 2.0003 0.1593 0.2505 0.2507 0.2500 0.2496
A.E.(D6) 1.0006 2.0029 2.0009 0.1594 0.2480 0.2499 0.2508 0.2500

S.D.(D1) 0.0372 0.1687 0.0526 0.0201 0.0182 0.0228 0.0415 0.0348
S.D.(D2) 0.0218 0.0911 0.0273 0.0129 0.0161 0.0155 0.0266 0.0245
S.D.(D3) 0.0174 0.1037 0.0295 0.0139 0.0179 0.0157 0.0296 0.0266
S.D.(D4) 0.0159 0.0662 0.0200 0.0092 0.0117 0.0109 0.0168 0.0141
S.D.(D5) 0.0123 0.0686 0.0192 0.0082 0.0107 0.0106 0.0179 0.0166
S.D.(D6) 0.0135 0.0479 0.0154 0.0066 0.0082 0.0081 0.0124 0.0113
ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ϕ7 ϕ8
True Value 0.5000 0.5000 1.2500 0.3600 0.3600 0.3600 0.3600 0.4900
A.E.(D1) 0.4899 0.5068 1.2595 0.3621 0.3720 0.3760 0.3429 0.4851
A.E.(D2) 0.5074 0.5002 1.2596 0.3626 0.3500 0.3611 0.3461 0.4788
A.E.(D3) 0.5003 0.4981 1.2475 0.3578 0.3565 0.3753 0.3467 0.4887
A.E.(D4) 0.4982 0.5030 1.2550 0.3488 0.3643 0.3573 0.3676 0.4910
A.E.(D5) 0.5002 0.5001 1.2508 0.3614 0.3589 0.3662 0.3596 0.4885
A.E.(D6) 0.5011 0.5003 1.2522 0.3477 0.3621 0.3520 0.3622 0.4876

S.D.(D1) 0.0765 0.0608 0.1089 0.1119 0.1101 0.0985 0.1295 0.0862
S.D.(D2) 0.0434 0.0314 0.0628 0.0759 0.0766 0.0853 0.0982 0.0867
S.D.(D3) 0.0306 0.0246 0.0521 0.0584 0.0653 0.0551 0.0844 0.0572
S.D.(D4) 0.0273 0.0245 0.0497 0.0537 0.0489 0.0644 0.0741 0.0589
S.D.(D5) 0.0212 0.0167 0.0366 0.0478 0.0421 0.0440 0.0575 0.0460
S.D.(D6) 0.0238 0.0161 0.0374 0.0439 0.0387 0.0397 0.0490 0.0429
ϕ9 ϕ10 ϕ11 ϕ12 ϕ13 μ1 μ2
True Value 0.3000 0.4900 1.0000 0.5000 0.1600 1.0000 1.0000
A.E.(D1) 0.2925 0.4908 0.9965 0.4983 0.1500 1.0030 1.0131
A.E.(D2) 0.2899 0.4751 0.9911 0.4956 0.1470 1.0066 0.9977
A.E.(D3) 0.3028 0.4974 1.0052 0.5026 0.1599 0.9973 0.9910
A.E.(D4) 0.2985 0.4854 0.9918 0.4959 0.1594 1.0115 1.0037
A.E.(D5) 0.2994 0.4918 0.9994 0.4997 0.1563 0.9970 0.9960
A.E.(D6) 0.3020 0.4916 0.9982 0.4991 0.1621 1.0037 1.0008

S.D.(D1) 0.0746 0.0927 0.0698 0.0349 0.0660 0.0851 0.0924
S.D.(D2) 0.0747 0.0894 0.0491 0.0246 0.0629 0.0986 0.0872
S.D.(D3) 0.0509 0.0680 0.0376 0.0188 0.0516 0.0606 0.0562
S.D.(D4) 0.0523 0.0637 0.0369 0.0184 0.0445 0.0575 0.0592
S.D.(D5) 0.0362 0.0443 0.0307 0.0154 0.0345 0.0433 0.0435
S.D.(D6) 0.0383 0.0487 0.0284 0.0142 0.0287 0.0451 0.0502

Note: In Table 1, A.E.(Di) = average estimate from 200 simulation replications, i = 1,…,6; S.D.(Di) = standard deviation from 200 simulation replications, i = 1,…,6. The RMSE in (23) is ≤ 10^{-4}.

The second simulation experiment shows the performance of the algorithm under linear constraints on the parameters of the model given by Figure 1. The linear constraints are defined by:

$$\theta_1+2\theta_2-\theta_3=2,\qquad \theta_3-\phi_3=0.5,\qquad \theta_1-2\phi_1=0,\qquad \phi_1+\phi_2+\phi_3-\phi_{11}=0,\qquad \theta_2+\phi_2=2,\qquad \phi_{11}-2\phi_{12}=0.5. \qquad (29)$$
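In the form (24), the constraints (29) correspond to the following C and a, restricting attention for illustration to the sub-vector of parameters they involve (the ordering is ours):

import numpy as np

# theta = (theta1, theta2, theta3, phi1, phi2, phi3, phi11, phi12)'
C = np.array([[1, 2, -1,  0, 0,  0,  0,  0],   # theta1 + 2*theta2 - theta3 = 2
              [0, 0,  1,  0, 0, -1,  0,  0],   # theta3 - phi3 = 0.5
              [1, 0,  0, -2, 0,  0,  0,  0],   # theta1 - 2*phi1 = 0
              [0, 0,  0,  1, 1,  1, -1,  0],   # phi1 + phi2 + phi3 - phi11 = 0
              [0, 1,  0,  0, 1,  0,  0,  0],   # theta2 + phi2 = 2
              [0, 0,  0,  0, 0,  0,  1, -2]])  # phi11 - 2*phi12 = 0.5
a = np.array([2.0, 0.5, 0.0, 0.0, 2.0, 0.5])

# true values of Table 2 satisfy C @ theta = a
theta_true = np.array([1.0, 1.0, 1.0, 0.5, 1.0, 0.5, 2.0, 0.75])
assert np.allclose(C @ theta_true, a)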

The sampling designs Di (i = 1,…,6) are the same as in Table 1. The results from the EM algorithm under the linear constraints (29) are summarized in Table 2.

TABLE 2.

Average Estimates with Linear Constraints

θ1 θ2 θ3 θ4 θ5 θ6 θ7 θ8
True Value 1.0000 1.0000 1.0000 1.0000 0.2500 0.2500 0.2500 0.2500
A.E.(D1) 0.9997 0.9994 0.9986 1.0073 0.2459 0.2496 0.2508 0.2500
A.E.(D2) 1.0002 0.9993 0.9988 1.0008 0.2486 0.2495 0.2496 0.2495
A.E.(D3) 0.9981 0.9996 0.9974 1.0004 0.2495 0.2510 0.2504 0.2479
A.E.(D4) 0.9988 1.0000 0.9988 0.9980 0.2495 0.2490 0.2495 0.2498
A.E.(D5) 0.9978 1.0016 1.0011 1.0017 0.2499 0.2517 0.2515 0.2499
A.E.(D6) 0.9994 1.0003 0.9999 1.0023 0.2503 0.2503 0.2502 0.2491

S.D.(D1) 0.0212 0.0116 0.0179 0.0896 0.0220 0.0271 0.0261 0.0252
S.D.(D2) 0.0150 0.0092 0.0145 0.0598 0.0168 0.0191 0.0209 0.0172
S.D.(D3) 0.0145 0.0071 0.0125 0.0627 0.0184 0.0195 0.0185 0.0181
S.D.(D4) 0.0108 0.0059 0.0093 0.0450 0.0133 0.0121 0.0124 0.0113
S.D.(D5) 0.0102 0.0052 0.0091 0.0406 0.0134 0.0146 0.0142 0.0133
S.D.(D6) 0.0081 0.0042 0.0069 0.0316 0.0100 0.0089 0.0097 0.0095
ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ϕ7 ϕ8
True Value 0.5000 1.0000 0.5000 0.1600 0.1600 0.1600 0.1600 0.3600
A.E.(D1) 0.4999 1.0006 0.4986 0.1655 0.1651 0.1663 0.1561 0.3567
A.E.(D2) 0.5001 1.0007 0.4988 0.1614 0.1539 0.1621 0.1575 0.3549
A.E.(D3) 0.4990 1.0004 0.4974 0.1621 0.1586 0.1593 0.1573 0.3535
A.E.(D4) 0.4994 1.0000 0.4988 0.1623 0.1610 0.1553 0.1613 0.3575
A.E.(D5) 0.4989 0.9984 0.5011 0.1554 0.1605 0.1625 0.1624 0.3637
A.E.(D6) 0.4997 0.9997 0.4999 0.1603 0.1547 0.1619 0.1606 0.3482

S.D.(D1) 0.0106 0.0116 0.0179 0.0591 0.0481 0.0621 0.0488 0.0634
S.D.(D2) 0.0075 0.0092 0.0145 0.0501 0.0392 0.0495 0.0392 0.0655
S.D.(D3) 0.0073 0.0071 0.0125 0.0369 0.0345 0.0387 0.0324 0.0457
S.D.(D4) 0.0054 0.0059 0.0093 0.0378 0.0313 0.0394 0.0285 0.0417
S.D.(D5) 0.0051 0.0052 0.0091 0.0261 0.0231 0.0271 0.0222 0.0328
S.D.(D6) 0.0041 0.0042 0.0069 0.0266 0.0204 0.0231 0.0193 0.0338
ϕ9 ϕ10 ϕ11 ϕ12 ϕ13 μ1 μ2
True Value 0.2000 0.3600 2.0000 0.7500 0.2500 1.0000 1.0000
A.E.(D1) 0.1951 0.3580 1.9990 0.7495 0.2444 1.0006 1.0114
A.E.(D2) 0.2008 0.3569 1.9997 0.7498 0.2391 1.0057 1.0014
A.E.(D3) 0.1983 0.3589 1.9968 0.7484 0.2466 1.0001 0.9963
A.E.(D4) 0.2021 0.3646 1.9982 0.7491 0.2526 1.0067 1.0103
A.E.(D5) 0.2007 0.3540 1.9983 0.7492 0.2545 1.0000 1.0027
A.E.(D6) 0.1913 0.3539 1.9994 0.7497 0.2487 1.0055 1.0095

S.D.(D1) 0.0553 0.0700 0.0254 0.0127 0.0978 0.0769 0.0801
S.D.(D2) 0.0552 0.0695 0.0180 0.0090 0.0706 0.0790 0.0781
S.D.(D3) 0.0384 0.0484 0.0183 0.0091 0.0611 0.0538 0.0565
S.D.(D4) 0.0384 0.0469 0.0131 0.0065 0.0570 0.0492 0.0490
S.D.(D5) 0.0207 0.0291 0.0129 0.0064 0.0493 0.0380 0.0426
S.D.(D6) 0.0277 0.0372 0.0100 0.0050 0.0373 0.0345 0.0323

Note: In Table 2, A.E.(Di) = average estimate from 200 simulation replications, i = 1,…,6; S.D.(Di) = standard deviation from 200 simulation replications, i = 1,…,6. The RMSE in (23) is ≤ 10^{-4}.

The simulation results in Tables 1 and 2 show that the performance of the algorithm under both linear and nonlinear parameter constraints is generally acceptable. Comparing the standard deviations (S.D.) row by row among the six sampling designs Di (i = 1,…,6), we can draw the following two empirical conclusions from Tables 1 and 2:

  1. The algorithm performs better for a larger level-1 sample size than for a smaller one, in the sense that the S.D. becomes smaller for larger Ng with the same G. For example, for all cases of the within parameter estimates, S.D.(D2) < S.D.(D1), S.D.(D4) < S.D.(D3), and S.D.(D6) < S.D.(D5). The between parameter estimates are not generally improved (i.e., do not generally have smaller S.D.) when simply increasing the level-1 sample sizes Ng. This verifies the theoretical results in Yuan and Bentler (2006): the level-1 sample size mainly affects the standard errors of within parameter estimates;

  2. The algorithm performs better for a larger level-2 sample size than for a smaller one, in the sense that the S.D. becomes smaller for larger G with the same Ng. For example, for all cases of the within and between parameter estimates, S.D.(D5) < S.D.(D3) < S.D.(D1), and S.D.(D6) < S.D.(D4) < S.D.(D2). Both within and between parameter estimates are generally improved (i.e., have smaller S.D.) when increasing the level-2 sample size G. This also verifies the theoretical results in Yuan and Bentler (2006): the level-2 sample size affects the standard errors of both within and between parameter estimates.

PRACTICAL ILLUSTRATION WITH LINEAR CONSTRAINTS

In this section we illustrate the algorithm in Section 3 by setting up a two-level SEM with formulation (1) and the mean and covariance structures (2), and running the model with the proposed linear constraints in EQS 6.1 (Bentler, 2006). The real data set is from the National Education Longitudinal Study (NELS:88); the full data set is available from the authors upon request. The data set contains measurements of some variables for schools and for 5,198 students nested in 235 schools. There are 21 columns in the data set. We only take the data in columns 7–10, which record the students’ scores on four math tests (level-1 observations, denoted by indicator variables Mi, i = 1, 2, 3, 4); the data in columns 11–14, which record the students’ scores on four science tests (level-1 observations, denoted by indicator variables Si, i = 1, 2, 3, 4); and the data in columns 20–21, which record the school-level (level-2) observations on two observable variables: Z1 = minority and Z2 = school type. It is empirically known that the students’ scores are affected by three within-school (level-1) factors, Fw1 = general math ability, Fw2 = general science ability, and Fw3 = general writing ability, and two between-school (level-2) factors, Fb1 = general background and Fb2 = math background. The two-level SEM is set up as:

  1. The within (level-1) model is a factor model: both Fw1 and Fw3 generate (subject to independent measurement errors, not mentioned further below) the four math tests. In addition, Fw1 also influences the second science test S2. The factor loadings of M1 on Fw1 and Fw3 are fixed at the constant “1” as a reference for model identification. Both Fw2 and Fw3 influence all four science tests Si (i = 1, 2, 3, 4). All three level-1 factors have unknown variances as free parameters; the covariances cov(Fw1, Fw3) and cov(Fw2, Fw3) are free parameters, but cov(Fw1, Fw2) = 0;

  2. The between (level-2) model is not a factor model: Fb1 influences all eight tests Mi and Si (i = 1, 2, 3, 4). Fb2 influences the four math tests Mi (i = 1, 2, 3, 4) and the science test S2. The factor loadings of M1 on Fb1 and Fb2 are fixed at the constant “1” as a reference for model identification. In addition, the two factors Fb1 and Fb2 are predicted by the two school-level variables Z1 and Z2 with unknown intercepts and correlated disturbance errors. Additional unknown intercepts are also imposed on M3 and S1, respectively, according to some prior knowledge. The observations on both Z1 and Z2 are obtained subject to correlated random residuals. The following covariances are free parameters: the covariance between the two disturbance errors for Fb1 and Fb2, the covariance between the residuals for M3 and Z2, and the covariance between the residuals for Z1 and Z2.

The above measurement relationships can be visualized in Figure 2, using the same conventions as in Figure 1. Some nonzero covariances in the within and between models, as mentioned above, were released as free parameters based on preliminary EQS runs using the Lagrange Multiplier Test (command LMTEST in EQS) for adding parameters. Note that this test should not be confused with the use of augmented Lagrangians in optimization as discussed above. See, for example, Buse (1982) for a discussion of this test, which is asymptotically equivalent to the chi-square difference test and the Wald test.

FIGURE 2. Path diagram of the model for the selected school data.

In the model given by Figure 2, there are a total of ten indicator variables: Mi and Si (i = 1, 2, 3, 4), Z1, and Z2. But there are only six free mean parameters: the two means of Z1 and Z2, the two nonzero intercepts from the prediction equations of Fb1 and Fb2, and the two nonzero intercepts from the prediction equations of M3 and S1. Therefore we have a nonsaturated mean structure for this model. The mean and covariance structures for the model given by Figure 2 can be expressed as functions of the model parameters (indicated by the star sign “*” in Figure 2), like those in (27). Since some of the latent factors in the model have almost the same sets of indicator variables, equality of factor loadings on a given factor is used to help differentiate or identify the various factors. For example, the four variables loading on Fw1 are constrained to have equal loadings. These considerations, along with some preliminary EQS runs, yield 10 equality constraints overall. Thus we set up the model of Figure 2 in EQS incorporating these 10 restrictions, which are shown in the Appendix in the sections labeled /CONSTRAINTS. The EQS output provides information on the statistical adequacy of these constraints as follows.

As expected, the p-values in Table 3 show that all chi-square tests for the equality constraints are insignificant at the usual statistical levels (1%, 5%, and 10%), so none of the null hypotheses of equality should be rejected; that is, the constraints are suitable for the data. The ML estimates under the above ten constraints are provided by the following EQS output for the measurement equations, where a number followed by the star sign “*” is an estimate of a factor loading parameter, a number with “*” before the variable V999 is an estimate of an intercept or mean parameter, F1=Fw1, F2=Fw2, and F3=Fw3 in the within model, and F1=Fb1 and F2=Fb2 in the between model.

  • Measurement equations for the within model:
    M1 = V7 = 1.000 F1 + 1.000 F3 + 1.000 E1
    M2 = V8 = 1.039*F1 + 1.047*F3 + 1.000 E2
    M3 = V9 = 1.039*F1 + .682*F3 + 1.000 E3
    M4 = V10 = 1.039*F1 + 1.047*F3 + 1.000 E4
    S1 = V11 = 1.000 F2 + .036*F3 + 1.000 E5
    S2 = V12 = 1.039*F1 + .699*F2 + .180*F3 + 1.000 E6
    S3 = V13 = .699*F2 + .180*F3 + 1.000 E7
    S4 = V14 = .699*F2 + .180*F3 + 1.000 E8        (30)
  • Measurement equations for the between model:
    M1 = V7 = 1.000 F1 + 1.000 F2 + 1.000 E1
    M2 = V8 = 1.070*F1 + 1.276*F2 + 1.000 E2
    M3 = V9 = .596*F1 + .454*F2 + .655*V999 + 1.000 E3
    M4 = V10 = 1.123*F1 + .498*F2 + 1.000 E4
    S1 = V11 = .961*F1 + .324*V999 + 1.000 E5
    S2 = V12 = .855*F1 + .332*F2 + 1.000 E6
    S3 = V13 = .855*F1 + 1.000 E7
    S4 = V14 = .855*F1 + 1.000 E8
    Z1 = V20 = 4.630*V999 + 1.000 E9
    Z2 = V21 = 1.170*V999 + 1.000 E10
    F1 = .159*V20 + .019*V21 + 3.373*V999 + 1.000 D1
    F2 = .061*V20 - .048*V21 - .107*V999 + 1.000 D2        (31)

TABLE 3.

LMTEST for the Equality Constraints

CONSTRAINTS FROM GROUP 1 (within model)
CONSTRAINT # CONSTRAINT χ2(1)-statistic p-Value
CONSTR: 1 (M2,F1)−(M3,F1)=0; .001 .979
CONSTR: 2 (M2,F1)−(M4,F1)=0; .000 .985
CONSTR: 3 (M2,F1)−(S2,F1)=0; .288 .592
CONSTR: 4 (M2,F3)−(M4,F3)=0; .024 .876
CONSTR: 5 (S2,F3)−(S3,F3)=0; .033 .857
CONSTR: 6 (S2,F3)−(S4,F3)=0; .033 .857
CONSTR: 7 (S2,F2)−(S3,F2)=0; .001 .972
CONSTR: 8 (S2,F2)−(S4,F2)=0; .003 .956
CONSTRAINTS FROM GROUP 2 (between model)
CONSTR: 9 (S2,F1)−(S3,F1)=0; .849 .357
CONSTR: 10 (S2,F1)−(S4,F1)=0; .849 .357

Note: In Table 3, F1=Fw1, F2=Fw2, and F3=Fw3 in the within model; F1=Fb1, and F2=Fb2 in the between model. EQS notation such as (M2,F1) denotes the factor loading parameter of the second math test M2 on the first within-level factor F1=Fw1=general math ability.

As can be seen above, the equality constraints imposed on the model were implemented during estimation, that is, the relevant parameter estimates are in fact equal. For example, the value 1.039 describes four of the within factor loadings (on F1, or Fw1 in the diagram). The ML estimates for the variance-covariance parameters in the within and between models are summarized in Table 4.

TABLE 4.

ML Estimates for the Variance-Covariance Parameters

Within Model Variances-Covariances Parameter Estimates
E1 E2 E3 E4 E5 E6 E7 E8 F1 F2 F3
Variance .344 .257 1.328 2.463 .605 .629 .607 2.403 .153 .495 .887
Covariance cov(F1,F3)=−.146 cov(F2,F3)=.464

Between Model Variances-Covariances Parameter Estimates
E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 D1 D2
Variance .016 .004 .033 .037 .017 .028 .009 .025 4.437 .142 .114 .023
Covariance cov(D1,D2)=−.005 cov(E3,E10)=−.024 cov(E9,E10)=−.365

Note: In Table 4, the residuals E1, E2, …, E8, and the three level-1 factors F1, F2, and F3 in the within model are the same as those in the measurement equations given by (30); the residuals E1, E2, …, E10, and the two disturbances D1 and D2 in the between model are the same as those in the measurement equations given by (31). All nonzero covariances are statistically significant at level 5% based on the individual t-test provided by EQS.

In addition to the ML parameter estimates subject to the linear constraints in Table 3, EQS provides further information on the model in Figure 2. An important statistic is the model chi-square = 53.8 with 48 degrees of freedom and a p-value = .261, which implies that the model is suitable for the selected data. The model contains the nonzero covariances presented in Table 4, all of which are statistically significant at the 5% level. The LMTEST furthermore showed that all remaining covariances between any two residuals are statistically insignificant at the 5% level, so it is not necessary to release more covariance parameters. Finally, the overall necessity of using a two-level model rather than a one-level model can be evaluated by the model-based intraclass correlations, that is, the proportions σ²B/(σ²B + σ²W) of the total (within plus between) variance attributable to the between level. In this example, variables M1–M4 and S1–S4 have intraclass correlations of .200, .224, .067, .092, .167, .150, .150, and .064, respectively. The between-school effects on individual performance clearly should not be ignored. The above interpretation of the output may not be easy to follow for readers who are not familiar with structural equation models; a good reference is Bentler (2006).

CONCLUDING REMARKS

The algorithm in this paper is a mixture of the EM approach (Dempster, Laird, & Rubin, 1977) and the Lagrange multiplier method. There is no guarantee that the final solution is the global maximum of the likelihood function under the parameter constraints, and a theoretical discussion of the global convergence of this kind of mixed algorithm is beyond the scope of this paper. We are also unable to provide a comparison between our algorithm and possible alternatives, as we have not found parallel algorithms for handling the same model in the literature. However, the simulation experiments and the practical illustration provide some evidence that the current approach is feasible. Since complicated functional constraints with mean and covariance structures are not usually proposed in two-level statistical models, our evaluation was limited to simulation results. Linear constraints, especially equality constraints, are much more frequent in applications, and we were able to illustrate them with the real school data set using the publicly available EQS program (Bentler, 2006). As a result, anyone with a working knowledge of EQS can easily run the program given in the Appendix or, by referring to the EQS manual (Bentler, 2006), write their own setups for analysis of two-level SEM with both mean and covariance structures subject to general linear constraints on parameters.

This paper focused on developing an algorithm for estimating parameters subject to general constraints in more complicated two-level SEM. Asymptotic properties, such as the asymptotic normality and standard errors of the parameter estimates, and goodness-of-fit tests for justifying the imposed constraints, can also be developed. Due to space limitations, these topics are left to future research.

ACKNOWLEDGMENTS

The research was supported by National Institute on Drug Abuse grant DA01070, a grant from the Hong Kong Baptist University (project number FRG/07-08/II-35), and University of New Haven 2009 and 2010 Summer Research Grants.

APPENDIX

EQS Input Program for the model in Figure 2

/TITLE

Two-level analysis for the school data

(Two factors in the between model, three factors in the within model, 8 y-variables)

WITHIN MODEL FIRST

/SPECIFICATION

data=’school.dat’; case =5198; variable=21; method=ml;

matrix=raw; GROUP=2; analysis=covariance; MULTILEVEL=ML; CLUSTER=V19;

/LABELS

V7=M1; V8=M2; V9=M3; V10=M4; V11=S1; V12=S2; V13=S3; V14=S4;

V19=SCHOOL; V20=Z1; V21=Z2;

! F1=Fw1=math ability factor; F2=Fw2=science ability factor;

! F3=Fw3=general writing ability factor

/EQUATIONS

M1=1F1+1F3+E1; M2=*F1+*F3+E2; M3=*F1+*F3+E3; M4=*F1+*F3+E4;

S1=1F2+*F3+E5; S2=*F1+*F2+*F3+E6; S3=*F2+*F3+E7; S4=*F2+*F3+E8;

/VARIANCES

E1-E8=*; F1=*; F2=*; F3=*;

/COVARIANCES

F1,F2=0; F2,F3=0*; F1,F3=0*;

/CONSTRAINTS

(M2,F1)=(M3,F1)=(M4,F1)=(S2,F1); (M2,F3)=(M4,F3);

(S2,F3)=(S3,F3)=(S4,F3); (S2,F2)=(S3,F2)=(S4,F2);

/END

/TITLE

BETWEEN MODEL

/LABELS

V7=M1; V8=M2; V9=M3; V10=M4; V11=S1; V12=S2; V13=S3; V14=S4;

V19=SCHOOL; V20=Z1; V21=Z2;

! F1=Fb1=general background factor; F2=Fb2=math background factor;

/EQUATIONS

M1=1F1+1F2+E1; M2=*F1+*F2+E2; M3=*V999+*F1+*F2+E3; M4=*F1+*F2+E4;

S1=*V999+*F1+E5; S2=*F1+*F2+E6; S3=*F1+E7; S4=*F1+E8;

F1=*V999+*Z1+*Z2+D1; F2=*V999+*Z1+*Z2+D2; Z1=*V999+E9; Z2=*V999+E10;

/VARIANCES

E1-E10=0*; D1-D2=*;

/COVARIANCES

D1,D2=0*; E3,E10=0*; E9,E10=0*;

/CONSTRAINTS

(S2,F1)=(S3,F1)=(S4,F1);

/tech

itr=200; con=.000001;

/LMTEST

set=pee;

/END

Contributor Information

Peter M. Bentler, University of California, Los Angeles.

Jiajuan Liang, University of New Haven.

Man-Lai Tang, Hong Kong Baptist University.

Ke-Hai Yuan, University of Notre Dame.

REFERENCES

  1. Aitchison J, Silvey SD. Maximum likelihood estimation of parameters subject to restraints. The Annals of Mathematical Statistics. 1958;29:813–828.
  2. Bentler PM. EQS 6 Structural Equations Program Manual. Encino, CA: Multivariate Software, Inc.; 2006. ISBN 1-885889-03-7 (www.mvsoft.com).
  3. Bertsekas DP. Multiplier method: A survey. Automatica. 1976;12:133–145.
  4. Buse A. The likelihood ratio, Wald and Lagrange multiplier tests: An expository note. American Statistician. 1982;36:153–157.
  5. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society (Series B). 1977;39:1–38.
  6. du Toit S, du Toit M. Multilevel structural equation modeling. In: de Leeuw J, Meijer E, editors. Handbook of Multilevel Analysis (Chapter 12). New York: Springer Verlag; 2008.
  7. Jamshidian M. On algorithms for restricted maximum likelihood estimation. Computational Statistics & Data Analysis. 2004;45:137–157.
  8. Kim DK, Taylor JMG. The restricted EM algorithm for maximum likelihood estimation under linear restrictions on parameters. Journal of the American Statistical Association. 1995;90:708–716.
  9. Lange K. A gradient algorithm locally equivalent to the EM algorithm. Journal of the Royal Statistical Society (Series B). 1995a;57:425–437.
  10. Lange K. A quasi-Newton acceleration of the EM algorithm. Statistica Sinica. 1995b;5:1–8.
  11. Lee SY. Constrained estimation in covariance structure analysis. Biometrika. 1979;66:539–545.
  12. Lee SY, Poon WY. Analysis of two-level structural equation models via EM type algorithms. Statistica Sinica. 1998;8:749–766.
  13. Lee SY, Tsang SY. Constrained maximum likelihood estimation of two-level covariance structure model via EM type algorithms. Psychometrika. 1999;64:435–450.
  14. Liang J, Bentler PM. A new EM algorithm for fitting two-level structural equation models. Psychometrika. 2004;69:101–122.
  15. McDonald RP, Goldstein H. Balanced versus unbalanced designs for linear structural relations in two-level data. British Journal of Mathematical & Statistical Psychology. 1989;42:215–232.
  16. McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. New York: Wiley; 1997.
  17. Muthén BO. Multilevel covariance structure analysis. Sociological Methods & Research. 1994;22:376–398.
  18. Muthén LK, Muthén BO. Mplus Version 3 User’s Guide. Los Angeles: Muthén & Muthén; 2004.
  19. Raudenbush SW. Maximum likelihood estimation for unbalanced multilevel covariance structure models via the EM algorithm. British Journal of Mathematical & Statistical Psychology. 1995;48:359–370.
  20. Yuan K-H, Bentler PM. Asymptotic robustness of standard errors in multilevel structural equation models. Journal of Multivariate Analysis. 2006;97:1121–1141.
  21. Yung YF, Bentler PM. On added information for ML factor analysis with mean and covariance structures. Journal of Educational & Behavioral Statistics. 1999;24:1–20.
