A two-level structural equation model approach for analyzing multivariate longitudinal responses

Xin-Yuan Song; Sik-Yum Lee; Yih-Ing Hser

doi:10.1002/sim.3266

Author manuscript; available in PMC: 2010 Mar 10.

Published in final edited form as: Stat Med. 2008 Jul 20;27(16):3017–3041. doi: 10.1002/sim.3266

PMCID: PMC2836235 NIHMSID: NIHMS172524 PMID: 18416447

SUMMARY

The analysis of longitudinal data to study changes in variables measured repeatedly over time has received considerable attention in many fields. This paper proposes a two-level structural equation model for analyzing multivariate longitudinal responses that are mixed continuous and ordered categorical variables. The first-level model is defined for measures taken at each time point nested within individuals for investigating their characteristics that are changed with time. The second level is defined for individuals to assess their characteristics that are invariant with time. The proposed model accommodates fixed covariates, nonlinear terms of the latent variables, and missing data. A maximum likelihood (ML) approach is developed for the estimation of parameters and model comparison. Results of a simulation study indicate that the performance of the ML estimation is satisfactory. The proposed methodology is applied to a longitudinal study concerning cocaine use.

Keywords: latent variables, longitudinal study on cocaine use, maximum likelihood, MCEM algorithm, model comparison, ordered categorical variables

1. INTRODUCTION

In public health and biomedical sciences, it is common to encounter latent variables or constructs that cannot be directly measured by a single observed variable but are instead assessed through a number of observed variables. Structural equation models (SEMs) [1] are a flexible class of models for multivariate correlated data that explain the interrelationships among observed and latent variables. In general, SEMs combine the ideas of factor analysis and regression and consist of two major components. The first component is the measurement equation, which is basically a confirmatory factor analysis model that groups correlated observed variables to 'measure' their corresponding latent variables (factors) while taking measurement errors into account. The second component is a regression-type structural equation for assessing the effects of independent latent variables on dependent latent variables of interest. As the number of latent variables is much smaller than the number of original observed variables, a structural equation with latent variables has advantages over an ordinary regression model with the original observed variables. Through the use of user-friendly software [2, 3], SEMs have been extensively applied to behavioral, educational, and social–psychological sciences in the past years and to biological and medical sciences in recent years. The basic theory and statistical properties of SEMs were mainly developed in the field of psychometrics for assessing latent constructs. Recently, advanced SEM methodologies have also appeared in mainstream statistical journals; see, for example, [4–10]. Moreover, Sanchez et al. [11] presented a review of the basic theory, including its connection to some latent variable models.

In this paper, we propose a novel, dynamic two-level SEM to analyze multivariate response variables measured at multiple time points. More specifically, let u_gt be a random vector for the gth (g = 1, …, G) individual measured at time point t (t = 1, …, T). A two-level model for u_gt will be defined by u_gt = y_g + v_gt, where y_g is the second-level random vector that is independent of t, and v_gt is the first-level random vector that accounts for characteristics that change dynamically with t. A confirmatory factor analysis model with fixed covariates will be defined for y_g to account for characteristics that are invariant with t. A nonlinear SEM with fixed covariates will be defined for v_gt, in which the latent variables at every time point are divided into independent and dependent latent variables, and their dynamic relationships over time are assessed by a flexible autoregressive nonlinear structural equation (see equation (5)). Here, latent variables are used to represent latent traits that are related to several observed variables (indicators). To cope with the complex nature of real-world data in substantive research, the proposed model also accommodates mixed continuous and ordered categorical variables, and missing data that are missing at random. A maximum likelihood (ML) approach will be developed for analyzing the proposed model. A Monte Carlo Expectation–Maximization (MCEM) algorithm [12] will be developed for obtaining the ML estimates, and the Louis formula [13] will be used to obtain the standard error estimates. The Bayesian information criterion (BIC) will be used for model comparison. Owing to the complexity of the model, the observed-data likelihood involved in the BIC includes intractable multiple integrals. A procedure based on path sampling [14] will be developed to evaluate the observed-data likelihood.

We illustrate our methodology through a longitudinal data set about cocaine use and related phenomena. Cocaine use is a major social and health problem in the United States. According to the 2002 National Survey on Drug Use and Health [15], an estimated 34 million Americans aged 12 years or older (14.4 per cent) reported cocaine use at least once in their lifetime; among them, an estimated 2 million persons reported cocaine use in the past month. Cocaine is consistently detected in the urine specimens from approximately one-third of arrestees tested across the nation each year [16]. Furthermore, cocaine is the illegal drug mentioned most often in emergency department records [17]. There are various latent traits that influence cocaine use. Specifically, many studies have shown substantial impact of psychiatric problems on cocaine-dependent patients [18–20] and that social support is another key latent trait [21–23].

Although treatment for cocaine use has received considerable attention in recent years [24, 25], the longitudinal phenomena of cocaine use in relation to psychiatric problems and social support have yet to be adequately explicated. To illustrate the application of the proposed methodologies, we analyze the UCLA longitudinal data set collected from cocaine-dependent patients at intake, 1 year, 2 years, and 12 years after treatment [26]. As this data set involves ordered categorical outcomes with missing entries, the proposed two-level SEM is required to assess the longitudinal effects of the latent traits (psychiatric problems and social support) on cocaine use.

Although some specific SEMs or latent variable models have been developed for analyzing longitudinal data, their objectives and/or formulations are quite different from the proposed dynamic two-level nonlinear SEMs. For instance, a standard growth curve model [27, 28] can be viewed as the following confirmatory factor analysis model: x_i = Λω_i + ∊_i, where x_i is a vector of repeated measures of a univariate response variable, Λ is the factor loading matrix of sequential known values of the growth curve records, ω_i is the latent growth factor containing the 'initial status (intercept)' and 'rate of change (slope)', and ∊_i is the vector of residuals. This is a single-level linear SEM, in which the latent variables in ω_i are used to assess the characteristic of change over time rather than latent traits such as 'psychiatric problem' and 'social support' in our example. Hence, the formulation of a growth curve model is less general, and the interpretation of the latent variables is very different. Dunson [29] recently analyzed multidimensional longitudinal data by developing dynamic latent variable models, which allow mixtures of count, categorical, and continuous response variables. In this model, to assess the changes between the latent variables at different time points, the latent variable ω_it at time t for individual i was regressed on the linear terms of the past latent vectors ω_i1, …, ω_i,t−1 through a regression equation with fixed covariates. Note that Dunson's [29] model is single-level and its latent variables are not divided into independent and dependent latent variables. Compared with Dunson's single-level model, our two-level model includes a second-level model for assessing the individuals' characteristics that are invariant with time. Moreover, latent variables at the first level of our model are divided into independent and dependent latent variables, and the dynamic nonlinear effects of the independent latent variables on the dependent latent variables are assessed through a rather general autoregressive nonlinear structural equation (see equation (5)). Similarly, the multilevel model [30] for the longitudinal profiling of health-care units does not involve an autoregressive nonlinear structural equation among covariates and latent variables. Other longitudinal models in the statistics literature, such as the linear mixed model [31] and the generalized linear mixed model [32], do not involve the latent traits and their associated structural equation. Clearly, the objectives and formulations of these models and our proposed model are very different. Finally, refer to the second remark given in Section 2.3 for the differences between the proposed model and the existing multilevel SEMs.

The contributions of this paper include (i) a novel dynamic two-level nonlinear SEM with fixed covariates for analyzing longitudinal data from mixed continuous and ordered categorical measures and missing data, (ii) a novel utilization of the computational tools, such as the MCEM algorithm [12] and path sampling [14] with the derivation of some new conditional distributions and derivatives for estimation and model comparison of the proposed model, and (iii) a novel application of the developed methodologies to the longitudinal study of cocaine use.

Section 2 presents the two-level SEM for analyzing multivariate random responses with mixed continuous and ordered categorical variables measured at multiple time points. An ML approach for estimation and model comparison in the proposed model framework is discussed in Section 3. A simulation study and a real example using the longitudinal data of cocaine use are presented in Sections 4 and 5, respectively. In Section 6, we discuss some limitations and outline several extensions for further research.

2. THE SEM FOR MULTIVARIATE LONGITUDINAL DATA

Consider a set of observations of a p×1 random vector u_gt for individual g = 1, …, G, measured at multiple time points t = 1, …, T. The collection U = {u_gt : g = 1, …, G, t = 1, …, T} can be regarded as an observed sample of hierarchical observations that were measured at different time points (first level) and were nested in the individuals (second level). Hence, the following two-level SEM is proposed to model u_gt:

u_{g t} = y_{g} + v_{g t}

(1)

where y_g and v_gt are the random vectors that, respectively, model the second-level effect of the individual g and the first-level effect with respect to the individual g at time t.

2.1. The second-level model

The second-level model for assessing characteristics of individuals that are invariant over time is defined by

y_{g} = A_{0} c_{g 0} + Λ_{0} ω_{g 0} + ε_{g 0}

(2)

where c_g0 is a vector of fixed covariates, ω_g0 is a vector of latent variables, ε_g0 is a vector of residual errors, and A₀ and Λ₀ are matrices of unknown coefficients. We assume that ω_g0 is identically and independently distributed (i.i.d.) as N[0,Φ₀] and that ε_g0 is independent of ω_g0, and i.i.d. as N[0,Ψ₀], where Ψ₀ is a diagonal matrix. This is a factor analysis model with covariates [33], which is defined for studying the relationships between the observed variables in u_gt and the latent variables in ω_g0 with respect to different individuals but common to all time points.

2.2. The first-level model

We define the following measurement model for the more important first-level random vector v_gt :

v_{g t} = A_{t} c_{g t} + Λ_{t} ω_{g t} + ε_{g t}, g = 1, \dots, G, t = 1, \dots, T

(3)

in which the definitions of A_t, c_gt, Λ_t, ω_gt, and ε_gt are similar to those given in equation (2); except here they are defined at time point t nested within the individual g, and the distribution of ε_gt is N[0,Ψ_t], where Ψ_t is assumed to be a diagonal matrix for brevity. Hence, we assume that the random vector u_gt conditional on y_g has the following structure:

u_{g t} = y_{g} + A_{t} c_{g t} + Λ_{t} ω_{g t} + ε_{g t}, g = 1, \dots, G, t = 1, \dots, T

(4)

Conditional on y_g, equation (4) accounts for dependency among the observed variables measured for the individual g at a given time point t through the fixed covariates c_gt and the shared latent variables ω_gt. The relationships between the observed variables and the corresponding fixed covariates and latent variables that change dynamically over time can be assessed by estimating A_t and Λ_t.
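To make the decomposition in equations (1)–(4) concrete, the following Python sketch draws data from a simplified version of the measurement structure. It is an illustration only, not the authors' implementation: the function name `simulate_two_level` and its arguments are hypothetical, the fixed covariates are dropped (A₀ = A_t = 0), Λ_t and Ψ_t are held constant over t, and the first-level latent vectors are drawn independently across time rather than through the structural equation introduced next.

```python
import numpy as np

def simulate_two_level(G, T, Lambda0, Phi0, psi0, Lambda_t, Phi_t, psi_t, rng):
    """Draw u_gt = y_g + v_gt under simplifying assumptions (no covariates).

    psi0 and psi_t hold the diagonal entries of Psi_0 and Psi_t.
    """
    p, q0 = Lambda0.shape
    q = Lambda_t.shape[1]
    # Second level, equation (2): one draw of y_g per individual.
    omega0 = rng.multivariate_normal(np.zeros(q0), Phi0, size=G)       # (G, q0)
    eps0 = rng.normal(0.0, np.sqrt(psi0), size=(G, p))
    y = omega0 @ Lambda0.T + eps0                                       # (G, p)
    # First level, equation (3): one draw of v_gt per individual and time.
    omega = rng.multivariate_normal(np.zeros(q), Phi_t, size=(G, T))    # (G, T, q)
    eps = rng.normal(0.0, np.sqrt(psi_t), size=(G, T, p))
    v = omega @ Lambda_t.T + eps                                        # (G, T, p)
    # Equation (1): the time-invariant part y_g is shared by all t.
    u = y[:, None, :] + v
    return u, y, v
```

Because y_g enters every u_gt of the same individual, repeated measures in such a draw are correlated over time even when the first-level terms are independent.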

In the spirit of SEM, we consider a partition of ω_gt into $(η_{g t}^{'}, ξ_{g t}^{'})'$, where η_gt and ξ_gt are q₁-dimensional dependent and q₂-dimensional independent latent random vectors, respectively, and q = q₁ + q₂. Effects of the independent latent variables ξ_gt on the dependent latent variables η_gt are studied through the following autoregressive nonlinear structural equation: For g = 1, …, G, t = 1, …, T,

η_{g t} = B_{0} d_{g 0} + B_{t} d_{g t} + Γ_{t}^{*} F_{t}^{*} (ω_{g 1}, \dots, ω_{g, t - 1}, ξ_{g t}) + δ_{g t}

(5)

where B₀, B_t, and $Γ_{t}^{*}$ are matrices of unknown coefficients, d_g0 is a vector of fixed covariates, which is independent of t, d_gt is another vector of fixed covariates, δ_gt is a vector of residual errors, and $F_{t}^{*} = (f_{t 1}, \dots, f_{t r})$ is a vector-valued function in which f_tj is a differentiable function of the independent latent vector ξ_gt at the current time point and the latent vectors ω_g1, …, ω_g,t−1 at previous time points. The residual vector δ_gt is independent of ε_gt, ω_g1, …, ω_g,t−1, and ξ_gt. Moreover, these residual errors are i.i.d. N[0,Ψ_δt], where Ψ_δt is a diagonal matrix. Let $ξ_{g} = (ξ_{g 1}^{'}, \dots, ξ_{g T}^{'})'$ ; the distribution of ξ_g is assumed to be N[0,Φ], where Φ contains the variance-covariance matrix Φ_tt of ξ_gt, and the covariance matrix Φ_{t_it_j} of ξ_{gt_i} and ξ_{gt_j} at different time points t_i and t_j.

2.3. Remarks

1. The first and second terms of equation (5) allow for direct effects of observed predictors that are invariant over time (d_g0) and variant over time (d_gt) on the dependent latent variables. The term $Γ_{t}^{*} F_{t}^{*} (\cdot)$ is very important for assessing the effects of independent latent variables on dependent latent variables to change dynamically over time. It allows various flexible autoregressive structures for recovering the dependency of the dependent latent variable η_gt at the current time point and across both dependent latent variables at previous time points and independent latent variables at previous and current time points. For example, typical special cases are

(a) η_{g t} = β_{t - 1} η_{g, t - 1} + Γ_{1} ξ_{g 1} + \dots + Γ_{t - 1} ξ_{g, t - 1} + Γ_{t} F_{t} (ξ_{g t}) + δ_{g t}

(b) η_{g t} = β_{1} η_{g 1} + \dots + β_{t - 1} η_{g, t - 1} + Γ_{t} F_{t} (ξ_{g t}) + δ_{g t}

where F_t is a vector of differentiable functions of ξ_gt. In special case (a), η_gt depends on linear terms of the independent latent vectors at all previous time points, the dependent latent vector at t−1, and nonlinear terms of the independent latent variables at the current time point. The autoregressive structure (b) allows η_gt to depend on the previous endogenous latent vectors η_g1, …, η_g,t−1 and a general function F_t(ξ_gt) of the current exogenous latent vector ξ_gt. Note that the longitudinal models developed in Dunson [29] and Daniels and Normand [30] do not involve a structural equation in relation to covariates and latent variables.

2. Other multilevel SEMs have been developed to analyze hierarchical data collected from units that are nested within clusters [8, 34]. Owing to different objectives, the formulations of the first-level model for v_gt in those multilevel SEMs are quite different and do not accommodate certain important features of our dynamic model. For example, their matrices of coefficient parameters and covariance matrices of errors vary with cluster g rather than with t. Moreover, for t ≠ r, their v_gt and v_gr are assumed to be independent; however, v_gt and v_gr could be correlated in our model.

2.4. Ordered categorical data and identification of the model

From its definition, u_gt is a vector of continuous observations. However, in most substantive research, the continuous measurements of some components of u_gt may not be available. Suppose that u_gt is composed of a subvector x_gt of observed measurements and a subvector w_gt of unobserved measurements whose information is given by ordered categorical observations. In the generic sense, the relationship between an ordered categorical variable z and its underlying continuous variable w is defined by z = k+1 if α_k ≤ w < α_{k+1}, for k = 0, …, m−1, where {−∞ = α₀ < α₁ < … < α_m = ∞} is the set of threshold parameters that define the m categories. A special case with m = 2 is the dichotomous or ordered binary variable. Missing data that are missing at random (MAR, [35]) are handled using the procedure given in [36]. Details are not included here for brevity.
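The threshold rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `to_ordered_categories` is hypothetical, and only the interior thresholds α₁, …, α_{m−1} are passed in, with α₀ = −∞ and α_m = ∞ left implicit.

```python
import numpy as np

def to_ordered_categories(w, interior_thresholds):
    """Map continuous w to categories 1..m via z = k+1 if alpha_k <= w < alpha_{k+1}."""
    alpha = np.asarray(interior_thresholds, dtype=float)
    if np.any(np.diff(alpha) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    # side="right" counts the thresholds that are <= w, which is exactly k.
    return np.searchsorted(alpha, w, side="right") + 1
```

With four interior thresholds the rule produces m = 5 categories, and a value equal to a threshold falls into the upper of the two adjacent categories, as the half-open intervals require.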

The proposed model is not identified without imposing identification conditions. For example, the variance and thresholds associated with each ordered categorical variable are not identifiable; the first- and second-level covariance structures are not identified, and we cannot allow intercepts to simultaneously exist in equations (3) and (5). Existing methods suggested in the SEM literature can be adopted for identifying various components of the current model. For example, the identification problem associated with an ordered categorical variable can be solved by fixing the thresholds α₁ and/or α_{m−1} at some appropriate preassigned values [6, 36], and the covariance structures in the first- and second-level models can be identified by the common practice in structural equation modeling of fixing appropriate elements in Λ₀ and Λ_t at preassigned values [8].

3. AN ML APPROACH

3.1. Estimation

Let α be the parameter vector of the unknown thresholds, and θ be the parameter vector that contains all the unknown distinct structural parameters that are involved in equations (2), (3), and (5). The likelihood function of the observed data involves high-dimensional intractable integrals that are induced by the ordered categorical variables and latent variables. One approach to handling these integrals is adaptive quadrature [34]. In this paper, we use the idea of data augmentation and the MCEM algorithm [12] in the ML estimation.

Let z_gt = {z_{gtj}; j = 1, …, s} be a vector of ordered categorical observations, and let u_gt = {w_gt, x_gt}, where x_gt is a vector of continuous measurements and w_gt is the latent continuous measurement corresponding to z_gt. Let u_g = {u_gt; t = 1, …, T}, Z = {z_gt; g = 1, …, G, t = 1, …, T}, W = {w_gt; g = 1, …, G, t = 1, …, T}, and X = {x_gt; g = 1, …, G, t = 1, …, T}. Moreover, let $ω_{g} = (η_{g 1}^{'}, \dots, η_{g T}^{'}, ξ_{g 1}^{'}, \dots, ξ_{g T}^{'})'$, and let Ω₁ = {ω_g; g = 1, …, G}, Ω₀ = {ω_g0; g = 1, …, G}, and Y = {y_g; g = 1, …, G} be matrices of latent vectors at the first and second levels. Further, let $Π_{t} = (B_{t}, Γ_{t}^{*})$ and $h_{g t} = (d_{g t}^{'}, F_{t}^{*} (ω_{g 1}, \dots, ω_{g, t - 1}, ξ_{g t})')'$; then equation (5) can be rewritten as

η_{g t} = B_{0} d_{g 0} + Π_{t} h_{g t} + δ_{g t}

(6)

The observed data set is (X,Z). The observed-data likelihood is very complicated due to the existence of the latent quantities (Y,Ω₀,Ω₁,W). Utilizing the idea of data augmentation, we consider the complete data set, which is equal to {Y,Ω₀,Ω₁,W,X,Z} = {Y,Ω₀,Ω₁,U,Z}. The complete-data likelihood function is equal to

\begin{aligned}
& p (Y, Ω_{0}, Ω_{1}, U, Z ∣ α, θ) \\
&= p (Z ∣ Y, Ω_{0}, Ω_{1}, U, α, θ) \, p (U ∣ Y, Ω_{1}, θ) \, p (Ω_{1} ∣ θ) \, p (Y ∣ Ω_{0}, θ) \, p (Ω_{0} ∣ θ) \\
&= (2 π)^{- G T p ∕ 2} ∣ Ψ_{t} ∣^{- G T ∕ 2} \prod_{g = 1}^{G} \prod_{t = 1}^{T} \exp \Big\{ - \frac{1}{2} (u_{g t} - y_{g} - A_{t} c_{g t} - Λ_{t} ω_{g t})' Ψ_{t}^{- 1} (u_{g t} - y_{g} - A_{t} c_{g t} - Λ_{t} ω_{g t}) \Big\} I_{R_{g t}} (w_{g t}) \\
&\quad \times (2 π)^{- G T q_{1} ∕ 2} ∣ Ψ_{δ t} ∣^{- G T ∕ 2} \prod_{g = 1}^{G} \prod_{t = 1}^{T} \exp \Big\{ - \frac{1}{2} (η_{g t} - B_{0} d_{g 0} - Π_{t} h_{g t})' Ψ_{δ t}^{- 1} (η_{g t} - B_{0} d_{g 0} - Π_{t} h_{g t}) \Big\} \\
&\quad \times (2 π)^{- G q_{2} ∕ 2} ∣ Φ ∣^{- G ∕ 2} \prod_{g = 1}^{G} \exp \Big\{ - \frac{1}{2} ξ_{g}^{'} Φ^{- 1} ξ_{g} \Big\} \\
&\quad \times (2 π)^{- G q ∕ 2} ∣ Φ_{0} ∣^{- G ∕ 2} \prod_{g = 1}^{G} \exp \Big\{ - \frac{1}{2} ω_{g 0}^{'} Φ_{0}^{- 1} ω_{g 0} \Big\} \\
&\quad \times (2 π)^{- G p ∕ 2} ∣ Ψ_{0} ∣^{- G ∕ 2} \prod_{g = 1}^{G} \exp \Big\{ - \frac{1}{2} (y_{g} - A_{0} c_{g 0} - Λ_{0} ω_{g 0})' Ψ_{0}^{- 1} (y_{g} - A_{0} c_{g 0} - Λ_{0} ω_{g 0}) \Big\}
\end{aligned}

(7)

where I_{R_gt}(w_gt) is an indicator function that takes the value 1 if w_gt ∈ R_gt and zero otherwise, and R_gt = [α_{1,z_gt1}, α_{1,z_gt1+1}) × … × [α_{s,z_gts}, α_{s,z_gts+1}). Note that for every component w_gtj in w_gt, there exists one and only one interval [α_{j,z_gtj}, α_{j,z_gtj+1}) that contains w_gtj. Hence, the indicator function and the corresponding value of the density function are nonzero. It can be shown from (7) that the complete-data log-likelihood function can be expressed as L_c(Y,Ω₀,Ω₁,U,Z;α,θ) = L₁ + L₂, where

\begin{aligned}
L_{1} = - \frac{1}{2} \sum_{g = 1}^{G} \Big[ & q_{2} \log (2 π) + \log ∣ Φ ∣ + ξ_{g}^{'} Φ^{- 1} ξ_{g} + \sum_{t = 1}^{T} \Big\{ (p + q_{1}) \log (2 π) + \log ∣ Ψ_{t} ∣ + \log ∣ Ψ_{δ t} ∣ \\
& + (u_{g t} - y_{g} - A_{t} c_{g t} - Λ_{t} ω_{g t})' Ψ_{t}^{- 1} (u_{g t} - y_{g} - A_{t} c_{g t} - Λ_{t} ω_{g t}) \\
& + (η_{g t} - B_{0} d_{g 0} - Π_{t} h_{g t})' Ψ_{δ t}^{- 1} (η_{g t} - B_{0} d_{g 0} - Π_{t} h_{g t}) \Big\} \Big] \quad \text{for } w_{g t} \in R_{g t}
\end{aligned}

(8)

\begin{aligned}
L_{2} = - \frac{1}{2} \sum_{g = 1}^{G} \Big\{ & (p + q) \log (2 π) + \log ∣ Ψ_{0} ∣ + \log ∣ Φ_{0} ∣ + ω_{g 0}^{'} Φ_{0}^{- 1} ω_{g 0} \\
& + (y_{g} - A_{0} c_{g 0} - Λ_{0} ω_{g 0})' Ψ_{0}^{- 1} (y_{g} - A_{0} c_{g 0} - Λ_{0} ω_{g 0}) \Big\}
\end{aligned}

(9)

Note that L_c decomposes into two separable functions, each with distinct sets of parameters. This property simplifies the estimation significantly, although due to the complex function h_gt, L₁ is still rather complicated. ML estimates of α and θ are obtained using the EM algorithm that consists of the following E-step and M-step at the lth iteration: E-step: Evaluate Q(α,θ|α^(l),θ^(l))=E{L_c(Y,Ω₀,Ω₁,U,Z;α,θ)|X,Z,α^(l),θ^(l)}, in which the expectation is taken with respect to the conditional distribution of (Y,Ω₀,Ω₁,W) given (X,Z) at (α^(l),θ^(l)). M-step: Determine (α^(l+1),θ^(l+1)) by maximizing Q(α,θ|α^(l),θ^(l)).

Evaluation of the E-step is rather complicated because the conditional expectations involve intractable multiple integrals. Inspired by the MCEM algorithm [12], we solve this problem by simulating a sufficiently large number of observations from the corresponding conditional distributions using a hybrid algorithm that combines the Gibbs sampler [37] and the Metropolis–Hastings (MH) algorithm [38, 39]. In this algorithm, observations are sampled iteratively from the following conditional distributions: p(Y|Ω₀,Ω₁,U,Z,α,θ), p(Ω₀|Y,Ω₁,U,Z,α,θ), p(Ω₁|Y,Ω₀,U,Z,α,θ), and p(W|Y,Ω₀,Ω₁,X,Z,α,θ). Expressions for these conditional distributions are provided in Appendix A.

The M-step updates unknown parameters by maximizing the conditional expectations obtained in the E-step. Structural parameter θ can be updated by maximizing the conditional expectation of L₁ and L₂ with respect to θ by solving the following system of equations:

\frac{\partial Q (α, θ ∣ α^{(l)}, θ^{(l)})}{\partial θ} = E \left\{ \frac{\partial}{\partial θ} (L_{1} + L_{2}) ∣ X, Z, α^{(l)}, θ^{(l)} \right\} = 0

(10)

Threshold parameters α can be updated according to the method described in Shi and Lee [6], and Lee and Song [8]. The M-step is completed by conditional maximization. Technical details on the derivatives of Q(α,θ|α^(l),θ^(l)) with respect to components of θ and the corresponding solutions are given in Appendix B. The convergence of the MCEM algorithm is monitored by the following method proposed in Shi and Copas [40]. After the m₀th MCEM iteration, $\bar{θ}^{(l)} = (θ^{(l - m_{0} + 1)} + \dots + θ^{(l)}) ∕ m_{0}$ is computed, and the convergence is monitored using the following stopping rule: For given small values $δ_{1}^{*}$ and $δ_{2}^{*}$ (e.g. 0.001), the procedure is stopped if $‖ \bar{θ}^{(l)} - \bar{θ}^{(l - γ_{0}^{*})} ‖ ∕ (‖ \bar{θ}^{(l - γ_{0}^{*})} ‖ + δ_{1}^{*})$ is smaller than $δ_{2}^{*}$. To avoid the danger of stopping early, a value of $γ_{0}^{*}$ larger than 2 is suggested. Convergence is claimed after the stopping rule is satisfied for several consecutive iterations, and $\bar{θ}^{(l)}$ is then taken to be the ML estimate. The sample observations of Ω₀ and Ω₁ simulated at the last iteration of the MCEM algorithm provide estimates of the latent variables. Shi and Copas [40] argued that for a sufficiently large m₀, the average of the associated Monte Carlo errors is negligible. To reduce the bias, an m₀ that keeps the Monte Carlo errors within a tolerable limit is taken. In the simulation study and the example given in Sections 4 and 5, we take m₀ = 50. Finally, good starting values of some parameters (such as Λ_t and Π_t) could be obtained from the estimates achieved through separate analysis of the individual model at each time point. As each individual model only involves a comparatively small number of parameters, little computing effort is required.
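The averaged-iterate stopping rule can be sketched as follows. This is an illustrative reading of the rule rather than the authors' code: the names `averaged_iterate` and `stopping_rule_met` are hypothetical, `history` is assumed to be an array whose row i stores the parameter vector from MCEM iteration i (0-based), and the Euclidean norm is assumed because the text does not specify one.

```python
import numpy as np

def averaged_iterate(history, l, m0):
    """Average of the m0 most recent parameter vectors up to iteration l."""
    return np.mean(history[l - m0 + 1 : l + 1], axis=0)

def stopping_rule_met(history, l, m0, gamma0, delta1=1e-3, delta2=1e-3):
    """Relative change between averaged iterates that are gamma0 iterations apart."""
    if l - gamma0 - m0 + 1 < 0:
        return False  # not enough iterations yet to form both averages
    current = averaged_iterate(history, l, m0)
    previous = averaged_iterate(history, l - gamma0, m0)
    ratio = np.linalg.norm(current - previous) / (np.linalg.norm(previous) + delta1)
    return ratio < delta2
```

In line with the text, convergence would be claimed only after this check returns True for several consecutive values of l.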

Standard error estimates of the structural parameters in θ are obtained by the following Louis formula [13] through the simulated observations and the ML estimates:

- \frac{\partial^{2} L (X, Z ∣ α, θ)}{\partial θ \partial θ^{T}} = E \left\{ - \frac{\partial^{2} L (Y, Ω_{0}, Ω_{1}, U, Z ∣ α, θ)}{\partial θ \partial θ^{T}} \right\} - Var \left\{ \frac{\partial L (Y, Ω_{0}, Ω_{1}, U, Z ∣ α, θ)}{\partial θ} \right\}

where the expectation and variance are taken with respect to the conditional distribution of (Y,Ω₀,Ω₁,W) given (X,Z) and (α,θ), and the whole expression is evaluated at $(\hat{α}, \hat{θ})$. These quantities are difficult to evaluate analytically due to the existence of the latent quantities, but they can be approximated, respectively, by the sample mean and the sample covariance matrix computed from the random sample $\{(Y^{(j)}, Ω_{0}^{(j)}, Ω_{1}^{(j)}, W^{(j)}); j = 1, \dots, J\}$ generated from $p (Y, Ω_{0}, Ω_{1}, W ∣ X, Z, \hat{α}, \hat{θ})$ using the proposed hybrid algorithm. Details are not given to save space.
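The Monte Carlo approximation of the Louis formula can be sketched as below. The sketch assumes that, for each of the J simulated draws, the complete-data score vector and the negative complete-data Hessian have already been evaluated at the ML estimates; how they are obtained depends on the model and is not shown. The function names `louis_observed_information` and `standard_errors` are hypothetical.

```python
import numpy as np

def louis_observed_information(neg_hessians, scores):
    """Approximate E{-Hessian} - Var{score} from J simulated draws.

    neg_hessians: array of shape (J, d, d), negative complete-data Hessians.
    scores:       array of shape (J, d), complete-data score vectors.
    """
    neg_hessians = np.asarray(neg_hessians, dtype=float)
    scores = np.asarray(scores, dtype=float)
    expected_information = neg_hessians.mean(axis=0)          # sample mean
    score_covariance = np.atleast_2d(np.cov(scores, rowvar=False))  # sample covariance
    return expected_information - score_covariance

def standard_errors(observed_information):
    """Square roots of the diagonal of the inverse observed information."""
    return np.sqrt(np.diag(np.linalg.inv(observed_information)))
```

The subtraction reflects the missing-information principle: the information in the observed data equals the complete-data information minus the part lost because the latent quantities are unobserved.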

3.2. Model comparison

Model comparison is an important issue in SEM analysis. For the current two-level SEM, it is interesting to evaluate competing models that correspond to various hypotheses about the characteristics of the parameters with respect to time changes. In this article, competing models M₁ and M₂ are compared with the following BIC:

{BIC}_{12} = - 2 \left\{ \log p (X, Z ∣ {\hat{α}}_{1}, {\hat{θ}}_{1}, M_{1}) - \log p (X, Z ∣ {\hat{α}}_{2}, {\hat{θ}}_{2}, M_{2}) \right\} + (d_{1} - d_{2}) \log (G T)

(11)

where $p (X, Z ∣ {\hat{α}}_{k}, {\hat{θ}}_{k}, M_{k})$ is the observed-data likelihood evaluated at the ML estimates of α_k and θ_k under M_k, and d_k is the dimension of θ_k. The criterion given in Kass and Raftery [41] can be used for interpreting BIC₁₂.
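Once the two observed-data log-likelihoods are available, equation (11) is a one-line computation. The helper below is a hypothetical sketch (the name `bic_12` and the example numbers in the comment are illustrative, not results from the paper).

```python
import math

def bic_12(loglik_1, loglik_2, d1, d2, G, T):
    """BIC_12 from equation (11), with sample-size term log(G * T)."""
    return -2.0 * (loglik_1 - loglik_2) + (d1 - d2) * math.log(G * T)

# Illustrative call with made-up inputs: bic_12(-1000.0, -1010.0, 12, 10, 500, 3)
```

Because BIC₁₂ is the difference between the BIC of M₁ and that of M₂, a negative value would favor M₁ under the usual convention that a smaller BIC is preferred; the strength of evidence is then read from the guidelines in [41].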

In obtaining BIC₁₂, we must compute the observed-data log-likelihood $\log p (X, Z ∣ {\hat{α}}_{k}, {\hat{θ}}_{k}, M_{k})$, which involves complicated multiple integrals. Inspired by its successful application in computing the observed-data likelihood for complex latent variable models [8, 42], we apply path sampling [14] to compute BIC₁₂. Path sampling is a generalization of bridge sampling [43] and gives more accurate results in computing complicated multiple integrals. It is conceptually simple, easy to implement, and applicable to a wide range of models [14].
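The idea behind path sampling can be illustrated on a toy problem that is unrelated to the SEM itself. The sketch below estimates the log ratio of the normalizing constants of two unnormalized zero-mean Gaussian densities with standard deviations s0 and s1, using the identity log{z(1)/z(0)} = ∫₀¹ E_t[∂ log q(x|t)/∂t] dt along a geometric path. All names and settings (`path_sampling_log_ratio`, the grid size, the number of draws) are assumptions chosen for illustration; in this toy case the exact answer is log(s1/s0), which makes the approximation easy to check.

```python
import numpy as np

def path_sampling_log_ratio(s0, s1, n_grid=21, n_draws=20000, seed=0):
    """Toy path sampling between q0(x)=exp(-x^2/(2 s0^2)) and q1(x)=exp(-x^2/(2 s1^2))."""
    rng = np.random.default_rng(seed)
    t_grid = np.linspace(0.0, 1.0, n_grid)
    u_bar = np.empty(n_grid)
    for i, t in enumerate(t_grid):
        # Geometric path: q(x|t) = q0(x)^(1-t) * q1(x)^t is Gaussian with this precision.
        precision = (1.0 - t) / s0**2 + t / s1**2
        x = rng.normal(0.0, 1.0 / np.sqrt(precision), size=n_draws)
        # U(x, t) = d/dt log q(x|t); its mean under q(.|t) is the integrand.
        u = -0.5 * x**2 * (1.0 / s1**2 - 1.0 / s0**2)
        u_bar[i] = u.mean()
    # Trapezoidal rule over the grid in t.
    return float(np.sum(0.5 * (u_bar[1:] + u_bar[:-1]) * np.diff(t_grid)))
```

In the model of this paper the draws at each grid point would instead come from the hybrid Gibbs/MH sampler, since exact sampling from the intermediate distributions is not available.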

4. A SIMULATION STUDY

We have conducted an extensive simulation study in different settings to empirically evaluate the performance of the MCEM algorithm. To save space, only the results obtained under the following settings are reported here. A special case of the two-level SEM defined in (2), (4), and (5) with p=9 and G=500 at three time points is considered. For the second-level model associated with individuals, c_g0 is a 2×1 vector of covariates, in which the first entry is taken to be 1 to represent the intercept, and the second entry is obtained from observations that are simulated from a standard normal distribution. The true population values of the parameters are given as follows:

A_{0}^{'} = [\begin{matrix} 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 \end{matrix}], Φ_{0} = [\begin{matrix} 1.0 & 0.3 & 0.3 \\ 0.3 & 1.0 & 0.3 \\ 0.3 & 0.3 & 1.0 \end{matrix}]

Λ_{0}^{'} = [\begin{matrix} 1.0 * & 0.6 & 0.6 & 0.0 * & 0.0 * & 0.0 * & 0.0 * & 0.0 * & 0.0 * \\ 0.0 * & 0.0 * & 0.0 * & 1.0 * & 0.6 & 0.6 & 0.0 * & 0.0 * & 0.0 * \\ 0.0 * & 0.0 * & 0.0 * & 0.0 * & 0.0 * & 0.0 * & 1.0 * & 0.6 & 0.6 \end{matrix}], Ψ_{0} = diag (0.2, \dots, 0.2)

In this paper, parameters with an asterisk are fixed to identify the model. For the measurement equation of the first-level model associated with t =1,2,3 (see (3)), no fixed covariates are involved. The true values of parameters in Λ_t and Ψ_t are given by

Λ_{t}^{'} = [\begin{matrix} 1.0 * & 0.8 & 0.8 & 0.0 * & 0.0 * & 0.0 * & 0.0 * & 0.0 * & 0.0 * \\ 0.0 * & 0.0 * & 0.0 * & 1.0 * & 0.8 & 0.8 & 0.0 * & 0.0 * & 0.0 * \\ 0.0 * & 0.0 * & 0.0 * & 0.0 * & 0.0 * & 0.0 * & 1.0 * & 0.8 & 0.8 \end{matrix}], Ψ_{t} = diag (0.5, \dots, 0.5)

The latent variables in ω_gt = (η_gt, ξ_1gt, ξ_2gt)′ are modeled by the following structural equation:

η_{g t} = b_{10} d_{1 g 0} + b_{20} d_{2 g 0} + b_{t} d_{g t} + β_{t - 1} η_{g, t - 1} + γ_{1 t} ξ_{1 g t} + γ_{2 t} ξ_{2 g t} + γ_{3 t} ξ_{1 g t} ξ_{2 g t} + δ_{g t}

(12)

where the two components in the fixed covariate vector d_g0 = (d_1g0, d_2g0)′ are independently generated from the standard bivariate normal distribution, and the fixed covariates d_gt (t = 1, 2, 3) are generated from a Bernoulli distribution with probability of success 0.6. The true values of the parameters involved in (12) are: b₁₀ = 1.0, b₂₀ = −1.0; and for all t = 1, 2, 3, b_t = 0.7, β_{t−1} = 0.6, γ_1t = γ_2t = 0.6, γ₃₁ = −0.4, γ₃₂ = −0.6, γ₃₃ = −0.8, Ψ_δt = 0.5, (ϕ_t11, ϕ_t12, ϕ_t22) in Φ_tt equal to (1.0, 0.3, 1.0), and covariances of ξ_{igt_j} and ξ_{kgt_l} are all 0.3. Based on the above specifications of the model and the true parameter values, continuous observations of u_gt as defined by (4) can be obtained. To consider the model with ordered categorical outcomes, the 5th, 6th, and 9th entries in every u_gt were transformed to ordered categorical variables using the thresholds (−1.2*, −0.6, 0.6, 1.2*). Moreover, missing data are considered by randomly deleting some entries in u_gt. About 80 per cent of the observations in the data set are fully observed. The total number of parameters in this two-level SEM is 142. In the MCEM algorithm for computing the ML estimates, we generate 100 observations to approximate the conditional expectations at the E-step of each MCEM iteration. Convergence is monitored by the method of Shi and Copas [40] using $γ_{0}^{*} = 3, δ_{1}^{*} = δ_{2}^{*} = 0.001$. Convergence is claimed if the stopping rule is satisfied for three consecutive iterations. The algorithm stops in fewer than 100 MCEM iterations in all 100 replications.
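The data-generating mechanism of the structural equation (12) can be sketched as follows, using the true values stated above. This is a hedged illustration rather than the authors' simulation code: the function name `simulate_eta` is hypothetical, and the autoregressive term is assumed to be absent at t = 1 (that is, η_g0 is taken as 0), which the text does not state explicitly.

```python
import numpy as np

def simulate_eta(G=500, T=3, seed=0):
    """Generate eta_gt from equation (12) with the stated true parameter values."""
    rng = np.random.default_rng(seed)
    b10, b20, b_t, beta = 1.0, -1.0, 0.7, 0.6
    gamma1, gamma2 = 0.6, 0.6
    gamma3 = np.array([-0.4, -0.6, -0.8])      # interaction effects for t = 1, 2, 3
    psi_delta = 0.5
    # Phi: unit variances and covariance 0.3 between every pair of xi components.
    Phi = np.full((2 * T, 2 * T), 0.3)
    np.fill_diagonal(Phi, 1.0)
    xi = rng.multivariate_normal(np.zeros(2 * T), Phi, size=G).reshape(G, T, 2)
    d0 = rng.standard_normal((G, 2))           # time-invariant covariates d_g0
    d = rng.binomial(1, 0.6, size=(G, T))      # time-varying Bernoulli covariate d_gt
    eta = np.zeros((G, T))
    prev = np.zeros(G)                         # assumption: no eta_{g,0} term at t = 1
    for t in range(T):
        delta = rng.normal(0.0, np.sqrt(psi_delta), size=G)
        eta[:, t] = (b10 * d0[:, 0] + b20 * d0[:, 1] + b_t * d[:, t] + beta * prev
                     + gamma1 * xi[:, t, 0] + gamma2 * xi[:, t, 1]
                     + gamma3[t] * xi[:, t, 0] * xi[:, t, 1] + delta)
        prev = eta[:, t]
    return eta, xi, d0, d
```

The observed vectors u_gt would then be built from ω_gt = (η_gt, ξ_1gt, ξ_2gt)′ through the measurement equation (4), followed by the thresholding and random deletion steps described above.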

Based on 100 replications, the bias (Bias) and the root mean squares (Rms) between the estimates and the true population values of the parameters were computed as follows:

Bias of \hat{θ} (h) = 100^{- 1} \sum_{r = 1}^{100} [{\hat{θ}}_{r} (h) - θ_{0} (h)], Rms of \hat{θ} (h) = \{100^{- 1} \sum_{r = 1}^{100} {[{\hat{θ}}_{r} (h) - θ_{0} (h)]}^{2}\}^{\frac{1}{2}}

where θ₀(h) is the hth element of the true parameter vector, and ${\hat{θ}}_{r} (h)$ is its ML estimate in the rth replication. The standard errors (SE) of the estimates were obtained as the arithmetic mean of the SEs computed through the Louis formula [13] over the 100 replications. Simulation results obtained on the basis of 100 replications are presented in Tables I and II. We observe from these tables that the `Bias,' `Rms,' and `SE' are rather small. These results indicate that the empirical performance of the ML estimates obtained by the MCEM algorithm is satisfactory. To save space, results on the nuisance threshold parameters are not presented.
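As a small illustration of the two summary measures, the sketch below computes the bias and root mean square for a single parameter from a list of replicated estimates. The function name `bias_and_rms` and the example values are hypothetical and are not taken from the simulation study.

```python
import math


def bias_and_rms(estimates, true_value):
    """Bias and root mean square of replicated estimates of one parameter."""
    n_rep = len(estimates)
    errors = [estimate - true_value for estimate in estimates]
    bias = sum(errors) / n_rep
    rms = math.sqrt(sum(error * error for error in errors) / n_rep)
    return bias, rms


# Four made-up replications of a loading whose true value is 0.8.
bias, rms = bias_and_rms([0.78, 0.82, 0.80, 0.84], 0.8)
```

The Rms combines the squared bias and the sampling variance of the estimates, which is why it is never smaller than the absolute bias.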

Table I.

Means, root mean squares, and standard errors of the parameter estimates in the first level.

First level
Measurement model				Structural equation
Par	Bias	Rms	SE	Par	Bias	Rms	SE	Par	Bias	Rms	SE
λ_1,21	0.004	0.029	0.025	b₁₀	−0.009	0.044	0.019	ϕ₁₁	0.052	0.151	0.082
λ_1,31	0.004	0.029	0.025	b₂₀	0.017	0.047	0.020	ϕ₁₂	0.003	0.075	0.071
λ_1,52	−0.004	0.081	0.072	b₁	0.001	0.096	0.079	ϕ₁₃	0.043	0.091	0.068
λ_1,62	−0.008	0.074	0.057	b₂	0.001	0.079	0.064	ϕ₁₄	0.013	0.086	0.061
λ_1,83	−0.005	0.068	0.054	b₃	−0.010	0.075	0.070	ϕ₁₅	0.041	0.100	0.066
λ_1,93	−0.020	0.074	0.063	β₁	0.011	0.042	0.022	ϕ₁₆	0.008	0.078	0.063
λ_2,21	−0.000	0.018	0.016	β₂	0.004	0.026	0.015	ϕ₂₂	0.023	0.144	0.104
λ_2,31	−0.001	0.019	0.016	γ₁₁	0.004	0.091	0.053	ϕ₂₃	0.008	0.090	0.059
λ_2,52	0.001	0.072	0.064	γ₂₁	−0.004	0.085	0.047	ϕ₂₄	0.020	0.097	0.062
λ_2,62	−0.016	0.068	0.058	γ₃₁	0.034	0.081	0.031	ϕ₂₅	0.010	0.079	0.064
λ_2,83	0.005	0.066	0.044	γ₁₂	−0.011	0.080	0.047	ϕ₂₆	0.020	0.097	0.076
λ_2,93	−0.001	0.064	0.065	γ₂₂	−0.015	0.070	0.066	ϕ₃₃	0.066	0.144	0.086
λ_3,21	−0.001	0.015	0.011	γ₃₂	0.045	0.096	0.047	ϕ₃₄	0.023	0.090	0.061
λ_3,31	−0.001	0.014	0.014	γ₁₃	−0.022	0.090	0.063	ϕ₃₅	0.049	0.101	0.064
λ_3,52	−0.002	0.064	0.065	γ₂₃	−0.020	0.078	0.052	ϕ₃₆	0.013	0.085	0.067
λ_3,62	−0.004	0.071	0.063	γ₃₃	0.054	0.109	0.044	ϕ₄₄	0.037	0.132	0.091
λ_3,83	−0.007	0.063	0.041					ϕ₄₅	0.017	0.088	0.066
λ_3,93	0.001	0.071	0.070					ϕ₄₆	0.029	0.105	0.062
								ϕ₅₅	0.067	0.139	0.082
								ϕ₅₆	0.014	0.077	0.060
								ϕ₆₆	0.037	0.141	0.080

Table II.

Means, root mean squares, and standard errors of the parameter estimates in the second level.

Par	Bias	Rms	SE	Par	Bias	Rms	SE	Par	Bias	Rms	SE
a_0,11	0.008	0.048	0.041	a_0,21	−0.039	0.065	0.046	λ_0,21	−0.043	0.065	0.042
a_0,12	0.005	0.039	0.032	a_0,22	−0.024	0.052	0.035	λ_0,31	−0.037	0.061	0.041
a_0,13	0.004	0.039	0.033	a_0,23	−0.024	0.044	0.029	λ_0,52	−0.041	0.067	0.049
a_0,14	0.008	0.050	0.041	a_0,24	−0.022	0.049	0.046	λ_0,62	−0.029	0.055	0.049
a_0,15	0.003	0.035	0.074	a_0,25	−0.023	0.043	0.038	λ_0,83	−0.034	0.060	0.042
a_0,16	0.000	0.042	0.075	a_0,26	−0.018	0.044	0.046	λ_0,93	−0.033	0.063	0.047
a_0,17	0.002	0.044	0.048	a_0,27	−0.020	0.050	0.044	ϕ_0,11	−0.040	0.173	0.099
a_0,18	0.004	0.040	0.034	a_0,28	−0.015	0.036	0.032	ϕ_0,12	−0.024	0.103	0.073
a_0,19	−0.001	0.041	0.062	a_0,29	−0.013	0.043	0.040	ϕ_0,13	−0.036	0.105	0.067
								ϕ_0,22	0.008	0.122	0.097
								ϕ_0,23	−0.009	0.077	0.062
								ϕ_0,33	0.015	0.126	0.088

5. AN APPLICATION OF THE SEM APPROACH: THE LONGITUDINAL STUDY OF COCAINE USE

The data set in this example is obtained from a longitudinal study about cocaine use conducted at the UCLA Center for Advancing Longitudinal Drug Abuse Research. Various measures were collected from patients who were admitted to the West Los Angeles Veterans Affairs Medical Center in 1988–1989 and who met the DSM III-R criteria for cocaine dependence [44]. These patients were assessed at baseline, one year after treatment, two years after treatment, and 12 years after treatment (t =t₁,t₂,t₃,t₄) in 2002–2003. Among the patients located at the 12-year follow-up, some were confirmed to be deceased, some declined to be interviewed, and some were either out of the country or too ill to be interviewed after one year, two years, or at the 12-year follow-up. Hence, there is a considerable amount of missing data. Seven observed variables at each t are involved in the current analysis: (i) cocaine use (CC), an ordered categorical variable with codings 1–5 to denote days of cocaine use per month that are fewer than 2 days, between 2 and 7 days, between 8 and 14 days, between 15 and 25 days, and more than 25 days, respectively; (ii) Beck inventory (BI), an ordered categorical variable with codings 1–5 to denote scores that are less than 3.0, between 3.0 and 8.0, between 9.0 and 20.0, between 21 and 30, and larger than 30; (iii) depression (DEP), an ordered categorical variable based on the Hopkins Symptom Checklist-58 scores, with codings 1–5 to denote scores that are less than 1.1, between 1.1 and 1.4, between 1.4 and 1.8, between 1.8 and 2.5, and larger than 2.5; (iv) number of friends (NF), an ordered categorical variable with codings 1–5 to denote no friend, 1 friend, 2–4 friends, 5–8 friends, more than 9 friends; (v) `have someone to talk to about problem (TP)'; (vi) `currently employed (EMP)'; and (vii) `alcohol dependence (AD) at baseline.' The last three variables are ordered binary variables with {0,1} for {No,Yes}.
The sample size is 223, and the frequencies of all variables at different time points are given in Table III.

Table III.

Frequencies of the ordered categorical variables at different time points in the cocaine use example.

		Categories
Variables		1	2	3	4	5	No=0	Yes=1	Total
t = 1	CC	13	24	38	37	111			223
	BI	14	35	68	31	12			160
	DEP	15	29	29	57	26			156
	NF	27	30	64	27	8			150
	TP						30	170	200
	EMP						88	135	223
	AD						120	103	223
t = 2	CC	89	46	23	22	43			223
	BI	30	47	72	24	10			183
	DEP	37	46	40	48	18			189
	NF	25	27	90	21	18			181
	TP						15	178	193
	EMP						69	154	219
	AD						120	103	223
t = 3	CC	105	30	25	16	47			223
	BI	37	44	78	22	11			192
	DEP	47	40	47	40	13			187
	NF	22	22	94	29	23			190
	TP						9	202	211
	EMP						79	144	223
	AD						120	103	223
t = 4	CC	172	15	14	8	14			223
	BI	56	63	72	15	16			222
	DEP	63	51	43	46	20			223
	NF	21	37	113	26	23			220
	TP						22	198	220
	EMP						84	139	223
	AD						120	103	223

In this study, we consider observed variables (CC, BI, DEP, NF, TP) and two fixed covariates (EMP, AD) for each individual, which were measured at four time points. The proposed dynamic two-level SEM (see equations (1)–(3) and (5)) is applied to analyze this data set. The second-level model for assessing invariant effects over time is defined by the following two-factor confirmatory factor analysis model:

y_{g} = A_{0} c_{g 0} + Λ_{0} ω_{g 0} + ε_{g 0}, g = 1, \dots, 223

(13)

where y_g corresponds to (CC, BI, DEP, NF, TP) of the gth individual, c_g0 is fixed at 1.0 so that A₀ is a vector of intercepts, ω_g0=(ω_g01,ω_g02)′, Ψ₀=diag(0.0*,ψ₀₂,ψ₀₃,ψ₀₄,ψ₀₅),

Λ_{0}^{'} = [\begin{matrix} 1.0 * & 0 * & 0 * & 0 * & 0 * \\ 0 * & λ_{22, 0} & λ_{32, 0} & λ_{42, 0} & λ_{52, 0} \end{matrix}] and Φ_{0} = [\begin{matrix} 1.0 * \\ ϕ_{21, 0} & 1.0 * \end{matrix}]

In this example, parameters with an asterisk were fixed for specifying an identified model. Note that as y_g1 is fully defined by CC, λ_11,0 and ψ₀₁ are, respectively, fixed at 1.0 and 0.0 according to the usual practice of SEM [2]. In this formulation, ω_g01 can be interpreted as the invariant portion of cocaine use and ω_g02 can be interpreted as an invariant general latent factor that is not changed over time. The correlation of ω_g01 and ω_g02 is assessed by ϕ_21,0. The first-level model for assessing various dynamic effects that are changed over time is defined by the following nonlinear SEM. The measurement model is defined by

v_{g t} = Λ_{t} ω_{g t} + ε_{g t}, g = 1, \dots, 223, t = t_{1}, t_{2}, t_{3}, t_{4}

(14)

where ω_gt =(η_gt,ξ_1gt,ξ_2gt)′, and

Λ_{t}^{'} = [\begin{matrix} 1.0 * & 0 * & 0 * & 0 * & 0 * \\ 0 * & 1.0 * & λ_{32, t} & 0 * & 0 * \\ 0 * & 0 * & 0 * & 1.0 * & λ_{53, t} \end{matrix}]

The nonoverlapping structure of Λ_t is used for achieving a clear interpretation of the latent variables. Based on the meaning of the observed variables and the nonoverlapping structure of Λ_t, latent variables η_gt,ξ_1gt,ξ_2gt can be clearly interpreted as `cocaine use (CC),' `psychiatric problems,' and `social support.' In the structural equations of the first-level model, cocaine use was treated as the dependent variable (η_gt), and it is regressed on various independent latent variables, together with fixed covariates EMP(d_gt) and AD(d_g0) that are variant and invariant over time, respectively. More specifically, the structural equations are defined by

\begin{matrix} for t = t_{1} : & η_{g 1} = b_{0} d_{g 0} + b_{1} d_{g 1} + γ_{11} ξ_{1 g 1} + γ_{21} ξ_{2 g 1} + δ_{g 1} \\ for t = t_{2}, t_{3}, t_{4} : & η_{g t} = b_{0} d_{g 0} + b_{t} d_{g t} + β_{t - 1} η_{g, t - 1} + γ_{1 t} ξ_{1 g t} + γ_{2 t} ξ_{2 g t} + δ_{g t} \end{matrix}

(15)

Clearly, (15) is a special case of (5). So far, this dynamic two-level SEM involves 94 unknown parameters. As the sample size is relatively small (n=223), it is not worthwhile to treat the nuisance threshold parameters of the ordered categorical variables as unknown. Hence they are fixed at α_jh =Φ*⁻¹ (ρ_jh), where Φ* is the distribution function of N[0,1], and ρ_jh are the observed cumulative marginal proportions of the categories with z_j<h, see [6, 36].
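To make the threshold rule concrete, the sketch below computes α_jh =Φ*⁻¹(ρ_jh) from a vector of category counts, using the baseline CC frequencies reported in Table III as input. The function name `fixed_thresholds` is hypothetical, and the sketch assumes that every cumulative proportion lies strictly between 0 and 1 (an empty first category would need special handling).

```python
from statistics import NormalDist


def fixed_thresholds(category_counts):
    """Thresholds as standard normal quantiles of cumulative proportions.

    category_counts[k] is the observed frequency of category k + 1.
    Returns one threshold per boundary between adjacent categories.
    """
    total = sum(category_counts)
    standard_normal = NormalDist()
    thresholds, cumulative = [], 0
    for count in category_counts[:-1]:
        cumulative += count
        # inv_cdf requires a probability strictly between 0 and 1.
        thresholds.append(standard_normal.inv_cdf(cumulative / total))
    return thresholds


# Baseline (t = 1) frequencies of cocaine use (CC) from Table III.
cc_thresholds = fixed_thresholds([13, 24, 38, 37, 111])
```

Because roughly half of the baseline sample falls in the highest CC category, the last threshold is close to zero, while the first threshold lies well into the lower tail.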

Let M₁ be the two-level model defined as above. To roughly illustrate model comparison with BIC, we compare it with M₂, which is a single-level model that is defined by equations (14) and (15) without the second level corresponding to y_g as given in (13). In the MCEM algorithm for computing the ML estimates under the competing models, we generate 30+5l observations to approximate the conditional expectations at the E-step of the lth iteration (see [8]). Based on the stopping rule given in Section 4, the algorithm stops at the 202nd MCEM iteration in M₁, and ${\overset{‒}{θ}}^{(202)}$ is taken as the ML estimate in M₁. Fewer MCEM iterations are required to attain the convergence in the simpler model M₂. The BIC value for model comparison between M₁ and M₂ is BIC₁₂=−140.6. According to the interpretation of BIC [41], this result indicates that the two-level model M₁ is significantly better than the single-level model M₂. The ML estimates and their standard error estimates (which were computed using 5000 simulated observations) of parameters in the first and second level of the selected model M₁ are, respectively, presented in path diagrams displayed in Figures 1(a) and (b), while those corresponding to Φ are presented in Table IV. 
Inspired by the model checking technique in regression and the analysis in Lee and Song [8], we use the following estimated residuals to reveal the adequacy of the measurement models and the structural equation in the proposed two-level SEM for fitting the data: ${\hat{ε}}_{g t} = {\hat{u}}_{g t} - {\hat{y}}_{g} - {\hat{Λ}}_{t} {\hat{ω}}_{g t}$ , ${\hat{ε}}_{g 0} = {\hat{y}}_{g} - {\hat{A}}_{0} c_{g 0} - {\hat{Λ}}_{0} {\hat{ω}}_{g 0}$ , ${\hat{δ}}_{g 1} = {\hat{η}}_{g 1} - {\hat{b}}_{0} d_{g 0} - {\hat{b}}_{1} d_{g 1} - {\hat{γ}}_{11} {\hat{ξ}}_{1 g 1} - {\hat{γ}}_{21} {\hat{ξ}}_{2 g 1}$ , and ${\hat{δ}}_{g t} = {\hat{η}}_{g t} - {\hat{b}}_{0} d_{g 0} - {\hat{b}}_{t} d_{g t} - {\hat{β}}_{t - 1} {\hat{η}}_{g, t - 1} - {\hat{γ}}_{1 t} {\hat{ξ}}_{1 g t} - {\hat{γ}}_{2 t} {\hat{ξ}}_{2 g t}$ , for t =t₂, t₃, t₄. Plots of the estimated first-level residual ${\hat{ε}}_{g t 1}$ versus ${\hat{η}}_{g t}$ , ${\hat{ξ}}_{1 g t}$ , and ${\hat{ξ}}_{2 g t}$ , and plots of ${\hat{δ}}_{g t}$ versus ${\hat{ξ}}_{1 g t}$ and ${\hat{ξ}}_{2 g t}$ at t =t₁ are presented in Figure 2. The other estimated residual plots behave similarly. The points in these plots lie within two parallel horizontal lines that are centered at zero, and no linear or quadratic trends are detected. This roughly indicates that the proposed measurement models and the structural equation are adequate in fitting the data.

Figure 1. (a) Path diagram and estimates of some parameters of the first-level model in M₁, where observed variables are represented by rectangles, latent variables are represented by ellipses, and standard error estimates are in parentheses. Here, η_t, ξ_1t, and ξ_2t, respectively, denote `cocaine use (CC),' `psychiatric problems,' and `social support' at t =1,2,3,4. (b) Path diagram and estimates of some parameters of the second-level model in M₁, where observed variables are represented by rectangles, latent variables are represented by ellipses, and standard error estimates are in parentheses.

Table IV.

ML estimates and their standard error estimates (in parentheses) of Φ [Φ_{t_it_j}] in the cocaine use example.

t = 1	0.601 (0.054)
	−0.354 (0.049)	0.571 (0.056)
t = 2	0.317 (0.045)	−0.149 (0.029)	0.612 (0.065)				Symmetric part
	−0.283 (0.048)	0.272 (0.045)	−0.355 (0.040)	0.469 (0.051)
t = 3	0.304 (0.053)	−0.426 (0.031)	0.402 (0.061)	−0.324 (0.045)	0.689 (0.048)
	−0.166 (0.053)	0.374 (0.019)	−0.323 (0.051)	0.271 (0.006)	−0.435 (0.047)	0.544 (0.052)
t = 4	0.100 (0.047)	−0.155 (0.042)	0.093 (0.042)	−0.139 (0.039)	0.204 (0.054)	−0.120 (0.043)	0.605 (0.034)
	−0.007 (0.039)	0.137 (0.031)	−0.183 (0.056)	0.235 (0.040)	−0.228 (0.051)	0.149 (0.037)	−0.277 (0.040)	0.567 (0.066)

Figure 2. Plots of ${\hat{ε}}_{g 11}$ *versus* ${\hat{η}}_{g 1}$ , ${\hat{ξ}}_{1 g 1}$ , and ${\hat{ξ}}_{2 g 1}$ , and plots of ${\hat{δ}}_{g 1}$ *versus* ${\hat{ξ}}_{1 g 1}$ and ${\hat{ξ}}_{2 g 1}$ .

We can conclude from the model comparison result given by the BIC₁₂ value (=−140.6) that it is significantly better to include a second-level model y_g for incorporating characteristics that are invariant over time when analyzing the longitudinal properties of u_gt; see equation (1). The following findings are obtained from equation (13) and Figure 1(b): (i) All the factor loading estimates are significant; this indicates substantial associations of the observed variables with the general latent factor. (ii) The correlation between the general latent factor and cocaine use is not significant. (iii) As expected, the intercept estimates corresponding to the ordered categorical variables are close to zero. The intercept estimate $({\hat{a}}_{05})$ corresponding to the dichotomous variable y₅ is 0.281. This indicates that y₅=0 if the underlying latent continuous variable is less than −0.281; otherwise y₅=1. Clearly, the above findings cannot be obtained by a single-level model without the multilevel component.

The longitudinal patterns of the more important parameters in the first-level model are displayed in Figures 3(a) and (b). From Figures 1(a) and 3(a) and (b), the following phenomena are observed: (i) As λ₃₂ at all time points (t =t₁, t₂, t₃, t₄) are close to 1.0, BI and DEP give equal constant loadings over time to the latent variable `psychiatric problems, ξ_1gt.' This indicates that the relationship between `psychiatric problems' and {BI,DEP} is very stable over time (they both measure depression symptoms). (ii) The fixed covariate AD has a positive effect on `cocaine use, CC.' (iii) The fixed covariate EMP has a negative effect on `cocaine use, CC' at all time points (see the line `b₁'), which indicates that having a job reduces cocaine use. Also note that the effect of EMP is more substantial during the long-term follow-up period. (iv) The positive effect of the latent variable `social support' on cocaine use at baseline changed quickly to negative effects at two years after intake, as well as years after treatment (see the line γ₂). This indicates that social support had a substantial impact on reducing cocaine use during the post-treatment period. (v) Note also that the magnitude of ${\hat{γ}}_{1}$ decreases from baseline to one or two years after intake, but rebounds during the long-term follow-up period. This indicates that the impact of `psychiatric problems' is still strong during a long period of time after treatment, although it may be reduced during treatment or shortly after. (vi) It is interesting to find that the variances ϕ₁₁ and ϕ₂₂ and the covariance ϕ₁₂ are basically constant over time (see Figure 3(b)). (vii) For λ₅₃, γ₁, and γ₂, their changes are substantial from baseline to one or two years after intake, but less substantial in the subsequent years (see Figures 3(a) and (b)). (viii) The effects of cocaine use from t =t₁ to t =t₂, and from t =t₂ to t =t₃ are $- 0.074 (= {\hat{β}}_{1})$ , and $0.463 (= {\hat{β}}_{2})$ , respectively.
After a time difference of 10 years, the corresponding effect from t =t₃ to t =t₄ is reduced to $- 0.163 (= {\hat{β}}_{3})$ (see Figure 1(a)). Most of these effects are significant. These findings are consistent with prior literature showing a strong stability of cocaine use over time. The present modeling results also demonstrate a positive association of psychiatric problems with cocaine use during two of the three follow-up points. The increasing negative association of social support with cocaine use is discernible at the later follow-up points, which suggests that social support becomes a stronger factor influencing cocaine use when the treatment effects dissipate at the longer follow-up point, and future studies are needed to further this investigation.

Figure 3. (a) Plots of ML estimates of λ₃₂, λ₅₃, b, γ₁, and γ₂ at t =t₁, t₂, t₃, and t₄ and (b) plots of ML estimates of ϕ₁₁, ϕ₁₂, and ϕ₂₂ at t =t₁, t₂, t₃, and t₄.

6. DISCUSSION

An appealing feature of SEMs is the ability to group the correlated observed variables into latent variables so that instead of dealing with a large number of observed variables, we can utilize the structural equation with a much smaller number of latent variables to assess inter-relationships among them. This feature is particularly advantageous in analyzing multivariate longitudinal data, because the comprehensive model that is simultaneously defined at every time point involves p×T observed variables, which may be too large for a regression model. Moreover, the interpretations based on latent variables are more clear and precise than those that are based on the original observed variables. The proposed two-level nonlinear SEM with covariates provides a rather general framework for assessing various characteristics of changes with respect to time.

There are a few limitations of the current approach that indicate areas needing future research. First, similar to many other statistical models, the proposed SEM depends on the assumption that the latent variables and the residual errors are normally distributed. As this assumption may be violated for many types of biomedical data, it is necessary to develop robust methods that are less reliant on this assumption [45]. Second, due to the complexities of the proposed two-level nonlinear SEM with ordered categorical variables, which involves observed data with a correlation structure and missing entries, and the fact that a saturated model does not exist in the context of nonlinear SEM, we do not have a classical likelihood ratio test statistic to assess the goodness-of-fit of the proposed model. In a Bayesian approach with the unknown parameters treated as random, goodness-of-fit of a hypothesized model can be assessed by using Bayesian predictive p-values (BPPs) [46], which require integration over the posterior distribution of the unknown parameters in the model. As this integration is simple to perform numerically using posterior simulation draws of the parameters within the iterations of a Markov chain Monte Carlo algorithm in the Bayesian estimation, the BPPs have been widely applied in Bayesian analyses of some latent variable models in biostatistics (see [47]) as well as many SEMs (see [48] and the references therein). However, there is a conceptual problem in applying BPPs in an ML analysis, because parameters are not considered as random and the posterior distribution of the unknown parameters is not related to the ML approach. Hence, we propose the BIC for model comparison, and residual plots for model checking (see Figure 2 in the example). Clearly, the development of a more formal statistic for goodness-of-fit of the model in the ML context is an interesting topic for future research. Alternatively, the Bayesian approach can be used.
Third, the current ML method cannot handle unordered categorical data. As genotype variables are unordered categorical, it is important to develop methods to handle this kind of data for genetic analysis. Fourth, because the missing data considered here are missing at random, it is necessary to generalize the ML approach in analyzing missing data with a nonignorable missing mechanism.

ACKNOWLEDGEMENTS

This research was supported by a grant (CUHK 450607) from the Research Grant Council of Hong Kong Special Administration Region and grant P30DA016383 from the National Institute on Drug Abuse, Bethesda, MD. Special thanks to Diane M. Herbeck and David Huang for data preparation and to C. P. Chou for valuable comments in improving the manuscript.

Contract/grant sponsor: Research Grant Council of Hong Kong Special Administration Region; contract/grant number: CUHK 450607

Contract/grant sponsor: National Institute on Drug Abuse; contract/grant number: P30DA016383

APPENDIX A: CONDITIONAL DISTRIBUTIONS AND IMPLEMENTATION OF THE E-STEP

Based on the definition and properties of the current model, it can be shown that

p (Y ∣ Ω_{0}, Ω_{1}, U, Z, α, θ) = \prod_{g = 1}^{G} p (y_{g} ∣ ω_{g 0}, ω_{g}, u_{g}, θ)

where

p (y_{g} ∣ ω_{g 0}, ω_{g}, u_{g}, θ) \overset{D}{=} N [Σ_{v} \{\sum_{t = 1}^{T} Ψ_{t}^{- 1} (u_{g t} - A_{t} c_{g t} - Λ_{t} ω_{g t}) + Ψ_{0}^{- 1} (A_{0} c_{g 0} + Λ_{0} ω_{g 0})\}, Σ_{v}], with Σ_{v} = {(\sum_{t = 1}^{T} Ψ_{t}^{- 1} + Ψ_{0}^{- 1})}^{- 1}

(A1)

p (Ω_{0} ∣ Y, Ω_{1}, U, Z, α, θ) = \prod_{g = 1}^{G} p (ω_{g 0} ∣ y_{g}, θ)

where

p (ω_{g 0} ∣ y_{g}, θ) \overset{D}{=} N [m_{g 0}, Σ_{g 0}]

(A2)

in which

m_{g 0} = Σ_{g 0} Λ_{0}^{'} Ψ_{0}^{- 1} y_{g} and Σ_{g 0} = {(Φ_{0}^{- 1} + Λ_{0}^{'} Ψ_{0}^{- 1} Λ_{0})}^{- 1}
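As an illustration of how a draw from (A2) might be obtained, the sketch below works out the scalar special case of a single second-level factor, using the inverse of Ψ₀ in both the mean and the variance, as is standard for normal factor models. The function names are hypothetical, the intercepts are assumed to have been subtracted from y beforehand, and all ψ₀ⱼ are assumed to be strictly positive (a unique variance fixed at 0.0 would need separate treatment).

```python
import random


def posterior_single_factor(y, loadings, psi, phi=1.0):
    """Posterior mean and variance of one latent factor given y.

    Model assumed here: y_j = loading_j * omega + eps_j with
    eps_j ~ N(0, psi_j) independent and omega ~ N(0, phi).
    """
    precision = 1.0 / phi + sum(l * l / p for l, p in zip(loadings, psi))
    variance = 1.0 / precision
    mean = variance * sum(l * yj / p for l, yj, p in zip(loadings, y, psi))
    return mean, variance


def draw_omega(y, loadings, psi, phi=1.0, rng=random):
    """Simulate one observation of the factor from its normal posterior."""
    mean, variance = posterior_single_factor(y, loadings, psi, phi)
    return rng.gauss(mean, variance ** 0.5)
```

For example, with two indicators that both equal 1.0, unit loadings, unique variances of 0.5, and a prior variance of 1.0, the posterior precision is 1 + 2 + 2 = 5, so the posterior variance is 0.2 and the posterior mean is 0.8.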

p (Ω_{1} ∣ Y, Ω_{0}, U, Z_{obs}, α, θ) = \prod_{g = 1}^{G} p (ω_{g} ∣ y_{g}, u_{g}, θ)

where

p (ω_{g} ∣ y_{g}, u_{g}, θ) \propto \exp [- \frac{1}{2} \{ξ_{g}^{'} Φ^{- 1} ξ_{g} + \sum_{t = 1}^{T} (u_{g t} - y_{g} - A_{t} c_{g t} - Λ_{t} ω_{g t})' Ψ_{t}^{- 1} (u_{g t} - y_{g} - A_{t} c_{g t} - Λ_{t} ω_{g t}) + (η_{g t} - B_{0} d_{g 0} - Π_{t} h_{g t})' Ψ_{δ t}^{- 1} (η_{g t} - B_{0} d_{g 0} - Π_{t} h_{g t})\}]

(A3)

p (W ∣ Y, Ω_{0}, Ω_{1}, X, Z, α, θ) = p (W ∣ Y, Ω_{1}, Z, α, θ) = \prod_{g = 1}^{G} \prod_{t = 1}^{T} \prod_{j = 1}^{s} p (w_{g t j} ∣ y_{g j}, ω_{g t}, z_{g t j}, α, θ)

and

p (w_{g t j} ∣ y_{g j}, ω_{g t}, z_{g t j}, α, θ) \overset{D}{=} N [y_{g j} + A_{t j}^{'} c_{g t} + Λ_{t j}^{'} ω_{g t}, ψ_{t j}] I (w_{g t j} \in (α_{j, z_{g t j}}, α_{j, z_{g t j} + 1}])

(A4)

where z_{gt j} is jth element of z_gt associated with ordered categorical variable w_{gt j}.

The conditional distributions given in (A1) and (A2) are normal distributions of standard form, so drawing observations from them is straightforward. As (A3) is a nonstandard distribution, the MH algorithm is used to simulate observations from it. The distribution in (A4) is a univariate truncated normal distribution; observations are simulated from it by the inverse distribution method described in Devroye [49].
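A minimal sketch of inverse-distribution sampling from a univariate truncated normal, of the kind needed for (A4), is given below. It is an illustration only and is not claimed to reproduce the exact procedure in [49]; the function name `draw_truncated_normal` is hypothetical, and the approach loses accuracy when the truncation interval lies far in a tail of the distribution.

```python
import random
from statistics import NormalDist


def draw_truncated_normal(mean, sd, lower, upper, rng=random):
    """Draw from N(mean, sd^2) restricted to the interval (lower, upper]."""
    dist = NormalDist(mean, sd)
    p_low, p_high = dist.cdf(lower), dist.cdf(upper)
    # A uniform draw between the two cumulative probabilities, mapped back
    # through the quantile function, lands inside the interval.
    u = rng.uniform(p_low, p_high)
    # inv_cdf is only defined on the open interval (0, 1).
    u = min(max(u, 1e-15), 1.0 - 1e-15)
    return dist.inv_cdf(u)


# Latent value behind an observed category bounded by thresholds -0.6 and 0.6.
w = draw_truncated_normal(mean=0.3, sd=1.0, lower=-0.6, upper=0.6)
```

Each draw costs one uniform variate and one quantile evaluation, so no candidate is ever rejected, which is convenient inside an E-step that is repeated many times.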

To simulate observations from the target density p(ω_g|·) in (A3), let $η_{g} = (η_{1}^{'}, \dots, η_{T}^{'})'$ , Ψ=diag(Ψ₁, …, Ψ_T), Ψ_δ=diag(Ψ_δ1, …, Ψ_δT), λ_tm be the mth column of Λ_t,

A = (\begin{matrix} A_{1} & \dots & 0 \\ ⋮ & ⋮ & ⋮ \\ 0 & \dots & A_{T} \end{matrix}), Λ = (\begin{matrix} λ_{11} & 0 & \dots & 0 & λ_{12} & \dots & λ_{1 q} & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & λ_{21} & \dots & 0 & 0 & \dots & 0 & λ_{22} & \dots & λ_{2 q} & 0 & \dots & 0 \\ ⋮ & ⋮ & \dots & ⋮ & ⋮ & \dots & ⋮ & ⋮ & \dots & ⋮ & ⋮ & \dots & ⋮ \\ 0 & 0 & \dots & λ_{T 1} & 0 & \dots & 0 & 0 & \dots & 0 & λ_{T 2} & \dots & λ_{T q} \end{matrix})

Π = (\begin{matrix} B_{0} & B_{1} & 0 & \dots & 0 & 0 & 0 & \dots & 0 & 0 & Γ_{1} & 0 & \dots & 0 \\ B_{0} & 0 & B_{2} & \dots & 0 & β_{1} & 0 & \dots & 0 & 0 & 0 & Γ_{2} & \dots & 0 \\ ⋮ & ⋮ & ⋮ & \dots & ⋮ & ⋮ & ⋮ & \dots & ⋮ & ⋮ & ⋮ & ⋮ & \dots & ⋮ \\ B_{0} & 0 & 0 & \dots & B_{T} & β_{1} & β_{2} & \dots & β_{T - 1} & 0 & 0 & 0 & Γ_{T} \end{matrix})

and Π_*=I−Π, where I is an identity matrix of appropriate dimension. We choose N[·,σ²Σ_*] as the proposal distribution, where $Σ_{*}^{- 1} = Σ^{- 1} + Λ' Ψ^{- 1} Λ$ and

Σ^{- 1} = [\begin{matrix} Π_{*}^{'} Ψ_{δ}^{- 1} Π_{*} & - Π_{*}^{'} Ψ_{δ}^{- 1} Γ Δ \\ - Δ^{'} Γ^{'} Ψ_{δ}^{- 1} Π_{*} & Φ^{- 1} + Δ^{'} Γ^{'} Ψ_{δ}^{- 1} Γ Δ \end{matrix}]

where $Δ = \partial F_{1} (ξ_{g}) ∕ \partial ξ_{g} ∣_{ξ_{g} = 0}$ . Let $p (\cdot ∣ μ_{g}^{*}, σ^{2} Σ_{*})$ be the density function corresponding to the proposal distribution $N [μ_{g}^{*}, σ^{2} Σ_{*}]$ . The MH algorithm is implemented as follows: at the lth iteration with a current value $ω_{g}^{(l)}$ , a new candidate $ω_{g}^{*}$ is generated from $p (\cdot ∣ ω_{g}^{(l)}, σ^{2} Σ_{*})$ and is accepted with probability $\min \{1, p (ω_{g}^{*} ∣ \cdot) ∕ p (ω_{g}^{(l)} ∣ \cdot)\}$ . The variance σ² can be chosen such that the average acceptance rate is approximately 0.25 or more; see [50].
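The accept/reject step can be illustrated with a univariate random-walk sampler. The sketch below is not the multivariate sampler for (A3): the target `example_log_target` is a made-up nonnormal density, the proposal standard deviation stands in for σ²Σ_*, and the function names are hypothetical.

```python
import math
import random


def metropolis_hastings(log_target, start, proposal_sd, n_draws, rng=random):
    """Random-walk MH sampler for a univariate target density.

    log_target returns the log of the target density up to a constant.
    Returns the list of draws and the observed acceptance rate.
    """
    current, current_lp = start, log_target(start)
    draws, accepted = [], 0
    for _ in range(n_draws):
        candidate = rng.gauss(current, proposal_sd)
        candidate_lp = log_target(candidate)
        # Accept with probability min{1, p(candidate) / p(current)}.
        if rng.random() < math.exp(min(0.0, candidate_lp - current_lp)):
            current, current_lp = candidate, candidate_lp
            accepted += 1
        draws.append(current)
    return draws, accepted / n_draws


def example_log_target(x):
    """Hypothetical nonnormal target: density proportional to exp(-x^2/2 - x^4/10)."""
    return -0.5 * x * x - 0.1 * x ** 4


draws, rate = metropolis_hastings(example_log_target, 0.0, 1.5, 5000,
                                  random.Random(0))
```

In practice the proposal scale would be tuned in pilot runs: a larger `proposal_sd` lowers the acceptance rate, and a smaller one raises it at the cost of slower movement through the target.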

APPENDIX B: DETAILS IN COMPLETING THE M-STEP

Let $A_{t j}^{'}$ , $A_{0 j}^{'}$ , $Λ_{t j}^{'}$ , $Λ_{0 j}^{'}$ , $Π_{t j}^{'}$ , and B_0j be the jth rows of A_t, A₀, Λ_t, Λ₀, Π_t, and B₀. The derivatives involved in the system of equations given in ∂Q/∂θ=0 are

\frac{\partial L_{1}}{\partial A_{t}} = \sum_{g = 1}^{G} \sum_{t = 1}^{T} Ψ_{t}^{- 1} (u_{g t} - y_{g} - A_{t} c_{g t} - Λ_{t} ω_{g t}) c_{g t}^{'}

\frac{\partial L_{1}}{\partial Λ_{t j}} = \sum_{g = 1}^{G} \sum_{t = 1}^{T} ψ_{t j}^{- 1} (u_{g t j} - y_{g j} - A_{t j} c_{g t} - Λ_{t j} ω_{g t}) ω_{g t}^{'}

\frac{\partial L_{1}}{\partial diag (Ψ_{t})} = \frac{1}{2} diag [\sum_{g = 1}^{G} \sum_{t = 1}^{T} Ψ_{t}^{- 1} \{(u_{g t} - y_{g} - A_{t} c_{g t} - Λ_{t} ω_{g t}) (u_{g t} - y_{g} - A_{t} c_{g t} - Λ_{t} ω_{g t})' - Ψ_{t}\} Ψ_{t}^{- 1}]

\frac{\partial L_{1}}{\partial Π_{t j}} = \sum_{g = 1}^{G} \sum_{t = 1}^{T} ψ_{δ t j}^{- 1} (η_{g t j} - B_{0 j} d_{g 0} - Π_{t j} h_{g t}) h_{g t}^{'}

\frac{\partial L_{1}}{\partial B_{0}} = \sum_{g = 1}^{G} \sum_{t = 1}^{T} Ψ_{δ t}^{- 1} (η_{g t} - B_{0} d_{g 0} - Π_{t} h_{g t}) d_{g 0}

\frac{\partial L_{1}}{\partial Φ} = \frac{1}{2} Φ^{- 1} \sum_{g = 1}^{G} (ξ_{g} ξ_{g}^{'} - Φ) Φ^{- 1}

\frac{\partial L_{1}}{\partial diag (Ψ_{δ t})} = \frac{1}{2} diag [\sum_{g = 1}^{G} \sum_{t = 1}^{T} Ψ_{δ t}^{- 1} {(η_{g t} - B_{0} d_{g 0} - Π_{t} h_{g t}) (η_{g t} - B_{0} d_{g 0} - Π_{t} h_{g t})' - Ψ_{δ t}} Ψ_{δ t}^{- 1}]

\frac{\partial L_{2}}{\partial A_{0}} = Ψ_{0}^{- 1} \sum_{g = 1}^{G} (y_{g} - A_{0} c_{g 0} - Λ_{0} ω_{g 0}) c_{g 0}^{'}

\frac{\partial L_{2}}{\partial Λ_{0 j}} = ψ_{0 j}^{- 1} \sum_{g = 1}^{G} (y_{g j} - A_{0 j} c_{g 0} - Λ_{0 j} ω_{g 0}) ω_{g 0}^{'}

\frac{\partial L_{2}}{\partial diag (Ψ_{0})} = \frac{1}{2} diag [Ψ_{0}^{- 1} \sum_{g = 1}^{G} \{(y_{g} - A_{0} c_{g 0} - Λ_{0} ω_{g 0}) (y_{g} - A_{0} c_{g 0} - Λ_{0} ω_{g 0})' - Ψ_{0}\} Ψ_{0}^{- 1}]

\frac{\partial L_{2}}{\partial Φ_{0}} = \frac{1}{2} Φ_{0}^{- 1} \sum_{g = 1}^{G} (ω_{g 0} ω_{g 0}^{'} - Φ_{0}) Φ_{0}^{- 1}

Conditional on other parameters, the solution of each individual equation in ∂Q/∂θ=0 can be obtained as follows:

A_{t} = {(\sum_{g = 1}^{G} \sum_{t = 1}^{T} c_{g t} c_{g t}^{'})}^{- 1} \sum_{g = 1}^{G} \sum_{t = 1}^{T} E [u_{g t} - y_{g} - Λ_{t} ω_{g t} ∣ X, Z, α, θ]

Λ_{t j}^{'} = {(\sum_{g = 1}^{G} \sum_{t = 1}^{T} E [ω_{g t} ω_{g t}^{'} ∣ X, Z, α, θ])}^{- 1} \times \sum_{g = 1}^{G} \sum_{t = 1}^{T} E [ω_{g t} (u_{g t j} - y_{g j} - A_{t j} c_{g t}) ∣ X, Z, α, θ]

ψ_{t j} = \frac{1}{n} \sum_{g = 1}^{G} \sum_{t = 1}^{T} E [{(u_{g t j} - y_{g j} - A_{t j} c_{g t} - Λ_{t j} ω_{g t})}^{2} ∣ X, Z, α, θ]

Π_{t j}^{'} = {(\sum_{g = 1}^{G} \sum_{t = 1}^{T} E [h_{g t} h_{g t}^{'} ∣ X, Z, α, θ])}^{- 1} \sum_{g = 1}^{G} \sum_{t = 1}^{T} E [h_{g t} (η_{g t j} - B_{0 j} d_{g 0}) ∣ X, Z, α, θ]

B_{0} = {(\sum_{g = 1}^{G} d_{g 0} d_{g 0}^{'})}^{- 1} \sum_{g = 1}^{G} \sum_{t = 1}^{T} E [η_{g t} - Π_{t} h_{g t} ∣ X, Z, α, θ]

Φ = \frac{1}{n} \sum_{g = 1}^{G} E [ξ_{g} ξ_{g}^{'} ∣ X, Z, α, θ]

ψ_{δ t j} = \frac{1}{n} \sum_{g = 1}^{G} \sum_{t = 1}^{T} E [{(η_{g t j} - B_{0 j} d_{g 0} - Π_{t j} h_{g t})}^{2} ∣ X, Z, α, θ]

A_{0} = {(\sum_{g = 1}^{G} c_{g 0} c_{g 0}^{'})}^{- 1} \sum_{g = 1}^{G} E [y_{g} - Λ_{0} ω_{g 0} ∣ X, Z, α, θ]

Λ_{0 j}^{'} = {(\sum_{g = 1}^{G} E [ω_{g 0} ω_{g 0}^{'} ∣ X, Z, α, θ])}^{- 1} \sum_{g = 1}^{G} E [ω_{g 0} (y_{g j} - A_{0 j} c_{g 0}) ∣ X, Z, α, θ]

ψ_{0 j} = \frac{1}{n} \sum_{g = 1}^{G} E [{(y_{g j} - A_{0 j} c_{g 0} - Λ_{0 j} ω_{g 0})}^{2} ∣ X, Z, α, θ]

Φ_{0} = \frac{1}{n} \sum_{g = 1}^{G} E [ω_{g 0} ω_{g 0}^{'} ∣ X, Z, α, θ]

Finally, to update the M-step, the conditional expectations involved in the above solutions are approximated by the corresponding observations simulated by the hybrid algorithm at the E-step.
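To illustrate how simulated observations replace the conditional expectations, the sketch below forms the update of Φ by averaging ξ_g ξ_g′ over the draws for each individual and then over individuals. The function name `mc_update_phi` and the nested-list data layout are assumptions made for this example only.

```python
def mc_update_phi(draws_by_individual):
    """Monte Carlo version of the update Phi = n^{-1} sum_g E[xi_g xi_g'].

    draws_by_individual[g] is a list of simulated xi vectors for individual
    g; the expectation for individual g is replaced by the sample average
    of xi xi' over that individual's draws.
    """
    n = len(draws_by_individual)
    q = len(draws_by_individual[0][0])
    phi = [[0.0] * q for _ in range(q)]
    for draws in draws_by_individual:
        m = len(draws)
        for xi in draws:
            for a in range(q):
                for b in range(q):
                    phi[a][b] += xi[a] * xi[b] / (m * n)
    return phi


# Two individuals with two draws and one draw, respectively (made-up values).
phi_hat = mc_update_phi([[[1.0, 0.0], [1.0, 2.0]], [[0.0, 2.0]]])
```

The other closed-form solutions can be handled in the same way, with each conditional expectation replaced by the corresponding average over the draws produced at the E-step.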

REFERENCES

1.Bollen KA. Structural Equation with Latent Variables. Wiley; New York: 1989. [Google Scholar]
2.Jöreskog KG, Sörbom D. LISREL 8: Structural Equation Modeling with the SIMPLIS Command Language. Scientific Software International; Hove and London: 1996. [Google Scholar]
3.Bentler PM, Wu EJC. EQS6 for Windows User Guide. Multivariate Software, Inc.; Enciuo, CA: 2002. [Google Scholar]
4.Yuan KH, Bentler PM. Mean and covariance structures analysis: the oretical and practical improvements. Journal of the American Statistical Association. 1997;92:767–774. [Google Scholar]
5.Dunson DB. Bayesian latent variable models for clustered mixed outcomes. Journal of the Royal Statistical Society, Series B. 2000;62:355–366. [Google Scholar]
6.Shi JQ, Lee SY. Latent variable models with mixed continuous and polytomous data. Journal of the Royal Statistical Society, Series B. 2000;62:77–87. [Google Scholar]
7.Lee SY, Shi JQ. Maximum likelihood estimation of two-level latent variable models with mixed continuous and polytomous data. Biometrics. 2001;57:787–794. doi: 10.1111/j.0006-341x.2001.00787.x. [DOI] [PubMed] [Google Scholar]
8.Lee SY, Song XY. Maximum likelihood analysis of a general latent variable model with hierarchically mixed data. Biometrics. 2004;60:624–636. doi: 10.1111/j.0006-341X.2004.00211.x. [DOI] [PubMed] [Google Scholar]
9.Guo J, Wall M, Amemiya Y. Latent class regression on latent factor. Biostatistics. 2006;7:145–163. doi: 10.1093/biostatistics/kxi046. DOI: 10.1093/biostatistics/kxi046. [DOI] [PubMed] [Google Scholar]
10.Papadopoulos S, Amemiya Y. Correlated samples with fixed and nonnormal latent variables. Annals of Statistics. 2005;33:2732–2757. DOI: 10.1214/009053605000000552. [Google Scholar]
11. Sanchez BN, Budtz-Jørgensen E, Ryan LM, Hu H. Structural equation models: a review with applications in environmental epidemiology. Journal of the American Statistical Association. 2005;100:1443–1454.
12. Wei GCG, Tanner MA. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association. 1990;85:699–704.
13. Louis TA. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B. 1982;44:226–233.
14. Gelman A, Meng XL. Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science. 1998;13:163–185.
15. Substance Abuse and Mental Health Services Administration. 2002 National Survey on Drug Use & Health Report. SAMHSA, Office of Applied Studies; Rockville, MD: 2003.
16. National Institute of Justice. 2000 Arrestee Drug Abuse Monitoring: Annual Report. NIJ, Office of Justice Programs; Washington, DC: 2003.
17. Substance Abuse and Mental Health Services Administration. Emergency Department Trends from the Drug Abuse Warning Network, Final Estimates 1995–2002. SAMHSA, Office of Applied Studies; Rockville, MD: 2003.
18. Carroll KM, Power ME, Bryant K, Rounsaville BJ. One-year follow-up status of treatment-seeking cocaine abusers. Psychopathology and dependence severity as predictors of outcome. Journal of Nervous and Mental Disease. 1993;181:71–79. doi: 10.1097/00005053-199302000-00001.
19. Brown RA, Monti PM, Myers MG, Martin RA, Rivinus T, Dubreuil MET, Rohsenow DJ. Depression among cocaine abusers in treatment: relation to cocaine and alcohol use and treatment outcome. American Journal of Psychiatry. 1998;155:220–225. doi: 10.1176/ajp.155.2.220.
20. Patkar AA, Thornton CC, Mannelli P, Hill KP, Gottheil E, Vergare MJ, Weinstein SP. Comparison of pretreatment characteristics and treatment outcomes for alcohol, cocaine, and multisubstance-dependent patients. Journal of Addictive Diseases. 2004;23:93–109. doi: 10.1300/J069v23n01_08.
21. Havassy BE, Wasserman DA, Hall SM. Social relationships and abstinence from cocaine in an American treatment sample. Addiction. 1995;90:699–710. doi: 10.1046/j.1360-0443.1995.90569911.x.
22. Weisner C, Ray GT, Mertens JR, Satre DD, Moore C. Short-term alcohol and drug treatment outcomes predict long-term outcomes. Drug and Alcohol Dependence. 2003;71:281–294. doi: 10.1016/s0376-8716(03)00167-4.
23. Hser Y. Predicting long-term stable recovery from heroin addiction: findings based on a 33-year follow-up study. Journal of Addictive Diseases. 2007;26:51–60. doi: 10.1300/J069v26n01_07.
24. Crits-Christoph P, Siqueland L, Blaine J, Frank A, Luborsky L, Onken LS, Muenz LR, Thase ME, Weiss RD, Gastfriend DR, Woody GE, Barber JP, Butler SF, Daley D, Salloum I, Bishop S, Najavits LM, Lis J, Mercer D, Griffin ML, Moras K, Beck AT. Psychosocial treatments for cocaine dependence: National Institute on Drug Abuse Collaborative Cocaine Treatment Study. Archives of General Psychiatry. 1999;56:493–502. doi: 10.1001/archpsyc.56.6.493.
25. Chou CP, Hser YI, Anglin MD. Longitudinal treatment effects among cocaine users: a growth curve modeling approach. Substance Use and Misuse. 2003;38:1323–1343. doi: 10.1081/ja-120018491.
26. Hser Y, Stark ME, Paredes A, Huang D, Anglin MD, Rawson R. A 12-year follow-up of a treated cocaine-dependent sample. Journal of Substance Abuse Treatment. 2006;30:219–226. doi: 10.1016/j.jsat.2005.12.007.
27. Meredith W, Tisak J. Latent curve analysis. Psychometrika. 1990;55:107–122.
28. Browne MW. Structural latent curve models. In: Cuadras CM, Rao CR, editors. Multivariate Analysis: Future Directions 2. Elsevier; Amsterdam: 1993. pp. 171–179.
29. Dunson DB. Dynamic latent trait models for multidimensional longitudinal data. Journal of the American Statistical Association. 2003;98:555–563.
30. Daniels MJ, Normand SLT. Longitudinal profiling of health care units based on continuous and discrete patient outcomes. Biostatistics. 2006;7:1–15. doi: 10.1093/biostatistics/kxi036.
31. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. Springer; New York: 2000.
32. Diggle PJ, Heagerty P, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd edn. Oxford University Press; Oxford, U.K.: 2002.
33. Sammel MD, Ryan LM. Latent variables with fixed effects. Biometrics. 1996;52:220–243.
34. Rabe-Hesketh S, Skrondal A, Pickles A. Generalized multilevel structural equation modeling. Psychometrika. 2003;69:167–190.
35. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley; New York: 1987.
36. Song XY, Lee SY. Bayesian analysis of latent variable models with exponential family outcomes. Statistics in Medicine. 2007;26:681–693. doi: 10.1002/sim.2530.
37. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1984;6:721–741. doi: 10.1109/tpami.1984.4767596.
38. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. Journal of Chemical Physics. 1953;21:1087–1091.
39. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57:97–109.
40. Shi JQ, Copas J. Publication bias and meta-analysis for 2×2 tables: an average Markov chain Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B. 2002;64:221–236.
41. Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90:773–795.
42. Song XY, Lee SY. Model comparison of generalized linear mixed effect models. Statistics in Medicine. 2006;25:1685–1698. doi: 10.1002/sim.2318.
43. Meng XL, Wong WH. Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica. 1996;6:831–860.
44. Kasarabada DN, Anglin MD, Khalsa-Denison E, Paredes A. Differential effects of treatment modality on psychosocial functioning of cocaine-dependent men. Journal of Clinical Psychology. 1999;55:257–274. doi: 10.1002/(sici)1097-4679(199902)55:2<257::aid-jclp13>3.0.co;2-x.
45. Lee SY, Lu B, Song XY. Semiparametric Bayesian analysis of structural equation models with fixed covariates. Statistics in Medicine. 2008. doi: 10.1002/sim.3098.
46. Gelman A, Meng XL, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica. 1996;6:733–807.
47. Carlin JB, Wolfe R, Brown CH, Gelman A. A case study on the choice, interpretation and checking of multilevel models for longitudinal binary outcomes. Biostatistics. 2001;2:397–416. doi: 10.1093/biostatistics/2.4.397.
48. Lee SY. Structural Equation Modelling: A Bayesian Approach. Wiley; New York: 2007.
49. Devroye L. Non-uniform Random Variate Generation. Springer; New York: 1985.
50. Gelman A, Roberts GO, Gilks WR. Efficient Metropolis jumping rules. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics. vol. 5. Oxford University Press; Oxford: 1995. pp. 599–607.

