Application of Latent Growth Curve Analysis with Categorical Responses in Social Behavioral Research

Tae Kyoung Lee; Kandauda (KAS) Wickrama; Catherine W O’Neal

doi:10.1080/10705511.2017.1375858

Author manuscript; available in PMC: 2019 May 14.

Published in final edited form as: Struct Equ Modeling. 2017 Oct 16;25(2):294–306. doi: 10.1080/10705511.2017.1375858

PMCID: PMC6516863 NIHMSID: NIHMS1502692 PMID: 31097902

Abstract

Latent growth modeling allows social behavioral researchers to investigate within-person change and between-person differences in within-person change. Typically, conventional latent growth curve models are applied to continuous variables, where the residuals are assumed to be normally distributed, whereas categorical variables (i.e., binary and ordinal variables), which do not hold to normal distribution assumptions, have been rarely used. This article describes the latent growth curve model with categorical variables, and illustrates applications using Mplus software that are applicable to social behavioral research. The illustrations use marital instability data from the Iowa Youth and Family Project. We close with recommendations for the specification and parameterization of growth models that use both logit and probit link functions.

Keywords: Categorical variables, Latent response variable, Latent growth curve model

The repeated measures used in longitudinal research are not always continuous, and such measures are not normally distributed. For example, the occurrence of a life event (e.g., divorce, job loss, or pregnancy) is a binary response (0 = No, 1 = Yes). Also, ordinal scales often consist of multiple response options with uneven spacing. For example, ‘think about divorce’, a constituent item of the marital instability measure, may have the response options 0 = never in the last year, 1 = yes, within the last year, 2 = yes, within the last 6 months, and 3 = yes, within the last 3 months. These categorical variables (i.e., binary and ordinal) are discrete and often skewed, which violates the normal distributional assumptions required for ML estimation in LGCM (McTernan & Blozis, 2015).

One solution to this problem is to transform categorical responses into normally distributed continuous variables before estimating a LGCM. This approach is known as latent response variable (LRV) transformation (Masyn, Petras, & Liu, 2013). It requires extending the measurement component of LGCM in a SEM framework to incorporate an additional step of transforming observed categorical response variables into latent continuous variables. However, this approach to estimating a LGCM with categorical response variables (hereafter referred to as “a categorical LGCM”) has only recently begun to gain popularity in social behavioral research. Using a longitudinal sample of married couples, this paper has three main purposes. We aim to: (a) explain how categorical response variables (i.e., binary and ordinal responses) are transformed into latent continuous response variables, (b) demonstrate how to specify a categorical LGCM with time-invariant covariates (i.e., predictors and outcomes) and interpret the parameters, and (c) demonstrate the use of a longitudinal dyadic model with categorical variables to investigate the associations of time-variant covariates. The equations, figures, and Mplus programs (version 7.4) are provided.

LATENT RESPONSE VARIABLE TRANSFORMATION

Suppose that the observed binary outcome Y (i.e., 0 and 1) is modeled in a confirmatory factor analysis (CFA) model. In contrast to a CFA model with continuous variables (see panel a of Figure 1), a CFA with categorical response variables (hereafter referred to as “categorical CFA”) requires additional latent variables (see panel b of Figure 1). These additional latent variables sit between the observed categorical variables (Y₁, Y₂, Y₃, Y₄, and Y₅) and the latent factor (η). They are known as latent response variables (LRV) and are indicated as Y*s. The Y*s are latent continuous variables, which are transformed from categorical response variables using cut-points (known as “thresholds”; Skrondal & Rabe-Hesketh, 2004). This transformation converts observed categorical variables (Ys) into latent continuous Y*s. It is these latent continuous Y*s that are then used as the indicators of the latent factor η in a CFA model. This LRV transformation of observed categorical variables (Y₁ – Y₅) to latent continuous responses (Y₁* – Y₅*) is part of a family of linear models known as Generalized Linear Models (GLM; Skrondal & Rabe-Hesketh, 2004). The LRV can be formulated based on one of two distributional assumptions: (a) standard logistic distribution or (b) standard normal distribution. In the following sections, we first introduce the LRV transformation using the standard logistic distribution.

FIGURE 1. Comparison between CFAs with continuous and categorical indicators.

*Note*. i indicates an individual. For model identification, α₁ is fixed to 0 in both the continuous and categorical models.

LRV transformation of a Binary Response Variable

Transformation of a binary response variable assuming a standard logistic distribution.

Using the standard logistic distribution, the non-normal binary variable Y is transformed into a continuous latent variable Y* that follows a standard logistic distribution, as given by:

$$\mathrm{Prob}(Y_{i} = 1) \to \mathrm{Odds}(Y_{i} = 1) \to \text{Log-odds}(Y_{i} = 1) = \text{continuous latent response variable } Y_{i}^{*}$$

where Prob (Y_i = 1) is the probability of Y being 1 for individual i. Odds (Y_i = 1) is the ratio of the probability of Y being 1 to the probability of Y being 0. This can be expressed as: $\frac{Prob (Y_{i} = 1)}{Prob (Y_{i} = 0)}$ or $\frac{Prob (Y_{i} = 1)}{1 - Prob (Y_{i} = 1)}$. Log-odds (logits) are the natural logarithm of the odds.

As can be seen in the transformation process above, the transformation starts with the probability of the observed binary Y_i = 1 (possible range from 0 to 1) and continues with the odds of Y_i = 1 (possible range from 0 to ∞). Then, these odds are converted into log-odds values (possible range from −∞ to ∞) with a standardized logistic distribution (mean = 0 and variance = $\frac{π^{2}}{3}$ ). In the standardized logistic distribution, each respondent’s log-odds (or logit) value, which represents his/her continuous latent variable response Y_i*, yields a threshold, τ₁. In the LRV transformation, this threshold serves as a cut-point that separates the underlying unobserved (latent) continuous variable, Y*, into observed categories. Panel a of Figure 2 illustrates how the threshold dichotomizes values of the continuous latent response variable Y*, such that

$$Y_{i} = \begin{cases} 0 & \text{if } -\infty < Y_{i}^{*} \leq \tau_{1} \\ 1 & \text{if } \tau_{1} < Y_{i}^{*} < \infty \end{cases} \tag{1}$$

where τ₁ is the threshold for Y_i*.

If the latent continuous value of Y_i* is less than or equal to τ₁ (the cut-point or threshold), then the observed binary Y_i variable = 0 (that is, individual i’s response is 0). If the latent continuous value of Y_i* is greater than τ₁, then the observed binary Y_i variable = 1 (that is, individual i’s response is 1).
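The chain from probability to odds to log-odds, together with the dichotomization in Equation 1, can be illustrated with a minimal Python sketch. The function names and the probability of .80 are hypothetical choices made for illustration only; they do not come from the IYFP data.

```python
import math

def prob_to_logit(p):
    """Probability -> odds -> log-odds (logit)."""
    odds = p / (1.0 - p)          # ratio of Prob(Y = 1) to Prob(Y = 0)
    return math.log(odds)         # natural logarithm of the odds

def dichotomize(y_star, tau_1):
    """Equation 1: Y = 0 when Y* <= tau_1, and Y = 1 when Y* > tau_1."""
    return 1 if y_star > tau_1 else 0

p = 0.80                          # hypothetical Prob(Y = 1)
print(p / (1.0 - p))              # odds, approximately 4
print(prob_to_logit(p))           # log-odds, approximately 1.386

# A latent value exactly at the threshold falls in the lower category.
print(dichotomize(0.0, tau_1=0.0), dichotomize(0.1, tau_1=0.0))
```

A probability of .50 corresponds to odds of 1 and a logit of 0, which is why the logit scale is centered at even odds.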

FIGURE 2. Latent response variable (Y_i*) corresponding to the categorical variable.

*Note*. τ₁, τ₂, and τ₃ represent thresholds for a categorical variable. i indicates an individual. The distribution of Y_i* depends on the type of link function (i.e., logit link or probit link) (adapted from Masyn, Petras, & Liu, 2013).

Using the information from the threshold under the standard logistic distribution, the unconditional logistic model of the continuous latent variable Y_i* can now be specified as the baseline measurement model of the CFA, such as

$$Y_{i}^{*} = \tau_{1} + \varepsilon_{i}, \quad \varepsilon_{i} \sim \text{logistic}\left(0, \frac{\pi^{2}}{3}\right) \tag{2}$$

As can be seen in Equation 1, the threshold indicates the propensity (the level) of a latent continuous response variable Y_i* for observed category Y = 0. The threshold (i.e., cut-point) can be converted into both odds and the percentages of the response categories.

Next, the logistic model can be modeled with the latent factor η_i as a predictor of latent Y_i* in the CFA model, yielding a conditional logistic model (see panel b of Figure 1; Agresti, 2002; Masyn et al., 2013), such that:

For the log-odds (logit) of the observed categorical variable Y_i = 0:

$$Y_{i}^{*} = \log\left(\frac{\mathrm{Prob}(Y_{i} = 0 \mid \eta_{i})}{\mathrm{Prob}(Y_{i} = 1 \mid \eta_{i})}\right) = \tau_{1} - \lambda \times \eta_{i} + \varepsilon_{i} \tag{3.a}$$

For the log-odds (logit) of the observed categorical variable Y_i = 1:

$$Y_{i}^{*} = \log\left(\frac{\mathrm{Prob}(Y_{i} = 1 \mid \eta_{i})}{\mathrm{Prob}(Y_{i} = 0 \mid \eta_{i})}\right) = -\tau_{1} + \lambda \times \eta_{i} + \varepsilon_{i} \tag{3.b}$$

Note that, as given in Equation 1, the value of the threshold refers to Y_i = 0 (not Y_i = 1). This leads to an issue that often arises regarding the direction of logistic coefficients (see Equations 3.a and 3.b). We discuss this issue in more detail in the next section.

With the addition of the latent factor, the logit value of a binary response variable Y_i now linearly changes as a function of the latent factor (η_i) with the logistic regression coefficient, λ. Similar to the interpretation of a normal regression model, λ can be interpreted as the change in the log-odds (or logit values) of Y_i* for a one-unit difference in η_i. Alternatively, the odds-ratio (abbreviated as OR and calculated as exp [λ]) can be used to interpret factor loadings as the percent (%) change in the odds of Y being 1 (that is, Y = 1) for a one-unit increase in the latent variable η using a simple formula: 100 × (Exp (λ) −1). The main advantage of utilizing this odds-ratio is to investigate the effect sizes of factor loadings, which is analogous to standardized coefficients in a normal regression model (Allen & Le, 2008). That is, an odds-ratio reflects the relative contribution of a latent variable to the categorical outcome.

The association between logistic coefficients and probabilities of Y_i being 1.

This section introduces how to convert logistic coefficients into probabilities for the observed response categories of the binary outcome Y_i. According to Equations 1 and 2, the threshold value (τ₁) indicates the log-odds (i.e., logit) of Y_i being 0 (i.e., the lower category), whereas the negative of the threshold value (−τ₁) indicates the logit of Y_i being 1 (i.e., the higher category). This threshold τ₁ can be converted to the probability for a specific response category (Muthén, 2001). For example, the response probability for individual i having a Y value of 1 is calculated using the negative of the threshold value (−τ₁) as follows:

$$\mathrm{Prob}(Y_{i} = 1) = \frac{1}{1 + \exp(-(-\tau_{1}))} = \frac{1}{1 + \exp(\tau_{1})} \tag{4.1}$$

This equation suggests that a large positive threshold value reflects a low probability of Y_i = 1 and, consequently, a high probability of Y_i being 0. A large negative threshold value reflects a high probability of Y_i = 1 (or a low probability of Y_i being 0). To use a numerical example, a threshold of −3 implies that the probability of Y_i being 1 is .953, whereas a threshold of 3 implies that the probability of Y_i being 1 is .047 (equivalently, 1 – .953). In the conditional logistic model of a CFA (see Equation 3.b), the conditional response probability that the Y_i value is 1 is calculated as follows:

$$\mathrm{Prob}(Y_{i} = 1 \mid \eta_{i}) = \frac{1}{1 + \exp(-(-\tau_{1} + \lambda \times \eta_{i}))} = \frac{1}{1 + \exp(\tau_{1} - \lambda \times \eta_{i})} \tag{4.2}$$

In this equation, the threshold τ₁ can be converted to represent the conditional response probability of Y_i = 1 after adjusting for the effects of the latent factor, η_i (i.e., the conditional probability represents a model where the effect of the latent variable is zero).
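Equations 4.1 and 4.2 can be checked numerically with a short Python sketch. The function name `prob_y1` is hypothetical, and the loading and factor score in the last call are made-up illustrative values; the thresholds of −3 and 3 are the ones used in the numerical example in the text.

```python
import math

def prob_y1(tau_1, lam=0.0, eta=0.0):
    """Equation 4.2; with lam * eta = 0 it reduces to Equation 4.1."""
    return 1.0 / (1.0 + math.exp(tau_1 - lam * eta))

print(round(prob_y1(-3.0), 3))   # 0.953: large negative threshold
print(round(prob_y1(3.0), 3))    # 0.047: large positive threshold

# Hypothetical loading and factor score: a higher factor score
# raises the conditional probability of Y = 1 when lam is positive.
print(prob_y1(0.5, lam=1.2, eta=1.0) > prob_y1(0.5))
```

The two unconditional probabilities sum to 1 because the thresholds −3 and 3 are mirror images on the logit scale.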

Extending logit transformation to latent growth curves with binary indicator variables.

Under the logit link function, a categorical CFA can be extended to a categorical latent growth curve model (categorical LGCM; random intercept and random slope model) by specifying repeated categorical indicators to the measurement model and estimating the growth factors (e.g., initial level and rate of change) as latent variables in the structural part of the model (Masyn et al., 2013). The full model specification is shown in Figure 3.

FIGURE 3. A categorical latent growth curve model with repeated binary indicators.

*Note*. i = an individual. t = Time (0,1, 2, ⋯, T). Cov = covariance between η₀ and η₁. Y indicates an observed categorical variable. Asterisks (*) represent latent response variables. All repeated binary indicators produce a single time-invariant threshold (τ₁) for the model identification (i.e., longitudinal threshold assumption).

The model specification is similar to that of a conventional multilevel framework (Raudenbush & Bryk, 2002). The difference is this categorical LGCM specification uses the continuous latent response variable, Y_ti*, (for individual i at time t) as the growth indicators whereas a conventional LGCM uses observed indicators, Y_ti. Given the baseline model of the continuous latent response variable Y_i* (see Equation 2), the unconditional linear LGCM with binary repeated indicators can be specified as follows.

$$\text{Level 1: } Y_{ti}^{*} = \tau_{1} + \eta_{0i} + \lambda_{t} \times \eta_{1i} + \varepsilon_{ti} \tag{5}$$

$$\text{Level 2: } \eta_{0i} = \alpha_{00} + \zeta_{0i}, \quad \zeta_{0i} \sim N(0, \Psi_{00}) \tag{6.1}$$

$$\eta_{1i} = \alpha_{10} + \zeta_{1i}, \quad \zeta_{1i} \sim N(0, \Psi_{11}) \tag{6.2}$$

$$\Psi = \begin{bmatrix} \Psi_{00} & \\ \Psi_{10} & \Psi_{11} \end{bmatrix} \tag{7}$$

where τ₁ is the threshold of the latent response variable Y_ti*. η_0i and η_1i are the latent growth factors for the initial level (i.e., intercept factor) and the rate of change (i.e., slope), respectively, with factor loadings (λ) usually set to equal t (= 0, 1, 2, …, T; time is centered at the first occasion of measurement). ζ indicates the normally distributed error with a mean of 0 and a variance of ψ. Ψ represents the variance-covariance structure of η_0i and η_1i. In general, in a categorical LGCM, the thresholds are set to be invariant over time in order to consistently define the association between Y_ti and Y*_ti. This is known as the “longitudinal threshold invariance assumption” (Masyn et al., 2013), showing that the thresholds do not depend on time.

Given the multilevel modeling structure, level 1 represents the individual’s latent responses at each time point, and level 2 characterizes the individual trajectories over time. Therefore, individual differences in the growth factors (i.e., random effects) from level 1 (η_0i and η_1i) are represented by errors (ζ_0i and ζ_1i) that vary around the expected means of the intercept (α₀₀) and slope (α₁₀) (i.e., inter-individual differences in intra-individual change). According to Equation 3.b, at the beginning (when time = 0) in the categorical LGCM, −τ₁ + α₀₀ represents the mean of the log-odds (i.e., logit) values of Y = 1, and α₁₀ represents the mean change in the logit of Y = 1 corresponding to a one-unit increase in time (Masyn et al., 2013). Note that either the threshold, τ₁, or the mean parameter, α₀₀, of the intercept factor (η_0i) should be fixed to 0 for model identification purposes. To follow the conventional LGCM approach, the time-invariant threshold, τ₁, is fixed to 0 and the mean (α₀₀) of the intercept factor is estimated (Mehta, Neale, & Flay, 2004).
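To make the two-level structure concrete, the following Python sketch generates binary repeated measures from Equations 5 to 7 with the threshold fixed to 0. It is a data-generating illustration only, not an estimation routine and not part of the Mplus workflow; the function name, the seed, and the sample size are hypothetical, and the parameter values passed in the usage lines are the logit-link estimates from Table 1, used here purely as plausible inputs.

```python
import math
import random

TAU_1 = 0.0  # time-invariant threshold, fixed to 0 for identification

def simulate_binary_lgcm(n, times, alpha_00, alpha_10,
                         psi_00, psi_11, psi_10, seed=1):
    """Draw n rows of binary responses from a linear categorical LGCM."""
    rng = random.Random(seed)
    # Cholesky factor of the 2 x 2 growth-factor covariance matrix (Equation 7)
    l11 = math.sqrt(psi_00)
    l21 = psi_10 / l11
    l22 = math.sqrt(psi_11 - l21 ** 2)
    data = []
    for _ in range(n):
        z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
        eta_0 = alpha_00 + l11 * z0               # Equation 6.1
        eta_1 = alpha_10 + l21 * z0 + l22 * z1    # Equation 6.2
        row = []
        for t in times:
            u = min(max(rng.random(), 1e-12), 1 - 1e-12)
            eps = math.log(u / (1.0 - u))         # standard logistic residual
            y_star = eta_0 + t * eta_1 + eps      # Equation 5 with tau_1 = 0
            row.append(1 if y_star > TAU_1 else 0)  # Equation 1
        data.append(row)
    return data

sample = simulate_binary_lgcm(2000, [0, 1, 2, 3, 5],
                              alpha_00=0.11, alpha_10=-0.57,
                              psi_00=1.49, psi_11=0.40, psi_10=0.55)
# Proportion of Y = 1 at each occasion in the simulated sample
print([sum(row[j] for row in sample) / len(sample) for j in range(5)])
```

The simulated proportions are population-averaged quantities, so they need not equal the probability implied by the mean trajectory alone.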

For the purpose of illustrating this model specification, the latent response variable, Y_ti* was modeled as a linear function of the growth factors that make up the intercept, η_0i, and the slope, η_1i. However, nonlinear trajectories can be modeled, such as curvilinear change forms ( $λ_{t}^{2}$ , $λ_{t}^{3}$ , etc.), depending on the sample size and the number of repeated categorical indicators. In the section that follows, we illustrate how to model parameters of a linear categorical LGCM using the logit (log-odds) transformation in Mplus.

Mplus Model Specification for the Categorical LGCM

In Mplus, the categorical LGCM with five repeated binary outcomes can be estimated by specifying the syntax below:

    DATA: FILE IS example.dat;
    VARIABLE: NAMES ARE Y1-Y5;
        USEVARIABLES ARE Y1-Y5;
        CATEGORICAL ARE Y1-Y5;
        MISSING = ALL (-999);
    ANALYSIS: ESTIMATOR = ML;
        LINK = LOGIT;
    MODEL:
        I S | Y1@0 Y2@1 Y3@2 Y4@3 Y5@4;
        I WITH S;
        [Y1$1-Y5$1@0];
        [I];

where Y1 to Y5 are repeated binary outcomes. Most of this Mplus syntax is similar to that of a LGCM with continuous variables. For example, the DATA command specifies the data file to be used for the analysis (i.e., example.dat). The VARIABLE command defines all variables in the data file. The USEVARIABLES option selects the variables to be used in the analysis. The MISSING = ALL (-999) option allows Mplus to handle missing cases (coded as −999 in the current example) with full-information maximum likelihood estimation (FIML). However, several additional lines of syntax must be specified to estimate a categorical LGCM. First, the CATEGORICAL option specifies which variables are to be treated as binary or ordinal variables in the model. Second, the LINK option should be included under ANALYSIS to use the logit link function (LINK = LOGIT) with ML estimation (or ML with robust standard errors [MLR] in Mplus).

Also, for the model specification of a categorical LGCM, the syntax for two random effects in the growth model (i.e., random intercept and random slope; shown above as I S | …) should be specified in the MODEL command. This syntax instructs Mplus to estimate the growth parameters (i.e., the mean and variance) of the intercept and slope factors. The covariance, ψ₁₀, between the intercept and slope is estimated by using a WITH statement (I WITH S). For illustrative purposes, a linear growth factor was specified by fixing the factor loadings for the slope (η_1i) to equal the time intervals (i.e., 0, 1, 2, 3, and 4). The thresholds of the five repeated indicators are referred to as Y1$1-Y5$1. Both thresholds and means are defined in square brackets. To set all thresholds to 0 and estimate the mean of the intercept factor, the Mplus syntax is [Y1$1-Y5$1@0] and [I].

Model fit evaluation.

In Mplus, ML estimation with logit transformation provides both relative and absolute model fit indices. More specifically, a log-likelihood (LL) value and information criterion (IC) statistics (e.g., Akaike information criterion [AIC] and Bayesian information criterion [BIC]) are provided as relative fit indices. For absolute fit indices, Mplus gives two chi-square tests: (a) the Pearson chi-square test and (b) the likelihood ratio chi-square test (LRT). The null hypothesis for both chi-square tests is that the hypothesized model fits the observed data. Therefore, non-significant p values indicate that the model fits the data well (Rupp, Templin, & Henson, 2010). However, the Pearson chi-square and LRT statistics sometimes provide inconsistent results, particularly when the model includes a large number of categorical variables in combination with a small sample size (Geiser, 2012). For this reason, we recommend using the deviance statistic (= −2 × LL; with a chi-square distribution) with the number of free parameters (FP). When comparing two competing nested models, a deviance difference test (ΔDeviance; Δ−2LL) can be used. Model M (i.e., the constrained model) is nested within Model M′ (i.e., the unconstrained model) if M is obtained by imposing constraints on the parameters of M′, which is often referred to as the parent model. Calculating a ΔDeviance statistic is similar to the calculation of a nested chi-square comparison (Δχ²) in that ΔDeviance = (−2 × LL _{Constrained model}) − (−2 × LL _{Unconstrained model}) and Δdf = FP _{Unconstrained model} – FP _{Constrained model}. In general, a significant p-value for ΔDeviance indicates that the unconstrained model (with the smaller deviance value) fits better than the constrained model, whereas a non-significant p-value indicates that the constrained, more parsimonious model is preferred. Also, smaller AIC and BIC values can be taken as evidence for the preferred model.
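The deviance difference test can be computed by hand from the −2LL and FP values that Mplus reports. The Python sketch below is one way to do so under the usual chi-square reference distribution; the function names are hypothetical, and the chi-square tail probability is implemented with closed-form sums for integer degrees of freedom so that only the standard library is needed. The numbers in the usage lines are the deviances and free-parameter counts reported for the marital instability example in this article.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum of half-integer gamma terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (j + 0.5)
    return total

def deviance_difference_test(dev_constrained, fp_constrained,
                             dev_unconstrained, fp_unconstrained):
    """Return (delta deviance, delta df, p-value) for two nested models."""
    delta_dev = dev_constrained - dev_unconstrained
    delta_df = fp_unconstrained - fp_constrained
    return delta_dev, delta_df, chi2_sf(delta_dev, delta_df)

# Random intercept model (constrained) vs. random intercept and slope model
delta_dev, delta_df, p_value = deviance_difference_test(2421.72, 2, 2287.50, 5)
print(round(delta_dev, 2), delta_df, p_value < 0.001)   # 134.22 3 True
```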

Empirical example of a categorical LGCM with binary response variables: Marital instability

This section uses empirical data to illustrate how model parameters of a categorical LGCM with five repeated binary items can be estimated using the logit transformation. Binary response data were used from the Iowa Youth and Family Project (IYFP) (PI: R. D. Conger). The IYFP is a longitudinal panel study of 451 youths (52% female) and their families from two-parent households in the Midwest. Additional information regarding the study procedures is available from Conger and Conger (2002). For our example, categorical analyses are based on husbands’ and wives’ responses to one marital instability item (“thinking about divorce”) in 1989 (Wave 1), 1990 (Wave 2), 1991 (Wave 3), 1992 (Wave 4), and 1994 (Wave 6). Response options included “never in the last year,” “yes, within the last year,” “yes, within the last 6 months,” and “yes, within the last 3 months” (coded as 0 to 3). The original 4-point scale was recoded with 0 = never and 1 = yes ( > 0; i.e., thoughts of divorce at any point in the previous year). Factor loadings for the slope factor were fixed to 0, 1, 2, 3, and 5 (because the measurement occasions were not equally spaced).

Results.

As in the case of the categorical CFA model, we investigated the model fit indices and model parameters. At Wave 1, more than half (64.1%) of the wives reported that they had thought about divorce during the past year. This proportion decreased over time (31.6%, 30.9%, 31.2%, and 28.9% for Wave 2 to Wave 6, respectively). To evaluate model fit, we compared two competing nested models: (a) the random intercept and slope model (i.e., the linear categorical LGCM; the unconstrained model) and (b) the random intercept model¹ (the constrained model; Curran, Obeidat, & Losardo, 2010). The results showed that the random intercept and random slope model (−2LL = 2287.50, FP = 5; AIC / BIC = 2297.71 / 2318.27; unconstrained model) had smaller IC statistics than the random intercept model (−2LL = 2421.72, FP = 2; AIC / BIC = 2425.71 / 2433.93; constrained model). Also, the ΔDeviance test was statistically significant (Δ−2LL = 134.22, Δdf = 3, p < .001), indicating that the linear categorical LGCM fit the data better than the random intercept model. As mentioned in the previous section, several potential non-linear trajectories (e.g., quadratic trajectories) can be compared to the linear categorical LGCM in a similar manner. For illustrative purposes, we used a linear categorical LGCM as the optimal model. Overall, the results indicated that there was inter-individual variability in intra-individual patterns of change over time for “thinking about divorce.”

Next, we investigated the growth parameters of the categorical LGCM (see the left column of Table 1). The mean of the intercept was not significantly different from 0 (α₀₀ = .11, p = .32). However, the mean of the slope was significant (α₁₀ = −.57, p < .001), corresponding to an odds-ratio (OR) of exp(−.57) = .57. That is, there was a 43% (= 100 × (exp[−.57] − 1)) decrease in the odds of “thinking about divorce” for each 1-year increase in time. Moreover, the growth factor variances were statistically significant (ψ₀₀ = 1.49, p < .01; ψ₁₁ = .40, p < .01), suggesting that the trajectories of “thinking about divorce” varied across the sample of wives (i.e., there was inter-individual variation in individuals’ trajectories). In addition, the positive covariance between the growth factors was significant (ψ₁₀ = .55, p < .001), suggesting that wives who had a higher propensity of “thinking about divorce” at the first measurement occasion tended to show a slower decrease in the propensity to think about divorce over the 6-year time span captured in the analysis.
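The odds-ratio and percent-change figures above follow directly from the slope mean, and the same estimates imply a logit and a probability at each occasion for a hypothetical wife whose random effects are both zero. The Python sketch below reproduces these conversions; the function names are hypothetical, the inputs are the logit-link estimates in Table 1, and the resulting probabilities are conditional (subject-specific) values, so they are not expected to match the observed sample proportions.

```python
import math

ALPHA_00 = 0.11    # mean of the intercept factor (Table 1, logit link)
ALPHA_10 = -0.57   # mean of the slope factor (Table 1, logit link)

def odds_ratio(coef):
    """Odds-ratio for a one-unit increase in the predictor."""
    return math.exp(coef)

def percent_change_in_odds(coef):
    """100 * (exp(coef) - 1), the formula used in the text."""
    return 100.0 * (math.exp(coef) - 1.0)

def implied_probabilities(alpha_00, alpha_10, loadings):
    """Prob(Y = 1) at each occasion with tau_1 = 0 and zero random effects."""
    return [1.0 / (1.0 + math.exp(-(alpha_00 + alpha_10 * t)))
            for t in loadings]

print(round(odds_ratio(ALPHA_10), 2))           # 0.57
print(round(percent_change_in_odds(ALPHA_10)))  # -43
print(implied_probabilities(ALPHA_00, ALPHA_10, [0, 1, 2, 3, 5]))
```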

TABLE 1.

Parameter Estimates and Fit Statistics for the LGCMs

Latent Growth Curve Model (LGCM)
Estimator	ML	ML	ML
Link-function	Logit link	Probit link	Probit link
Indicator	Binary	Binary	Ordinal
Parameterization	–	Theta	Theta
Residuals	Fixed to π² / 3	Fixed to 1	Fixed to 1
Means
α₀₀	.11 (.10)	.04 (.06)	.03 (.06)
α₁₀	−.57*** (.07)	−.32*** (.04)	−.27*** (.03)
Variance and Covariance
ψ₀₀	1.49** (.52)	.57** (.18)	.68*** (.14)
ψ₁₁	.40** (.14)	.13** (.05)	.05** (.02)
ψ₁₀	.55*** (.13)	.18*** (.05)	.10** (.10)
Thresholds
τ₁	= .00	= .00	= .00
τ₂	–	–	1.08 (.05)
τ₃	–	–	1.77 (.07)
Fit statistics
Deviance (−2LL)	2287.50	2289.24	3742.71
FP	5	5	7
AIC / BIC	2297.71 / 2318.27	2299.24 / 2319.79	3756.71 / 3785.49

Note. ML = Maximum Likelihood. −2LL = −2 log-likelihood value. FP = Number of free parameters. AIC = Akaike Information Criterion. BIC = Bayesian Information Criterion. Unstandardized coefficients are shown with standard errors in parentheses.

** p < .01. *** p < .001.

A categorical latent growth curve model with time-invariant covariates

The individual differences in growth parameters can be modeled as a function of an individual, time-invariant covariate, or predictor, W_1i (multiple covariates are possible). These differences are quantified by the regression coefficients γ₀₁ and γ₁₁, representing the association/influence of the predictor on the intercept and slope, respectively. The time-invariant predictor W_1i can be added to Equations 6.1 and 6.2, as follows:

$$\eta_{0i} = \alpha_{00} + \gamma_{01} \times W_{1i} + \zeta_{0i} \tag{8.1}$$

$$\eta_{1i} = \alpha_{10} + \gamma_{11} \times W_{1i} + \zeta_{1i} \tag{8.2}$$

The advantage of utilizing a SEM approach is that it allows for the prediction of a subsequent outcome or response, D_i, by the growth factors (intercept and slope) and other predictors within the same analytical framework. The level of the outcome can be expressed as a function of the growth parameters and predictor(s) and can be written as:

$$D_{i} = \beta_{0} + \beta_{1} \times \eta_{0i} + \beta_{2} \times \eta_{1i} + \gamma_{1} \times W_{1i} + \varepsilon_{i}, \quad \text{where } \varepsilon_{i} \sim N(0, \sigma^{2}) \tag{9}$$

β₀ is the intercept for the multiple regression of the outcome, D_i. β₁ and β₂ are the coefficients linking the intercept parameter (η_0i) and the slope parameter (η_1i), respectively, to the outcome, D_i. γ₁ is the coefficient linking a time-invariant predictor, W_1i, to a time-invariant outcome, D_i. ε_i is the normally distributed residual with a mean fixed to 0 and variance σ². The model is now identical to a multiple regression estimating the adjusted effect of each predictor on an outcome after controlling for the effects of other covariates.

Mplus model specification for time-invariant covariates in the categorical LGCM.

In Mplus, covariates (i.e., predictors and outcomes) can be included by adding the syntax below to the existing categorical LGCM syntax:

    VARIABLE: NAMES ARE Y1-Y5 X D;
        USEVARIABLES ARE Y1-Y5 X D;
        ⋮
    MODEL:
        ⋮
        I S ON X;
        D ON I S X;

where X and D represent continuous covariates (X = a predictor and D = an outcome). The ON syntax defines the regression relationships between growth factors and covariates. For example, I S ON X specifies that the dependent variables I and S, which are the two growth factors (i.e., initial level and slope) in a binary LGCM, are regressed on the predictor X.

Results.

To demonstrate our example conditional LGCM with binary outcomes, we used two continuous measures from wives’ reports as a time-invariant predictor and outcome, respectively: (a) family-work conflict at Wave 1 (a summed score of two items with a mean [SD] of 5.13 [1.26] and a skewness of −.10) and (b) self-report of global mental health at Wave 6 (a single item with a mean [SD] of 2.14 [.86] and a skewness of .53). Wives’ family-work conflict did not predict the likelihood of their thinking about divorce at the first measurement occasion (γ₀₁ = .04, p = .71), but it positively predicted the slope parameter (γ₁₁ = .11, p < .05). The results indicated that wives who perceived more family-work conflict at Wave 1 were more likely than those who perceived less family-work conflict to exhibit increases in their propensity to think about divorce over time. In terms of the outcome model, the slope of “thinking about divorce” was positively related to wives’ mental health problems at Wave 6 (β₂ = .30, p < .05), after adjusting for the effects of the intercept (β₁ = .09, p = .06) and early family-work conflict (γ₁ = .05, p = .21). These results indicated that a wife with a greater increase in the likelihood of “thinking about divorce” over time tended to report more mental health problems at Wave 6.

Applying Probit Transformation for Categorical LGCM

In categorical SEMs, probit transformation assumes that the latent response variable Y* is normally distributed (i.e., standard normal distribution)². Therefore, probit transformation produces standard normal z-scores for the continuous latent response variable Y* in place of the logit values acquired by a logit transformation. Note that the estimated logit value is approximately equal to 1.81 times the probit value, which allows for an easy transformation from probit to logit values and vice versa. The probit transformation of a categorical response Y for a categorical CFA (see panel b of Figure 1) is as follows:

$$Y_{i}^{*} = \mathrm{Probit}(Y_{i} = 1 \mid \eta_{i}) = -\tau_{1} + \lambda \times \eta_{i} + \varepsilon_{i}, \quad \varepsilon_{i} \sim N(0, \theta) \tag{10}$$

where τ₁ is a threshold defining the continuous latent response variable Y_i* (see Equation 1), λ is the estimated factor loading of an indicator on the latent factor, η_i, and ε_i is the residual of Y_i*, with a mean fixed to 0 and variance θ.

In a SEM framework, two parameterizations are commonly used for probit models: (a) theta and (b) delta (Muthén & Asparouhov, 2002). In the theta parameterization, the residual variance θ is directly specified and fixed to 1 (e.g., ε_i ~ N (0, 1)). This is how probit regression models are parameterized when using ML estimation in other statistical packages (e.g., SAS and SPSS). In the delta parameterization, by contrast, the total variance of Y_i* is fixed to 1 (i.e., λ² × ψ + θ = 1). Therefore, the residual variance θ is not directly specified but is calculated indirectly using a scale factor $Δ = \frac{1}{σ^{*}}$, where σ* is the standard deviation of Y* (equivalent to $\sqrt{λ^{2} \times ψ + θ}$ ). Typically, this scale factor is fixed to 1, which yields the residual variance θ as 1 minus the explained variance of Y* (the explained variance being equivalent to the standardized λ²).
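The relation between the two parameterizations can be illustrated for a single indicator and a single factor. The Python sketch below rescales a theta-parameterized loading and threshold (residual variance fixed to 1) so that the total variance of Y* equals 1, as in the delta parameterization. The function name and all numeric inputs are hypothetical, and the sketch describes only this simple rescaling, not how Mplus carries out estimation under either parameterization.

```python
import math

def theta_to_delta(lam_theta, tau_theta, psi):
    """Rescale theta-parameterized values so that Var(Y*) = 1."""
    sd_y_star = math.sqrt(lam_theta ** 2 * psi + 1.0)  # theta fixed to 1
    scale = 1.0 / sd_y_star                            # scale factor 1 / sigma*
    lam_delta = lam_theta * scale
    tau_delta = tau_theta * scale
    theta_delta = 1.0 - lam_delta ** 2 * psi           # 1 - explained variance
    return lam_delta, tau_delta, theta_delta

# Hypothetical loading, threshold, and factor variance
lam_d, tau_d, theta_d = theta_to_delta(lam_theta=1.0, tau_theta=0.5, psi=0.57)
print(lam_d, tau_d, theta_d)

# Under the rescaled values, explained plus residual variance equals 1.
print(abs(lam_d ** 2 * 0.57 + theta_d - 1.0) < 1e-12)
```

Because both the loading and the threshold are divided by the same standard deviation, the implied response probabilities are unchanged by the rescaling.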

In latent growth curve modeling using probit transformation, two parameterizations (theta vs. delta) produce slightly different parameter estimates because different identification constraints are imposed (see Grimm & Liu, 2016 for a more detailed discussion of estimation challenges). In this tutorial, we focus on the theta parameterization, because it is a standard probit model specification. Now, the probit value of binary response variable Y_i changes linearly as a function of the latent factor (η_i) with the probit regression coefficient, λ. Similar to interpreting a standard regression model, λ can be interpreted as the expected change in the probit values of Y_i* for a one-unit difference in η_i, which can be converted to probabilities. Moreover, a standardized coefficient, λ, can make interpreting model parameters between the continuous latent response variable, Y*, and the latent variable, η, simpler (i.e., the correlation [r-matrix]; Toland, 2014). These standardized parameters correspond to the effect size estimates (Kline, 2011).

In addition, the probit transformation can be used under either (a) the ML estimator or (b) the weighted least squares (WLS) estimator (or weighted least squares means and variances, WLSMV) for model estimation. The WLS estimator uses limited information methods, which eases the computational burden (Finney & DiStefano, 2006). For this reason, the WLS estimator is often preferred to ML estimation in SEM. However, some researchers opt for ML estimation because it produces more efficient parameter estimates (Edwards, 2010). Accordingly, the illustrative examples for the categorical LGCMs were estimated with ML estimation. With ML estimation, probit transformation provides the same types of model fit indices as logit transformation. For model comparisons, the ∆Deviance test (which we describe in the model evaluation of the logit model) can also be used with a probit model with ML estimation.

Using the model specification shown in Figure 3, we estimated the univariate latent growth curve model with the same five repeated dichotomized (binary) items (‘thinking about divorce’) of the marital instability measure using probit transformation. The Mplus commands for the categorical LGCM using probit transformation are identical to the commands for the categorical LGCM using logit transformation, with the exception of the LINK option (i.e., LINK=PROBIT) in the ANALYSIS command. The results (i.e., model fit indices and growth parameters) were similar to those in the categorical LGCM with the logit link function (see the middle column of Table 1). As in the categorical LGCM with the logit link function, the growth parameters for a specific response with the probit link function can also be converted to corresponding percentages (%) of response categories (see the formula in Appendix A; Grimm & Liu, 2016).

LRV Transformation of Ordinal Response Variables.

As mentioned in the introduction, LRV transformation can be extended to a model with ordinal variables. If the response variable comprises more than two response categories, then according to the LRV formulation there should be more than one threshold. More specifically, the number of thresholds should equal one less than the number of response categories. Panel b of Figure 2 shows how an ordinal variable with four categories (ranging from 0 to 3) can be converted into a latent response variable, Y_i*. The observed Y_i = 0 when Y_i* is less than or equal to threshold τ₁. The observed Y_i = 1 when Y_i* is greater than τ₁ but less than or equal to τ₂. The observed Y_i = 2 when Y_i* is greater than τ₂ but less than or equal to τ₃. The observed Y_i = 3 when Y_i* exceeds the threshold τ₃ (Masyn et al., 2013).
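The threshold rule described above can be written as a small lookup. This is only an illustrative sketch: the threshold values and the function name are hypothetical, and the code simply applies the rule that category k is observed when τ_k < Y* ≤ τ_{k+1}.

```python
from bisect import bisect_left

def categorize(y_star, thresholds):
    """Map a latent response value Y* to an observed ordinal category.

    bisect_left counts the thresholds that lie strictly below y_star, so a
    value exactly equal to a threshold is placed in the lower category,
    matching the "less than or equal to" rule in the text.
    """
    return bisect_left(thresholds, y_star)

# Hypothetical, ordered thresholds tau_1 < tau_2 < tau_3 (four categories: 0-3)
thresholds = [-0.5, 0.6, 1.4]

latent_values = [-1.2, -0.5, 0.0, 0.6, 1.0, 2.3]
observed = [categorize(y, thresholds) for y in latent_values]
print(observed)
```

With three thresholds the function can only return 0, 1, 2, or 3, which reflects the point that the number of thresholds is one less than the number of categories.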

In the ordinal response, thresholds reflect the “distance” between categories. Thus, the greater distance between response thresholds reflects higher likelihood of being in one category rather than the other categories, whereas lesser distance between thresholds indicates a lower likelihood of being in one category rather than the others. As can be seen in panel b of Figure 2, most individuals responded 0; consequently, the greatest increase is noted for τ₁ (from the left side of the curve), followed by τ₂ (from τ₁), and τ₃ (from τ₂).

Extending Probit Transformations to a Latent Growth Curve with Ordinal Indicators.

The probit transformation for ordinal response variables, an extension of the binary probit model, provides the cumulative probit model. That is, for an ordinal variable with J categories, the probit model represents the propensity of being in a category higher than j (> j) rather than in category j or a lower category (≤ j). Based on the standard probit model with a binary item (see Equation 10), the ordinal probit regression equation linking the latent response variable Y_i* to the latent factor η_i can be expressed as:

{Y_{i}}^{*} = Probit (Y_{i} > j ∣ η_{i}) = - τ_{j + 1} + λ \times η_{i} + ε_{i}

(11)

where j = 0, 1, 2, …, J−1, τ₀ = −∞, and τ_J = ∞. Similar to the interpretation of the coefficient in the binary probit model, λ is interpreted as the difference of probit values for responding above category j for every one-unit increase in a latent variable, η_i. Probit values with ordinal variables can also be converted to the probabilities for the specific category. Given that ordinal variables produce cumulative probabilities, the probability of a specific category j can be calculated as:

Prob (Y_{i} = j ∣ η_{i}) = Prob ({Y_{i}}^{*} > τ_{j} ∣ η_{i}) - Prob ({Y_{i}}^{*} > τ_{j + 1} ∣ η_{i}),

(12)
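Equations 11 and 12 can be sketched numerically as follows. The sketch assumes the theta parameterization (residual variance fixed at 1), so that Prob(Y_i > j | η_i) is the standard normal distribution function evaluated at −τ_{j+1} + λ × η_i. The function names, the default λ = 1, and the threshold values passed in are illustrative assumptions.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def prob_above(j, eta, thresholds, lam=1.0):
    """Prob(Y > j | eta) = PHI(-tau_{j+1} + lam * eta); thresholds[j] holds tau_{j+1}."""
    return PHI(-thresholds[j] + lam * eta)

def category_probs(eta, thresholds, lam=1.0):
    """Category probabilities from differences of cumulative probabilities (Equation 12).

    The list of cumulative probabilities is padded with 1 (tau_0 = -infinity)
    and 0 (tau_J = +infinity).
    """
    n_thresholds = len(thresholds)
    cum = [1.0] + [prob_above(j, eta, thresholds, lam) for j in range(n_thresholds)] + [0.0]
    return [cum[j] - cum[j + 1] for j in range(n_thresholds + 1)]

# Illustrative thresholds for a four-category item
thresholds = [-0.03, 1.08, 1.77]

for eta in (-1.0, 0.0, 1.0):
    probs = category_probs(eta, thresholds)
    print(eta, [round(p, 3) for p in probs])
```

Because these probabilities are conditional on a given value of η_i, they differ from the population-level expected proportions, which also depend on the variance of the growth factors (see Appendix A).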

In this instance, the continuous LRV, Y_i*, of an ordinal variable is used as the indicator of the categorical LGCM. For the same reason that the threshold values (τ) are fixed to be equal across time points in the categorical LGCM with repeated binary variables (i.e., a time-invariant τ₁), the multiple thresholds within each ordinal variable should be fixed to be equal across time points (e.g., τ₁, τ₂, ⋯, τ_J−1) to meet the longitudinal threshold assumption. In Mplus, the CATEGORICAL option (under the VARIABLE command) can be used to specify both binary and ordinal variables. Consequently, specifying repeated ordinal variables in the categorical LGCM is identical to the model specification of the LGCM with binary variables in Mplus. In the next section, we demonstrate the univariate categorical LGCM with the original response scale (i.e., 4 responses) of the same item (‘thinking about divorce’) from the marital instability measure.

Results.

Across the five time points, the observed proportion of respondents in Category 0 (= did not think about divorce in the last year) increased from 35.9% at Wave 1 to 68.8% at Wave 5. In contrast, the proportion of respondents in Category 1 (= thought about divorce within the last year) decreased from 32.4% at Wave 1 to 20.5% at Wave 5. Category 2 (= thought about divorce within the last 6 months) decreased from 26.6% to 5.1%. For Category 3 (= thought about divorce within the last 3 months), the proportions were small, but stable, over time (ranging from 5.1% at Wave 1 to 5.6% at Wave 4). Overall, these response frequencies imply that the likelihood of “thinking about divorce” generally decreased over time. Regarding model evaluation, as previously discussed for the categorical LGCM with logit transformation, comparison of incremental models showed that the random intercept and random slope model (linear LGCM; −2LL, FP = 3742.71, 7; AIC / BIC = 3756.71 / 3785.49; unconstrained model) was a better fit to the data than the random intercept model (−2LL, FP = 3865.72, 4; AIC / BIC = 3873.72 / 3890.17; constrained model), with a significant p-value for the ∆Deviance test (Δ-2LL, Δdf = 123.01, 3, p < .001).
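The arithmetic behind this model comparison can be checked with a few lines of code. The sketch below uses the reported −2LL values and numbers of free parameters; the helper names are hypothetical, and the chi-square tail probability uses a closed-form expression that holds only for 3 degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def aic(neg2ll, n_free_params):
    """AIC = -2LL + 2 * (number of free parameters)."""
    return neg2ll + 2 * n_free_params

# Values reported in the text
constrained = {"neg2ll": 3865.72, "fp": 4}    # random intercept model
unconstrained = {"neg2ll": 3742.71, "fp": 7}  # random intercept and random slope model

delta_deviance = constrained["neg2ll"] - unconstrained["neg2ll"]
delta_df = unconstrained["fp"] - constrained["fp"]
p_value = chi2_sf_df3(delta_deviance)

print(f"Delta -2LL = {delta_deviance:.2f}, Delta df = {delta_df}, p < .001: {p_value < 0.001}")
print(f"AIC (constrained) = {aic(**{'neg2ll': constrained['neg2ll'], 'n_free_params': constrained['fp']}):.2f}")
print(f"AIC (unconstrained) = {aic(unconstrained['neg2ll'], unconstrained['fp']):.2f}")
```

The difference in deviance is referred to a chi-square distribution with degrees of freedom equal to the difference in the number of free parameters, which is 3 in this comparison.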

Next, we examined this model’s growth parameters (see the last column of Table 1). As discussed previously, ordinal variables produce cumulative response probabilities (with multiple thresholds). In the LGCM with repeated ordinal variables, the mean of the initial level growth factor, α₀₀, now corresponds to the expected propensity of responding above Category 0 (= never in the last year) on “thinking about divorce.” Given the definition − τ₁ + α₀₀ in the categorical LGCM, the mean of the initial level growth factor, .03, equals the first threshold, τ₁ = −.03, with its sign reversed. The three estimated thresholds were −.03, 1.08, and 1.77 for τ₁ to τ₃, respectively. These multiple thresholds provide information on the expected item-response proportions (i.e., probabilities) of wives’ responses on marital instability at Wave 1. These thresholds can also be converted into the expected proportions (or probabilities) of the item at the first assessment using Equation 12 and the formula in Appendix A. Our calculations showed that the expected proportions of responses 0, 1, 2, and 3 at the first measurement occasion (i.e., initial levels) were 49.1% ( $= 1 - F [\frac{.03}{\sqrt{.68 + 1.00}}]$ ), 30.7% ( $= F [\frac{.03}{\sqrt{.68 + 1.00}}] - F [\frac{- 1.08}{\sqrt{.68 + 1.00}}]$ ), 11.7% ( $= F [\frac{- 1.08}{\sqrt{.68 + 1.00}}] - F [\frac{- 1.77}{\sqrt{.68 + 1.00}}]$ ), and 8.6% ( $= F [\frac{- 1.77}{\sqrt{.68 + 1.00}}]$ ), respectively. Additionally, the variance of the latent variable intercept, η_0i, was .68 and was significantly different from 0 (p < .001), indicating that wives varied in their propensity for marital instability at the initial level.

The mean of the slope factor was −.27, indicating that the average response propensity decreased by .27 units per year. This decreasing mean trajectory indicates that, on average, wives’ propensity to “think about divorce” decreased over time. The variance of the latent variable slope was .05 and differed significantly from 0, indicating that there was inter-individual variation across wives in their propensity change over time. Finally, the covariance between the intercept and slope of the latent variable was .10 (also statistically different from 0), suggesting that wives who had higher response propensities at the first measurement occasion generally experienced a slower rate of change in their response propensity over time compared to wives with a low response propensity at Wave 1.

Categorical parallel process models (Categorical PPM) with categorical LGCMs

To investigate the dyadic association of marital attributes between wives and husbands over time, a categorical LGCM can be extended to a categorical parallel process model (hereafter referred to as “categorical PPM”) by using either the probit transformation or the logit transformation. To demonstrate the categorical PPM with ordinal outcomes, our example categorical PPM was estimated using the probit transformation. Similar to a parallel process model with continuous variables (Wickrama, Lee, O’Neal & Lorenz, 2016), the categorical PPM contains two separate categorical LGCMs as follows:

\underline{For wives :} {Y_{ti}}^{*} = Probit (Y_{i} > j ∣ η^{(w)}) = - τ_{j + 1}^{(w)} + η_{0 i}^{(w)} + λ_{t}^{(w)} \times η_{1 i}^{(w)} + ε_{ti}^{(w)}

(13)

\begin{matrix} η_{0 i}^{(w)} = α_{00}^{(w)} + ζ_{0 i}^{(w)}, & ζ_{0 i}^{(w)} ~ N (0, Ψ_{00}^{(w)}) \\ η_{1 i}^{(w)} = α_{10}^{(w)} + ζ_{1 i}^{(w)}, & ζ_{1 i}^{(w)} ~ N (0, Ψ_{11}^{(w)}) \end{matrix}

\underline{For husbands :} {Z_{ti}}^{*} = Probit (Z_{i} > j ∣ η^{(h)}) = - τ_{j + 1}^{(h)} + η_{0 i}^{(h)} + λ_{t}^{(h)} \times η_{1 i}^{(h)} + ε_{ti}^{(h)}

(14)

\begin{matrix} η_{0 i}^{(h)} = α_{00}^{(h)} + ζ_{0 i}^{(h)}, & ζ_{0 i}^{(h)} ~ N (0, Ψ_{00}^{(h)}) \\ η_{1 i}^{(h)} = α_{10}^{(h)} + ζ_{1 i}^{(h)}, & ζ_{1 i}^{(h)} ~ N (0, Ψ_{11}^{(h)}) \end{matrix}

As can be seen in Equations 13 and 14, the parallel process model is estimated using two primary growth curve models identified by the latent response variables Y_ti* and Z_ti*. The unique feature of a categorical PPM is its ability to specify the variance and covariance structures among latent growth factors (i.e., $η_{0 i}^{(w)}$ , $η_{1 i}^{(w)}$ , $η_{0 i}^{(h)}$ , and $η_{1 i}^{(h)}$ ), which is shown in panel a of Figure 4. Using the probit transformation with ML estimation, the Mplus commands for the example categorical PPM with ordinal response variables are as follows:

⋮

VARIABLE: NAMES ARE Y1-Y5 Z1-Z5;

USEVARIABLES ARE Y1-Y5 Z1-Z5;

CATEGORICAL VARIABLES ARE Y1-Y5 Z1-Z5;

MISSING = ALL (999);

ANALYSIS: ESTIMATOR=ML;

LINK=PROBIT;

MODEL:

I1 S1 | Y1@0 Y2@1 Y3@2 Y4@3 Y5@4;

I2 S2 | Z1@0 Z2@1 Z3@2 Z4@3 Z5@4;

[I1 I2]; [Y1$1-Y5$1@0]; [Z1$1-Z5$1@0];

I1 WITH S1-S2; S1 WITH I2-S2; I2 WITH S2;

OUTPUT: STANDARDIZED;

In the model commands, two sets of categorical latent growth curve models (I1 S1|… and I2 S2|…; [I1 I2]; [Y1$1-Y5$1@0]; [Z1$1-Z5$1@0]) are specified to estimate growth parameters for each dyad member (i.e., wives’ and husbands’ categorical LGCMs). All covariances among growth factors, both within and across dyad members, are estimated by a series of WITH statements. The OUTPUT command contains an added keyword: STANDARDIZED. This option instructs Mplus to include standardized parameter estimates and their standard errors in the output in addition to the default unstandardized values. With ML estimation, estimating growth parameters in the categorical PPM can increase the computational burden as a function of the number of categorical variables. Muthén and Muthén (1998-2012) suggest using the INTEGRATION=MONTECARLO option (500 integration points by default) in the ANALYSIS command. This option reduces the number of integration points, which may save computational time. For illustrative purposes, we estimated the model with the ordinal items of the marital instability measure using both wives’ and husbands’ reports. The results are shown in panel b of Figure 4.

FIGURE 4. A categorical parallel process model (Categorical PPM).

*Note*. M = mean. V = Variance. COV = Covariance. Repeated measures are not shown in figure.

For wives’ growth model, the thresholds at the first assessment were −.04, 1.12, and 1.82 (corresponding item-response proportions = 48.9%, 30.7%, 11.4%, 9.0% for category 0 to 3, respectively). For husbands’ growth model, the thresholds at the first assessment were .02, 1.26, and 2.00 (corresponding item-response proportions = 50.6%, 31.0%, 10.7%, 7.7% for category 0 to 3, respectively). Both wives and husbands decreased in their response propensities for “thinking about divorce” over time ( $α_{10}^{(w)} = - .29$ ; $α_{10}^{(h)} = - .37$ ). In addition, the variances of all growth factors (i.e., initial level and rate of change for wives and husbands) were statistically significant, indicating the existence of inter-individual differences in these trajectories ( $Ψ_{00}^{(w)} = .84$ ; $Ψ_{11}^{(w)} = .08$ ; $Ψ_{00}^{(h)} = .96$ ; $Ψ_{11}^{(h)} = .09$ ). Positive covariances between intercept and slope growth factors were significant within each dyad member’s growth model ( $Ψ_{10}^{(w)} = .07$ ; $Ψ_{10}^{(h)} = .12$ ), which suggests that both wives and husbands who had higher response propensities for “thinking about divorce” at the first occasion tended to show slower decreases in their response propensity over time. Additionally, two significant associations across dyad members (i.e., associations between wives’ and husbands’ growth factors) were found: (a) between wives’ and husbands’ intercept factors; and (b) between wives’ and husbands’ slope factors. The positive association between intercept factors ( $Ψ_{00}^{(h) (w)} = .72$ ) indicated that wives with high initial levels of ‘thinking about divorce’ also tended to have husbands with high initial levels of ‘thinking about divorce’. The positive association between slope factors ( $Ψ_{11}^{(h) (w)} = .07$ ) indicated that the linear trajectory for wives was positively associated with the linear trajectory for husbands, suggesting parallel change in wives’ and husbands’ reports on the “thinking about divorce” item.
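Because the covariances above are on the scale of the growth factors, it can help to express them as correlations. The sketch below does this by hand from the rounded estimates reported in the text; the labels (I_w, S_w, I_h, S_h) are hypothetical names for the wives’ and husbands’ intercept and slope factors, and the results may differ slightly from the standardized estimates that Mplus reports, which are based on unrounded values.

```python
import math

# Rounded variances reported in the text (w = wives, h = husbands)
variances = {"I_w": 0.84, "S_w": 0.08, "I_h": 0.96, "S_h": 0.09}

# Rounded covariances reported in the text
covariances = {
    ("I_w", "S_w"): 0.07,  # wives: intercept with slope
    ("I_h", "S_h"): 0.12,  # husbands: intercept with slope
    ("I_w", "I_h"): 0.72,  # wives' intercept with husbands' intercept
    ("S_w", "S_h"): 0.07,  # wives' slope with husbands' slope
}

def correlation(cov, var_a, var_b):
    """Correlation = covariance divided by the product of the standard deviations."""
    return cov / math.sqrt(var_a * var_b)

correlations = {
    pair: correlation(cov, variances[pair[0]], variances[pair[1]])
    for pair, cov in covariances.items()
}

for (a, b), r in correlations.items():
    print(f"r({a}, {b}) = {r:.2f}")
```

Expressed this way, the slope covariance of .07 between wives and husbands corresponds to a sizable correlation even though its raw value is small, because the slope variances themselves are small.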

A PPM allows for the specification of residual correlations taken at the same time across dyad members (between-dyad correlations) along with residual correlations within dyads across time (within-dyad correlations [within-time correlations]; Wickrama et al., 2016), which may affect model fit, estimated parameters, and their standard errors. These residual structures can also be specified in categorical latent growth curve models using the theta parameterization under the WLS (or WLSMV) estimator. In Mplus, the residual correlations can be specified with a WITH statement (e.g., Y1 WITH Y2 and Y1 WITH Z1 for within- and between-dyad correlations, respectively) using the theta parameterization under the WLS estimator (ANALYSIS: ESTIMATOR=WLS (or WLSMV); PARAMETERIZATION = THETA). We recommend first estimating a PPM without specifying a residual structure (where the residual variances are fixed to 1 across time; Y1-Y5@1; Z1-Z5@1;) and then exploring whether it is necessary to specify the residual structure. We make this recommendation because specifying residual structures in a PPM increases model complexity, which may result in an improper solution (i.e., convergence problems) or impossible parameter estimates. For researchers who are interested in specifying residual structures in a latent growth curve model with ordinal variables, more detailed information is provided in Grimm and Liu (2016).

CONCLUDING REMARKS

Social behavioral researchers often need to describe and analyze changes in behavioral or psychological attributes with repeated categorical outcomes. In the present article, using the structural equation modeling framework, we have illustrated categorical latent growth curve modeling for repeated categorical responses based on logit and probit transformation strategies. We have presented step-by-step procedures for categorical LGCM with corresponding Mplus syntax. We also extended the univariate model by incorporating time-invariant covariates (i.e., predictors and outcomes) and time-varying covariates (i.e., a parallel process model) into the categorical LGCMs. These modeling illustrations are useful for social behavioral researchers who wish to test hypotheses involving change in categorical outcomes.

Acknowledgments

During the past several years, support for this research has come from multiple sources, including the National Institute of Mental Health (MH00567, MH19734, MH43270, MH48165, MH51361), the National Institute on Drug Abuse (DA05347), the Bureau of Maternal and Child Health (MCJ-109572), the MacArthur Foundation Research Network on Successful Adolescent Development Among Youth in High-Risk Settings, and the Iowa Agriculture and Home Economics Experiment Station (Project 3320).

Appendix A.

Formula to convert probit values of being Y_t > category j to the expected proportion in the linear categorical LGCM (i.e., random intercept and random slope model).

Prob (Y_{t} > j ∣ η) = Prob ({Y_{t}}^{*} > τ_{j + 1} ∣ η) = F [\frac{- τ_{j + 1} + (λ_{t} \times α_{10})}{\sqrt{ψ_{00} + (ψ_{11} \times λ_{t}^{2}) + (2 \times λ_{t} \times ψ_{10}) + θ}}]

Note. F = the standard normal distribution function (i.e., z-table). Item / variable response category j = 0, 1, 2, …, J-1. Time t = 0, 1, 2, …, T.
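As a rough check, the formula can be evaluated with the wives’ estimates reported for the univariate ordinal model (thresholds −.03, 1.08, and 1.77; α₁₀ = −.27; ψ₀₀ = .68; ψ₁₁ = .05; ψ₁₀ = .10; θ = 1 under the theta parameterization). The function and argument names below are hypothetical, and the time scores λ_t = 0, 1, …, 4 are assumed from the model specification. Because the inputs are rounded, the results agree with the reported Wave 1 proportions only approximately.

```python
import math
from statistics import NormalDist

F = NormalDist().cdf  # standard normal distribution function

def prob_above(j, lam_t, thresholds, a10, psi00, psi11, psi10, theta=1.0):
    """Prob(Y_t > j) from the Appendix A formula; thresholds[j] holds tau_{j+1}."""
    numerator = -thresholds[j] + lam_t * a10
    variance = psi00 + psi11 * lam_t ** 2 + 2 * lam_t * psi10 + theta
    return F(numerator / math.sqrt(variance))

def expected_proportions(lam_t, thresholds, **growth):
    """Expected proportion in each category, as differences of cumulative probabilities."""
    n = len(thresholds)
    cum = [1.0] + [prob_above(j, lam_t, thresholds, **growth) for j in range(n)] + [0.0]
    return [cum[j] - cum[j + 1] for j in range(n + 1)]

# Rounded estimates reported for wives in the univariate ordinal LGCM
thresholds = [-0.03, 1.08, 1.77]
wives = dict(a10=-0.27, psi00=0.68, psi11=0.05, psi10=0.10)

for lam_t in range(5):  # assumed time scores 0..4 (Waves 1-5)
    props = expected_proportions(lam_t, thresholds, **wives)
    print(f"lambda_t = {lam_t}: " + ", ".join(f"{100 * p:.1f}%" for p in props))
```

At λ_t = 0 the slope terms drop out, so the denominator reduces to the square root of ψ₀₀ + θ, which is the expression used for the initial-level proportions in the Results.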

Footnotes

The random intercept model is the baseline model for the growth curve model. In Mplus, the model can be specified by using I | Y1@0 Y2@1 Y3@2 Y4@3 Y5@4; under the MODEL command.

Prob (Y_i = 1) → Φ⁻¹ (Prob (Y_i = 1)) → Probit Y_i (or z-score) = Latent response variable Y_i*, where Prob (Y_i = 1) is the probability that Y_i = 1 for individual i. The inverse standard normal function Φ⁻¹ of Prob (Y_i = 1) produces probit values, which can be read as standard normal z-scores. As in the logit model, the probit values yield a threshold, τ₁, which represents the continuous latent response variable Y_i*.

REFERENCES

Agresti A (2002). Categorical Data Analysis (2nd ed.). New York: John Wiley & Sons.
Allen J, & Le H (2008). An additional measure of overall effect size for logistic regression models. Journal of Educational and Behavioral Statistics, 33, 416–441.
Conger RD, & Conger KJ (2002). Resilience in midwestern families: selected findings from the first decade of a prospective, longitudinal study. Journal of Marriage and the Family, 64, 361–373.
Curran P, Obeidat K, & Losardo D (2010). Twelve frequently asked questions about growth curve modeling. Journal of Cognition and Development, 11, 121–136.
Edwards MC (2010). A Markov chain Monte Carlo approach to confirmatory item factor analysis. Psychometrika, 75, 474–497.
Finney SJ, & DiStefano C (2006). Nonnormal and categorical data in structural equation models. In Hancock GR & Mueller RO (Eds.), A second course in structural equation modeling (pp. 269–314). Greenwich, CT: Information Age.
Geiser C (2012). Data Analysis with Mplus. New York, NY: Guilford Press.
Grimm K, & Liu Y (2016). Residual structure in growth models with ordinal outcomes. Structural Equation Modeling, 23, 466–475.
Kline R (2011). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford Press.
Masyn KE, Petras H, & Liu W (2013). Growth curve models with categorical outcomes. In Bruinsma G & Weisburd D (Eds.), Encyclopedia of Criminology and Criminal Justice (pp. 2013–2025). New York: Springer Verlag.
McTernan M, & Blozis SA (2015). Longitudinal models for ordinal data with many zeros and varying numbers of response categories. Structural Equation Modeling, 22, 216–226.
Mehta PD, Neale MC, & Flay BR (2004). Squeezing interval change from ordinal panel data: latent growth curves with ordinal outcomes. Psychological Methods, 9, 301–333.
Muthén BO (2001). Latent variable mixture modeling. In Marcoulides GA & Schumacker RE (Eds.), New developments and techniques in structural equation modeling (pp. 1–34). Mahwah, NJ: Lawrence Erlbaum Associates.
Muthén B, & Asparouhov T (2002). Latent variable analysis with categorical outcomes: Multiple-group and growth modeling in Mplus. Retrieved from http://www.statmodel.com/download/webnotes/CatMGLong.pdf
Muthén LK, & Muthén BO (1998-2012). Mplus user’s guide (7th ed.). Los Angeles: Muthén & Muthén.
Raudenbush SW, & Bryk AS (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Rupp AA, Templin J, & Henson RA (2010). Diagnostic measurement theory, methods, and applications. New York, NY: Guilford.
Skrondal A, & Rabe-Hesketh S (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: Chapman & Hall/CRC.
Toland MD (2014). Practical guide to conducting an item response theory analysis. Journal of Early Adolescence, 34, 120–151.
Wickrama KAS, Lee TK, O’Neal CW, & Lorenz FO (2016). Higher-order growth curves and mixture modeling with Mplus: A practical guide. New York, NY: Routledge.

