Abstract
Latent class analysis (LCA) is a statistical approach to identifying underlying subgroups (i.e., latent classes) of individuals based on their responses to a set of observed categorical variables. Latent transition analysis (LTA) extends this framework to longitudinal data in order to estimate the incidence of transitions over time in latent class membership. This study provides an introduction to LCA and LTA, including the use of grouping variables and covariates, and demonstrates the use of two SAS® procedures (PROC LCA and PROC LTA) to fit these models. The empirical demonstration involved data from 457 women who participated in the Women's Interagency HIV Study (WIHS). First, LCA was used to identify drug use latent classes based on reported use of tobacco, alcohol, marijuana, crack/cocaine/heroin and other drugs. Second, LTA was used to estimate the incidence of transitions in drug use latent classes over a one-year period. Third, racial differences in initial drug use and transitions over time were examined using multiple-groups LTA. Fourth, the effect of participation in an alcohol or drug treatment program on initial latent class membership and transitions over time was examined using LTA with covariates. Measurement invariance across time and groups was also examined.
Keywords: latent class analysis, latent transition analysis, drug use, longitudinal
Introduction
Modeling complex behavior such as substance use as a categorical latent variable can reveal important information about the behavior that would not otherwise be known. In many cases, complex behaviors cannot easily be assessed with a single questionnaire item or observational measure. It may be necessary to record multiple dimensions of behaviors or assess behavior from multiple informants in order to fully measure the construct. A categorical latent variable takes into account information from a set of available measures to divide the population into homogeneous subgroups with similar behavior profiles.
One statistical approach that has proven to be highly useful in recent years for modeling complex behavior is latent class analysis (LCA) [16]. Latent class theory provides the framework for measuring categorical latent variables, where two or more categorical observed variables serve as indicators of an underlying grouping variable [19, 26]. For example, LCA has been used to model a variety of complex behaviors including marijuana use [11], underage problem drinking [32], and weight management strategies [25].
Although LCA is an appropriate technique for modeling behavior at one time point, latent transition analysis (LTA) [16] extends this approach to include modeling transitions in behavior over two or more times. Unlike in growth curve modeling [5], where change over time is typically characterized by a mean-level function of time such as linear or quadratic growth, in LTA change is characterized by a transition probability matrix, where each element represents the probability of transitioning to a particular latent class at time t + 1 given latent class membership at time t. In other words, change is represented by movement over time between discrete stages.
Numerous studies in recent years have used LTA to model substance use. Perhaps the most common application to date has been to examine the substance use onset process through which adolescents and young adults transition [3, 13, 20, 21, 23]. LTA has also been used to shed light on numerous other related processes, for example transitions through the stages of change during smoking cessation [37] and the association between adolescent substance use and transitions through stages of sexual risk behavior [22]. To date, the majority of these studies have been based on national samples such as the National Longitudinal Study of Adolescent Health [36] and the National Longitudinal Survey of Youth [8].
Far less is known about the intersection of use of various drugs among important subpopulations such as women who are at high risk for HIV infection, and how stable such individuals are in this complex behavior over time. The overall goal of the present study is to examine patterns of drug use among a population of high-risk women and examine the associations between this behavior and several covariates. To address this goal we will follow these steps: 1) identify latent classes of drug use behavior at a single time point; 2) model transitions in this behavior over time; 3) estimate differences across racial groups in class prevalence and transitions over time; and 4) examine participation in an alcohol or drug treatment program as a predictor of initial drug use and transitions over time in the behavior.
Method
Participants
Data for the current study were from Visits 2 and 4 (referred to in this study as Times 1 and 2) of the public release of the Women's Interagency HIV Study (WIHS). The WIHS examines the impact of HIV infection on women in the United States. It collects data from HIV-positive women, including women previously diagnosed with clinical AIDS or women with low CD4+ cell counts, as well as, for comparison, HIV-negative women at high risk for HIV infection [4]. Clinical centers for the WIHS are located in five large metropolitan areas: New York City, Washington, Chicago, San Francisco, and Los Angeles. WIHS public release data are available at http://statepiaps.jhsph.edu/wihs/index.htm.
The sample used in the current study consisted of 457 HIV-negative women assessed in 1994–1996 (Time 1) and then assessed again approximately one year later in 1995–1997 (Time 2). All women provided at least one response to any of the drug use questions used as indicators in the latent class and latent transition models. Of the 457 women, 349 (76.4%) women provided data at both Time 1 and Time 2, 60 (13.1%) women provided data at Time 1 only, and 48 (10.5%) women provided data at Time 2 only. The sample comprised 54.7% African Americans, 15.8% Whites and 29.5% women from other race groups. The average age of participants was 34.3 years (SD=8.4 years).
Measures
At each time, five indicators of drug use in the past six months were created from seven survey questions asking about the average frequency of tobacco, alcohol, marijuana, crack, cocaine, heroin, and other drug use since the date of the last study visit (approximately 6 months prior). The following four indicators were recoded into three response categories: tobacco (1 = no cigarette use, 2 = less than a pack a day, 3 = pack a day or more); alcohol (1 = no use, 2 = less than 3–4 days a week, 3 = 3–4 days a week or more); marijuana (1 = no use, 2 = less than once a week, 3 = once a week or more); crack, cocaine or heroin (CCH) (1 = no use, 2 = less than once a week, 3 = once a week or more). The fifth indicator was dichotomized due to low base rates of use: other drugs (1 = no use, 2 = some use). Descriptive statistics for tobacco, alcohol, marijuana, CCH, and other drug use at Time 1 and Time 2 are shown in Table 1. Individuals were referred to a counselor if they reported using alcohol an average of 3–4 days a week or more frequently, or if they reported any use of marijuana, CCH, or other drugs during the past six months.
Table 1.
| Indicator | Code | Label | Time 1 Frequency (Valid %) | Time 2 Frequency (Valid %) |
|---|---|---|---|---|
| Tobacco | 1 | No use | 164 (40.2) | 158 (39.9) |
| | 2 | Less than a pack a day | 159 (39.0) | 157 (39.6) |
| | 3 | Pack a day or more | 85 (20.8) | 81 (20.5) |
| | . | Missing | 49 | 61 |
| Alcohol | 1 | No use | 202 (49.5) | 191 (48.2) |
| | 2 | Less than 3–4 days a week | 160 (39.2) | 160 (40.4) |
| | 3 | 3–4 days a week or more | 46 (11.3) | 45 (11.4) |
| | . | Missing | 49 | 61 |
| Marijuana | 1 | No use | 306 (75.4) | 299 (75.3) |
| | 2 | Less than once a week | 51 (12.6) | 53 (13.4) |
| | 3 | Once a week or more | 49 (12.1) | 45 (11.3) |
| | . | Missing | 51 | 60 |
| Crack, Cocaine or Heroin | 1 | No use | 317 (77.7) | 306 (77.3) |
| | 2 | Less than once a week | 41 (10.0) | 28 (7.1) |
| | 3 | Once a week or more | 50 (12.3) | 62 (15.7) |
| | . | Missing | 49 | 61 |
| Other Drugs | 1 | No use | 386 (94.4) | 372 (93.7) |
| | 2 | Some use | 23 (5.6) | 25 (6.3) |
| | . | Missing | 48 | 60 |
A grouping variable for race was coded 1 for African Americans, 2 for Whites and 3 for other races. In addition, at each time, a covariate indicating participation in an alcohol or drug treatment program was created from two survey questions asking about being in such a program since the date of the last study visit (approximately 6 months prior). At Time 1, this covariate indicated participation in an alcohol or drug treatment program in the six-month period immediately prior (0 = no participation, 1 = participation). At Time 2, additional information was obtained so that this covariate indicated participation in an alcohol or drug treatment program in the year between Time 1 and Time 2 (0 = no participation, 1 = participation). At Time 1, 120 (29.3%) women reported participating in an alcohol or drug treatment program; at Time 2, 141 (36.9%) women reported participating.
Latent Class and Latent Transition Analysis
The fundamental latent class model posits that there are underlying (latent) subgroups in a population that cannot be directly observed, but instead must be inferred from multiple categorical observed items. The mathematical model for LCA can be expressed as follows. Let yj represent element j of a response pattern y. Let us establish an indicator function I(yj = rj) that equals 1 when the response to variable j = rj, and equals 0 otherwise. Then the probability of observing a particular vector of responses is
$$P(\mathbf{Y} = \mathbf{y}) = \sum_{c=1}^{C} \gamma_c \prod_{j=1}^{J} \prod_{r_j=1}^{R_j} \rho_{j,r_j|c}^{I(y_j = r_j)} \tag{1}$$
where C is the number of latent classes, J is the number of items, Rj is the number of response categories for item j, γc is the probability of membership in latent class c and ρj,rj|c is the probability of response rj to item j, conditional on membership in latent class c. The γ parameters represent a vector of latent class membership probabilities that sum to 1. The ρ parameters represent a matrix of item-response probabilities conditional on latent class membership. In LCA, the ordering of the latent classes is arbitrary (typically determined only by a set of random starting values) and one must examine the item-response probabilities in order to interpret each latent class. This can create a situation referred to as the “label-switching problem” [12].
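To make the latent class model concrete, the following minimal Python sketch evaluates the probability of a response pattern and the posterior probability of class membership. The parameter values are hypothetical and chosen only for illustration (two classes, one three-category item and one two-category item); they are not estimates from this study, and the function names are our own.

```python
from itertools import product

# Hypothetical parameters for illustration only (not estimates from this study).
gamma = [0.6, 0.4]  # latent class membership probabilities; sum to 1
# rho[c][j][r - 1] = P(response r to item j | class c); each inner list sums to 1
rho = [
    [[0.80, 0.15, 0.05], [0.9, 0.1]],  # class 1
    [[0.10, 0.30, 0.60], [0.3, 0.7]],  # class 2
]

def joint_by_class(y):
    """gamma_c times the product of item-response probabilities, for each class c."""
    terms = []
    for c, g in enumerate(gamma):
        term = g
        for j, r in enumerate(y):
            term *= rho[c][j][r - 1]  # the indicator selects the observed category
        terms.append(term)
    return terms

def pattern_probability(y):
    """P(Y = y): the class-specific terms summed over latent classes."""
    return sum(joint_by_class(y))

def posterior(y):
    """P(class c | Y = y), obtained with Bayes' rule."""
    terms = joint_by_class(y)
    total = sum(terms)
    return [t / total for t in terms]

all_patterns = list(product([1, 2, 3], [1, 2]))
total_probability = sum(pattern_probability(y) for y in all_patterns)
print(round(total_probability, 10))   # probabilities over all patterns sum to 1
print(pattern_probability((1, 1)))    # 0.6*0.8*0.9 + 0.4*0.1*0.3
print(posterior((1, 1)))
```

The posterior probabilities are what one would use to describe how strongly a given response pattern points to each latent class.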
A critical issue that can arise in LCA is that in many cases the likelihood function is not defined by a single mode, but rather one maximum value and one or more local modes. We refer to this as a problem of model identification, as the problem can typically be resolved by reducing the number of unknowns (i.e. parameters) or increasing the amount of information (i.e. data). A well-identified model is one in which the estimation procedure finds the maximum likelihood solution regardless of the starting values that are provided; in this case the likelihood function is characterized by a single mode. If the model is not well identified, then different parameter estimates may be produced depending on the starting values. The likelihood function in cases like this is characterized by multiple modes, where local modes correspond to parameter estimates that are not the maximum likelihood solution. Model identification can be assessed by generating numerous sets of random starting values for the iterative estimation algorithm. Issues with model identification arise extremely frequently in LCA, particularly in models with few parameter restrictions imposed, thus this step of assessing identification in each competing model is critical.
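The random-starts check described above can be sketched in a few lines. The code below is an illustration only, not the algorithm implemented in PROC LCA: it fits a two-class model to simulated binary items with a basic EM algorithm and reports the share of random starts that reach the best log-likelihood found. The function names, the simulated data, the convergence tolerance, and the rule for deciding that two starts reached the same solution are all assumptions.

```python
import math
import random
from collections import Counter

def em_lca(counts, n_classes, rng, max_iter=1000, tol=1e-9):
    """EM for an LCA with binary (0/1) items, from one random start.
    counts maps each response pattern (tuple) to its frequency.
    Returns the log-likelihood evaluated at the final E-step."""
    n_items = len(next(iter(counts)))
    n = sum(counts.values())
    w = [rng.random() + 0.1 for _ in range(n_classes)]
    gamma = [v / sum(w) for v in w]  # random starting class probabilities
    rho = [[rng.uniform(0.1, 0.9) for _ in range(n_items)] for _ in range(n_classes)]
    prev = -math.inf
    loglik = prev
    for _ in range(max_iter):
        # E-step: posterior class probabilities for each observed pattern
        post, loglik = {}, 0.0
        for y, freq in counts.items():
            joint = []
            for c in range(n_classes):
                p = gamma[c]
                for j, v in enumerate(y):
                    p *= rho[c][j] if v == 1 else 1.0 - rho[c][j]
                joint.append(p)
            s = sum(joint)
            loglik += freq * math.log(s)
            post[y] = [p / s for p in joint]
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: update gamma and rho from the posterior weights
        for c in range(n_classes):
            nc = max(sum(f * post[y][c] for y, f in counts.items()), 1e-12)
            gamma[c] = nc / n
            for j in range(n_items):
                num = sum(f * post[y][c] for y, f in counts.items() if y[j] == 1)
                rho[c][j] = min(max(num / nc, 1e-6), 1.0 - 1e-6)  # avoid log(0)
    return loglik

def solution_stability(counts, n_classes, n_starts=25, seed=1, close=1e-3):
    """Best log-likelihood and the percentage of starts that reach it."""
    rng = random.Random(seed)
    logliks = [em_lca(counts, n_classes, rng) for _ in range(n_starts)]
    best = max(logliks)
    hits = sum(1 for ll in logliks if best - ll < close)
    return best, 100.0 * hits / n_starts

# Simulated data from a hypothetical, well-separated 2-class model with 4 items
sim = random.Random(0)
true_rho = [[0.9, 0.8, 0.85, 0.9], [0.1, 0.2, 0.15, 0.1]]
patterns = []
for _ in range(400):
    c = 0 if sim.random() < 0.5 else 1
    patterns.append(tuple(int(sim.random() < p) for p in true_rho[c]))
counts = Counter(patterns)

best, pct = solution_stability(counts, n_classes=2)
print(f"best log-likelihood = {best:.3f}; solution stability = {pct:.0f}%")
```

A low percentage in such a check signals that many starts ended at local modes, which is the situation the text describes as poor identification.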
Several criteria are available to select from a set of models with different numbers of latent classes. These include the G2 fit statistic, although with large degrees of freedom (df) as are common in LCA and LTA models the distribution of this test statistic is unknown; therefore a formal hypothesis test for overall model fit is not reported. A number of information criteria also can be used to compare models, including the AIC [2], BIC [33], CAIC [6], and a-BIC [34]. The entropy R2 provides an indication of the overall degree of classification uncertainty in the solution [9]. It also can be useful to calculate the percentage of times the maximum-likelihood solution was identified using multiple sets of random starting values for the estimation procedure; we will refer to this as the solution stability. Lower values are preferable for the G2 fit statistic, AIC, BIC, CAIC, and a-BIC, whereas higher values are better for the entropy R2 and solution stability. Finally, it is critical to assess the interpretability of the competing solutions to ensure that the selected model yields latent classes that are both meaningful and distinguishable from each other.
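As a worked illustration of these criteria, the sketch below computes the number of free parameters, the degrees of freedom, and the information criteria from G2 using their usual definitions (AIC = G2 + 2p; BIC = G2 + p ln n; CAIC = G2 + p(ln n + 1); a-BIC = G2 + p ln[(n + 2)/24]). The G2 value is the one reported in Table 2 for the 1-class model at Time 1. The sample size n = 409 is an assumption (the 349 + 60 women who provided Time 1 data), as are the function names.

```python
import math

def lca_free_parameters(n_classes, categories):
    """(C - 1) class probabilities plus C * sum(R_j - 1) item-response probabilities."""
    return (n_classes - 1) + n_classes * sum(r - 1 for r in categories)

def information_criteria(g2, p, n):
    """Information criteria based on G2, p free parameters, and sample size n."""
    return {
        "AIC": g2 + 2 * p,
        "BIC": g2 + p * math.log(n),
        "CAIC": g2 + p * (math.log(n) + 1),
        "a-BIC": g2 + p * math.log((n + 2) / 24),
    }

categories = [3, 3, 3, 3, 2]   # tobacco, alcohol, marijuana, CCH, other drugs
cells = math.prod(categories)  # number of possible response patterns

p1 = lca_free_parameters(1, categories)
df1 = cells - p1 - 1
ic1 = information_criteria(309.1, p1, 409)  # n = 409 is an assumption

p2 = lca_free_parameters(2, categories)
df2 = cells - p2 - 1

print(p1, df1, {k: round(v, 1) for k, v in ic1.items()})
print(p2, df2)
```

Under the stated assumption about n, the AIC, BIC, and CAIC reproduce the first row of Table 2, and the a-BIC agrees to within rounding of the reported G2.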
Multiple-groups LCA can be conducted in a straightforward manner, in much the same way that multiple-groups SEM is conducted. Both the latent class membership probabilities and item-response probabilities can be expressed as a function of a grouping variable, and measurement invariance across groups can be tested empirically. In addition, covariates can be included in the latent class model in order to examine the association between those variables and latent class membership (see [16, 23] for details on multiple-groups LCA and LCA with covariates).
LTA involves three sets of parameters. These include a vector of class membership probabilities at Time 1 (now denoted δ to reflect the dynamic nature of the latent class variable); matrices of transition probabilities, denoted τ, reflecting the incidence of transitions in drug use from Time 1 to Time 2, Time 2 to Time 3, and so on; and a matrix of item-response probabilities within each time, denoted ρ. Note that in LTA the latent classes often are referred to as latent statuses; although we do not adopt this convention we will distinguish dynamic latent variables from static ones in the notation by referring to them as s1 at Time 1, s2 at Time 2, and so on.
It is often useful to constrain each element of the matrix of ρ parameters at Time 1 to be equal to its corresponding element at subsequent times, which has the effect of imposing measurement invariance across time. Measurement invariance over time can be formally tested using a likelihood-ratio test (i.e., nested G2 test).
For T times, the LTA model can be expressed as follows. Using an indicator function I(yj,t = rj,t) that equals 1 when the response to variable j = rj,t at Time t, and equals 0 otherwise, the probability of observing a particular vector of responses is
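$$P(\mathbf{Y} = \mathbf{y}) = \sum_{s_1=1}^{S} \cdots \sum_{s_T=1}^{S} \delta_{s_1} \prod_{t=2}^{T} \tau_{s_t|s_{t-1}} \prod_{t=1}^{T} \prod_{j=1}^{J} \prod_{r_{j,t}=1}^{R_j} \rho_{t,j,r_{j,t}|s_t}^{I(y_{j,t} = r_{j,t})} \tag{2}$$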
where δs1 is the probability of membership in latent class s1 at Time 1; τs2|s1 is the probability of membership in latent class s2 at Time 2 conditional on membership in latent class s1 at Time 1; and ρt,j,rj,t|st is the probability of response rj,t to item j at Time t, conditional on membership in latent class st at Time t. As in LCA, in LTA the G2 fit statistic is provided, and the relative fit of models with different numbers of latent classes can be assessed using information criteria. It is also important to consider the solution stability and examine the interpretability of the competing solutions when conducting model selection in LTA.
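For two time points, the LTA model sums over every combination of Time 1 and Time 2 latent class. The minimal Python sketch below shows this for a deliberately small case with hypothetical parameter values (two latent classes, one binary item per time, measurement invariance across time); none of the numbers come from this study, and the function name is our own.

```python
from itertools import product

# Hypothetical parameters for illustration only (not estimates from this study).
delta = [0.7, 0.3]                  # P(class at Time 1)
tau = [[0.9, 0.1],                  # tau[a][b] = P(class b at Time 2 | class a at Time 1)
       [0.2, 0.8]]
rho = [[0.95, 0.05],                # rho[s][r - 1] = P(response r | class s),
       [0.25, 0.75]]                # held equal across time (measurement invariance)

def lta_pattern_probability(y1, y2):
    """P(Y = (y1, y2)) for one item observed at two times (1-based response codes)."""
    total = 0.0
    for s1, s2 in product(range(2), repeat=2):
        total += delta[s1] * tau[s1][s2] * rho[s1][y1 - 1] * rho[s2][y2 - 1]
    return total

total_probability = sum(
    lta_pattern_probability(y1, y2) for y1, y2 in product([1, 2], repeat=2)
)
print(round(total_probability, 10))  # probabilities over all patterns sum to 1
print(lta_pattern_probability(1, 1))
```

With more items, the single ρ factor at each time becomes a product over items, exactly as in the latent class model.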
In multiple-groups LTA, the latent class membership probabilities (δ's) and transition probabilities (τ's) can be expressed as a function of the grouping variable. This allows for a comparison between groups in the prevalence of the behavior latent classes and in the incidence of transitions in behavior over time. In addition, the item-response probabilities (ρ's) can be conditioned on the grouping variable so that measurement invariance across groups can be tested. The LTA model presented in Equation 2 can be extended to include a categorical grouping variable V as follows:
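$$P(\mathbf{Y} = \mathbf{y} \mid V = q) = \sum_{s_1=1}^{S} \cdots \sum_{s_T=1}^{S} \delta_{s_1|q} \prod_{t=2}^{T} \tau_{s_t|s_{t-1},q} \prod_{t=1}^{T} \prod_{j=1}^{J} \prod_{r_{j,t}=1}^{R_j} \rho_{t,j,r_{j,t}|s_t,q}^{I(y_{j,t} = r_{j,t})} \tag{3}$$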
where δs1|q is the probability of membership in latent class s1 at Time 1 conditional on membership in group q; τs2|s1,q is the probability of membership in latent class s2 at Time 2 conditional on membership in latent class s1 at Time 1 and membership in group q; and ρt,j,rj,t|st,q is the probability of response rj,t to item j at Time t conditional on membership in latent class st at Time t and membership in group q.
One or more covariates can be incorporated in LTA in order to test the association between covariates and Time 1 latent class membership, as well as between covariates and transitions over time. The covariates can be time-invariant, such as gender, or time-varying, such as employment status or exposure to drug use. To include a single, time-invariant covariate X to predict latent class membership and transitions over time, we extend the LTA model presented in Equation 2 as follows:
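$$P(\mathbf{Y} = \mathbf{y} \mid X = x) = \sum_{s_1=1}^{S} \cdots \sum_{s_T=1}^{S} \delta_{s_1}(x) \prod_{t=2}^{T} \tau_{s_t|s_{t-1}}(x) \prod_{t=1}^{T} \prod_{j=1}^{J} \prod_{r_{j,t}=1}^{R_j} \rho_{t,j,r_{j,t}|s_t}^{I(y_{j,t} = r_{j,t})} \tag{4}$$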
where δs1 (x) = P(L1 = s1|X = x) and τst|st−1 (x) = P(Lt = st|Lt−1 = st−1,X = x) are standard baseline-category multinomial logistic models (e.g., [1]). The ρ parameters are interpreted the same as in Equation 2.
With a single covariate X, δs1 (x) can be expressed as follows:
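$$\delta_{k}(x) = P(L_1 = k \mid X = x) = \frac{\exp(\beta_{0k} + \beta_{1k}x)}{1 + \sum_{m=1}^{S-1} \exp(\beta_{0m} + \beta_{1m}x)} \tag{5}$$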
for k = 1,…, S − 1 and reference latent class S.
For the first row of the transition probability matrix, τst|1t−1 (x) can be expressed as:
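$$\tau_{k|1}(x) = P(L_t = k \mid L_{t-1} = 1, X = x) = \frac{\exp(\beta_{0k|1} + \beta_{1k|1}x)}{1 + \sum_{m=1}^{S-1} \exp(\beta_{0m|1} + \beta_{1m|1}x)} \tag{6}$$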
for k = 1,…, S − 1, and reference latent class S. As with a standard baseline-category multinomial logistic model, the choice of reference class is arbitrary and can be specified by the user to provide the desired set of logistic regression coefficients, denoted β. The β parameter estimates can be exponentiated to obtain odds ratios so that the association between covariates and latent class membership (or transitions over time) can be more readily interpreted. The model can be readily extended to include two or more covariates, and to include covariates in a multiple-groups LTA. The reader is referred to [16, 22] for details on LTA, including multiple-groups LTA and LTA with covariates.
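The link between the β coefficients, the class membership probabilities, and the odds ratios can be illustrated with a short Python sketch of a baseline-category multinomial logistic model. The coefficient values are hypothetical, the last class is taken as the reference class, and the function name is our own.

```python
import math

def class_probabilities(x, betas):
    """Baseline-category multinomial logit with the last class as reference.
    betas holds one (intercept, slope) pair for each non-reference class."""
    expo = [math.exp(b0 + b1 * x) for b0, b1 in betas]
    denom = 1.0 + sum(expo)
    return [e / denom for e in expo] + [1.0 / denom]

# Hypothetical coefficients for a 3-class model with one binary covariate
betas = [(0.5, -1.2), (-0.3, 0.4)]

p0 = class_probabilities(0, betas)  # class probabilities when x = 0
p1 = class_probabilities(1, betas)  # class probabilities when x = 1

# Exponentiated slopes are odds ratios relative to the reference class
odds_ratios = [math.exp(b1) for _, b1 in betas]

# The same odds ratios recovered directly from the probabilities
recovered = [(p1[k] / p1[-1]) / (p0[k] / p0[-1]) for k in range(len(betas))]

print(p0, p1)
print(odds_ratios, recovered)
```

An odds ratio below 1 for a class means that a one-unit increase in the covariate lowers the odds of membership in that class relative to the reference class.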
Analytic Strategy
In the first analytic step in this study, which was to identify underlying subgroups of women characterized by common patterns of drug use behavior at one time point, we used LCA to model behavior at Time 1. Indicators of drug use latent classes included reported use of tobacco, alcohol, marijuana, CCH, and other drugs. The analysis was replicated using data from Time 2 in order to validate our model selection procedure. At each time point, we took a comprehensive approach to model selection among a set of models with different numbers of latent classes. We examined the G2 fit statistic, several information criteria (AIC, BIC, CAIC, and a-BIC), the entropy R2, the solution stability, and the interpretability of the competing solutions. When the interpretability of the competing solutions was examined, the concepts of latent class separation and homogeneity were considered; both high latent class separation and homogeneity were desired. High separation of the latent classes occurs when each latent class is characterized by its own distinct combination of substances. High homogeneity of the latent classes occurs when each latent class strongly corresponds to a specific set of responses to the five substance use items. The concepts of latent class separation and homogeneity are discussed in depth by Collins and Lanza [16].
The second analytic step in this study was to model transitions in drug use behavior over time. To do this, we extended the selected LCA model to two time points using LTA. Measurement invariance over time was formally tested using a likelihood-ratio test (i.e., nested G2 test). This test involved the comparison of the model with ρ parameters freely estimated at each time point to the model where each element of the matrix of ρ parameters at Time 1 was constrained to be equal to its corresponding element at Time 2.
In order to examine race differences in latent class prevalence and transitions over time, we fit a multiple-groups LTA. We first tested measurement invariance across groups using a likelihood-ratio test, and then compared the estimated drug use latent class prevalences and transition probabilities for African Americans, Whites and Others.
Finally, we examined the association between participation in an alcohol or drug treatment program and initial latent class membership, as well as its association with transitions over time in drug use behavior. An indicator of treatment was incorporated as a covariate in the LTA model. Participation in an alcohol or drug treatment program during the six-month period leading up to Time 1 was used to predict Time 1 latent class membership, and participation in such a program during the year between Time 1 and Time 2 was included as a predictor of transitions in drug use behavior.
All LCA and LTA models were fit using PROC LCA and PROC LTA Version 1.2.5 [24], SAS® procedures developed for SAS Version 9 for Windows. The software is available for download free of charge at http://methodology.psu.edu/. PROC LCA and PROC LTA syntax for the baseline LCA, baseline LTA, multiple-groups LTA, and LTA with a covariate is provided in the Appendix.
Results
The LCA Model
LCA was used to identify drug use latent classes based on reported use of tobacco, alcohol, marijuana, CCH, and other drugs. The main objectives of this stage of the analysis were 1) to discern whether meaningful patterns of drug use behavior could be identified from these data; and 2) if so, to determine how many patterns were required to represent the heterogeneity across individuals in responses to the five measured items, and how the resultant latent classes might be interpreted. At each time point, models with 1 through 5 latent classes were estimated and compared in order to determine the number of classes that optimally balanced model fit and parsimony. To address model identification in this study, we generated 1000 random sets of starting values for LCA models and assessed the percentage of solutions that converged to the maximum likelihood value. Although there is no specific recommendation for the minimal acceptable percent of times the maximum-likelihood solution is identified across different random sets of starting values (i.e., solution stability), a higher percent indicates greater confidence an analyst can have that the maximum-likelihood solution was in fact identified [16]. We chose to report solutions when this occurred for at least ten percent of the starting values, and deemed model identification inadequate otherwise. At both time points, identification of the 5-class model was inadequate.
Indicators of fit for models with 1 to 4 classes are summarized in the top two panels of Table 2. At both times, the different fit indices suggested that the 2- or 3-class model may be optimal. Because we had concern that statistical power in this study was limited, we chose to rely more heavily on the AIC as it is known to favor more complex models [27]. Therefore, we explored the solution to the 3-class model carefully. In addition, when extending an LCA model to repeated measures data using LTA, the additional information from other time points can increase the ability to detect smaller latent classes (i.e., statistical power can improve). In anticipation of this, we also chose to examine the 4-class model for interpretability.
Table 2.
| Classes | df | AIC | BIC | CAIC | a-BIC | Entropy R2 | G2 | Solution % |
|---|---|---|---|---|---|---|---|---|
| Time 1 LCA | | | | | | | | |
| 1 | 152 | 327.1 | 363.2 | 372.2 | 334.6 | 1.00 | 309.1 | 100.0 |
| 2 | 142 | 208.7 | 284.9 | 303.9 | 224.7 | 0.63 | 170.7 | 100.0 |
| 3 | 132 | 194.6 | 311.0 | 340.0 | 219.0 | 0.65 | 136.6 | 91.8 |
| 4 | 122 | 195.2 | 351.8 | 390.8 | 228.0 | 0.87 | 117.2 | 13.9 |
| Time 2 LCA | | | | | | | | |
| 1 | 152 | 310.8 | 346.7 | 355.7 | 318.1 | 1.00 | 292.8 | 100.0 |
| 2 | 142 | 177.2 | 252.9 | 271.9 | 192.6 | 0.65 | 139.2 | 100.0 |
| 3 | 132 | 170.4 | 285.9 | 314.9 | 193.9 | 0.73 | 112.4 | 63.1 |
| 4 | 122 | 174.2 | 329.5 | 368.5 | 205.8 | 0.76 | 96.2 | 14.7 |
| Time 1 to Time 2 LTA | | | | | | | | |
| 2 | 26222 | 1594.4 | 1681.0 | 1702.0 | 1614.3 | — | 1552.4 | 100.0 |
| 3 | 26208 | 1440.9 | 1585.2 | 1620.2 | 1474.1 | — | 1370.9 | 10.0 |
| 4 | 26192 | 1358.4 | 1568.8 | 1619.8 | 1406.9 | — | 1256.4 | 25.0 |
| 5 | 26174 | 1317.3 | 1601.9 | 1670.9 | 1382.9 | — | 1179.3 | 11.0 |
Note: Solution % is the percentage of times the solution was selected out of 1000 random sets of starting values for LCA models, and out of 100 random sets for LTA models. Dashes indicate that the criterion was not calculated for the model.
Parameter estimates for the 3- and 4-class models at Time 1 are shown in Table 3. (Estimates from Time 2 data were similar, although model identification was problematic in the 4-class model.) In the 3-class model, the first class (59.7%) was characterized by high (> .5) probabilities of reporting no use for each of the five drugs. The second class (22.7%) was characterized by high probabilities of reporting moderate tobacco and alcohol use, but no use of CCH or other drugs. The third class (17.6%) was characterized by high probabilities of reporting heavy (pack of cigarettes a day or more) smoking and using CCH once a week or more often, but no use of other drugs. These three classes might be labeled Non-users; Moderate Smokers and Drinkers; and CCH Users. In comparison, the solution for the 4-class model can be interpreted as including the following four groups of women: Non-users (44.4%); Moderate Smokers (29.3%); Moderate Drinkers/Marijuana Users (8.3%); and CCH Users (18.1%). In both models, the CCH Users are most likely to also be heavy smokers. The pattern of item-response probabilities in the 4-class model compared to the 3-class model showed more defined separation of latent classes; for example, there was a clearer distinction between groups of participants who reported smoking versus those who reported drinking and using marijuana. In addition, the 4-class model had stronger overall correspondence between the five items and the latent class variable, which indicated higher homogeneity. Based on the indicators of fit and interpretability of the solution, the 4-class model was tentatively selected as the optimal latent class solution.
Table 3.
| Item | Response | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|---|
| 3-class Model | | | | | |
| Latent Class Membership Probabilities | | 0.597 | 0.227 | 0.176 | |
| Item Response Probabilities | | | | | |
| Tobacco | No use | 0.514 | 0.366 | 0.067 | |
| | Less than a pack a day | 0.337 | 0.522 | 0.400 | |
| | Pack a day or more | 0.149 | 0.112 | 0.534 | |
| Alcohol | No use | 0.672 | 0.254 | 0.203 | |
| | Less than 3–4 days a week | 0.263 | 0.746 | 0.374 | |
| | 3–4 days a week or more | 0.065 | 0.000 | 0.423 | |
| Marijuana | No use | 0.973 | 0.431 | 0.427 | |
| | Less than once a week | 0.000 | 0.391 | 0.210 | |
| | Once a week or more | 0.027 | 0.178 | 0.364 | |
| Crack, Cocaine or Heroin | No use | 0.965 | 0.744 | 0.173 | |
| | Less than once a week | 0.015 | 0.189 | 0.280 | |
| | Once a week or more | 0.020 | 0.068 | 0.547 | |
| Other Drugs | No use | 0.992 | 0.856 | 0.893 | |
| | Some use | 0.008 | 0.144 | 0.107 | |
| 4-class Model | | | | | |
| Latent Class Membership Probabilities | | 0.444 | 0.293 | 0.083 | 0.181 |
| Item Response Probabilities | | | | | |
| Tobacco | No use | 0.767 | 0.000 | 0.543 | 0.093 |
| | Less than a pack a day | 0.000 | 1.000 | 0.325 | 0.386 |
| | Pack a day or more | 0.233 | 0.000 | 0.132 | 0.521 |
| Alcohol | No use | 0.678 | 0.536 | 0.000 | 0.202 |
| | Less than 3–4 days a week | 0.287 | 0.362 | 1.000 | 0.422 |
| | 3–4 days a week or more | 0.035 | 0.102 | 0.000 | 0.377 |
| Marijuana | No use | 0.964 | 0.856 | 0.000 | 0.415 |
| | Less than once a week | 0.000 | 0.102 | 0.681 | 0.221 |
| | Once a week or more | 0.036 | 0.042 | 0.319 | 0.364 |
| Crack, Cocaine or Heroin | No use | 0.944 | 0.917 | 0.779 | 0.130 |
| | Less than once a week | 0.029 | 0.066 | 0.221 | 0.280 |
| | Once a week or more | 0.027 | 0.017 | 0.000 | 0.590 |
| Other Drugs | No use | 0.977 | 0.970 | 0.811 | 0.881 |
| | Some use | 0.024 | 0.030 | 0.189 | 0.119 |
The LTA Model
The next stage of analysis involved extending the LCA model identified above to a longitudinal setting, where individual drug use behavior was assessed at two time points. LTA was used to estimate the incidence of transitions in drug use latent classes from Time 1 to Time 2. To validate the selection of the 4-class model, latent transition models with 2 to 5 latent classes were estimated and compared. Measurement invariance was imposed across time to ensure that the meaning of the latent classes was held constant.
Model selection information for the candidate LTA models is shown in the third panel of Table 2. The BIC indicated that the 4-class model was optimal, and the AIC suggested a 5-class model. The 5-class LTA model essentially divided the CCH Users latent class into two CCH Users classes that differed in their probability of regular marijuana use (.15 versus .69). Based on the limited information gained by moving to the less stable 5-class solution, we selected the 4-class LTA model for examining transitions over time in drug use.
Before reviewing the parameter estimates, we tested the assumption of measurement invariance across time using a likelihood-ratio test. The LTA with measurement invariance imposed across time (df =26,192, G2 = 1256.4) was compared to the same model without measurement invariance imposed (df =26,156, G2 = 1230.7). The difference in the G2 statistic for these models was not significant (df =36, G2 = 25.7, p = .90), suggesting that measurement invariance did indeed hold across time. Parameter estimates for this LTA model are shown in Table 4.
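The likelihood-ratio test above can be reproduced from the reported fit statistics. The Python sketch below is an illustration only: it uses the closed-form upper-tail probability of the chi-square distribution, which is valid only when the degrees of freedom are even (as they are here), and the function name is our own.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.
    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

# Fit statistics reported in the text for the two nested LTA models
g2_diff = 1256.4 - 1230.7   # constrained minus unconstrained G2
df_diff = 26192 - 26156     # difference in degrees of freedom

p_value = chi2_sf_even_df(g2_diff, df_diff)
print(round(g2_diff, 1), df_diff, round(p_value, 2))
```

The resulting p-value is close to the reported .90. The difference of 36 degrees of freedom equals the number of item-response probabilities freed at Time 2: 4 latent classes times 9 free ρ parameters per class.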
Table 4.
| | | Non-users | Smokers | Moderate Drinkers/Smokers | CCH Users |
|---|---|---|---|---|---|
| Latent Class Membership Probabilities | | | | | |
| Time 1 | | 0.309 | 0.277 | 0.196 | 0.218 |
| Time 2 | | 0.309 | 0.268 | 0.184 | 0.239 |
| Item Response Probabilities | | | | | |
| Tobacco | No use | 0.973 | 0.045 | 0.260 | 0.161 |
| | Less than a pack a day | 0.027 | 0.668 | 0.571 | 0.414 |
| | Pack a day or more | 0.000 | 0.287 | 0.168 | 0.426 |
| Alcohol | No use | 0.654 | 0.866 | 0.000 | 0.211 |
| | Less than 3–4 days a week | 0.323 | 0.116 | 0.863 | 0.454 |
| | 3–4 days a week or more | 0.024 | 0.018 | 0.137 | 0.335 |
| Marijuana | No use | 0.946 | 0.862 | 0.548 | 0.529 |
| | Less than once a week | 0.040 | 0.079 | 0.244 | 0.218 |
| | Once a week or more | 0.014 | 0.059 | 0.208 | 0.253 |
| Crack, Cocaine or Heroin | No use | 0.988 | 0.932 | 0.969 | 0.122 |
| | Less than once a week | 0.012 | 0.056 | 0.031 | 0.271 |
| | Once a week or more | 0.000 | 0.012 | 0.000 | 0.606 |
| Other Drugs | No use | 0.992 | 0.992 | 0.865 | 0.871 |
| | Some use | 0.009 | 0.008 | 0.135 | 0.129 |
| Transition Probabilities (rows: Time 1 latent class membership; columns: Time 2 latent class membership) | | | | | |
| Non-users | | 0.998 | 0.000 | 0.000 | 0.002 |
| Smokers | | 0.000 | 0.900 | 0.000 | 0.100 |
| Moderate Drinkers/Smokers | | 0.000 | 0.095 | 0.905 | 0.000 |
| CCH Users | | 0.000 | 0.000 | 0.030 | 0.970 |
Note: CCH is crack, cocaine or heroin.
The latent class membership probabilities were essentially identical at the two times. These numbers represent the marginal proportions of women in the drug use behavior classes at each time, but do not provide any insight as to how, if at all, women transition between drug use behavior classes over time. The four latent classes in the LTA model have a very similar interpretation to those identified in the 4-class LCA model. We labeled the classes in the LTA model Non-users (30.9% at each time), Smokers (27.7% at Time 1, 26.8% at Time 2), Moderate Drinkers/Smokers (19.6% at Time 1, 18.4% at Time 2), and CCH Users (21.8% at Time 1, 23.9% at Time 2).
The bottom panel of Table 4 shows the estimated transition probabilities in drug use class membership over time. Note that each row represents a set of probabilities that are conditioned on Time 1 class membership, and thus each row sums to 1 within rounding. An interesting finding is that regardless of Time 1 drug use class membership, no women transitioned into two of the other latent classes at Time 2. Rather, women were very likely to be in the same drug use class at Time 2, and had a small (but often nontrivial) probability of switching to one other drug use class over time. Specifically, nearly all Non-users at Time 1 were expected to be Non-users again at Time 2 (with probability .998), and these women had a .002 probability of transitioning to the CCH Users class. Women in the Smokers latent class at Time 1 had a probability of .900 of being in that class again at Time 2 and a probability of .100 of transitioning to the CCH Users class. Women in the Moderate Drinkers/Smokers class were likely to remain in that class at Time 2 (with probability .905), and had a probability of .095 of transitioning to the Smokers class. Finally, among women in the CCH Users class, the probability of remaining in that heavy-usage class was .970; these women had a probability of .030 of transitioning to the Moderate Drinkers/Smokers class.
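The relation between the Time 1 prevalences, the transition probabilities, and the Time 2 prevalences can be checked directly: each Time 2 prevalence is the sum, over Time 1 classes, of the Time 1 prevalence multiplied by the corresponding transition probability. The short Python sketch below uses the rounded estimates from Table 4, so agreement is expected only to within rounding; the variable names are our own.

```python
classes = ["Non-users", "Smokers", "Moderate Drinkers/Smokers", "CCH Users"]

# Rounded estimates taken from Table 4
delta1 = [0.309, 0.277, 0.196, 0.218]        # Time 1 prevalences
tau = [                                      # rows: Time 1 class; columns: Time 2 class
    [0.998, 0.000, 0.000, 0.002],
    [0.000, 0.900, 0.000, 0.100],
    [0.000, 0.095, 0.905, 0.000],
    [0.000, 0.000, 0.030, 0.970],
]
reported_delta2 = [0.309, 0.268, 0.184, 0.239]

# Time 2 prevalence of class b = sum over a of delta1[a] * tau[a][b]
delta2 = [sum(delta1[a] * tau[a][b] for a in range(4)) for b in range(4)]

# Expected proportion of women who remain in the same class at both times
stayers = sum(delta1[a] * tau[a][a] for a in range(4))

for name, implied, reported in zip(classes, delta2, reported_delta2):
    print(f"{name}: implied {implied:.3f}, reported {reported:.3f}")
print(f"expected proportion remaining in the same class: {stayers:.3f}")
```

The implied Time 2 prevalences agree with the reported ones to within rounding, and roughly 95% of women are expected to remain in the same class.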
In sum, the overall amount of change in drug use behavior between Time 1 and Time 2 was quite small. Not surprisingly, women at each end of the drug use spectrum (i.e., the Non-users and CCH Users) were most stable over time. Women in the Smokers class were at the highest risk of transitioning to the CCH Users class over time, suggesting that this group may be a useful target for programs aimed at preventing heavy drug use.
The Multiple-Groups LTA Model
Next, we investigated racial differences in initial drug use and transitions in use over time by incorporating race as a grouping variable. Group differences in the latent class membership probabilities (γ parameters) and transition probabilities (τ parameters) were of primary interest. However, before examining the solution, the assumption of measurement invariance across groups was tested using a likelihood-ratio test. The multiple-groups LTA with measurement invariance imposed across groups (df = 78650, G2 = 1618.4) was compared to the same model without measurement invariance imposed (df = 78578, G2 = 1499.8) to test the plausibility of equating measurement across groups. This test was statistically significant (df = 72, G2 = 118.6, p < .01), suggesting that measurement was at least somewhat different across groups. Because this is a powerful statistical test, however, with 72 df, we examined the measurement model within each group. Upon close inspection, we found that the pattern of item-response probabilities was not different across the groups. In other words, the solution based on item-response probabilities that were freely estimated within each group did not yield qualitatively different classes across races. We also examined the information criteria, which supported the simpler model with equal measurement across groups (e.g., AIC = 1780.4 and BIC = 2114.1 for the model with measurement constrained to be equal across groups; AIC = 1805.8 and BIC = 2436.2 for the model that allowed measurement to vary across groups). Based on this information, we chose to impose measurement invariance in order to increase parsimony and, importantly, ensure that the latent classes have the same meaning within each of the three groups. This facilitates group comparisons in the latent class membership probabilities and the transition probabilities.
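The likelihood-ratio comparison above can be reproduced from the reported fit statistics. The sketch below is a hedged Python illustration, not the authors' code. It uses the closed-form upper-tail probability of the chi-square distribution that holds for even degrees of freedom, and it assumes AIC is computed as G2 + 2p in order to back out the number of parameters implied by the reported AIC values. The function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = math.exp(-half)  # k = 0 term
    total = term
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! iteratively
        total += term
    return total


# Fit statistics reported in the text.
g2_invariant, df_invariant = 1618.4, 78650  # measurement equal across groups
g2_free, df_free = 1499.8, 78578            # measurement free across groups

g2_diff = g2_invariant - g2_free
df_diff = df_invariant - df_free
p_value = chi2_sf_even_df(g2_diff, df_diff)

# Assumption: AIC = G2 + 2p, so the reported AICs imply the parameter counts.
p_invariant = round((1780.4 - g2_invariant) / 2)
p_free = round((1805.8 - g2_free) / 2)

print(f"G2 difference = {g2_diff:.1f} on {df_diff} df, p = {p_value:.5f}")
print(f"implied parameter counts: {p_invariant} (invariant) vs {p_free} (free)")
```

Under that AIC assumption, the two models differ by 72 parameters, which matches the 72 df of the difference test.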
These two sets of parameters are presented in Table 5 for the African American, White, and Other race groups. The most striking finding was that African American women were approximately twice as likely to be in the CCH Users class as their White and Other race counterparts. Over 30 percent of African American women were expected to be in this heavy drug use class at both times, compared to between 15 and 19 percent of women in other racial groups. No strong differences across racial groups in their transition probabilities were detected, suggesting that any group differences were established by Time 1.
Table 5.

Latent Class Membership Probabilities

Time | Group | Non-users | Smokers | Moderate Drinkers/Smokers | CCH Users |
---|---|---|---|---|---|
Time 1 | Af. Am. | 0.290 | 0.260 | 0.142 | 0.308 |
Time 1 | White | 0.352 | 0.293 | 0.167 | 0.188 |
Time 1 | Other | 0.387 | 0.217 | 0.249 | 0.148 |
Time 2 | Af. Am. | 0.293 | 0.243 | 0.154 | 0.310 |
Time 2 | White | 0.335 | 0.292 | 0.185 | 0.188 |
Time 2 | Other | 0.407 | 0.214 | 0.212 | 0.166 |

Transition Probabilities (rows: Time 1 latent class membership; columns: Time 2 latent class membership)

Group | Time 1 Latent Class | Non-users | Smokers | Moderate Drinkers/Smokers | CCH Users |
---|---|---|---|---|---|
Af. Am. | Non-users | 0.979 | 0.000 | 0.000 | 0.022 |
Af. Am. | Smokers | 0.034 | 0.854 | 0.000 | 0.113 |
Af. Am. | Moderate Drinkers/Smokers | 0.000 | 0.000 | 1.000 | 0.000 |
Af. Am. | CCH Users | 0.000 | 0.071 | 0.039 | 0.890 |
White | Non-users | 0.945 | 0.055 | 0.000 | 0.000 |
White | Smokers | 0.007 | 0.769 | 0.082 | 0.141 |
White | Moderate Drinkers/Smokers | 0.000 | 0.183 | 0.817 | 0.000 |
White | CCH Users | 0.000 | 0.088 | 0.132 | 0.779 |
Other | Non-users | 1.000 | 0.000 | 0.000 | 0.000 |
Other | Smokers | 0.000 | 0.908 | 0.000 | 0.092 |
Other | Moderate Drinkers/Smokers | 0.000 | 0.071 | 0.854 | 0.075 |
Other | CCH Users | 0.137 | 0.000 | 0.000 | 0.863 |

Note: CCH is crack, cocaine or heroin and Af. Am. is African American.
The LTA Model with a Covariate
Finally, we incorporated women's reported participation in alcohol or drug treatment as a covariate to predict Time 1 drug use and transitions in use over time. Before attempting to predict transitions in drug use behavior, it was important to consider estimation issues that were likely to occur due to τ parameters being estimated on the boundary of the parameter space. Specifically, nine transition probabilities were estimated to be approximately equal to zero (i.e., ≤ .01). We modified the LTA model slightly to fix these nine parameters to be exactly zero (and thus this model estimated nine fewer parameters). This restricted model had 26,201 df with a G2 = 1256.5. A likelihood-ratio test was conducted to determine whether restricting the very small transitions to zero affected the overall model fit. The result was a difference G2 of 0.1 with 9 df (p > .99), providing support that these nine transition probabilities could be fixed to the value zero without impacting model fit. Fixing these values essentially eliminated nine parameters from the statistical model, thereby considerably reducing the complexity of the model involving a predictor of transitions in drug use behavior.
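The choice of which transition probabilities to fix can be illustrated by thresholding the estimates from the unrestricted model. The Python sketch below is an illustration only, using the rounded transition probabilities quoted earlier for Table 4. The names `tau`, `THRESHOLD`, and `mask` are hypothetical, and the coding (1 = freely estimated, 0 = fixed) is an assumption chosen to mirror the pattern of the TAU rows in the restrictions data set shown in the Appendix.

```python
# Rounded transition probabilities from the unrestricted LTA (Table 4).
# Rows = Time 1 class, columns = Time 2 class; names are illustrative.
tau = [
    [0.998, 0.000, 0.000, 0.002],  # Non-users
    [0.000, 0.900, 0.000, 0.100],  # Smokers
    [0.000, 0.095, 0.905, 0.000],  # Moderate Drinkers/Smokers
    [0.000, 0.000, 0.030, 0.970],  # CCH Users
]

THRESHOLD = 0.01  # estimates at or below this value are treated as zero

# Assumed coding: 1 = freely estimated, 0 = fixed (here, fixed to zero).
mask = [[0 if p <= THRESHOLD else 1 for p in row] for row in tau]
n_fixed = sum(row.count(0) for row in mask)

for row in mask:
    print(" ".join(str(flag) for flag in row))
print(f"transition probabilities fixed to zero: {n_fixed}")
```

Applying the .01 threshold yields nine fixed entries, and the resulting 0/1 pattern agrees with the TAU rows of the `LTArestrictcov` data set in Section V of the Appendix.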
As described above, alcohol or drug use treatment reported in the six-month period leading up to Time 1 was included as a predictor of initial drug use class membership. In addition, alcohol or drug use treatment reported in the one-year period between Time 1 and Time 2 was included as a predictor of transitions in drug use behavior during that time period.
Reported participation in an alcohol or drug treatment program in the six months preceding the Time 1 assessment was a significant predictor of latent class membership at Time 1 (p < .0001). Table 6 shows the logistic regression coefficients and odds ratios corresponding to the association between alcohol/drug treatment and Time 1 latent class membership. Non-users were specified as the reference group; therefore, the coefficients represent the change in log-odds of membership in each of the other three drug use latent classes compared to membership in the Non-users latent class corresponding to reported participation in alcohol/drug treatment. Compared to women who reported no alcohol/drug treatment, women who reported recent treatment were roughly 10 times more likely to be in the Smokers or CCH Users class relative to the Non-users class. Women who reported treatment were not at increased odds of membership in the Moderate Drinkers/Smokers class relative to the Non-users class (OR = 0.76).
Table 6.
Time 1 Latent Class | β | Odds Ratio |
---|---|---|
Non-users | — | 1.00 |
Smokers | 2.30 | 9.93 |
Moderate Drinkers/Smokers | −0.28 | 0.76 |
CCH Users | 2.32 | 10.18 |
Note: CCH is crack, cocaine or heroin. Dash indicates the reference latent class.
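The odds ratios in Table 6 are the exponentiated logistic regression coefficients. The short Python sketch below shows that relationship using the rounded β values from the table; the dictionary names are illustrative. Because the published coefficients are rounded to two decimals, the exponentiated values can differ slightly from the published odds ratios.

```python
import math

# Rounded coefficients from Table 6 (reference class: Non-users).
betas = {
    "Smokers": 2.30,
    "Moderate Drinkers/Smokers": -0.28,
    "CCH Users": 2.32,
}

# Odds ratios as published in Table 6, for comparison.
reported_or = {
    "Smokers": 9.93,
    "Moderate Drinkers/Smokers": 0.76,
    "CCH Users": 10.18,
}

# Odds ratio = exp(beta) for a binary covariate coded 0/1.
odds_ratios = {name: math.exp(beta) for name, beta in betas.items()}

for name, value in odds_ratios.items():
    print(f"{name}: exp(beta) = {value:.2f}, reported OR = {reported_or[name]:.2f}")
```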
Table 7 shows, conditional on Time 1 drug use class, the logistic regression coefficients for the effect of reported alcohol/drug treatment on the log-odds of transitioning to a particular latent class at Time 2 relative to remaining in the same latent class. The overall association between reported treatment and drug use transitions was statistically significant (df = 3, G2 = 599.5, p < .0001). Because essentially all women in the Non-users class at Time 1 were in that class again at Time 2, the effect of treatment on transitions was not estimated for Non-users. However, as Table 4 showed, 10.0% of women in the Smokers class were expected to transition to the CCH Users class, 9.5% of women in the Moderate Drinkers/Smokers class were expected to transition to the Smokers class, and 3.0% of women in the CCH Users class were expected to transition to the Moderate Drinkers/Smokers class. The test of association between reported treatment and drug use transitions is a joint test of the effect of treatment on these three transitions.
Table 7.

β (rows: Time 1 latent class membership; columns: Time 2 latent class membership)

Time 1 Latent Class | Non-users | Smokers | Moderate Drinkers/Smokers | CCH Users |
---|---|---|---|---|
Non-users | 1.00a | 1.00a | 1.00a | 1.00a |
Smokers | 0.00b | — | 0.00b | 0.81 |
Moderate Drinkers/Smokers | 0.00b | 1.76 | — | 0.00b |
CCH Users | 0.00b | 0.00b | −0.77 | — |

Odds Ratio (rows: Time 1 latent class membership; columns: Time 2 latent class membership)

Time 1 Latent Class | Non-users | Smokers | Moderate Drinkers/Smokers | CCH Users |
---|---|---|---|---|
Non-users | 1.00a | 1.00a | 1.00a | 1.00a |
Smokers | 1.00b | — | 1.00b | 2.25 |
Moderate Drinkers/Smokers | 1.00b | 5.84 | — | 1.00b |
CCH Users | 1.00b | 1.00b | 0.46 | — |

Note: CCH is crack, cocaine or heroin. Dashes indicate the reference latent class.
a The multinomial logistic regression was skipped for this row of the transition matrix.
b This transition probability was constrained to be equal to zero.
Among Time 1 Smokers, reported alcohol/drug treatment in the year between Time 1 and Time 2 corresponded to an odds ratio of 2.25 for transitioning to the CCH Users class. In other words, among Time 1 Smokers, women who reported treatment were 2.25 times more likely to transition to the CCH Users class relative to remaining in the Smokers class compared to women who reported no treatment. Among Time 1 Moderate Drinkers/Smokers, women who reported receiving alcohol/drug treatment in the following year were nearly six times more likely to be in the Smokers class, relative to the Moderate Drinkers/Smokers class, at Time 2 compared to women who reported no treatment. Finally, among Time 1 CCH Users, women who reported receiving treatment were half as likely to transition to the Moderate Drinkers/Smokers class relative to remaining in the CCH Users class compared to women who reported no treatment.
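Because these effects are odds ratios, they do not translate one-for-one into ratios of transition probabilities. The Python sketch below illustrates the conversion for the Smokers row, where only two outcomes are possible (remaining a Smoker or moving to CCH Users), so the multinomial model reduces to a binary one. The baseline probability of .10 for untreated women is purely hypothetical and chosen for illustration; it is not an estimate from this study. The function name `apply_odds_ratio` is also illustrative.

```python
def apply_odds_ratio(p_baseline, odds_ratio):
    """Convert a baseline probability to the probability implied by an odds ratio."""
    if not 0.0 < p_baseline < 1.0:
        raise ValueError("baseline probability must be strictly between 0 and 1")
    baseline_odds = p_baseline / (1.0 - p_baseline)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# HYPOTHETICAL baseline: probability that an untreated Time 1 Smoker
# transitions to CCH Users. Not an estimate reported in the study.
p_untreated = 0.10

# Odds ratio for the Smokers -> CCH Users transition (Table 7).
p_treated = apply_odds_ratio(p_untreated, 2.25)

print(f"untreated: {p_untreated:.2f}, treated: {p_treated:.2f}")
print(f"ratio of probabilities: {p_treated / p_untreated:.2f}")
```

Under this assumed baseline, the ratio of transition probabilities is smaller than the odds ratio of 2.25, which is worth keeping in mind when reading "times more likely" statements based on odds.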
Discussion
Modeling drug use behavior as a categorical latent variable provided a nuanced picture of this complex behavior and a more holistic view of drug use in this population. Nearly one-third of the women belonged to the Non-users latent class at each time, and drug use behavior was remarkably stable in this population. Women in the Smokers latent class looked quite similar to the Non-users on all aspects of behavior except for engaging in moderate tobacco use. However, these women were at risk for transitioning to the more problematic CCH Users class over time. Although the Moderate Drinkers/Smokers latent class was characterized by use of two substances (moderate use of tobacco and alcohol), these women were not at risk for transitioning to the CCH Users latent class. Rather, those who did transition to a different drug use latent class moved to the Smokers class; in other words, they reduced their moderate drinking. The CCH Users latent class comprised women who were likely to be using CCH at least weekly, in conjunction with moderate to heavy use of tobacco and alcohol. Unfortunately, these women were quite stable in behavior over time, with only three percent transitioning to a less problematic drug use latent class.
It can be useful to consider drug use latent classes that were not identified in this sample. For example, no class emerged that was characterized by high rates of marijuana use or other drug use. Interestingly, occasional use of CCH was not endorsed widely by women in any drug use class. Instead, three latent classes consisted of non-users of CCH, and one latent class of regular users (once a week or more) emerged. This fact, along with the stability over time among women in the CCH Users latent class, is indicative of the highly addictive nature of these drugs.
Not surprisingly, participation in an alcohol or drug treatment program was associated with drug use behavior at Time 1. Specifically, women who participated in treatment in the six months prior to the Time 1 assessment were about ten times more likely to be Smokers or CCH Users relative to Non-users compared to women who did not participate in treatment. Participation in treatment did not, however, increase odds of membership in the Moderate Drinkers/Smokers latent class relative to the Non-users latent class. Although causality cannot be inferred from this analysis, these findings suggest the possibility that CCH Users who pursued treatment were not successful, but that treatment among women in the Moderate Drinkers/Smokers latent class was more successful such that they quit drinking but continued to smoke. Such hypotheses require a longitudinal analysis to unpack these associations over time.
Indeed we found (see Table 7) that women in the Moderate Drinkers/Smokers latent class at Time 1 who pursued treatment were nearly six times more likely to transition to the Smokers class relative to remaining in the same class over time. In contrast, women in the CCH Users class who participated in treatment were most likely to continue to use CCH.
Challenges in LCA and LTA Modeling
As in all mixture models, model selection in LCA and LTA can pose a significant challenge to the analyst. One technical reason that model selection can be so difficult is the fact that little is known about statistical power in LCA and LTA, and the power to detect less common latent classes will change as repeated measures are included (i.e., moving from LCA to LTA). Another technical reason for this challenge is that it is increasingly difficult to identify the maximum likelihood solution as models become more complex (i.e., model identification becomes worse with the specification of additional latent classes). A more conceptual challenge can arise when conducting model selection in LTA, where an identical set of latent classes may not emerge at each time point. We chose to explore the latent class structure at each time point first using LCA, and then extend the model to longitudinal data using LTA. This allowed us to become familiar with the drug use patterns that emerged at each time point and ensure that these patterns were represented in the final LTA model. Another approach would be to first conduct model selection using LTA, and then fit LCA models at each time point to verify that the drug use patterns identified in LTA were representative of behavior at each time point. Both of these approaches achieve the objective of accommodating heterogeneity in behavior that exists at each time point.
One recent study [31] shows that under certain conditions both the BIC and a parametric bootstrap of the likelihood ratio test (BLRT) [28] perform well in terms of selection of the number of latent classes. Although the computationally intensive procedure for the BLRT is not available in an automated form in the public version of PROC LCA and PROC LTA at this time, we did conduct analyses to obtain BLRT p-values for the set of LCA models considered in this study. Results supported the selection of the 4-class model at both Time 1 and Time 2. While it is reassuring that the BLRT led us to the same conclusion as our model selection procedure, this finding suggests that the BIC was severely underpowered for the current study. Had we relied solely on the BIC for model selection, we would have selected an overly simplistic 2-class model.
Recoding variables to be used as indicators in LCA and LTA can at times be somewhat subjective or data-driven. Whenever possible, it can be helpful to ground such decisions in theory or to choose coding schemes that carefully match the scientific questions at hand. In this study we were interested in examining the association between participation in an alcohol or drug treatment program and drug use behavior. Therefore, we paid particular attention to the fact that participants were referred to an alcohol or drug abuse counselor if their reported use was at or above a certain level. For alcohol use the cutoff was “3–4 days a week or more” (response category 3) and for marijuana, CCH, and other drugs the cutoff was any reported use (response category 2). Based on this information, and the fact that we wanted to be as consistent across substances as possible, we opted to create three-level variables whenever possible (the usage rate of other drugs was too low to permit distinguishing between levels of use on this item).
As with any longitudinal analysis, in LTA the choice of timing of covariates has important implications for interpretation of the results. In this study, we included participation in an alcohol or drug treatment program in the past six months as a predictor of Time 1 drug use behavior, and participation between Time 1 and Time 2 as a predictor of transitions in drug use behavior. A logical alternative would have been to use reported participation in an alcohol or drug treatment program prior to Time 1 as a predictor of both initial drug use and transitions over time in this behavior. This would avoid the issue that we could not determine whether treatment during that year preceded or followed a change in drug use during that same year. However, by using participation in a treatment program between Time 1 and Time 2 as a predictor of the transitions, we were able to model the proximal association between participation and changes in drug use.
Strengths of the Approach and Future Directions
Broadly speaking, there are many other statistical models related to LCA and LTA that also fall under the definition of finite mixture models [17, 28, 35]. For example, one recent study extended the LTA model to examine the co-occurrence over time of discrete change in two processes [7]. In this study, transitions in adolescent sexual risk behavior latent classes and transitions in alcohol use behavior latent classes were jointly modeled. This technique, called associative latent transition analysis [18], can be thought of as a discrete-outcome analogue to the parallel latent growth model [38]. Another approach, which has become quite popular in recent years for identifying underlying population subgroups based on smooth trajectories of change over time, is a method for identifying latent classes defined by growth curves [29, 30]. Numerous studies have employed this methodology to examine trajectories of substance use (e.g., [10, 14, 15]).
Recent advances in statistical techniques for modeling the etiology of substance use have enabled scientists to address new questions in their research. LCA and LTA play an important role in this field. Because substance use scientists have become more familiar with the advantages of these techniques, and because user-friendly software such as PROC LCA and PROC LTA is now available, these methods are being adopted widely. LCA and LTA, along with other related statistical approaches, offer scientists a substantially different lens through which to view their data than do more traditional approaches such as multivariate regression. These different approaches can offer complementary insight into complex behaviors such as drug use, thereby allowing the field to continue to advance. Future research on LCA, LTA and related methods is called for to help scientists better understand issues such as statistical power, model selection, and the utility of mixture regression models. As this methodological work proceeds, mixture models will continue to move into the forefront of substance use research.
Acknowledgments
This research was supported by National Institute on Drug Abuse grant P50 DA 10075. Data in this manuscript were collected by the Women's Interagency HIV Study (WIHS) Collaborative Study Group with centers (Principal Investigators) at New York City/Bronx Consortium (Kathryn Anastos); Brooklyn, NY (Howard Minkoff); Washington, DC, Metropolitan Consortium (Mary Young); The Connie Wofsy Study Consortium of Northern California (Ruth Greenblatt); Los Angeles County/Southern California Consortium (Alexandra Levine); Chicago Consortium (Mardge Cohen); Data Coordinating Center (Stephen Gange). The WIHS is funded by the National Institute of Allergy and Infectious Diseases (U01 AI 35004, U01 AI 31834, U01 AI 34994, U01 AI 34989, U01 AI 34993, and U01 AI 42590) and by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (U01 HD 32632). The study is co-funded by the National Cancer Institute, the National Institute on Drug Abuse, and the National Institute on Deafness and Other Communication Disorders. Funding is also provided by the National Center for Research Resources (UCSF-CTSI Grant Number UL1 RR 024131). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.
Appendix: PROC LCA and PROC LTA Syntax
I. Syntax for the baseline LCA
PROC LCA DATA=wihsdata.sidata;
  TITLE1 'Baseline LCA - 4 classes - 5 indicators';
  TITLE2 'Time 1 (Visit 2)';
  NCLASS 4;
  ITEMS tob_2 alc_2 mar_2 cch_2 oth_2;
  CATEGORIES 3 3 3 3 2;
  CORES 2;
  SEED 3608;
  NSTARTS 1000;
run;
II. Syntax for the baseline LTA
PROC LTA DATA=wihsdata.sidata START=LTAst;
  TITLE1 'Baseline LTA - 4 classes - 5 indicators';
  TITLE2 'Time 1 to Time 2 (Visit 2 to Visit 4)';
  TITLE3 'Starting Values, Measurement Invariance';
  NSTATUS 4;
  NTIMES 2;
  ITEMS tob_2 alc_2 mar_2 cch_2 oth_2
        tob_4 alc_4 mar_4 cch_4 oth_4;
  CATEGORIES 3 3 3 3 2;
  CORES 2;
  MEASUREMENT times;
run;
III. Syntax for LTA with a grouping variable
PROC LTA DATA=wihsdata.sidata START=LTAstartgrp;
  TITLE1 'LTA w/ Race Grouping Variable - 4 classes - 5 indicators';
  TITLE2 'Time 1 to Time 2 (Visit 2 to Visit 4)';
  TITLE3 'Starting Values, Measurement Invariance';
  NSTATUS 4;
  NTIMES 2;
  ITEMS tob_2 alc_2 mar_2 cch_2 oth_2
        tob_4 alc_4 mar_4 cch_4 oth_4;
  CATEGORIES 3 3 3 3 2;
  GROUPS racetri;
  GROUPNAMES Black White Other;
  CORES 2;
  MEASUREMENT groups times;
run;
IV. Syntax for LTA with a covariate predicting Time 1 latent class membership and Time 1 to Time 2 transitions
PROC LTA DATA=wihsdata.sidata START=LTAstartcov RESTRICT=LTArestrictcov;
  TITLE1 'LTA w/ Alcohol/Drug Txt as a Covariate - 4 classes - 5 indicators';
  TITLE2 'Time 1 to Time 2 (Visit 2 to Visit 4)';
  TITLE3 'Starting Values, Restrictions, Time 1 & Transition Covariate';
  NSTATUS 4;
  NTIMES 2;
  ITEMS tob_2 alc_2 mar_2 cch_2 oth_2
        tob_4 alc_4 mar_4 cch_4 oth_4;
  CATEGORIES 3 3 3 3 2;
  COVARIATES1 treat_2;
  COVARIATES2 treat_4;
  REFERENCE1 1;
  REFERENCE2 0 2 3 4;
  CORES 2;
  BETA PRIOR = 1;
run;
V. Syntax for the starting values and restrictions used in the LTA with a covariate
data LTAstartcov;
  input PARAM $ GROUP VARIABLE $ TIME STATUS RESPCAT ESTLS1 ESTLS2 ESTLS3 ESTLS4;
  cards;
DELTA 1 . 1 . . .3 .3 .2 .2
TAU 1 . 1 1 . 1 0 0 0
TAU 1 . 1 2 . 0 .9 0 .1
TAU 1 . 1 3 . 0 .1 .9 0
TAU 1 . 1 4 . 0 0 .1 .9
RHO 1 tob_2 1 . 1 .8 .1 .3 .2
RHO 1 alc_2 1 . 1 .6 .8 .2 .2
RHO 1 mar_2 1 . 1 .8 .8 .6 .6
RHO 1 cch_2 1 . 1 .8 .8 .8 .1
RHO 1 oth_2 1 . 1 .9 .9 .8 .8
RHO 1 tob_2 1 . 2 .1 .6 .6 .4
RHO 1 alc_2 1 . 2 .3 .1 .7 .4
RHO 1 mar_2 1 . 2 .1 .1 .2 .2
RHO 1 cch_2 1 . 2 .1 .1 .1 .3
RHO 1 oth_2 1 . 2 .1 .1 .2 .2
RHO 1 tob_2 1 . 3 .1 .3 .1 .4
RHO 1 alc_2 1 . 3 .1 .1 .1 .4
RHO 1 mar_2 1 . 3 .1 .1 .2 .2
RHO 1 cch_2 1 . 3 .1 .1 .1 .6
RHO 1 tob_4 2 . 1 .8 .1 .3 .2
RHO 1 alc_4 2 . 1 .6 .8 .2 .2
RHO 1 mar_4 2 . 1 .8 .8 .6 .6
RHO 1 cch_4 2 . 1 .8 .8 .8 .1
RHO 1 oth_4 2 . 1 .9 .9 .8 .8
RHO 1 tob_4 2 . 2 .1 .6 .6 .4
RHO 1 alc_4 2 . 2 .3 .1 .7 .4
RHO 1 mar_4 2 . 2 .1 .1 .2 .2
RHO 1 cch_4 2 . 2 .1 .1 .1 .3
RHO 1 oth_4 2 . 2 .1 .1 .2 .2
RHO 1 tob_4 2 . 3 .1 .3 .1 .4
RHO 1 alc_4 2 . 3 .1 .1 .1 .4
RHO 1 mar_4 2 . 3 .1 .1 .2 .2
RHO 1 cch_4 2 . 3 .1 .1 .1 .6
;
run;

data LTArestrictcov;
  input PARAM $ GROUP VARIABLE $ TIME STATUS RESPCAT ESTLS1 ESTLS2 ESTLS3 ESTLS4;
  cards;
DELTA 1 . 1 . . 1 1 1 1
TAU 1 . 1 1 . 1 0 0 0
TAU 1 . 1 2 . 0 1 0 1
TAU 1 . 1 3 . 0 1 1 0
TAU 1 . 1 4 . 0 0 1 1
RHO 1 tob_2 1 . 1 2 16 30 44
RHO 1 alc_2 1 . 1 3 17 31 45
RHO 1 mar_2 1 . 1 4 18 32 46
RHO 1 cch_2 1 . 1 5 19 33 47
RHO 1 oth_2 1 . 1 6 20 34 48
RHO 1 tob_2 1 . 2 7 21 35 49
RHO 1 alc_2 1 . 2 8 22 36 50
RHO 1 mar_2 1 . 2 9 23 37 51
RHO 1 cch_2 1 . 2 10 24 38 52
RHO 1 oth_2 1 . 2 11 25 39 53
RHO 1 tob_2 1 . 3 12 26 40 54
RHO 1 alc_2 1 . 3 13 27 41 55
RHO 1 mar_2 1 . 3 14 28 42 56
RHO 1 cch_2 1 . 3 15 29 43 57
RHO 1 tob_2 2 . 1 2 16 30 44
RHO 1 alc_2 2 . 1 3 17 31 45
RHO 1 mar_2 2 . 1 4 18 32 46
RHO 1 cch_2 2 . 1 5 19 33 47
RHO 1 oth_2 2 . 1 6 20 34 48
RHO 1 tob_2 2 . 2 7 21 35 49
RHO 1 alc_2 2 . 2 8 22 36 50
RHO 1 mar_2 2 . 2 9 23 37 51
RHO 1 cch_2 2 . 2 10 24 38 52
RHO 1 oth_2 2 . 2 11 25 39 53
RHO 1 tob_2 2 . 3 12 26 40 54
RHO 1 alc_2 2 . 3 13 27 41 55
RHO 1 mar_2 2 . 3 14 28 42 56
RHO 1 cch_2 2 . 3 15 29 43 57
;
run;
Footnotes
Copyright 2002–2003 SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.
Contributor Information
Stephanie T. Lanza, The Methodology Center, The Pennsylvania State University
Bethany C. Bray, Department of Psychology, Virginia Polytechnic Institute and State University
References
- 1. Agresti A. An Introduction to Categorical Data Analysis. Hoboken, NJ: John Wiley & Sons, Inc.; 2007.
- 2. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;19:716–723.
- 3. Auerbach KJ, Collins LM. A multidimensional developmental model of alcohol use during emerging adulthood. Journal of Studies on Alcohol. 2006;67(6):917–925. doi: 10.15288/jsa.2006.67.917.
- 4. Barkan SE, Melnick SL, Preston-Martin S, Weber K, Kalish LA, Miotti P, Young M, Greenblatt R, Sacks H, Feldman J. The Women's Interagency HIV Study. Epidemiology. 1998;9(2):117–125.
- 5. Bollen KA, Curran PJ. Latent Curve Models: A Structural Equation Perspective. Hoboken, NJ: Wiley; 2006.
- 6. Bozdogan H. Model selection and Akaike information criterion (AIC): The general theory and its analytical extensions. Psychometrika. 1987;52:345–370.
- 7. Bray BC, Lanza ST, Collins LM. Modeling relations among discrete developmental processes: A general approach to associative latent transition analysis. Structural Equation Modeling. doi: 10.1080/10705511.2010.510043. In press.
- 8. Bureau of Labor Statistics, U.S. Department of Labor. National Longitudinal Survey of Youth 1997 cohort, 1997–2006 (rounds 1–9) [Data file]. Produced by the National Opinion Research Center, the University of Chicago and distributed by the Center for Human Resource Research. Columbus, OH: The Ohio State University; 2007.
- 9. Celeux G, Soromenho G. An entropy criterion for assessing the number of clusters in a mixture model. Journal of Classification. 1996;13:195–212.
- 10. Chassin L, Pitts SC, Prost J. Binge drinking trajectories from adolescence to emerging adulthood in a high-risk sample: Predictors and substance abuse outcomes. Journal of Consulting and Clinical Psychology. 2002;70:67–78.
- 11. Chung H, Flaherty BP, Schafer JL. Latent class logistic regression: Application to marijuana use and attitudes among high school seniors. Journal of the Royal Statistical Society, Series A. 2006;169(Part 4):723–743.
- 12. Chung H, Lanza ST, Loken E. Latent transition analysis: Inference and estimation. Statistics in Medicine. 2008;27(11):1834–1854. doi: 10.1002/sim.3130.
- 13. Chung H, Park Y, Lanza ST. Latent transition analysis with covariates: Pubertal timing and substance use behaviors in adolescent females. Statistics in Medicine. 2005;24:2895–2910. doi: 10.1002/sim.2148.
- 14. Chung T, Maisto SA, Cornelius JR, Marti CS. Adolescents' alcohol and drug use trajectories in the year following treatment. Journal of Studies on Alcohol. 2004;65:105–114. doi: 10.15288/jsa.2004.65.105.
- 15. Colder CR, Campbell RT, Ruel E, Richardson JL, Flay BR. A finite mixture model of growth trajectories of adolescent alcohol use: Predictors and consequences. Journal of Consulting and Clinical Psychology. 2002;70:976–985. doi: 10.1037//0022-006x.70.4.976.
- 16. Collins LM, Lanza ST. Latent class and latent transition analysis: With applications in the social, behavioral and health sciences. New York: Wiley; 2010.
- 17. Everitt BS, Hand DJ. Finite Mixture Distributions. London: Chapman and Hall; 1981.
- 18. Flaherty BP. Testing the degree of cross-sectional and longitudinal dependence between two discrete dynamic processes. Developmental Psychology. 2008;44(2):468–480. doi: 10.1037/0012-1649.44.2.468.
- 19. Goodman LA. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika. 1974;61:215–231.
- 20. Graham JW, Collins LM, Wugalter SE, Chung NK, Hansen WB. Modeling transitions in latent stage-sequential processes: A substance use prevention example. Journal of Consulting and Clinical Psychology. 1991;59:48–57. doi: 10.1037//0022-006x.59.1.48.
- 21. Lanza ST, Collins LM. Pubertal timing and the onset of substance use in females during early adolescence. Prevention Science. 2002;3(1):69–82. doi: 10.1023/a:1014675410947.
- 22. Lanza ST, Collins LM. A new SAS procedure for latent transition analysis: The development of dating and sexual risk behavior. Developmental Psychology. 2008;44(2):446–456. doi: 10.1037/0012-1649.44.2.446.
- 23. Lanza ST, Collins LM, Lemmon DR, Schafer JL. PROC LCA: A SAS procedure for latent class analysis. Structural Equation Modeling. 2007;14(4):671–694. doi: 10.1080/10705510701575602.
- 24. Lanza ST, Lemmon D, Dziak JJ, Huang L, Schafer JL, Collins LM. PROC LCA & PROC LTA User's Guide, Version 1.2.5. Penn State, University Park, PA: The Methodology Center; 2010.
- 25. Lanza ST, Savage JS, Birch LL. Identification and prediction of latent classes of weight-loss strategies among women. Obesity. 2010;18:833–840. doi: 10.1038/oby.2009.275.
- 26. Lazarsfeld PF, Henry NW. Latent Structure Analysis. Boston, MA: Houghton Mifflin; 1968.
- 27. Lin TH, Dayton CM. Model selection information criteria for non-nested latent class models. Journal of Educational and Behavioral Statistics. 1997;22:249–264.
- 28. McLachlan G, Peel D. Finite Mixture Models. New York, NY: John Wiley and Sons Inc.; 2000.
- 29. Muthén BO, Shedden K. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics. 1999;55:463–469. doi: 10.1111/j.0006-341x.1999.00463.x.
- 30. Nagin DS. Group-based Modeling of Development. Cambridge, MA: Harvard University Press; 2005.
- 31. Nylund KL, Asparouhov T, Muthén BO. Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling. 2007;14:535–569.
- 32. Reboussin BA, Song E-Y, Shrestha A, Lohman KK, Wolfson M. A latent class analysis of underage problem drinking: Evidence from a community sample of 16–20 year olds. Drug and Alcohol Dependence. 2006;83(3):199–209. doi: 10.1016/j.drugalcdep.2005.11.013.
- 33. Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;6:461–464.
- 34. Sclove SL. Application of model-selection criteria to some problems in multivariate analysis. Psychometrika. 1987;52:333–343.
- 35. Titterington D, Smith A, Makov U. Statistical Analysis of Finite Mixture Distributions. Chichester: Wiley; 1985.
- 36. Udry JR. The National Longitudinal Study of Adolescent Health (Add Health), Waves I & II, 1994–1996; Wave III, 2001–2002. Chapel Hill, NC: 2003.
- 37. Velicer WF, Martin RA, Collins LM. Latent transition analysis for longitudinal data. Addiction. 1996;91:S197–S209.
- 38. Wang J. Gauging growth trajectories of two related outcomes simultaneously: An application of parallel latent growth modeling. Advances and Applications in Statistical Sciences. 2009;1:1–23.