Author manuscript; available in PMC: 2016 Aug 1.
Published in final edited form as: Exp Clin Psychopharmacol. 2015 Aug;23(4):291–301. doi: 10.1037/pha0000029

Sex differences in the latent class structure of alcohol use disorder: Does (dis)aggregation of indicators matter?

Emilie M Shireman 1, Douglas Steinley 1, Kenneth Sher 1
PMCID: PMC4546808  NIHMSID: NIHMS715250  PMID: 26237327

Introduction

Among the research and practice communities there are disagreements as to how many diagnostic categories are needed to fully characterize the true nature of some psychological disorders, or whether a dimensional structure is a more valid representation of the data (Hasin et al, 2013). This is an important consideration in the determination of the public health burden of a disorder (Agrawal, Heath, & Lynskey, 2011). Alcohol Use Disorder (AUD), specifically, has moved from a three-categorical structure in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (no diagnosis, alcohol abuse, and alcohol dependence; DSM-IV) to a four-categorical, severity-graded structure in the DSM-Fifth Edition (no diagnosis, mild, moderate, and severe AUD; DSM-5, American Psychiatric Association, 2013).

Further complicating matters, population subgroups can have alternative presentations of AUD. Many researchers have noted sex differences in the presentation of AUD, in addition to differential social consequences, comorbidity, and biological course, resulting in difficulties in AUD assessment (Babor et al, 1992; Brady & Randall, 1999; Bucholz et al, 1996; Saha, Chou, & Grant, 2006; Winokur, Rimmer, & Reich, 1971; Schuckit et al, 1969). Nevertheless, researchers attempting to model the diagnostic structure of AUD frequently combine men and women and model diagnostic groups with latent class analysis (LCA). There have been several class structures represented in the literature (see Table 1). Some studies support the DSM-5 severity-graded class structure (Beseler et al, 2012; Chung et al, 2001; Sacco et al, 2009), others support a 4-6 class structure of groups which differ in the symptoms or criteria they tend to endorse (Kendler et al, 1998; Moss et al, 2007; Rist et al, 2009; Smith & Shevlin, 2008), while other research indicates that AUD lies on a latent dimension of severity (Heath et al, 1994). Very few studies include parallel analyses for men and women (Bucholz et al, 1996), and some examine only men (Heath et al, 1994; Kendler et al, 1998).

Table 1.

LCAs of AUD

| Study | Year | # Classes | Input Variables | Sample | Gender |
|---|---|---|---|---|---|
| Bucholz et al | 1996 | 4 | 27 symptoms of AUD | Relatives of alcoholics | Separated |
| Beseler et al | 2012 | 3 | AUD diagnostic criteria | Undergraduates | Combined |
| Chassin et al | 2004 | 3 | Alcohol and drug use | COAs¹, matched non-COAs | Combined |
| Chung et al | 2001 | 3 | AUD diagnostic criteria | Adolescent clinical | Combined |
| Heath et al | 1994 | 5 | Symptoms of alcoholism | Twins, Australia | Men Only |
| Kendler et al | 1998 | 5 | Reasons for temperance board registration, age at onset | Twins, clinical | Men Only |
| Moss et al | 2007 | 5 | AUD diagnostic criteria and lifetime psychiatric diagnoses | Clinical | Combined |
| Reboussin et al | 2006 | 2 | Drinking behaviors and alcohol-related problems | Underage, community | Combined |
| Rist et al | 2009 | 4 | Alcohol consumption | Clinical | Combined |
| Sacco et al | 2009 | 3 | Alcohol consumption and AUD diagnostic indicators | Older individuals, NESARC² | Combined |
| Smith et al | 2008 | 6 | Alcohol consumption and related problems | Population representative, UK | Combined |
| Whitesell et al | 2006 | 4 (lifetime), 3 (past year) | Alcohol and drug use | Native American and Population-Representative | Combined |

¹ Children of Alcoholics

² National Epidemiologic Survey of Alcohol and Related Conditions

The LCAs of AUD conducted to date have also varied in the sets of observed indicators they use. In some studies, symptom items, meant to describe the criteria in lay terms, are used as input variables (Bucholz et al, 1996; Heath et al, 1994); others use an aggregated distillation of the symptoms into each of the diagnostic criteria (Beseler et al, 2012; Chung et al, 2001; Moss et al, 2007; Sacco et al, 2009); still others have used more inclusive sets of indicators, including (noncriteria) drinking consequences and various measures of alcohol consumption (Chassin et al, 2004; Smith et al, 2008; Whitesell et al, 2006). Considering the sensitivity of LCA to the number and quality of indicators (Nylund, Asparouhov, & Muthen, 2007; Raftery & Dean, 2006; Swanson, Lindenberg, Bauer & Crosby, 2012), differences in data structure may partially explain the differences in the class structure of AUD reported in the literature. To clarify these issues, this paper examines (1) the latent class structure of AUD, (2) sex differences in this structure, and (3) whether aggregating data from item-level symptoms to diagnostic criteria impacts the analysis, and whether this impact differs between men and women.

Reduction of the symptom items into criteria has been largely regarded as an arbitrary decision. Although the third column in Table 1 demonstrates that the class structure of AUD has varied based on whether symptoms or criteria were used, aggregation cannot be determined to be the sole cause of the differential results, as these studies employed different samples (some combining men and women) and differed in their inclusion of covariates and other predictors. In fact, other analyses have shown concordant results between symptom and criteria data. Harford and Muthen (2000) examined the replication of a structural equation modeling technique with symptom items. However, a direct comparison of LCAs with symptom and criteria items has not been undertaken, to our knowledge. As discussed above, LCA is particularly susceptible to the number and quality of input indicators, so it is likely that class structure is significantly impacted by data aggregation. Because the effect of data aggregation cannot be separated from the effect of gender differences by examining the literature alone, this paper conducts parallel analyses that examine the effects of sample and data structure separately.

Latent Class Analysis

Latent class analysis (LCA) is a special case of the finite mixture model and a common tool for characterizing a latent construct measured by categorical indicators. LCA models a number of unobserved subpopulations to identify categories of a construct (also referred to as groups or classes), with the goal of fully explaining all the variance in the data. LCA operates by iteratively finding maximum likelihood parameter estimates of the population proportion and the “mean profile” for each cluster (i.e., the probability of endorsing each indicator). Classes in an LCA are assumed to be internally homogeneous, meaning every individual within a class has an identical set of item endorsement probabilities. Classes are also assumed to be locally independent, meaning the indicators do not covary conditional on class membership (for more detail on the computation of LCAs, see Bartholomew, Knott, & Moustaki, 2011). Researchers have examined plots of the mean profiles to determine whether a construct exhibits a severity-distinguished structure, combined with incrementally better-fitting solutions as the number of classes increases. Such behavior has been found in LCAs of AUD, and some have thus argued for a dimensional conceptualization of the disorder (Morey, Skinner & Blashfeld, 1984; Bucholz et al, 1996). The following analyses will help determine whether these conclusions were partly based on the sex differences and data aggregation described in the following section.
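The estimation procedure described above can be illustrated with a minimal expectation-maximization sketch for binary indicators. This is not the Mplus routine used in the paper: the function name `fit_lca`, the starting values, the fixed iteration count, and the absence of sampling weights and multiple random starts are all simplifying assumptions made for illustration.

```python
import numpy as np

def _log_joint(X, pi, theta):
    # log P(x_i, class k) under local independence:
    # log pi_k + sum_j [x_ij log theta_kj + (1 - x_ij) log(1 - theta_kj)]
    return np.log(pi) + X @ np.log(theta).T + (1.0 - X) @ np.log(1.0 - theta).T

def fit_lca(X, n_classes, n_iter=200, seed=0):
    """Unweighted EM for a binary latent class model (illustrative sketch).

    X is an (n_persons, n_items) array of 0/1 indicators.
    Returns class proportions, the mean profile of each class
    (item endorsement probabilities), posterior memberships,
    and the final log-likelihood.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_items = X.shape[1]
    pi = np.full(n_classes, 1.0 / n_classes)               # class proportions
    theta = rng.uniform(0.2, 0.8, size=(n_classes, n_items))  # mean profiles
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each person
        lj = _log_joint(X, pi, theta)
        post = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: update proportions and endorsement probabilities
        pi = np.clip(post.mean(axis=0), 1e-12, None)
        pi = pi / pi.sum()
        theta = (post.T @ X) / (post.sum(axis=0)[:, None] + 1e-12)
        theta = np.clip(theta, 1e-6, 1.0 - 1e-6)
    # Final E-step so that the posteriors match the returned parameters
    lj = _log_joint(X, pi, theta)
    log_norm = np.logaddexp.reduce(lj, axis=1)
    post = np.exp(lj - log_norm[:, None])
    return pi, theta, post, float(log_norm.sum())
```

Plotting the rows of the returned `theta` against the items gives the mean profile plot for each class; in practice the fit would be repeated from many random starts, as EM can stop at local maxima.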

Data Aggregation

LCA has been shown to provide more accurate results when more indicators are included in the analysis (Albert & Dodd, 2004; Brusco, 2004; Brusco & Cradit, 2001; Milligan, 1989; Yang, 2006). Increasing the number of indicators improves the results because the multivariate distribution of the cluster structure becomes clearer, making the clusters more separated. Aggregating symptom data to form diagnostic criteria removes a large number of potentially informative indicators of the latent structure of the construct (e.g., AUD). Thus, a person-centered technique like LCA may be best served by a larger number of indicators, without data aggregation.

However, additional indicators will diminish the performance of the LCA when those indicators are non-informative with regard to the individuals’ group membership (Brusco, 2004; Raftery & Dean, 2006; Swanson, Lindenberg, Bauer, & Crosby, 2012). These “masking” variables (Fowlkes & Mallows, 1986) can obscure class structure that would otherwise be found, by introducing noise into the multivariate distribution of the data.

Using LCA as an exploratory technique (i.e., when the number of clusters and their structure is unknown) means that it is impossible to definitively determine whether the inclusion of variables is helping or hurting the accuracy of the solution. We can, however, examine the results between analyses using more or fewer indicators (disaggregated versus aggregated data) for differences, to decide what substantive information is lost or gained between the two approaches. If the aggregation from symptoms to criteria is benign, the same structure should result in both.

Method

Analyses used the National Epidemiologic Survey of Alcohol and Related Conditions (NESARC; Grant, Dawson, Stinson, Chou, Dufour, & Pickering, 2004). NESARC was collected by the US Bureau of the Census and the National Institute on Alcohol Abuse and Alcoholism, and assessed a very large, nationally representative sample via in-person interviews, the first wave being collected in 2001-2002, the second wave in 2004-2005, with a cross-sectional sample collected in 2012-2013. NESARC is racially diverse, with an oversampling of Blacks and Hispanics (25% Hispanic, 19% Black), as well as geographically diverse (using all US census regions; for more detail on sampling and interview procedures see http://niaaa.census.gov/). Analyses used Wave 2 data and excluded those who reported abstaining from alcohol use in the past year (28.61% of men, 41.35% of women), resulting in a final sample size of N = 22177, 10395 men and 11782 women. Observations were weighted in analyses to obtain estimates representative of the U.S. population.

Measures

DSM-5 (American Psychiatric Association, 2013) criteria items are assessed with the Alcohol Use Disorders and Associated Disabilities Interview Schedule (AUDADIS-IV; Grant et al, 2001). What will be referred to as the “disaggregated” data is the set of 34 items that are used in the AUDADIS algorithm to determine whether the respondent exhibits a particular DSM-5 AUD criterion (this excludes any items related to the “legal consequences” DSM-IV criterion or to any other construct unrelated to DSM-5 AUD). A description of these items, the criteria they are used to measure, and endorsement rates for the analysis sample are given in Tables 2 and 3. What will be referred to as the “aggregated” data is the reduced set of 11 diagnostic criteria of AUD, each created from an individual’s endorsement of an adequate number of criterion-related symptoms. Per the AUDADIS-IV algorithm, all criteria except withdrawal are coded as positive if the individual endorses at least one of the criterion-related symptoms. The withdrawal criterion is coded positive if the individual exhibits all of the first eight withdrawal-related symptoms shown in Table 2, or at least one of the last two. If these conditions are not satisfied, the individual does not meet the criterion. The diagnostic criteria, their descriptions, and the endorsement rates for the samples considered in the following analyses are given in Table 4.
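The aggregation rule just described can be written out as a short sketch. The item names are taken from Tables 2 and 3 and the criterion names from Table 4; the pairing of the table group labels with the criterion abbreviations (for example, “Quit” with CUTD and the FAMILY and SCHOOL items with ROLE) is our inference, and the function `aggregate` is a hypothetical helper, not the AUDADIS-IV scoring code.

```python
# Criteria scored positive when at least one related symptom is endorsed.
# The group-to-criterion pairing below is inferred from Tables 2-4 (assumption).
ANY_ONE_ITEM = {
    "TOL": ["LESSEFFECT", "MUCHMORE", "FIFTH", "INCREASE"],
    "CUTD": ["STOP", "TRY"],
    "LARGERLONGER": ["LARGER", "LONGER"],
    "TIME": ["TIMEDRINK", "TIMESICK"],
    "GIVEUP": ["IMPORTANT", "PLEASURE"],
    "CONTINUE": ["DEPRESSED", "HEALTH", "BLACKOUT"],
    "ROLE": ["FAMILY", "SCHOOL"],
    "HAZARD": ["DRIVEWHILE", "DRIVEAFTER", "HURT"],
    "SOCIAL": ["FAMILYFRIENDS", "FIGHT"],
    "CRAVING": ["BADLY", "DESIRE"],
}
WITHDRAWAL_FIRST_EIGHT = ["ASLEEP", "SHAKING", "ANXIOUS", "SICK",
                          "RESTLESS", "SWEAT", "HALLUCINATE", "SEIZURES"]
WITHDRAWAL_LAST_TWO = ["DRUGOVER", "DRUGKEEP"]

ALL_ITEMS = ([item for items in ANY_ONE_ITEM.values() for item in items]
             + WITHDRAWAL_FIRST_EIGHT + WITHDRAWAL_LAST_TWO)

def aggregate(symptoms):
    """Collapse a dict of the 34 binary symptom items into the 11 binary criteria."""
    criteria = {name: int(any(symptoms[item] for item in items))
                for name, items in ANY_ONE_ITEM.items()}
    # Withdrawal rule as stated in the text: all of the first eight
    # withdrawal symptoms, or at least one of the last two.
    all_first_eight = all(symptoms[item] for item in WITHDRAWAL_FIRST_EIGHT)
    any_last_two = any(symptoms[item] for item in WITHDRAWAL_LAST_TWO)
    criteria["WITHD"] = int(all_first_eight or any_last_two)
    return criteria

# Example: a respondent who endorses only one tolerance item
respondent = dict.fromkeys(ALL_ITEMS, 0)
respondent["LESSEFFECT"] = 1
print(aggregate(respondent)["TOL"])
```

Under this rule a respondent endorsing one tolerance item and a respondent endorsing all four receive the same TOL score, which is the kind of information loss the comparison of aggregated and disaggregated data is meant to expose.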

Table 2.

DSM-5 Symptom Items

| Criterion | Abbreviation | Description | Men Endorsed (proportion) | Women Endorsed (proportion) |
|---|---|---|---|---|
| Tolerance | LESSEFFECT | Find that your usual number of drinks had much less effect on you than it once did | 0.069 | 0.049 |
| Tolerance | MUCHMORE | Find that you had to drink much more than you once did to get the effect you wanted | 0.038 | 0.023 |
| Tolerance | FIFTH | Drink as much as a fifth of liquor in one day | 0.045 | 0.007 |
| Tolerance | INCREASE | Increase your drinking because the amount you used to drink didn’t give you the same effect anymore | 0.023 | 0.014 |
| Quit | STOP | More than once want to stop or cut down on your drinking | 0.160 | 0.091 |
| Quit | TRY | More than once try to stop or cut down on your drinking but found you couldn’t do it | 0.038 | 0.017 |
| Larger Longer | LARGER | Have a period when you ended up drinking more than you meant to | 0.149 | 0.092 |
| Larger Longer | LONGER | Have a period when you kept on drinking for longer than you had intended to | 0.121 | 0.062 |
| Withdrawal | ASLEEP | Have trouble falling asleep or staying asleep (when the effects of alcohol were wearing off) | 0.056 | 0.041 |
| Withdrawal | SHAKING | Find yourself shaking (when the effects of alcohol were wearing off) | 0.020 | 0.010 |
| Withdrawal | ANXIOUS | Feel anxious or nervous (when the effects of alcohol were wearing off) | 0.033 | 0.015 |
| Withdrawal | SICK | Feel sick to your stomach or vomit (when the effects of alcohol were wearing off) | 0.085 | 0.072 |
| Withdrawal | RESTLESS | Feel more restless than is usual for you (when the effects of alcohol were wearing off) | 0.049 | 0.031 |
| Withdrawal | SWEAT | Find yourself sweating or your heart beating fast (when the effects of alcohol were wearing off) | 0.044 | 0.024 |
| Withdrawal | HALLUCINATE | See, feel, or hear things that weren’t really there (when the effects of alcohol were wearing off) | 0.007 | 0.003 |
| Withdrawal | SEIZURES | Have fits or seizures (when the effects of alcohol were wearing off) | 0.001 | 0.001 |
| Withdrawal | DRUGOVER | Take a drink or use any drug or medicine, other than aspirin, Advil, or Tylenol to get over any of the bad effects of drinking | 0.034 | 0.020 |
| Withdrawal | DRUGKEEP | Take a drink or use any drug or medicine, other than aspirin, Advil, or Tylenol to keep from having any of these bad aftereffects of drinking | 0.023 | 0.014 |
| Time spent | TIMEDRINK | Have a period when you spent a lot of time drinking | 0.036 | 0.015 |
| Time spent | TIMESICK | Have a period when you spent a lot of time being sick or getting over the bad aftereffects of drinking | 0.012 | 0.007 |
| Give up | IMPORTANT | Give up or cut down on activities that were important to you in order to drink, like work, school, or associating with friends or relatives | 0.011 | 0.004 |
| Give up | PLEASURE | Give up or cut down on activities that you were interested in or that gave you pleasure in order to drink | 0.011 | 0.004 |
| Continued | DEPRESSED | Continue to drink even though you knew it was making you feel depressed, uninterested in things, or suspicious or distrustful of other people | 0.027 | 0.013 |
| Continued | HEALTH | Continue to drink even though you knew it was causing you a health problem or making a health problem worse | 0.047 | 0.020 |
| Continued | BLACKOUT | Continue to drink even though you had experienced a prior blackout, that is, awakened the next day not being able to remember some of the things you did while drinking or after drinking | 0.032 | 0.014 |
| Social problems | FAMILY | Have a period when your drinking or being sick from drinking often interfered with taking care of your home or family | 0.011 | 0.006 |
| Social problems | SCHOOL | Have job or school troubles because of your drinking or being sick from drinking, like missing too much work, not doing your work well, being demoted or losing a job, or being suspended, expelled, or dropping out of school | 0.007 | 0.002 |

Table 3.

DSM-5 Symptom Items, Continued

| Criterion | Abbreviation | Description | Men Endorsed (proportion) | Women Endorsed (proportion) |
|---|---|---|---|---|
| Hazard | DRIVEWHILE | More than once drive a car or other vehicle while you were drinking | 0.107 | 0.040 |
| Hazard | DRIVEAFTER | More than once drive a car, motorcycle, truck, boat, or other vehicle after having too much to drink | 0.036 | 0.011 |
| Hazard | HURT | Get into situations while drinking or after drinking that increased your chances of getting hurt, like swimming, using machinery, or walking in a dangerous area or around heavy traffic | 0.026 | 0.009 |
| Social | FAMILYFRIENDS | Continue to drink even though you knew it was causing you trouble with your family or friends | 0.017 | 0.005 |
| Social | FIGHT | Get into physical fights while drinking or right after drinking | 0.012 | 0.003 |
| Craving | BADLY | Want to drink so badly couldn’t think of anything else | 0.012 | 0.005 |
| Craving | DESIRE | Feel very strong desire to drink | 0.054 | 0.028 |

Table 4.

DSM-5 AUD Criteria

| Abbreviation | Description | Men Endorsed (proportion) | Women Endorsed (proportion) |
|---|---|---|---|
| TOL | Tolerance of alcohol | 0.111 | 0.060 |
| WITHD | Withdrawal after using alcohol | 0.093 | 0.065 |
| LARGERLONGER | Substance used more than intended | 0.176 | 0.105 |
| CUTD | Desire or efforts to cut down on use | 0.165 | 0.093 |
| TIME | Spent all day using substance or recovering from effects | 0.040 | 0.018 |
| GIVEUP | Important activities given up because of use | 0.015 | 0.005 |
| CONTINUE | Continued use despite health problems | 0.072 | 0.033 |
| ROLE | Failure to fulfill role obligations | 0.015 | 0.007 |
| HAZARD | Use in physically hazardous situations | 0.134 | 0.051 |
| SOCIAL | Use despite social/interpersonal problems | 0.025 | 0.008 |
| CRAVING | Craving or a strong desire or urge to use alcohol | 0.056 | 0.029 |

Analysis

The following series of LCAs was conducted in Mplus Version 7 (Muthen & Muthen, 2012). Each LCA was initialized 50 times (with 50 iterations for the initial stage of estimation). LCAs were fit with 1-10 classes and the solutions were evaluated for (1) model fit, (2) substantive quality, and (3) classification agreement. The Bayesian Information Criterion (BIC; Schwarz, 1978) was used as a measure of model fit; it selects a model to maximize the probability of observing the data given the model, while controlling for capitalization on chance by penalizing the likelihood function according to the number of parameters in the LCA. The BIC has been shown to be adequate for selecting the correct number of clusters in an LCA (Lin & Dayton, 1997; Nylund, Asparouhov, & Muthen, 2007). In addition to the BIC, plots of the mean profiles were created to consider the substantive quality of the solution. The structure of these profiles can indicate whether the classes are distinguished by increasing probabilities of item endorsement, meaning that the classes differ only in their severity of AUD (Ferguson, 1983). To determine whether differential fit led to a highly differentiated solution, we used the Adjusted Rand Index (ARI; Hubert & Arabie, 1985) to assess classification agreement between alternative models and data aggregation levels. The ARI is a measure of cluster solution agreement that can be used to compare cluster solutions with different numbers of clusters by counting the pairs of individuals clustered together in both solutions, the pairs apart in both solutions, and the pairs that are discordantly clustered (in different clusters in one solution but the same cluster in the other, or vice versa). There are established conventions for interpreting the magnitude of the ARI: > .90 indicates “excellent” agreement, > .80 “good”, > .65 “moderate”, and < .65 “poor” agreement between the two solutions (Steinley, 2004).
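For readers unfamiliar with the ARI, the pair-counting computation can be sketched as follows, using the standard contingency-table form of the Hubert and Arabie (1985) index. The function names are hypothetical, and treating a value that falls exactly on a threshold as belonging to the lower category is our assumption, since the conventions above do not address that case.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two classifications of the same individuals."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both solutions must classify the same individuals")
    n = len(labels_a)
    if n < 2:
        return 1.0
    cells = Counter(zip(labels_a, labels_b))  # cross-classification counts
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    # Pairs placed together in both solutions
    together_both = sum(comb(c, 2) for c in cells.values())
    # Pairs placed together within each solution separately
    pairs_a = sum(comb(c, 2) for c in rows.values())
    pairs_b = sum(comb(c, 2) for c in cols.values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = 0.5 * (pairs_a + pairs_b)
    if maximum == expected:                    # degenerate (trivial) partitions
        return 1.0
    return (together_both - expected) / (maximum - expected)

def agreement_label(ari):
    """Conventions cited in the text (Steinley, 2004)."""
    if ari > 0.90:
        return "excellent"
    if ari > 0.80:
        return "good"
    if ari > 0.65:
        return "moderate"
    return "poor"

# The two solutions may use different numbers of classes
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

Because the index depends only on which pairs of individuals are grouped together, it is unaffected by how the classes happen to be numbered in either solution.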

Results and Discussion

Table 5 shows the BICs by aggregation level and sex inclusion. Perhaps most striking about the results is that many more classes are found for the data containing disaggregated symptoms, and the number of classes found is not consistent across sex inclusion (9 classes for men, 6 classes for women, and 9 classes for the sexes combined). When the data are aggregated into diagnostic criteria, the class structure simplifies into consistent, severity-graded classes (4 classes for men, for women, and for men and women combined). The 4-class structure in the aggregated criteria largely resembles the diagnostic structure in the DSM-5 (i.e., no diagnosis, mild, moderate, and severe).

An examination of the mean profile plots in Figures 1 and 2 highlights the nature of the differences between LCAs based on aggregated data (i.e., criteria) and disaggregated data (i.e., specific symptoms). Most importantly, the classes in the disaggregated data are not distinguished only by increasing probabilities of item endorsement. The “crossover” of classes is not seen in the mean profile plots of the aggregated data, whereas there are several instances of it in the disaggregated symptoms (e.g., LESSEFFECT, MUCHMORE, among others). When the data are aggregated, individuals in the “high tolerance” classes (Class 7 for men and Class 6 for women, distinguished by a high probability of endorsing one or more tolerance items but few other symptoms) are classified differentially. Among women in the high tolerance class, 71% are categorized into Class 4 when the data are aggregated, which is the lowest class (in the parlance of the DSM-5, the class with no diagnosis), with the next largest proportion (26%) allocated to Class 2, the moderately high class. Among men, most in the “high tolerance” class are likewise classified into the low group (61%), with the next highest proportion (25%) categorized into Class 1, the “high” class. Note, however, that these alternatively classified individuals represent a small proportion of the entire sample, and the classification agreement between the aggregated and disaggregated data is still in the adequate range (ARI = .78 for men, .80 for women, and .81 for men and women combined).

Figure 1. Mean Profile Plots for Disaggregated Data

Figure 2. Mean Profile Plots for Aggregated Data

In general, the larger number of classes derived from the disaggregated data serves to resolve intermediate levels of severity not captured in the solutions based on the aggregated data. In addition, in the solutions based on the disaggregated data, there are instances where some symptoms of a criterion appear to diverge in endorsement likelihood from other symptoms of that criterion. Whether such differences in overall severity or in symptom configuration provide a foundation for improving clinical diagnosis is an empirical issue that rests upon further construct validation. In the absence of such validation, the clinical utility of the more differentiated solutions must be viewed cautiously, owing both to the difficulty of working with so many diagnostic subtypes in clinical settings and to the very low base rates of some classes. Nevertheless, it could be that these more refined classes do reveal important etiological or clinical heterogeneity and, consequently, warrant further investigation.

The results also show that combining men and women has a differential effect depending on whether the data are aggregated. When the data are disaggregated, combining genders led to two classes (instead of one) that exhibit low probabilities on most symptoms with the exception of tolerance items (Classes 4 and 8). However, women largely maintain their classifications, being classified into only 6 of the 9 classes (the remaining 3 classes are composed entirely of men). When the data are aggregated, the classes look almost identical across the solutions for men, women, and the sexes combined. Examining the classifications between these solutions, however, shows that even though the mean profiles look nearly identical, women are pushed into lower severity classes when men and women are combined (10% of women are alternatively classified when men are included in the sample).

These results suggest that aggregating symptoms into criteria removes a group whose only AUD symptoms are tolerance-related. These “high tolerance” individuals have heterogeneous classifications when the data are aggregated. Some are classified into a class receiving a DSM-5 AUD diagnosis (39% of high tolerance men and 29% of high tolerance women are in a class that would receive a diagnosis). When men and women are combined into a single dataset, men maintain their class assignment, but women are typically alternatively classified, some remaining in the highest severity class and some moving into a lower severity class.

Post Hoc Analysis: Split-Half Replication

The cluster solutions were examined with split-half replication. That is, the data sets were each split in half randomly, and the same procedures employed above were carried out with both random halves. The BICs for the disaggregated symptoms are presented in Table 5 and the aggregated criteria in Table 6.

Table 5.

BICs by Sex and Data Aggregation Level

| Classes | DisagB | DisagM | DisagW | AgB | AgM | AgW |
|---|---|---|---|---|---|---|
| 1 | 195078 | 113802 | 78606 | 104091 | 59898 | 42518 |
| 2 | 151201 | 89314 | 61094 | 83911 | 48687 | 34566 |
| 3 | 143504 | 84664 | 58489 | 81604 | 47340 | 33788 |
| 4 | 142016 | 83854 | 58048 | 81341 | 47198 | 33775 |
| 5 | 140921 | 83360 | 57758 | 81342 | 47239 | 33809 |
| 6 | 140243 | 83166 | 57713 | 81389 | 47314 | 33878 |
| 7 | 140149 | 83106 | 57771 | 81467 | 47385 | 33948 |
| 8 | 149980 | 83099 | 57902 | 81545 | 47461 | 34036 |
| 9 | 139907 | 83087 | 58057 | 81633 | 47535 | 34121 |
| 10 | 139922 | 83170 | 58250 | 81717 | 47624 | 34217 |

B - Both men and women, M - Men only, W - Women only
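As a check on how the number of classes follows from these values, the sketch below states the BIC in its usual form (smaller is better) and picks the minimum-BIC solution in each column. The BIC values are transcribed from the table above; the helper names and the parameter count for an unrestricted binary LCA are our additions for illustration, not output from Mplus.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Schwarz (1978) criterion: -2 log L + p ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def lca_n_params(n_classes, n_items):
    """Free parameters of an unrestricted binary LCA:
    (K - 1) class proportions plus K * J endorsement probabilities."""
    return (n_classes - 1) + n_classes * n_items

# BICs for 1-10 classes, transcribed from the table above
TABLE_BICS = {
    "DisagB": [195078, 151201, 143504, 142016, 140921, 140243, 140149, 149980, 139907, 139922],
    "DisagM": [113802, 89314, 84664, 83854, 83360, 83166, 83106, 83099, 83087, 83170],
    "DisagW": [78606, 61094, 58489, 58048, 57758, 57713, 57771, 57902, 58057, 58250],
    "AgB": [104091, 83911, 81604, 81341, 81342, 81389, 81467, 81545, 81633, 81717],
    "AgM": [59898, 48687, 47340, 47198, 47239, 47314, 47385, 47461, 47535, 47624],
    "AgW": [42518, 34566, 33788, 33775, 33809, 33878, 33948, 34036, 34121, 34217],
}

def best_n_classes(bics):
    """Number of classes (1-based) with the smallest BIC."""
    return min(range(len(bics)), key=bics.__getitem__) + 1

selected = {name: best_n_classes(values) for name, values in TABLE_BICS.items()}
print(selected)
# {'DisagB': 9, 'DisagM': 9, 'DisagW': 6, 'AgB': 4, 'AgM': 4, 'AgW': 4}
```

Note that several neighboring solutions differ by only a few BIC points (for example, 4 versus 5 classes for the aggregated combined sample), so the minimum alone says little about how decisively one solution is preferred.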

Table 6.

Split Half BICs - Disaggregated Symptoms

| Classes | Men0 | Men1 | Men2 | Women0 | Women1 | Women2 | Both0 | Both1 | Both2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 113802 | 63467 | 59454 | 78606 | 44599 | 41462 | 195078 | 96929 | 98415 |
| 2 | 89314 | 50264 | 47517 | 61094 | 34501 | 33246 | 151201 | 75690 | 76027 |
| 3 | 84664 | 47706 | 45083 | 58489 | 32306 | 32388 | 143504 | 72218 | 72087 |
| 4 | 83854 | 46240 | 44556 | 58048 | 31216 | 32418 | 142016 | 71568 | 71514 |
| 5 | 83360 | 46204 | 44559 | 57758 | 31409 | 32520 | 140921 | 71191 | 71052 |
| 6 | 83166 | 46338 | 44654 | 57713 | 31669 | 32827 | 140243 | 71035 | 70953 |
| 7 | 83106 | 46622 | 44944 | 57771 | 32061 | 33225 | 140149 | 71039 | 70913 |
| 8 | 83099 | 46973 | 45369 | 57902 | 32524 | 33647 | 149980 | 71059 | 70979 |
| 9 | 83087 | 47329 | 45690 | 58057 | 32984 | 34190 | 139907 | 71176 | 71032 |
| 10 | 83170 | 47765 | 46172 | 58250 | 33437 | 34601 | 139922 | 71302 | 71147 |

0 - Full sample, 1 - First split half, 2 - Second split half, DNC - Did Not Converge
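The split-half procedure can be sketched in two parts: drawing a random split of the cases, and checking whether the minimum-BIC number of classes agrees between halves. The split-half BICs are transcribed from the table above; the function names and the seed are hypothetical, and the sketch ignores the sampling weights used in the actual analyses.

```python
import numpy as np

def split_half(n_obs, seed=0):
    """Return two disjoint index arrays that randomly split n_obs cases in half."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_obs)
    half = n_obs // 2
    return np.sort(order[:half]), np.sort(order[half:])

# Split-half BICs for 1-10 classes (first half, second half), from the table above
SPLIT_BICS = {
    "Men": ([63467, 50264, 47706, 46240, 46204, 46338, 46622, 46973, 47329, 47765],
            [59454, 47517, 45083, 44556, 44559, 44654, 44944, 45369, 45690, 46172]),
    "Women": ([44599, 34501, 32306, 31216, 31409, 31669, 32061, 32524, 32984, 33437],
              [41462, 33246, 32388, 32418, 32520, 32827, 33225, 33647, 34190, 34601]),
    "Both": ([96929, 75690, 72218, 71568, 71191, 71035, 71039, 71059, 71176, 71302],
             [98415, 76027, 72087, 71514, 71052, 70953, 70913, 70979, 71032, 71147]),
}

def best_k(bics):
    """Number of classes (1-based) with the smallest BIC."""
    return int(np.argmin(bics)) + 1

chosen = {sample: (best_k(h1), best_k(h2)) for sample, (h1, h2) in SPLIT_BICS.items()}
for sample, (k1, k2) in chosen.items():
    status = "replicates" if k1 == k2 else "does not replicate"
    print(f"{sample}: {k1} vs {k2} classes, {status}")
# Men: 5 vs 4 classes, does not replicate
# Women: 4 vs 3 classes, does not replicate
# Both: 6 vs 7 classes, does not replicate
```

In a full replication, each index array returned by `split_half` would be used to subset the data before refitting the 1-10 class models in each half.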

The most striking observation from this analysis is that the cluster structure is not consistent between the two split halves for men, women, or the two combined when the data are the disaggregated symptoms. With the aggregated criteria, the best-fitting number of clusters was consistent between halves for men and for men and women combined, but not for women alone. A potential explanation for the lack of split-half replication in the aggregated data containing women could be the lower prevalences of the criteria in this group. When the data are split in half, the most severe class for women (comprising only .7% of the full sample) is so small as to be nearly nonexistent in the split halves, resulting in fewer classes being necessary for the full description of the data. The split-half solutions were then combined into a single dataset, and the classifications obtained via split half were compared with those from the combined data using the ARI (see Analysis section).

The results of this ARI comparison show another instance where combining men and women leads to different conclusions about the impact of aggregating data. Specifically, when the sexes are analyzed separately, the split halves have a much more consistent structure than when the sexes are combined. When the data were aggregated, even though these were all four-cluster solutions, the split halves of the data with the sexes combined had poor classification consistency (ARI = .33). When the sexes were separated, however, agreements were good between the split halves (.84 for men, .74 for women). When the data were the disaggregated symptoms, no solution had highly consistent classifications (ARI = .37 for men, .27 for women, .37 for men and women combined).

Conclusion

The best fitting class structure when analyzing AUD criteria was found to be a four-category, severity-graded solution. However, very different solutions were found when analyzing disaggregated symptoms, and these structures varied between males and females. Although aggregating the data provided an arguably “cleaner” structure of AUD with a smaller number of easily interpretable classes, these findings of severity-graded classes are of limited theoretical interest in that they suggest subtypes that tend to differ more “quantitatively” (i.e., by severity) than “qualitatively” (i.e., by unique configurations of indicators). Moreover, there were patterns of AUD presentation of theoretical and clinical interest that were lost when the symptoms are aggregated to criteria. Specifically, a class of individuals in both men and women arises for which there is little symptomatology but elevated tolerance (i.e., a high probability to state that they have diminished effects from their usual number of drinks and that they have to drink much more to experience the same effect). In analyses of the aggregated data, these individuals are largely categorized into the “no diagnosis” group, for which the probability of endorsing the tolerance criterion for the class is less than .05. To the extent that such individuals have different courses or complications than those without other symptoms, obscuring this group represents a loss of important etiological information. While high tolerance by itself may not seem like a problematic phenotype, it could presage a greater likelihood of progression of AUD and/or increased likelihood of medical or psychiatric complications associated with heavy consumption.

The results from the aggregated data appear to map onto the diagnostic structure of the DSM-5 (that is, 4 clusters separated by an increasing probability of endorsing the criteria). The stability of this cluster solution, however, exists only when men and women are analyzed separately. This could be because high-severity women are very rare (.7% of the full sample), so that fewer classes are needed to describe the lower variance. Further research should examine whether these symptom items are masking cluster structure (Steinley & Brusco, 2008). This could serve to identify subsets of symptoms that produce a more consistent cluster structure, thereby aiding the reproducibility of health-related research (Collins & Tabak, 2014).

Additionally, there is no general consensus on how to “lump” or “split” criteria across different diagnostic systems. For example, the two DSM-IV criteria of TIMESPENT and GIVEUP are combined into a single criterion in the International Classification of Diseases - 10th Edition (ICD-10; World Health Organization, 1992), and it appears that a similar “lumping” of the DSM LARGERLONGER and CUTD criteria occurs as well. Regardless of how they are operationalized, it is not clear how “narrow” or “broad” each diagnostic criterion should be. Although not systematically explored in the current study, this basic issue could also materially affect the nature of the solutions obtained in the LCA. Thus, resolution of the issues raised here goes beyond whether to aggregate symptom-level data into criteria and extends to whether to aggregate or disaggregate criteria into smaller or larger criteria sets. The type of diagnostic research we are proposing involves numerous permutations in order to more fully understand the conditional nature of findings and point the way to the most robust and meaningful typologies.

Recommendations

Results from the above analyses lend themselves to recommendations for improving the state of the art of LCA in research on AUD and related topics. As is documented in basic research, the number and type (i.e., whether an indicator is “masking” or not) of input indicators in an LCA are pivotal in finding accurate results. However, applied researchers do not know the optimal set of indicators, so it seems sensible to explore a wide set of solutions, including several levels of data aggregation, and to examine their correspondence with external covariates to establish construct validity. Although researchers often use aggregated sets of indicators, the aggregation strategy (as it relates to data reduction) is often arbitrary and does not necessarily “carve nature at her joints”. In fact, it has been shown that sum scores, as well as other approaches to linearly combining variables, are likely to degrade the ability to uncover cluster structure, as compared to a more thoughtful approach to assessing the importance of individual indicators (Steinley, Brusco, & Henson, 2012). Considering the sensitivity of LCA to the number of indicators, shown in a line of methodological research and supported by the results in this paper, we recommend that data not be aggregated prior to analysis.

Limitations

The data used for these analyses, although providing a very large and diverse sample, are older (collected between 2004 and 2005). This renders the data susceptible to period and cohort effects (see Kerr et al, 2004). Results should be replicated with newer data, including the recently released NESARC III (http://www.niaaa.nih.gov/research/nesarc-iii) data set, which provides a more contemporary survey of AUD symptoms in the general population.

Note that, consistent with drinking patterns of men and women in the United States, NESARC Wave II contains many more female abstainers than male abstainers (41.35% of women and 28.61% of men had abstained from alcohol in the 12 months before interview). Abstainers represent a type of missing data in that, were they to drink, we might observe drinking patterns not manifested in our current sample of drinkers. Future research could examine the same issues in cultures where the rates of abstinence are both more similar between men and women and also lower than in the United States (e.g., Australia, see Wilsnack et al, 2000; Wilsnack et al, 2009). Replication in such a population would suggest that the findings presented here are not conditioned upon censoring occasioned by differential abstinence rates.

Additionally, we did not examine other commonly used measures of AUD such as the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I; First et al, 2002) or the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA-II; Bucholz et al, 1994). These alternative interviews use different items and aggregation methods to determine whether individuals exhibit one or more diagnostic criteria. Similar analyses should be conducted with these and other alternative measures to make a more general statement about the effect of aggregating AUD symptom data to criteria.

We did not attempt to determine the possible reasons for the differences between men and women. This phenomenon may exist across other demographic groups. For example, studies of AUD using older individuals have found different latent variable models than those using a younger sample within the same data set (Sacco et al, 2009; Saha et al, 2007); these differences could be an artifact of combining these population subgroups. Future research should inspect the effect of aggregation within these alternative groupings as well.

Finally, LCA is a widely used technique to model AUD, but techniques such as Factor Mixture Modeling and Mixtures of Factor Analyzers have become more popular in recent years, combining the ideas of a dimensional and categorical structure of AUD (Lubke & Muthén, 2005; McLachlan & Peel, 2000). A similar treatment of these techniques, to determine whether the structure of the data unduly influences their results, would be a useful future direction for this line of research.

Table 7.

Split Half BICs - Aggregated Criteria

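| Classes | Men (full) | Men (half 1) | Men (half 2) | Women (full) | Women (half 1) | Women (half 2) | Both (full) | Both (half 1) | Both (half 2) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 59898 | 30423 | 29547 | 42518 | 21303 | 21298 | 104091 | 51838 | 52338 |
| 2 | 48687 | 24679 | 24155 | 34566 | 17162 | 17555 | 83911 | 42003 | 42080 |
| 3 | 47340 | 24077 | 23486 | 33788 | 16755 | 17270 | 81604 | 41014 | 40860 |
| 4 | 47198 | 24060 | 23460 | 33775 | 16784 | 17316 | 81341 | 40937 | 40769 |
| 5 | 47239 | 24104 | 23530 | 33809 | 16850 | 17370 | 81342 | 40978 | 40813 |
| 6 | 47314 | 24177 | 23608 | 33878 | 16925 | 17440 | 81389 | 41048 | 40888 |
| 7 | 47385 | 24251 | 23684 | 33948 | 17006 | 17523 | 81467 | 41123 | 40967 |
| 8 | 47461 | 24326 | 23761 | 34036 | 17086 | 17598 | 81545 | 41205 | 41057 |
| 9 | 47535 | 24406 | 23836 | 34121 | 17177 | 17678 | 81633 | 41288 | 41149 |
| 10 | 47624 | 24498 | 23916 | 34217 | 17264 | 17760 | 81717 | 41362 | 41233 |

Full - full sample; half 1 - first split half; half 2 - second split half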

DNC - Did Not Converge
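As a reading aid for Table 7, the sketch below picks, for each column, the number of classes with the lowest BIC (lower BIC indicates a better trade-off between fit and complexity). The values are copied from the table; the dictionary keys and the helper `best_k` are illustrative names, and choosing the minimum-BIC solution is a common heuristic assumed here rather than a statement of the authors' exact decision rule.

```python
# BIC values from Table 7, indexed by number of classes (1 through 10).
bic = {
    "Men, full":      [59898, 48687, 47340, 47198, 47239, 47314, 47385, 47461, 47535, 47624],
    "Men, half 1":    [30423, 24679, 24077, 24060, 24104, 24177, 24251, 24326, 24406, 24498],
    "Men, half 2":    [29547, 24155, 23486, 23460, 23530, 23608, 23684, 23761, 23836, 23916],
    "Women, full":    [42518, 34566, 33788, 33775, 33809, 33878, 33948, 34036, 34121, 34217],
    "Women, half 1":  [21303, 17162, 16755, 16784, 16850, 16925, 17006, 17086, 17177, 17264],
    "Women, half 2":  [21298, 17555, 17270, 17316, 17370, 17440, 17523, 17598, 17678, 17760],
    "Both, full":     [104091, 83911, 81604, 81341, 81342, 81389, 81467, 81545, 81633, 81717],
    "Both, half 1":   [51838, 42003, 41014, 40937, 40978, 41048, 41123, 41205, 41288, 41362],
    "Both, half 2":   [52338, 42080, 40860, 40769, 40813, 40888, 40967, 41057, 41149, 41233],
}


def best_k(values):
    """Return the number of classes (1-based) with the smallest BIC."""
    return min(range(len(values)), key=values.__getitem__) + 1


best = {name: best_k(values) for name, values in bic.items()}
for name, k in best.items():
    print(f"{name}: lowest BIC at {k} classes")
```

Under this rule the minimum falls at four classes for every column except the two split halves for women, where it falls at three; note also that the four- and five-class BICs for the full combined sample differ by only one point.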

Footnotes

1

Given the large number of classes, it is difficult to determine this visually. Examining the average correlations of the item thresholds within each class for each dataset shows a much stronger relationship between within-class item thresholds for the aggregated data (approximately .98 for all datasets) than for the disaggregated data (.64 for men and the sexes together, .73 for women). This is further evidence that the classes in the disaggregated data differ by more than simply severity.
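One way to read the check described in this footnote is as the average pairwise correlation between the classes' item-threshold profiles: if classes differ only in severity, their profiles are roughly parallel and correlate near 1, whereas profiles that cross correlate less strongly. The sketch below illustrates that idea under this interpretation; the function name and both threshold matrices are hypothetical and are not estimates from the paper.

```python
from itertools import combinations

import numpy as np


def mean_profile_correlation(thresholds):
    """Mean pairwise Pearson correlation between class threshold profiles.

    thresholds: array of shape (n_classes, n_items), one row per latent class.
    """
    r = np.corrcoef(thresholds)  # class-by-class correlation matrix
    pairs = combinations(range(thresholds.shape[0]), 2)
    return float(np.mean([r[i, j] for i, j in pairs]))


# Hypothetical severity-only solution: each class is a shifted copy of one profile.
base = np.array([2.0, 1.0, 3.0, 0.5])
parallel = np.vstack([base, base - 1.0, base - 2.0])

# Hypothetical solution in which classes favor different items (profiles cross).
crossing = np.array([
    [2.0, 1.0, 3.0, 0.5],
    [0.5, 3.0, 1.0, 2.0],
    [1.0, 2.0, 0.5, 3.0],
])

print("parallel profiles:", round(mean_profile_correlation(parallel), 3))
print("crossing profiles:", round(mean_profile_correlation(crossing), 3))
```

The parallel profiles yield an average correlation of essentially 1, while the crossing profiles yield a much lower value, mirroring the contrast the footnote draws between the aggregated and disaggregated solutions.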

References

  1. Albert PS, Dodd LE. A cautionary note on the robustness of latent class models for estimating diagnostic error without a gold standard. Biometrics. 2004;60(2):427–435. doi: 10.1111/j.0006-341X.2004.00187.x.
  2. American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 5th ed. American Psychiatric Publishing; Arlington, VA: 2013.
  3. Agrawal A, Heath AC, Lynskey MT. DSM-IV to DSM-5: the impact of proposed revisions on diagnosis of alcohol use disorders. Addiction. 2011;106(11):1935–1943. doi: 10.1111/j.1360-0443.2011.03517.x.
  4. Babor TF, Hofmann M, DelBoca FK, Hesselbrock V, Meyer RE, Dolinsky ZS, Rounsaville B. Types of alcoholics, I: evidence for an empirically derived typology based on indicators of vulnerability and severity. Archives of General Psychiatry. 1992;49:599–608. doi: 10.1001/archpsyc.1992.01820080007002.
  5. Bartholomew DJ, Knott M, Moustaki I. Latent variable models and factor analysis: A unified approach. Vol. 904. John Wiley & Sons; 2011.
  6. Bauer DJ, Curran PJ. Distributional assumptions of growth mixture models: Implications for overextraction of latent trajectory classes. Psychological Methods. 2003;8:338–363. doi: 10.1037/1082-989X.8.3.338.
  7. Beseler CL, Taylor LA, Kraemer DT, Leeman RF. A Latent Class Analysis of DSM-IV Alcohol Use Disorder Criteria and Binge Drinking in Undergraduates. Alcoholism: Clinical and Experimental Research. 2012;36:153–161. doi: 10.1111/j.1530-0277.2011.01595.x.
  8. Brady KT, Randall CL. Gender differences in substance use disorders. Psychiatric Clinics of North America. 1999;22:241–252. doi: 10.1016/s0193-953x(05)70074-5.
  9. Brusco MJ. Clustering binary data in the presence of masking variables. Psychological Methods. 2004;9:510–523. doi: 10.1037/1082-989X.9.4.510.
  10. Brusco MJ, Cradit JD. A variable-selection heuristic for K-means clustering. Psychometrika. 2001;66(2):249–270.
  11. Bucholz KK, Heath AC, Reich T, Hesselbrock VM, Kramer JR, Nurnberger JI, Schuckit MA. Can we subtype alcoholism? A latent class analysis of data from relatives of alcoholics in a multicenter family study of alcoholism. Alcoholism: Clinical and Experimental Research. 1996;20(8):1462–1471. doi: 10.1111/j.1530-0277.1996.tb01150.x.
  12. Bucholz KK, Cadoret R, Cloninger CR, et al. Semi-structured psychiatric interview for use in genetic linkage studies: A report on the reliability for the SSAGA. Journal of Studies on Alcohol. 1994;55:149–158. doi: 10.15288/jsa.1994.55.149.
  13. Collins FS, Tabak LA. NIH plans to enhance reproducibility. Nature. 2014;505(7485):612. doi: 10.1038/505612a.
  14. Chassin L, Flora DB, King KM. Trajectories of alcohol and drug use and dependence from adolescence to adulthood: the effects of familial alcoholism and personality. Journal of Abnormal Psychology. 2004;113(4):483. doi: 10.1037/0021-843X.113.4.483.
  15. Chung T, Martin CS. Classification and course of alcohol problems among adolescents in addictions treatment programs. Alcoholism: Clinical and Experimental Research. 2001;25:1734–1742.
  16. Ferguson TS. Bayesian density estimation via mixtures of normal distributions. In: Rizvi MH, Rustagi JS, Siegmund D, editors. Recent advances in statistics. Academic Press; New York: 1983. pp. 287–302.
  17. First M, Spitzer R, Gibbon M, Williams J. Structured Clinical Interview for DSM-IV-TR Axis I Disorders, Research Version, Patient Edition (SCID-I/P). Biometrics Research, New York State Psychiatric Institute; New York: 2002.
  18. Fowlkes EB, Mallows CL. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association. 1983;78:553–569.
  19. Grant BF, Dawson DA, Stinson FS, Chou PS, Kay W, Pickering R. The Alcohol Use Disorder and Associated Disabilities Interview Schedule-IV (AUDADIS-IV): reliability of alcohol consumption, tobacco use, family history of depression and psychiatric diagnostic modules in a general population sample. Drug and Alcohol Dependence. 2003;71(1):7–16. doi: 10.1016/s0376-8716(03)00070-x.
  20. Grant BF, Dawson DA, Stinson FS, Chou SP, Dufour MC, Pickering RP. The 12-month prevalence and trends in DSM-IV alcohol abuse and dependence: United States 1991-1992 and 2001-2002. Drug and Alcohol Dependence. 2004;74:223–234. doi: 10.1016/j.drugalcdep.2004.02.004.
  21. Harford TC, Muthén BO. The dimensionality of alcohol abuse and dependence: a multivariate analysis of DSM-IV symptom items in the National Longitudinal Survey of Youth. Journal of Studies on Alcohol and Drugs. 2001;62(2):150. doi: 10.15288/jsa.2001.62.150.
  22. Hubert L, Arabie P. Comparing partitions. Journal of Classification. 1985;2(1):193–218.
  23. Kahler CW, Strong DR, Hayaki J, Ramsey SE, Brown RA. An item response analysis of the alcohol dependence scale in treatment-seeking alcoholics. Journal of Studies on Alcohol and Drugs. 2003;64(1):127. doi: 10.15288/jsa.2003.64.127.
  24. Kahler CW, Strong DR, Stuart GL, Moore TM, Ramsey SE. Item functioning of the alcohol dependence scale in a high-risk sample. Drug and Alcohol Dependence. 2003;72(2):183–192. doi: 10.1016/s0376-8716(03)00199-6.
  25. Kendler KS, Karkowski LM, Prescott CA, Pedersen NL. Latent class analysis of temperance board registrations in Swedish male-male twin pairs born 1902 to 1949: Searching for subtypes of alcoholism. Psychological Medicine. 1998;28(04):803–813. doi: 10.1017/s003329179800676x.
  26. Langenbucher JW, Labouvie E, Martin CS, Sanjuan PM, Bavly L, Kirisci L, Chung T. An Application of Item Response Theory Analysis to Alcohol, Cannabis, and Cocaine Criteria in DSM-IV. Journal of Abnormal Psychology. 2004;113(1):72. doi: 10.1037/0021-843X.113.1.72.
  27. Lin TH, Dayton CM. Model selection information criteria for non-nested latent class models. Journal of Educational and Behavioral Statistics. 1997;22(3):249–264.
  28. Lubke GH, Muthén B. Investigating population heterogeneity with factor mixture models. Psychological Methods. 2005;10(1):21. doi: 10.1037/1082-989X.10.1.21.
  29. McLachlan G, Peel D. Finite mixture models. John Wiley & Sons; 2004.
  30. Morey LC, Skinner HA, Blashfield RK. A typology of alcohol abusers: correlates and implications. Journal of Abnormal Psychology. 1984;93:408–417. doi: 10.1037//0021-843x.93.4.408.
  31. Moss HB, Chen CM, Yi HY. Subtypes of alcohol dependence in a nationally representative sample. Drug and Alcohol Dependence. 2007;91(2):149–158. doi: 10.1016/j.drugalcdep.2007.05.016.
  32. Muthén BO, Grant B, Hasin D. The dimensionality of alcohol abuse and dependence: factor analysis of DSM-III-R and proposed DSM-IV criteria in the 1988 National Health Interview Survey. Addiction. 1993;88(8):1079–1090. doi: 10.1111/j.1360-0443.1993.tb02127.x.
  33. Muthén B, Muthén L. Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research. 2000;24(6):882–891.
  34. Muthén L, Muthén B. Mplus User's Guide. 7th ed. Muthén & Muthén; Los Angeles, CA: 1998-2012.
  35. Nelson CB, Rehm J, Bedirhan T, Grant B, Chatterji S. Factor structures for DSM-IV substance disorder criteria endorsed by alcohol, cannabis, cocaine and opiate users: results from the WHO reliability and validity study. Addiction. 1999;94(6):843–855. doi: 10.1046/j.1360-0443.1999.9468438.x.
  36. Nolen-Hoeksema S, Hilt L. Possible contributors to the gender differences in alcohol use and problems. The Journal of General Psychology. 2006;133(4):357–374. doi: 10.3200/GENP.133.4.357-374.
  37. Nylund KL, Asparouhov T, Muthén BO. Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling. 2007;14(4):535–569.
  38. Raftery AE, Dean N. Variable selection for model-based clustering. Journal of the American Statistical Association. 2006;101(473):168–178.
  39. Sacco P, Bucholz KK, Spitznagel EL. Alcohol use among older adults in the national epidemiologic survey on alcohol and related conditions: a latent class analysis. Journal of Studies on Alcohol and Drugs. 2009;70(6):829. doi: 10.15288/jsad.2009.70.829.
  40. Saha TD, Chou SP, Grant BF. Toward an alcohol use disorder continuum using item response theory: results from the National Epidemiologic Survey on Alcohol and Related Conditions. Psychological Medicine. 2006;36(7):931–942. doi: 10.1017/S003329170600746X.
  41. Saha TD, Stinson FS, Grant BF. The role of alcohol consumption in future classifications of alcohol use disorders. Drug and Alcohol Dependence. 2007;89(1):82–92. doi: 10.1016/j.drugalcdep.2006.12.003.
  42. Schuckit M, Pitts FN, Reich T, King LJ, Winokur G. Alcoholism: I. Two types of alcoholism in women. Archives of General Psychiatry. 1969;20(3):301–306. doi: 10.1001/archpsyc.1969.01740150045007.
  43. Smith GW, Shevlin M. Patterns of alcohol consumption and related behaviour in Great Britain: a latent class analysis of the alcohol use disorder identification test (AUDIT). Alcohol and Alcoholism. 2008;43(5):590–594. doi: 10.1093/alcalc/agn041.
  44. Steinley D. Properties of the Hubert-Arabie Adjusted Rand Index. Psychological Methods. 2004;9(3):386. doi: 10.1037/1082-989X.9.3.386.
  45. Steinley D, Brusco MJ. A new variable weighting and selection procedure for K-means cluster analysis. Multivariate Behavioral Research. 2008;43(1):77–108. doi: 10.1080/00273170701836695.
  46. Steinley D, Brusco MJ, Henson R. Principal cluster axes: A projection pursuit index for the preservation of cluster structures in the presence of data reduction. Multivariate Behavioral Research. 2012;47(3):463–492. doi: 10.1080/00273171.2012.673952.
  47. Swanson SA, Lindenberg K, Bauer S, Crosby RD. A Monte Carlo investigation of factors influencing latent class analysis: An application to eating disorder research. International Journal of Eating Disorders. 2012;45(5):677–684. doi: 10.1002/eat.20958.
  48. Whitesell NR, Beals J, Mitchell CM, Novins DK, Spicer P, Manson SM, AI-SuperPFP Team. Latent class analysis of substance use: Comparison of two American Indian reservation populations and a national sample. Journal of Studies on Alcohol and Drugs. 2006;67(1):32. doi: 10.15288/jsa.2006.67.32.
  49. Widiger TA, Samuel DB. Diagnostic categories or dimensions? A question for the Diagnostic and statistical manual of mental disorders. Journal of Abnormal Psychology. 2005;114:494. doi: 10.1037/0021-843X.114.4.494.
  50. Wilsnack RW, Vogeltanz ND, Wilsnack SC, Harris TR. Gender differences in alcohol consumption and adverse drinking consequences: cross-cultural patterns. Addiction. 2000;95:251–265. doi: 10.1046/j.1360-0443.2000.95225112.x.
  51. Wilsnack RW, Wilsnack SC, Kristjanson AF, Vogeltanz-Holm ND, Gmel G. Gender and alcohol consumption: Patterns from the multinational GENACIS project. Addiction. 2009;104(9):1487–1500. doi: 10.1111/j.1360-0443.2009.02696.x.
  52. Winokur G, Rimmer J, Reich T. Alcoholism IV: Is there more than one type of alcoholism? The British Journal of Psychiatry. 1971;118(546):525–531. doi: 10.1192/bjp.118.546.525.
  53. World Health Organization. The ICD-10 classification of mental and behavioural disorders: Clinical descriptions and diagnostic guidelines. World Health Organization; Geneva: 1992.
  54. Yang CC. Evaluating latent class analysis models in qualitative phenotype identification. Computational Statistics & Data Analysis. 2006;50(4):1090–1104.
