Latent Classiness and Other Mixtures

Michael C Neale

doi:10.1007/s10519-013-9637-3

Author manuscript; available in PMC: 2015 May 1.

Published in final edited form as: Behav Genet. 2014 Jan 30;44(3):205–211. doi: 10.1007/s10519-013-9637-3

PMCID: PMC4024345 NIHMSID: NIHMS561482 PMID: 24477932

Abstract

The aim of this article is to laud Lindon Eaves’ role in the development of mixture modeling in genetic studies. The specification of models for mixture distributions was very much in its infancy when Professor Eaves implemented it in his own FORTRAN programs, and extended it to data collected from relatives such as twins. It was his collaboration with the author of this article which led to the first implementation of mixture distribution modeling in a general-purpose structural equation modeling program, Mx, resulting in a 1996 article on linkage analysis in Behavior Genetics. Today, the popularity of these methods continues to grow, encompassing methods for genetic association, latent class analysis, growth curve mixture modeling, factor mixture modeling, regime switching, marginal maximum likelihood, genotype by environment interaction, variance component twin modeling in the absence of zygosity information, and many others. This primarily historical article concludes with some consideration of possible future developments.

Keywords: mixture distribution, conditional likelihood, latent class analysis, genetic analysis, twins

Introduction

The papers in this special issue devoted to the contributions of Dr. Lindon Eaves to the field of behavior genetics span an extraordinary breadth of intellectual contributions, each marked by their depth of scholarship. To grasp the extent of his scientific developments is at once delightful, inspiring and humbling. It is also important to recognize the context in which these advances were made. Today, computer science is advanced such that programming is readily accomplished in high-level languages on extraordinarily powerful devices such as smart phones, tablets, laptop and desktop computers, all networked to clusters of teraflop and petaflop performance. Thirty years ago, when I made my first acquaintance with Dr. Eaves, things were different. Although I had used a Commodore Pet 2001 computer with 4k of RAM (Wikipedia, 2013) as an undergraduate, communication with more powerful computers was limited to using punch cards for input and 132-column (14.75in wide) paper for output. Although the output paper had green bars on it, this method of communicating with computers would hardly be considered “green” today! Despite obstacles such as these, Lindon was making fantastic strides in the field, combining his knowledge of statistics, genetics and psychology to lay the foundation of our discipline. In this article I want to focus on his contributions to mixture distribution modeling.

Early mixture distributions: Major loci

Curiously, but perhaps not coincidentally, the first known approach to fitting a mixture distribution appears to have been that of the famous statistician and biometrician Karl Pearson (Pearson, 1894), as described by Everitt (1996) and McLachlan & Peel (2000). A colleague had suggested that the skewness in a set of measurements of the ratio of forehead to body length of crabs might be due to the presence of two separate species. Pearson, even more heavily handicapped by the absence of computer resources and predating the advent of maximum likelihood by some 30 years, supported this contention using an incredibly laborious algebraic method of moments. This tradition of persevering in the face of daunting challenges such as the lack of suitable technology or knowledge or both, characterizes the careers of our greatest scientists in general, and of Lindon Eaves in particular.

At times, in order to create new knowledge it is necessary to deconstruct the old. On several occasions in his career, Eaves has done just this, and the area of mixture distributions provides an apt example. Well before molecular genetics enabled the direct measurement of genomic variants, it was hypothesized that a single locus could have a large effect on a quantitative trait. If this effect was large enough, then the distribution of that trait in the population would not follow the normal distribution. This contention is entirely reasonable, and a mixture distribution test was incorporated as part of a set of tests used to discern what were known as ‘major genes’ (Morton & MacLean, 1974; Go, Elston, & Kaplan, 1978). This set included the marginal mixture distribution, but was more sophisticated in that it considered the distribution of pairs of relatives – a bivariate mixture distribution. Although Elston (1979) had cautioned against relying on these higher moments as a criterion for the existence of a major locus, the method remained popular. It was also beginning to be applied to behavioral and psychological tests such as measures of IQ (Weiss, 1972). That was until, perhaps, Eaves’ 1983 paper in the American Journal of Human Genetics. In a characteristically direct treatment, Lindon simulated data (using what we would now consider a hopelessly archaic mainframe computer) according to a simple model of test items which were correlated due to the effects of a single common factor or ability dimension. This common factor followed the normal distribution, and individuals’ item responses were generated according to a single parameter logistic item response theory model (Lord, Novick, & Birnbaum, 1968). Thus the null hypothesis of no major locus was true for these data, but the data generating mechanism was the standard model for generating item responses and the subsequent scoring of such tests, i.e., a sum score of binary items. Forty items were used, with an even spread of item difficulties from −3 to +3 standard deviations on the ability scale. The paper showed that optimization behaved satisfactorily in only seven of the ten datasets and that, of those seven, only one failed to meet the criteria for declaring support for a major locus. In other words, a major locus would have been inferred in six of the seven usable datasets even though none existed. It is easy to regard such debunking of problematic methodology as an ad hominem attack – especially when on the receiving end – and it can wound one’s pride. To my mind this was not Lindon’s intent in this case, nor in any other. He was simply being a responsible scientist, trying to ensure that we put our faith in only those methods appropriate for the data at hand.

Marginal maximum likelihood: A mixture distribution

My first encounter with mixture distribution work was with Lindon and colleagues in the early 1990s, when he was developing models for multiple symptom data. Such data were abundant at VCU at that time, with the advent of data from the first waves of the Virginia Twin Study of Adolescent Behavioral Development (VTSABD; Eaves et al., 1997; Hewitt et al., 1997) and the Virginia Twin Study of Psychoactive Substance Use Disorders (VATSPSUD; Kendler, Kessler, Heath, Neale, & Eaves, 1991). In the resulting article (Eaves et al., 1993), Dr. Eaves combined item-response theory, latent class analysis and a behavior genetic model to tackle the question of what factors are responsible for the similarities of symptoms across relatives, and how best to model such similarities. In common with the major locus vs. polygenic theme of the preceding section, both of these possible sources were considered as mechanisms for twin similarity for class membership. There is no substitute for reading Eaves’ original articles – I recommend it strongly. In this section I aim to highlight the key concepts that led to the development of this model.

Bayes’ Theorem and marginal maximum likelihood

Item-level data present special problems for model-fitting approaches, of which one of the most significant is that the likelihood can be very expensive (i.e., slow) to compute, even with early 21st century hardware. The development of item response theory models described in the previous section incorporates one approach to resolving the issue, and it is very much at the heart of mixture distribution modeling. The idea is to use a conditional probability approach, using the Kolmogorov definition:

p (A | B) = \frac{p (B \cap A)}{p (B)},

(1)

which can be rearranged to give:

p (A \cap B) = p (A | B) p (B) .

(2)

It turns out that while the joint probability on the left hand side of Equation 2 may be very difficult to compute, the conditional probability p(A|B) can prove to be very easy to evaluate. In the case of multiple binary or ordinal items, a natural model is to assume that there is an underlying, normally distributed liability dimension. This assumption seems appropriate because it represents the multifactorial hypothesis that a very large number of factors influence complex phenotypes such as behavior. Results from genome-wide association studies (GWAS) thus far indicate that for many traits there are indeed few if any variants that account for more than a tiny fraction of the variance. Variation appears truly polygenic, with a large number of variants each having very small effects, and environmental factors may operate in a similar fashion (though they may not, and their systematic investigation is a future goal for our field). If an outcome – an item on a test or a symptom of a disorder – is measured at the binary (yes/no) or ordinal (never/sometimes/often) level, it is necessary to invoke some model to describe the change from continuous variation of the underlying trait to the observed binary or ordinal response set. A popular approach here is to use a threshold model, in which there are particular points on the distribution where individuals abruptly change from one response category to the next (Curnow & Smith, 1975; Falconer, 1965; Mehta, Neale, & Flay, 2004). To compute the likelihood under the threshold model requires numerical integration of the normal distribution, because there is no closed form for the integral of the normal probability density function (pdf). In practice, this means evaluating a small number of points (typically k = 10 – 20) along the distribution. The trouble starts when considering several (m) correlated items, as it is necessary to compute k^m points in the distribution, which grows very rapidly with m. This is known as the curse of dimensionality.

Bock (1972) showed how the problem could be approached by: i) selecting a point on the latent trait dimension and calculating the height of the normal pdf at that point; ii) computing the likelihood of the data conditional on the latent trait; iii) multiplying the results of steps i) and ii); and iv) repeating steps i)–iii) for multiple (k) points, and summing the results. This algorithm may seem a bit tortuous, but several points are worth noting. First, step i) corresponds to evaluating p(B) in Equation 1, while step ii) evaluates p(A|B). The latter yields great computational savings, because conditional on the value of the latent trait the item responses are independent. Thus instead of evaluating k^m points it is necessary only to evaluate mk² points, being k points for each of the m dimensions, repeated for each of the k points chosen on the latent trait. This procedure is known as marginal maximum likelihood (MML), and it has applications in many areas, with more likely to come. Most important to note is that it is fundamentally a mixture distribution.
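Steps i) to iv) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in any of the work cited here. It assumes a one-parameter logistic model with known item difficulties and a standard normal latent trait, and it uses Gauss-Hermite quadrature weights in place of raw pdf heights at the k points. The function name `marginal_likelihood` and its arguments are hypothetical.

```python
import numpy as np


def marginal_likelihood(responses, difficulties, k=15):
    """Marginal likelihood of one binary response vector (illustrative only).

    Assumes a one-parameter logistic item model and a standard normal
    latent trait; both are assumptions of this sketch.
    """
    responses = np.asarray(responses, dtype=float)
    difficulties = np.asarray(difficulties, dtype=float)

    # Step i): k points on the latent trait and their normal weights.
    # hermegauss uses the weight function exp(-x^2 / 2), so dividing by
    # sqrt(2 * pi) makes the weights sum to one.
    nodes, weights = np.polynomial.hermite_e.hermegauss(k)
    weights = weights / np.sqrt(2.0 * np.pi)

    # Step ii): conditional on the latent trait the items are independent,
    # so the conditional likelihood is a product over the m items.
    p_yes = 1.0 / (1.0 + np.exp(-(nodes[:, None] - difficulties[None, :])))
    per_item = np.where(responses[None, :] == 1, p_yes, 1.0 - p_yes)
    conditional = per_item.prod(axis=1)  # one value per quadrature point

    # Steps iii) and iv): weight each conditional likelihood and sum.
    return float(np.sum(weights * conditional))
```

Because the weights sum to one and the conditional probabilities of all response patterns sum to one at every point, the marginal likelihoods of all 2^m patterns also sum to one, which is a convenient check on any implementation.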

Latent class analysis with data from twins

The idea behind latent class models is that items or other measures covary purely because the population consists of subgroups (Lazarsfeld, 1950). These subgroups are characterized by different item response probabilities, but within each subgroup the items do not correlate at all. There is therefore a direct parallel between MML and latent class analysis. Normal theory MML is essentially a latent class model with class membership probabilities that follow the normal distribution, and where the item response probabilities for each class are a simple linear function of the latent trait mean for that class. The overall likelihood of any individual data vector is computed as a weighted sum of its conditional likelihoods. This sum can be written as:

L_{i} = \sum_{j = 1}^{q} p (class = j) L (x_{i} | class = j)

(3)

where p(class = j) is the probability that an individual belongs to class j, q is the number of classes in the model, and L(x_i|class = j) is the conditional probability of the item responses x_i of individual i, given that they belong to class j. In practice, the class membership probabilities are estimated along with the conditional response probabilities for each class.
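Equation 3 translates almost directly into code. The sketch below assumes binary items and treats the class membership probabilities and the per-class item response probabilities as already known; in a real analysis they would be estimated. The function and argument names are hypothetical.

```python
import numpy as np


def latent_class_likelihood(x, class_probs, item_probs):
    """Equation 3 for one binary response vector x (illustrative only).

    class_probs: length-q vector of p(class = j).
    item_probs:  q-by-m array of p(item = 1 | class = j).
    """
    x = np.asarray(x)
    class_probs = np.asarray(class_probs, dtype=float)
    item_probs = np.asarray(item_probs, dtype=float)

    # Within a class the items are independent, so L(x | class = j)
    # is a product of per-item response probabilities.
    per_item = np.where(x[None, :] == 1, item_probs, 1.0 - item_probs)
    conditional = per_item.prod(axis=1)

    # Weighted sum over the q classes.
    return float(class_probs @ conditional)
```

For example, with two classes of sizes 0.7 and 0.3 and item probabilities (0.1, 0.2) and (0.8, 0.9), the response vector (1, 1) has likelihood 0.7 × 0.02 + 0.3 × 0.72 = 0.23.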

To date, latent class models have not enjoyed as much popularity as common factor models, for several possible reasons. First, it may be somewhat implausible that there is no residual correlation within classes. To a degree this problematic assumption has been ameliorated by the advent of factor mixture modeling (Lubke & Muthén, 2005a), which fits a model with one or more factors within each class (latent growth curve mixture modeling is one example of this type of model). Second, maximizing the likelihood of mixture models is generally subject to more difficulties in optimization than are single-component models, with convergence failure or convergence to a local instead of a global maximum being common problems. Third, the convenient likelihood ratio test for comparing models with sub-models (computed as a difference in −2 log-likelihood) does not follow a χ² distribution (Steiger, Shapiro, & Browne, 1985; Nylund, Asparouhov, & Muthén, 2007). Alternative, although computationally heavy, methods such as the bootstrap likelihood ratio test (McLachlan & Peel, 2000) may be used instead.

These limitations aside, it is still reasonable to estimate latent class models with certain datasets, at which point the question as to how to model twin resemblance for latent class membership emerges. A delightful variety of options are described in Eaves et al (1993). First, it is possible to estimate directly the joint class membership probabilities. This multinomial model is effectively a saturated model against which models with fewer parameters can be compared. Setting these probabilities equal for MZ and DZ pairs specifies a shared environment model for twin resemblance. Second, under a single gene Hardy-Weinberg equilibrium model, class membership probabilities are estimated to be a simple function of allele frequencies: p², 2pq and q² (with q = (1 − p)). In this case MZ pairs are always concordant for class membership, while DZ pairs’ joint probabilities are calculated according to their expected frequencies under the assumption of independent assortment. A third, and perhaps overlooked, possibility described in the article is to set the pairwise class membership probabilities according to both a major locus and a major environmental factor, to yield a model of genotype × environment (G × E) interaction. A valuable aspect of this approach is that it is relatively scale-free, compared to direct analysis of observed sum or factor scores using ANOVA or regression methods. Such direct methods carry a non-trivial risk of false positive findings due to differences in measurement precision at different points on the scale, as discussed by Dr. Eaves in a later article (Eaves, 2006) and elsewhere in this issue.
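The single gene case can be made concrete with a short sketch. It is not taken from Eaves et al (1993). It assumes one diallelic locus in Hardy-Weinberg equilibrium, classes that map one-to-one onto the genotypes AA, Aa and aa, and the standard result that DZ pairs share 0, 1 or 2 alleles identical by descent with probabilities .25, .5 and .25. All function names are hypothetical.

```python
import numpy as np


def genotype_freqs(p):
    """Hardy-Weinberg class probabilities p^2, 2pq, q^2 for AA, Aa, aa."""
    q = 1.0 - p
    return np.array([p * p, 2.0 * p * q, q * q])


def twin_joint_probs(p, zygosity):
    """3-by-3 joint class membership probabilities for a twin pair."""
    q = 1.0 - p
    f = genotype_freqs(p)

    # Sharing both alleles: the twins always fall in the same class.
    share2 = np.diag(f)
    if zygosity == "MZ":
        return share2
    if zygosity != "DZ":
        raise ValueError("zygosity must be 'MZ' or 'DZ'")

    # Sharing no alleles: the two genotypes are independent draws.
    share0 = np.outer(f, f)

    # Sharing one allele: p(twin 2 genotype | twin 1 genotype).
    transition = np.array([[p, q, 0.0],
                           [p / 2.0, 0.5, q / 2.0],
                           [0.0, p, q]])
    share1 = f[:, None] * transition

    return 0.25 * share0 + 0.5 * share1 + 0.25 * share2
```

The MZ matrix is diagonal, reflecting perfect concordance, while the DZ matrix is symmetric with off-diagonal mass; both have the Hardy-Weinberg frequencies as their margins.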

Multivariate linkage analysis

During the mid-1990s, Dr. Eaves and I were among the faculty teaching at the Methodology for Genetic Studies workshops in Europe and Boulder, Colorado. At that time, linkage analysis was a popular approach to gene-mapping, for several reasons. First, very few genetic markers were available, especially compared to the millions of single nucleotide polymorphisms (SNPs) that can be assayed at low cost today. The primary technology in use was restriction fragment length polymorphisms (RFLPs) which yielded a large number of ‘alleles’ at each of a small number (≈ 400) of loci. Second, many geneticists had access to pedigree or family data, which enabled studies of the resemblance between relatives as a function of their degree of sharing over a particular region of the genome. Thus while most gene-mapping studies in 2013 use association with SNPs (often in a case-control design), in the 1990s most mapping involved linkage.

Historically, linkage analysis was often conducted using a regression-based approach described by Haseman and Elston (1972) in the second volume of this journal. The essence of the method is that if a particular region of the genome harbors factors that cause variation in the outcome trait, then siblings who share two alleles identical by descent (IBD) should be more similar than those who share one, who in turn should be more similar than those who share zero. In practice, it is not always possible to unambiguously diagnose the degree of IBD sharing for a sib pair at a particular locus, but it is always possible to ascribe probabilities. In the absence of any genomic data, the prior probabilities are .25, .5 and .25 for IBD 0, 1 and 2, respectively. A regression of squared intrapair differences on IBD sharing is a natural approach, and it is common to use p(IBD = 2) + .5p(IBD = 1) as a continuous measure of IBD sharing. This has the advantage of computational speed, as there are closed-form algorithms to obtain regression parameter estimates (no optimization required). Expediency aside, we note that the mathematical model is not accurate. That is, sibling pairs never share, e.g., 0.783 alleles IBD at a locus; they share either 0, 1 or 2. Following discussions with David Fulker, Stacey Cherny and others at the Boulder workshop in 1995, a mixture distribution approach to the analysis was devised (Fulker & Cherny, 1996). Equation 3 applies directly, in that the joint likelihood of the sib pairs was computed as a weighted sum of the likelihoods of the data given that the sib pairs were IBD 0, 1 or 2. The weights were the respective probabilities that each individual pair was IBD 0, 1 or 2, based on the genetic marker data (using a program such as Genehunter (Kruglyak, Daly, Reeve-Daly, & Lander, 1996)).
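The weighted sum for a single sib pair might look as follows. This is a sketch under stated assumptions, not the Mx implementation: the trait is taken to be bivariate normal within each IBD class, with a hypothetical variance decomposition into a component at the locus (`var_qtl`), a residual familial component (`var_familial`) and a unique component (`var_unique`), so that the sib covariance is (IBD / 2) × `var_qtl` + `var_familial`.

```python
import numpy as np


def bivariate_normal_pdf(y, mean, cov):
    """Density of a bivariate normal at the point y."""
    diff = np.asarray(y, dtype=float) - mean
    quad = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov))))


def sib_pair_likelihood(y, ibd_probs, var_qtl, var_familial, var_unique, mean=0.0):
    """Equation 3 with three classes: the pair is IBD 0, 1 or 2.

    ibd_probs holds the pair-specific probabilities of IBD 0, 1 and 2,
    which in practice would come from marker data.
    """
    total = var_qtl + var_familial + var_unique
    likelihood = 0.0
    for n_ibd, weight in zip((0, 1, 2), ibd_probs):
        # The sib covariance differs by IBD class; the variance does not.
        cov_sibs = (n_ibd / 2.0) * var_qtl + var_familial
        cov = np.array([[total, cov_sibs], [cov_sibs, total]])
        likelihood += weight * bivariate_normal_pdf(y, mean, cov)
    return likelihood
```

When `var_qtl` is zero the three component densities coincide and the IBD probabilities no longer matter, which is the null hypothesis of no linkage in this parameterization.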

To enable multivariate linkage analysis, Dr. Eaves and I developed a mixture distribution modeling extension to the package Mx (Neale, 1997). To our knowledge, this was the first implementation of mixture distribution modeling in a general purpose structural equation modeling program. Combined with the feature of definition variables, with which the covariance structure, mean vector or weights could differ for every data vector in the sample, it was possible to conduct multivariate multipoint linkage analysis. In what might be called a ‘blind’ experimental procedure for simulation studies, Dr. Eaves generated the data using parameter values known only to him, while I fitted the model with the new software to establish whether the correct estimates could be recovered. Mixture distribution modeling in Mx, and in its successor, OpenMx (Boker et al., 2011), has since become a popular and valuable feature with a wide variety of applications, some of which are described in the following section. I personally, and the field generally, owe Dr. Eaves a debt of thanks – not only for his role in developing mixture distribution modeling in Mx, but also for supporting its initial development when I was a junior faculty member in his department.

Zygosity and other examples of ignorance

Since their inception (Rende, Plomin, & Vandenberg, 1990) and to this day, most twin studies rely on accurate diagnosis of zygosity. Data analysis usually proceeds by classifying the twins as MZ or DZ and analyzing them in separate groups (Neale & Cardon, 1992). Although highly precise, almost unambiguous, classification is possible with genetic marker data, other methods such as questionnaires (Nichols & Bilbro, 1966) are subject to approximately 5% error. Again, the natural statistical approach to dealing with imperfect diagnostic data is not to make a diagnosis at all, but to incorporate quantitative measures of what we know (and what we don’t) into our analyses. Ten years ago the mixture distribution approach was described in the pages of this journal (Neale, 2003) – effectively applying Equation 3 – but it has been used relatively infrequently since, either by me or by others. One exception is Benyamin et al (2005), where there were no data on zygosity at all, but in principle the method should be used whenever zygosity is in doubt for any pair in the sample.
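Written out for this case, Equation 3 has just two classes. The notation below is an assumption made here for illustration rather than that of Neale (2003): p(MZ_i) denotes the probability that pair i is monozygotic given whatever zygosity information is available, and x_i is the data vector of pair i.

L_{i} = p (MZ_{i}) L (x_{i} | MZ) + [1 - p (MZ_{i})] L (x_{i} | DZ)

A pair with an unambiguous diagnosis has p(MZ_i) equal to 0 or 1, so the expression reduces to the usual single-group likelihood for that pair.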

Association analyses are another area in which uncertainty exists, although virtually all analyses of genetic marker data proceed as if it does not. SNP genotypes diagnosed by microarray or sequencing technologies are often highly accurate (in the case of sequencing dependent on coverage) but they are not perfect. A mixture distribution approach to association analysis was developed independently by Sham et al (2004) and myself in Min et al (2004), both using Mx for the analysis. It is not surprising that the method is not used for GWAS, because it requires fitting a mixture distribution model by numerical optimization, which is much slower than, e.g., using a closed-form approach that involves a single evaluation of a formula. When there are millions of SNPs involved, optimization-based approaches are not practical with early 21st century hardware. However, it may prove possible to develop much faster algorithms, and Moore’s Law (Moore, 1965) continues to apply to hardware performance, so this may change in future. It is also not clear whether the relationship between analyses performed with a mixture distribution approach and those that use the convenient fiction of perfect allele calling is monotonic across loci. If it were, ordering the loci from most significant to least would give the same order for both analytical approaches, but this seems unlikely. However, the two methods may be so highly correlated that using one vs. the other would make no practical difference.

Growth curve mixture modeling is another popular approach in the social sciences (Muthén & Shedden, 1999; Witkiewitz, Maisto, & Donovan, 2010; Bauer & Curran, 2003). It is an application of factor mixture modeling (Lubke & Neale, 2008; Lubke & Muthén, 2005b) in which the factor structure is specified with a finite number of parameters, regardless of the number of occasions of measurement. For the purposes of association analysis, one approach is to obtain posterior probabilities of class membership for each individual in the sample¹, and then assign case vs. control status according to a cut-off of probability of class-membership. For example, one might wish to identify those with a high probability of belonging to a heavy-use class in an analysis of longitudinal substance use data. Once again, a more precise approach would be to use the probabilities of class membership directly, using the method of Sham et al (2004) as described above. Ideally, multivariate or longitudinal models would directly include SNP genotypes or other indicators of risk, as in Medland & Neale’s (2009) analysis of DRD4 association with marijuana, stimulants and sedatives. This single-step analysis ideal is likely to remain computationally impractical at the genome-wide level for some years to come. It is, I think, in the Eaves spirit that trying to get it right, even if it takes quite a bit longer, is worth every drop of energy put into it. The contrary, a quick-and-dirty analysis, knowing that it is wrong (and especially without acknowledging its limitations) can border on the unethical. Often datasets take years to assemble, so care and patience with their analysis is appropriate, albeit sometimes in the face of pressure to publish or perish.
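The posterior probabilities of class membership follow from Equation 1 applied to the terms of Equation 3. The sketch below is generic rather than specific to Mplus or OpenMx, and it contrasts the soft posterior weights with a hard case vs. control assignment. The function names and the cut-off of 0.8 are hypothetical choices made for illustration.

```python
import numpy as np


def posterior_class_probs(class_probs, cond_likelihoods):
    """p(class = j | x_i) from p(class = j) and L(x_i | class = j)."""
    joint = np.asarray(class_probs, dtype=float) * np.asarray(cond_likelihoods, dtype=float)
    # The denominator is the mixture likelihood of Equation 3.
    return joint / joint.sum()


def assign_case(posterior, target_class, cutoff=0.8):
    """Hard assignment: case if the posterior for target_class reaches the cut-off."""
    return bool(posterior[target_class] >= cutoff)
```

With class probabilities (0.7, 0.3) and conditional likelihoods (0.02, 0.72), the posterior probability of the second class is 0.216 / 0.23, roughly 0.94. The hard assignment would label this individual a case and discard the remaining uncertainty, whereas using the posterior directly retains it.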

Genotype by environment interaction

Two general paradigms exist for the analysis of G×E interaction, and both can benefit from mixture distribution approaches. One is the context of measured genes and measured environments, of which the article by Caspi et al (2002) is the canonical example. In this case, as discussed in the latent class analysis section above, a conditional likelihood approach can be helpful in handling some of the problems with behavioral and psychological measurement, which are often based on multiple binary or ordinal indicators (items or symptoms). Dr. Eaves’ latent class approach has some parallels in the continuous latent trait domain. For example, Schmitt et al (2006) describe a method to test for non-normality in the latent trait distribution of a test based on multiple items. Additional novel developments in this area were provided by Dylan Molenaar and colleagues (Molenaar, Dolan, & Verhelst, 2010). Effectively, MML permits exploration of the trait distribution in the population which is relatively free of artifacts due to measurement. Combination of this method with genetic and environmental measures has the potential to provide a less scale-dependent test for measured G × E, though this has yet to be developed.

A second genre of G × E interaction modeling is within the classical twin study, where the interaction between latent genetic and environmental factors is the focus. One simple approach, described by Jinks & Fulker (1970), is the regression of twin pair sums on their differences. Van der Sluis et al (2006) described an MML approach which has several advantages. First, it has greater statistical power when data are not transformed to normality (G × E interaction inevitably generates non-normality, so such transformation is a dubious procedure). Second, more nuanced analyses become possible, including joint analysis of multiple groups (e.g., DZ as well as MZ pairs) and tests of whether genetic factors that affect trait level are the same as those that influence sensitivity to environmental conditions. These and multivariate extensions were recently described (Molenaar, Sluis, Boomsma, & Dolan, 2012), again using the mixture distribution modeling features in Mx.

Yet further applications of MML are possible within the framework of the Purcell G × E interaction model. A particularly thorny issue exists when the putative environmental moderator is not purely environmental but also includes genetic factors, some of which are shared with the outcome phenotype of interest. One approach would be to integrate the likelihood over the latent environmental factor of a putative environmental moderator. That is, to put values of the latent environmental factor on the path from genotype to environment, and weight them in the (by now hopefully familiar) MML approach. Substituting environmental latent variables as moderators has the potential to eliminate some of the issues with using the observed value of the moderator – such as the potential for non-linear G × G interaction to masquerade as G × E interaction. Although MML seems viable in this context, validating the method would of course require extensive simulations across a range of conditions. As usual, Dr. Eaves is way ahead of the curve here, having provided an alternative approach via Bayesian estimation to tackle the same problem (Eaves & Erkanli, 2003). The latency between Dr. Eaves’ contributions and their widespread adoption by the field of behavior genetics is frequently longer than desired, partly because his thinking is often further ahead than that of his peers. Provision of open source, readily accessible, verifiable and modifiable software can go some way towards reducing this scientific lag (Morin et al., 2012).

Summary

Finally I must acknowledge the contribution of my mentor, David Fulker, who was not accorded a festschrift, though he certainly earned one, due to his untimely death. Lindon and David were good friends and colleagues and when I was at the Institute of Psychiatry as a graduate student, David encouraged me to contact Lindon to ask whether he had a copy of some IQ data that he’d previously analyzed. The phone call was brief, and negative: “No, sorry mate, the data were written down on a piece of bog-paper or something and lost years ago”. It did not take me very long to realize that while his response seemed to lack the class expected of an Oxford don, Lindon’s intellectual class belonged to a much higher and hitherto latent dimension.

In conclusion, it was my honor and pleasure to organize this festschrift for Lindon Eaves, my mentor and colleague of many years. The impressive array of his accomplishments briefly described in this paper, and the developments that they subsequently inspired, are only a small sample of his scientific output. Other articles in this issue go some way towards a (still incomplete) account of perhaps the greatest behavior geneticist of our generation.

Acknowledgments

The author is grateful for support by NIH grants DA-18673 and DA-26119. He would like to acknowledge the members of the Behavior Genetics Association Executive Committee for help in organizing the festschrift, and Dr. Timothy Bates for hosting the meeting in the Moray House, Edinburgh, Scotland.

Footnotes

¹ These probabilities are readily obtained with software such as Mplus or OpenMx.

References

Bauer DB, Curran PJ. Distributional assumptions of growth mixture models: Implications for overextraction of latent trajectory classes. Psychological Methods. 2003;8:338–363. doi: 10.1037/1082-989X.8.3.338. [DOI] [PubMed] [Google Scholar]
Benyamin B, Wilson V, Whalley LJ, Visscher PM, Deary IJ. Large, consistent estimates of the heritability of cognitive ability in two entire populations of 11-year-old twins from scottish mental surveys of 1932 and 1947. Behav Genet. 2005;35(5):525–534. doi: 10.1007/s10519-005-3556-x. [DOI] [PubMed] [Google Scholar]
Bock RD. Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika. 1972;37:29–51. [Google Scholar]
Boker S, Neale M, Maes H, Wilde M, Spiegel M, Brick T, Spies J, Estabrook R, Kenny S, Bates T, Mehta P, Fox J. Openmx: An open source extended structural equation modeling framework. Psychometrika. 2011;76(2):306–311. doi: 10.1007/s11336-010-9200-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Caspi A, McClay J, Moffitt TE, Mill J, Martin J, Craig IW, Taylor A, Poulton R. Role of genotype in the cycle of violence in maltreated children. Science. 2002;297(5582):851–854. doi: 10.1126/science.1072290. [DOI] [PubMed] [Google Scholar]
Curnow R, Smith C. Multifactorial models for familial diseases in man. J Roy Stat Soc, Series A. 1975;13:131–169. [Google Scholar]
Eaves L, Erkanli A. Markov chain monte carlo approaches to analysis of genetic and environmental components of human developmental change and g × e interaction. Behav Genet. 2003;33(3):279–299. doi: 10.1023/a:1023446524917. [DOI] [PubMed] [Google Scholar]
Eaves LJ. Errors of inference in the detection of major gene effects on psychological test scores. Am J Hum Genet. 1983;35(6):1179–1189. [PMC free article] [PubMed] [Google Scholar]
Eaves LJ. Genotype × environment interaction in psychopathology: Fact or artifact? Twin Res Hum Genet. 2006;9(1):1–8. doi: 10.1375/183242706776403073. [DOI] [PubMed] [Google Scholar]
Eaves LJ, Silberg JL, Hewitt JK, Rutter M, Meyer JM, Neale MC, Pickles A. Analyzing twin resemblance in multisymptom data: genetic applications of a latent class model for symptoms of conduct disorder in juvenile boys. Behav Genet. 1993;23(1):5–19. doi: 10.1007/BF01067550. [DOI] [PubMed] [Google Scholar]
Eaves LJ, Silberg JL, Meyer JM, Maes HH, Simonoff E, Pickles A, Rutter M, Neale MC, Reynolds CA, Erikson MT, Heath AC, Loeber R, Truett KR, Hewitt JK. Genetics and developmental psychopathology: 2. the main effects of genes and environment on behavioral problems in the virginia twin study of adolescent behavioral development. J Child Psychol Psychiatry. 1997;38(8):965–980. doi: 10.1111/j.1469-7610.1997.tb01614.x. [DOI] [PubMed] [Google Scholar]
Elston RC. Major locus analysis for quantitative traits. Am J Hum Genet. 1979;31(6):655–661.
Everitt BS. An introduction to finite mixture distributions. Stat Methods Med Res. 1996;5(2):107–127. doi: 10.1177/096228029600500202.
Falconer D. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann Hum Genet. 1965;29:51–76.
Fulker DW, Cherny SS. An improved multipoint sib-pair analysis of quantitative traits. Behav Genet. 1996;26:527–532. doi: 10.1007/BF02359758.
Go RC, Elston RC, Kaplan EB. Efficiency and robustness of pedigree segregation analysis. Am J Hum Genet. 1978;30(1):28–37.
Haseman JK, Elston RC. The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972;2:3–19. doi: 10.1007/BF01066731.
Hewitt JK, Silberg JL, Rutter M, Simonoff E, Meyer JM, Maes H, Pickles A, Neale MC, Loeber R, Erickson MT, Kendler KS, Heath AC, Truett KR, Reynolds CA, Eaves LJ. Genetics and developmental psychopathology: 1. Phenotypic assessment in the Virginia Twin Study of Adolescent Behavioral Development. J Child Psychol Psychiatry. 1997;38(8):943–963. doi: 10.1111/j.1469-7610.1997.tb01613.x.
Jinks JL, Fulker DW. Comparison of the biometrical genetical, MAVA, and classical approaches to the analysis of human behavior. Psychol Bull. 1970;73:311–349. doi: 10.1037/h0029135.
Kendler KS, Kessler RC, Heath AC, Neale MC, Eaves LJ. Coping: a genetic epidemiological investigation. Psychol Med. 1991;21(2):337–346. doi: 10.1017/s0033291700020444.
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet. 1996;58(6):1347–1363.
Lazarsfeld PF. The logical and mathematical foundations of latent structure analysis. In: Stouffer SA, et al., editors. The American soldier: Studies in social psychology in World War II. Vol. IV. Princeton University Press; 1950.
Lord FM, Novick MR, Birnbaum A. Statistical theories of mental test scores. Oxford, England: Addison-Wesley; 1968.
Lubke G, Neale M. Distinguishing between latent classes and continuous factors with categorical outcomes: Class invariance of parameters of factor mixture models. Multivariate Behav Res. 2008;43(4):592–620. doi: 10.1080/00273170802490673.
Lubke GH, Muthén B. Investigating population heterogeneity with factor mixture models. Psychol Methods. 2005a;10(1):21–39. doi: 10.1037/1082-989X.10.1.21.
Lubke GH, Muthén B. Investigating population heterogeneity with factor mixture models. Psychol Methods. 2005b;10(1):21–39. doi: 10.1037/1082-989X.10.1.21.
McLachlan GJ, Peel D. Finite mixture models. New York: John Wiley & Sons; 2000.
Medland SE, Neale MC. An integrated phenomic approach to multivariate allelic association. Eur J Hum Genet. 2010;18(2):233–239. doi: 10.1038/ejhg.2009.133.
Mehta PD, Neale MC, Flay BR. Squeezing interval change from ordinal panel data: latent growth curves with ordinal outcomes. Psychol Methods. 2004;9(3):301–333. doi: 10.1037/1082-989X.9.3.301.
Min H-K, Moxley G, Neale MC, Schwartz LB. Effect of sex and haplotype on plasma tryptase levels in healthy adults. J Allergy Clin Immunol. 2004;114(1):48–51. doi: 10.1016/j.jaci.2004.04.008.
Molenaar D, Dolan CV, Verhelst ND. Testing and modelling non-normality within the one-factor model. Br J Math Stat Psychol. 2010;63(Pt 2):293–317. doi: 10.1348/000711009X456935.
Molenaar D, Sluis S van der, Boomsma DI, Dolan CV. Detecting specific genotype by environment interactions using marginal maximum likelihood estimation in the classical twin design. Behav Genet. 2012;42(3):483–499. doi: 10.1007/s10519-011-9522-x.
Moore GE. Cramming more components onto integrated circuits. Electronics Magazine. 1965;38(8):4–7.
Morin A, Urban J, Adams PD, Foster I, Sali A, Baker D, Sliz P. Research priorities. Shining light into black boxes. Science. 2012;336(6078):159–160. doi: 10.1126/science.1218263.
Morton NE, MacLean CJ. Analysis of family resemblance. 3. Complex segregation of quantitative traits. Am J Hum Genet. 1974;26(4):489–503.
Muthén B, Shedden K. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics. 1999;55(2):463–469. doi: 10.1111/j.0006-341x.1999.00463.x.
Neale MC. Mx: Statistical modeling. 4th ed. Box 980126 MCV, Richmond VA 23298; 1997.
Neale MC. A finite mixture distribution model for data collected from twins. Twin Res. 2003;6(3):235–239. doi: 10.1375/136905203765693898.
Neale MC, Cardon LR. Methodology for genetic studies of twins and families. Kluwer Academic Press; 1992.
Nichols R, Bilbro W. The diagnosis of twin zygosity. Acta Genet. 1966;16:265–275. doi: 10.1159/000151973.
Nylund KL, Asparouhov T, Muthén BO. Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Struct Equ Modeling. 2007;14(4):535–569.
Pearson K. Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philosophical Transactions of the Royal Society of London A. 1894;186:343–414.
Rende RD, Plomin R, Vandenberg SG. Who discovered the twin method? Behav Genet. 1990;20(2):277–285. doi: 10.1007/BF01067795.
Schmitt JE, Mehta PD, Aggen SH, Kubarych TS, Neale MC. Semi-nonparametric methods for detecting latent non-normality: A fusion of latent trait and ordered latent class modeling. Multivariate Behav Res. 2006;41(4):427–443. doi: 10.1207/s15327906mbr4104_1.
Sham PC, Rijsdijk FV, Knight J, Makoff A, North B, Curtis D. Haplotype association analysis of discrete and continuous traits using mixture of regression models. Behav Genet. 2004;34(2):207–214. doi: 10.1023/B:BEGE.0000013734.39266.a3.
Sluis S van der, Dolan CV, Neale MC, Boomsma DI, Posthuma D. Detecting genotype-environment interaction in monozygotic twin data: comparing the Jinks and Fulker test and a new test based on marginal maximum likelihood estimation. Twin Res Hum Genet. 2006;9(3):377–392. doi: 10.1375/183242706777591218.
Steiger JH, Shapiro A, Browne MW. On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika. 1985;50:253–264.
Weiss V. Empirische Untersuchung zu einer Hypothese über den autosomal-rezessiven Erbgang der mathematisch-technischen Begabung. Biologisches Zentralblatt. 1972;91:429–535.
Wikipedia. Commodore PET. 2013. Online [accessed 18-Feb-2013].
Witkiewitz K, Maisto SA, Donovan DM. A comparison of methods for estimating change in drinking following alcohol treatment. Alcohol Clin Exp Res. 2010;34(12):2116–2125. doi: 10.1111/j.1530-0277.2010.01308.x.