Author manuscript; available in PMC: 2012 Jan 1.
Published in final edited form as: Stat Sci. 2011 Jan 1;26(1):116–129. doi: 10.1214/11-STS353

Statistical Analysis in Genetic Studies of Mental Illnesses

Heping Zhang 1
PMCID: PMC3169093  NIHMSID: NIHMS318224  PMID: 21909187

Abstract

Identifying the risk factors for mental illnesses is of significant public health importance. Diagnosis, stigma associated with mental illnesses, comorbidity, and complex etiologies, among others, make it very challenging to study mental disorders. Genetic studies of mental illnesses date back at least a century, beginning with descriptive studies based on Mendelian laws of inheritance. A variety of study designs including twin studies, family studies, linkage analysis, and more recently, genomewide association studies have been employed to study the genetics of mental illnesses, or complex diseases in general. In this paper, I will present the challenges and methods from a statistical perspective and focus on genetic association studies.

Keywords: Comorbidity, Covariate adjusted association test, FBAT, Kendall’s tau, Multiple traits, Ordinal traits

1 Introduction

Mental illnesses affect the health and well-being of all populations and all ages. Schizophrenia, a chronic, severe, and disabling brain disorder, is one of these mental illnesses, affecting about 1.1 percent of the U.S. population age 18 and older in a given year. People with schizophrenia sometimes hear voices others don’t hear, believe that others are broadcasting their thoughts to the world, or become convinced that others are plotting to harm them. These experiences can make them fearful and withdrawn and cause difficulties when they try to have relationships with others (http://www.nimh.nih.gov). Emil Kraepelin (1856-1926) described “Dementia Praecox” as an inherited disorder in his influential “Textbook of Psychiatry” (1899). The term Dementia Praecox, later renamed “schizophrenia,” was first used in 1891 by Arnold Pick (1851-1924), a professor of psychiatry at the German branch of Charles University in Prague, to describe a patient with a psychotic disorder resembling hebephrenia.

Nearly a century ago, Cannon and Rosanoff (1911) made an attempt to understand whether there are any forms of nervous and mental diseases that are transmitted from generation to generation in concordance with Mendelian laws. They examined the families of 11 neuropathetic patients, who are now referred to as probands in pedigrees. Using Mendelian laws as their theoretical expectation, they concluded that the neuropathetic make-up is recessive to normal. Although the report was indeed “preliminary,” a few things are noteworthy. First, they noted that “any form of insanity or even all the forms of hereditary insanity do not constitute an independent hereditary character.” This was an early sign of the complexity of studying mental disorders with respect to the characterization of the disorders and their comorbidity. Here, comorbidity refers to the presence of more than one disease condition in the same patient. Second, they remarked “should larger accumulations of such data in the future give similar results, we shall be able” to confirm their result. The requirement for more samples and replication is another challenge in studies of complex diseases. Last but not least, while they said “let us test,…, the hypothesis …” they did not mean a statistical test. However, the idea of the $\chi^2$-test is evident.

Despite this early work, it was not until the 1960s that researchers began to use scientifically rigorous designs and methods to study the inheritance of mental illnesses. For example, the key idea in adoption studies lies in the belief that any links between an adopted child and the biological parents are attributable to genetics, and any links between that child and the adoptive parents can be attributed to environment (Plomin et al., 1997). This enables us to separate the confounding environment (i.e., a family) from the genetic contribution. Consequently, there are two strategies in adoption studies. One approach compares the risk of developing schizophrenia in the adopted children of schizophrenic parents to the risk in adopted children whose parents do not have schizophrenia. Several studies including Heston (1966), Rosenthal (1972), and Tienari (1991) used this approach to study schizophrenia. Each study found an elevated risk in adopted-away children of schizophrenic parents, supporting the role of genetics in the transmission of schizophrenia. This approach starts from the schizophrenic parents. Another approach backtracks from adopted children who have developed schizophrenia and compares the risks of schizophrenia in their adoptive and biological families. Kety et al. (1978) and others found that the risk was significantly higher in the biological relatives than in the adoptive families, again underscoring the role of genetics as a risk factor.

While these schizophrenia adoption studies are influential in understanding the role of genetics in mental disorders, most of what is known about the genetic factors associated with mental disorders is based on family and twin studies. By comparing the concordance in the risk between identical (monozygotic) and fraternal (dizygotic) twins, twin studies arguably provide the most compelling results about genetic and environmental effects. For example, the concordance in monozygotic twins for Tourette’s syndrome, a complex disorder characterized by repetitive, sudden, and involuntary movements or noises called tics, was reported to be about 50%, whereas it is less than 10% in dizygotic twins.

Twin studies are most helpful in demonstrating the magnitude of genetic effect, but they do not provide insight into the inheritance pattern of a condition. Thus, family studies can offer information that twin studies cannot. For example, Cannon and Rosanoff (1911) employed a small-scale, simple family study. Using the Mendelian laws, we might not only find evidence of genetics, but also infer the mode of transmission, as Cannon and Rosanoff (1911) concluded for the heredity of insanity.

Although twin and family studies continue to be useful for understanding the genetics of complex diseases, different studies are needed to locate a specific gene on a chromosome that may underlie the disease. Gene mapping in humans through linkage analysis emerged in the 1930s, but it was Morton (1955) who laid the foundation for the methodology. It was only during the 1970s and 1980s, when the Elston-Stewart (1971) algorithm was developed and implemented (Ott 1974), that the method thrived as a common tool of genetic studies. These initial and subsequent developments allowed for linkage analyses of multiple markers simultaneously. In light of the sheer number of genes and the fact that we do not know which specific gene we are looking for, we typically genotype 300 to 400 “landmarks” that cover the 22 pairs of autosomes and the X chromosome. By inferring the transmission patterns of these markers, then linking them to the disease status, we can obtain information about the most probable region where the gene of interest resides.

While linkage studies have had some successes (e.g., BRCA1), they have generated far more premature excitement. In the late 1980s, two particular studies attracted significant public attention after they reported that bipolar affective disorders were linked to DNA markers on chromosome 11, and that a susceptibility locus for schizophrenia was located on chromosome 5. Unfortunately, these findings were not replicated. Replications in genetic studies of mental disorders do not come easily. For example, Abelson et al. (2005) identified mutations involving the SLITRK1 gene (13q31.1) in a small number of people with Tourette’s syndrome. However, most people with Tourette’s syndrome do not have a mutation in the SLITRK1 gene. Because the mutations were reported in so few people with this condition, the association of the SLITRK1 gene with this disorder could not be confirmed. In fact, Scharf et al. (2008) reported a lack of association between SLITRK1var321 and Tourette’s syndrome in a large family-based sample.

Various reasons have been suggested to explain the difficulties hindering progress in genetic studies using linkage analysis. A key concept underlying linkage analysis is the recombination fraction, which reflects the distance between any two loci, such as a DNA marker and the disease locus. There may be limited information in the data, however, diminishing the power of the linkage study. Furthermore, complex diseases are polygenic, involving multiple genes (Carter and Chung, 1980). Linkage analyses, however, are generally conducted under the assumption of one major gene. Additionally, heterogeneity in the diagnosis and comorbidity of mental illnesses make linkage analysis considerably more difficult, if possible at all.

Many investigators have adopted association analyses to take advantage of the advent of high-throughput genotyping technologies. Recent efforts have identified genes that contribute to a number of complex human traits using ultra-dense genetic markers (Arking et al., 2006; Klein et al., 2005; Duerr et al., 2006; Chen et al., 2007). Trios (one affected offspring and two parents) have been an effective design for association studies, particularly with the development of the elegant transmission/disequilibrium test (TDT) (Spielman, McGinnis and Ewens, 1993). The central idea of this test is that each affected child serves as his or her own matched case and control. This controls for potential confounding and compares the alleles that are transmitted from the parents with those that are not. In the absence of association between the affection status and the gene, the distributions of the transmitted and non-transmitted alleles are expected to be the same. Deviations in distribution as evaluated by a $\chi^2$-test indicate the existence of association. Trios are the simplest example of a nuclear family, but when other siblings are available, the trio design is not cost-effective. As a result, family-based association tests (FBAT) including sibships (Spielman and Ewens, 1998; Horvath and Laird, 1998; Knapp, 1999), nuclear families (Weinberg, 1999; Lunetta et al., 2000; Rabinowitz and Laird, 2000), and general pedigrees (Martin, Monks, Warren and Kaplan, 2000) have been developed.
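The comparison of transmitted and non-transmitted alleles can be sketched as follows. This is a minimal illustration of the McNemar-type TDT statistic for a biallelic marker, not the implementation of any particular software; the function name and the counts are hypothetical.

```python
def tdt_statistic(b: int, c: int) -> float:
    """McNemar-type TDT chi-square statistic (1 degree of freedom).

    b: number of heterozygous parents who transmitted the candidate allele
       to an affected child
    c: number of heterozygous parents who transmitted the other allele
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    # Under no association, b and c are expected to be equal.
    return (b - c) ** 2 / (b + c)


# Hypothetical counts, for illustration only.
stat = tdt_statistic(b=60, c=40)
print(stat)
```

Only heterozygous parents enter the statistic, because a homozygous parent transmits the same allele regardless of association.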

Another restriction in the use of trios is the requirement of defining the affection status of a disease. Consequently, association tests have been proposed for quantitative traits (Allison, 1997; Rabinowitz, 1997), traits with distribution belonging to an exponential family (Liu, Tritchler and Bull, 2002), ordinal traits (Zhang, Wang and Ye, 2006; Wang, Ye and Zhang, 2006), and multiple traits (Lange et al. 2003; Zhang et al. 2010).

Since the early success in identifying the complement factor H polymorphism in age-related macular degeneration (Klein et al., 2005), case-control association studies have intensified, and many genetic variants have been identified and cataloged (Hindorff et al. 2009). Despite the enormous investment, the intense attention to the genetics of diseases, the rapid improvement in technology, and the increasingly large sample sizes in many studies, it remains challenging to identify disease genes, especially those underlying mental illnesses. Some of the common genetic variants that have been identified for complex diseases account for only a small portion of the genetic risk, which may vary across populations (Goldstein 2009). For example, Kopp et al. (2008) and Kao et al. (2008) identified several variations in the MYH9 gene as major contributors to excess risk of kidney disease among African-Americans. They found that 60 percent of African-Americans carry the risk variants as opposed to 4 percent of white Americans.

Technology will continue to improve and the amount of genetic data will increase. The purpose of this article is to review some of the progress from a statistical perspective and discuss some of the potential challenges. Obviously, it would take volumes or series to do justice to all of the work in statistical genetics. Instead of taking on that impossible task, this article is oriented toward the publications directly related to my own recent work.

2 Methods

Since 1952, the American Psychiatric Association has published four editions of the Diagnostic and Statistical Manual of Mental Disorders (DSM) and plans to release its fifth edition in 2013. While the DSM is widely used, its use and development have not been without controversy and criticism. Unlike diseases such as cancer, for which the diagnoses are well accepted by physicians and patients, the diagnosis of mental disorders must reflect biological factors (e.g., gender and racial disparities), non-biological factors such as culture that are not specific to one person, and the natural variation within the same person.

2.1 Ordinal Traits

It is clear from the above discussion that a simple dichotomous diagnosis (e.g., yes or no) or a well-distributed continuous trait is unlikely to characterize the state of mental disorders. In fact, the questions used in the diagnosis of mental disorders, such as those in DSM-IV, are usually posed in terms of severity or frequency, and hence on an ordinal scale.

Statistical methods for genetic analysis are well established for both quantitative (continuous) and binary traits (see e.g., Blackwelder and Elston 1985; Goldgar 1990; Schork 1993; Amos 1994; Risch and Zhang 1995; Kruglyak et al. 1996; Blangero and Almasy 1997; Ott 1999). While there has been some progress in the analysis of ordinal traits (e.g., Heath et al. 2002; Steinke et al. 2003; Vergne et al. 2003; Zhang et al. 2003; Feng et al. 2004; Zhang et al. 2010), especially in plant science (Rao and Xu 1998; Xu and Xu 2006), insufficient attention has been paid to the unique challenges of analyzing ordinal traits. Some researchers have recognized that it is difficult to conduct genetic analyses of ordinal traits because such traits cannot be directly characterized by a linear function of genetic and environmental effects (Rao and Xu 1998). To fill this methodological gap, we have made a systematic effort to develop statistical methods for segregation analysis (Zhang et al. 2003), linkage analysis (Feng et al. 2004), and association analysis (Zhang et al. 2006) of ordinal traits (for family studies and case-control studies).

2.1.1 Analysis of Family Data

Long before the era of genomics, researchers collected data in families, also called pedigrees as illustrated in Figure 1. Although the ascertainment process for families varies, Figure 1 depicts a representative three-generation pedigree. The proband is the first person who enters into the study according to defined inclusion and exclusion criteria: such criteria are related to the disease of interest. Other members of the proband’s family are included and directly or indirectly assessed, depending on the circumstance. The key idea in analyzing family data is that if a gene is a major driving force behind a disease, a trace in the concordance of diseases in family members would reflect the transmission pattern of a gene under the Mendelian laws. This is the fundamental concept that Cannon and Rosanoff (1911) employed. This type of analysis is referred to as segregation analysis.

Figure 1. A three-generation pedigree.

The Elston-Stewart (1971) algorithm set up the quintessential framework to analyze data from general pedigrees through a technique called peeling. The main complication in analyzing pedigree data is the complex relationship among family members, which makes it difficult to express the likelihood function in an easily computable form. The peeling algorithm makes use of the conditional independence embedded in the pedigree resulting from the Mendelian laws, and so peels the complete likelihood function into smaller pieces before putting them back together.

Other methods have been developed relatively recently using the concept of latent random variables (Hopper, 1989; Babiker and Cuzick, 1994; Li and Thompson, 1997; Siegmund and McKnight 1998; Zhang and Merikangas, 2000), which are closely related to the classic ousiotype models of Cannings et al. (1978) in pedigree analysis. The basic idea is to use latent variables to represent the contribution of unobserved factors including a major gene, residual genetic factors, and common environmental factors. As discussed by Zhang and Merikangas (2000), the computation involving pedigrees is similar to the peeling algorithm. An advantage of latent-variable-based models is that the interactions between underlying genetic effects and the observed covariates (e.g., demographic variables) can be considered. Additionally, and more relevant to this article, we can accommodate ordinal traits in the latent-variable framework.

2.1.2 A Latent Variable Model

We follow the notation of Zhang and Merikangas (2000) and Zhang et al. (2003). Consider a trait, Y, that takes an ordinal value of 0, 1, … , K. Let x be a p-vector of covariates that is also available for each study subject. Three types of latent random variables $U_1^i$, $U_2^i$, and $U_3^i$ are introduced within family i to represent, respectively, (a) common, unmeasured environmental factors; (b) genetic susceptibility of the family founders (a founder refers to a subject whose parents are not a part of the observed pedigree, for example, father, mother, and spouse in Figure 1); and (c) the transmission of susceptibility genes from a parent to an offspring.

The concept of latent variables is straightforward, but the interesting and difficult part lies in the specification of their distributions. They need to be interpretable and convenient. The following are the assumptions that we found useful.

  • $U_1^i$ follows a Bernoulli distribution, $P\{U_1^i = 1\} = 1 - P\{U_1^i = 0\} = \theta_1$, where $\theta_1$ is an unknown parameter.

  • $U_2^i = (U^i_{2,1}, U^i_{2,2}, \ldots, U^i_{2,2n_i-1}, U^i_{2,2n_i})$, where $n_i$ is the size of pedigree $i$. Here, $P\{U^i_{2,2j-1} = 1\} = 1 - P\{U^i_{2,2j} = 0\} = \theta_2$ when $U^i_{2,2j-1}$ and $U^i_{2,2j}$ are the $U_2^i$-variables of a founder.

  • $U_3^i = (U^i_{3,1}, \ldots, U^i_{3,s_i})$. According to the Mendelian laws, $P\{U^i_{3,j} = 1\} = P\{U^i_{3,j} = 0\} = \tfrac{1}{2}$, $j = 1, \ldots, s_i$, where $s_i$ is the number of parent-offspring pairs in family $i$. $U_3^i$ facilitates the transmission of $U_2^i$-variables from the founders to the offspring. For example, if a parent of subject $j$ has $U_2^i$-variables $U^i_{2,2k-1}$ and $U^i_{2,2k}$, and the $U_3^i$-variable for this parent-offspring pair is $U^i_{3,l}$, then one of subject $j$’s $U_2^i$-variables is $U^i_{2,2j-1} = U^i_{2,2k-1} U^i_{3,l} + U^i_{2,2k}(1 - U^i_{3,l})$.

  • All latent variables are independent.

$U_1^i$ is a simple “switch” indicating the presence or absence of a shared environmental factor within family i. $U_2^i$ is assigned independently to each of the founders, who are the source for any gene to enter a family, and thus mimics the transmission of a single major susceptibility locus with alleles A and a of frequencies θ2 and 1 − θ2, respectively.
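The founder and transmission assumptions listed above can be sketched for a single trio. This is only an illustration: the function names, the value of θ2, and the random seed are hypothetical and are not taken from the cited papers.

```python
import random


def sample_founder(theta2, rng):
    """Draw the two allele indicators of a founder, each Bernoulli(theta2)."""
    return (int(rng.random() < theta2), int(rng.random() < theta2))


def transmit(parent_alleles, rng):
    """Transmit one parental allele using a Bernoulli(1/2) switch U3."""
    u3 = int(rng.random() < 0.5)
    first, second = parent_alleles
    # Same form as the transmission rule: first * U3 + second * (1 - U3).
    return first * u3 + second * (1 - u3)


rng = random.Random(0)
theta2 = 0.2  # hypothetical allele frequency
father = sample_founder(theta2, rng)
mother = sample_founder(theta2, rng)
# The child receives one allele indicator from each parent.
child = (transmit(father, rng), transmit(mother, rng))
print(father, mother, child)
```

Each parent-offspring pair uses its own switch, so a trio involves two independent Bernoulli(1/2) draws.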

Conditional on all of the latent variables, denoted by $U^i$, within family i, the probability distribution for member j is assumed to be

$$P\{Y_j^i \le k \mid U^i\} = \frac{\exp(x_j^i \beta + \alpha_k + a_j^i \gamma)}{1 + \exp(x_j^i \beta + \alpha_k + a_j^i \gamma)}, \quad k = 0, \ldots, K-1, \qquad (2.1)$$

where $a_j^i = (U_1^i,\; U^i_{2,2j-1} + U^i_{2,2j},\; U^i_{2,2j-1} U^i_{2,2j})^T$, and β and γ are p- and 3-vectors of parameters. The $\alpha_k$ is the trait-level-dependent intercept, k = 0, … , K − 1.

As Zhang et al. (2003) pointed out, the β parameters measure the strength of association between the trait and the covariates, conditional on the latent variables. The γ parameters indicate the familial and genetic contributions to the trait. The mode of inheritance can be inferred from γ. For example, γ2 = 0 and γ3 ≠ 0 suggest a recessive effect, because the genetic term then contributes only when both allele indicators equal 1.
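To make the role of γ concrete, the following sketch computes category probabilities from model (2.1), assuming that (2.1) specifies the cumulative probability P{Y ≤ k | U} and that the level parameters are non-decreasing. All parameter values and the function name `category_probs` are hypothetical.

```python
import math


def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(x, beta, alphas, u1, allele1, allele2, gamma):
    """Return [P(Y = 0 | U), ..., P(Y = K | U)] from the cumulative logits."""
    # Latent design vector: shared environment, allele count, allele product.
    a = (u1, allele1 + allele2, allele1 * allele2)
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    eta += sum(ai * gi for ai, gi in zip(a, gamma))
    # Cumulative probabilities P(Y <= k | U), k = 0..K-1, then P(Y <= K) = 1.
    cum = [expit(eta + alpha_k) for alpha_k in alphas] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# Hypothetical values: gamma2 = 0 and gamma3 != 0, i.e. a recessive pattern.
recessive_gamma = (0.0, 0.0, 1.5)
for alleles in [(0, 0), (1, 0), (1, 1)]:
    probs = category_probs([0.5], [1.0], [-1.0, 1.0], 0, *alleles, recessive_gamma)
    print(alleles, [round(p, 3) for p in probs])
```

With this choice of γ, subjects carrying zero or one copy share the same trait distribution, and only those carrying two copies differ.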

The likelihood function can be derived from (2.1). Due to the presence of latent variables, the EM algorithm (Dempster, Laird, and Rubin, 1977) is the most convenient choice for parameter estimation (Guo and Thompson 1992, Zhang and Merikangas 2000, Zhang et al. 2003). Although Zhang and Merikangas (2000) and Zhang et al. (2003) presented an effective solution (e.g., a modified likelihood), we should note that the lack of concavity in the likelihood function makes it a challenging task to find the maximum likelihood estimates of the model parameters. In addition, the θ’s and γ’s are not fully identifiable. The identifiability issue not only causes computational problems, but also presents theoretical challenges in statistical inference. Another important, yet under-studied, issue is the validation of the assumptions on the distributions of the latent variables.

But how useful is the latent variable model (2.1)? First, it provides a regression framework to assess familial aggregation and genetic contribution, and possibly interactions between measured covariates and latent factors. Using data from a family study of substance use (Merikangas et al., 1998), Zhang and Merikangas (2000) were able to present extremely significant evidence of familial aggregation (p-value < $10^{-9}$) for alcohol dependence. This study additionally demonstrated that transmission does not follow a major locus pattern. In retrospect, their findings predicted the difficulty of identifying major genes associated with alcoholism. In addition, Zhang and Merikangas (2000) presented simulation examples to delineate when the absence of latent variables in (2.1) affects the estimates of the effects of the measured covariates. For example, hypothetically, if the greater presence of females in a family has an impact on the well-being of the family, ignoring the familial latent variables is likely to result in a biased estimate of the sex difference.

Not only is it important to include the latent factors, but it is also important to adjust for covariates. To further illustrate this point, Zhang et al. (2003) reported the following simulation. Ten thousand data sets were generated from model (2.1) with θ1 = 0.3, β chosen from 0, 1, 5, or 10, γ1 from 0, 1, or 2, α0 = −1 and α1 = 1. To focus on the difference of having or not having covariates, they set γ2 = γ3 = 0. Each data set consists of 200 families with 7 family members (similar to Figure 1). One covariate x was generated as follows. For family i, $U_1^i$ was generated according to whether a random number $r_{i1}$ from the uniform(0,1) distribution was greater than 0.3 or not. For member j in family i, an independent random number $r_{ij2}$ from the uniform(0,1) distribution was generated. Then, $x_{ij} = 0.9 r_{ij2} + 0.2 r_{i1}$.
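One data set from this design could be generated along the following lines. This is a sketch rather than the code used by Zhang et al. (2003): it assumes that (2.1) gives the cumulative probability P{Y ≤ k | U}, and that the family-level latent variable equals 1 when the family's uniform draw is at most 0.3 (the text does not state the direction). The function name and seed are hypothetical.

```python
import math
import random


def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simulate_dataset(beta, gamma1, n_families=200, n_members=7,
                     alphas=(-1.0, 1.0), theta1=0.3, seed=0):
    """Return a list of (family, member, u1, x, y) records."""
    rng = random.Random(seed)
    records = []
    for i in range(n_families):
        r1 = rng.random()
        u1 = int(r1 <= theta1)  # assumed direction of the threshold
        for j in range(n_members):
            r2 = rng.random()
            x = 0.9 * r2 + 0.2 * r1  # covariate shares r1 with the latent factor
            eta = x * beta + u1 * gamma1  # gamma2 = gamma3 = 0 in this design
            cum = [expit(eta + a) for a in alphas]  # P(Y <= 0), P(Y <= 1)
            v = rng.random()
            y = sum(v > c for c in cum)  # number of cumulative cut-points exceeded
            records.append((i, j, u1, x, y))
    return records


data = simulate_dataset(beta=5.0, gamma1=0.0)
print(len(data), data[0])
```

Because the covariate and the latent factor both depend on the same family-level draw, they are correlated, which is why omitting the covariate can distort a test of γ1.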

To evaluate the performance of the test statistic, the covariate was deliberately ignored in the test. When β = 0, the covariate played no role in the data generating process. The row corresponding to β = 0 in Table 1 displays the empirical type I error rate (the column corresponding to γ1 = 0) and the power for two values of γ1 (1 or 2).

Table 1. The probability estimates of rejecting γ1 = 0 at the significance level of 0.05. The covariate is omitted from the testing despite the fact that its coefficient β may not be zero.

            γ1 = 0    γ1 = 1    γ1 = 2
β = 0       .0494     .9503     1.0
β = 1       .0534     .9843     1.0
β = 5       .1667     .9971     1.0
β = 10      .3828     .9890     1.0

When β ≠ 0, the covariate plays a role in the data generating model. The results in Table 1 reveal the consequence of ignoring the covariate: the rejection rate under γ1 = 0 rises above the nominal 0.05 level, and the inflation is more severe when the effect of the covariate is greater.

2.1.3 Linkage Analysis

While linkage analysis has a long history, it only became a common practice after the availability of several convenient computing programs (Ott 1974; Kruglyak et al. 1996; Almasy and Blangero 1998). For statisticians, some of the common terminologies in linkage analysis are puzzling, including the so-called LOD-score method and nonparametric method.

Morton (1955) first introduced the term “LOD-score.” LOD stands for “the logarithm (base 10) of odds.” The “odds” is a probability ratio, or likelihood ratio, of the probability under an alternative hypothesis to the probability under the null hypothesis. The LOD-score method is essentially a log-likelihood ratio test with two fundamental differences: (a) the use of the base 10 logarithm versus the natural logarithm; (b) the log-likelihood ratio statistic has a multiplier of 2 conforming to a $\chi^2$ distribution under certain regularity conditions.

Specifically, the LOD-score is the log (base 10) ratio of the likelihood when the recombination fraction is less than 1/2 (i.e., the two loci are linked) to the likelihood when the recombination fraction is 1/2 (no linkage, as when the two loci are not on the same chromosome). The recombination fraction is the frequency with which a chromosomal crossover occurs between two loci (or genes) during meiosis; a recombination frequency of 1% is termed a distance of one centimorgan (cM) in a genetic linkage map. Because the LOD-score is in base 10, a score of 3 indicates 1000 to 1 odds in favor of linkage, which is the conventional threshold for declaring evidence for linkage. If we convert a LOD-score of 3 into the standard log-likelihood ratio statistic, it yields a p-value of $2 \times 10^{-4}$ under $\chi^2_1$. By Bonferroni correction, this corresponds to a genomewide p-value of 0.05 for 250 markers. This number is in the range for the number of microsatellites used in typical linkage studies.
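The conversion from a LOD-score to a p-value can be checked numerically. The sketch below multiplies the LOD-score by 2 ln(10) to obtain the log-likelihood ratio statistic and then uses the identity that, for one degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x / 2)). The function name is hypothetical.

```python
import math


def lod_to_pvalue(lod: float) -> float:
    """p-value of the statistic 2 * ln(10) * LOD under chi-square with 1 df."""
    lrt = 2.0 * math.log(10.0) * lod
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lrt / 2.0))


p = lod_to_pvalue(3.0)
bonferroni_threshold = 0.05 / 250  # genomewide 0.05 spread over 250 markers
print(p, bonferroni_threshold)
```

The two printed numbers are both close to 0.0002, which is the correspondence described in the text.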

To compute the LOD-score, we first need a number of parameters that determine the likelihood for a given recombination fraction. We then maximize the likelihood over the recombination fraction to obtain the likelihood under the alternative hypothesis. The required parameters include the mode of inheritance, penetrance, and disease allele frequency. These parameters are generally unknown and difficult to estimate for complex diseases including mental illnesses. For example, Pauls and Leckman (1986) examined specific genetic hypotheses about the mode of transmission of Gilles de la Tourette’s syndrome by performing segregation analyses (see Section 2.1.1) in 30 nuclear families (two-generation pedigrees). They concluded that Tourette’s syndrome is inherited as an autosomal dominant trait (one copy of the abnormal allele is sufficient to cause the disease). The penetrance (the probability of having the disease for a given genotype) was reported at 0.71 in males and 1.0 in females with at least one abnormal allele. After several decades of research, no major genetic variant has been identified for Tourette’s syndrome, and most likely this syndrome involves multiple genes interacting with environmental factors. This reality makes it difficult to infer the mode of inheritance, penetrance, and disease allele frequency, and conceptually, these parameters may not make sense for complex diseases (non-Mendelian inheritance).

This difficulty is somewhat alleviated because the LOD-score method has been found to work reasonably well under various parameter settings (e.g., Abreu et al. 1999). There have been some efforts to improve the robustness of the method (Gastwirth 1966, 1985; Whittemore 1996); see Zheng et al. (2009) for a thorough review. Existing methods do not extend to the case of ordinal traits, and the effectiveness of the robust methods remains to be studied. Naturally, nonparametric linkage methods have been developed to avoid specification of the genetic model parameters. In statistics, “nonparametric” methods typically refer to distribution-free methods such as rank-based tests and methods based on the empirical distribution. In linkage analysis, however, “nonparametric” does not mean “distribution free,” but instead refers to the replacement of the true genetic model parameters with the parameters of inheritance of markers that are hypothesized to be close to the disease locus. Thus, with nonparametric linkage methods, we still need to compute the likelihood. Two core algorithms are used to compute the likelihood: the Elston-Stewart (1971) algorithm and the Lander-Green (1987) algorithm. As previously discussed, the Elston-Stewart (1971) algorithm is a peeling algorithm that makes the computation in a large pedigree feasible by splitting it into small pieces. This algorithm was implemented in early versions of linkage analysis programs (e.g., LIPED and LINKAGE); its computational time increases linearly with family size, but exponentially with the number of loci. More recent programs (e.g., GENEHUNTER) use the Lander-Green (1987) algorithm, whose computational time is linear in the number of loci but, unfortunately, exponential in the family size. Although Markov chain Monte Carlo methods have been used to accommodate linkage analysis of large families and a large number of markers (Guo and Thompson 1992), in practice, one may have to break large pedigrees apart in order to run programs such as GENEHUNTER.

We should note that there had not been a linkage analysis program to handle ordinal traits until the release of LOT (Zhang et al. 2008). Typically, the methods for linkage analysis can be divided into two main steps; only the second step involves the trait (Kruglyak et al. 1996). The first step infers how genetic information travels in a family as represented by the so-called “inheritance vector.”

We will use the pedigree in Figure 1 to illustrate this concept. The two parents and spouse are the founders of the family, meaning that their parents are not in the current pedigree. The four siblings and the child are nonfounders. The inheritance pattern at marker locus t is completely described by an inheritance vector υ(t) = (υ1, υ2, υ3, υ4, … , υ9, υ10)′. In other words, we devote two elements to every nonfounder. The founders are not included because they are the sources of the genes in the family and the inheritance vector is conditional on their genes. The paired elements describe the outcomes of the paternal and maternal meioses transmitted to the nonfounders. Specifically, υ2j−1 = 1 or 2 according to whether the grandpaternal or grandmaternal allele is transmitted in the paternal meiosis to the jth nonfounder. υ2j carries similar information for the corresponding maternal meiosis, namely, υ2j = 3 or 4 according to whether the grandpaternal or grandmaternal allele is transmitted in the maternal meiosis to the jth nonfounder.

In practice, the genetic markers do not always allow us to determine the true inheritance vector. In this case, the inheritance distribution is the conditional probability distribution over the possible inheritance vectors that conform with the alleles observed at t, which we denote by p{υ(t) = w} for all inheritance vectors w ∈ V; here V is the set of all possible inheritance vectors. In the absence of any genotypic information, all inheritance vectors are equally likely according to Mendel’s first law; that is, the probability distribution is uniform.
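For the pedigree with five nonfounders, the set V and the uniform prior can be enumerated directly. This sketch uses the coding described above (1 or 2 for the paternal meiosis, 3 or 4 for the maternal meiosis); the function name is hypothetical.

```python
from itertools import product


def inheritance_vectors(n_nonfounders: int):
    """Enumerate all inheritance vectors for the given number of nonfounders."""
    # Each nonfounder contributes a paternal slot (1 or 2) and a maternal slot (3 or 4).
    slots = [(1, 2), (3, 4)] * n_nonfounders
    return list(product(*slots))


V = inheritance_vectors(5)  # four siblings plus one child
prior = 1.0 / len(V)        # uniform when no genotypes are observed
print(len(V), prior)
```

The size of V doubles with every additional meiosis, which is the source of the exponential cost in family size for algorithms that sum over inheritance vectors.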

For segregation analysis, we employed latent variables to reflect the “imaginative” genetic effects in (2.1). In linkage analysis, we have genetic markers that flow through the inheritance vector. Thus, we can still use (2.1) for linkage analysis except that $a_j^i$ should be $(U_1^i,\; U^i_{2,\upsilon_{2j-1}} + U^i_{2,\upsilon_{2j}})$. On one hand, we have a reduced number of latent variables. On the other hand, many of the latent variables depend on each other through the inheritance vectors. The likelihood is computed by summing over all inheritance vectors w in V, in addition to the probability space of the remaining independent latent variables. Because of this connection and distinction, the challenges in the linkage analysis of ordinal traits are, to a great extent, similar to those in segregation analysis of ordinal traits. Examples include the asymptotic mixture of $\chi^2$ distributions and the need to introduce the penalized likelihood (Liang and Rathouz 1999; Zhang et al. 2003).

2.1.4 Association Test

As discussed above, linkage analysis focuses on testing the position of a marker, although it has been difficult to replicate findings in linkage studies of mental disorders. An association analysis, however, tests whether a genetic variant, such as a particular allele or genotype of a marker or a haplotype across several markers, is associated with a trait. Some study cohorts recruited for linkage studies have been re-genotyped for genomewide association analyses. For binary or quantitative traits, many methods have been developed and implemented. Two commonly used programs are PLINK (Purcell et al. 2007) and FBAT (Rabinowitz and Laird 2000). To analyze an ordinal trait, Zhang et al. (2006) introduced the following proportional odds model,

$$\operatorname{logit}\{P(y_{ij} \le k \mid G_{ij})\} = \alpha_k + \beta c_{ij}, \tag{2.2}$$

where $\alpha_0, \ldots, \alpha_{K-1}$ are non-descending level parameters and $\beta$ is the genetic effect. The genetic factor $c_{ij}$ can be chosen to reflect the underlying mode of inheritance, such as the number of copies of the risk allele. Under model (2.2), the null hypothesis is $H_0: \beta = 0$. The score statistic is

$$S = \sum_{i,j} \left[R^{+}(y_{ij}) - R^{-}(y_{ij})\right] A_{ij}, \tag{2.3}$$

where $R^{+}(y_{ij})$ and $R^{-}(y_{ij})$ are the counts of offspring in the entire sample whose trait values are greater or less than $y_{ij}$, respectively, and $A_{ij}$ is the number of copies of transmitted alleles at the marker locus. Thus, Zhang et al. (2006) proposed the following O-TDT test based on the score statistic

$$\frac{[S - E(S \mid Y)]^2}{\operatorname{Var}(S \mid Y)},$$

which follows a $\chi^2_1$-distribution asymptotically. For a case-control study, $R^{+}(y_{ij})$ and $R^{-}(y_{ij})$ are the numbers of subjects whose trait values are greater or less than $y_{ij}$, respectively.

If we rewrite the statistic in (2.3) in a general form as $\sum_{i,j} w_{ij} A_{ij}$, this yields the classic TDT when $w_{ij} = 1$ and the QTDT (Rabinowitz 1997) when $w_{ij} = y_{ij} - \bar{y}$, where $\bar{y}$ is the average of all $y_{ij}$’s. In other words, all of these tests are a weighted function of the number of transmitted alleles at the marker locus, and the choice of the weights depends on the property of the trait. With this observation, after the proper weights are computed, the existing FBAT software can be used to test the association between any trait and alleles at a marker locus.
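The unified view of the three tests can be sketched in a few lines of Python. This is only an illustration of how the weights differ; the function names are hypothetical, the trait values are pooled in one flat list, and the sketch stops at the weighted sum, leaving out the centering and variance needed for an actual test.

```python
def tdt_weights(y):
    """Classic TDT: every offspring receives weight 1."""
    return [1.0 for _ in y]


def qtdt_weights(y):
    """QTDT: trait value minus the sample mean."""
    mean = sum(y) / len(y)
    return [v - mean for v in y]


def otdt_weights(y):
    """O-TDT: R+(y) - R-(y), counts of trait values above minus below y."""
    return [sum(1 for u in y if u > v) - sum(1 for u in y if u < v) for v in y]


def weighted_statistic(weights, transmitted):
    """General form: sum of w_ij * A_ij over all offspring."""
    return sum(w * a for w, a in zip(weights, transmitted))
```

For an ordinal trait coded `[0, 1, 2, 2]`, `otdt_weights` gives `[3, 1, -2, -2]`; these weights could then be supplied to software that accepts user-defined trait weights.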

In the following, we describe a unified method to choose weights for any kind of trait. It is straightforward to categorize a quantitative trait into any reasonable number of categories (such as deciles) and induce an ordinal scaled trait. This would allow the use of the O-TDT for a quantitative trait. In their simulation studies, Zhang et al. (2006) demonstrated that this strategy has comparable power to the QTDT for quantitative traits. This is due to the fact that the number of categories is enough to capture most of the information in the data (e.g. following Cochran’s rule, Cochran 1977). The advantage is that the ordinal scaled test is not affected by the non-normal distribution of a quantitative trait, and so, the unified approach is robust.
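The categorization step can be illustrated with a short Python sketch. It assumes that categories are formed from empirical ranks (deciles by default) and that tied values should share a category; the function name is hypothetical and this is not the code used by Zhang et al. (2006).

```python
from bisect import bisect_left


def to_ordinal(values, n_categories=10):
    """Map a quantitative trait to ordinal categories 0..n_categories-1.

    Each value is placed according to the fraction of the sample that lies
    strictly below it, so tied values always fall into the same category.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_left(ordered, v) * n_categories // n for v in values]
```

Because only ranks enter the categorization, a heavy-tailed or skewed trait yields the same ordinal scores as any monotone transformation of it, which is the source of the robustness noted above.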

One limitation of the test proposed by Zhang et al. (2006) is that it does not adjust for covariates. Environmental factors or covariates, such as gender and age, may confound the association of interest. In a subsequent work, Wang et al. (2006) generalized model (2.2) to include covariates as follows

$$\operatorname{logit}\{P(y_{ij} \le k \mid G_{ij}, z_{ij})\} = \alpha_k + \beta c_{ij} + \delta' z_{ij}, \tag{2.4}$$

where zij denotes the covariates and δ is the vector of the corresponding coefficients. Consequently, the score statistic becomes

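$$S = \sum_{i,j} \left[1 - \hat{\gamma}(y_{ij}, z_{ij}) - \hat{\gamma}(y_{ij} - 1, z_{ij})\right] A_{ij}, \tag{2.5}$$

where

$$\hat{\gamma}(k, z) = \frac{\exp(\hat{\alpha}_k + \hat{\delta}' z)}{1 + \exp(\hat{\alpha}_k + \hat{\delta}' z)},$$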

which is the estimated probability of having a trait value no greater than k. Thus, the weight function in (2.5) is the difference between the probability of having a trait value greater than yij and the probability of having a trait value less than yij. Not surprisingly, this is in essence the same as the weight function in (2.3) where we used counts instead of frequency (or probability).
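The covariate-adjusted weight described in words above can be sketched in Python as follows. The sketch assumes that the trait is coded 0, 1, ..., K, that the fitted level parameters and covariate coefficients are already available from a null fit of model (2.4), and that the cumulative probability is 0 below the lowest category and 1 at the highest; the function names are hypothetical.

```python
import math


def gamma_hat(k, z, alpha, delta):
    """Estimated P(trait <= k | z) under (2.4) with beta = 0.

    alpha holds the K fitted level parameters alpha_0..alpha_{K-1};
    delta holds the fitted covariate coefficients.
    """
    if k < 0:
        return 0.0  # no category lies below 0
    if k >= len(alpha):
        return 1.0  # the top category K exhausts the distribution
    eta = alpha[k] + sum(d * v for d, v in zip(delta, z))
    return 1.0 / (1.0 + math.exp(-eta))


def covariate_adjusted_weight(y, z, alpha, delta):
    """P(trait > y | z) minus P(trait < y | z)."""
    p_greater = 1.0 - gamma_hat(y, z, alpha, delta)
    p_less = gamma_hat(y - 1, z, alpha, delta)
    return p_greater - p_less
```

As with the count-based weights, subjects in the lowest category receive positive weights and subjects in the highest category receive negative weights, but the magnitudes now depend on each subject's covariates.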

It is important to note that association analysis does not directly equate to a causal relationship. In well-designed genetic association studies, an observed association is expected to result from either a causal functional variant of a gene, or the linkage disequilibrium between the marker and a susceptibility gene. In population based case-control studies, there are typically attempts to match cases and controls by important demographic and/or baseline information. It is not wise to over-match subjects. Alternatively, we can collect potentially important environmental variables and consider them in the association analysis. We can also use principal component analysis on the genotypes to explore whether there are “clusters” in the study cohorts that are not appropriately reflected in the environmental variables. In family based studies, the association tends to be conditional on parental genotypes and all phenotypes.
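The principal component check mentioned above can be sketched without any numerical library by using power iteration to extract the first component. This is a toy illustration under the assumption that genotypes are coded as allele counts in a subjects-by-markers list of lists; the function name is hypothetical, and dedicated software would be used for genomewide data.

```python
import math


def first_pc_scores(genotypes, n_iter=100):
    """Scores of each subject on the first principal component."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary deterministic start vector
    for _ in range(n_iter):
        # One power-iteration step: v <- X'Xv, then normalize.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v_new = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(w * w for w in v_new))
        if norm == 0.0:
            break  # no variation along the current direction
        v = [w / norm for w in v_new]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]
```

Subjects whose scores separate into distinct groups may indicate population structure that the recorded environmental variables do not capture.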

2.1.5 Unique Challenges in Analyzing Ordinal Traits

Understanding the genetic mechanisms for complex diseases is challenging regardless of whether we analyze binary, ordinal, or continuous traits. Any challenges that exist for analyzing binary and continuous traits remain for ordinal traits. What are the unique challenges in analyzing ordinal traits? The key difference is that there is not a simple distribution function for ordinal traits. For continuous traits, the assumption is that the traits can somehow be treated under normality, by transformation if needed. For binary traits, through a link function (e.g., logit) we only need to deal with a Bernoulli distribution. However, for ordinal traits, the two typical approaches are (a) to assume a liability variable or a continuous latent variable or (b) to assume a proportional odds model as we presented above. The first challenge is in the estimation. The likelihood function is complicated, and numerical results show that it has multiple local maxima. In addition, due to non-identifiability (or near non-identifiability), the likelihood function may be relatively flat. Combinations of the EM and other algorithms can provide practical solutions, but finding a more efficient algorithm is an open problem.

The second challenge is in the inference. When latent variables or mixture distributions are used, some of the commonly assumed regularity conditions do not hold. One solution is to use a penalized likelihood function (Zhang et al. 2003) that prevents the parameters from being near the singularity points.

Finally, model diagnostics are difficult. For example, how do we know the latent variable-based model or the proportional odds model provides an adequate fit to the data? Although the models and methods presented above do not address this and other questions, they provide a foundation for further research and improvement.

2.2 Comorbidity

The methods described above only deal with a single trait. However, comorbidity is the rule rather than the exception in studies of mental and behavioral disorders. For example, a patient may suffer from both anxiety and depression (Li and Burmeister, 2009), and the same patient may also be addicted to nicotine, alcohol, or other substances (Merikangas et al., 1998; True et al., 1999). From a data analysis perspective, we need to consider how important it is to accommodate multiple diseases/traits. In a real data example, Chen et al. (2011) analyzed a data set from the Study of Addiction: Genetics and Environment (SAGE). By simply considering addiction to at least two of the six substances (addiction to nicotine, alcohol, marijuana, cocaine, opiates or other drugs), we were able to identify the PKNOX2 gene, which reached the genomewide significance level among European-origin females. Interestingly, the PKNOX2 gene has been previously identified as one of the cis-regulated genes for alcohol addiction in mice (Mulligan et al. 2006). To further delineate the benefit of considering multivariate traits, Zhu and Zhang (2009) conducted comprehensive simulation studies with correlations of 0.2 and −0.2 among three quantitative traits and demonstrated that testing correlated traits jointly is more powerful than testing a single trait at a time. Using generalized estimating equations, Lange et al. (2003) developed a family-based association test for multivariate quantitative traits (FBAT-GEE). Recently, Zhang et al. (2010) proposed a nonparametric test based on the generalized Kendall’s tau to accommodate any combination of dichotomous, ordinal, and quantitative traits.

2.2.1 Kendall’s Tau

Kendall’s τ is a rank-based correlation between two variables. It contrasts the probability of observing the two variables in the same order in two observations with the probability of observing the two variables in the opposite order. Specifically, for a sample of $n$ observations $(X_1, Y_1), \ldots, (X_n, Y_n)$, two observations $(X_i, Y_i)$ and $(X_j, Y_j)$ are called concordant if $(X_i - X_j)(Y_i - Y_j) > 0$ and discordant if $(X_i - X_j)(Y_i - Y_j) < 0$. Then Kendall’s τ is based on the difference between the numbers of concordant pairs and discordant pairs.

We introduce a kernel function,

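$$\phi((X_i, Y_i), (X_j, Y_j)) = \operatorname{sign}\{(X_i - X_j)(Y_i - Y_j)\} = \begin{cases} 1, & \text{if } (X_i - X_j)(Y_i - Y_j) > 0, \\ -1, & \text{if } (X_i - X_j)(Y_i - Y_j) < 0, \\ 0, & \text{if } (X_i - X_j)(Y_i - Y_j) = 0, \end{cases}$$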

and define a U-statistic

$$U = \binom{n}{2}^{-1} \sum_{i<j} \phi((X_i, Y_i), (X_j, Y_j)). \tag{2.6}$$

Then, Kendall’s τ is

$$\tau = \frac{U}{\sqrt{\operatorname{Var}_0(U)}}, \tag{2.7}$$

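where $\operatorname{Var}_0(U)$ is the variance of $U$ under the null hypothesis of no correlation between $X$ and $Y$. If $X$ and $Y$ are continuous variables, the unnormalized sum $\sum_{i<j} \phi(\cdot)$ has null variance $n(n-1)(2n+5)/18$ (Hollander and Wolfe, 1999), so that $\operatorname{Var}_0(U) = 2(2n+5)/\{9n(n-1)\}$ after dividing by $\binom{n}{2}^2$.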
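The definitions in (2.6) and (2.7) translate directly into code. The following Python sketch assumes continuous data without ties, so that the closed-form null variance applies; the function names are hypothetical.

```python
import math
from itertools import combinations


def sign(v):
    """Return 1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_u(x, y):
    """U-statistic (2.6): average of the sign kernel over all pairs i < j."""
    n = len(x)
    total = sum(
        sign((x[i] - x[j]) * (y[i] - y[j])) for i, j in combinations(range(n), 2)
    )
    return total / math.comb(n, 2)


def null_variance_u(n):
    """Null variance of U for continuous X and Y (no ties)."""
    return 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))


def standardized_tau(x, y):
    """Standardized statistic (2.7): U divided by its null standard deviation."""
    return kendall_u(x, y) / math.sqrt(null_variance_u(len(x)))
```

For example, with `x = [1, 2, 3, 4]` and `y = [1, 3, 2, 4]` there are five concordant pairs and one discordant pair, so `kendall_u` returns 4/6.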

2.2.2 Generalized Kendall’s Tau

To test the association between genetic markers and comorbidity, Zhang et al. (2010) generalized Kendall’s tau as follows. For individuals i and j, let Ti and Tj be their vectors of traits respectively. Then, a trait kernel is defined as

$$F_{ij} = \left(f_1(T_i^{(1)} - T_j^{(1)}), \ldots, f_p(T_i^{(p)} - T_j^{(p)})\right),$$

where function fk(·) is the identity function for a quantitative or binary trait (Rabinowitz, 1997), or the sign function for an ordinal trait (Zhang et al., 2006).

Also, recall that, as in Section 2.1.4, $c$ is the number of copies of any chosen allele in the marker genotype, and let $C_i$ refer to $c$ for the $i$th subject, with $c_i$ denoting its observed value. Then, Zhang et al. (2010) defined a marker kernel as

$$D_{ij} = c_i - c_j.$$

Their U-statistic is defined as

$$U = \binom{n}{2}^{-1} \sum_{i<j} D_{ij} F_{ij}. \tag{2.8}$$

The association test statistic, or generalized Kendall’s tau, is $U' \operatorname{Var}_0^{-1}(U)\, U$, where $\operatorname{Var}_0(U)$ is the variance of $U$ under the null hypothesis that there is no association between marker alleles and any linked locus that influences the trait $T$. The test statistic follows an asymptotic $\chi^2$ distribution under the null hypothesis.
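A minimal Python sketch of the U-statistic in (2.8) is given below. It computes only the vector $U$, not the test statistic, because the null variance depends on the study design. The function names and the `kinds` labels are hypothetical conventions introduced for this illustration.

```python
import math
from itertools import combinations


def sign(v):
    """Return 1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def trait_kernel(t_i, t_j, kinds):
    """F_ij: sign of the difference for ordinal traits, the difference itself otherwise."""
    out = []
    for a, b, kind in zip(t_i, t_j, kinds):
        diff = a - b
        out.append(float(sign(diff)) if kind == "ordinal" else float(diff))
    return out


def generalized_u(c, traits, kinds):
    """U in (2.8): average of D_ij * F_ij over all pairs i < j, with D_ij = c_i - c_j."""
    n, p = len(c), len(kinds)
    total = [0.0] * p
    for i, j in combinations(range(n), 2):
        d = c[i] - c[j]
        f = trait_kernel(traits[i], traits[j], kinds)
        for k in range(p):
            total[k] += d * f[k]
    pairs = math.comb(n, 2)
    return [t / pairs for t in total]
```

Each component of the returned vector corresponds to one trait, so a mix of ordinal and quantitative traits can be handled in a single pass over the pairs.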

The statistic in (2.8) does not incorporate covariate effects. Adjusting for covariates is relatively straightforward for a single trait, as was done in (2.4). Here, however, the traits can be a hybrid of different types, so a regression adjustment of this kind is less direct. An alternative is to impose a different weight on each pair of subjects in the statistic (2.8) according to their covariates. The weight, denoted by $w(z_i, z_j)$ for the pair $(i, j)$, reflects the relative importance attributed to the pair by the covariates when we derive the statistic. Zhu et al. (2010) examined the following weight function. Write $z = (z^{co}, z^{ca})'$ with $z^{co} = (z^{(1)}, \ldots, z^{(l_1)})'$ for the continuous covariates and $z^{ca} = (z^{(l_1+1)}, \ldots, z^{(l)})'$ for the categorical covariates. They defined the weight function $w(z_i, z_j)$ as

$$w(z_i, z_j) = W(\|z_i^{co} - z_j^{co}\|)\, I(z_i^{ca} = z_j^{ca}), \tag{2.9}$$

where $W(\cdot)$ is a positive and decreasing function, e.g., $W(u) = \exp\{-u^2/(2h^2)\}$, and $I(\cdot)$ is the indicator function. Then a weighted test statistic is given by

$$S = \binom{n}{2}^{-1} \sum_{i<j} D_{ij} F_{ij}\, w(z_i, z_j). \tag{2.10}$$
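The pairwise weighting can be sketched in Python as follows. The sketch assumes a Gaussian choice of $W$ applied to the Euclidean distance between the continuous covariates, with the continuous covariates stored first in each covariate list. The function names and the `trait_kernel` callable (returning $F_{ij}$ as a list for the pair $(i, j)$) are hypothetical.

```python
import math
from itertools import combinations


def pair_weight(z_i, z_j, n_continuous, h=1.0):
    """Weight for a pair: Gaussian kernel on continuous covariates,
    exact match required on categorical covariates."""
    if list(z_i[n_continuous:]) != list(z_j[n_continuous:]):
        return 0.0
    dist2 = sum((a - b) ** 2 for a, b in zip(z_i[:n_continuous], z_j[:n_continuous]))
    return math.exp(-dist2 / (2.0 * h * h))


def weighted_statistic(c, trait_kernel, z, n_continuous, h=1.0):
    """Weighted sum over pairs i < j of (c_i - c_j) * F_ij * w(z_i, z_j)."""
    n = len(c)
    total = None
    for i, j in combinations(range(n), 2):
        w = pair_weight(z[i], z[j], n_continuous, h)
        contrib = [(c[i] - c[j]) * f * w for f in trait_kernel(i, j)]
        total = contrib if total is None else [t + x for t, x in zip(total, contrib)]
    pairs = math.comb(n, 2)
    return [t / pairs for t in total]
```

Pairs that differ on any categorical covariate contribute nothing, and the bandwidth `h` controls how quickly the contribution decays as the continuous covariates move apart.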

Zhang et al. (2010) and Zhu et al. (2010) showed that under the null hypothesis, the test statistic S (weighted or not) has the following asymptotic distribution conditional on all phenotypes and parental genotypes,

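$$\operatorname{Var}_0^{-1/2}(S)\,[S - E_0(S)] \xrightarrow{d} N(0, I_p),$$

where

$$E_0(S) = \frac{2}{n-1} \sum_{i=1}^{n} \bar{u}_i\, E_0(C_i \mid M_i^{pa}), \qquad \operatorname{Var}_0(S) = \frac{4}{(n-1)^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \bar{u}_i \bar{u}_j'\, \operatorname{Cov}_0(C_i, C_j \mid M_i^{pa}, M_j^{pa}).$$

Consequently, the following test statistic

$$\chi^2_{tau} = [S - E_0(S)]'\, \operatorname{Var}_0^{-1}(S)\, [S - E_0(S)]$$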

converges to $\chi^2_p$ in distribution under the null hypothesis provided that $\operatorname{Var}_0(S)$ is of full rank. In a case-control study, we do not have the markers from parents and hence the conditional expectations are replaced with the unconditional ones. Thus, the key difference in the test statistics between family studies and population studies lies in the conditioning on the parental markers. The conditioning on the parental markers gives the family studies a major advantage in removing the effect of population admixture, but family studies tend to be more difficult and expensive to carry out.
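The factor $2/(n-1)$ in the null mean and variance of $S$ can be traced to a simple rearrangement of (2.10). The following derivation is a sketch under two assumptions that are not spelled out above: that $\bar{u}_i$ denotes the weighted average trait kernel for subject $i$, and that each $f_k$ is an odd function (as the identity and sign functions are) while $w$ is symmetric, so that $F_{ji} w(z_j, z_i) = -F_{ij} w(z_i, z_j)$ and $F_{ii} = 0$.

```latex
\begin{aligned}
\bar{u}_i &:= \frac{1}{n} \sum_{j=1}^{n} F_{ij}\, w(z_i, z_j)
  && \text{(assumed definition)} \\
S &= \binom{n}{2}^{-1} \sum_{i<j} (c_i - c_j)\, F_{ij}\, w(z_i, z_j) \\
  &= \frac{1}{n(n-1)} \sum_{i \neq j} (c_i - c_j)\, F_{ij}\, w(z_i, z_j)
  && \text{(the summand is symmetric in } i, j\text{)} \\
  &= \frac{2}{n(n-1)} \sum_{i \neq j} c_i\, F_{ij}\, w(z_i, z_j)
  && \text{(antisymmetry of } F_{ij}\, w\text{)} \\
  &= \frac{2}{n-1} \sum_{i=1}^{n} c_i\, \bar{u}_i .
\end{aligned}
```

Written this way, $S$ is linear in the marker scores with coefficients that depend only on the phenotypes and covariates, which is why conditioning on the phenotypes reduces its null mean and variance to the moments of $C_i$ given the parental markers.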

Under the alternative hypothesis, the test statistic $\chi^2_{tau}$ can be written as a weighted sum of noncentral $\chi^2_1$ variables, $\chi^2_{tau} = \sum_{i=1}^{p} e_i \chi^2_1(\phi_i)$, where $e_1 \ge \cdots \ge e_p$ are the non-negative eigenvalues of $\Sigma_1^{1/2} \Sigma_0^{-1} \Sigma_1^{1/2}$. Here $\phi_i = \mu_{R_i}^2$, and $\mu_{R_i}$ is the $i$th component of $\mu_R = Q' \Sigma_1^{-1/2} \mu$, where $Q$ is an orthonormal matrix such that $Q' \Sigma_1^{1/2} \Sigma_0^{-1} \Sigma_1^{1/2} Q = \operatorname{diag}(e_1, \ldots, e_p)$, and $\mu$ is the difference in the means of $S$ under the alternative and null hypotheses. Using the approximation theory of Pearson (1959), Solomon and Stephens (1977), and Liu et al. (2009), we can find a certain degree of freedom $l$ and noncentrality parameter $\upsilon$ such that the distribution of $\chi^2_{tau}$ can be closely approximated by $\chi^2_l(\upsilon)$. Through simulation studies, Zhu et al. (2010) confirmed that this approximation is accurate enough for power calculation.
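As a cross-check on any analytic approximation, the distribution of a weighted sum of noncentral $\chi^2_1$ variables can be simulated directly. The Python sketch below uses plain Monte Carlo rather than the chi-square approximation cited above; the function names are hypothetical, and the eigenvalues, noncentrality parameters, and critical value are supplied by the user.

```python
import math
import random


def simulate_weighted_sum(e, phi, n_draws=20000, seed=0):
    """Draw from sum_i e_i * chi2_1(phi_i), where chi2_1(phi) = (Z + sqrt(phi))^2."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        total = 0.0
        for e_i, phi_i in zip(e, phi):
            z = rng.gauss(0.0, 1.0) + math.sqrt(phi_i)
            total += e_i * z * z
        draws.append(total)
    return draws


def simulated_power(e, phi, critical_value, n_draws=20000, seed=0):
    """Fraction of simulated statistics exceeding the critical value."""
    draws = simulate_weighted_sum(e, phi, n_draws, seed)
    return sum(1 for d in draws if d > critical_value) / len(draws)
```

For two traits, for instance, the critical value would be the upper 5% point of the $\chi^2_2$ distribution (about 5.991); setting all noncentrality parameters to zero and all eigenvalues to one should then return a rejection rate close to 0.05.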

It is noteworthy that the weight function in (2.9) is restrictive with respect to categorical covariates, especially so for ordinal covariates. The use of a genomic propensity score can give rise to an alternative weight function. Specifically, for a di-allelic marker $G$ (e.g., a SNP), the genomic propensity score is the conditional probability $p_g(z) = P(G = g \mid Z = z)$. This probability can be fitted by a logistic regression model or a proportional odds model, depending on whether $G$ is chosen as an allele type or a genotype. In the latter case, the model also depends on the mode of inheritance. In current genomewide association studies, we usually only have genotypes and cannot distinguish the phases of individual alleles. Thus, we have to construct genomic propensity scores by considering various modes of inheritance. Once the genomic propensity score is estimated, it can be treated as a numerical covariate, and then we can use (2.9) again.

2.2.3 Examples

Zhang et al. (2010) re-analyzed a data set from the Collaborative Study on the Genetics of Alcoholism (COGA). The data came from a multi-center (9 sites) consortium that recruited study participants by requiring every proband to meet two alcohol dependence diagnostic criteria based on DSM-IV-R (American Psychiatric Association 1994). The first-degree relatives of the probands were invited into the study. Zhang et al. (2010) included a total of 1614 individuals from 143 families. They considered three phenotypes: (1) alcohol DX-DSM3R+Feighner; (2) maximum number of drinks in a 24 hour period; and (3) the response to “spent so much time drinking, had little time for any thing else.” Using the first phenotype alone, the p-value of the association between a peak marker D7S679 on chromosome 7 and the trait was 0.0019. However, when the three traits were analyzed together, D7S679 remained the peak marker, and the p-value was reduced to 0.00055, suggesting that the other two phenotypes enhanced the association signal. When the other two phenotypes were analyzed alone, the analysis did not lead to anything worthy of further attention.

In the analysis cited above, the association was assessed without considering covariates. In a follow-up analysis, Jiang et al. (2010) considered two important covariates: age at interview and sex. When these two covariates were controlled for, the p-value of the association between the peak marker D7S679 and the three phenotypes went down further to 0.000313.

3 Discussion

Studying comorbidity is a significant issue in mental and behavioral research, dating back a century (Cannon and Rosanoff 1911). It is challenging due to a lack of statistical methods that accommodate the complexity of comorbidity. While dealing with comorbidity in genetic studies is the focus of this review, the methods for doing so were developed and accumulated gradually, and various challenges were dealt with along the way.

Although I focused on the analysis of ordinal traits and applications in mental health, the presented methods are closely related to robust and rank-based methods for binary and quantitative traits. Furthermore, ordinal traits arise in studies of diseases besides mental illnesses, such as cancer (specifically, different stages).

From the statistical perspective, the methods that are presented here have broad applications beyond genetic association studies. From college admissions, to job searches, to scientific investigations, we make inferences based on multi-dimensional data. It is important and imperative to consider and develop inferential tools for multivariate outcomes, particularly when the outcomes are discrete. There is extensive literature on the statistical analysis of multivariate normal variables as well as on nonparametric tests for a single variable of non-normal distribution. However, few options are available for the inference when we have multiple non-normally distributed variables and potential hybrids of continuous and discrete variables. To overcome this challenge, I presented several useful statistical techniques such as the rank-based U-statistics and the kernel-based weighted statistics to accommodate the mix of continuous and discrete outcomes and the presence of important covariates.

Acknowledgments

This work is supported in part by grant R01DA016750 from the National Institute on Drug Abuse. The author wishes to thank Professor David Madigan for his encouragement on this review. He also thanks Dr. Gang Zheng and Professor Joseph Gastwirth for helpful discussions and comments, and Jennifer Brennan for careful reading and comments.

References

  1. Abelson JF, Kwan KY, et al. Sequence variants in SLITRK1 are associated with Tourette’s syndrome. Science. 2005;310:317–20. doi: 10.1126/science.1116502.
  2. Abreu PC, Greenberg DA, Hodge SE. Direct power comparisons between simple lod scores and NPL scores for linkage analysis in complex diseases. Am J Hum Genet. 1999;65:847–857. doi: 10.1086/302536.
  3. Allison DB. Transmission-disequilibrium tests for quantitative traits. Am J Hum Genet. 1997;60:676–690.
  4. Almasy L, Blangero J. Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998;62:1198–1211. doi: 10.1086/301844.
  5. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. Fourth Edition. Washington, DC: American Psychiatric Association Press; 1994. Revised.
  6. Amos CI. Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet. 1994;54:535–543.
  7. Arking DE, Pfeufer A, Post W, Kao WH, Newton-Cheh C, Ikeda M, West K, Kashuk C, Akyol M, Perz S, et al. A common genetic variant in the NOS1 regulator NOS1AP modulates cardiac repolarization. Nat Genet. 2006;38:644–651. doi: 10.1038/ng1790.
  8. Begleiter H, et al. The Collaborative Study on the Genetics of Alcoholism. Alcohol Health Res World. 1995;19:228–236.
  9. Blackwelder WC, Elston RC. A Comparison of Sib-Pair Linkage Tests for Disease Susceptibility Loci. Genetic Epidemiology. 1985;2:85–97. doi: 10.1002/gepi.1370020109.
  10. Blangero J, Almasy L. Multipoint oligogenic linkage analysis of quantitative traits. Genet Epidemiol. 1997;14:959–964. doi: 10.1002/(SICI)1098-2272(1997)14:6<959::AID-GEPI66>3.0.CO;2-K.
  11. Cannon GL, Rosanoff AJ. Preliminary report of a study of heredity in insanity in the light of the Mendelian laws. Journal of Nervous and Mental Disorders. 1911;38:272–279. Reprinted from.
  12. Carter CL, Chung CS. Segregation analysis of schizophrenia under a mixed genetic model. Hum Hered. 1980;30:350–356. doi: 10.1159/000153156.
  13. Chen X, Cho K, Singer BH, Zhang HP. The Nuclear Transcription Factor PKNOX2 Is a Candidate Gene for Substance Dependence in European-Origin Women. PLoS ONE. 2011;6:e16002. doi: 10.1371/journal.pone.0016002.
  14. Chen X, Liu CT, Zhang MZ, Zhang HP. A forest-based approach to identifying gene and gene-gene interactions. PNAS. 2007;104:19199–19203. doi: 10.1073/pnas.0709868104.
  15. Cochran WG. Sampling Techniques. 3rd ed. New York: Wiley; 1977.
  16. Dick DM, Aliev F, Wang JC, Grucza RA, Schuckit M, Kuperman S, Kramer J, Hinrichs A, Bertelsen S, Budde JP, Hesselbrock V, Porjesz B, Edenberg HJ, Bierut LJ, Goate A. Using dimensional models of externalizing psychopathology to aid in gene identification. Arch Gen Psychiatry. 2008;65:310–318. doi: 10.1001/archpsyc.65.3.310.
  17. Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ, Steinhart AH, Abraham C, Regueiro M, Griffiths A, et al. A genomewide association study identifies IL23R as an inflammatory bowel disease gene. Science. 2006;314:1461–1463. doi: 10.1126/science.1135245.
  18. Edenberg HJ. The collaborative study on the genetics of alcoholism: an update. Alcohol Research & Health: the Journal of the National Institute on Alcohol Abuse & Alcoholism. 2002;26:214–218.
  19. Edenberg HJ, Bierut LJ, Boyce P, Cao M, Cawley S, Chiles R, Doheny KF. Description of the data from the Collaborative Study on the Genetics of Alcoholism (COGA) and single-nucleotide polymorphism genotyping for Genetic Analysis Workshop 14. BMC Genetics. 2005;6(Suppl 1):S2. doi: 10.1186/1471-2156-6-S1-S2.
  20. Elston RC, Stewart J. A general model for the analysis of pedigree data. Hum Hered. 1971;21:523–542. doi: 10.1159/000152448.
  21. Feighner JP, Robins E, Guze SB, Woodruff RA, Jr, Winokur G, Munoz R. Diagnostic criteria for use in psychiatric research. Arch Gen Psychiatry. 1972;26:57–63. doi: 10.1001/archpsyc.1972.01750190059011.
  22. Feng R, Leckman J, Zhang HP. Linkage analysis of ordinal traits for pedigree data. Proceedings of the National Academy of Sciences USA. 2004;101:16739–16744. doi: 10.1073/pnas.0404623101.
  23. Gastwirth JL. On robust procedures. J Amer Statist Assoc. 1966;61:929–948. MR0205397.
  24. Gastwirth JL. The use of maximin efficiency robust tests in combining contingency tables and survival analysis. J Amer Statist Assoc. 1985;80:380–384. MR0792737.
  25. Goldgar DE. Multipoint analysis of human quantitative genetic variation. Am J Hum Genet. 1990;47:957–967.
  26. Goldstein DB. Common genetic variation and human traits. N Eng J Med. 2009;360:1696–1698. doi: 10.1056/NEJMp0806284.
  27. Guo SW, Thompson EA. A Monte Carlo method for combined segregation and linkage analysis. Am J Hum Genet. 1992;51:1111–1126.
  28. Heath AC, Todorov AA, et al. Gene-environment interaction effects on behavioral variation and risk of complex disorders: the example of alcoholism and other psychiatric disorders. Twin Research. 2002;5:30–7. doi: 10.1375/1369052022875.
  29. Heston LL. Psychiatric disorders in foster home reared children of schizophrenic mothers. British Journal of Psychiatry. 1966;112:819–825. doi: 10.1192/bjp.112.489.819.
  30. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences USA. 2009;106:9362–9367. doi: 10.1073/pnas.0903103106.
  31. Hollander M, Wolfe DA. Nonparametric Statistical Methods. 2nd ed. New York: Wiley; 1999.
  32. Horvath S, Laird NM. A discordant-sibship test for disequilibrium and linkage: no need for parental data. Am J Hum Genet. 1998;63:1886–1897. doi: 10.1086/302137.
  33. Kao WH, et al. MYH9 is associated with nondiabetic end-stage renal disease in African-Americans. Nature Genetics. 2008;40:1185–1192. doi: 10.1038/ng.232.
  34. Kety SS, Rosenthal D, Wender P. Genetic relationships within the schizophrenia spectrum: evidence from adoption studies. In: Spitzer RL, Klein DF, editors. Critical Issues in Psychiatric Diagnosis. New York: Raven Press; 1978. pp. 213–223.
  35. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308:385–389. doi: 10.1126/science.1109557.
  36. Knapp M. Using exact p values to compare the power between the reconstruction-combined transmission/disequilibrium test and the sib transmission/disequilibrium test. Am J Hum Genet. 1999;65:1208–1210. doi: 10.1086/302591.
  37. Kopp JB, et al. MYH9 is major-effect risk gene for focal segmental glomerulosclerosis. Nature Genetics. 2008;40:1175–1184. doi: 10.1038/ng.226.
  38. Kraepelin E. Psychiatrie: Ein Lehrbuch für Studirende und Aerzte. Sechste, vollständig umgearbeitete Auflage. Leipzig: Verlag von Johann Ambrosius Barth; 1899.
  39. Kruglyak L, Daly MJ, et al. Parametric and nonparametric linkage analysis: a unified multipoint approach. American Journal of Human Genetics. 1996;58:1347–63.
  40. Lange C, DeMeo DL, Laird NM. Power and design considerations for a general class of family-based association tests: quantitative traits. Am J Hum Genet. 2002;71:1330–1341. doi: 10.1086/344696.
  41. Lange C, Laird NM. Power calculations for a general class of family-based association test: dichotomous traits. Am J Hum Genet. 2002;71:575–584. doi: 10.1086/342406.
  42. Lange C, Silverman EK, Xu X, Weiss ST, Laird NM. A multivariate family-based association test using generalized estimating equations: FBAT-GEE. Biostatistics. 2003;4:195–206. doi: 10.1093/biostatistics/4.2.195.
  43. Laird NM, Horvath S, Xu X. Implementing a unified approach to family-based tests of association. Genet Epidem. 2000;19(Suppl 1):S36–S42. doi: 10.1002/1098-2272(2000)19:1+<::AID-GEPI6>3.0.CO;2-M.
  44. Lam KF, Lee YW, Leung TL. Modeling multivariate survival data by a semiparametric random effects proportional odds model. Biometrics. 2002;58:316–323. doi: 10.1111/j.0006-341x.2002.00316.x.
  45. Lander ES, Green P. Construction of multilocus genetic maps in humans. Proc Natl Acad Sci USA. 1987;84:2363–2367. doi: 10.1073/pnas.84.8.2363.
  46. Li MD, Burmeister M. New insights into the genetics of addiction. Nat Rev Genet. 2009;10:225–231. doi: 10.1038/nrg2536.
  47. Liu H, Tang YQ, Zhang HH. A new chi-square approximation to the distribution of non-negative definite quadratic forms in non-central normal variables. Comp Statist Data Anal. 2009;53:853–856.
  48. Liu Y, Tritchler D, Bull SB. A unified framework for transmission-disequilibrium test analysis of discrete and continuous traits. Genet Epidem. 2002;22:26–40. doi: 10.1002/gepi.1041.
  49. Lunetta KL, Faraone SV, Biederman J, Laird NM. Family based tests of association and linkage that used unaffected sibs, covariates, and interactions. Am J Hum Genet. 2000;66:605–614. doi: 10.1086/302782.
  50. Martin ER, Monks SA, Warren LL, Kaplan NL. A test for linkage and association in general pedigrees: the pedigree disequilibrium test. Am J Hum Genet. 2000;67:146–154. doi: 10.1086/302957.
  51. Merikangas KR, et al. Familial transmission of substance use disorders. Arch Gen Psychiatry. 1998;55:973–979. doi: 10.1001/archpsyc.55.11.973.
  52. Morton NE. Sequential tests for the detection of linkage. Am J Hum Genet. 1955;7:277–318.
  53. Mulligan MK, Ponomarev I, Hitzemann RJ, Belknap JK, Tabakoff B, Harris RA, Crabbe JC, Blednov YA, Grahame NJ, Phillips TJ, Finn DA, Hoffman PL, Iyer VR, Koob GF, Bergeson SE. Toward understanding the genetics of alcohol drinking through transcriptome meta-analysis. Proc Natl Acad Sci U S A. 2006;103:6368–6373. doi: 10.1073/pnas.0510188103.
  54. Ott J. Estimation of the recombination fraction in human pedigrees: Efficient computation of the likelihood for human linkage studies. Am J Hum Genet. 1974;26:588–597.
  55. Ott J. Analysis of Human Genetic Linkage. 3rd ed. Baltimore, MD: Johns Hopkins University Press; 1999.
  56. Pauls DL, Leckman JF. The inheritance of Gilles de la Tourette’s syndrome and associated behaviors. Evidence for autosomal dominant transmission. New England Journal of Medicine. 1986;315:993–7. doi: 10.1056/NEJM198610163151604.
  57. Pearson ES. Note on an approximation to the distribution of non-central χ2. Biometrika. 1959;46:364.
  58. Plomin R, DeFries JC, McClearn GE, Rutter M. Behavioral Genetics. 3rd ed. New York: Freeman; 1997.
  59. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007;81:559–575. doi: 10.1086/519795.
  60. Rabinowitz D. A transmission disequilibrium test for quantitative trait loci. Hum Hered. 1997;47:342–350. doi: 10.1159/000154433.
  61. Rabinowitz D, Laird NM. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered. 2000;50:211–223. doi: 10.1159/000022918.
  62. Rao S, Xu S. Mapping quantitative trait loci for ordered categorical traits in four-way crosses. Heredity. 1998;81:214–224. doi: 10.1046/j.1365-2540.1998.00378.x.
  63. Reich T, Edenberg HJ, et al. Genome-wide search for genes affecting the risk for alcohol dependence. Am J Med Genet (Neuropsychiatric Genetics). 1998;81:207–215.
  64. Risch NR, Zhang HP. Extreme discordant sib pairs for mapping quantitative trait loci in humans. Science. 1995;268:1584–1589. doi: 10.1126/science.7777857.
  65. Rosenthal D. Three adoption studies of heredity in the schizophrenic disorders. International Journal of Mental Health. 1972;1:63–75.
  66. Solomon H, Stephens MA. Distribution of a sum of weighted chi-square variables. J Amer Statist Assoc. 1977;72:881–885.
  67. Scharf JM, et al. Lack of Association between SLITRK1var321 and Tourette Syndrome in a large family-based sample. Neurology. 2008;70:1495–1496. doi: 10.1212/01.wnl.0000296833.25484.bb.
  68. Schork NJ. Extended multipoint identity-by-descent analysis of human quantitative traits: efficiency, power, and modeling considerations. Am J Hum Genet. 1993;53:1306–1319.
  69. Spielman RS, Ewens WJ. A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am J Hum Genet. 1998;62:450–458. doi: 10.1086/301714.
  70. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993;52:506–516.
  71. Steinke JW, Borish L, Rosenwasser LJ. Genetics of hypersensitivity. Journal of Allergy and Clinical Immunology. 2003;111:S495–S501. doi: 10.1067/mai.2003.143.
  72. True WR, et al. Interrelationship of genetic and environmental influences on conduct disorder and alcohol and marijuana dependence symptoms. Am J Med Genet. 1999;88:391–397. doi: 10.1002/(sici)1096-8628(19990820)88:4<391::aid-ajmg17>3.0.co;2-l.
  73. Tienari P. Interaction between genetic vulnerability and family environment: the Finnish adoptive family study of schizophrenia. Acta Psychiatrica Scandinavica. 1991;84:460–465. doi: 10.1111/j.1600-0447.1991.tb03178.x.
  74. Vergne L, Bourgeois A, Mpoudi-Ngole E, Mougnutou R, Mbuagbaw J, Liegeois F, Laurent C, Butel C, Zekeng L, Delaporte E, Peeters M. Biological and genetic characteristics of HIV infections in Cameroon reveals dual group M and O infections and a correlation between SI-inducing phenotype of the predominant CRF02_AG variant and disease stage. Virology. 2003;310:254–266. doi: 10.1016/s0042-6822(03)00167-3.
  75. Wang XQ, Ye YQ, Zhang HP. Family-based association tests for ordinal traits adjusting for covariates. Genet Epidem. 2006;30:728–736. doi: 10.1002/gepi.20184.
  76. Weinberg CR. Allowing for missing parents in genetic studies of case-parental triads. Am J Hum Genet. 1999;64:1186–1193. doi: 10.1086/302337.
  77. Whittemore AS. Genome scanning for linkage: an overview. Am J Hum Genet. 1996;59:704–716.
  78. Xu S, Xu C. A multivariate model for ordinal trait analysis. Heredity. 2006;97:409–417. doi: 10.1038/sj.hdy.6800885.
  79. Zhang HP, Feng R, Zhu H. A latent variable model for segregation anlysis. J Amer Statist Assoc. 2003;98:1023–1034. [Google Scholar]
  80. Zhang HP, Liu C-T, Wang XQ. An association test for multiple traits based on the generalized Kendall’s tau. J Amer Statist Assoc. 2010;105:473–481. doi: 10.1198/jasa.2009.ap08387. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Zhang HP, Merikangas K. A frailty model of segregation analysis: understanding the familial transmission of alcoholism. Biometrics. 2000;56:815–23. doi: 10.1111/j.0006-341x.2000.00815.x. [DOI] [PubMed] [Google Scholar]
  82. Zhang HP, Wang XQ, Ye YQ. Detection of genes for ordinal traits in nuclear families and a unified approach for association studies. Genetics. 2006;172:693–699. doi: 10.1534/genetics.105.049122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Zhang M, Feng R, Chen X, Hu B, Zhang H. LOT: a Tool for Linkage Analysis of Ordinal Traits for Pedigree Data. Bioinoformatics. 2008;24:1737–9. doi: 10.1093/bioinformatics/btn258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Zheng G, Joo J, Zaykin D, Wu C, Geller N. Robust Tests in Genome-Wide Scans under Incomplete Linkage Disequilibrium. Statistical Science. 2009;24:503–516. [Google Scholar]
  85. Zhu WS, Zhang HP. Why do we test multiple traits in genetic association studies? (with discussion) J Korean Statist Soc Sci. 2009;38:1–10. doi: 10.1016/j.jkss.2008.10.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Zhu X, Cooper R, Kan D, Cao G, Wu X. A genome-wide linkage and association study using COGA data. BMC Genetics. 2005;6:S128. doi: 10.1186/1471-2156-6-S1-S128. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Zhu WS, Jiang Y, Zhang HP. Covariate-Adjusted Association Tests and Power Calculations Based on the Generalized Kendall’s Tau. Technical Report. 2010 doi: 10.1080/01621459.2011.643707. [DOI] [PMC free article] [PubMed] [Google Scholar]
