Genetics. 2006 Jan;172(1):663–686. doi: 10.1534/genetics.105.046417

High-Resolution Association Mapping of Quantitative Trait Loci: A Population-Based Approach

Ruzong Fan, Jeesun Jung, Lei Jin
PMCID: PMC1456191  PMID: 16172503

Abstract

In this article, population-based regression models are proposed for high-resolution linkage disequilibrium mapping of quantitative trait loci (QTL). Two regression models, the “genotype effect model” and the “additive effect model,” are proposed to model the association between the markers and the trait locus. The marker can be either diallelic or multiallelic. If only one marker is used, the method is similar to a classical setting by Nielsen and Weir, and the additive effect model is equivalent to the haplotype trend regression (HTR) method by Zaykin et al. If two/multiple marker data with phase ambiguity are used in the analysis, the proposed models can be used to analyze the data directly. By analytical formulas, we show that the genotype effect model can be used to model the additive and dominance effects simultaneously; the additive effect model takes care of the additive effect only. On the basis of the two models, F-test statistics are proposed to test association between the QTL and markers. By a simulation study, we show that the two models have reasonable type I error rates for a data set of moderate sample size. The noncentrality parameter approximations of F-test statistics are derived to make power calculation and comparison. By a simulation study, it is found that the noncentrality parameter approximations of F-test statistics work very well. Using the noncentrality parameter approximations, we compare the power of the two models with that of the HTR. In addition, a simulation study is performed to make a comparison on the basis of the haplotype frequencies of 10 SNPs of angiotensin-1 converting enzyme (ACE) genes.


In genetics research, one important goal is to locate and identify genetic variants that are related to complex traits. With the development of dense marker maps, such as those of single-nucleotide polymorphisms (SNPs) and high-resolution microsatellites in the human genome, enormous amounts of genetic data on human chromosomes are becoming available (International SNP Map Working Group 2001; Kong et al. 2002; International HapMap Consortium 2003; HapMap project, http://www.hapmap.org). The opportunities for genomewide scans to map complex disease genes are tremendous. It is therefore important to build appropriate models and useful algorithms for association mapping that identify the genetic variants underlying complex traits in human, animal, or plant studies.

In recent years, there has been great interest in linkage disequilibrium (LD) mapping (or association study) of quantitative traits of complex diseases. One way is to use diallelic markers such as SNPs in analysis. This approach has been receiving much attention and there are quite a lot of references to it in the literature (Fulker et al. 1999; George et al. 1999; Abecasis et al. 2000a,b, 2001; Sham et al. 2000; Fan et al. 2005). Another approach is to use haplotype data that may consist of a set of SNPs (Schaid et al. 2002; Zaykin et al. 2002; Schaid 2004). The haplotype data may provide more information on the relation between DNA variants and complex traits than that of any single SNP. Hence, it is important to investigate models and algorithms that are based on haplotype data. In Schaid et al. (2002) and Zaykin et al. (2002), score tests are proposed for association between complex traits and haplotypes, which can be ambiguous owing to the unknown linkage phase of different haplotypes. In Zaykin et al. (2002), the method is called haplotype trend regression (HTR), which is very close to the method of Schaid et al. (2002) (see Schaid 2004, p. 355, for further explanation). HTR does not assume that haplotype phases are known. Meuwissen and Goddard (2000) introduced a haplotype-based approach, which assumes that haplotype phases are known. In addition, mixed models are used to model the haplotype effect in Meuwissen and Goddard (2000). Morris et al. (2004) used a Markov chain Monte Carlo algorithm based on the shattered coalescent model for fine mapping.

On the other hand, current genotyping technology yields genotypes directly, not haplotypes. Hence, it is of interest to build models that use genotype information directly; under these models, the main effects of each marker are modeled, which does not require phase information across the markers. If phase is unknown, a haplotype-based analysis would presumably need the haplotypes to be estimated first, using a reconstruction algorithm such as PHASE or EM algorithms (Dempster et al. 1977; M. Stephens et al. 2001; Stephens and Donnelly 2003). This may introduce bias into the subsequent analysis, which would need to be investigated. It is of real interest to compare the genotype-based models with the haplotype-based models. Interestingly, Morris et al. (2004) and Clayton et al. (2004) have observed that haplotypes at SNPs may be only slightly more advantageous, or even less powerful, for fine mapping than the corresponding unphased genotypes.

Suppose that a quantitative trait locus (QTL) is located in a chromosome region. In the region, a marker (or two/multiple markers) is (or are) typed. In our previous research, the markers are assumed to be diallelic (Fan and Xiong 2002). In the current article, the markers can be either diallelic or multiallelic. Suppose that a population sample is available. For each individual in the sample, both trait value and genotypes at the markers are observed. We propose two regression models in association mapping of QTL based on population genetic data. One model is the “genotype effect model,” and the other is the “additive effect model.” These two models extend our previous research of high-resolution LD mapping of QTL using diallelic markers (Fan and Xiong 2002). The model can be very easily performed by using any statistical software in data analysis, or it can be easily implemented by widely used language such as C++. By analytical formulas, we show that the genotype effect model can be used to model the additive and dominance effects simultaneously; the additive effect model takes care of additive effect only. On the basis of the two models, F-test statistics are proposed to test association between the QTL and markers. To investigate the robustness of the proposed models and the related F-test statistics, simulation studies are performed to calculate the type I error rates. The noncentrality parameters of F-test statistics are derived to make power calculation and comparison. Moreover, the proposed models are compared with the haplotype trend regression method by simulation study and type I error rate analysis when two diallelic markers are used in the analysis (Zaykin et al. 2002). On the basis of the haplotype frequencies of 10 SNPs of angiotensin-1 converting enzyme (ACE) genes, a simulation study is performed to make power comparison of the proposed models with the haplotype trend regression method (Keavney et al. 1998).

A software package, CLAM_QTL, was written in C++ to implement the proposed models and methods; it can be downloaded from http://www.stat.tamu.edu/∼rfan/software.html/.

METHODS

As the first step, we present models and methods by using one marker. Here the marker can be either biallelic or multiallelic. This article extends our previous work (Fan and Xiong 2002). Similar results were worked out independently by colleagues at North Carolina State University, although their language and notations are slightly different (Weir and Cockerham 1977; Nielsen and Weir 1999, 2001). Then, the models and methods are extended to use two/multiple markers in analysis. On the basis of the models, F-test statistics are proposed, and the related noncentrality parameter approximations of the F-tests are derived.

Analysis by one marker:

Population models:

Consider a quantitative trait locus Q, which is located at an autosome. Suppose that there are two alleles Q1 and Q2 at the trait locus with frequencies q1 and q2, respectively. In a region of the QTL Q, suppose that one marker A is typed, which may be diallelic such as a single-nucleotide polymorphism or may be multiallelic such as a microsatellite marker. Let us denote the alleles of marker A by A1, …, Am, where m is the number of alleles. Suppose that the marker A is in Hardy-Weinberg equilibrium (HWE). Let the frequency of Ai be Inline graphic. There are JA = m(m + 1)/2 possible genotypes, which can be listed as A1A1, …, AmAm, A1A2, …, A1Am, …, Am−1Am. Accordingly, let β11, …, βmm, β12, …, β1m, …, βm−1,m be the corresponding effects of the listed genotypes on the quantitative trait. Let y be the trait value of an individual with genotype GA = AiAj. Under an assumption of normality, the trait value can be modeled as

$$y = w\gamma + \beta_{ij} + e \tag{1}$$

where w is a row vector of covariates such as sex and age, γ is a column vector of regression coefficients of w, and e is the error term. Assume that e is normal $N(0, \sigma_e^2)$. In addition to the covariate effects, there are JA = m(m + 1)/2 parameters βij in model (1), where βij = βji. Model (1) treats each genotype effect as one parameter. Hence, we call it a genotype effect model. In practice, model (1) may lead to a large number of parameters.
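To make the parameter count of the genotype effect model concrete, the following minimal Python sketch enumerates the genotypes of an m-allele marker in the order used above (homozygotes first, then heterozygotes). The function name `list_genotypes` and the integer allele labels are illustrative choices, not part of the original method description.

```python
from itertools import combinations

def list_genotypes(m):
    """Genotypes of an m-allele marker as (i, j) index pairs.

    Order follows the text: A1A1, ..., AmAm, then A1A2, ..., A(m-1)Am.
    """
    homozygotes = [(i, i) for i in range(1, m + 1)]
    heterozygotes = list(combinations(range(1, m + 1), 2))
    return homozygotes + heterozygotes

genotypes = list_genotypes(3)
print(genotypes)       # [(1, 1), (2, 2), (3, 3), (1, 2), (1, 3), (2, 3)]
print(len(genotypes))  # 6, i.e., m(m + 1)/2 for m = 3
```

For m = 6 the list has 21 entries, so the genotype effect model carries 21 genotype parameters for a six-allele marker.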

Now let us denote the effect of allele Ai as αi, i = 1, …, m. Suppose the genetic effect is additive in a sense of βij = αi + αj, i, j = 1, …, m. If an individual has quantitative trait value y and genotype GA = AiAj, model (1) can be modified as

$$y = w\gamma + \alpha_i + \alpha_j + e \tag{2}$$

In addition to the covariate effects, there are m parameters αi, i = 1, …, m, in model (2). Compared with model (1), model (2) may significantly reduce the number of parameters. Since it models only the additive effect, we call it the additive effect model.

Property of model coefficients and association tests:

As in traditional quantitative genetics, let a be the effect of genotype Q1Q1, d be the effect of genotype Q1Q2, and −a be the effect of genotype Q2Q2 (Falconer and Mackay 1996). Let αQ = a + (q2 − q1)d be the average effect of gene substitution and δQ = 2d be the dominance deviation. In addition, let μ = a(q1 − q2) + 2dq1q2 be the aggregate effect of the QTL on the trait mean in the population. For i = 1, 2, …, m, let us denote Inline graphic, which are measures of LD between QTL Q and marker A. Here P(Q1Ai) is the frequency of haplotype Q1Ai. In appendix a, we show that the regression coefficients of model (1) are given by
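As a quick numerical check of these definitions, the sketch below computes the average effect of gene substitution, the dominance deviation, and the population mean from a, d, and q1, using the standard Falconer and Mackay (1996) expressions αQ = a + (q2 − q1)d, δQ = 2d, and μ = a(q1 − q2) + 2dq1q2. The function name `qtl_parameters` is a hypothetical helper introduced here for illustration.

```python
def qtl_parameters(a, d, q1):
    """Return (alpha_Q, delta_Q, mu) for a diallelic trait locus.

    a, d : genotypic values (Q1Q1 -> a, Q1Q2 -> d, Q2Q2 -> -a)
    q1   : frequency of allele Q1 (q2 = 1 - q1)
    """
    q2 = 1.0 - q1
    alpha_q = a + (q2 - q1) * d              # average effect of gene substitution
    delta_q = 2.0 * d                        # dominance deviation
    mu = a * (q1 - q2) + 2.0 * d * q1 * q2   # aggregate effect on the trait mean
    return alpha_q, delta_q, mu

# The three non-null modes of inheritance used later in the article, q1 = 0.5
print(qtl_parameters(1.0, 0.0, 0.5))    # additive
print(qtl_parameters(1.0, 1.0, 0.5))    # dominant
print(qtl_parameters(1.0, -0.5, 0.5))   # recessive
```

With q1 = q2 = 0.5 the term (q2 − q1)d vanishes, so αQ equals a in all three cases and only δQ and μ differ between the modes of inheritance.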

graphic file with name M5.gif (3)

In appendix b, we show that the regression coefficients of model (2) are given by

graphic file with name M6.gif (4)

From Equations 3 and 4, it is clear that βij = αi + αj, when δQ = 0, i.e., no dominance effect. Suppose that the marker A and the QTL Q are in linkage equilibrium; i.e., Inline graphic. Then Equation 3 implies βij = μ; Equation 4 implies that αi = μ/2. Hence, models (1) and (2) are reduced to

$$y = w\gamma + \mu + e \tag{5}$$

Assume that the additive genetic effect is significantly present, but the dominance genetic effect is not significantly present; i.e., αQ ≠ 0 but δQ = 0. To test association between the marker A and the QTL Q, one may test hypotheses Ha0: α1 = ⋯ = αm vs. Ha1: at least two αi's are not equal. To see this, note that the hypotheses Ha0: α1 = ⋯ = αm is equivalent to Inline graphic, since αQ is significantly different from 0. Thus, Inline graphic implies Inline graphic and so Inline graphic under Ha0. Hence, the hypotheses Ha0: α1 = ⋯ = αm vs. Ha1: at least two αi's are not equal to each other are equivalent to Inline graphic at least one Inline graphic is not equal to 0. Model (2) can be used to map the QTL by an association analysis.

On the other hand, assume that both additive and dominance genetic effects are significantly present at the putative QTL Q; i.e., αQ ≠ 0 and δQ ≠ 0. To test association between the marker A and the QTL Q, one may test hypotheses Had0: β11 = ⋯ = βmm = β12 = ⋯ = β1m = ⋯ = βm−1,m vs. Had1: at least two βij's are not equal.

Relation to our previous work:

If the marker A has only two alleles A1 and A2, Fan and Xiong (2002) proposed the following model in association mapping of the QTL Q,

graphic file with name M15.gif (6)

where xA and zA are dummy random variables defined by

graphic file with name M16.gif (7)

and αA and δA are regression coefficients of the dummy variables xA and zA. The regression coefficients are given by Inline graphic and Inline graphic (Fan and Xiong 2002). It can be shown that model (6) is equivalent to model (1). Actually, the following relations of the regression coefficients of the two models can be shown: Inline graphic, and Inline graphic. Similarly, model (2) is equivalent to y = wγ + μ + xHαA + e, and we have the following relations Inline graphic and Inline graphic. The advantage of model (6) is that the association effect is decomposed into summations of additive and dominance effects if A is diallelic. If A has more than two alleles, model (1) extends model (6), and model (2) extends model y = wγ + μ + xHαA + e.

Regression models:

Assume that N individuals from a population are available for study. Let us list their trait values as y1, …, yN and their genotypes as GA1, …, GAN. For individual k, let xii(k) be the indicator function of genotype AiAi and xij(k) be the indicator function of genotype AiAj. That is, they are dummy variables defined by

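$$x_{ii}(k) = \begin{cases} 1 & \text{if } G_{Ak} = A_iA_i \\ 0 & \text{otherwise,} \end{cases} \qquad x_{ij}(k) = \begin{cases} 1 & \text{if } G_{Ak} = A_iA_j \\ 0 & \text{otherwise,} \end{cases}$$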

where i, j = 1, 2, …, m, i < j. Let Inline graphic, k = 1, 2, …, N; i.e., Xk is a column vector of genotype indicator functions of individual k. Here the superscript τ denotes a vector/matrix transpose. Denote Inline graphic. The corresponding regression of model (1) can be written as

graphic file with name M26.gif (8)

where subscript k indicates the corresponding quantities of individual k.

Similarly, let Inline graphic be the number of alleles Ai of genotype GAk, i = 1, 2, …, m, for individual k. That is, Inline graphic is a dummy variable defined by

graphic file with name M29.gif

Denote Inline graphic and Inline graphic. To use model (2) for data analysis, the corresponding regression model is

graphic file with name M32.gif (9)
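The two codings described above can be sketched as follows: one row of genotype indicators per individual for the genotype effect regression (8), and one row of allele counts per individual for the additive effect regression (9). The function names and the use of integer allele labels 1, …, m are assumptions made for illustration; covariates are omitted.

```python
def genotype_indicator_row(genotype, m):
    """0/1 indicators for regression (8): homozygotes first, then heterozygotes."""
    labels = [(i, i) for i in range(1, m + 1)]
    labels += [(i, j) for i in range(1, m + 1) for j in range(i + 1, m + 1)]
    key = tuple(sorted(genotype))  # the genotype is unordered
    return [1 if key == label else 0 for label in labels]

def allele_count_row(genotype, m):
    """Allele counts for regression (9): number of copies of A1, ..., Am."""
    return [sum(1 for allele in genotype if allele == i) for i in range(1, m + 1)]

# An individual with genotype A2A1 at a triallelic marker
print(genotype_indicator_row((2, 1), 3))  # [0, 0, 0, 1, 0, 0]
print(allele_count_row((2, 1), 3))        # [1, 1, 0]
```

Each indicator row contains exactly one 1 and each count row sums to 2, which is why neither regression needs a separate intercept.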

F-tests and noncentrality parameter approximations:

It is well known that the additive variance Inline graphic and the dominance variance Inline graphic. Let Inline graphic be the total variance. Assume that there are no covariates. Let us denote Inline graphic, and Inline graphic. Then model (8) can be expressed as y = Xη + e. By standard regression theory, the coefficients can be estimated by Inline graphic. Let H be a (JA − 1) × JA matrix defined by

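$$H = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}$$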

Then, (Hη)τ = (β11 − β22, …, β11 − βmm, β11 − β12, …, β11 − β1m, …, β11 − βm−1,m). Hence, the hypothesis Had0 is equivalent to Hη = (0, …, 0)τ. From Graybill (1976), Chap. 6, the test statistic of the hypothesis Had0 is noncentral F(JA − 1, N − JA) defined by

graphic file with name M40.gif

where IN is the N × N identity matrix. The noncentrality parameter of the above F-statistic is λm,ad = (Hη)τ[H(XτX)−1Hτ]−1(Hη)/σ2. Under the assumption of large sample sizes N, we show in appendix c the approximation

graphic file with name M41.gif (10)

where RAQ2 is a general measure of the degree of linkage disequilibrium between marker A and the QTL Q defined by Inline graphic (Crow and Kimura 1970; Hedrick 1987; Morton and Wu 1988; Sham et al. 2000). Note that RAQ2 is the χ2-statistic of the m × 2 table of haplotype frequencies of the marker A and trait locus Q. Approximation (10) shows that the noncentrality parameter of test statistics of the null hypothesis of no genetic effects of model (1) is reduced by a factor of Inline graphic for additive variance and by a factor of Inline graphic for dominance variance.
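The description of the LD measure as the χ2-statistic of the m × 2 table of haplotype frequencies can be illustrated with a short sketch. It takes that statement literally and computes the Pearson statistic on frequencies rather than counts; this reading, the function name `ld_r_squared`, and the input layout are assumptions made here.

```python
def ld_r_squared(hap_freq_q1, hap_freq_q2):
    """Pearson chi-square of the m x 2 haplotype frequency table.

    hap_freq_q1[i] = P(Q1 A_i), hap_freq_q2[i] = P(Q2 A_i); all entries sum to 1.
    """
    q1 = sum(hap_freq_q1)
    q2 = sum(hap_freq_q2)
    r2 = 0.0
    for p1, p2 in zip(hap_freq_q1, hap_freq_q2):
        p_allele = p1 + p2                  # marker allele frequency
        for observed, q in ((p1, q1), (p2, q2)):
            expected = q * p_allele         # frequency under linkage equilibrium
            r2 += (observed - expected) ** 2 / expected
    return r2

print(ld_r_squared([0.25, 0.25], [0.25, 0.25]))  # 0.0 (linkage equilibrium)
print(ld_r_squared([0.5, 0.0], [0.0, 0.5]))      # 1.0 (complete LD)
```

Under this reading the measure lies between 0 and 1 for a diallelic marker, so it acts as a shrinkage factor on the variance components in the noncentrality parameter.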

Similarly, let us denote Inline graphic. Then model (9) can be expressed as y = Zψ + e. The coefficients can be estimated by Inline graphic. Let K be a (m − 1) × m matrix defined by

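$$K = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}$$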

Then, (Kψ)τ = (α1 − α2, …, α1 − αm). Hence, the hypothesis Ha0 is equivalent to Kψ = (0, …, 0)τ. From Graybill (1976), Chap. 6, the test statistic of the hypothesis Ha0 is noncentral F(m − 1, N − m) defined by

graphic file with name M48.gif

The noncentrality parameter of the above F-statistic is λm,a = (Kψ)τ[K(ZτZ)−1Kτ]−1(Kψ)/σ2. Under an assumption of large sample sizes N, we show in appendix d the following approximation:

graphic file with name M49.gif (11)

This approximation (11) shows that the noncentrality parameter λm,a is reduced by a factor of Inline graphic for additive variance. The dominance variance is not present in λm,a.
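For a diallelic marker with no covariates, the additive effect test reduces to a familiar special case. Because the two allele counts sum to 2, the additive effect regression can be rewritten as a simple linear regression of the trait on the number of A1 alleles with slope α1 − α2, so testing α1 = α2 is the usual slope test with (1, N − 2) degrees of freedom. The sketch below computes that statistic; it covers only this special case, and the function name is hypothetical.

```python
def additive_f_statistic(trait, a1_counts):
    """F statistic for a zero slope in the regression of trait on A1 allele count.

    For a diallelic marker this tests alpha_1 = alpha_2 with (1, N - 2) df.
    """
    n = len(trait)
    mean_y = sum(trait) / n
    mean_x = sum(a1_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in a1_counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(a1_counts, trait))
    syy = sum((y - mean_y) ** 2 for y in trait)
    ss_regression = sxy ** 2 / sxx      # sum of squares explained by the slope
    ss_residual = syy - ss_regression   # residual sum of squares
    return ss_regression / (ss_residual / (n - 2))

# Toy data: six individuals with 0, 1, or 2 copies of A1
counts = [0, 1, 2, 0, 1, 2]
trait = [0.0, 1.0, 2.0, 0.0, 1.0, 3.0]
print(additive_f_statistic(trait, counts))
```

The statistic would be compared with the upper critical value of the F(1, N − 2) distribution; markers with more than two alleles require the general matrix form given above.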

Analysis by two/multiple markers:

Population models and association tests:

If genetic data of two/multiple markers are available, models (1) and (2) can be extended for association study of QTL. Most importantly, the data of two/multiple markers may contain phase ambiguity, i.e., phase unknown double heterozygotes. In the following, we generalize models (1) and (2) to directly analyze genetic data of two markers. The principle, actually, can be applied to multiple marker data.

In addition to marker A, assume that a second marker B is typed, which has n alleles denoted by B1, …, Bn. Suppose that the marker B is also in Hardy-Weinberg equilibrium. Let the frequency of allele Bk be Inline graphic. There are JB = n(n + 1)/2 possible genotypes, which can be listed as B1B1, …, BnBn, B1B2, …, B1Bn, …, Bn−1Bn. Let y be the trait value of an individual with genotype GA at marker A and genotype GB at marker B. As in relations (7), define

graphic file with name M52.gif (12)

If marker A has only two alleles A1 and A2, then xAi defined above is closely related to xA, which is defined in (7). Actually, it is easy to see the following relation Inline graphic since Inline graphic.

To extend model (2) by using two markers A and B in the analysis, consider the following model

graphic file with name M55.gif (13)

In addition to the covariate effects, there are m + n − 1 parameters α, αAi, αBk, i = 1, …, m − 1, k = 1, …, n − 1 in model (13). To see why model (13) extends model (2), it is worthwhile to note that model (2) is equivalent to Inline graphic. Actually, the quantity Inline graphic implies that Inline graphic if only information of marker A is used in the analysis; thus, αm = α/2, αi = αAi + α/2, i = 1, …, m − 1. Like model (2), model (13) takes only the additive effect into account. Hence, we call it an additive effect model. Similarly, model (1) can be extended to

graphic file with name M59.gif (14)

In addition to the covariate effects, there are JA + JB − 1 parameters α, αAi, αBk, δAij, δBkl in model (14). Model (14) takes both additive and dominance effects into account, and it is called the genotype effect model. Again, model (1) is equivalent to Inline graphic.

Denote XA = (xA1, …, xA(m−1))τ, XB = (xB1, …, xB(n−1))τ, and XAB = (XAτ, XBτ)τ. Let us denote the additive variance–covariance matrix of the indicator variables xAi, xBk by Inline graphic. Similarly, let ZA = (zA12, …, zA1m, zA23, …, zA2m, …, zA(m−1)m))τ, ZB = (zB12, …, zB1n, zB23, …, zB2n, …, zB(n−1)n))τ, and Inline graphic. Let us denote the dominance variance–covariance matrix of the indicator variables zAij, zBkl by VD = Cov(ZAB, ZAB). For k = 1, 2, …, n, let us denote Inline graphic, which are measures of LD between QTL Q and marker B. In appendix e, we show that the regression coefficients of models (13) and (14) are given by

graphic file with name M64.gif (15)

The elements of matrices VA and VD are provided in appendix e. Equations 15 show that the parameters of LD (i.e., Inline graphic and Inline graphic) and gene effect (i.e., αQ and δQ) are contained in the regression coefficients. Models (13) and (14) simultaneously take care of the LD and the effects of the putative trait locus Q. The gene substitution effect αQ is contained only in αAi, αBk; and the dominance effect δQ is contained only in δAij, δBkl. Therefore, VA is called the additive variance–covariance matrix; and VD is called the dominance variance–covariance matrix. The model (14) orthogonally decomposes the genetic effect into a summation of additive and dominance effects.

In Fan and Xiong (2002), regression models are proposed for LD mapping of QTL by diallelic markers. Models (13) and (14) extend the models by using multiallelic markers in LD analysis. On the basis of Equations 15, we may use models (13) and (14) to test the association between the trait locus Q and the two markers A and B. Assume that the additive genetic effect is significantly present, but the dominance genetic effect is not significantly present; i.e., αQ ≠ 0 but δQ = 0. To test association between the markers A and B and the QTL Q, one may test hypotheses HABa0: αA1 = ⋯ = αA(m−1) = αB1 = ⋯ = αB(n−1) = 0 vs. HABa1: at least one αAi, αBk is not equal to 0. To see this, note that the hypothesis HABa0 is equivalent to Inline graphic, since αQ is significantly different from 0. On the other hand, assume that both additive and dominance genetic effects are significantly present at the putative QTL Q; i.e., αQ ≠ 0 and δQ ≠ 0. To test association between the markers A and B and the QTL Q, one may test hypothesis HABad0: αA1 = ⋯ = αA(m−1) = αB1 = ⋯ = αB(n−1) = δA12 = ⋯ = δA1m = ⋯ = δA(m−1)m = δB12 = ⋯ = δB1n = ⋯ = δB(n−1)n = 0 vs. HABad1: at least one αAi, αBk, δAij, δBkl is not equal to 0, since both αQ and δQ are significantly different from 0.

Regression models, F-tests, and noncentrality parameter approximations:

Assume that N individuals from a population are available for study, whose trait values are listed as y1, …, yN and their genotypes as GA1, …, GAN at marker A and GB1, …, GBN at marker B. For individual s, let Inline graphic be the corresponding coding functions of genotypes GAs and GBs. Let us denote Inline graphic and Inline graphic. Denote αAB = (α, αA1, …, αA(m−1), αB1, …, αB(n−1))τ, and δAB = (δA12, …, δA(m−1)m, δB12, …, δB(n−1)n)τ. The corresponding regression of model (14) can be written as

graphic file with name M71.gif (16)

Let us denote Inline graphic and Inline graphic and Inline graphic. On the basis of regression (16), one may construct an F-test statistic FAB,ad to test the null hypothesis HABad0 in the same way as constructing Fm,ad or Fm,a (Graybill 1976, Chap. 6). Under the null hypothesis HABad0, FAB,ad follows a central F(JA + JB − 2, N − JA − JB + 1) distribution. Assume the sample size N is large enough that large-sample theory applies. Under the alternative hypothesis HABad1, FAB,ad follows a noncentral F(JA + JB − 2, N − JA − JB + 1) distribution, and it can be shown that the corresponding noncentrality parameter is approximated by

graphic file with name M75.gif

Similarly, an F-test statistic FAB,a used to test the null hypothesis HABa0 can be constructed. Under the null hypothesis HABa0, FAB,a follows a central F(m + n − 2, N − m − n + 1) distribution. Under the alternative hypothesis HABa1, FAB,a follows a noncentral F(m + n − 2, N − m − n + 1) distribution, and it can be shown that the corresponding noncentrality parameter is approximated by

graphic file with name M76.gif
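The degrees of freedom of the two-marker tests follow from counting parameters: with no covariates, the additive effect model has m + n − 1 coefficients and the genotype effect model has JA + JB − 1, and each test constrains all coefficients except the intercept α to zero. The sketch below is a small bookkeeping helper under that no-covariate assumption; the function name is hypothetical.

```python
def two_marker_test_df(m, n, sample_size):
    """(numerator, denominator) df of the two-marker F-tests, no covariates."""
    j_a = m * (m + 1) // 2   # number of genotypes at marker A
    j_b = n * (n + 1) // 2   # number of genotypes at marker B
    additive_params = m + n - 1
    genotype_params = j_a + j_b - 1
    return {
        "additive": (additive_params - 1, sample_size - additive_params),
        "genotype": (genotype_params - 1, sample_size - genotype_params),
    }

# Two diallelic markers and N = 200 individuals
print(two_marker_test_df(2, 2, 200))
# {'additive': (2, 197), 'genotype': (4, 195)}
```

The numerator degrees of freedom of the genotype effect test grow quadratically in the number of alleles, which is one reason larger samples are needed for highly polymorphic markers.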

The haplotype trend regression method:

If only one marker A is used in the analysis, the proposed model (2) is equivalent to the HTR method of Zaykin et al. (2002). However, the proposed models are different from the haplotype trend regression method for two/multiple marker data. Assume that M markers are typed in a region of the trait locus Q. On the basis of the genotypes of the multiple markers, assume that J haplotypes can be determined as h1, …, hJ with frequencies Inline graphic. For each individual, we may define an expected haplotype score vector as follows (Schaid et al. 2002; Zaykin et al. 2002). The expected haplotype score vector is a column vector of J elements (c1, …, cJ)τ based on the genotype combination (G1, …, GM) at the markers of an individual. For instance, the score vector is (1, 0, …, 0)τ if haplotype pair h1/h1 is the only possible phase of the genotype combination (G1, …, GM). In general, cj is the conditional probability of a haplotype hj given genotype combination (G1, …, GM) at the markers; i.e.,

graphic file with name M78.gif

In the above equation, the conditional probability P(G1, …, GM|hi, hk) is 1 if haplotype pair hi/hk is a possible phase for the genotype combination (G1, …, GM), and it is 0 otherwise. For each individual, the summation Inline graphic of the expected haplotype scores is equal to 1.

For the purpose of explanation, consider two diallelic markers A and B. Let us denote the two alleles of marker A by A1, A2; and denote the two alleles of marker B by B1, B2. Table 1 gives the score vector for each genotype combination of markers A and B. To understand the entries of Table 1, it is worthwhile to take genotype combination (GA = A1A1, GB = B1B1) as an example. Two copies of haplotype A1B1 can be formed from the genotype combination (GA = A1A1, GB = B1B1). The score for haplotype A1B1 is 1 for this genotype combination; and scores for the other three haplotypes are all 0. Denote the genotype of an individual at marker A by GA and the genotype at marker B by GB. Let us denote c1 = P(A1B1|GA = A1A2, GB = B1B2) = P(A1B1)P(A2B2)/[2P(A1B1)P(A2B2) + 2P(A1B2)P(A2B1)] = c4; i.e., c1 is the conditional probability of a haplotype A1B1 given the double heterozygotes (GA = A1A2, GB = B1B2); and Inline graphic. For the double heterozygotes (GA = A1A2, GB = B1B2), the expected scores are c1, c2, c2, c1 for haplotypes A1B1, A1B2, A2B1, A2B2. The scores of the other genotype combinations are provided in Table 1. Then the corresponding model of the haplotype trend regression method can be written as

graphic file with name M81.gif (17)

where βi are regression coefficients, and Ii are expected scorings of haplotypes defined in Table 1. It can be seen that model (17) is not equivalent to either proposed model (13) or model (14).

TABLE 1.

Expected scorings Ii, i = 1, 2, 3, 4 of haplotypes of model (17)

| Genotype (GA, GB) | A1B1, I1 | A1B2, I2 | A2B1, I3 | A2B2, I4 |
|---|---|---|---|---|
| (A1A1, B1B1) | 1 | 0 | 0 | 0 |
| (A1A1, B1B2) | 1/2 | 1/2 | 0 | 0 |
| (A1A1, B2B2) | 0 | 1 | 0 | 0 |
| (A1A2, B1B1) | 1/2 | 0 | 1/2 | 0 |
| (A1A2, B1B2) | c1 | c2 | c2 | c1 |
| (A1A2, B2B2) | 0 | 1/2 | 0 | 1/2 |
| (A2A2, B1B1) | 0 | 0 | 1 | 0 |
| (A2A2, B1B2) | 0 | 0 | 1/2 | 1/2 |
| (A2A2, B2B2) | 0 | 0 | 0 | 1 |

The constants are given by c1 = P(A1B1|GA = A1A2, GB = B1B2) = P(A1B1)P(A2B2)/[2P(A1B1)P(A2B2) + 2P(A1B2)P(A2B1)] and c2 = 1/2 − c1.
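The expected haplotype scores can be computed by enumerating the haplotype pairs that are compatible with an individual's multimarker genotype and weighting each pair by the product of its haplotype frequencies, assuming Hardy-Weinberg equilibrium. The following sketch does this for any number of markers; the function name, the data layout, and the example frequencies are illustrative assumptions, and the haplotype frequencies are taken as known rather than estimated.

```python
from itertools import product

def expected_haplotype_scores(genotypes, hap_freqs):
    """Expected haplotype score vector for one individual.

    genotypes : list of unordered allele pairs, one per marker
    hap_freqs : dict mapping haplotype tuples (one allele per marker) to frequencies
    Returns a dict of scores that sum to 1.
    """
    haplotypes = list(hap_freqs)
    target = [tuple(sorted(g)) for g in genotypes]
    weights = {h: 0.0 for h in haplotypes}
    total = 0.0
    for h1, h2 in product(haplotypes, repeat=2):   # ordered haplotype pairs
        implied = [tuple(sorted(pair)) for pair in zip(h1, h2)]
        if implied != target:
            continue                               # pair is not a possible phase
        w = hap_freqs[h1] * hap_freqs[h2]
        total += w
        weights[h1] += 0.5 * w                     # each haplotype is half the pair
        weights[h2] += 0.5 * w
    return {h: weights[h] / total for h in haplotypes}

freqs = {(1, 1): 0.4, (1, 2): 0.1, (2, 1): 0.2, (2, 2): 0.3}
print(expected_haplotype_scores([(1, 1), (1, 2)], freqs))  # A1A1, B1B2
print(expected_haplotype_scores([(1, 2), (1, 2)], freqs))  # double heterozygote
```

For the double heterozygote the scores of A1B1 and A2B2 both equal P(A1B1)P(A2B2)/[2P(A1B1)P(A2B2) + 2P(A1B2)P(A2B1)], in agreement with the constant c1 of Table 1. The function raises a division error if no haplotype pair is compatible with the genotype.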

In the general case of M markers, let Ij be the expected score of haplotype hj, j = 1, 2, …, J. In terms of conditional probabilities, Ij can be expressed as

graphic file with name M82.gif

The corresponding model of the haplotype trend regression method can be written as

graphic file with name M83.gif (18)

For j = 1, 2, …, J, let us denote Inline graphic, which are measures of LD between QTL Q and the haplotypes. Here P(Q1hj) is the frequency of haplotype Q1hj. In appendix f, we show that the regression coefficients of model (18) satisfy the matrix equation

graphic file with name M85.gif (19)

where E(IiIk) are given in appendix f, and

graphic file with name M86.gif

From Equations 19, it is clear that model (18) models both the additive and dominance effects. Suppose that the haplotype and the QTL Q are in linkage equilibrium; i.e., Inline graphic. Then Equation 19 implies β1 = ⋯ = βJ = μ, since Inline graphic and Inline graphic. Hence, model (18) is reduced to (5). To test association between the haplotypes and the trait locus, one may test a null hypothesis β1 = ⋯ = βJ, and the related F-test statistic can be constructed.

Again, assume that N individuals from a population are available for study with trait values and genotype information. On the basis of regression (18), one may construct an F-test statistic FHTR to test the null hypothesis β1 = ⋯ = βJ = μ (Graybill 1976). Under the null hypothesis, FHTR follows a central F(J − 1, N − J) distribution. Under the alternative hypothesis that at least two βj's are not equal to each other, FHTR follows a noncentral F(J − 1, N − J) distribution. Assume the sample size N is large enough that large-sample theory applies. Then it can be shown that the corresponding noncentrality parameter is approximated by

graphic file with name M90.gif

where

graphic file with name M91.gif

The advantage of model (17) is that it may model the haplotype effect by parameters βi. In practice, it is necessary to calculate the expected scorings or haplotype frequencies before building the haplotype trend regression model. Instead, the proposed models (13) and (14) may be used to analyze genetic data directly. Moreover, we have derived analytical formulas to calculate the regression coefficients of the HTR method and the related noncentrality parameter of the test statistic FHTR. Note that the original article by Zaykin et al. (2002) did not work out this very useful information. Our analytical coefficient equations and related noncentrality parameter approximations can be readily utilized for power evaluation.

RESULTS

Type I error rates:

To evaluate the robustness of the proposed models, we calculate type I error rates of test statistics Fm,ad, Fm,a, FAB,ad, FAB,a, and FHTR at a 0.05 significance level. The results are presented in Tables 2 and 3. Four test cases are considered: null, no major gene effect a = d = 0; additive, additive mode of inheritance a = 1, but no dominant effect d = 0; dominant, dominant mode of inheritance a = d = 1; and recessive, recessive mode of inheritance a = 1 and d = −0.5. The total variance is fixed as σ2 = 1.0 and the trait allele frequency is taken as q1 = q2 = 0.5 except for that in the null test case. In Table 2, only one marker A is used in analysis; the number m of alleles ranges from 2 to 6. The allele frequencies are given by: Inline graphic when m = 2; Inline graphic when m = 3; Inline graphic when m = 4; Inline graphic when m = 5; and Inline graphic when m = 6.

TABLE 2.

Type I error rates (percentage) of test statistics Fm,ad and Fm,a at a 0.05 significance level when only one marker A is used in the analysis

| No. of alleles | Sample size | Test case | Error rate, Fm,ad | Error rate, Fm,a |
|---|---|---|---|---|
| Diallele, m = 2 | N = 200 | Null | 4.90 | 4.93 |
| | | Additive | 5.10 | 4.89 |
| | | Dominant | 4.75 | 4.98 |
| | | Recessive | 5.03 | 5.09 |
| Triallele, m = 3 | N = 200 | Null | 4.94 | 5.18 |
| | | Additive | 5.03 | 4.92 |
| | | Dominant | 5.07 | 5.06 |
| | | Recessive | 4.65 | 4.85 |
| Quadriallele, m = 4 | N = 200 | Null | 4.89 | 5.29 |
| | | Additive | 4.72 | 4.69 |
| | | Dominant | 5.03 | 4.92 |
| | | Recessive | 4.86 | 4.85 |
| Five alleles, m = 5 | N = 200 | Null | 4.71 | 5.14 |
| | | Additive | 4.96 | 4.49 |
| | | Dominant | 5.02 | 4.94 |
| | | Recessive | 5.04 | 4.76 |
| Six alleles, m = 6 | N = 200 | Null | 5.02 | 5.21 |
| | | Additive | 5.23 | 4.92 |
| | | Dominant | 9.11 | 5.16 |
| | | Recessive | 7.04 | 4.97 |
| Six alleles, m = 6 | N = 300 | Null | 4.91 | 5.36 |
| | | Additive | 5.08 | 4.98 |
| | | Dominant | 5.39 | 4.91 |
| | | Recessive | 5.32 | 5.11 |

The total variance is fixed as σ2 = 1.0 and the trait allele frequency is taken as q1 = q2 = 0.5. The number m of alleles ranges from 2 to 6. The allele frequencies are given by: Inline graphic when m = 2; Inline graphic when m = 3; Inline graphic when m = 4; Inline graphic when m = 5; and Inline graphic when m = 6. Four test cases are considered: null, no major gene effect a = d = 0; additive, additive mode of inheritance a = 1, but no dominant effect d = 0; dominant, dominant mode of inheritance a = d = 1; recessive, recessive mode of inheritance a = 1 and d = –0.5. In each test case, linkage equilibrium is assumed between the QTL Q and the marker A; i.e., Inline graphic.

To calculate the type I error rates, 10,000 data sets are simulated for each test case. Each data set contains either 200 or 300 individuals. In each test case in Table 2, the data sets are generated under an assumption of linkage equilibrium between the QTL Q and the marker A; i.e., Inline graphic. That is, there is no association between the QTL Q and marker A. Utilizing the data sets, we fit either model (8) or model (9), and then calculate the F-test Fm,ad or Fm,a. Because the data sets are generated under the assumption of linkage equilibrium, an empirical test statistic that exceeds the critical value of the related F-distribution at a 0.05 significance level is treated as a false positive. On the basis of the F-test of either Fm,ad or Fm,a, type I error rates are calculated as the proportions of the 10,000 simulated data sets that give significant results at the 0.05 significance level.
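The simulation procedure can be sketched in a few lines for the simplest setting: one diallelic marker, no covariates, and the genotype effect test. With no covariates, testing equality of all genotype effects is the classical one-way analysis of variance across the three genotype classes, and the F(2, ν) distribution has the closed-form tail probability P(F > f) = (1 + 2f/ν)^(−ν/2), so no statistical library is needed. The function names, the residual standard deviation of 1, the default marker allele frequency, and the reduced number of replicates are assumptions of this sketch and do not reproduce the exact settings of the study.

```python
import random

def anova_f(trait, groups):
    """One-way ANOVA F statistic with its degrees of freedom."""
    cells = {}
    for y, g in zip(trait, groups):
        cells.setdefault(g, []).append(y)
    n = len(trait)
    grand_mean = sum(trait) / n
    ss_between = ss_within = 0.0
    for values in cells.values():
        cell_mean = sum(values) / len(values)
        ss_between += len(values) * (cell_mean - grand_mean) ** 2
        ss_within += sum((y - cell_mean) ** 2 for y in values)
    df1, df2 = len(cells) - 1, n - len(cells)
    return (ss_between / df1) / (ss_within / df2), df1, df2

def f_critical_two_numerator_df(df2, level=0.05):
    """Exact upper critical value of F(2, df2), from the closed-form tail."""
    return (df2 / 2.0) * (level ** (-2.0 / df2) - 1.0)

def type_one_error(n_reps=1000, n=200, p_a1=0.5, q1=0.5, a=1.0, d=0.0,
                   error_sd=1.0, seed=1):
    """Share of replicates rejected at the 0.05 level under linkage equilibrium."""
    rng = random.Random(seed)
    effect = {2: a, 1: d, 0: -a}   # trait-locus effect by number of Q1 alleles
    rejections = usable = 0
    for _ in range(n_reps):
        trait, marker = [], []
        for _ in range(n):
            # Marker and trait locus are drawn independently (no association)
            q_count = (rng.random() < q1) + (rng.random() < q1)
            a_count = (rng.random() < p_a1) + (rng.random() < p_a1)
            trait.append(effect[q_count] + rng.gauss(0.0, error_sd))
            marker.append(a_count)
        if len(set(marker)) < 3:
            continue               # a genotype class is empty; skip the replicate
        f_stat, _, df2 = anova_f(trait, marker)
        usable += 1
        if f_stat > f_critical_two_numerator_df(df2):
            rejections += 1
    return rejections / usable

print(type_one_error(n_reps=500))
```

A rate close to 0.05 indicates that the test holds its nominal level; with only a few hundred replicates the estimate carries a Monte Carlo standard error of about one percentage point.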

For the test statistic Fm,a, the Table 2 results show that the type I error rates are around the 0.05 nominal significance level in all cases. Hence, the proposed model (9) is robust for data sets of sample size N = 200. For the test statistic Fm,ad, the type I error rates are around the 0.05 nominal level when m ≤ 5 for data sets of sample size N = 200. For m = 6 and N = 200, the type I error rates of Fm,ad are inflated in the dominant and recessive test cases (9.11 and 7.04%, respectively). This is partly due to the large number of degrees of freedom of Fm,ad when m = 6, JA − 1 = m(m + 1)/2 − 1 = 20; in addition, the inflation may also be related to the mode of inheritance, since it appears only in the dominant and recessive cases. When the sample size increases to N = 300, the type I error rates of Fm,ad are around the 0.05 nominal level for m = 6. Overall, model (8) is less robust than model (9).
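
The growth in degrees of freedom with the number of alleles can be tabulated directly. The sketch below uses JA − 1 = m(m + 1)/2 − 1 for the genotype effect test, as stated above, and assumes m − 1 numerator degrees of freedom for the additive effect test with a single marker; the function names are hypothetical.

```python
def genotype_model_df(m: int) -> int:
    """Numerator df of F_{m,ad}: J_A = m(m + 1)/2 genotypes, minus 1."""
    return m * (m + 1) // 2 - 1


def additive_model_df(m: int) -> int:
    """Numerator df assumed here for F_{m,a}: m alleles, minus 1."""
    return m - 1


if __name__ == "__main__":
    n = 200  # sample size used in most rows of Table 2
    for m in range(2, 7):
        df_ad = genotype_model_df(m)
        # The last column is a rough count of individuals per tested parameter.
        print(m, additive_model_df(m), df_ad, round(n / df_ad, 1))
```

With m = 6 and N = 200 there are only about 10 individuals per tested genotype parameter, which is consistent with the inflation seen for Fm,ad in that row of Table 2.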

In Table 3, two markers A and B are used in the analysis. The numbers m and n of alleles are equal to 2. The allele frequencies are given by Inline graphic and Inline graphic. In each test case, linkage equilibrium is assumed between the QTL Q and the markers A and B; i.e., Inline graphic. Denote Inline graphic, which is the measure of LD between A and B. Here P(A1B1) is the frequency of haplotype A1B1. Let

graphic file with name M102.gif (20)

be the measure of the third-order LD (Thomson and Baur 1984). Here P(A1Q1B1) is the frequency of haplotype A1Q1B1. Between marker A and marker B, two situations are considered: (1) linkage equilibrium, i.e., Inline graphic, and (2) linkage disequilibrium, i.e., Inline graphic. No linkage disequilibrium of third order is assumed among markers A and B and the QTL Q; that is, DAQB = 0. Again, 10,000 data sets are simulated for each test case, and each data set contains 200 individuals. The simulation is done as follows. First, the haplotype frequencies are calculated on the basis of allele frequencies and LD coefficients by relation (20) (Thomson and Baur 1984). Then data sets are simulated using the haplotype frequencies. On the basis of the F-test of either FAB,ad or FAB,a or the HTR method, type I error rates are calculated as the proportions of the 10,000 simulation data sets that give significant results at the 0.05 significance level. The Table 3 results show that the type I error rates are around the 0.05 nominal significance level in all cases. Hence, the proposed models (13) and (14) and the HTR method are robust for data sets of a sample size N = 200.
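
As an illustration of the first simulation step, the sketch below computes the eight haplotype frequencies at loci A, Q, and B from allele frequencies and LD coefficients. It uses the standard three-locus expansion in which the third-order coefficient is the part of P(A1Q1B1) not explained by allele frequencies and pairwise LD; we assume that relation (20) corresponds to this expansion. The function name and the numerical inputs (allele frequencies of 0.5) are hypothetical.

```python
from itertools import product


def three_locus_haplotype_freqs(p_a, p_q, p_b, d_aq, d_ab, d_qb, d_aqb):
    """Return {(a, q, b): frequency}, alleles coded 1 or 2 at loci A, Q, B.

    p_a, p_q, p_b are the frequencies of allele 1 at each locus; d_* are the
    pairwise and third-order LD coefficients defined for the 1-alleles.
    """
    freqs = {}
    for a, q, b in product((1, 2), repeat=3):
        # Frequency of the chosen allele and its sign (+1: allele 1, -1: allele 2).
        fa, sa = (p_a, 1) if a == 1 else (1 - p_a, -1)
        fq, sq = (p_q, 1) if q == 1 else (1 - p_q, -1)
        fb, sb = (p_b, 1) if b == 1 else (1 - p_b, -1)
        freqs[(a, q, b)] = (
            fa * fq * fb
            + fa * sq * sb * d_qb
            + fq * sa * sb * d_ab
            + fb * sa * sq * d_aq
            + sa * sq * sb * d_aqb
        )
    return freqs


if __name__ == "__main__":
    # A null scenario in the spirit of Table 3: the markers are in LD with
    # each other (D_AB = 0.08) but not with the QTL.
    freqs = three_locus_haplotype_freqs(0.5, 0.5, 0.5,
                                        d_aq=0.0, d_ab=0.08, d_qb=0.0, d_aqb=0.0)
    for hap, f in sorted(freqs.items()):
        print(hap, round(f, 4))
```

A set of LD coefficients is admissible only if all eight frequencies are nonnegative; the frequencies always sum to 1 by construction.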

TABLE 3.

Type I error rates (percentage) of test statistics FAB,ad, FAB,a, and FHTR of the haplotype trend regression (HTR) method at a 0.05 significance level when two markers A and B are used in the analysis

LD measure Inline graphic   Sample size   Test case   Error rate: FAB,ad   Error rate: FAB,a   Error rate: FHTR
0      N = 200   Null        4.90   5.22   5.39
0      N = 200   Additive    5.09   4.75   4.77
0      N = 200   Dominant    4.62   4.87   4.79
0      N = 200   Recessive   5.36   5.12   4.81
0.08   N = 200   Null        5.09   5.23   5.55
0.08   N = 200   Additive    4.92   4.74   4.71
0.08   N = 200   Dominant    4.63   4.84   4.71
0.08   N = 200   Recessive   5.04   5.02   4.94

The total variance is fixed as σ2 = 1.0 and the trait allele frequency is taken as q1 = q2 = 0.5. The numbers m and n of alleles = 2. The allele frequencies are given by Inline graphic and Inline graphic. Four test cases are considered: null, no major gene effect a = d = 0; additive, additive mode of inheritance a = 1, but no dominant effect d = 0; dominant, dominant mode of inheritance a = d = 1; recessive, recessive mode of inheritance a = 1 and d = –0.5. In each test case, linkage equilibrium is assumed between the QTL Q and the markers A and B; i.e., Inline graphic. No linkage disequilibrium of third order is assumed among markers A and B and the QTL Q; that is, DAQB = 0.

Table 4 shows type I error rates (percentages) of test statistics FABC,ad, FABC,a, and FHTR at a 0.05 significance level when three diallelic markers A, B, and C are used in the analysis. The measures DABC, DAQC, and DBQC of the third-order LD are defined in the same way as DAQB; the measure of the fourth order is defined accordingly (Bennett 1954). As with relation (20), the haplotype frequencies at the three markers A, B, and C and at QTL Q are calculated on the basis of allele frequencies and LD coefficients by Weir's (1996, p. 119) relation (3.14). Then data sets are simulated using the haplotype frequencies. Since this article is about population data, each individual carries two haplotypes, and each haplotype is sampled according to the haplotype frequencies. The Table 4 results show that the proposed models and the HTR method give correct type I error rates for data sets of sample size N = 200.
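
The sampling step just described can be sketched as follows. Each individual receives two haplotypes drawn independently from the haplotype frequencies, and the unphased genotype is then summarized by allele counts. The function names and the two-marker haplotype frequencies in the example are hypothetical and serve only as an illustration.

```python
import random


def simulate_individuals(hap_freqs, n_individuals, seed=None):
    """Draw two haplotypes per individual according to `hap_freqs`.

    `hap_freqs` maps a haplotype (a tuple of allele codes) to its frequency.
    """
    rng = random.Random(seed)
    haplotypes = list(hap_freqs)
    weights = [hap_freqs[h] for h in haplotypes]
    return [tuple(rng.choices(haplotypes, weights=weights, k=2))
            for _ in range(n_individuals)]


def allele_count(individual, locus, allele=1):
    """Number of copies (0, 1, or 2) of `allele` at `locus`; phase is ignored."""
    return sum(hap[locus] == allele for hap in individual)


if __name__ == "__main__":
    # Hypothetical haplotype frequencies for two diallelic markers A and B.
    hap_freqs = {(1, 1): 0.33, (1, 2): 0.17, (2, 1): 0.17, (2, 2): 0.33}
    sample = simulate_individuals(hap_freqs, n_individuals=200, seed=1)
    counts_a = [allele_count(ind, locus=0) for ind in sample]
    print(sum(counts_a) / (2 * len(sample)))  # sample frequency of allele A1
```

Only the allele counts enter the genotype-based models, which is why phase does not need to be resolved for them.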

TABLE 4.

Type I error rates (percentage) of test statistics FABC,ad, FABC,a, and FHTR of the haplotype trend regression (HTR) method at a 0.05 significance level when three diallelic markers A, B, and C are used in the analysis

LD measure Inline graphic   Sample size   Test case   Error rate: FABC,ad   Error rate: FABC,a   Error rate: FHTR
0.08   N = 200   Null        5.2    5.35   5.43
0.08   N = 200   Additive    4.98   4.85   4.74
0.08   N = 200   Dominant    4.31   4.68   4.62
0.08   N = 200   Recessive   5.29   5.3    5.27
0.06   N = 200   Null        5.24   5.41   5.39
0.06   N = 200   Additive    5.15   4.89   4.71
0.06   N = 200   Dominant    4.61   5.0    5.03
0.06   N = 200   Recessive   5.09   4.94   5.08

The total variance is fixed as σ2 = 1.0 and the trait allele frequency is taken as q1 = q2 = 0.5. The allele frequencies are given by Inline graphic, Inline graphic, and Inline graphic. Four test cases are considered: null, no major gene effect a = d = 0; additive, additive mode of inheritance a = 1, but no dominant effect d = 0; dominant, dominant mode of inheritance a = d = 1; recessive, recessive mode of inheritance a = 1 and d = –0.5. In each test case, linkage equilibrium is assumed between the QTL Q and the markers A, B, and C; i.e., Inline graphic. Moreover, neither third- nor fourth-order linkage disequilibrium is assumed; i.e., DABC = DAQB = DAQC = DBQC = DABCQ = 0.

Power calculation and comparison:

Let h2 = σga22 be the heritability. Figure 1 shows power curves of the test statistics F4,a, F4,ad, F2,a, and F2,ad against the disequilibrium coefficient Inline graphic for a dominant mode of inheritance a = d = 1.0 at a 0.05 significance level based on the approximations of noncentrality parameters λm,a and λm,ad. F4,a and F4,ad are calculated when A has four equal frequency alleles; i.e., Inline graphic. In addition, the measures of LD are given as follows: Figure 1, A and B, Inline graphic, and Figure 1, C and D, Inline graphic. F2,a and F2,ad are calculated by collapsing the four alleles to be two alleles: in Figure 1, A and C, alleles A1 and A2 are collapsed as one allele, and alleles A3 and A4 are collapsed to be the other; in Figure 1, B and D, alleles A1 and A3 are collapsed to be one allele, and alleles A2 and A4 are collapsed to be the other. For F2,a and F2,ad, a simple calculation can show that the measures of LD in Figure 1A are 0, 0; the measures of LD in Figure 1B are Inline graphic; the measures of LD in Figure 1C are 0, 0; and the measures of LD in Figure 1D are Inline graphic. Hence, the QTL Q is in linkage equilibrium with the marker after collapsing the alleles in Figure 1, A and C. The other parameters are q1 = 0.50, h2 = 0.25, N = 200.

Figure 1.


Power curves of the test statistics F4,ad, F4,a, F2,ad, and F2,a against the disequilibrium coefficient Inline graphic for a dominant mode of inheritance a = d = 1.0 at a 0.05 significance level. F4,ad and F4,a are calculated when marker A has four equal frequency alleles; i.e., Inline graphic. The measures of LD are (A and B) Inline graphic and (C and D) Inline graphic. F2,ad and F2,a are calculated by collapsing two of the four alleles: (A and C) alleles A1 and A2 are collapsed as one allele, and alleles A3 and A4 are collapsed to be the other; (B and D) alleles A1 and A3 are collapsed to be one allele, and alleles A2 and A4 are collapsed to be the other. The other parameters are q1 = 0.50, h2 = 0.25, N = 200.

From Figure 1, we may see the following:

  1. F4,ad is slightly less powerful than F4,a, and F2,ad is slightly less powerful than F2,a. This is because the test statistic Fm,ad has more degrees of freedom than Fm,a. Note that the noncentrality parameter approximation λm,ad of Fm,ad is given by Equation 10. The contribution of the dominance effect is Inline graphic, which depends on both the dominance effect d and the magnitude of the factor Inline graphic, and it can be significant when both are large enough. Hence, including a dominance component in the model can improve the power of QTL detection only when the magnitude of Inline graphic is large enough to compensate for the extra degrees of freedom. Note that the quantity Inline graphic is the product of the dominance variance Inline graphic and the measure RAQ4 of LD. The magnitude of Inline graphic is the result of the dominance variance Inline graphic reduced by a factor Inline graphic. Even when Inline graphic is large, Inline graphic can be small when the LD coefficients are not large; i.e., Inline graphic is small.

  2. When the measures of LD are high, the power of the test statistics is high. On the other hand, the power is minimal if all measures of LD are close to 0.

  3. The dependence of power on measures of LD can also be observed by comparing Figure 1A with Figure 1C, 1B with 1D. The power of F4,ad and F4,a in Figure 1A is higher than that of F4,ad and F4,a in Figure 1C, respectively; the power of each test statistic in Figure 1B is higher than that of the same test statistic in Figure 1D. This is because the measures of LD in Figure 1A are equal to or higher than those in Figure 1C, and the measures of LD in Figure 1B are equal to or higher than those in Figure 1D.

  4. In Figure 1B and Figure 1D, the power of F4,ad is slightly lower than that of F2,ad; the power of F4,a is slightly lower than that of F2,a.

  5. In Figure 1A and Figure 1C, the power of F2,ad and F2,a is minimal. This is because measures of LD are 0 after collapsing the alleles in these two graphs.
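
The effect of collapsing alleles on LD, noted in item 5, follows from the fact that LD coefficients add when alleles are merged: the coefficient between a merged allele and Q1 is the sum of the coefficients of its constituent alleles. The sketch below illustrates this with hypothetical LD values chosen to mimic the pattern described for Figure 1; they are not the values used in the figure.

```python
def collapse_ld(d_values, groups):
    """LD coefficients after merging alleles.

    `d_values[i]` is the LD coefficient between marker allele A_{i+1} and the
    trait allele Q1; `groups` lists the allele indices merged into each new
    allele. The merged coefficient is the sum over the group.
    """
    return [sum(d_values[i] for i in group) for group in groups]


if __name__ == "__main__":
    # Hypothetical coefficients for a four-allele marker (they sum to zero,
    # as LD coefficients across all alleles of a marker must).
    d = [0.05, -0.05, 0.03, -0.03]
    print(collapse_ld(d, [(0, 1), (2, 3)]))  # A1+A2 versus A3+A4
    print(collapse_ld(d, [(0, 2), (1, 3)]))  # A1+A3 versus A2+A4
```

Under these values the first grouping cancels the LD entirely, whereas the second preserves it, which matches the contrast between panels A and C and panels B and D.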

Figure 2 shows power curves of the test statistics F4,a, F4,ad, F3,a, and F3,ad against the disequilibrium coefficient Inline graphic for a dominant mode of inheritance a = d = 1.0 at a 0.05 significance level. F4,a and F4,ad are calculated as in Figure 1. F3,a and F3,ad are calculated by collapsing two of the four alleles into a new allele: in Figure 2, A and C, alleles A1 and A2 are collapsed into a new one; in Figure 2, B and D, alleles A1 and A3 are collapsed into a new one. For F3,a and F3,ad, a simple calculation shows that the measures of LD in Figure 2A are Inline graphic; the measures of LD in Figure 2B are Inline graphic; the measures of LD in Figure 2C are Inline graphic; and the measures of LD in Figure 2D are Inline graphic. In addition to the features shown in Figure 1, it can be seen that in Figure 2, A and C, the power of F4,ad is higher than that of F3,ad, and the power of F4,a is higher than that of F3,a. In Figure 2, B and D, the power of F4,ad is slightly lower than that of F3,ad, and the power of F4,a is slightly lower than that of F3,a. Hence, the way the alleles are collapsed affects power.

Figure 2.


Power curves of the test statistics F4,a, F4,ad, F3,a, and F3,ad, against the disequilibrium coefficient Inline graphic for a dominant mode of inheritance a = d = 1.0 at a 0.05 significance level. F4,ad and F4,a are calculated when marker A has four equal frequency alleles; i.e., Inline graphic. The measures of LD are the same as those in Figure 1. F3,ad and F3,a are calculated by collapsing two of the four alleles: (A and C) alleles A1 and A2 are collapsed as a new one; (B and D) alleles A1 and A3 are collapsed to be a new one. The other parameters are q1 = 0.50, h2 = 0.25, N = 200.

From Figures 1 and 2, we may see that the power of F4,a and F4,ad is relatively stable, although it may be slightly lower than that of F3,a, F3,ad, F2,a, and F2,ad in certain circumstances. In contrast, the power of F3,a, F3,ad, F2,a, and F2,ad depends heavily on how the alleles are collapsed. This shows the advantage of using multiallelic markers in an association study of QTL detection. For multiallelic marker data, the proposed test statistics Fm,a and Fm,ad can be used directly to test whether there is association between the marker and the QTL. As shown in Figures 1 and 2, the test statistic Fm,a is usually more powerful than Fm,ad because Fm,ad has more degrees of freedom.

Figure 3 shows power curves of the test statistics F4,a and F4,ad against the heritability h2 at a 0.05 significance level for a dominant mode of inheritance a = d = 1.0 and for a recessive mode of inheritance a = 1.0, d = −0.5, respectively. As with Figures 1 and 2, Figure 3 is based on noncentrality parameter approximations (10) and (11). In Figure 3, A and B, the power can be high when the heritability h2 > 0.1; in these two graphs, the measures of LD are given by Inline graphic. In Figure 3, C and D, the power can be high when the heritability h2 > 0.15; in these two graphs, the measures of LD are given by Inline graphic. Figure 4 shows power curves of the test statistics F4,a and F4,ad against the trait allele frequency q1 or marker allele frequency Inline graphic at a 0.05 significance level. It can be seen that the power depends on both the measures of linkage disequilibrium and the trait allele frequency q1 or marker allele frequency Inline graphic.

Figure 3.


Power curves of the test statistics F4,a and F4,ad against the heritability h2 at a 0.05 significance level. (A and C) The curves are plotted for a dominant mode of inheritance a = d = 1.0; (B and D) the curves are plotted for a recessive mode of inheritance a = 1.0, d = −0.5. F4,a and F4,ad are calculated when marker A has four equal frequency alleles; i.e., Inline graphic. The measures of LD are given as follows: (A and B) Inline graphic; (C and D) Inline graphic. The other parameters are q1 = 0.50 and N = 250.

Figure 4.


Power curves of the test statistics F4,a and F4,ad against the trait allele frequency q1 or allele frequency Inline graphic at a 0.05 significance level. (A and C) The curves are plotted for a dominant mode of inheritance a = d = 1.0; (B and D) the curves are plotted for a recessive mode of inheritance a = 1.0, d = −0.5. (A and B) The parameters are given by Inline graphic, Inline graphic; (C and D) the parameters are given by Inline graphic, Inline graphic. The other parameters are h2 = 0.15 and N = 250.

Comparison with the haplotype trend regression method:

Assume that the two diallelic markers A and B are used in the analysis. Figures 5 and 6 show power curves of the test statistics FAB,a, FHTR, and FAB,ad against the heritability h2 at a 0.05 significance level. The related parameters are given in the figure legends. The power curves of the test statistics FAB,a, FHTR, and FAB,ad are calculated on the basis of approximations of noncentrality parameters λABa, λHTR, and λABad.

Figure 5.


Power curves of the test statistics FAB,a and FAB,ad and FHTR of the haplotype trend regression method against the heritability h2 at a 0.05 significance level, when two diallelic markers A and B are used in the analysis. (A and C) The curves are plotted for a dominant mode of inheritance a = d = 1.0; (B and D) the curves are plotted for an additive mode of inheritance a = 1.0, d = 0. (A and B) The parameters are given by Inline graphic; (C and D) the parameters are given by Inline graphic. The other parameters are Inline graphic and N = 200.

Figure 6.


Power curves of the test statistics FAB,a and FAB,ad and FHTR of the haplotype trend regression method against the heritability h2 at a 0.05 significance level, when two diallelic markers A and B are used in analysis. All parameters are the same as those in Figure 5 except that (A and B) Inline graphic and (C and D) Inline graphic.

In Figure 5, no third-order linkage disequilibrium is assumed; i.e., DAQB = 0. In Figure 6, A and B, weak third-order linkage disequilibrium is assumed; i.e., DAQB = 0.025. It can be seen that the genotype effect model can be less powerful than the HTR method, and the HTR method can be less powerful than the additive effect model in the case of no or weak third-order linkage disequilibrium among the two markers and the QTL (Figure 5 and Figure 6, A and B). In Figure 6, C and D, strong third-order linkage disequilibrium is assumed; i.e., DAQB = 0.065. In the case that strong third-order linkage disequilibrium exists, the HTR method can be more powerful (Figure 6, C and D).

Note the following fact: in Figure 6, A and B, the maximum of DAQB is 0.025; in Figure 6, C and D, the maximum of DAQB is 0.065 (otherwise, some of the haplotypes would have negative frequencies). Thus, the power curves of the haplotype trend regression method in Figures 5 and 6 represent the two extreme situations: (1) no third-order linkage disequilibrium (Figure 5) and (2) strongest third-order linkage disequilibrium (Figure 6). In practice, the third-order linkage disequilibrium would lie between these two extremes, and the power of the haplotype trend regression method should lie between the corresponding powers. Note that the proposed genotype effect model and additive effect model utilize only the second-order, or pairwise, linkage disequilibrium. Hence, the powers of FAB,a and FAB,ad are the same in Figures 5 and 6.
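
The bound on DAQB mentioned above comes from requiring all eight three-locus haplotype frequencies to be nonnegative. A minimal sketch of that calculation is given below, assuming the standard three-locus expansion of haplotype frequencies in terms of allele frequencies, pairwise LD, and third-order LD. The function name and the input values are hypothetical, since the parameter values of Figure 6 are not reproduced here.

```python
from itertools import product


def third_order_ld_bounds(p_a, p_q, p_b, d_aq, d_ab, d_qb):
    """Admissible interval (lower, upper) for the third-order LD coefficient.

    Each haplotype frequency equals a `base` term (allele frequencies and
    pairwise LD) plus or minus the third-order coefficient, so nonnegativity
    of all eight frequencies gives one inequality per haplotype.
    """
    lower, upper = float("-inf"), float("inf")
    for sa, sq, sb in product((1, -1), repeat=3):  # +1: allele 1, -1: allele 2
        fa = p_a if sa == 1 else 1 - p_a
        fq = p_q if sq == 1 else 1 - p_q
        fb = p_b if sb == 1 else 1 - p_b
        base = (fa * fq * fb
                + fa * sq * sb * d_qb
                + fq * sa * sb * d_ab
                + fb * sa * sq * d_aq)
        if sa * sq * sb == 1:
            lower = max(lower, -base)  # base + D >= 0
        else:
            upper = min(upper, base)   # base - D >= 0
    return lower, upper


if __name__ == "__main__":
    # Hypothetical inputs: equal allele frequencies and equal pairwise LD.
    print(third_order_ld_bounds(0.5, 0.5, 0.5, d_aq=0.1, d_ab=0.1, d_qb=0.1))
```

If the returned lower bound exceeds the upper bound, the pairwise LD coefficients themselves are not jointly admissible for the given allele frequencies.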

Figure 7 shows power curves of the test statistics FABC,a and FABC,ad and FHTR against the heritability h2 at a 0.05 significance level, when three diallelic markers A, B, and C are used in the analysis. The related parameters are given in the figure legend. From Figure 7, it can be seen that the power of FHTR is the lowest. This is due to the large number of degrees of freedom of FHTR, which is F(7, N − 8), N = 200. In contrast, FABC,a is F(3, N − 4), N = 200; and FABC,ad is F(6, N − 7), N = 200. The low power of FHTR is most likely due to the biallelic QTL situation that we consider. In the situation of multiple QTL haplotypes and strong LD between QTL and marker haplotypes, the haplotype-based methods are expected to have good power.
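
The cost of extra degrees of freedom can be illustrated with the noncentral F-distribution. The sketch below computes the power of an F-test from a given noncentrality parameter and compares the three sets of degrees of freedom quoted above. The noncentrality value is hypothetical and is held equal across the three tests purely to isolate the effect of the degrees of freedom; in the article each test has its own noncentrality parameter approximation. SciPy is assumed to be available, and the function name is hypothetical.

```python
from scipy import stats


def f_test_power(noncentrality, df_num, df_den, alpha=0.05):
    """Power of an F-test: P(noncentral F exceeds the central critical value)."""
    crit = stats.f.ppf(1 - alpha, df_num, df_den)
    return stats.ncf.sf(crit, df_num, df_den, noncentrality)


if __name__ == "__main__":
    n = 200
    lam = 12.0  # hypothetical noncentrality, identical for all three tests
    for label, df_num in (("F_ABC,a", 3), ("F_ABC,ad", 6), ("F_HTR", 7)):
        print(label, round(f_test_power(lam, df_num, n - df_num - 1), 3))
```

For a fixed noncentrality parameter, the power falls as the numerator degrees of freedom grow, so a test with more parameters needs a correspondingly larger noncentrality parameter to remain competitive.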

Figure 7.


Power curves of the test statistics FABC,a and FABC,ad and FHTR of the haplotype trend regression method against the heritability h2 at a 0.05 significance level, when three diallelic markers A, B, and C are used in the analysis. (A and C) The curves are plotted for a dominant mode of inheritance a = d = 1.0; (B and D) the curves are plotted for an additive mode of inheritance a = 1.0, d = 0. (A and B) The parameters are given by DAQ = DBQ = DCQ = DDQ = 0.12, DAB = DAC = DBC = 0.08; (C and D) the parameters are given by DAQ = DBQ = DCQ = DDQ = 0.10, DAB = DAC = DBC = 0.06. Neither third- nor fourth-order linkage disequilibrium is assumed among markers and the QTL. The other parameters are Inline graphic and N = 200.

Comparison based on ACE haplotype frequencies:

To work on more realistic scenarios, we take the haplotype information of ACE genes as an example. Ten diallelic polymorphisms in the ACE gene spanning 26 kb were genotyped (Keavney et al. 1998). The order of the 10 polymorphisms is T-5991C, A-5466C, T-3892C, A-240T, T-93C, T1237C, G2215A, I/D, G2350A, and 4656(CT)3/2. Table 5 lists 10 haplotypes, where the first 7 are the most frequent haplotypes (http://www.well.ox.ac.uk/∼mfarrall/oxhap_freq.html). For the 10 haplotypes, allele I at marker I/D is always present with allele A at marker G2350A, and allele D at marker I/D is always present with allele G at marker G2350A. Hence, the two markers can be treated as one. Similarly, markers T-5991C and A-5466C can be treated as one; and markers A-240T and T-93C can be treated as one. Therefore, the 10 haplotypes can be considered as containing seven markers.

TABLE 5.

Ranked ACE haplotype frequencies

Haplotype rank Haplotype identity Haplotype code Frequency
1 TATATTGIA3 1111112111 0.352113
2 CCCTCCADG2 2222221222 0.284507
3 TATATCADG2 1111121222 0.087324
4 TACATCADG2 1121121222 0.073239
5 TATATCGIA3 1111122111 0.050704
6 CCCTCCGDG2 2222222222 0.025352
7 TATATTAIA3 1111111111 0.025352
8 CCCTCCGIA3 2222222111 0.008451
9 CCCTCCADG3 2222221221 0.008451
10 TATATCGDG2 1111122222 0.008451
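
The redundancy among markers noted before Table 5 can be checked directly from the haplotype codes. The short sketch below tests whether two markers carry the same allele code on every one of the 10 listed haplotypes; the function name is hypothetical, and marker positions are 0-based in the order given in the text.

```python
# Haplotype codes from Table 5, in rank order.
CODES = [
    "1111112111", "2222221222", "1111121222", "1121121222", "1111122111",
    "2222222222", "1111111111", "2222222111", "2222221221", "1111122222",
]

MARKERS = ["T-5991C", "A-5466C", "T-3892C", "A-240T", "T-93C",
           "T1237C", "G2215A", "I/D", "G2350A", "4656(CT)3/2"]


def same_pattern(i, j, codes=CODES):
    """True if markers i and j have identical codes on every haplotype."""
    return all(code[i] == code[j] for code in codes)


if __name__ == "__main__":
    for i, j in ((7, 8), (0, 1), (3, 4), (0, 2)):
        print(MARKERS[i], MARKERS[j], same_pattern(i, j))
```

Markers with identical patterns carry the same information on these haplotypes, so only one marker of each such pair needs to enter a model.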

In Abecasis et al. (2000a,b) and Fan et al. (2005), it is found that markers I/D and G2350A show the strongest association with the circulating ACE level. Thus, markers I/D and G2350A are treated as a putative trait locus Q. A quantitative trait of the putative locus Q is simulated for each graph in Figure 8, A–D. The empirical power curves of the test statistics FHTR, Fa, and Fad are plotted against the heritability h2 at a 0.05 significance level in Figure 8. Here Fa is the test statistic based on the additive effect model, and Fad is the test statistic based on the genotype effect model. The empirical power curves SFHTR, SFa, and SFad in Figure 8 are calculated as follows. First, the interval (0.01, 0.25) of the heritability h2 is divided into 24 subintervals, which gives 25 end points. For each end point, there is a set of parameters for the power curve. Using the set of parameters, 2500 data sets are simulated for each end point. For each data set, empirical statistics of FHTR, Fa, and Fad are calculated. The simulated power is the proportion of the 2500 simulated data sets for which the empirical statistic exceeds the critical value of the corresponding F-distribution at the 0.05 significance level.
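
The bookkeeping of this procedure is simple enough to sketch. The code below builds the 25 heritability end points and computes an empirical power as the proportion of simulated statistics above a critical value. The function names are hypothetical, and the statistics and critical value in the example are placeholders rather than simulated results.

```python
def heritability_grid(lo=0.01, hi=0.25, n_subintervals=24):
    """End points obtained by dividing (lo, hi) into equal subintervals."""
    step = (hi - lo) / n_subintervals
    return [lo + i * step for i in range(n_subintervals + 1)]


def empirical_power(f_statistics, critical_value):
    """Proportion of simulated F-statistics that exceed the critical value."""
    return sum(f > critical_value for f in f_statistics) / len(f_statistics)


if __name__ == "__main__":
    grid = heritability_grid()
    print(len(grid), round(grid[0], 2), round(grid[-1], 2))
    # Placeholder statistics; in the study there are 2500 per end point.
    print(empirical_power([1.0, 4.2, 5.0, 0.3], critical_value=3.9))
```

With 24 subintervals of (0.01, 0.25), consecutive end points differ by 0.01 in heritability.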

Figure 8.


Empirical power curves of the test statistics FHTR, Fa, and Fad against the heritability h2 at a 0.05 significance level. The notation SFa is the empirical power of the F-test statistic based on the additive effect model, SFad is the empirical power of the F-test statistic based on the genotype effect model, and SFHTR is the empirical power of the F-test statistic based on HTR. (A and C) The curves are plotted for a dominant mode of inheritance a = d = 1.0; (B and D) the curves are plotted for an additive mode of inheritance a = 1.0, d = 0. (A and B) Ten haplotypes are used in the simulations; (C and D) 7 haplotypes are used.

In Figure 8, A and C, the curves are plotted for a dominant mode of inheritance a = d = 1.0; in Figure 8, B and D, the curves are plotted for an additive mode of inheritance a = 1.0, d = 0. In Figure 8, A and B, all 10 haplotypes are used in the simulations; in Figure 8, C and D, only the first 7 most frequent haplotypes are used. From Figure 8, A–D, it can be seen that the proposed additive effect model has similar power to that of the HTR method. In Figure 8, A and C, when the dominance effects are present, the genotype effect model has similar power to those of the additive effect model and the HTR method. In Figure 8, B and D, the genotype effect model is less powerful because of the absence of the dominance effect. Hence, the genotype effect model can be useful only if the dominance effect can compensate for the extra degrees of freedom.

Simulation study:

To evaluate the accuracy of the noncentrality parameter approximations, we performed simulations for the power curves in Figures 1, 2, 5, 6, and 7. The results are presented as supplemental information (http://www.genetics.org/supplemental/). It can be seen that the approximations are excellent.

DISCUSSION

In this article, two models, the genotype effect model and the additive effect model, are proposed for high-resolution association mapping of QTL on the basis of population data. The two models extend our previous research, which is based on multiple diallelic markers (Fan and Xiong 2002, 2003; Jung et al. 2005). The genotype effect model is closely linked to the measured genotype approach (Boerwinkle et al. 1986). Popular genetics software such as Mendel 5.0 can already perform association mapping of QTL by the additive effect model (Cantor et al. 2005; Lange et al. 2005). Surprisingly, no previous research shows theoretically why these two models are valid methods for association mapping of QTL under a normal distribution, and there are no existing analytical formulas to evaluate the power of the related test statistics. This article shows that the model coefficients are functions of measures of LD, and thus related F-test statistics can be constructed for association studies of QTL. In the presence of both additive and dominance effects of the QTL, either the Fm,ad (or FAB,ad) statistic or the Fm,a (or FAB,a) statistic can be used. Since the Fm,ad (or FAB,ad) test statistic has more degrees of freedom than Fm,a (or FAB,a), Fm,a (or FAB,a) can be more powerful. If the magnitude of Inline graphic is large enough to compensate for the extra degrees of freedom of the Fm,ad test, Fm,ad can be more powerful than Fm,a.

The formulas of noncentrality parameter approximations (10) and (11) clearly indicate the dependence of the power on the quantity RAQ2 for genetic data. That is, the noncentrality parameter of test statistics of the null hypothesis of no genetic effects is reduced by a factor of Inline graphic for additive variance and by a factor of Inline graphic for dominance variance. If only one diallelic marker A is used in the analysis, both our previous research and the work of colleagues have derived similar formulas to support this argument (Sham et al. 2000; Fan and Xiong 2002, 2003; Fan and Jung 2003; Fan et al. 2005; Jung et al. 2005). This is a good example in the debate on appropriate measures of LD for markers or multiallelic markers (Hedrick 1987; Devlin and Risch 1995; Pritchard and Przeworski 2001; Weiss and Clark 2002). For multiallelic markers or haplotypes, a satisfactory measure of LD has not been derived, as noted on p. 306 of Ardlie et al. (2002). For two diallelic loci A and Q, Ardlie et al. (2002) favor using Inline graphic, which is the correlation of alleles at the two loci. For multiallelic marker data, this article extends previous research by providing the definition of RAQ2 and deriving Equations 10 and 11. Hayes et al. (2003) introduced a multilocus approach for estimating LD and past effective size and used chromosome segment homozygosity (CSH), which was introduced in Sved (1971). The dependence of the noncentrality parameter on the quantity Inline graphic has been indicated by our study and also by Sham et al. (2000).

In Fulker et al. (1999), Abecasis et al. (2000a,b, 2001), and Sham et al. (2000), an association between-family and association within-family (“AbAw”) approach is proposed to decompose the genetic association into effects of between pairs and within pairs on the basis of variance component models. The AbAw approach is based on any single diallelic marker. Instead of using a single diallelic marker, we have proposed variance component models using multiple diallelic markers. In our models, the association is decomposed into additive and dominance components (Fan and Xiong 2002, 2003; Fan and Jung 2003; Fan et al. 2005; Jung et al. 2005). In Fan and Jung (2003), Fan et al. (2005), and Jung et al. (2005), we compare our method with the AbAw approach and find that our method is advantageous over the AbAw approach. In model (1) or (2), only one marker is used in model building. If multiple markers or multiallelic markers are available, it is very easy to generalize the models to analyze the data. For instance, model (14) generalizes model (1) if two markers are available in the analysis. Accordingly, model (13) generalizes model (2). If only one marker is used in analysis, the proposed model (2) is equivalent to the haplotype trend regression method by Zaykin et al. (2002), which is very close to the method of Schaid et al. (2002). However, the proposed models are different from the haplotype trend regression method for two/multiple marker data. If both markers are diallelic markers, the genotype effect model can be less powerful than the HTR method, and the HTR method can be less powerful than the additive effect model in the case of no or weak third-order linkage disequilibrium among the two markers and the QTL. If strong third-order linkage disequilibrium exists, the HTR method can be more powerful.

Basically, the proposed models are genotype based. The models can be used to analyze directly any number of markers, and the markers can be either diallelic or multiallelic. By a simulation study based on ACE haplotype frequencies, we show that the proposed additive effect models have similar power to that of the haplotype-based HTR method. In the meantime, the proposed models enjoy the simplicity of not needing to estimate the expected haplotype scorings; in contrast, the HTR method needs to calculate the expected haplotype scorings before building the models. The proposed models decompose the main marker effects into a summation of additive and dominance effects. In the presence of haplotype effects, it is important to estimate the haplotype effects and haplotype-based methods are more relevant (Stram et al. 2003; Tregouet et al. 2004).

One potential problem of this generalization is that the number of parameters can be very big. Then, one needs to select important alleles in the analysis and search for important genetic variants that are truly associated with the genetic traits. At first glance, model (1), (2), (13), or (14) seems too complicated and contains too many terms. However, the models are not intimidating at all if one takes into account the recent discovery of haplotype structure in the human genome. Although a haplotype block may contain many SNPs, it takes only a few SNPs to uniquely identify each of the haplotypes in the block. Within a block, there are only two to four common haplotypes (Arnheim et al. 2003; Daly et al. 2001; Goldstein 2001; Patil et al. 2001; Reich et al. 2001; Rioux et al. 2001; J. C. Stephens et al. 2001; Gabriel et al. 2002; Nordborg and Tavaré 2002; Phillips et al. 2003). This implies that model (1), (2), (13), or (14) contains a few terms and hence is manageable. Moreover, model (1) or (2) already takes the haplotype structure into account and is potentially more powerful. In practice, one may want to collapse some alleles to reduce the number of parameters. However, the collapsing process may decrease linkage disequilibrium and therefore result in loss of power. The proposed regression models can be fitted to alleviate the problem.

In the mathematical derivations, we make the assumption of HWE. It is unclear how to construct tests reflecting deviations from HWE and this requires further research. In addition, we illustrate that the false-positive rate of the genotype effect test is too high for more than five alleles in a sample of 200 individuals. This is obviously due to the large numbers of possible genotypes and hence to sparseness in the contingency table. This problem could be overcome by using exact tests or permutation procedures.

The models of this article are based on population data. Suppose that both population and pedigree data including sibships are available. Then, model (1) or (2) can be generalized to perform high-resolution combined LD mapping and a linkage study of QTL by variance component models in the spirit of our previous work. In fact, we may generalize regression (1) or (2) by adding the polygenic effect to fit the data. Moreover, log-likelihoods can be constructed on the basis of variance component models. This will generalize our research by using either diallelic/multiallelic markers or haplotypes in a combined analysis of population and pedigree data. It is well known that association studies based on population data are prone to false positives because of population stratification and population history. A valid approach would be to first use linkage information from pedigree data to locate the QTL in a broad chromosome region. Then, a combined linkage and association mapping can be performed for fine mapping of the genetic traits on the basis of both population and pedigree data (Fan and Xiong 2003). This would be more likely to overcome the drawbacks of separate analysis by either a linkage study or association mapping: low resolution of linkage analysis and high false-positive rates in the association study. At the same time, it is more likely to take advantage of the strengths of the two methods: the low false-positive rates of linkage analysis and the high resolution of the association-mapping method.

Acknowledgments

We thank two anonymous reviewers for their very detailed and thoughtful critiques, which improved the article. R. Fan was supported by National Science Foundation grant DMS-0505025.

APPENDIX A

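For an individual in the population with trait value $y$ and genotype $G_A$ at marker $A$, let $x_{ii}$ be the indicator function of genotype $A_iA_i$ and $x_{ij}$ the indicator function of genotype $A_iA_j$. That is, they are dummy variables defined by

$$x_{ii} = \begin{cases} 1 & \text{if } G_A = A_iA_i \\ 0 & \text{otherwise,} \end{cases} \qquad x_{ij} = \begin{cases} 1 & \text{if } G_A = A_iA_j \\ 0 & \text{otherwise,} \end{cases}$$

where $i, j = 1, 2, \ldots, m$, $i < j$. Then model (1) can be rewritten as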

$$y = w\gamma + \sum_{i=1}^{m} x_{ii}\beta_{ii} + \sum_{i<j} x_{ij}\beta_{ij} + e \qquad \text{(A1)}$$
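The indicator coding can be made concrete with a short Python sketch. It assumes alleles are labeled by the integers 1, …, m and that the columns are ordered as homozygotes first and then heterozygotes with i < j; the helper names `genotype_labels` and `indicator_row` are hypothetical.

```python
from itertools import combinations


def genotype_labels(m):
    """Column order: homozygotes (1,1)..(m,m), then heterozygotes (i,j), i < j."""
    homozygotes = [(i, i) for i in range(1, m + 1)]
    heterozygotes = list(combinations(range(1, m + 1), 2))
    return homozygotes + heterozygotes


def indicator_row(genotype, m):
    """Return the 0/1 vector (x_11, ..., x_mm, x_12, ..., x_{m-1,m})."""
    i, j = sorted(genotype)  # unordered genotype: A_iA_j equals A_jA_i
    return [1 if label == (i, j) else 0 for label in genotype_labels(m)]
```

Each individual has exactly one nonzero indicator, and a marker with m alleles gives m(m + 1)/2 columns, one per genotype.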

Note that Inline graphic. Given Equation A1, taking the expectation of $yx_{ii}$ leads to Inline graphic. On the other hand, a true random-effect model describing the trait value is $y = w\gamma + g + e$, where

graphic file with name M184.gif

Utilizing Inline graphic and Inline graphic gives

graphic file with name M187.gif (A2)

Equating the above quantity to Inline graphic shows Equation 3 when i = j.

If $i < j$, Inline graphic. Multiplying both sides of Equation A1 by $x_{ij}$ and taking the expectation leads to $E(yx_{ij}) = E(x_{ij})[w\gamma + \beta_{ij}]$. Again, utilizing Inline graphic, Inline graphic, Inline graphic, and Inline graphic gives

graphic file with name M194.gif (A3)

Equating the above quantity to Inline graphic shows Equation 3 when $i < j$.

APPENDIX B

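For an individual with trait value $y$ and genotype $G_A$ at marker $A$, let $z_i$ be the number of copies of allele $A_i$ in genotype $G_A$, $i = 1, 2, \ldots, m$. That is, $z_i$ is a dummy variable defined by

$$z_i = \begin{cases} 2 & \text{if } G_A = A_iA_i \\ 1 & \text{if } G_A = A_iA_j,\ j \neq i \\ 0 & \text{otherwise.} \end{cases}$$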

Then model (2) can be rewritten as

$$y = w\gamma + \sum_{i=1}^{m} z_i\alpha_i + e \qquad \text{(B1)}$$
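A minimal Python sketch of the allele-count coding follows. It assumes alleles are labeled 1, …, m and that the genetic part of the additive effect model is the sum of z_i times an allele effect alpha_i; the function names and the effect values in the usage line are hypothetical.

```python
def allele_counts(genotype, m):
    """Return (z_1, ..., z_m): copies of each allele A_i in the genotype."""
    counts = [0] * m
    for allele in genotype:  # genotype is a pair of allele indices in 1..m
        counts[allele - 1] += 1
    return counts


def additive_mean(genotype, alpha):
    """Genetic part of the additive model: sum over i of z_i * alpha_i."""
    z = allele_counts(genotype, len(alpha))
    return sum(z_i * a_i for z_i, a_i in zip(z, alpha))


# Heterozygote A_1A_3 with made-up allele effects: one copy each of A_1 and A_3.
example = additive_mean((1, 3), [0.5, 1.0, -0.25])
```

Since every genotype carries two alleles, the counts always sum to 2, so the columns of z are linearly dependent on an intercept term.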

Multiplying both sides of expression (B1) by zi and taking the expectation lead to

graphic file with name M198.gif (B2)

The elements of the matrix on the left-hand side of the above equation can be calculated as follows: Inline graphic. For $i < j$, the expectation Inline graphic. For the elements on the right-hand side, Equations A2 and A3 lead to Inline graphic, since Inline graphic. Plugging the above quantities into matrix Equation B2 gives Equation 4 as

graphic file with name M203.gif

where diag(…) denotes a diagonal matrix; e.g., Inline graphic is

graphic file with name M205.gif

In the above calculation, we use the matrix inversion identity $(M + ab^\tau)^{-1} = M^{-1} - (M^{-1}a)(b^\tau M^{-1})/(1 + b^\tau M^{-1}a)$.
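To see the inversion identity at work, note that under HWE the two alleles of an individual are independent draws from the allele frequencies, so $E(z_i^2) = 2p_i + 2p_i^2$ and $E(z_iz_j) = 2p_ip_j$ for $i \neq j$. Here $p_i$ is our shorthand for the frequency of allele $A_i$; the symbol and the function names below are assumptions for illustration and not the article's notation. The moment matrix is then a diagonal matrix plus a rank-one term, and the Python sketch below checks the resulting closed-form inverse numerically.

```python
from itertools import combinations_with_replacement


def second_moments(p):
    """E(z z^T) under HWE, computed by enumerating all unordered genotypes."""
    m = len(p)
    moment = [[0.0] * m for _ in range(m)]
    for i, j in combinations_with_replacement(range(m), 2):
        prob = p[i] ** 2 if i == j else 2 * p[i] * p[j]
        z = [0] * m
        z[i] += 1
        z[j] += 1
        for r in range(m):
            for c in range(m):
                moment[r][c] += prob * z[r] * z[c]
    return moment


def sherman_morrison_inverse(p):
    """Inverse of 2*diag(p) + 2*p*p^T via the rank-one update identity.

    With M = 2*diag(p), a = 2p and b = p: M^{-1}a is a vector of ones,
    b^T M^{-1} is a vector of halves, and b^T M^{-1} a = sum(p) = 1, so the
    inverse is diag(1/(2 p_i)) minus 1/4 in every entry.
    """
    m = len(p)
    return [[(1 / (2 * p[r]) if r == c else 0.0) - 0.25 for c in range(m)]
            for r in range(m)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]
```

Multiplying `second_moments(p)` by `sherman_morrison_inverse(p)` for any frequency vector that sums to 1 should return the identity matrix up to rounding error.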

APPENDIX C

Denote a vector Inline graphic. If the sample size N is large enough, the law of large numbers implies the approximation

graphic file with name M207.gif (C1)

where Inline graphic is a diagonal matrix, whose elements on the diagonal are given by the elements of Inline graphic. That is, if Inline graphic, then Inline graphic. Let H be a (JA − 1) × JA matrix defined by

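$$H = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}.$$

Then, $(H\eta)^\tau = (\beta_{11} - \beta_{22}, \ldots, \beta_{11} - \beta_{mm}, \beta_{11} - \beta_{12}, \ldots, \beta_{11} - \beta_{1m}, \ldots, \beta_{11} - \beta_{m-1,m})$. From approximation (C1), we have the approximation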

graphic file with name M213.gif
graphic file with name M214.gif

where Inline graphic. Applying the matrix inversion identity $(M + ab^\tau)^{-1} = M^{-1} - (M^{-1}a)(b^\tau M^{-1})/(1 + b^\tau M^{-1}a)$ again, we have

graphic file with name M216.gif

The noncentrality parameter is given by

graphic file with name M217.gif (C2)

From Equation 3, we have

graphic file with name M218.gif

Utilizing relation Inline graphic, we have

graphic file with name M220.gif

Plugging the above equation into (C2), we have

graphic file with name M221.gif

Note that Inline graphic, and so Inline graphic. Hence, the noncentrality parameter approximation (10) is valid.

APPENDIX D

The law of large numbers implies the following approximation:

graphic file with name M224.gif

In the above approximation, the quantities $E(z_iz_j)$ in Appendix B are used. Applying the matrix inversion identity $(M + ab^\tau)^{-1} = M^{-1} - (M^{-1}a)(b^\tau M^{-1})/(1 + b^\tau M^{-1}a)$, the inverse is

graphic file with name M225.gif

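Let $K$ be an $(m - 1) \times m$ matrix defined by

$$K = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}.$$

Then, $(K\psi)^\tau = (\alpha_1 - \alpha_2, \ldots, \alpha_1 - \alpha_m)$. On the other hand, we have the approximation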

graphic file with name M227.gif

whose inverse is given by

graphic file with name M228.gif

Therefore, an approximation of the noncentrality parameter is given by

graphic file with name M229.gif

Equation 4 implies that Inline graphic. Thus, the noncentrality parameter

graphic file with name M231.gif
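The products $(H\eta)^\tau$ and $(K\psi)^\tau$ in Appendixes C and D list the differences between the first coefficient and each remaining one, which is what a matrix of the form $[\mathbf{1} \mid -I]$ produces. The Python sketch below builds such a matrix for any length and applies it to a coefficient vector; it assumes that form, and the function names and the numeric coefficients in the usage line are hypothetical.

```python
def first_vs_rest_contrast(size):
    """(size-1) x size matrix [1 | -I]: row r contrasts entry 0 with entry r+1."""
    rows = []
    for r in range(1, size):
        row = [0] * size
        row[0] = 1
        row[r] = -1
        rows.append(row)
    return rows


def apply_contrast(matrix, vector):
    """Matrix-vector product for nested-list matrices."""
    return [sum(c * v for c, v in zip(row, vector)) for row in matrix]


# Made-up allele effects (alpha_1, alpha_2, alpha_3).
differences = apply_contrast(first_vs_rest_contrast(3), [5, 3, 1])
```

All contrasts vanish exactly when the coefficients are equal, which is the null hypothesis of no association examined by the F-tests.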

APPENDIX E

For i = 1, 2, …, m, k = 1, …, n, let us denote Inline graphic, which are measures of LD between markers A and B. Here P(AiBk) is the frequency of haplotype AiBk. It can be shown that for i < j, k < l, jj′, ll′, (i, j) ≠ (i′, j′), (k, l) ≠ (k′, l′),

graphic file with name M233.gif (E1)

The quantities in (E1) imply that

graphic file with name M234.gif

Since $EZ_{AB}$ is a vector of 0's by the quantities in (E1), it can be shown that $V_D = \mathrm{Cov}(Z_{AB}, Z_{AB}) = E(Z_{AB}Z_{AB}^\tau)$. Moreover, the quantities in (E1) imply that the covariance matrix $\mathrm{Cov}(X_{AB}, Z_{AB})$ is a 0 matrix.

Taking variance–covariance between y and xAi, xBk, zAij, zBkl on the basis of relation (14), we may get the regression coefficients (15) of models (13) and (14).

APPENDIX F

Multiplying both sides of expression (18) by Ij and taking the expectation lead to

graphic file with name M235.gif (F1)

The elements of the matrix on the left-hand side of the above equation can be calculated as follows:

graphic file with name M236.gif

The elements on the right-hand side are given by

graphic file with name M237.gif

where

graphic file with name M238.gif

Plugging the above quantities into matrix Equation F1 gives Equation 19.

References

  1. Abecasis, G. R., L. R. Cardon and W. O. C. Cookson, 2000a. A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet. 66: 279–292.
  2. Abecasis, G. R., W. O. C. Cookson and L. R. Cardon, 2000b. Pedigree tests of linkage disequilibrium. Eur. J. Hum. Genet. 8: 545–551.
  3. Abecasis, G. R., W. O. C. Cookson and L. R. Cardon, 2001. The power to detect linkage disequilibrium with quantitative traits in selected samples. Am. J. Hum. Genet. 68: 1463–1474.
  4. Ardlie, K. G., L. Kruglyak and M. Seielstad, 2002. Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet. 3: 299–309.
  5. Arnheim, N., P. Calabrese and M. Nordborg, 2003. Review article: hot and cold spots of recombination in the human genome: the reason we should find them and how this can be achieved. Am. J. Hum. Genet. 73: 5–16.
  6. Bennett, J. H., 1954. On the theory of random mating. Ann. Eugen. 18: 311–317.
  7. Boerwinkle, E., R. Chakraborty and C. F. Sing, 1986. The use of measured genotype information in the analysis of quantitative phenotype in man. I. Models and analytical methods. Ann. Hum. Genet. 50: 181–194.
  8. Cantor, R. M., G. K. Chen, P. Pajukanta and K. Lange, 2005. Association testing in a linked region using large pedigrees. Am. J. Hum. Genet. 76: 538–542.
  9. Clayton, D., J. Chapman and J. Cooper, 2004. The use of unphased multilocus genotype data in indirect association studies. Genet. Epidemiol. 27: 415–428.
  10. Crow, J. F., and M. Kimura, 1970. An Introduction to Population Genetics Theory. Harper & Row, New York.
  11. Daly, M. J., J. D. Rioux, S. F. Schaffner, T. J. Hudson and E. S. Lander, 2001. High-resolution haplotype structure in the human genome. Nat. Genet. 29: 229–232.
  12. Dempster, A. P., N. M. Laird and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39: 1–38.
  13. Devlin, B., and N. Risch, 1995. A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 29: 311–322.
  14. Falconer, D. S., and T. F. C. Mackay, 1996. Introduction to Quantitative Genetics, Ed. 4. Longman, London.
  15. Fan, R. Z., and J. S. Jung, 2003. High resolution joint linkage disequilibrium and linkage mapping of quantitative trait loci based on sibship data. Hum. Hered. 56: 166–187.
  16. Fan, R. Z., and M. M. Xiong, 2002. High resolution mapping of quantitative trait loci by linkage disequilibrium analysis. Eur. J. Hum. Genet. 10: 607–615.
  17. Fan, R. Z., and M. M. Xiong, 2003. Combined high resolution linkage and association mapping of quantitative trait loci. Eur. J. Hum. Genet. 11: 125–137.
  18. Fan, R. Z., C. Spinka, L. Jin and J. S. Jung, 2005. Pedigree linkage disequilibrium mapping of quantitative trait loci. Eur. J. Hum. Genet. 13: 216–231.
  19. Fulker, D. W., S. S. Cherny, P. C. Sham and J. K. Hewitt, 1999. Combined linkage and association sib-pair analysis for quantitative traits. Am. J. Hum. Genet. 64: 259–267.
  20. Gabriel, S. B., S. F. Schaffner, H. Nguyen, J. M. Moore, J. Roy et al., 2002. The structure of haplotype blocks in the human genome. Science 296: 2225–2229.
  21. George, V., H. K. Tiwari, X. F. Zhu and R. C. Elston, 1999. A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. Am. J. Hum. Genet. 65: 236–245.
  22. Goldstein, D. B., 2001. Islands of linkage disequilibrium. Nat. Genet. 29: 109–111.
  23. Graybill, F. A., 1976. Theory and Application of the Linear Model. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA.
  24. Hayes, B. J., P. M. Visscher, H. C. McPartlan and M. E. Goddard, 2003. Novel multilocus measure of linkage disequilibrium to estimate past effective population size. Genome Res. 13: 635–643.
  25. Hedrick, P. W., 1987. Gametic disequilibrium measures: proceed with caution. Genetics 117: 331–341.
  26. International HapMap Consortium, 2003. The International HapMap Project. Nature 426: 789–796.
  27. International SNP Map Working Group, 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928–933.
  28. Jung, J. S., R. Z. Fan and L. Jin, 2005. Combined linkage and association mapping of quantitative trait loci by multiple markers. Genetics 170: 881–898.
  29. Keavney, B., C. A. McKenzie, J. M. Connell, C. Julier, P. J. Ratcliffe et al., 1998. Measured haplotype analysis of the angiotensin-1 converting enzyme gene. Hum. Mol. Genet. 7: 1745–1751.
  30. Kong, A., D. F. Gudbjartsson, J. Sainz, G. M. Jonsdottir, S. A. Gudjonsson et al., 2002. A high resolution recombination map of the human genome. Nat. Genet. 31: 241–247.
  31. Lange, K., J. S. Sinsheimer and E. Sobel, 2005. Association testing with Mendel. Genet. Epidemiol. 29: 36–50.
  32. Meuwissen, T. H. E., and M. E. Goddard, 2000. Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155: 421–430.
  33. Morris, A. P., J. C. Whittaker and D. J. Balding, 2004. Little loss of information due to unknown phase for fine-scale linkage-disequilibrium mapping with single-nucleotide-polymorphism genotype data. Am. J. Hum. Genet. 74: 945–953.
  34. Morton, N. E., and D. Wu, 1988. Alternative bioassays of kinship between loci. Am. J. Hum. Genet. 42: 173–177.
  35. Nielsen, D. M., and B. S. Weir, 1999. A classical setting for associations between markers and loci affecting quantitative traits. Genet. Res. 74: 271–277.
  36. Nielsen, D. M., and B. S. Weir, 2001. Association studies under general disease models. Theor. Popul. Biol. 60: 253–263.
  37. Nordborg, M., and S. Tavaré, 2002. Linkage disequilibrium: what history has to tell us. Trends Genet. 18: 83–90.
  38. Patil, N. P., A. J. Berno, D. A. Hinds, W. A. Barrett, J. M. Doshi et al., 2001. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294: 1719–1723.
  39. Phillips, M. S., R. Lawrence, R. Sachidanandam, A. P. Morris, D. J. Balding et al., 2003. Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots. Nat. Genet. 33: 382–387.
  40. Pritchard, J. K., and M. Przeworski, 2001. Linkage disequilibrium in humans: model and data. Am. J. Hum. Genet. 69: 1–14.
  41. Reich, D. E., M. Cargill, S. Bolk, J. Ireland, P. C. Sabeti et al., 2001. Linkage disequilibrium in the human genome. Nature 411: 199–204.
  42. Rioux, J. D., M. J. Daly, M. S. Silverberg, K. Lindblad, H. Steinhart et al., 2001. Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nat. Genet. 29: 223–228.
  43. Schaid, D. J., 2004. Evaluating associations of haplotypes with traits. Genet. Epidemiol. 27: 348–364.
  44. Schaid, D. J., C. M. Rowland, D. E. Tines, R. M. Jacobson and G. A. Poland, 2002. Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet. 70: 425–434.
  45. Sham, P. C., S. S. Cherny, S. Purcell and J. K. Hewitt, 2000. Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am. J. Hum. Genet. 66: 1616–1630.
  46. Stephens, J. C., J. A. Schneider, D. A. Tanguay, J. Choi, T. Acharya et al., 2001. Haplotype variation and linkage disequilibrium in 313 human genes. Science 293: 489–493.
  47. Stephens, M., and P. Donnelly, 2003. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet. 73: 1162–1169.
  48. Stephens, M., N. Smith and P. Donnelly, 2001. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68: 978–989.
  49. Stram, D. O., C. A. Haiman, J. N. Hirschhorn, D. Altshuler, L. N. Kolonel et al., 2003. Choosing haplotype-tagging SNPs based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the multiethnic cohort study. Hum. Hered. 55: 179–190.
  50. Sved, J. A., 1971. Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor. Popul. Biol. 2: 125–141.
  51. Thomson, G., and M. P. Baur, 1984. Third order linkage disequilibrium. Tissue Antigens 24: 250–255.
  52. Tregouet, D. A., S. Escolano, L. Tiret, A. Mallet and J. L. Golmard, 2004. A new algorithm for haplotype-based association analysis: the stochastic-EM algorithm. Ann. Hum. Genet. 68: 165–177.
  53. Weir, B. S., 1996. Genetic Data Analysis II, Ed. 2. Sinauer Associates, Sunderland, MA.
  54. Weir, B. S., and C. C. Cockerham, 1977. Two-locus theory in quantitative genetics, pp. 247–269 in Proceedings of the International Conference on Quantitative Genetics, edited by E. Pollak, O. Kempthorne and T. B. Bailey. Iowa State University Press, Ames, IA.
  55. Weiss, K. M., and A. G. Clark, 2002. Linkage disequilibrium and the mapping of complex traits. Trends Genet. 18: 19–24.
  56. Zaykin, D. V., P. H. Westfall, S. S. Young, M. A. Karnoub, M. J. Wagner et al., 2002. Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum. Hered. 53: 79–91.

Articles from Genetics are provided here courtesy of Oxford University Press
