Combined Linkage and Association Mapping of Quantitative Trait Loci by Multiple Markers

Jeesun Jung; Ruzong Fan; Lei Jin

doi:10.1534/genetics.104.035147

2005 Jun;170(2):881–898.

PMCID: PMC1450431 PMID: 15802526

Abstract

Using multiple diallelic markers, variance component models are proposed for high-resolution combined linkage and association mapping of quantitative trait loci (QTL) based on nuclear families. The objective is to build a model that may fully use marker information for fine association mapping of QTL in the presence of prior linkage. The measures of linkage disequilibrium and the genetic effects are incorporated in the mean coefficients and are decomposed into orthogonal additive and dominance effects. The linkage information is modeled in variance-covariance matrices. Hence, the proposed methods model both association and linkage in a unified model. On the basis of marker information, a multipoint interval mapping method is provided to estimate the proportion of allele sharing identical by descent (IBD) and the probability of sharing two alleles IBD at a putative QTL for a sib-pair. To test the association between the trait locus and the markers, both likelihood-ratio tests and F-tests can be constructed on the basis of the proposed models. In addition, analytical formulas of noncentrality parameter approximations of the F-test statistics are provided. Type I error rates of the proposed test statistics are calculated to show their robustness. After comparing with the association between-family and association within-family (AbAw) approach by Abecasis and Fulker et al., it is found that the method proposed in this article is more powerful and advantageous based on simulation study and power calculation. By power and sample size comparison, it is shown that models that use more markers may have higher power than models that use fewer markers. The multiple-marker analysis can be more advantageous and has higher power in fine mapping QTL. As an application, the Genetic Analysis Workshop 12 German asthma data are analyzed using the proposed methods.

In linkage disequilibrium (LD) mapping or association studies, one may use one marker at a time. However, the resolution of the single-marker analysis strategy can be low. In addition, utilizing different markers may lead to different results, since the power to detect allelic association depends on specific properties of the markers. This complicates the interpretation of an analysis. It is interesting and important to build models that use multiple markers simultaneously for high-resolution mapping of genetic traits. A unified analysis using multiple markers gives a unique result and may lead to greater resolution. Moreover, large numbers of single-nucleotide polymorphisms (SNPs) are available, and high-throughput genotyping approaches are emerging (International SNP Map Working Group 2001). This encouraging development facilitates high-resolution fine mapping of genetic traits. It is natural and necessary to develop high-resolution multiple-marker-based methods to dissect genetic traits.

In our previous work, variance component models using two markers were proposed for high-resolution linkage and association mapping of quantitative trait loci (QTL) based on population and pedigree data (Zhao et al. 2001; Fan and Xiong 2002, 2003; Fan and Jung 2003; Fan et al. 2005). The genetic effects are orthogonally decomposed into a sum of additive and dominance effects. In Abecasis et al. (2000a,b, 2001), Cardon (2000), Fulker et al. (1999), and Sham et al. (2000), an association between-family and association within-family (AbAw) approach is proposed to decompose the genetic association into between-pair and within-pair effects. The models of our previous work differ from the AbAw approach in two respects: (1) the AbAw approach uses only one marker in the analysis, whereas we use two diallelic markers, and (2) the mean coefficients are modeled differently. Fan and Jung (2003) compare our method with the AbAw approach and find that our method is advantageous for sib-pair data. In addition, Fan et al. (2005) confirm that our approach is more powerful than the AbAw approach for large pedigrees. One may note that it is not clear how to extend the AbAw approach to use more than one marker in analysis (R. Fan and G. R. Abecasis, personal communication).

This article extends our previous work and investigates variance component models in high-resolution linkage and association mapping of QTL using multiple diallelic markers. The models jointly take linkage and linkage disequilibrium information into account. The linkage information is modeled in the variance-covariance matrix, and the linkage disequilibrium information is modeled in the mean coefficients of trait values, as in the AbAw approach. By modeling the linkage information in the variance-covariance matrix, we may take advantage of much research on variance component models (Haseman and Elston 1972; Amos et al. 1989; Goldgar and Oniki 1992; Amos 1994; Fulker et al. 1995; Almasy and Blangero 1998; George et al. 1999; Pratt et al. 2000). Meanwhile, the linkage disequilibrium information is incorporated into the mean coefficients via indicator variables of marker genotypes, whose validity can be justified intuitively (Fan and Xiong 2002, pp. 608–609).

Using the models developed in this article, test statistics can be developed for high-resolution association mapping of QTL. The procedure is to perform appropriate linkage analysis on the basis of a sparse genetic map for prior linkage evidence. Then association study can be carried out on the basis of a dense genetic map for high-resolution mapping of QTL in the presence of prior linkage information. Likelihood-ratio tests (LRT) can be carried out in high-resolution association studies. For large-sample data, likelihood-ratio criteria are accurate. On the basis of general theory of linear models, F-test statistics can be built to test the association between trait locus and markers in the presence of prior linkage evidence (Graybill 1976). The analytical formulas for the noncentrality parameter approximations are derived for the F-test statistics. The merits of the proposed method are investigated by power and sample size comparison. Using the simulation program LDSIMUL kindly provided by G. R. Abecasis, simulation study is performed to explore the power and type I error rates of the proposed test statistics. The proposed methods are compared with the AbAw approach (Abecasis et al. 2000a). Moreover, the method is applied to analyze the Genetic Analysis Workshop (GAW) 12 German asthma data (Wjst et al. 1999; Meyers et al. 2001).

MODEL

Assume that k diallelic markers M_j, j = 1, · · · , k, are typed in a region of one chromosome. Suppose a quantitative trait locus Q is located in the region, which has two alleles Q₁ and Q₂ with frequencies q₁ and q₂, respectively. For marker M_j, there are two alleles M_j with frequency P_{M_j} and m_j with frequency P_{m_j}, respectively. For a nuclear family of l children and two parents, let y = (y_f, y_m, y₁, · · · , y_l)^τ be their quantitative trait column vector and G_j = (G_fj, G_mj, G_1j, · · · , G_lj)^τ be their genotype column vector at the jth marker locus M_j. Here y_f is the trait value of the father, and G_fj is the genotype of the father at the jth marker. Likewise, the notation for the mother and the ith child is defined accordingly with subscripts m and i, respectively. The superscript τ denotes the transpose of a vector or a matrix. Under the assumption of multivariate normality, we consider the mixed-effect model

y_i = β + w_iγ + x_i₁α₁ + · · · + x_ikα_k + z_i₁δ₁ + · · · + z_ikδ_k + B_i + e_i    (1)

(Searle et al. 1992; Pinheiro and Bates 2000), where β is the overall mean, w_i is a row vector of covariates such as sex and age, γ is a column vector of fixed-effect regression coefficients of w_i, B_i is the random familial effect, and e_i is the error term. Assume that e_i is normal N(0, σ²_e) and B_i is normal N(0, σ²_s + σ²_Ga), where σ²_e is the error variance, σ²_s is the variance of the shared environment effect, and σ²_Ga is the variance of the additive polygenic effect. Moreover, B_i and e_i are independent. For j = 1, · · · , k, α_j and δ_j are fixed-effect regression coefficients of the dummy variables x_ij and z_ij, respectively. Here x_ij and z_ij are indicator variables and are defined as follows:

Following the traditional quantitative genetics, the variance-covariance matrix of model (1) is a (l + 2) × (l + 2) square matrix and is given by

where Inline graphic . Here σ²_g is variance explained by the putative QTL Q. The genetic variance is decomposed into additive and dominance components. is the correlation between the parents. Let be the variance of familial effects that include shared environment variance σ²_s and half of the additive polygenic variance. Inline graphic is correlation between parents and children; is the correlation between the ith child and the jth child, where π_ijQ is the proportion of alleles shared identical by descent (IBD) at putative QTL Q by the ith child and the jth child, and Δ_ijQ is the probability that both alleles at the putative QTL Q shared by the ith child and the jth child are IBD (Cotterman 1940; Pratt et al. 2000; Zhu and Elston 2000; Lange 2002). On the basis of the above discussion, the log-likelihood function of the mixed-effect model (1) is given by

where η = (β, γ^τ, α₁, · · · , α_k, δ₁, · · · , δ_k)^τ is a vector of regression coefficients, and X is the model matrix, accordingly.
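For orientation, a log-likelihood built on the multivariate normality assumption has the standard Gaussian form. The display below is a sketch under that assumption for one nuclear family with l children, written with the symbols η, X, and Σ used in the text; the placement of the additive constant is a convention chosen here and is not taken from the article.

```latex
L = -\frac{l+2}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma|
    - \frac{1}{2}\,(y - X\eta)^{\tau}\,\Sigma^{-1}\,(y - X\eta)
```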

One may wonder why we use model (1) to describe the phenotypes. Here we provide an intuitive rationale. Suppose that QTL Q coincides with one marker, e.g., marker M₁, and trait allele Q₁ coincides with marker allele M₁ and allele Q₂ coincides with allele m₁. Let μ_ij be the effect of genotype Q_iQ_j, i, j = 1, 2. Denote genotypic value a = μ₁₁ − (μ₁₁ + μ₂₂)/2 and d = μ₁₂ − (μ₁₁ + μ₂₂)/2. The average effect of gene substitution is α_Q = a + (q₂ − q₁)d, i.e., the difference between the average effects of the trait locus alleles, and dominance deviation is δ_Q = 2d in view of traditional quantitative genetics (Falconer and Mackay 1996). Fan and Xiong (2002) show that y_i can be expressed as y_i = μ₀ + x_i₁α_Q + z_i₁δ_Q + B_i + e_i, where μ₀ is overall population mean of the quantitative trait. Hence, marker M₁ may fully describe the trait values if it coincides with the QTL Q. In practice, the information of QTL Q is unknown. Instead, model (1) is proposed to describe trait value y_i using marker information. Two marker models were used in previous work (Fan and Xiong 2002, 2003; Fan and Jung 2003; Fan et al. 2005). Model (1) uses multiple markers and is a natural generalization of model of our previous work. The objective is to use marker information fully for fine high-resolution mapping of QTL. In the following, we show that model (1) and log-likelihood (3) can be used in joint linkage and association mapping of QTL.
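The rationale above can be checked numerically. The sketch below assumes a specific genotype coding, in which x is scaled like a breeding value and z like a dominance deviation (Falconer and Mackay 1996); this coding and the helper names are assumptions made for illustration, not a restatement of definition (2). Under that assumption, x and z have mean zero and are uncorrelated in Hardy-Weinberg proportions, and μ₀ + xα_Q + zδ_Q reproduces genotype means whose spacing matches a and d.

```python
# Assumed genotype coding (hypothetical helper names, not from the article):
# a genotype is summarized by the number of copies of allele M it carries.

def x_code(copies_M, p_M):
    """Additive indicator, scaled like a breeding value (assumed coding)."""
    p_m = 1.0 - p_M
    return {2: 2.0 * p_m, 1: p_m - p_M, 0: -2.0 * p_M}[copies_M]


def z_code(copies_M, p_M):
    """Dominance indicator, scaled like a dominance deviation (assumed coding)."""
    p_m = 1.0 - p_M
    return {2: -p_m ** 2, 1: p_M * p_m, 0: -p_M ** 2}[copies_M]


def hwe_moments(p_M):
    """Return E[x], E[z], E[xz], E[x^2], E[z^2] under Hardy-Weinberg proportions."""
    p_m = 1.0 - p_M
    freq = {2: p_M ** 2, 1: 2.0 * p_M * p_m, 0: p_m ** 2}
    e_x = sum(f * x_code(g, p_M) for g, f in freq.items())
    e_z = sum(f * z_code(g, p_M) for g, f in freq.items())
    e_xz = sum(f * x_code(g, p_M) * z_code(g, p_M) for g, f in freq.items())
    e_xx = sum(f * x_code(g, p_M) ** 2 for g, f in freq.items())
    e_zz = sum(f * z_code(g, p_M) ** 2 for g, f in freq.items())
    return e_x, e_z, e_xz, e_xx, e_zz


def genotype_means(mu0, a, d, q1):
    """Trait means of Q1Q1 (2), Q1Q2 (1), Q2Q2 (0) when the marker is the QTL."""
    q2 = 1.0 - q1
    alpha_Q = a + (q2 - q1) * d   # average effect of gene substitution
    delta_Q = 2.0 * d             # dominance deviation
    return {g: mu0 + x_code(g, q1) * alpha_Q + z_code(g, q1) * delta_Q
            for g in (2, 1, 0)}
```

With this coding, the difference between the two homozygote means is 2a and the heterozygote lies d above their midpoint, which is the sense in which a coinciding marker "fully describes" the trait values.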

PROPERTY OF REGRESSION COEFFICIENTS AND ASSOCIATION TESTS

Denote the measure of LD between trait locus Q and marker M_i by D_{M_iQ}, i = 1, · · · , k, and the measure of LD between marker M_i and marker M_j by D_{M_iM_j}, i < j, i, j = 1, · · · , k. Let the additive and dominance variance-covariance matrices of the indicator variables defined in (2) be (appendix A)

In appendix A, the coefficients of model (1) are derived as

Equations (5) show that the parameters of LD (i.e., D_{M_iQ} and D_{M_iM_j}) and gene effect (i.e., α_Q and δ_Q) are contained in the mean coefficients. Model (1) simultaneously takes care of the LD and the effects of the putative trait locus Q. The gene substitution effect α_Q is contained only in α_i; and the dominance effect δ_Q is contained only in δ_i, i = 1, · · · , k. Therefore, model (1) orthogonally decomposes genetic effect into summation of additive and dominance effects.

Assume that all markers M_i and M_j are in linkage equilibrium (i.e., D_{M_iM_j} = 0, i, j = 1, · · · , k, i ≠ j). The coefficients of additive and dominance effects are given by , · · ·, , and , · · · , . That means markers M₁, · · · , M_k independently contribute to the analysis of the trait values. Usually, the markers M_i can be in LD, especially when they are located in a narrow chromosome region. Equations (5) correctly use the LD information of markers M_i in the analysis.

Linkage analysis can be performed by considering a reduced variance component model,

by using the traditional method of variance component models (Amos et al. 1989; Amos 1994; Almasy and Blangero 1998). This initial study can identify prior linkage evidence of the trait values to a specific chromosome region on the basis of a sparse genetic map. Suppose that prior linkage evidence is provided by an initial linkage study. On the basis of a dense genetic map, high-resolution association mapping of the QTL can be carried out by fitting the full model (1). First, assume that linkage is confirmed in a chromosome region by the significant presence of both the gene substitution and dominance effects, i.e., α_Q ≠ 0 and δ_Q ≠ 0. On the basis of Equations 5, the existence of LD between markers M_i (i = 1, · · · , k) and trait locus Q can be tested by H_ad: α₁ = · · · = α_k = δ₁ = · · · = δ_k = 0. Second, assume that linkage is supported by the significant presence of the gene substitution effect, but not the dominance effect, i.e., α_Q ≠ 0 and δ_Q = 0. The existence of LD can be tested by H_a: α₁ = · · · = α_k = 0. Third, assume that linkage is supported by the significant presence of the dominance effect, but not the gene substitution effect, i.e., α_Q = 0 and δ_Q ≠ 0. The existence of LD can be tested by H_d: δ₁ = · · · = δ_k = 0.

Evidence of association can be evaluated by the LRT procedure. For instance, let L_ad be the log-likelihood under the alternative hypothesis of H_ad and L₀ be the log-likelihood under the null hypothesis H_ad. Then, the likelihood-ratio test statistic 2[L_ad − L₀] is asymptotically distributed as χ². The degrees of freedom for this test are determined as follows. Under the null hypothesis H_ad, there are only k measures of LD, D_M₁Q, · · · , D_{M_kQ}, of which only k − 1 are independent since Inline graphic . Thus, the number of coefficients α_i, δ_i, i = 1, · · · , k, which is significantly different from 0, should be ≤k − 1 in a data analysis. This number is the degrees of freedom of the likelihood-ratio test statistic 2[L_ad − L₀]. The likelihood-ratio test is accurate and robust for large sample data based on the statistical theory.
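The likelihood-ratio procedure described above can be sketched in a few lines. This is a minimal illustration, assuming the two maximized log-likelihoods have already been obtained from some fitting routine; the function names are hypothetical, and the chi-square tail probability is computed with the standard incomplete-gamma recurrence for integer degrees of freedom rather than with any particular statistics library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    # Start from Q(1/2, h) for odd df or Q(1, h) for even df, then apply
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1) up to a = df / 2.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0 - 1e-9:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_alt, loglik_null, df):
    """Return the statistic 2[L_alt - L_0] and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihood values, for illustration only.
stat, p_value = likelihood_ratio_test(-512.3, -516.9, df=2)
```

The degrees of freedom passed in should follow the counting argument in the text; the sketch does not determine them.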

Theoretically, it is not easy to evaluate the power of the likelihood-ratio test statistics. The reason is that it is very hard to calculate the approximations of noncentrality parameters of the likelihood-ratio test statistics. Sham et al. (2000) performed power analysis of the AbAw approach by deriving the approximations of the noncentrality parameters of the likelihood-ratio test statistics, which is rather complicated in our opinion. In addition to the likelihood-ratio test statistics, we develop an F-test procedure based on linear model theory in this article (Graybill 1976). Utilizing the formulas of noncentrality parameters in chapter 6 of Graybill (1976), the approximations of the noncentrality parameters of the F-tests are calculated readily. Moreover, we show that the type I error rates and power of the F-test are very close to those of the likelihood-ratio test statistics (Tables 2 and 3), which are actually guaranteed by the construction of the F-test for large samples (Graybill 1976, pp. 187–188). Therefore, both the likelihood-ratio test procedure and the F-test procedure are useful. Before introducing the F-test procedure, we discuss the parameter estimations first.

TABLE 2.

Type I error rates (%) of test cases of Table 1 at a 0.05 significance level

		Error rates when total no. of offspring is
		120		240		480
No. of offspring in each family	Test case	LRT	F̂_1,a	LRT	F̂_1,a	LRT	F̂_1,a
1	Null	5.0	4.9	5.1	5.1	5.8	5.8
1	Familiality	5.4	5.3	5.2	5.2	5.3	5.3
1	Admixture	3.9	3.8	5.2	5.2	5.3	5.3
2	Null	4.6	4.5	4.8	4.7	4.5	4.5
2	Familiality	4.2	4.1	3.6	3.6	4.7	4.8
2	Admixture	5.0	4.8	5.5	5.5	4.9	5.1
2	Linkage	5.5	5.4	5.0	4.3	5.0	5.1
2	Composite	5.6	5.8	5.8	5.9	5.6	5.7
4	Null	4.9	5.0	4.3	4.3	3.6	3.6
4	Familiality	5.2	5.3	4.2	4.3	4.8	4.8
4	Admixture	5.5	5.6	5.4	5.8	4.2	4.2
4	Linkage	5.3	5.5	5.4	5.4	4.9	5.0
4	Composite	5.3	5.5	5.3	5.3	4.1	4.2
8	Null	4.2	4.4	5.0	5.1	4.7	4.7
8	Familiality	4.7	5.3	5.1	5.5	4.4	4.4
8	Admixture	3.5	4.4	5.5	6.0	4.4	4.6
8	Linkage	6.1	6.8	4.3	4.6	4.6	4.8
8	Composite	5.8	6.7	5.5	5.9	3.7	3.9

The parameters are the same as those of Abecasis et al. (2000a)(Table 2).

TABLE 3.

Power comparison with results of Abecasis et al. (2000a, Table 4)

Within each D′ block, the rows of the "F_1,a, F̂_1,a, LRT" columns list F_1,a first, then F̂_1,a, then LRT; a hyphen marks a cell with no entry.

		No. of families/sample size N
		One sib per family: 480/1440		Two sibs per family: 240/960		Three sibs per family: 160/800		Four sibs per family: 120/720		Five sibs per family: 96/672		Six sibs per family: 80/640		Eight sibs per family: 60/600	
D′ %	No. of simulated data sets	AbAw	F_1,a, F̂_1,a, LRT	AbAw	F_1,a, F̂_1,a, LRT	AbAw	F_1,a, F̂_1,a, LRT	AbAw	F_1,a, F̂_1,a, LRT	AbAw	F_1,a, F̂_1,a, LRT	AbAw	F_1,a, F̂_1,a, LRT	AbAw	F_1,a, F̂_1,a, LRT
0	-	0.2	0.1	0.0	0.1	0.1	0.1	0.1	0.1	0.2	0.1	0.1	0.1	0.0	0.1
0	1,000	-	0.1	-	0.1	-	0.1	-	0.0	-	0.1	-	0.1	-	0.0
0	1,000	-	0.1	-	0.1	-	0.1	-	0.0	-	0.1	-	0.1	-	0.0
0	20,000^a	-	0.105	-	0.10	-	0.105	-	0.09	-	0.105	-	0.095	-	0.09
0	20,000^b	-	0.105	-	0.10	-	0.105	-	0.09	-	0.085	-	0.085	-	0.09
25	-	2.1	33.1	1.8	15.4	2.0	10.6	2.6	8.5	3.0	7.4	2.1	6.7	2.1	5.8
25	1,000	-	33.0	-	14.8	-	11.2	-	7.3	-	8.2	-	5.3	-	4.3
25	1,000	-	32.9	-	14.7	-	10.9	-	7.2	-	7.4	-	4.9	-	3.8
50	-	19.5	99.2	22.9	89.4	24.8	78.6	26.7	70.7	26.7	65.2	27.2	61.2	23.9	56.0
50	1,000	-	99.4	-	90.5	-	76.7	-	69.9	-	63.7	-	55.5	-	47.5
50	1,000	-	99.4	-	90.4	-	76.4	-	69.0	-	62.8	-	54.1	-	45.0
75	-	69.3	100	72.6	100	74.2	99.8	76.9	99.3	76.0	98.7	76.5	98.1	75.4	96.9
75	1,000	-	100	-	100	-	99.9	-	99.2	-	98.9	-	97.3	-	94.6
75	1,000	-	100	-	100	-	99.9	-	99.2	-	98.8	-	97.2	-	93.8
100	-	97.4	100	97.7	100	98.3	100	98.4	100	98.2	100	98.4	100	98.5	100
100	1,000	-	100	-	100	-	100	-	100	-	100	-	100	-	100
100	1,000	-	100	-	100	-	100	-	100	-	100	-	100	-	100

In the AbAw columns, the power (%) is taken from Abecasis et al. (2000a)(Table 4). In columns 4, 6, 8, 10, 12, 14, and 16 the power (%) of F_1,a is calculated on the basis of the theoretical approximation of noncentrality parameter λ_1,a of test statistic F_1,a at a 0.001 significance level; the empirical power (%) of F̂_1,a and LRT are calculated as the proportions of 1000 or 20,000 simulated data sets that give significant results at the 0.001 significance level on the basis of F_1,a and the likelihood-ratio test statistic, respectively. The parameters are the same as those of Abecasis et al. (2000a)(Table 4): Inline graphic , h² = 0.1, σ² = 100, , , . In addition, and .

^a Results of the row are calculated on the basis of F̂_1,a.

^b Results of the row are calculated on the basis of LRT.

PARAMETER ESTIMATIONS

IBD estimations:

Denote the recombination fraction between the trait locus Q and marker M_i by θ_{M_iQ}, i = 1, · · · , k. Likewise, the recombination fraction between markers M_i and M_j is defined by θ_{M_iM_j}. Following Fulker et al. (1995) and Almasy and Blangero (1998), we propose a multipoint interval mapping method to estimate the proportion π_ijQ of allele sharing IBD at a putative QTL Q for a sib-pair i and j by

where π_{ijM_l} is the proportion of alleles shared IBD at the marker M_l for l = 1, · · · , k. The coefficients α_π, β_πM₁, · · · , β_{πM_k} are derived in appendix B as follows:

And α_π is estimated as Inline graphic . If marker M_l coincides with QTL Q, it can be shown that and α_π = 0, . Hence . To estimate Δ_ijQ of the probability of sharing two alleles IBD for a sib-pair, consider

where Δ_{ijM_l} is the probability of sharing two alleles IBD at marker M_l for l = 1, · · · , k. The coefficients r_M₁, · · · , r_{M_k}^τ are derived in appendix C as follows:

The remaining coefficients are given in appendix C by

The α in Equation 8 is Inline graphic . Again, if marker M_l coincides with QTL Q, it can be shown that .
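The multipoint estimate of π_ijQ described in this subsection is a linear regression of the QTL sharing proportion on the marker sharing proportions. The sketch below is not the appendix B derivation; it assumes the regression form of Fulker et al. (1995), in which the correlation between sharing proportions at two loci separated by recombination fraction θ is (1 − 2θ)², and it fixes the intercept so that the estimate has mean 1/2. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ibd_corr(theta):
    """Assumed correlation of sib-pair IBD proportions at two loci."""
    return (1.0 - 2.0 * theta) ** 2


def ibd_weights(theta_MM, theta_MQ):
    """Intercept and slopes for regressing pi_Q on (pi_M1, ..., pi_Mk).

    theta_MM[i][j]: recombination fraction between markers i and j.
    theta_MQ[i]:    recombination fraction between marker i and the QTL.
    """
    k = len(theta_MQ)
    C = [[1.0 if i == j else ibd_corr(theta_MM[i][j]) for j in range(k)]
         for i in range(k)]
    c = [ibd_corr(t) for t in theta_MQ]
    beta = solve(C, c)                 # the common variance factor cancels
    alpha = (1.0 - sum(beta)) / 2.0    # keeps the estimate centered at 1/2
    return alpha, beta


def estimate_pi_Q(pi_markers, alpha, beta):
    """Estimated proportion of alleles shared IBD at the putative QTL."""
    return alpha + sum(b * p for b, p in zip(beta, pi_markers))
```

When one marker coincides with the QTL (θ = 0), the weights reduce to a slope of 1 on that marker and an intercept of 0, which matches the limiting case stated in the text.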

Estimations of model coefficients and variance-covariance matrix:

As an example, assume that the data are composed of three subsamples: n individuals of a population; m trio families, each having both parents and a single child; and s nuclear families, each having both parents and two offspring. Furthermore, we assume that n, m, and s are sufficiently large, so that large sample theory applies. We may include data of nuclear families with both parents and more than two offspring. The principle of the following paragraphs can be extended to such families if the number of families is large enough to apply the large sample theory. To estimate the parameters, one may take the method of interval mapping proposed by Fulker et al. (1995) and Almasy and Blangero (1998). That is to say, for each location of the QTL on the chromosome with fixed recombination fractions, the IBD estimations are performed first. Then one may estimate parameters of Σ and η as follows.

Consider the overall log-likelihood L = L₁ + · · · + L_I, I = n + m + s, where L_i is the log-likelihood of trait vector or value y_i of the ith family or individual. Let Σ_i be the variance-covariance matrix of trait vector or value y_i and X_i be its model matrix. Denote the total trait values by y = (y^τ₁, · · · , y^τ_I)^τ, the total variance-covariance matrix by Σ = diag(Σ₁, · · · , Σ_I), and the model matrix by X = (X^τ₁, · · · , X^τ_I)^τ. Let N = n + 3m + 4s be the total number of individuals. On the basis of the log-likelihood L, parameters of Σ and η can be estimated by Newton-Raphson or Fisher scoring algorithms (Jennrich and Schluchter 1986). Let Σ̂ = diag(Σ̂₁, · · · , Σ̂_I) be the maximum-likelihood estimates of Σ. Then the estimate of η is

η̂ = (X^τΣ̂⁻¹X)⁻¹X^τΣ̂⁻¹y.

For each location of the QTL on the chromosome, the likelihood-ratio test or F-test statistics can then be calculated using the estimates Σ̂ and η̂. The location that gives the best result can be treated as the location of the QTL. In practice, some of the parameters (e.g., the variance parameter σ²_gd) may not be estimable and identifiable due to the redundancy. For specific types of data, one needs to specify the model carefully.
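Once Σ has been estimated, the step for η is a generalized least-squares fit that can exploit the block-diagonal structure of Σ. The following sketch assumes the per-family estimates Σ̂_i are already available (for example from a Newton-Raphson routine that is not shown) and accumulates X^τΣ̂⁻¹X and X^τΣ̂⁻¹y family by family. The function names and data layout are hypothetical.

```python
def solve(A, B):
    """Solve A Z = B (B given as a list of rows) by Gauss-Jordan elimination."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def gls(families):
    """Generalized least-squares estimate of eta.

    families: list of (X_i, y_i, Sigma_i), where X_i is a list of rows,
    y_i a list of trait values, and Sigma_i the estimated covariance matrix.
    """
    p = len(families[0][0][0])
    XtSX = [[0.0] * p for _ in range(p)]
    XtSy = [0.0] * p
    for X, y, S in families:
        # W = Sigma_i^{-1} [X_i | y_i], computed without forming the inverse.
        W = solve(S, [X[r] + [y[r]] for r in range(len(y))])
        for a in range(p):
            for r in range(len(y)):
                XtSy[a] += X[r][a] * W[r][p]
                for b in range(p):
                    XtSX[a][b] += X[r][a] * W[r][b]
    eta = solve(XtSX, [[v] for v in XtSy])
    return [row[0] for row in eta]
```

Repeating this fit at each candidate QTL location, after recomputing the IBD estimates that enter Σ, gives the profile over locations that the text describes.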

F-TESTS AND NONCENTRALITY PARAMETER APPROXIMATIONS

On the basis of linear regression model theory, one may construct F-test statistics of genetic effects and LD coefficients (Graybill 1976). Moreover, the noncentrality parameters of the F-test statistics can be calculated readily. To evaluate the power of the F-test statistics, it is necessary to calculate the approximations of the noncentrality parameters. The procedure is as follows. First, one may construct an F-test statistic for each of three hypotheses:

H_ad: α₁ = · · · = α_k = δ₁ = · · · = δ_k = 0;
H_a: α₁ = · · · = α_k = 0;
H_d: δ₁ = · · · = δ_k = 0.

The noncentrality parameter of each F-test statistic can be calculated using the theory in Graybill (1976)(Chap. 6). Assume that there are no covariates. Then the coefficients of model (1) can be written as η = (β, α₁, · · · , α_k, δ₁, · · · , δ_k)^τ. For each hypothesis, there is a q × (2k + 1) matrix H, such that the hypothesis can be written as Hη = 0, where q is the rank of H. On the basis of Graybill (1976), the F-test statistic for hypothesis Hη = 0 is

with a noncentral F(q, N − (2k + 1), λ) distribution under the alternative hypothesis, where λ is the noncentrality parameter given by λ = (Hη)^τ[H(X^τΣ⁻¹X)⁻¹ H^τ]⁻¹(Hη).
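As a sanity check on the expression for λ, consider a special case that is simpler than the models in this article: one marker, no dominance term, N unrelated individuals, and Σ = σ²I. The worked display below is an illustration under those assumptions, with x_i standing for x_i₁ and x̄ for the sample mean of the x_i.

```latex
X = \begin{pmatrix} 1 & x_{1} \\ \vdots & \vdots \\ 1 & x_{N} \end{pmatrix},
\qquad
\eta = (\beta, \alpha_{1})^{\tau},
\qquad
H = (0,\ 1),
\\[6pt]
X^{\tau}\Sigma^{-1}X = \frac{1}{\sigma^{2}}
\begin{pmatrix} N & \sum_{i} x_{i} \\ \sum_{i} x_{i} & \sum_{i} x_{i}^{2} \end{pmatrix},
\qquad
H\,(X^{\tau}\Sigma^{-1}X)^{-1}H^{\tau} = \frac{\sigma^{2}}{\sum_{i}(x_{i}-\bar{x})^{2}},
\\[6pt]
\lambda = (H\eta)^{\tau}\left[H\,(X^{\tau}\Sigma^{-1}X)^{-1}H^{\tau}\right]^{-1}(H\eta)
        = \frac{\alpha_{1}^{2}}{\sigma^{2}}\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}.
```

In this special case λ grows linearly with the number of individuals and with the squared marker coefficient α₁², which is the qualitative behavior the large-sample approximations rely on.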

Combined analysis of population and family data:

Again, assume that the data are composed of three subsamples: n individuals of a population; m trio families, each having both parents and a single child; and s nuclear families, each having both parents and two offspring. To calculate the approximations of the noncentrality parameters, assume that the sample sizes n, m, and s are large enough that the large-sample theory applies. We show in appendix D the approximation

where a₁, a₂, and a₃ are constants given by Equations (D7) in appendix D.

The additive variance σ²_ga and the dominance variance σ²_gd are expressed in terms of the average effect of gene substitution α_Q and the dominance deviation δ_Q. Let I_k and I_{2k} be the k- and 2k-dimensional identity matrices. Moreover, let O_{k×l} be a k × l zero matrix. To test hypothesis H_a: α₁ = · · · = α_k = 0, the test matrix is H = (O_{k×1}, I_k, O_{k×k}). Let us denote the test statistic as F_{k,a}. The noncentrality parameter is approximated by

To test hypothesis H_d: δ₁ = · · · = δ_k = 0, the test matrix is H = (O_{k×1}, O_{k×k}, I_k). Let us denote the test statistic as F_{k,d}. The noncentrality parameter is approximated by

To test hypothesis H_ad: α₁ = · · · = α_k = δ₁ = · · · = δ_k = 0, the test matrix is H = (O_{2k×1}, I_{2k}). Let us denote the test statistic as F_{k,ad}. The noncentrality parameter is λ_{k,ad} ≈ λ_a + λ_d; i.e., λ_{k,ad} is decomposed into the sum of additive and dominance noncentrality parameters.

Nuclear family data:

To make a comparison with the results of Abecasis et al. (2000a)(Table 4), we consider I families, each having both parents and l offspring. Let N = I(l + 2) be the total number of individuals. The other notations are defined in a similar way as above. Suppose that variance-covariance matrices of the I families are the same, i.e., Σ₁ = · · · = Σ_I. Denote Inline graphic . If the sample size N is large enough, we show in appendix E that

where b₁ and b₂ are constants given by Equations (E1) in appendix E. The approximation of the noncentrality parameter of statistic F_{k,a} is

TYPE I ERROR RATES

To evaluate the type I error rates of the proposed method, the simulation program LDSIMUL kindly provided by G. R. Abecasis is used to generate data sets. Nuclear families are generated in simulation. Five test cases are considered in the type I error rate calculation, which are taken from Abecasis et al. (2000a, Table 2). Table 1 presents parameters of four test cases. Trait values are drawn from a normal distribution with mean 0 and total variance σ² = 100 except for the Admixture test case. Here σ² = σ²_ga + σ²_Ga + σ²_e is the sum of the additive major gene variance σ²_ga, the variance of the polygenic effect σ²_Ga, and the error variance σ²_e. In each model except Admixture, a diallelic marker M₁ is simulated with allele frequency P_M₁ = 0.5. In the test cases of Null, Familiality, and Admixture, no major gene effect is assumed, i.e., σ²_ga = 0. In the test cases of Linkage and Composite, a major gene effect is assumed, and marker M₁ coincides with the QTL Q, i.e., the recombination fraction is θ_M₁Q = 0; at the same time, linkage equilibrium is assumed between QTL Q and marker M₁, i.e., D_M₁Q = 0. In the test case of Admixture, population admixture is generated by drawing families in equal proportions from two subpopulations A and B. In both subpopulations A and B, no major gene effect or familial effect is assumed, i.e., σ²_ga = σ²_Ga = 0. However, the trait mean of subpopulation A is fixed at 10 and the variance is fixed at 100, and the marker allele frequency P_M₁ is taken as 0.7 in subpopulation A. The trait mean of subpopulation B is fixed at 0 and the variance is fixed at 100, and the marker allele frequency P_M₁ is taken as 0.3 in subpopulation B. Therefore, the total variance in the mixed population is σ² = 125. The admixture contributes (10 − 0)²/[4 × 125] = 0.20 of the total variance.
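The admixture arithmetic follows from the variance of a two-component mixture: the total variance is the average within-subpopulation variance plus the variance of the subpopulation means. A minimal sketch, with a hypothetical function name:

```python
def mixture_variance(means, variances, weights):
    """Return (total variance, between-component variance) of a finite mixture."""
    overall_mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - overall_mean) ** 2 for w, m in zip(weights, means))
    return within + between, between


# Subpopulations A and B from the text: means 10 and 0, variance 100 each,
# mixed in equal proportions.
total, between = mixture_variance([10.0, 0.0], [100.0, 100.0], [0.5, 0.5])
fraction_due_to_admixture = between / total   # 25 / 125 = 0.2
```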

TABLE 1.

The parameters of the simulated genetic cases

Test case	σ²_ga	σ²_Ga	σ²_e	σ²	θ_M₁Q	P_M₁	q₁	D_M₁Q
Null	0	0	100	100	Not applied	0.5	Not applied	Not applied
Familiality	0	50	50	100	Not applied	0.5	Not applied	Not applied
Linkage	30	0	70	100	0	0.5	0.5	0
Composite	20	30	50	100	0	0.5	0.5	0

The total variance is fixed at σ² = σ²_ga + σ²_Ga + σ²_e = 100. Admixture: no major gene effect or familial effect (σ²_ga = σ²_Ga = 0), but with population admixture (see text for explanation).

To calculate the type I error rates, 1000 data sets are simulated for each test case. Each data set contains a certain number of related pedigrees. For instance, 120 trio families are generated for test case Null if the total number of offspring is 120 and the number of offspring in each family is 1; but only 15 families are generated if the number of offspring in each family is 8 and the total number of offspring is 120. Using the data sets, we fit the model

y_i = β + x_i₁α₁ + B_i + e_i,

where B_i is normal N(0, σ²_Ga), y_i is normal N(β + x_i₁α₁, σ²), and Inline graphic . The null hypothesis is H_1,a: α₁ = 0. Since the QTL Q is in linkage equilibrium with marker M₁, an empirical test statistic that is larger than the cutting point at a 0.05 significance level is treated as a false positive. On the basis of either the likelihood-ratio test or the F-test, type I error rates are calculated as the proportions of the 1000 simulation data sets that give a significant result at the 0.05 significance level based on F_1,a and the likelihood-ratio test statistic, respectively. Table 2 presents type I error rates of likelihood-ratio tests and F-test statistics. The results show that the type I error rates are around the 0.05 nominal significance level in almost all cases. Hence, the proposed model is robust. In addition, the type I error rates of F-tests are similar to those of the likelihood-ratio tests. In an association study, false positives due to population stratifications are usually a big issue. From the results of Table 2, the type I error rates in the Admixture case are reasonable.

Table 2, bottom, shows a notable variability in the range of type I errors when the number of offspring is 8 and the sample sizes are small. For example, the type I error rate of the F-test F̂_1,a is 6.7% for the Composite test case when the total number of offspring is 120. This is most likely due to the small sample size combined with the reliance on multivariate normality. When the total number of offspring is 120, there are only 15 pedigrees, each consisting of two parents and 8 offspring; and the variance-covariance matrix Σ is a large 10 × 10 square matrix. Hence, the parameter estimates are not very accurate, which makes the deviation from the nominal level greater. When the sample size increases (i.e., the total number of offspring is 240 or 480), the type I error rates are close to the nominal level of 0.05. The results of Table 2 are based on 1000 simulated data sets, which may not always be reliable. To further investigate the issue, we perform a calculation in the next section based on 20,000 simulated data sets for another Composite test case in Table 3. The results of Table 3 confirm that the type I error rates are close to the nominal level for large-sample data.
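How much an empirical type I error rate can wander purely because of the finite number of simulated data sets can be gauged with a binomial standard error. The sketch below is an added illustration, not part of the article's analysis; it uses a normal approximation and a hypothetical function name.

```python
import math


def monte_carlo_interval(p, n_replicates, z=1.96):
    """Approximate 95% range for an empirical rejection rate with true rate p."""
    se = math.sqrt(p * (1.0 - p) / n_replicates)
    return p - z * se, p + z * se


# 1000 replicates at the 0.05 level: roughly 3.6% to 6.4%.
low_1000, high_1000 = monte_carlo_interval(0.05, 1000)

# 20,000 replicates at the 0.001 level: roughly 0.06% to 0.14%.
low_20000, high_20000 = monte_carlo_interval(0.001, 20000)
```

On this rough scale, most of the scatter among entries based on 1000 data sets is within what simulation noise alone would produce, which is consistent with the motivation for the 20,000-replicate check.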

POWER CALCULATION AND COMPARISON

Comparison with the AbAw approach:

Denote the heritability by h², which is defined as Inline graphic (Falconer and Mackay 1996). To compare the method proposed in this article with the AbAw approach of Abecasis et al. (2000a), we present a power comparison in Table 3. The parameters are the same as those of Abecasis et al. (2000a, Table 4): , h² = 0.1, σ² = 100, , , . In addition, Inline graphic and . In the AbAw columns in Table 3, the results are taken from Abecasis et al. (2000a, Table 4). In the (F_1,a, F̂_1,a, LRT)^τ columns, the power (%) of F_1,a is calculated on the basis of the approximation of noncentrality parameter λ_1,a of test statistic F_1,a at a 0.001 significance level; the power (%) of F̂_1,a and the LRT is calculated as the proportion of 1000 or 20,000 simulated data sets that give a significant result at the 0.001 significance level based on F_1,a and the likelihood-ratio test statistic, respectively. For each simulated data set, a certain number of nuclear families are simulated via LDSIMUL. For instance, for one sib per family, 480 trio families are simulated in each simulated data set.

The results of Table 3 clearly show that the proposed F-tests F_1,a and likelihood-ratio tests are much more powerful than the AbAw approach. When Inline graphic 25%, it is possible to achieve considerable power. When , the statistic F_1,a is powerful for a sample with a total number of 480 sibs. In addition, the results of Table 3 show that the empirical power of F̂_1,a is similar to that of the likelihood-ratio test. This implies that in a large sample the two tests provide similar power (Graybill 1976). The AbAw approach presented in Abecasis et al. (2000a) utilized only the trait values of sibships in the model and discarded the trait values of parents. This is, obviously, not an efficient way. The proposed methods, on the other hand, incorporate both parental and sibship phenotypes into the models. This considerably increases the power as shown in Table 3.

In Table 3, the first row of results corresponds to the case when D′ is zero, i.e., a situation when the null hypothesis of no association is true. Hence, the power results for all these tests are simply the type I error rates. It can be seen that the type I error rates are close to the nominal level 0.001 = 0.1% when the number of simulated data sets is 20,000. This is consistent with the conclusion of Table 2; i.e., the proposed model is robust. To make a comparison with the results of Abecasis et al. (2000a)(Table 4), the results of F̂_1,a and the LRT of 1000 simulated data sets are also presented. In most cases, the entries are equal to the nominal level 0.001 = 0.1%; i.e., one of the 1000 data sets leads to a significant result, but some entries are 0 since none of the 1000 data sets leads to a significant result.
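The appearance of entries equal to 0 among the 1000-replicate results follows from simple binomial arithmetic. The sketch below assumes that the replicates are independent and that the true rejection rate equals the nominal level; the function names are illustrative.

```python
import math


def prob_k_significant(k: int, n_replicates: int, alpha: float) -> float:
    """Binomial probability that exactly k of n_replicates tests reject."""
    return (math.comb(n_replicates, k)
            * alpha ** k
            * (1.0 - alpha) ** (n_replicates - k))


def monte_carlo_se(alpha: float, n_replicates: int) -> float:
    """Standard error of an empirical rejection rate estimated from replicates."""
    return math.sqrt(alpha * (1.0 - alpha) / n_replicates)


alpha = 0.001
for n in (1000, 20000):
    print(f"replicates = {n:6d}  "
          f"expected rejections = {n * alpha:5.1f}  "
          f"P(no rejection) = {prob_k_significant(0, n, alpha):.4f}  "
          f"standard error = {monte_carlo_se(alpha, n):.5f}")
```

With 1000 replicates the standard error is about as large as the nominal level itself, so entries of 0 and 0.1% are both unremarkable; 20,000 replicates give a considerably sharper estimate.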

In Table 3, there is a trend that the power of (F_1,a, F̂_1,a, LRT)^τ to detect association decreases with increasing sibship size. This is partly because the sample size N decreases although the total number of offspring is the same, 480: for 480 trio families of one sib per family, the total number of individuals is N = 1440; for 60 families of eight sibs per family, the total number of individuals is N = 600. For the AbAw approach presented in Abecasis et al. (2000a), the total number of offspring that are used in the model is the same, 480. Since our models use phenotypes of both parents and offspring, the sample sizes N are different. On the other hand, for the same total number of typed individuals N, families of large sibship sizes contain less LD information than families of small sibship sizes. Readers may note that this result is consistent with findings in Fan and Xiong (2003). In Fan and Xiong (2003, p. 131, Figure 3), the population-based method is shown to be more powerful than the family-based method for the same number of individuals.
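The sample sizes quoted above can be reproduced with a short calculation. The sketch assumes two phenotyped parents in every family and a fixed total of 480 offspring; the function name is illustrative.

```python
def total_individuals(total_offspring: int, sibship_size: int,
                      parents_per_family: int = 2) -> tuple[int, int]:
    """Return (number of families, total individuals N) for a fixed offspring total."""
    if total_offspring % sibship_size != 0:
        raise ValueError("total_offspring must be divisible by sibship_size")
    families = total_offspring // sibship_size
    return families, families * (parents_per_family + sibship_size)


for size in (1, 2, 4, 8):
    families, n_total = total_individuals(480, size)
    print(f"sibship size {size}: {families} families, N = {n_total}")
```

For sibship sizes 1 and 8 this gives N = 1440 and N = 600, the two values cited in the text.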

Comparisons of sample size and power of LD mapping:

Power and sample size calculations are performed to investigate the merits of the proposed method. Figure 1 shows the power curves of the test statistics F_4,a, F_3,a, F_2,a, F_4,d, F_3,d, and F_2,d against the linkage disequilibrium coefficient D_M₁Q at a 0.01 significance level for a dominant mode of inheritance (a = d = 1.0) and a recessive mode of inheritance (a = 1.0, d = −0.5). The related parameters are given in the Figure 1 legend. Generally, the power of F_4,a using four markers in the model is higher than that of F_3,a using three markers, which in turn is higher than that of F_2,a using two markers. Hence, multiple-marker analysis is advantageous. The power of F_k,d is usually minimal unless the LD between locus Q and marker M₁ is very strong for the dominant mode of inheritance. Note that the power curves of Figure 1 are not symmetric with respect to D_M₁Q. This is due to Inline graphic , i = 2, 3, 4, , i ≠ j, and so the power curves do not have to reach a minimum value when D_M₁Q is zero. Instead, they are shifted to the right, so that the minimum is at a point when D_M₁Q > 0. Figure 2 provides the power of the test statistics F_4,a, F_3,a, F_2,a, F_4,d, F_3,d, and F_2,d against heritability h² at a 0.01 significance level for a dominant mode of inheritance (a = d = 1.0) and a recessive mode of inheritance (a = 1.0, d = −0.5), respectively. In addition to the merits shown in Figure 1, the power of the test statistics F_4,a, F_3,a, F_2,a is high when heritability h² is >0.10 for both modes of inheritance.

Figure 1. Power curves of test statistics F_4,a, F_3,a, F_2,a, F_4,d, F_3,d, and F_2,d against the measure of LD between M₁ and Q at a 0.01 significance level, when q₁ = 0.5, , i = 1, 2, 3, 4, , i = 2, 3, 4, , i ≠ j, π₁₂_Q = 0.5, δ₁₂_Q = 0.25, heritability h² = 0.15, polygenic effect variance and sample size n = 40, m = 30, s = 20 for (A) a dominant mode of inheritance a = d = 1.0 and (B) a recessive mode of inheritance a = 1.0, d = −0.5, respectively.

Figure 2. Power of test statistics F_4,a, F_3,a, F_2,a, F_4,d, F_3,d, and F_2,d against the heritability h² at a 0.01 significance level, when q₁ = 0.5, , i, j = 1, 2, 3, 4, i ≠ j, π₁₂_Q = 0.5, δ₁₂_Q = 0.25, and sample size n = 40, m = 30, s = 20 for (A) a dominant mode of inheritance a = d = 1.0 and (B) a recessive mode of inheritance a = 1.0, d = −0.5, respectively.

Figure 3 shows the power of test statistics F_4,a, F_3,a, F_2,a, and F_1,a against the trait allele frequency q₁ (Figure 3A) or marker allele frequency P_M₁ (Figure 3B) at a 0.01 significance level for an additive mode of inheritance (a = 1.0, d = 0.0). The other parameters are given in the Figure 3 legend. From Figure 3A, it can be seen that the power of F_k,a increases as the trait allele frequency q₁ increases. Figure 3B shows that the power of F_4,a and F_3,a is almost constant; in addition, the power of F_2,a increases slowly, and the power of F_1,a increases as the marker allele frequency P_M₁ increases. In general, the power of F_4,a and F_3,a depends heavily on the trait allele frequency q₁, but not on the marker allele frequency P_M₁. At first glance, it is strange that the power of F_4,a and F_3,a does not depend very much on the marker allele frequency P_M₁. The explanation is that the LD measures Inline graphic , i = 2, 3, 4 are already high. That is why marker M₁ contributes little to the power of F_2,a, F_3,a, and F_4,a. This is a further advantage of multiple-marker analysis: as long as some markers are in strong linkage disequilibrium with the trait locus, the power to detect the association is high.

Assume that the LD is due to historical mutations T generations ago at QTL Q. At the initial generation when the mutation occurred, the LD coefficient is Inline graphic , where P(M_iQ)(0) is the frequency of haplotype M_iQ. The LD coefficient is reduced by a factor 1 − θ_{M_iQ} in each subsequent generation. The LD between marker M_i and Q is at the current generation. Assume that marker M₁ is located at position 0 cM, marker M₂ at position 1 cM, marker M₃ at position 2 cM, and marker M₄ at position 3 cM. Under the assumption of no interference, we may calculate the recombination fraction Inline graphic by Haldane's map function, where Ω_{M_iM_j} is the map distance between marker M_i and marker M_j. Similarly, the recombination fraction θ_{M_iQ} can be calculated from the distance Ω_{M_iQ} between QTL Q and marker M_i, i = 1, · · · , 4. Suppose that the QTL Q is located along the horizontal axis; i.e., it moves from 0 to 3 cM. Figure 4 shows the power curves of the test statistics F_4,a, F_4,ad, F_3,a, F_3,ad, F_2,a, and F_2,ad against the location of QTL Q for a dominant mode of inheritance (a = d = 1) and a recessive mode of inheritance (a = 1.0, d = −0.5), respectively. The powers of F_4,a and F_4,ad with four markers in the model are generally high across the location of QTL Q, since at least one marker is close to the QTL Q. The power of F_3,a and F_3,ad using three markers in the model is similar to that of four markers, except when QTL Q is located far above marker M₃, i.e., λ_M₁Q ≥ 2.3 cM. The power of F_2,a and F_2,ad using two markers in the model is high when the QTL is close to markers M₁ and M₂. However, once the QTL is far above marker M₂ (i.e., λ_M₁Q ≥ 1.3 cM), the power of F_2,a and F_2,ad using two markers in the model decreases very quickly. Figure 4 implies that multiple-marker LD analysis has high power in fine mapping of QTL.
Moreover, the power of test statistic F_k,a, which tests only the additive effect, is higher than that of F_k,ad, which tests both the additive and dominance effects through the proposed model. The reason is that the degrees of freedom of the test statistic increase if the dominance effect is added to the test. Figure 5 shows the power curves of test statistic F_4,ad against the position of markers M₁, · · · , M₄ for different mutation ages at a 0.01 significance level. The trait locus Q is located at position 10 cM. The four markers flank the trait locus Q; two markers are on each side of the QTL with equal distance to each other as follows: M₂ = 5 + M₁/2, M₃ = 15 − M₁/2, M₄ = 20 − M₁. Here M_i also denotes the location in centimorgans of marker M_i. As the mutation ages, the power decreases, and the power can be high only when the markers are close to the trait locus.
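The LD decay described above can be sketched numerically. The code uses the standard form of Haldane's map function, θ = (1 − e^(−2Ω))/2 with Ω in morgans, and the per-generation decay factor 1 − θ stated in the text. The initial LD coefficient and the QTL position are hypothetical values chosen only for illustration.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Haldane's map function (no interference); distance given in centimorgans."""
    morgans = abs(distance_cm) / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


def ld_after_generations(d_initial: float, theta: float, generations: int) -> float:
    """LD coefficient after decay by a factor (1 - theta) per generation."""
    return d_initial * (1.0 - theta) ** generations


marker_positions_cm = [0.0, 1.0, 2.0, 3.0]  # M1..M4 as in the text
qtl_position_cm = 1.3                       # hypothetical QTL location
d_initial = 0.2                             # hypothetical initial LD coefficient
mutation_age = 60                           # T, as in the Figure 4 legend

for i, position in enumerate(marker_positions_cm, start=1):
    theta = haldane_recombination_fraction(position - qtl_position_cm)
    d_now = ld_after_generations(d_initial, theta, mutation_age)
    print(f"M{i}: theta = {theta:.5f}, current LD = {d_now:.4f}")
```

Markers closer to the QTL retain more of the initial LD, which is why at least one nearby marker keeps the power high in a multiple-marker model.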

Figure 4. Power of test statistics F_4,a, F_4,ad, F_3,a, F_3,ad, F_2,a, and F_2,ad against location of QTL Q at a 0.01 significance level. The parameters are given by q₁ = 0.5, , , , i, j = 1, · · · , 4, i ≠ j, π₁₂_Q = 0.5, δ₁₂_Q = 0.25, familial effect variance , heritability h² = 0.15 and sample size n = 100, m = 50, s = 30, mutation age T = 60 for (A) a dominant mode of inheritance a = d = 1.0 and (B) a recessive mode of inheritance a = 1.0, d = −0.5, respectively. Marker M₁ locates at position 0 cM, marker M₂ locates at position 1 cM, marker M₃ locates at position 2 cM, and marker M₄ locates at position 3 cM. The location of QTL Q is along the horizontal axis; i.e., it moves from 0 to 3 cM.

Figure 5. Power of test statistic F_4,ad for mutation age T = 30, T = 40, T = 50, T = 60, T = 70 against position of markers M_i, i = 1, · · · , 4 at a 0.01 significance level. The QTL Q locates at position 10 cM. The four markers flank the trait locus Q; two markers are on each side of the QTL with equal distance to each other as follows: M₂ = 5 + M₁/2, M₃ = 15 − M₁/2, M₄ = 20 − M₁. q₁ = 0.5, , , , i, j = 1, · · · , 4, i ≠ j, heritability h² = 0.15, polygenic effect variance and sample size n = 40, m = 30, s = 20 for (A) a dominant mode of inheritance a = d = 1.0 and (B) a recessive mode of inheritance a = 1.0, d = −0.5, respectively.

Figure 6 shows the required number of trio families or families with both parents and two offspring for the test statistics F_4,a, F_3,a, F_2,a, and F_1,a against heritability h² at a significance level 0.01 and power 0.8. For a favorable case (Figure 6, A and C), the parameters are given by Inline graphic , and for i, j = 1, · · · , 4, i ≠ j. For a less favorable case (Figure 6, B and D), the parameters are given by q₁ = 0.2, , , and for i, j = 1, · · · , 4, i ≠ j. For the favorable case, the required number of families of test statistics F_4,a and F_3,a is <200 and that of F_2,a is <600 if heritability h² is >0.1. For the less favorable case, the required number of families of test statistics F_4,a and F_3,a is <500 and that of F_2,a is <700 if heritability h² is >0.1. The required number of families of test statistics F_1,a is very large for both favorable and less favorable cases.
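A required-sample-size calculation of this kind can be sketched as a search for the smallest number of families that reaches the target power. The sketch rests on two simplifying assumptions that are not taken from the article: the noncentrality parameter grows linearly with the number of families, and the test has 1 d.f. and is approximated by a noncentral chi-square. The per-family noncentrality values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power_1df(noncentrality: float, alpha: float) -> float:
    """Large-sample power of a 1-d.f. test (noncentral chi-square approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = noncentrality ** 0.5
    return (1.0 - _STD_NORMAL.cdf(z_crit - shift)) + _STD_NORMAL.cdf(-z_crit - shift)


def required_families(per_family_noncentrality: float,
                      alpha: float = 0.01,
                      target_power: float = 0.8) -> int:
    """Smallest family count whose approximate power reaches target_power.

    Assumes noncentrality = families * per_family_noncentrality.
    """
    if per_family_noncentrality <= 0.0:
        raise ValueError("per_family_noncentrality must be positive")
    low, high = 0, 1
    # Double the upper bound until it achieves the target power.
    while approx_power_1df(high * per_family_noncentrality, alpha) < target_power:
        low, high = high, high * 2
    # Bisect between a failing count (low) and a sufficient count (high).
    while high - low > 1:
        mid = (low + high) // 2
        if approx_power_1df(mid * per_family_noncentrality, alpha) >= target_power:
            high = mid
        else:
            low = mid
    return high


for per_family in (0.01, 0.05, 0.10):  # hypothetical per-family contributions
    print(f"per-family noncentrality {per_family:.2f}: "
          f"{required_families(per_family)} families")
```

Because power is monotone in the noncentrality parameter, bisection is sufficient; a test with more degrees of freedom would need the corresponding noncentral F or chi-square distribution instead of the normal shortcut used here.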

AN EXAMPLE

The proposed method is applied to the Genetic Analysis Workshop 12 German asthma data (Meyers et al. 2001). The data consist of 97 nuclear families, including 415 persons. Seventy-four families have two children, 19 have three children, and 4 have four children. Wjst et al. (1999) performed linkage analysis for total serum IgE by a nonparametric statistic of MAPMAKER/SIBS 2.1. Three markers on chromosome 1 are shown to be linked with immunoglobulin E (IGE) level, i.e., marker D1S207 at position 118.1 cM, marker D1S221 at position 146.7 cM, and marker D1S502 at position 151.2 cM. In Fan and Jung (2003), we analyzed the data using sibships and confirmed the result of Wjst et al. (1999). By the method proposed in this article, we analyze the data again. The dominance variance of log(IGE) is significantly >0 at position 149.85 cM (P-value, 0.00075; compared with the P-value of 0.01 in Fan and Jung 2003). On this basis, we collapse alleles 6, 8, and 10 as allele M₁ at marker D1S207 and others as allele m₁. At marker D1S221, alleles 5, 6, and 7 are collapsed as allele M₂ and other alleles as allele m₂. At marker D1S502, we collapse alleles 7, 8, and 12 as allele M₃ and others as allele m₃. Then, we find that coefficient δ₂ is significantly different from 0 at position 149.85 cM, with a P-value of 0.034 by likelihood-ratio test (compared with the P-value of 0.0475 in Fan and Jung 2003) and a P-value of 0.034 by F-test (compared with the P-value of 0.0484 in Fan and Jung 2003). The estimate is δ̂₂ = 0.76. Hence, we are able to confirm the result of Wjst et al. (1999) and find that marker D1S221 is associated with log(IGE).
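The data description and the allele-collapsing step can be illustrated with a minimal sketch. The family counts and the collapsed allele sets are those given above; the function names, the label strings, and the example genotype are illustrative and are not taken from the workshop data files.

```python
# Number of families by number of children, as reported for the data set.
FAMILY_COUNTS_BY_CHILDREN = {2: 74, 3: 19, 4: 4}

families = sum(FAMILY_COUNTS_BY_CHILDREN.values())
# Assumes two parents are counted for every nuclear family.
persons = sum(count * (2 + children)
              for children, count in FAMILY_COUNTS_BY_CHILDREN.items())
print(f"{families} families, {persons} persons")

# Alleles collapsed into the first allele type at each marker.
COLLAPSED_ALLELES = {
    "D1S207": {6, 8, 10},
    "D1S221": {5, 6, 7},
    "D1S502": {7, 8, 12},
}
ALLELE_LABELS = {
    "D1S207": ("M1", "m1"),
    "D1S221": ("M2", "m2"),
    "D1S502": ("M3", "m3"),
}


def collapse_allele(marker: str, allele: int) -> str:
    """Map a multiallelic allele code to one of two allele types."""
    major, minor = ALLELE_LABELS[marker]
    return major if allele in COLLAPSED_ALLELES[marker] else minor


def collapse_genotype(marker: str, genotype: tuple[int, int]) -> tuple[str, str]:
    """Collapse both alleles of a genotype at the given marker."""
    return (collapse_allele(marker, genotype[0]),
            collapse_allele(marker, genotype[1]))


# Hypothetical genotype at D1S221, for illustration only.
print(collapse_genotype("D1S221", (5, 9)))
```

The family counts reproduce the reported totals of 97 families and 415 persons when two parents are counted per family.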

Compared with the results of Fan and Jung (2003), the evidence in the above paragraph is stronger since the P-values are smaller. There are two reasons for this. First, all family members are used in the analysis in this article (compared with only sibships used in Fan and Jung 2003). Second, this article uses three markers in the analysis (compared with only two markers used in Fan and Jung 2003). Hence, the proposed model improves the performance of the previous method.

DISCUSSION

On the basis of multiple diallelic markers, this article proposes variance component models for high-resolution joint linkage and association mapping of QTL. The models extend our previous work using two diallelic markers in analysis and incorporate genetic-marker information into the models (Fan and Xiong 2002, 2003; Fan and Jung 2003; Fan et al. 2005). By analytical analysis, it is shown that linkage disequilibrium measures and genetic effects are incorporated in the mean coefficients. On the basis of marker information, a multipoint interval mapping method is provided to estimate the proportion of allele-sharing IBD and probability of sharing two alleles IBD at a putative QTL for a sib-pair. It is shown that recombination fractions, i.e., linkage information, are contained in variance-covariance matrices. Therefore, the proposed methods model both association and linkage in a unified model.

In the literature, there is plenty of research on linkage mapping of QTL (Amos 1994; Fulker et al. 1995; Almasy and Blangero 1998). The linkage evidence can be detected by fitting model (6) as the first step on the basis of a sparse genetic map. In this article, we put more effort into high-resolution linkage disequilibrium mapping of QTL in the presence of prior linkage evidence. To test the association between the trait locus and the markers, both likelihood-ratio tests and F-tests can be constructed on the basis of the proposed models. In addition, analytical formulas of noncentrality parameter approximations of the F-test statistics are provided. After comparing it with the AbAw approach, it is found that the method proposed in this article is more powerful and advantageous on the basis of simulation study and power calculation. By power and sample size comparison, it is shown that models that use more markers may have higher power than models that use fewer markers. Multiple-marker analysis can be more advantageous and has higher power in fine mapping QTL.

In an association study, population stratification can have a huge impact on a study, which leads to high false positives (Ewens and Spielman 1995). Zhao and Xiong (2002) proposed unbiased quantitative population association tests to investigate the issue. In this article, we perform type I error calculations. We allow for the very extreme form of population admixture, in which each family is drawn from a different stratum (Abecasis et al. 2000a). Type I error rates of the proposed test statistics are calculated to investigate the behaviors of the test statistics under the null distribution. Five test cases including population admixture are considered to investigate the type I error rates. The results show the proposed models and methods have correct type I error rates for most cases and are robust.

In a QTL mapping study, a strategy may be taken as follows. First, linkage analysis can be carried out using a sparse genetic map. Then, an association study can be performed using a dense genetic map for high-resolution mapping of the trait. The basic idea is to take advantage of linkage analysis for prior linkage information. In the meantime, one can take advantage of the high-resolution association study for fine mapping a genetic trait. It is well known that linkage analysis is robust; i.e., the false-positive rates are not high. However, the resolution of linkage analysis can be low. On the other hand, the resolution of the association study is high. But the association study is prone to false positives caused by population stratifications. Using the method proposed in this article, it is more likely to avoid high false-positive rates by performing an association study in the presence of prior linkage. The low resolution of a prior linkage analysis can be remedied by the follow-up high-resolution association study.

In recent years, there has been great interest in linkage disequilibrium mapping of QTL (Allison 1997; Rabinowitz 1997; Zhang and Zhao 2001). Various methods of joint analysis of linkage and association have been proposed (Almasy et al. 1999; George et al. 1999; Martin et al. 2000). On the basis of variance component models, a combined linkage and association AbAw approach has been developed to decompose association effects into within- and between-family components (Fulker et al. 1999; Abecasis et al. 2000a,b, 2001; Cardon 2000; Sham et al. 2000). However, most research is limited to using one diallelic marker at a time to model the association of QTL. This article proposes use of multiple markers to model the association and linkage. The genetic effects are orthogonally decomposed into additive and dominance effects. The method has the advantage of high-resolution dissection of genetic traits in an era in which dense marker maps are available (International SNP Map Working Group 2001; Kong et al. 2002). It is hoped that the current research may stimulate more interest in building models for joint linkage disequilibrium and linkage mapping of QTL.

In a genetics study, the first-hand data are usually genotyping information. The methods developed in this article can be directly used in analyzing quantitative and genotyping data of nuclear families by combining linkage and association information together. In the meantime, one may argue the use of haplotype data in an analysis that can be constructed on the basis of genotyping data. The question is an important issue as the haplotype map project will soon be completed and haplotype data will be readily available (International HapMap Consortium 2003; HapMap project, http://www.hapmap.org). The proposed method deals with diallelic markers. When the markers are not diallelic as is the case in the analyzed data, we collapse alleles into two groups to form two allele types. The hidden question is whether this collapsing has any consequence in type I error because the collapsing is not unique, which leads to the selection issue. It is important to develop appropriate models and handy algorithms in linkage and association mapping of complex diseases using haplotype/multiallelic marker data. It would be interesting to see a comparison of the two approaches. In Jung et al. (2004), a population-based regression approach is explored for association mapping of QTL using haplotype data. It is important to extend the research to utilize both population and pedigree data based on multiallelic markers/haplotypes.

One potential problem of using multiple markers in analysis is that the degrees of freedom of test statistics can be large, which may lead to low power. Moreover, the number of LD measures can be large. Thus, selection of appropriate markers for analysis is one issue that needs careful consideration. The optimal number of markers needed depends on a specific trait in a study. Also, it depends on the LD measures among the QTL and the markers. In data analysis, the markers that show significance in the model can be included in the final analysis. On the one hand, it would not be a good strategy to use many diallelic markers in the model. More markers will lead to higher degrees of freedom in test statistics. The number of markers that show significance is unlikely to be too large. Usually, using three or four relevant markers in an analysis would be worthwhile, since it may have higher power than a two- or one-marker analysis. In the meantime, the degrees of freedom of test statistics and number of LD measures would not be too big using three or four markers in an analysis. The second problem is the existence of a dominance trait effect. If the dominance effect is present, one may lose power by excluding it from analysis (Fan and Xiong 2002). However, one may get low power by testing hypothesis α₁ = · · · = α_k = δ₁ = · · · = δ_k = 0, if the dominance effect is not significantly present to influence the trait values, due to the increase of degrees of freedom of test statistics.
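The cost of extra degrees of freedom can be illustrated by simulation. The sketch below holds the noncentrality parameter fixed and varies only the degrees of freedom, using a noncentral chi-square statistic as a large-sample stand-in for the F-tests of this article. The noncentrality value is hypothetical; the critical values are the standard upper 1% points of the central chi-square distribution.

```python
import random

# Upper 1% critical values of the central chi-square distribution.
CHI2_CRIT_ALPHA_01 = {1: 6.635, 2: 9.210, 4: 13.277, 8: 20.090}


def simulated_power(df: int, noncentrality: float,
                    n_sim: int = 20000, seed: int = 1) -> float:
    """Monte Carlo power of a chi-square test with df degrees of freedom.

    The statistic is a sum of df squared normal variates; the whole
    noncentrality is placed on the first component.
    """
    rng = random.Random(seed)
    crit = CHI2_CRIT_ALPHA_01[df]
    shift = noncentrality ** 0.5
    rejections = 0
    for _ in range(n_sim):
        stat = (rng.gauss(0.0, 1.0) + shift) ** 2
        for _ in range(df - 1):
            stat += rng.gauss(0.0, 1.0) ** 2
        if stat > crit:
            rejections += 1
    return rejections / n_sim


for df in (1, 2, 4, 8):
    # Hypothetical noncentrality, identical for every df.
    print(f"df = {df}: simulated power = {simulated_power(df, 10.0):.3f}")
```

In practice an added marker or the dominance terms may also increase the noncentrality parameter, so the net effect on power depends on whether that gain outweighs the penalty isolated here.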

So far, only one trait locus Q is assumed to be located in the chromosome region. Suppose that there are multiple QTL in the region. The mixed-effect model (1) can still be used in QTL mapping. In addition, suppose that the trait value is influenced by unlinked trait loci in different regions. Then model (3) needs to be generalized to use markers from different regions in analysis (Hoh and Ott 2003). If multiple-trait loci are present, other issues such as epistasis need more in-depth investigation. For IBD estimation, we follow the method proposed by Fulker et al. (1995) and Almasy and Blangero (1998). If there is LD between the trait and markers, LD among markers would also be expected and needs to be incorporated in estimating IBD. However, it is not clear how to achieve this. This is a very interesting and important research area for future study. Better IBD estimates would lead to a fitted variance-covariance structure that is a better approximation of the true variance-covariance structure. This would improve the performance of the proposed models.

Acknowledgments

We thank G. Gibson for kindness and patience in handling this article; and we thank two anonymous reviewers for very detailed and thoughtful critiques, which improved the article greatly. We are grateful to G. R. Abecasis for kindly providing the simulation program LDSIMUL to generate simulated data sets. R. Fan was supported partially by a research fellowship from the Alexander von Humboldt Foundation, Germany, by an international research travel assistance grant, Texas A&M University, and by the National Science Foundation Grant DMS-0505025.

APPENDIX A

Taking variance-covariance among x_ij, z_ij, y_i of the mixed-effect model (1) leads to the following variance-covariance equations:

In a similar way to that in Fan and Xiong (2002)(Appendix A), the following expectations, variance, and covariances can be derived accordingly: Ex_ij = 0, Ez_ij = 0, Inline graphic for j, l = 1, · · · , k, j ≠ l. Plugging the above quantities into (A1) gives

Therefore, the coefficients of (5) are derived.

APPENDIX B

To simplify notation, we omit subscripts ij from π_ijQ, π_ijM₁, · · · , π_{ijM_k}, Δ_ijM₁, · · · , Δ_{ijM_k} in Appendixes B and C. Taking variance-covariance among π_Q, π_{M_j}, y_i of Equation 7 leads to

From Elston and Keats (1985) and Almasy and Blangero (1998), we have

Plugging the above quantities into Equation B1 gives

which leads to

APPENDIX C

Taking variance-covariance among Δ_Q, π_{M_j}, Δ_{M_l} of Equation 8 leads to

As in appendix B, the following covariances are from Elston and Keats (1985), Almasy and Blangero (1998), and Fan and Jung (2003),

where Inline graphic . Plugging the above results into the equation (C1), we have a submatrix block equation,

where

Therefore, we have from Harville (1997) that

The equation Inline graphic leads to

Moreover, we have

APPENDIX D

To derive a₁, a₂, a₃ in approximation (9), we assume three subsamples of a population: n individuals; m trio families, each having both parents and a single child; and s nuclear families, each having both parents and two offspring.

For each y_i of the n individuals, Σ_i = σ² and X_i = (1, x_i₁, · · · , x_ik, z_i₁, · · · , z_ik), i = 1, · · · , n. When the sample size n of individuals is large, the law of large numbers leads to
Therefore, we have the approximation
D1
where V_A and V_D are additive and dominance variance-covariance matrices defined by (4).
For the ith trio family, let (y_fi, y_mi, y_i₁)^τ be the trait values and X_i = (X_fi, X_mi, X_i₁)^τ be the related model matrix, i = n + 1, · · · , n + m. In the same way as that of Fan and Xiong (2003)(Appendix A), the covariance matrix between parents and their offspring can be shown to be
D2

where O_k is a zero k × k matrix. For each of the m trio families, the variance-covariance matrix

The inverse matrix of Σ_i is

By the above formulas, we can show the following:

For the ith family that is composed of both parents and two offspring, let (y_fi, y_mi, y_i₁, y_i₂)^τ be the trait values and X_i = (X_fi, X_mi, X_i₁, X_i₂)^τ be the related model matrix, i = n + m + 1, · · · , n + m + s. In the same way as that of Fan and Xiong (2003)(Appendix C), it can be shown that
D4
For each of the s families, the inverse variance-covariance matrix
D5
where . Using (D2), (D4), and (D5), we can show
D6
where the constants are given by . Combining the n individuals, m trio families, and s families with two offspring, the equations (D1), (D3), and (D6) lead to , where
D7

APPENDIX E

Using (D2) and (D4), we can show approximation (10). The constants b₁ and b₂ are given by

References

Abecasis, G. R., L. R. Cardon and W. O. C. Cookson, 2000a A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet. 66: 279–292.
Abecasis, G. R., W. O. C. Cookson and L. R. Cardon, 2000b Pedigree tests of linkage disequilibrium. Eur. J. Hum. Genet. 8: 545–551.
Abecasis, G. R., W. O. C. Cookson and L. R. Cardon, 2001 The power to detect linkage disequilibrium with quantitative traits in selected samples. Am. J. Hum. Genet. 68: 1463–1474.
Allison, D. B., 1997 Transmission-disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60: 676–690.
Almasy, L., and J. Blangero, 1998 Multipoint quantitative trait linkage analysis in general pedigrees. Am. J. Hum. Genet. 62: 1198–1211.
Almasy, L., J. T. Williams, T. D. Dyer and J. Blangero, 1999 Quantitative trait locus detection using combined linkage/disequilibrium analysis. Genet. Epidemiol. 17(Suppl. 1): S31–S36.
Amos, C. I., 1994 Robust variance-components approach for assessing linkage in pedigrees. Am. J. Hum. Genet. 54: 534–543.
Amos, C. I., R. C. Elston, A. F. Wilson and J. E. Bailey-Wilson, 1989 A more powerful robust sib-pair test of linkage for quantitative traits. Genet. Epidemiol. 6: 435–449.
Cardon, L. R., 2000 A sib-pair regression model of linkage disequilibrium for quantitative traits. Hum. Hered. 50: 350–358.
Cotterman, C. W., 1940 A calculus for statistico-genetics. Ph.D. Thesis, Ohio State University, Columbus, OH.
Elston, R. C., and B. J. B. Keats, 1985 Genetic analysis workshop III: sib pair analyses to determine linkage groups and to order loci. Genet. Epidemiol. 2: 211–213.
Ewens, W. J., and R. S. Spielman, 1995 The transmission/disequilibrium test: history, subdivision, and admixture. Am. J. Hum. Genet. 57: 455–464.
Falconer, D. S., and T. F. C. Mackay, 1996 Introduction to Quantitative Genetics, Ed. 4. Longman, London.
Fan, R., and J. Jung, 2003 High resolution joint linkage disequilibrium and linkage mapping of quantitative trait loci based on sibship data. Hum. Hered. 56: 166–187.
Fan, R., and M. Xiong, 2002 High resolution mapping of quantitative trait loci by linkage disequilibrium analysis. Eur. J. Hum. Genet. 10: 607–615.
Fan, R., and M. Xiong, 2003 Combined high resolution linkage and association mapping of quantitative trait loci. Eur. J. Hum. Genet. 11: 125–137.
Fan, R., C. Spinka, L. Jin and J. Jung, 2005 Pedigree linkage disequilibrium mapping of quantitative trait loci. Eur. J. Hum. Genet. 13: 216–231.
Fulker, D. W., S. S. Cherny and L. R. Cardon, 1995 Multiple interval mapping of quantitative trait loci, using sib-pairs. Am. J. Hum. Genet. 56: 1224–1233.
Fulker, D. W., S. S. Cherny, P. C. Sham and J. K. Hewitt, 1999 Combined linkage and association sib-pair analysis for quantitative traits. Am. J. Hum. Genet. 64: 259–267.
George, V., H. K. Tiwari, X. F. Zhu and R. C. Elston, 1999 A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. Am. J. Hum. Genet. 65: 236–245.
Goldgar, D. E., and R. S. Oniki, 1992 Comparison of a multipoint identity-by-descent method with parametric multipoint linkage analysis for mapping quantitative traits. Am. J. Hum. Genet. 50: 598–606.
Graybill, F. A., 1976 Theory and Application of the Linear Model. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA.
Harville, D. A., 1997 Matrix Algebra From a Statistician's Perspective. Springer, Berlin/Heidelberg, Germany/New York.
Haseman, J. K., and R. C. Elston, 1972 The investigation of linkage between a quantitative trait and a marker locus. Behav. Genet. 2: 3–19.
Hoh, J., and J. Ott, 2003 Mathematical multi-locus approaches to localizing complex human trait genes. Nat. Rev. Genet. 4: 701–709.
International HapMap Consortium, 2003 The international HapMap project. Nature 426: 789–796.
International SNP Map Working Group, 2001 A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928–933.
Jennrich, R. I., and M. D. Schluchter, 1986 Unbalanced repeated-measures models with structured covariance matrices. Biometrics 42: 805–820.
Jung, J., R. Fan and L. Jin, 2004 Haplotype association mapping of quantitative trait loci, a population based approach. Abstracts of the 54th Annual Meeting of the American Society of Human Genetics, Toronto, Abstract 1970.
Kong, A., D. F. Gudbjartsson, J. Sainz, G. M. Jonsdottir, S. A. Gudjonsson et al., 2002 A high resolution recombination map of the human genome. Nat. Genet. 31: 241–247.
Lange, K., 2002 Mathematical and Statistical Methods for Genetic Analysis, Ed. 2. Springer, Berlin/Heidelberg, Germany/New York.
Martin, E. R., S. A. Monks, L. L. Warren and N. L. Kaplan, 2000 A test for linkage and association in general pedigrees: the pedigree disequilibrium test. Am. J. Hum. Genet. 67: 146–154.
Meyers, D. A., M. Wjst and C. Ober, 2001 Description of three data sets: collaborative study on the genetics of asthma (CSGA), the German affected sib pair study, and the Hutterites of South Dakota. Genet. Epidemiol. 21(Suppl. 1): S4–S8.
Pinheiro, J. C., and D. M. Bates, 2000 Mixed-Effects in S and S-plus. Springer, New York.
Pratt, S. C., M. Daly and L. Kruglyak, 2000 Exact multipoint quantitative-trait linkage analysis in pedigrees by variance components. Am. J. Hum. Genet. 66: 1153–1157.
Rabinowitz, D., 1997 A transmission disequilibrium test for quantitative trait loci. Hum. Hered. 47: 342–350.
Searle, S. R., G. Casella and C. E. McCulloch, 1992 Variance Components. John Wiley & Sons, New York.
Sham, P. C., S. S. Cherny, S. Purcell and J. K. Hewitt, 2000 Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am. J. Hum. Genet. 66: 1616–1630.
Wjst, M., G. Fischer, T. Immervoll, M. Jung, K. Saar et al., 1999 A genome-wide search for linkage to asthma. Genomics 58: 1–8.
Zhang, S. L., and H. Y. Zhao, 2001 Quantitative similarity-based association tests using population samples. Am. J. Hum. Genet. 69: 601–614.
Zhao, J., and M. Xiong, 2002 Unbiased quantitative population association test. Am. J. Hum. Genet. 71(Suppl.): 568.
Zhao, J., W. Li and M. Xiong, 2001 Population based linkage disequilibrium mapping of QTL: an application to simulated data in an isolated population. Genet. Epidemiol. 21(S1): S655–S659.
Zhu, X. F., and R. C. Elston, 2000 Power comparison of regression methods to test quantitative traits for association and linkage. Genet. Epidemiol. 18: 322–330.

