A New Association Test to Test Multiple-Marker Association

Xuexia Wang; Shuanglin Zhang; Qiuying Sha

doi:10.1002/gepi.20369

Author manuscript; available in PMC: 2013 Feb 14.

Published in final edited form as: Genet Epidemiol. 2009 Feb;33(2):164–171. doi: 10.1002/gepi.20369

PMCID: PMC3572742 NIHMSID: NIHMS439856 PMID: 18720476

Abstract

As a result of the availability of a very large number of single nucleotide polymorphisms, there has been increasing interest in genetic associations involving several closely linked loci. Methods for detecting association between traits and multiple genetic polymorphisms are being rapidly developed, including the Hotelling’s T² test and the LD contrast (LDC) tests. The Hotelling’s T² test can be considered a test that compares the means of the genotypic score in cases and controls, while the LDC tests can be considered tests that compare the variance-covariance matrices of the genotypic score in cases and controls. In this article, we propose a likelihood ratio test which simultaneously compares the means and the variance-covariance matrices of the genotypic score in cases and controls. We use simulation studies to evaluate the type I error rate of the proposed test, and compare the power of the test with that of the Hotelling’s T² test and the LDC tests. The simulation results show that when marginal effects of the disease loci are strong, the proposed test is more powerful than the LDC tests and similar to or slightly less powerful than the Hotelling’s T² test. If there are interaction effects and weak or no marginal effects, the proposed method is more powerful than the Hotelling’s T² test and slightly more powerful than the LDC tests.

Keywords: likelihood ratio test, principal component analysis, association study, complex disease

INTRODUCTION

Association tests have been used to detect association between genetic variants and disease phenotypes. Recently, as a result of the availability of a very large number of single nucleotide polymorphisms (SNPs), there has been increasing interest in genetic associations involving several closely linked loci. There is strong evidence that several mutations within a single gene can interact to create a super allele that has a large effect on the observed phenotype [Schaid et al., 2002; Hollox et al., 2001; Clark et al., 1998; Tavtigian et al., 2001; Drysdale et al., 2000], which emphasizes the importance of analyzing multiple SNPs that jointly represent variation within common transcripts and other functional regions, such as promoters. In recent years, many multi-marker association tests have been proposed, including haplotype-based methods that compare the distribution of haplotypes in cases with that in controls [Sham, 1998; Zhao et al., 2000; Schaid et al., 2002; Zaykin et al., 2002; Sha et al., 2005, 2007]; the Hotelling’s T² test [Xiong et al., 2002; Fan and Knapp, 2003; Chapman et al., 2003; Wallace et al., 2006]; and the linkage disequilibrium (LD) contrast tests [Hayes et al., 2004; Nielsen et al., 2004; Zaykin et al., 2006; Wang et al., 2007]. Chapman et al. [2003] showed that in most cases the Hotelling’s T² test is more powerful than haplotype-based methods. However, when marginal effects are weak and interaction effects are strong, the Hotelling’s T² test, which depends on a linear combination of the marginal effects, will lose power dramatically. It has been noted that the extent of LD can differ between cases and controls in a region of genetic association, and that the case-control LD comparison can aid the analysis in a region of putative association [Hayes et al., 2004]. In the context of association mapping, Nielsen et al. [2004] presented a direct LD comparison approach involving two biallelic loci and noted that, in certain situations, a test that directly compares the extent of LD between cases and controls can be a powerful alternative to either haplotype-based or single-marker approaches. More recently, Zaykin et al. [2006] and Wang et al. [2007] suggested new LD contrast (LDC) tests that compare the matrices of pair-wise LD in cases and controls and demonstrated that the new LDC tests may be more powerful than the Hotelling’s T² test in the presence of gene-gene interaction.

The LDC tests can be considered tests that compare the variance of the genotypic score in cases and controls, while the Hotelling’s T² test can be considered a test that compares the mean of the genotypic score in cases and controls. In this article, we propose a test that simultaneously compares the mean and variance of the genotypic score between cases and controls. We use simulation studies to compare the power of the proposed method with that of the Hotelling’s T² test and the LDC tests. The simulation results show that when marginal effects of the disease loci are strong, the proposed test is more powerful than the LDC tests and similar to or slightly less powerful than the Hotelling’s T² test. If there are interaction effects and weak or no marginal effects, the proposed method is more powerful than the Hotelling’s T² test and slightly more powerful than the LDC tests.

METHOD

Consider a sample of n cases and m controls. Suppose that there are k biallelic markers that have been genotyped for each of the sampled individuals. The jth marker has alleles b_j and B_j. Define a numerical code of the genotype of the jth marker for the ith case:

x_{ij} = \begin{cases} 1, & B_{j} B_{j} \\ 0, & B_{j} b_{j} \\ - 1, & b_{j} b_{j} \end{cases}

Similarly, we define a numerical code y_ij of the genotype of the jth marker for the ith control. Let

X_{i} = {(x_{i 1}, x_{i 2}, \dots, x_{ik})}^{T}, Y_{i} = {(y_{i 1}, \dots, y_{ik})}^{T},

X̅ = \frac{1}{n} \sum_{i = 1}^{n} X_{i}, Y̅ = \frac{1}{m} \sum_{i = 1}^{m} Y_{i},

S = \frac{1}{m + n - 2} [\sum_{i = 1}^{n} (X_{i} - X̅) {(X_{i} - X̅)}^{T} + \sum_{i = 1}^{m} (Y_{i} - Y̅) {(Y_{i} - Y̅)}^{T}] .

The Hotelling’s T² [Xiong et al., 2002] test statistic is given by

T^{2} = \frac{nm}{n + m} {(X̅ - Y̅)}^{T} S^{- 1} (X̅ - Y̅) .

When k = 1, the Hotelling’s T² test statistic is the square of the standard Student’s t test statistic, which is given by

t = \frac{X̅ - Y̅}{\sqrt{\frac{m + n}{mn} S}} .

The Student’s t test is a standard test to compare the means of two populations. Thus, we can consider the Hotelling’s T² test described above as a test to compare the means of genotypic codes in cases and controls.
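As an illustration of the statistic above, the following is a minimal Python sketch. It assumes NumPy is available and that the genotype codes of cases and controls are stored as n × k and m × k arrays; the function name `hotelling_t2` is an illustrative choice and not part of the original description.

```python
import numpy as np

def hotelling_t2(X, Y):
    """Hotelling's T^2 for case codes X (n x k) and control codes Y (m x k)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape[0], Y.shape[0]
    diff = X.mean(axis=0) - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Pooled variance-covariance matrix S with divisor m + n - 2.
    S = (Xc.T @ Xc + Yc.T @ Yc) / (m + n - 2)
    # Solve S a = diff instead of forming the inverse of S explicitly.
    return float(n * m / (n + m) * diff @ np.linalg.solve(S, diff))
```

With a single marker (k = 1) the returned value equals the square of the pooled two-sample t statistic, in line with the remark above.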

Let Δ_x and Δ_y denote two k × k matrices with elements $δ_{ij}^{x}$ and $δ_{ij}^{y}$, respectively. The test statistics of the LDC tests proposed by Zaykin et al. [2006] and Wang et al. [2007] have the form

T = trace [{(Δ_{x} - Δ_{y})}^{T} (Δ_{x} - Δ_{y})] = \sum_{i = 1}^{k} \sum_{j = 1}^{k} {(δ_{ij}^{x} - δ_{ij}^{y})}^{2} .

In Zaykin et al. [2006], Δ_x and Δ_y are the correlation matrices of the genotype codes in cases and controls, where

δ_{ij}^{x} = \frac{\sum_{l = 1}^{n} (x_{li} - {x̄}_{i}) (x_{lj} - {x̄}_{j})}{\sqrt{\sum_{l = 1}^{n} {(x_{li} - {x̄}_{i})}^{2} \sum_{l = 1}^{n} {(x_{lj} - {x̄}_{j})}^{2}}}

and

δ_{ij}^{y} = \frac{\sum_{l = 1}^{m} (y_{li} - ȳ_{i}) (y_{lj} - ȳ_{j})}{\sqrt{\sum_{l = 1}^{m} {(y_{li} - ȳ_{i})}^{2} \sum_{l = 1}^{m} {(y_{lj} - ȳ_{j})}^{2}}}

In Wang et al. [2007], Δ_x and Δ_y are the modified variance-covariance matrices of the genotype codes in cases and controls. From the statistics, we can see that the LDC tests proposed by Zaykin et al. [2006] and Wang et al. [2007] are in fact used to compare the variance-covariance of the genotypic codes in cases and controls.
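The trace form of the LDC statistic can be sketched as follows for the case in which Δ_x and Δ_y are sample correlation matrices. This is only a sketch of the formula quoted above, assuming NumPy and no monomorphic markers (a constant column would make the correlation undefined); it is not a reimplementation of the published tests, and the name `ldc_statistic` is illustrative.

```python
import numpy as np

def ldc_statistic(X, Y):
    """Sum of squared differences between case and control correlation matrices."""
    # rowvar=False: columns are markers, rows are individuals.
    delta_x = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    delta_y = np.corrcoef(np.asarray(Y, dtype=float), rowvar=False)
    d = delta_x - delta_y
    # trace(d^T d) equals the sum of the squared elements of d.
    return float(np.sum(d ** 2))
```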

In this article we propose a test statistic to compare the means and the variance-covariance matrices of the genotypic scores in cases and controls simultaneously. Suppose X₁, X₂, …, X_n are independent and identically distributed with a mean of μ_case = (μ₁, …, μ_k) and a variance-covariance matrix of Σ_case, and Y₁, Y₂, …, Y_m are independent and identically distributed with a mean of μ_control = (μ₁, …, μ_k) and a variance-covariance matrix of Σ_control. We consider the null hypothesis

H_{0} : μ_{case} = μ_{control} and Σ_{case} = Σ_{control} .

In the following discussion, we use the multivariate normal distribution to derive the log-likelihood ratio test statistic, and then we use this log-likelihood ratio as our test statistic. Although the multivariate normal assumption may be violated, the log-likelihood ratio may still be a good test.

If X_i and Y_i both follow a multivariate normal distribution, the log-likelihood function is

log L (μ_{case}, μ_{control}, Σ_{case}, Σ_{control}) = - \frac{n}{2} log | Σ_{case} | - \frac{1}{2} \sum_{i = 1}^{n} {(X_{i} - μ_{case})}^{T} Σ_{case}^{- 1} (X_{i} - μ_{case}) - \frac{m}{2} log | Σ_{control} | - \frac{1}{2} \sum_{i = 1}^{m} {(Y_{i} - μ_{control})}^{T} Σ_{control}^{- 1} (Y_{i} - μ_{control}) .

Under the null hypothesis H₀ : μ_case = μ_control and Σ_case = Σ_control, the maximum likelihood estimates of the mean and the variance-covariance matrix are μ̂ and Σ̂, where

μ̂ = \frac{1}{(m + n)} (\sum_{i = 1}^{n} X_{i} + \sum_{i = 1}^{m} Y_{i})

and

Σ̂ = \frac{1}{m + n} [\sum_{i = 1}^{n} (X_{i} - μ̂) {(X_{i} - μ̂)}^{T} + \sum_{i = 1}^{m} (Y_{i} - μ̂) {(Y_{i} - μ̂)}^{T}] .

Under the alternative hypothesis, the maximum likelihood estimates of μ_case, μ_control, Σ_case and Σ_control are μ̂_case, μ̂_control, Σ̂_case and Σ̂_control, respectively, where

{μ̂}_{case} = \frac{1}{n} \sum_{i = 1}^{n} X_{i}, {μ̂}_{control} = \frac{1}{m} \sum_{i = 1}^{m} Y_{i},

{Σ̂}_{case} = \frac{1}{n} \sum_{i = 1}^{n} (X_{i} - {μ̂}_{case}) {(X_{i} - {μ̂}_{case})}^{T}

and

{Σ̂}_{control} = \frac{1}{m} \sum_{i = 1}^{m} (Y_{i} - {μ̂}_{control}) {(Y_{i} - {μ̂}_{control})}^{T} .

So, the log-likelihood ratio test statistic is

LRT = 2 (log L_{H_{a}} - log L_{H_{0}}) = - n log | {Σ̂}_{case} | - m log | {Σ̂}_{control} | + (m + n) log | Σ̂ | . \qquad (2)
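A minimal Python sketch of this statistic is given below, assuming NumPy and non-singular covariance estimates. The helper names (`mle_covariance`, `log_det`, `lrt_statistic`) are illustrative; the covariance matrices use the maximum likelihood divisors n, m and m + n, as in the estimates above.

```python
import numpy as np

def mle_covariance(A):
    """Maximum likelihood covariance estimate (divisor = number of rows)."""
    D = A - A.mean(axis=0)
    return D.T @ D / A.shape[0]

def log_det(M):
    """Log-determinant of a positive definite matrix."""
    sign, value = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("covariance estimate is singular")
    return value

def lrt_statistic(X, Y):
    """LRT = -n log|S_case| - m log|S_control| + (m + n) log|S_pooled|."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape[0], Y.shape[0]
    pooled = np.vstack([X, Y])  # centred at the overall mean under H0
    return (-n * log_det(mle_covariance(X))
            - m * log_det(mle_covariance(Y))
            + (m + n) * log_det(mle_covariance(pooled)))
```

Using `slogdet` rather than `log(det(...))` avoids underflow when a determinant is very small, which is the numerical issue that motivates the PC analysis described in the text.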

We propose to use a permutation procedure, instead of the χ² distribution, to evaluate the P-value of the test. Let LRT₀ denote the value of the test statistic based on the original data set. For each permutation, we randomly shuffle the case and control statuses among the sampled individuals and denote the value of the test statistic based on the permuted data set by LRT^per. We perform the permutation procedure many times. Then, the P-value of the test is the proportion of permutations with LRT^per ≥ LRT₀.
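The permutation procedure can be sketched in Python as follows, assuming NumPy. The function accepts any statistic with the signature `statistic(cases, controls)`; `mean_diff_sq` is a simple stand-in used only to make the sketch self-contained, and the argument names and fixed seed are illustrative assumptions.

```python
import numpy as np

def permutation_p_value(X, Y, statistic, n_perm=1000, seed=0):
    """Proportion of label permutations whose statistic is >= the observed one."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    pooled = np.vstack([X, Y])
    observed = statistic(X, Y)
    hits = 0
    for _ in range(n_perm):
        # Shuffle case/control status: the first n shuffled rows act as cases.
        idx = rng.permutation(pooled.shape[0])
        if statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            hits += 1
    return hits / n_perm

def mean_diff_sq(X, Y):
    """Stand-in statistic: squared distance between the two mean vectors."""
    d = X.mean(axis=0) - Y.mean(axis=0)
    return float(d @ d)
```

In practice the stand-in would be replaced by the likelihood ratio statistic, with the number of permutations chosen according to the precision required for the P-value.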

If the determinant of Σ̂_case, Σ̂_control or Σ̂ is very small, log |Σ̂_case|, log |Σ̂_control| or log |Σ̂| will be very sensitive to small changes in the eigenvalues. Hence, we propose to use principal component (PC) analysis to reduce the sensitivity of the determinants. The PC analysis can also increase the power by reducing the degrees of freedom of the test [Sha et al., 2005]. To carry out the PC analysis, let q_j be the eigenvector corresponding to the jth largest eigenvalue λ_j of the variance-covariance matrix Σ̂_control in controls. Then, the jth PC of the ith individual in controls is $z_{ij} = q_{j}^{T} Y_{i}$ and the proportion of the total variance in controls explained by the jth PC is λ_j/(λ₁ + ⋯ + λ_k). Let z_i1, …, z_il denote the first l PCs of Y_i, which explain the majority of the total variability, where l is a pre-specified number that will be discussed below. We use an l-dimensional vector (z_i1, …, z_il) instead of the k-dimensional vector Y_i, where l < k, as the new numerical codes for the multi-marker genotype of individual i in controls. Similarly, we calculate the jth PC of the ith individual in cases as $w_{ij} = q_{j}^{T} X_{i}$ and the jth PC of the ith individual in pooled data as $v_{ij} = q_{j}^{T} G_{i}$; we use an l-dimensional vector (w_i1, …, w_il) as the new numerical codes for the multi-marker genotype of individual i in cases. Let Z_i = (z_i1, z_i2, …, z_il)^T and W_i = (w_i1, …, w_il)^T. Using the new coding scheme, the log-likelihood ratio test statistic will be

LRT_PC = 2 (log L_{H_{a}} - log L_{H_{0}}) = - n log | {Σ̂}_{case - pc} | - m log | {Σ̂}_{control - pc} | + (m + n) log | {Σ̂}_{pc} |,

where

{μ̂}_{case - pc} = \frac{1}{n} \sum_{i = 1}^{n} W_{i}, {μ̂}_{control - pc} = \frac{1}{m} \sum_{i = 1}^{m} Z_{i},

{μ̂}_{pc} = \frac{1}{(m + n)} (\sum_{i = 1}^{n} W_{i} + \sum_{i = 1}^{m} Z_{i}),

{Σ̂}_{case - pc} = \frac{1}{n} \sum_{i = 1}^{n} (W_{i} - {μ̂}_{case - pc}) {(W_{i} - {μ̂}_{case - pc})}^{T},

{Σ̂}_{control - pc} = \frac{1}{m} \sum_{i = 1}^{m} (Z_{i} - {μ̂}_{control - pc}) {(Z_{i} - {μ̂}_{control - pc})}^{T}

and

{Σ̂}_{pc} = \frac{1}{m + n} [\sum_{i = 1}^{n} (W_{i} - {μ̂}_{pc}) {(W_{i} - {μ̂}_{pc})}^{T} + \sum_{i = 1}^{m} (Z_{i} - {μ̂}_{pc}) {(Z_{i} - {μ̂}_{pc})}^{T}] .

One question that needs to be answered in the proposed testing procedure is how to choose the value of l. We propose to choose l such that (λ₁ + λ₂ + ⋯ + λ_l)/(λ₁ + λ₂ + ⋯ + λ_k) is greater than a pre-specified value δ to make sure that the majority of the total variation in the data is explained. We call this δ the cutoff value of PC. In the simulation studies, we choose δ equal to 85% and we will give a further discussion about the choice of δ in the discussion section.
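The PC-based statistic and the choice of l can be sketched in Python as follows. This is a minimal sketch assuming NumPy, with PC directions taken from the control sample as described above; the function names and the default `delta=0.85` are illustrative, and singular projected covariance matrices are not handled.

```python
import numpy as np

def n_components(eigvals, delta):
    """Smallest l whose leading eigenvalues explain more than delta of the variance."""
    frac = np.cumsum(eigvals) / np.sum(eigvals)
    above = np.nonzero(frac > delta)[0]
    return int(above[0]) + 1 if above.size else len(eigvals)

def control_pc_directions(Y, delta=0.85):
    """Leading eigenvectors of the control covariance matrix (columns of Q)."""
    Yc = Y - Y.mean(axis=0)
    vals, vecs = np.linalg.eigh(Yc.T @ Yc / Y.shape[0])  # ascending order
    order = np.argsort(vals)[::-1]                       # largest first
    l = n_components(vals[order], delta)
    return vecs[:, order[:l]]

def _log_det_mle_cov(A):
    D = A - A.mean(axis=0)
    return np.linalg.slogdet(D.T @ D / A.shape[0])[1]

def lrt_pc(X, Y, delta=0.85):
    """Likelihood ratio statistic computed on the first l PC codes."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Q = control_pc_directions(Y, delta)
    W, Z = X @ Q, Y @ Q  # PC codes of cases and controls
    n, m = W.shape[0], Z.shape[0]
    return (-n * _log_det_mle_cov(W) - m * _log_det_mle_cov(Z)
            + (m + n) * _log_det_mle_cov(np.vstack([W, Z])))
```

For example, with eigenvalues (5, 3, 1, 1) the cumulative proportions are 0.5, 0.8, 0.9 and 1.0, so δ = 85% selects l = 3.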

SIMULATIONS

We use simulation studies to evaluate the performance of the proposed test LRT_PC and to compare the power of the LRT_PC with that of four other tests: the LDC test proposed by Zaykin et al. [2006], the modified LD contrast (MLDC) test proposed by Wang et al. [2007], the Hotelling’s T² test developed by Xiong et al. [2002], and the single-marker test. For the k biallelic markers, we calculate the 1-degree-of-freedom χ² test statistics t₁, t₂, …, t_k and use SMT = max{t₁, t₂, …, t_k} as the test statistic of our single-marker test. For each simulation scenario, we simulate 1,000 replicated data sets and use 1,000 permutations to evaluate the P-values of all the tests.
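The single-marker statistic can be sketched as follows. The text does not state which 1-degree-of-freedom χ² test is used, so this sketch assumes the allelic 2 × 2 Pearson χ² test (allele counts in cases versus controls); that choice and the function names are assumptions. Genotype codes are −1, 0, 1, so the number of B alleles carried by an individual is the code plus one.

```python
def allelic_chi2(case_codes, control_codes):
    """Pearson chi-square on the 2 x 2 table of allele counts for one marker."""
    a = sum(x + 1 for x in case_codes)      # B alleles in cases
    b = 2 * len(case_codes) - a             # b alleles in cases
    c = sum(y + 1 for y in control_codes)   # B alleles in controls
    d = 2 * len(control_codes) - c          # b alleles in controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                          # monomorphic marker: no information
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom

def smt_statistic(X, Y):
    """SMT = maximum of the single-marker chi-square statistics."""
    k = len(X[0])
    return max(allelic_chi2([row[j] for row in X], [row[j] for row in Y])
               for j in range(k))
```

Because SMT is a maximum over k markers, its null distribution is not χ² with 1 degree of freedom, which is why its P-value is evaluated by permutation like the other tests.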

DATA SETS FOR ASSESSING THE TYPE I ERROR

We generate a data set to evaluate the type I error rate of the proposed method using the simulation setting similar to that of Wang et al. [2007]. Briefly, we simulate a set of markers with four SNPs in a local region or a candidate gene. The haplotypes of the four correlated SNPs are simulated on the basis of a multivariate normal distribution with a pair-wise correlation coefficient ρ. Each allele of a haplotype is generated by dichotomizing the marginal normal distribution, and the cutoff is determined by the minor allele frequency. For each individual, we randomly assign disease status. We consider different minor allele frequencies, different values of ρ, and different sample sizes.
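The haplotype simulation described above can be sketched as follows, assuming NumPy. The sketch assumes an equicorrelated normal vector (every pair of SNPs has correlation ρ), the same minor allele frequency at every SNP, random pairing of haplotypes into individuals, and that allele B is the minor allele; these choices and all function names are assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

def simulate_haplotypes(n_hap, n_snp, rho, maf, seed=0):
    """Return an n_hap x n_snp 0/1 array, where 1 marks the minor allele."""
    rng = np.random.default_rng(seed)
    cov = np.full((n_snp, n_snp), float(rho))
    np.fill_diagonal(cov, 1.0)
    z = rng.multivariate_normal(np.zeros(n_snp), cov, size=n_hap)
    # Dichotomize each margin so that P(z < cutoff) equals the MAF.
    cutoff = NormalDist().inv_cdf(maf)
    return (z < cutoff).astype(int)

def genotype_codes(haplotypes):
    """Pair consecutive haplotypes and code genotypes as -1, 0 or 1."""
    pairs = haplotypes.reshape(-1, 2, haplotypes.shape[1])
    return pairs.sum(axis=1) - 1  # (copies of the minor allele) - 1
```

Under the null hypothesis, disease status would then be assigned to the simulated individuals at random, independently of their genotypes.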

DATA SETS FOR POWER COMPARISON

To assess the power of the proposed tests, we consider two sets of simulations which are generated based on a two-locus disease model and a haplotype effect model.

Haplotype effect model

In this set of simulations we use the same method as given in the “Data sets for assessing the type I error” section to generate genotypes, except that the number of markers is 10. For a dichotomous trait, we assume that the trait is due to an underlying continuous liability y. The trait y follows a linear model y = g₁ + g₂ + e, where g₁ and g₂ are trait-locus effects of the two haplotypes, and e is a random environment effect. For a haplotype, the trait-locus effect is set in the following way: for a haplotype across the 10 markers, let s_j represent the code of the allele at the jth marker (s_j = 1 for minor allele; s_j = 0 for major allele) and $S = \sum_{j = 1}^{10} s_{j}$. The trait-locus effect of this haplotype is defined by g = |S−5|/5. Disease status is defined by a threshold Z, such that all individuals with y > Z are classified as cases. In our simulations, Z = 1.65. We consider a sample of 100 cases and 100 controls.
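The liability threshold rule can be written out in a few lines of Python. The distribution of the environmental effect e is not specified in the text, so the sketch takes e as an argument; the function names are illustrative.

```python
def haplotype_effect(haplotype):
    """Trait-locus effect g = |S - 5| / 5 for a 10-marker 0/1 haplotype."""
    s = sum(haplotype)  # number of minor alleles on the haplotype
    return abs(s - 5) / 5

def is_case(hap1, hap2, e, threshold=1.65):
    """Liability y = g1 + g2 + e; the individual is a case when y > threshold."""
    y = haplotype_effect(hap1) + haplotype_effect(hap2) + e
    return y > threshold
```

The effect is largest (g = 1) for haplotypes carrying either no minor alleles or ten minor alleles and is zero for haplotypes carrying exactly five, so the risk depends on the joint configuration of the markers rather than on any single marker.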

Two-locus disease model

In this set of simulations, we assume that the true risk model involves two potentially interacting causal SNPs, S₁ and S₂, residing on two separate candidate genes, G₁ and G₂, respectively. For each gene, there are seven SNPs and the third SNP is the causal SNP. After we generate genotypes, we delete the genotypes at the third SNP (the causal SNP). In our data analysis, we assume that, for each gene, genotype data are available on six marker SNPs. To simulate a realistic LD pattern among the markers within each gene (the two genes are assumed to be in linkage equilibrium), we use haplotype data extracted from two segments on chromosome 21 in the CEU HapMap sample [Thorisson et al., 2005]. The haplotypes and their frequencies are given in Table I.

TABLE I.

Haplotypes and their frequencies used in simulating genotype data of the two candidate genes

Gene I (G₁)		Gene II (G₂)

Haplotype	Frequency	Haplotype	Frequency
0100111	0.4000	1100011	0.4833
1000011	0.2083	1111111	0.0566
1110101	0.1217	1000101	0.0166
1100100	0.0416	0101011	0.2033
1000110	0.0333	1000100	0.0500
1101000	0.0166	1000000	0.0750
1100111	0.0316	1000011	0.0166
0101111	0.0083	1100001	0.0283
1101111	0.0316	1100111	0.0250
1101101	0.0333	1101011	0.0250
1100110	0.0166	1101111	0.0100
1100000	0.0083	0111011	0.0050
1001011	0.0083	1110001	0.0050
1100101	0.0199
1110111	0.0099
1111111	0.0099

To generate disease status, we follow the models used by Chatterjee et al. [2006]: the purely epistatic model, the additive model, and the crossover model, as given in Table II. Define the marginal relative risks (MRR) as

{MRR}_{1} = \frac{p (D | s_{1} \geq 1)}{p (D | s_{1} = 0)},

and

{MRR}_{2} = \frac{p (D | s_{2} \geq 1)}{p (D | s_{2} = 0)},

where D denotes the disease, and s₁ and s₂ refer to the number of copies of allele 1 at the causal loci of G₁ and G₂, respectively. The values of the parameters θ₁, θ₂, and θ₁₂ are determined by the values of the prevalence (assumed to be 0.1 in our simulation), MRR₁, and MRR₂. For each model, we vary the value of MRR₁ in the set {1.2, 1.5, 1.75, 2}. For the epistatic model, MRR₂ is determined by MRR₁. For the additive model, we fix MRR₂ at 2.0. For the crossover model, we assume θ₁ = 0.9 and MRR₂ is determined by MRR₁. For each model, we consider a sample of 500 cases and 500 controls.

TABLE II.

Relative risk in three disease models

Model	s₁ = 0, s₂ = 0	s₁ ≥ 1, s₂ = 0	s₁ = 0, s₂ ≥ 1	s₁ ≥ 1, s₂ ≥ 1
Purely epistatic	1	1	1	θ(>1)
Additive	1	θ₁	θ₂	θ₁+θ₂−1
Crossover	1	θ₁(<1)	1	θ₁₂(> 1)

s₁ and s₂ refer to the number of copies of allele 1 in the causal loci of G₁ and G₂, respectively
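The relative risks in Table II can be encoded as a small lookup function, sketched below in Python. The function name, the model labels and the parameter names are illustrative assumptions; for the purely epistatic model, the single parameter θ of Table II is passed as `theta12`.

```python
def relative_risk(model, s1, s2, theta1=1.0, theta2=1.0, theta12=1.0):
    """Relative risk of Table II for genotype (s1, s2) at the two causal loci."""
    carrier1, carrier2 = s1 >= 1, s2 >= 1
    if model == "epistatic":
        # Risk is raised only for carriers at both loci.
        return theta12 if (carrier1 and carrier2) else 1.0
    if model == "additive":
        # Gives theta1, theta2 and theta1 + theta2 - 1 in the three carrier cells.
        return 1.0 + (theta1 - 1.0) * carrier1 + (theta2 - 1.0) * carrier2
    if model == "crossover":
        if carrier1 and carrier2:
            return theta12
        return theta1 if carrier1 else 1.0
    raise ValueError(f"unknown model: {model}")
```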

RESULTS

TYPE I ERROR RATES

The estimated type I error rates for different allele frequencies, different correlation coefficients, different sample sizes, and different values of δ (the cutoff value of the PC) are given in Table III. The standard deviation of the estimated type I error rate is $\sqrt{0.05 \times 0.95 / 1000} \approx 0.0069$ for the nominal level of 0.05 and 1,000 replicated samples, so the approximate 95% confidence interval (the nominal level plus or minus two standard deviations) is (0.0362, 0.0638). From Table III we can see that the estimated type I error rates of the proposed test are close to the nominal level in all the cases: most of the estimates fall inside this interval, and those that fall outside (the largest is 0.068) do so only marginally.
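The arithmetic behind the interval can be checked with a few lines of Python; the variable names are illustrative. The bounds quoted in the text correspond to the nominal level plus or minus two standard deviations.

```python
import math

nominal = 0.05      # nominal significance level
replicates = 1000   # number of replicated data sets

# Binomial standard deviation of an estimated rejection rate.
sd = math.sqrt(nominal * (1 - nominal) / replicates)
lower, upper = nominal - 2 * sd, nominal + 2 * sd

print(round(sd, 4), round(lower, 4), round(upper, 4))  # 0.0069 0.0362 0.0638
```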

TABLE III.

Type I error rates of the LRT_PC at a significance level of 5%

Sample size	Allele frequency	Correlation coefficient ρ	δ = 65%	δ = 75%	δ = 85%	δ = 95%
200	0.1	0.1	0.042	0.047	0.041	0.040
		0.2	0.052	0.051	0.052	0.052
		0.4	0.050	0.047	0.055	0.053
		0.8	0.058	0.061	0.054	0.050
	0.2	0.1	0.059	0.057	0.068	0.068
		0.2	0.062	0.064	0.057	0.057
		0.4	0.063	0.063	0.063	0.062
		0.8	0.054	0.056	0.050	0.050
	0.3	0.1	0.052	0.051	0.063	0.063
		0.2	0.052	0.046	0.054	0.054
		0.4	0.052	0.064	0.060	0.058
		0.8	0.061	0.054	0.057	0.053
800	0.1	0.1	0.046	0.046	0.043	0.043
		0.2	0.055	0.055	0.046	0.046
		0.4	0.036	0.036	0.045	0.045
		0.8	0.057	0.055	0.052	0.065
	0.2	0.1	0.040	0.040	0.046	0.046
		0.2	0.044	0.044	0.054	0.054
		0.4	0.051	0.051	0.049	0.049
		0.8	0.058	0.053	0.054	0.054
	0.3	0.1	0.049	0.049	0.049	0.049
		0.2	0.040	0.040	0.045	0.045
		0.4	0.042	0.052	0.051	0.051
		0.8	0.054	0.056	0.064	0.041

Note: δ is the cutoff value of PC.

POWER COMPARISONS

First, we evaluate the effect of δ (the cutoff value of the PC) on the power. We use the haplotype effect model described above to generate data sets. The results for different simulation scenarios are summarized in Figure 1. From Figure 1 we can see that the trends of the power are very similar for different values of the correlation coefficient and allele frequency. The power first increases as the cutoff value δ increases, reaches its maximum at some value of δ, and then decreases as δ increases further. Although there is no universal optimal value of δ, there is a wide range of δ in which the power is higher than that at δ = 100%, which is equivalent to not using PC analysis. In the following discussion, we use the cutoff value δ = 85% when we compare the power of the proposed method with that of the other four tests.

Fig. 1 — Power comparisons of the proposed test for different cutoff values of the PC analysis under the haplotype effect model. δ is the cutoff value of PC. ρ is the correlation coefficient. MAF is the minor allele frequency. The sample consists of 100 cases and 100 controls.

To compare the power of the proposed test with that of the other four tests, we consider two sets of simulations. One is based on two-locus disease models and the other is based on a haplotype effect model. Under the haplotype effect model, we consider three different scenarios with high-risk allele frequencies of 0.1, 0.2 and 0.3 and vary the value of the correlation coefficient from 0 to 0.8. The results of the power comparisons are summarized in Figure 2. The proposed method LRT_PC is consistently more powerful than the single-marker test and the two LDC tests, LDC and MLDC, in all the cases. Except when the correlation coefficient is 0, the LRT_PC is also more powerful than the T² test. The powers of the LRT_PC, LDC and MLDC have similar patterns: power increases as the value of the correlation coefficient ρ increases. The single-marker test and the Hotelling’s T² test have similar patterns: power first increases and then decreases as the value of ρ increases. The pattern of the power of the single-marker test indicates that the marginal effects first increase and then decrease as the value of ρ increases. This may partly explain the power pattern of the other four tests. For large values of ρ such as ρ = 0.8, the LRT_PC is much more powerful than the T² test and the single-marker test.

Fig. 2 — Power comparisons of the five tests under the haplotype effect model. The sample consists of 100 cases and 100 controls. LRT_PC: the test proposed in this article; LDC: the LD contrast test proposed by Zaykin et al. [2006]; MLDC: the modified LD contrast test proposed by Wang et al. [2007]; T²: the Hotelling’s T² test developed by Xiong et al. [2002]; SMT: the single-marker test; ρ: correlation coefficient.

The results of the power comparisons under the two-locus disease models are summarized in Figure 3. The MRR given in Table IV as well as the power of the single-marker test in Figure 3 indicate that all three two-locus models have strong marginal effects. Figure 3 again shows that the LRT_PC is consistently more powerful than the single-marker test and the two LDC tests, LDC and MLDC. Comparing the power of the LRT_PC with that of the T² test, we can see that the two tests have similar power under the epistatic and crossover models and that the T² test is slightly more powerful than the LRT_PC under the additive model. The additive model is favorable to the T² test because the T² test is derived under the assumption of additive effects between markers. Under the additive model, which assumes no interaction, the LDC and MLDC have almost no power.

Fig. 3 — Power comparisons of the five tests under three two-locus models. The sample consists of 500 cases and 500 controls. LRT_PC: the test proposed in this article; LDC: the LD contrast test proposed by Zaykin et al. [2006]; MLDC: the modified LD contrast test proposed by Wang et al. [2007]; T²: the Hotelling’s T² test developed by Xiong et al. [2002]; SMT: the single-marker test; MRR₁: marginal relative risk.

TABLE IV.

The values of the parameters under the three two-locus models when MRR₁ varies from 1.2 to 2.0

Model	MRR₁ = 1.2	MRR₁ = 1.5	MRR₁ = 1.75	MRR₁ = 2.0	Average MRR
Purely epistatic	θ = 2.55	θ = 4.88	θ = 6.82	θ = 8.76	1.855
	MRR₂ = 0.8	MRR₂ = 2.02	MRR₂ = 2.53	MRR₂ = 3.04
Additive	θ₁ = 1.31	θ₁ = 1.86	θ₁ = 2.44	θ₁ = 3.17	1.806
	θ₂ = 2.08	θ₂ = 2.22	θ₂ = 2.38	θ₂ = 2.56
	MRR₂ = 2	MRR₂ = 2	MRR₂ = 2	MRR₂ = 2
Crossover	θ₁ = 0.9	θ₁ = 0.9	θ₁ = 0.9	θ₁ = 0.9	1.927
	θ₁₂ = 2.33	θ₁₂ = 4.66	θ₁₂ = 6.60	θ₁₂ = 8.54
	MRR₂ = 1.38	MRR₂ = 2.01	MRR₂ = 2.53	MRR₂ = 3.05

Summarizing the results from the two sets of simulations, we can see that the two LDC tests (LDC and MLDC) which are designed to test interaction effects will lose power when there exist marginal effects; the T² test will lose power when the marginal effects are weak. The proposed LRT_PC test has reasonable power in all the cases. The LRT_PC is consistently more powerful than the single-marker test and the LDC tests (LDC and MLDC). Except for the additive model, the LRT_PC test is not less powerful than the T² test.

DISCUSSION

In this article we proposed a new multi-marker association test for case-control studies. The proposed test statistic compares the means and the variances of the genotypic scores in cases and controls simultaneously. In addition, we propose to use PC analysis to reduce the degrees of freedom of the test and thereby improve the power of the association test. We compared the power of the proposed test with that of existing association tests, including the single-marker test, the Hotelling’s T² test and two recently developed LDC tests. The simulation results show that the proposed test is consistently more powerful than the single-marker test and the two LDC tests. When there are interaction effects and weak or no marginal effects, our proposed method is more powerful than the Hotelling’s T² test; when there are strong marginal effects, our proposed method has power similar to or slightly lower than that of the Hotelling’s T² test (see the three two-locus models).

All three two-locus models in our simulation studies have strong marginal effects. We also did a simulation study under a two-locus model with weak marginal effects: considering two markers A and B with alleles a, A and b, B, the two-locus high-risk genotypes are {AAbb, AaBb, aaBB}. This model has weak marginal effects and strong joint effects. Our simulation results showed that under this model the proposed method is the most powerful one, while the single-marker test and the Hotelling’s T² test have almost no power at all.

In the PC analysis used in this article, we first find PC directions (eigenvectors of the variance-covariance matrix) using the control sample only. Then, we project the original numerical codes of multi-marker genotypes in cases and controls onto the PC directions, and get the PC codes for the multi-marker genotypes. Based on the PC codes, we calculate the test statistic LRT_PC. With this PC analysis, we need to recalculate the PC directions and PC codes in each permutation when we use the permutation procedure to evaluate the P-value of the test. Thus, this PC analysis makes the permutation procedure computationally intensive. An alternative way to do the PC analysis is to find the PC directions using the pooled sample (cases and controls together). In this way, we do not need to recalculate the PC directions and PC codes in each permutation and the permutation procedure will be much faster. We denote the LRT_PC based on this alternative PC analysis by LRT_PC-Pool. Our simulation studies showed that the LRT_PC-Pool test is only slightly less powerful than the LRT_PC test (results are not shown). Thus, we suggest using the pooled PC analysis whenever computational time becomes a concern.

One remaining question in the proposed LRT_PC is how to choose the cutoff value δ in the PC analysis. Our simulation studies show that there is no universal optimal value for δ. However, we feel that values around 90% are good choices. Although our simulation results show that the optimal value of δ may be less than 50%, our experience shows that we are more likely to miss rare disease-associated alleles when we use a small value of δ. In general, the choice of the optimal value of δ needs further investigation.

Our method cannot be applied directly to genome-wide association studies. However, we can apply the proposed method to genome-wide association studies by using a sliding window approach. We have done a simulation study to compare the power of the proposed test with the two LDC tests, the Hotelling’s T² test and the single-marker test by using a sliding window approach. Our simulation study showed that the pattern of the power comparison by using a sliding window approach is similar to that of the other simulation results. However, the power of the five tests is affected by the window sizes or the number of markers in each window. The optimal choice of the window size needs further investigation.
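A sliding window over an ordered list of markers can be enumerated as in the short Python sketch below. The text does not give the window size or the step between windows, so both are left as parameters; the function name and the half-open index convention are illustrative assumptions.

```python
def sliding_windows(n_markers, window_size, step=1):
    """Yield (start, stop) index pairs of consecutive marker windows."""
    for start in range(0, n_markers - window_size + 1, step):
        yield start, start + window_size

# Each window selects columns start..stop-1 of the genotype matrices,
# and a multi-marker test is then applied to that subset of markers.
print(list(sliding_windows(5, 3)))  # [(0, 3), (1, 4), (2, 5)]
```

Because the tests in overlapping windows are correlated, the window-level P-values would still need an adjustment for multiple testing, which is not addressed in this sketch.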

Acknowledgments

Contract grant sponsor: National Institutes of Health (NIH) grants; Contract grant numbers: R01 GM069940, R03 HG 003613, R01 HG003054, R03 AG024491; Contract grant sponsor: Overseas-Returned Scholars Foundation of Department of Education of Heilongjiang Province; Contract grant number: 1152HZ01.

REFERENCES

Chapman JM, Cooper JD, Todd JA, Clayton DG. Detecting disease association due to linkage disequilibrium using haplotype tags: a class of tests and the determinants of statistical power. Hum Hered. 2003;56:18–31. doi: 10.1159/000073729. [DOI] [PubMed] [Google Scholar]
Chatterjee N, Kalaylioglu Z, Moslehi R, Peters U, Wacholder S. Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions. Am J Hum Genet. 2006;79:1002–1016. doi: 10.1086/509704.
Clark AG, Weiss KM, Nickerson DA, Taylor SL, Buchanan A, Stengard J, Salomaa V, Vartiainen E, Perola M, Boerwinkle E, Sing CF. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am J Hum Genet. 1998;63:595–612. doi: 10.1086/301977.
Drysdale CM, McGraw DW, Stack CB, Stephens JC, Judson RS, Nandabalan K, Arnold K, Ruano G, Liggett SB. Complex promoter and coding region beta 2-adrenergic receptor haplotypes alter receptor expression and predict in vivo responsiveness. Proc Natl Acad Sci USA. 2000;97:10483–10488. doi: 10.1073/pnas.97.19.10483.
Fan R, Knapp M. Genome association studies of complex diseases by case-control designs. Am J Hum Genet. 2003;72:850–868. doi: 10.1086/373966.
Hayes MG, Roe CA, Ng M, Bosque-Plata L, Tsuchiya T, Wu X, Ambrose NG, Yairi E, Cook EH, Cox NJ. Case-control differences in linkage disequilibrium as a tool for gene mapping in complex diseases. Am J Hum Genet Suppl. 2004;54:A146.
Hollox EJ, Poulter M, Zvarik M, Ferak V, Krause A, Jenkins T, Saha N, Kozlov AI, Swallow DM. Lactase haplotype diversity in the Old World. Am J Hum Genet. 2001;68:160–172. doi: 10.1086/316924.
Nielsen DM, Ehm MG, Zaykin DV, Weir BS. Effect of two and three-locus linkage disequilibrium on the power to detect marker/phenotype associations. Genetics. 2004;168:1029–1040. doi: 10.1534/genetics.103.022335.
Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. Score test for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet. 2002;70:425–434. doi: 10.1086/338688.
Sha Q, Dong J, Jiang R, Zhang S. Tests of association between quantitative traits and haplotypes in a reduced-dimensional space. Ann Hum Genet. 2005;69:715–732. doi: 10.1111/j.1529-8817.2005.00216.x.
Sha Q, Chen HS, Zhang S. New association tests based on haplotype similarity. Genet Epidemiol. 2007;31:577–593. doi: 10.1002/gepi.20230.
Sham P. Statistics in Human Genetics. New York: Oxford University Press, Inc; 1998.
Tavtigian SV, Simard J, Teng DH, Abtin V, Baumgard M, Beck A, Camp NJ, Carillo AR, Chen Y, Dayananth P, Desrochers M, Dumont M, Farnham JM, Frank D, Frye C, Ghaffari S, Gupte JS, Hu R, Iliev D, Janecki T, Kort EM, Laity KE, Leavitt A, Leblanc G, McArthur-Morrison J, Pederson A, Penn B, Peterson KT, Reid JE, Richards S, Schroeder M, Smith R, Snyder SC, Swedlund B, Swensen J, Thomas A, Tranchant M, Woodland A, Labrie F, Skolnick MH, Neuhausen S, Rommens J, Cannon-Albright LA. A candidate prostate cancer susceptibility gene at chromosome 17p. Nat Genet. 2001;27:172–180. doi: 10.1038/84808.
Thorisson GA, Smith AV, Krishnan L, Stein LD. The International HapMap Project web site. Genome Res. 2005;15:1591–1593. doi: 10.1101/gr.4413105.
Wang T, Zhu X, Elston RC. Improving power in contrasting linkage-disequilibrium patterns between cases and controls. Am J Hum Genet. 2007;80:911–920. doi: 10.1086/516794.
Wallace C, Chapman JM, Clayton DG. Score test for selective genotyping. Am J Hum Genet. 2006;78:498–504. doi: 10.1086/500562.
Xiong M, Zhao J, Boerwinkle E. Generalized T² test for genome association studies. Am J Hum Genet. 2002;70:1257–1268. doi: 10.1086/340392.
Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, Ehm MG. Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum Hered. 2002;53:79–91. doi: 10.1159/000057986.
Zaykin DV, Meng Z, Ehm MG. Contrasting linkage-disequilibrium patterns between cases and controls as a novel association-mapping method. Am J Hum Genet. 2006;78:737–746. doi: 10.1086/503710.
Zhao H, Zhang S, Merikangas KR, et al. Transmission/disequilibrium tests using multiple tightly linked markers. Am J Hum Genet. 2000;67:936–946. doi: 10.1086/303073.
