Robust Tests for Single-marker Analysis in Case-Control Genetic Association Studies

Qizhai Li; Gang Zheng; Xueying Liang; Kai Yu

doi:10.1111/j.1469-1809.2009.00506.x

Author manuscript; available in PMC: 2017 Dec 25.

Published in final edited form as: Ann Hum Genet. 2009 Mar;73(2):245–252. doi: 10.1111/j.1469-1809.2009.00506.x


PMCID: PMC5742554 NIHMSID: NIHMS95004 PMID: 19208106

Abstract

Choosing an appropriate single-marker association test is critical to the success of case-control genetic association studies. An ideal single-marker analysis should have robust performance across a wide range of potential disease risk models. MAX was designed specifically to achieve such robustness. In this work, we derived the power calculation formula for MAX and conducted a comprehensive power comparison between MAX and two other commonly used single-marker tests, the one-degree-of-freedom (1-df) Cochran-Armitage trend test and the 2-df Pearson Chi-squared test. We used a single-marker disease risk model and a two-marker haplotype risk model to explore the performances of the above three tests. We found that each test has its own “sweet” spots. Among the three tests considered, MAX appears to have the most robust performance.

Keywords: association, chi-square, genetic model, MAX, power, robustness

1. Introduction

In case-control genetic association studies (CCGAS), single-marker analysis, which tests the association between the outcome and an individual SNP, is often used. The following two tests are usually applied in single-marker analysis when there are no other covariates to be adjusted for: the 1-degree-of-freedom (1-df) Cochran-Armitage trend test (CATT) (Klein et al. 2005; WTCCC, 2007) that corresponds to the score test derived from an additive disease model on the logit scale (CATTA), and the 2-df Pearson Chi-squared test (Chi-2df) that compares the 3-category genotype frequencies between the case and control groups (Yeager et al. 2007). In addition to these two tests, MAX, which takes the maximum of three CATTs derived under dominant, recessive, and additive models as the test statistic, has also been proposed for the association test (Sladek et al. 2007). A detailed definition of each test will be given below. When there are other covariates to be adjusted for, the test corresponding to each of those above can be derived from the standard logistic regression model that models the effect of the genotype coded according to the assumed disease model, with adjustment for the covariates. An important common feature of CATTA, Chi-2df, and MAX is that their 2-sided testing results are independent of the choice of the risk allele.

The significance level (p-value) of CATTA and Chi-2df can be obtained easily according to their theoretical asymptotic distributions. The calculation of the p-value for MAX is more involved and requires a multiple-integration or permutation procedure (Conneely & Boehnke, 2007; Li et al. 2008a). Although all three tests have been used in recent genome-wide association studies (GWAS) (e.g., Klein et al. 2005; Hunter et al. 2007; Sladek et al. 2007; WTCCC, 2007; Yeager et al. 2007), there is no consensus as to which one is generally preferable, and there is little discussion in the literature of the analytic power of MAX or of comparisons of MAX with the other tests. In this work, we derive an analytic formula for calculating the power of MAX. The existence of a power calculation formula for MAX, together with the power formulas that already exist for CATTA and Chi-2df, enables us to conduct a comprehensive power comparison among the three tests.

Comparisons of various single-marker analyses have been reported by several groups (Freidlin et al. 2002; Guedj et al. 2006; Kuo & Feingold, 2008). The novelty of this work is that it adds the promising MAX test to the comparison. In particular, we compare the asymptotic powers of these tests under a broad range of single-locus and multi-locus disease models. The comparison results should shed more light on the relative merits of the three considered tests under various disease risk models and provide guidance for the analysis of future CCGAS.

2. Test statistics definition

We focus first on situations where there are no covariates to be adjusted for. Assume that there are r cases and s controls in a CCGAS, and that there are two alleles, G and g, at a given SNP locus with the possible genotypes gg, Gg, and GG. The notations for genotype counts in the case and control groups are given in Table 1. Based upon Table 1, the general form of the CATT can be written as

Z_{x} = \frac{\sqrt{n} \sum_{i = 0}^{2} x_{i} (r_{i} / r - s_{i} / s)}{\sqrt{\frac{n}{r} [\sum_{i = 0}^{2} x_{i}^{2} θ_{i} - {(\sum_{i = 0}^{2} x_{i} θ_{i})}^{2}] + \frac{n}{s} [\sum_{i = 0}^{2} x_{i}^{2} ϑ_{i} - {(\sum_{i = 0}^{2} x_{i} ϑ_{i})}^{2}]}},

(1)

where x=(x₀,x₁,x₂)' is a genotype score vector for the coding of genotypes gg, Gg, and GG, and θ_i=ϑ_i=(r_i+s_i)/(r+s), i=0,1,2. The genotype score vector x is chosen by the investigator. The CATT given by (1) is equivalent to the score statistic for testing the null hypothesis H₀: β=0 in the following standard logistic regression, which models the effect of genotypes represented by x,

log \frac{Pr (case | x)}{1 - Pr (case | x)} = α + β x .

(2)

Based upon (2), the three commonly assumed genetic models (recessive, additive, and dominant) correspond to the following assignments of the genotype score vector x: R=(R₀,R₁,R₂)'=(0,0,1)', A=(A₀,A₁,A₂)'=(0,0.5,1)', and D=(D₀,D₁,D₂)'=(0,1,1)', respectively. Among the three CATTs, Z_{x=A}, called CATTA, which is derived according to an additive model, is usually preferred over Z_{x=D} and Z_{x=R}, as it does not rely on the assumption of a high-risk allele (assuming a two-sided test is performed), and thus this is the version of CATT that is generally used in CCGAS. The p-value for Z_x can be obtained according to the standard normal distribution.

Table 1.

Notation for genotype frequencies

	gg	Gg	GG	Total
Case	r₀	r₁	r₂	r
Control	s₀	s₁	s₂	s
Total	n₀	n₁	n₂	n


If the true underlying disease model is known, the CATT Z_x with the score vector corresponding to that model is the most efficient. In practice, however, the true disease model is unknown. For a more robust test that performs well over a wide range of disease models, the following test statistic, called MAX, has been proposed (Freidlin et al. 2002; Sladek et al. 2007; Li et al. 2008a, 2008b):

Z_{MAX} = max (| Z_{x = R} |, | Z_{x = A} |, | Z_{x = D} |) .
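As an illustration of equation (1) and the definition of MAX, the following is a minimal sketch in plain Python, not the authors' implementation. The function names `catt` and `max_stat` and the example counts are hypothetical. The sketch assumes every score vector has non-zero variance in the pooled sample; otherwise the denominator is zero.

```python
import math

# Genotype score vectors for (gg, Gg, GG), taken from the text.
SCORES = {"R": (0.0, 0.0, 1.0), "A": (0.0, 0.5, 1.0), "D": (0.0, 1.0, 1.0)}


def catt(cases, controls, x):
    """Trend statistic Z_x of equation (1) from genotype counts (gg, Gg, GG)."""
    r, s = sum(cases), sum(controls)
    n = r + s
    # Pooled genotype frequencies theta_i = (r_i + s_i) / n.
    theta = [(ri + si) / n for ri, si in zip(cases, controls)]
    diff = sum(xi * (ri / r - si / s) for xi, ri, si in zip(x, cases, controls))
    mean = sum(xi * ti for xi, ti in zip(x, theta))
    var = sum(xi * xi * ti for xi, ti in zip(x, theta)) - mean * mean
    # With theta_i = vartheta_i the denominator reduces to sqrt((n/r + n/s) * var).
    return math.sqrt(n) * diff / math.sqrt((n / r + n / s) * var)


def max_stat(cases, controls):
    """Z_MAX: largest absolute trend statistic over the R, A, and D scores."""
    return max(abs(catt(cases, controls, x)) for x in SCORES.values())


# Hypothetical genotype counts, for illustration only.
cases, controls = (600, 340, 60), (650, 310, 40)
z = {name: catt(cases, controls, x) for name, x in SCORES.items()}
print(z, max_stat(cases, controls))
```

Relabelling the alleles (reversing the count vectors) changes the sign of Z_A and swaps |Z_R| with |Z_D|, so |Z_A| and Z_MAX are unchanged, in line with the remark in the Introduction that two-sided results do not depend on the choice of the risk allele.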

There are several ways to evaluate the significance level of MAX. For example, the multiple-integration procedure, which is available in R, can be used (Conneely & Boehnke, 2007), and it is computationally feasible in the context of GWAS. A more computationally challenging approach is through a permutation procedure (Sladek et al. 2007). Li et al. (2008a) derived an analytic upper bound that is reasonably accurate for small p-values.

Another robust test is the 2-df Chi-squared test. Using the notation in Table 1, we can define the Chi-2df test statistic as

χ_{2}^{2} = \sum_{i = 0}^{2} [{(r_{i} - {rn}_{i} / n)}^{2} / ({rn}_{i} / n) + {(s_{i} - {sn}_{i} / n)}^{2} / ({sn}_{i} / n)] .

(3)

The significance level of the Chi-2df test can be evaluated through the 2-df Chi-squared $(χ_{2}^{2})$ distribution.
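The statistic in (3) and its p-value can be sketched in a few lines of plain Python. The function names and the example counts are hypothetical. The p-value uses the fact that the survival function of a central Chi-squared distribution with 2 df is exp(-x/2).

```python
import math


def chi2_2df(cases, controls):
    """Pearson Chi-squared statistic of equation (3) for a 2x3 genotype table."""
    r, s = sum(cases), sum(controls)
    n = r + s
    stat = 0.0
    for ri, si in zip(cases, controls):
        ni = ri + si
        exp_case, exp_ctrl = r * ni / n, s * ni / n  # expected counts under H0
        stat += (ri - exp_case) ** 2 / exp_case + (si - exp_ctrl) ** 2 / exp_ctrl
    return stat


def chi2_2df_pvalue(stat):
    """Upper tail of the central 2-df Chi-squared distribution: exp(-x / 2)."""
    return math.exp(-stat / 2.0)


# Hypothetical genotype counts (gg, Gg, GG), for illustration only.
stat = chi2_2df((600, 340, 60), (650, 310, 40))
print(stat, chi2_2df_pvalue(stat))
```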

3. The formula for power calculation

Under a given disease model, we denote the expected genotype frequencies of (gg,Gg,GG) for cases and controls as (p₀,p₁,p₂) and (q₀,q₁,q₂), respectively. The analytic power calculation for CATTA (Z_A) can be found in Freidlin et al. (2002) and Pfeiffer & Gail (2003).

The power calculation for the Chi-2df test is also straightforward. Under the significance level α, the rejection region is [η,∞), where η is the 1−α quantile of the 2-df Chi-squared distribution. Under a given disease model, the Chi-2df test statistic (defined by (3)) in general follows a non-central 2-df Chi-squared distribution (Edwards et al. 2005) with the non-centrality parameter $δ = r s \sum_{i = 0}^{2} {(p_{i} - q_{i})}^{2} / (r p_{i} + s q_{i})$, so the power for the Chi-2df test is

β_{Chi - 2 df} = \int_{η}^{\infty} \sum_{k = 0}^{\infty} \frac{{(δ / 2)}^{k} u^{k} e^{- \frac{δ + u}{2}}}{2^{k + 1} {(k!)}^{2}} du
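The integral above can be evaluated without numerical integration. Integrating the series term by term writes the power as a Poisson(δ/2) mixture of central Chi-squared tail probabilities with 2(k+1) df, each of which has a closed form for even df, and for 2 df the threshold is η = −2 ln α. The sketch below, with hypothetical function names, follows that route in plain Python; it is one possible way to compute the formula, not the authors' code.

```python
import math


def noncentrality(p, q, r, s):
    """delta = r s sum_i (p_i - q_i)^2 / (r p_i + s q_i)."""
    return r * s * sum((pi - qi) ** 2 / (r * pi + s * qi) for pi, qi in zip(p, q))


def chi2_2df_power(p, q, r, s, alpha=0.05, tol=1e-12):
    """Power of the Chi-2df test under case/control genotype frequencies p and q."""
    half_delta = noncentrality(p, q, r, s) / 2.0
    half_eta = -math.log(alpha)      # eta / 2, since eta = -2 ln(alpha) for 2 df
    weight = math.exp(-half_delta)   # Poisson(delta / 2) weight for k = 0
    term = math.exp(-half_eta)       # e^{-eta/2} (eta/2)^j / j! for j = 0
    tail = term                      # P(central chi2 with 2(k+1) df > eta), k = 0
    power, cum_weight, k = 0.0, 0.0, 0
    while cum_weight < 1.0 - tol and k < 10_000:
        power += weight * tail
        cum_weight += weight
        k += 1
        weight *= half_delta / k
        term *= half_eta / k
        tail += term
    return power


# Recessive model with (R1, R2) = (1, 1.3), MAF 0.5, and r = s = 1,000.
q = (0.25, 0.5, 0.25)
w = (q[0], q[1] * 1.0, q[2] * 1.3)
p = tuple(wi / sum(w) for wi in w)
print(chi2_2df_power(p, q, 1000, 1000))
```

For this setting the sketch gives a power of about 0.645, which can be compared with the corresponding Chi-2df entry in Table 3. With p = q it returns α.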

Finally, we derive the power calculation formula for MAX. We denote the rejection region of MAX under the significance level α by [γ,∞), where γ (≥0) satisfies Pr_null(Z_MAX ≥ γ)=α. Since (Z_R,Z_A,Z_D)' follows a multivariate normal distribution under the null hypothesis with mean vector (0,0,0)' and covariance matrix Δ given by Freidlin et al. (2002), we can obtain the threshold γ by solving the following equation:

\int_{- γ}^{γ} \int_{- γ}^{γ} \int_{- γ}^{γ} \frac{1}{{(2 π)}^{3 / 2} \sqrt{det (Δ)}} exp {- \frac{1}{2} (v_{R}, v_{A}, v_{D}) Δ^{- 1} {(v_{R}, v_{A}, v_{D})}^{'}} {dv}_{R} {dv}_{A} {dv}_{D} = 1 - α .

This can be accomplished easily using an existing R function for multivariate normal probabilities, such as those in the “mvtnorm” package.

Under the disease model with the expected genotype frequencies (p₀,p₁,p₂) and (q₀,q₁,q₂) in cases and controls, respectively, (Z_R,Z_A,Z_D)' asymptotically follows a multivariate normal distribution with mean vector μ=(μ_R,μ_A,μ_D)' and covariance matrix Λ. The mean vector is given by

μ_{x} = \frac{\sqrt{n} \sum_{i = 0}^{2} x_{i} (p_{i} - q_{i})}{\sqrt{(\frac{n}{r} + \frac{n}{s}) [\sum_{i = 0}^{2} x_{i}^{2} k_{i} - {(\sum_{i = 0}^{2} x_{i} k_{i})}^{2}]}},

(4)

with the score vector x chosen as (0, 0, 1), (0, 0.5, 1), and (0, 1, 1) for μ_R, μ_A, and μ_D, respectively, and with $k_{i} = \frac{r p_{i} + s q_{i}}{r + s}, i = 0, 1, 2$. The definition of the covariance matrix Λ is more complicated and is presented, together with its derivation, in the Appendix.

The power of the MAX test for the alternative hypothesis H₁ can be written as

\begin{aligned} β_{MAX} &= 1 - {Pr}_{H_{1}} (| Z_{R} | < γ, | Z_{A} | < γ, | Z_{D} | < γ) \\ &= 1 - \int_{- γ}^{γ} \int_{- γ}^{γ} \int_{- γ}^{γ} \frac{1}{{(2 π)}^{3 / 2} \sqrt{det (Λ)}} exp {- \frac{1}{2} (v - μ)^{'} Λ^{- 1} (v - μ)} dv_{R} dv_{A} dv_{D}, \end{aligned}

(5)

where v=(v_R,v_A,v_D)' and Λ is the covariance matrix.
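The power in (5) can be cross-checked by Monte Carlo. The sketch below swaps the triple integral for simulation and is not the authors' procedure. It computes the means from the ratio √n Σ x_i(p_i − q_i)/ω_x and the covariances from the theorem in Appendix A, draws (Z_R, Z_D) from the implied bivariate normal distribution, and rebuilds Z_A from the identity ω_A Z_A = (ω_R Z_R + ω_D Z_D)/2, which holds asymptotically because the additive score is the average of the recessive and dominant scores. The function names, the seed, and the choice to evaluate the null threshold γ at the pooled frequencies k_i are assumptions of this sketch.

```python
import math
import random

# Score vectors for (gg, Gg, GG): recessive, additive, dominant.
SCORES = {"R": (0.0, 0.0, 1.0), "A": (0.0, 0.5, 1.0), "D": (0.0, 1.0, 1.0)}


def _cov(x, y, f):
    """Covariance of scores x and y under genotype frequencies f."""
    mx = sum(a * w for a, w in zip(x, f))
    my = sum(b * w for b, w in zip(y, f))
    return sum(a * b * w for a, b, w in zip(x, y, f)) - mx * my


def moments(p, q, r, s):
    """Asymptotic means, covariances, and scale factors omega_x of Z_R, Z_A, Z_D."""
    n = r + s
    k = [(r * pi + s * qi) / n for pi, qi in zip(p, q)]
    omega = {m: math.sqrt((n / r + n / s) * _cov(x, x, k)) for m, x in SCORES.items()}
    mu = {
        m: math.sqrt(n) * sum(xi * (pi - qi) for xi, pi, qi in zip(x, p, q)) / omega[m]
        for m, x in SCORES.items()
    }
    cov = {
        (a, b): n * (_cov(SCORES[a], SCORES[b], p) / r + _cov(SCORES[a], SCORES[b], q) / s)
        / (omega[a] * omega[b])
        for a in SCORES
        for b in SCORES
    }
    return mu, cov, omega


def sample_zmax(mu, cov, omega, n_draws, rng):
    """Draw Z_MAX values; Z_A is rebuilt from Z_R and Z_D because A = (R + D) / 2."""
    sd_r, sd_d = math.sqrt(cov["R", "R"]), math.sqrt(cov["D", "D"])
    rho = cov["R", "D"] / (sd_r * sd_d)
    draws = []
    for _ in range(n_draws):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z_r = mu["R"] + sd_r * e1
        z_d = mu["D"] + sd_d * (rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
        z_a = (omega["R"] * z_r + omega["D"] * z_d) / (2.0 * omega["A"])
        draws.append(max(abs(z_r), abs(z_a), abs(z_d)))
    return draws


def max_power_mc(p, q, r, s, alpha=0.05, n_draws=50_000, seed=1):
    """Monte Carlo estimate of (power of MAX, threshold gamma)."""
    rng = random.Random(seed)
    n = r + s
    k = [(r * pi + s * qi) / n for pi, qi in zip(p, q)]
    # Assumption: under H0 both groups share the pooled frequencies k.
    null = sorted(sample_zmax(*moments(k, k, r, s), n_draws, rng))
    gamma = null[min(int((1.0 - alpha) * n_draws), n_draws - 1)]
    alt = sample_zmax(*moments(p, q, r, s), n_draws, rng)
    return sum(z >= gamma for z in alt) / n_draws, gamma


# Additive model with (R1, R2) = (1.2, 1.44), MAF 0.3, and r = s = 1,000.
f = 0.3
q = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
w = (q[0], q[1] * 1.2, q[2] * 1.44)
p = tuple(wi / sum(w) for wi in w)
print(max_power_mc(p, q, 1000, 1000))
```

The estimate carries Monte Carlo error of roughly the second decimal place at this number of draws, so it is a sanity check on a power calculation rather than a replacement for the integral.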

4. Power comparison

We assume that the case and control sample sizes are r=s=1,000. We first conduct the comparison under a single-marker disease risk model. We let the minor allele frequency (MAF) f for a particular SNP in the study population be in the range of 5–50%. For MAF=f, we let the genotype frequencies of (gg, Gg, GG) in the control population be (q₀,q₁,q₂)=((1−f)², 2f(1−f), f²). This is reasonable for the study of a rare disease in a source population where Hardy-Weinberg equilibrium holds. Let the odds ratios (ORs) for having 1 copy and 2 copies of the high-risk allele be R₁ and R₂, respectively. We have $R_{2} = R_{1}^{2} > 1$ for an additive model (on the logit scale), R₂=R₁>1 for a dominant model, and R₂>R₁=1 for a recessive model. Given (R₁,R₂), the genotype frequencies of (gg,Gg,GG) in the case population, (p₀,p₁,p₂), are (q₀,q₁R₁,q₂R₂)/(q₀+q₁R₁+q₂R₂).
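The single-marker setup described above amounts to two small formulas, sketched here in plain Python with hypothetical function names: Hardy-Weinberg genotype frequencies in controls, and case frequencies obtained by reweighting them with the odds ratios.

```python
def control_freqs(f):
    """Control frequencies of (gg, Gg, GG) under Hardy-Weinberg equilibrium."""
    return ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)


def case_freqs(q, r1, r2):
    """Case frequencies (q0, q1 R1, q2 R2) / (q0 + q1 R1 + q2 R2)."""
    weights = (q[0], q[1] * r1, q[2] * r2)
    total = sum(weights)
    return tuple(w / total for w in weights)


# The three model families from the text, at MAF 0.3 with illustrative ORs.
q = control_freqs(0.3)
for name, (r1, r2) in {"additive": (1.2, 1.44),
                       "dominant": (1.3, 1.3),
                       "recessive": (1.0, 1.3)}.items():
    print(name, case_freqs(q, r1, r2))
```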

In addition to the single-marker disease risk model, we compare the power of the three single-marker tests under the following 2-marker haplotype risk model. Suppose the disease risk is conferred by haplotypes consisting of two linked markers, with marker #1 having allele types B and b, and with marker #2 having allele types C and c. We designate the haplotype BC as the high-risk variant (corresponding to the high-risk allele in the single-marker risk model). As with the single-marker risk model, we can define the haplotype risk model as dominant, recessive, or additive. For example, if R₁ and R₂ denote the ORs for having one copy and two copies of the high-risk haplotype, respectively, we have $R_{2} = R_{1}^{2}$ for the additive haplotype risk model. To simplify the power comparison setup, we let p₁ be the BC haplotype frequency in the study population and assume the other three 2-marker haplotypes have the same haplotype frequency. We further assume the independence of the two haplotypes within a subject in the study population. In the Appendix, we provide the formula for calculation of (p₀,p₁,p₂) and (q₀,q₁,q₂), the genotype frequencies of (bb, Bb, BB) in the case and the control populations, respectively.

Fig. 1 shows the power curves of the above-considered association tests under each of three commonly assumed single-marker risk models (additive, dominant, and recessive) at a significance level of 0.05. From Fig. 1, we can see that MAX is always more powerful than Chi-2df, and in some cases it is associated with up to a 5% power increase. Comparing CATTA with MAX, we see that CATTA is slightly more powerful than MAX under the additive model, but in most cases the advantage is negligible. Under the recessive model, MAX (as well as the Chi-2df) is noticeably superior to CATTA. Under the dominant model, it is interesting to see that neither CATTA nor MAX dominates the other. CATTA is more favorable when the risk allele is relatively rare, while MAX becomes more attractive as the risk allele frequency gets larger.

Besides the three commonly used disease models, we also compared the power under a single-marker risk model with all possible combinations of two odds ratios R₁ and R₂, with each ranging from 1 to 1.5. Fig. 2 summarizes the power comparison results. Clearly, there is no test that can outperform the others in all of the single-marker risk models considered. When the risk allele is relatively rare (say, MAF less than 0.2), all three tests have comparable power under various single-marker models, although CATTA outperforms the others in most of the (R₁,R₂) region. As the risk allele gets more common, CATTA becomes less powerful than both MAX and Chi-2df under the single-marker risk model when R₁>R₂, although whether this kind of disease risk model is reasonable is debatable. MAX and Chi-2df have similar performances under all the considered choices of risk models and MAFs, with MAX performing more favorably under the risk model when R₁<R₂, and less favorably when R₁>R₂.

Fig. 2. Power of CATTA (red), Chi-2df (blue), and MAX (black) under the significance level of 0.05. The number of cases and controls is equal to 1,000. R₁ and R₂ are the odds ratios of 1 copy and 2 copies of the high-risk alleles, respectively. MAF is the minor allele frequency.

Power comparison results under the 2-marker haplotype risk models are given in Fig. 3. Similar to what we observed in Fig. 1, MAX appears to have the most robust performance. Although MAX is slightly less powerful than CATTA under the additive haplotype risk model, it has a noticeable power advantage over CATTA (more than 10% higher) under the dominant haplotype risk model. Also, from Fig. 3, we notice that MAX is consistently better than the Chi-2df, although the percentage increase in power is limited.

5. Discussion

Choosing the right single-marker analysis is a critical step for the success of CCGAS. Because of the uncertainty about the true underlying disease risk model, robust tests that perform well under a wide range of disease models are preferred over those that are too sensitive to the model assumptions. MAX was designed specifically to achieve such robustness. Compared with the commonly used CATTA and Chi-2df, the power of MAX is less understood even though its type I error rate has been thoroughly investigated by Li et al. (2008). In this work, we derived the power calculation formula for MAX. Based on this power calculation formula, as well as the ones already existing for CATTA and Chi-2df, we conducted a comprehensive power comparison among the three tests. Not surprisingly, we found that each test has its own “sweet” spots. But MAX appears to have the most robust performance when the underlying genetic models are recessive, additive, or dominant. Under various overdominant models, the Chi-2df and MAX have very similar performance, with the power of the Chi-2df slightly higher than that of MAX.

In order to assess the statistical significance of MAX, Sladek et al. (2007) used a permutation procedure to estimate p-values of MAX for each SNP. To ensure a reliable estimate of any p-value falling below the level of 10⁻⁶, we would need to carry out more than 10⁷ permutation steps, which would be time-consuming and computationally challenging for a large-scale CCGAS. Alternatively, multiple integration (Conneely & Boehnke, 2007) and an efficient approximation method (Li et al., 2008a) have been proposed to evaluate the statistical significance of MAX. The integration procedure (Conneely & Boehnke, 2007) can be carried out with the R package “mvtnorm”, which can be freely downloaded from http://cran.r-project.org/. The efficient approximation approach (Li et al., 2008a), which is based on a one-dimensional integral, is user-friendly and can be implemented in many languages and software packages, such as C, C++, R, Matlab, and SAS.
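To make the cost of the permutation approach concrete, here is a minimal sketch of a permutation p-value for MAX in plain Python. It is an illustration under stated assumptions, not the procedure used by Sladek et al. (2007): the function names, the seed, the example counts, and the add-one estimator (hits + 1)/(n_perm + 1) are all choices made for this sketch.

```python
import math
import random

SCORES = ((0.0, 0.0, 1.0), (0.0, 0.5, 1.0), (0.0, 1.0, 1.0))  # R, A, D


def z_max(cases, controls):
    """MAX statistic from genotype counts (gg, Gg, GG)."""
    r, s = sum(cases), sum(controls)
    n = r + s
    theta = [(a + b) / n for a, b in zip(cases, controls)]
    best = 0.0
    for x in SCORES:
        mean = sum(xi * t for xi, t in zip(x, theta))
        var = sum(xi * xi * t for xi, t in zip(x, theta)) - mean * mean
        if var <= 0.0:
            continue  # this score is constant in the pooled sample
        diff = sum(xi * (a / r - b / s) for xi, a, b in zip(x, cases, controls))
        best = max(best, abs(math.sqrt(n) * diff / math.sqrt((n / r + n / s) * var)))
    return best


def permutation_pvalue(cases, controls, n_perm=1000, seed=1):
    """Shuffle case/control labels and count permuted MAX values >= observed."""
    rng = random.Random(seed)
    r = sum(cases)
    totals = [a + b for a, b in zip(cases, controls)]
    genotypes = [g for g in range(3) for _ in range(totals[g])]
    observed = z_max(cases, controls)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(genotypes)
        perm_cases = [0, 0, 0]
        for g in genotypes[:r]:  # the first r subjects play the role of cases
            perm_cases[g] += 1
        perm_controls = [totals[g] - perm_cases[g] for g in range(3)]
        if z_max(perm_cases, perm_controls) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical counts for 100 cases and 100 controls, for illustration only.
print(permutation_pvalue((40, 40, 20), (70, 25, 5)))
```

With this estimator the smallest attainable p-value is 1/(n_perm + 1), which is why resolving p-values near 10⁻⁶ requires millions of permutations per SNP.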

Since the MAX test did not perform as well as the Chi-2df test under the overdominant model (R₁>R₂), we also considered its extension, called MAX4, which is the maximum of four trend tests under four genetic models (recessive, additive, dominant, and overdominant) with scores (0,0,1), (0,0.5,1), (0,1,1), and (0,1,0), respectively. We conducted some simulation studies to compare the asymptotic power of MAX4 with that of CATTA, Chi-2df, and MAX. Table 3 shows the results. It can be seen from Table 3 that MAX4 has the best performance among the four considered tests under the overdominant models, but it is slightly less powerful than the MAX test under the other three models considered. The choice between MAX and MAX4 depends on the likelihood of the overdominant model in real applications.

Table 3.

Power comparison under four genetic models with 1,000 cases and 1,000 controls at the significance level of 0.05

Model	(R₁,R₂)	Test	MAF 0.1	MAF 0.2	MAF 0.3	MAF 0.4	MAF 0.5
Recessive	(1,1.3)	CATTA	0.059	0.114	0.237	0.411	0.584
Recessive	(1,1.3)	Chi-2df	0.080	0.177	0.334	0.506	0.645
Recessive	(1,1.3)	MAX	0.080	0.18	0.345	0.526	0.667
Recessive	(1,1.3)	MAX4	0.078	0.17	0.325	0.501	0.648
Additive	(1.2,1.44)	CATTA	0.433	0.658	0.766	0.812	0.820
Additive	(1.2,1.44)	Chi-2df	0.341	0.552	0.669	0.722	0.732
Additive	(1.2,1.44)	MAX	0.368	0.595	0.714	0.766	0.776
Additive	(1.2,1.44)	MAX4	0.362	0.574	0.684	0.733	0.742
Dominant	(1.3,1.3)	CATTA	0.643	0.767	0.750	0.656	0.503
Dominant	(1.3,1.3)	Chi-2df	0.562	0.727	0.746	0.693	0.585
Dominant	(1.3,1.3)	MAX	0.582	0.749	0.767	0.714	0.604
Dominant	(1.3,1.3)	MAX4	0.591	0.752	0.763	0.702	0.584
Overdominant	(1.3,1.1)	CATTA	0.584	0.627	0.491	0.277	0.106
Overdominant	(1.3,1.1)	Chi-2df	0.544	0.695	0.710	0.668	0.600
Overdominant	(1.3,1.1)	MAX	0.542	0.669	0.650	0.560	0.443
Overdominant	(1.3,1.1)	MAX4	0.566	0.713	0.721	0.670	0.588


When there are covariates to be adjusted for, the corresponding MAX test can be derived from the logistic regression model. Li et al. (2008a) suggested a procedure for evaluating the p-value of the covariate-adjusted MAX test. Although the power comparison was conducted without any adjustment for covariates, we expect similar conclusions will still hold when covariate effects need to be adjusted for.

ACKNOWLEDGEMENTS

We would like to thank the editor and two anonymous reviewers for their insightful comments, which improved our presentation. We also thank B.J. Stone for her valuable help. K Yu, X Liang, and Q Li are supported by the Intramural Program of the National Institutes of Health. Q Li is supported in part by the Knowledge Innovation Program of the Chinese Academy of Sciences, Nos. 30465W0 and 30475V0.

APPENDIX

APPENDIX A: The Covariance between Z_x and Z_y

Theorem

Let x=(x₀,x₁,x₂)' and y=(y₀,y₁,y₂)' be any two score vectors. The asymptotic covariance between Z_x and Z_y can be written as

{cov}_{H_{1}} (Z_{x}, Z_{y}) = \frac{n}{ω_{x} ω_{y}} x^{'} [Δ_{r} / r^{2} + Δ_{s} / s^{2}] y

where $ω_{x} = \sqrt{(\frac{n}{r} + \frac{n}{s}) [\sum_{i = 0}^{2} x_{i}^{2} k_{i} - {(\sum_{i = 0}^{2} x_{i} k_{i})}^{2}]}$, with $k_{i} = \frac{r p_{i} + s q_{i}}{r + s}$ for i = 0, 1, 2,

Δ_{r} = (\begin{matrix} r p_{0} (1 - p_{0}) & - r p_{0} p_{1} & - r p_{0} p_{2} \\ - r p_{0} p_{1} & r p_{1} (1 - p_{1}) & - r p_{1} p_{2} \\ - r p_{0} p_{2} & - r p_{1} p_{2} & r p_{2} (1 - p_{2}) \end{matrix}), and Δ_{s} = (\begin{matrix} s q_{0} (1 - q_{0}) & - s q_{0} q_{1} & - s q_{0} q_{2} \\ - s q_{0} q_{1} & s q_{1} (1 - q_{1}) & - s q_{1} q_{2} \\ - s q_{0} q_{2} & - s q_{1} q_{2} & s q_{2} (1 - q_{2}) \end{matrix}) .

Proof

Let r⃗=(r₀,r₁,r₂)' and s⃗=(s₀,s₁,s₂)'. Then we have

\begin{aligned} cov (Z_{x}, Z_{y}) &= \frac{n}{ω_{x} ω_{y}} cov [\sum_{i = 0}^{2} x_{i} (r_{i} / r - s_{i} / s), \sum_{i = 0}^{2} y_{i} (r_{i} / r - s_{i} / s)] \\ &= \frac{n}{ω_{x} ω_{y}} cov (x^{'} \vec{r} / r - x^{'} \vec{s} / s, y^{'} \vec{r} / r - y^{'} \vec{s} / s) \\ &= \frac{n}{ω_{x} ω_{y}} x^{'} [cov (\vec{r}, \vec{r}) / r^{2} + cov (\vec{s}, \vec{s}) / s^{2}] y \\ &= \frac{n}{ω_{x} ω_{y}} x^{'} [Δ_{r} / r^{2} + Δ_{s} / s^{2}] y . \end{aligned}

The third equality uses the independence of the case and control samples, and the last uses the multinomial covariance matrices cov(r⃗,r⃗)=Δ_r and cov(s⃗,s⃗)=Δ_s.
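As a quick consistency check on the theorem (a worked special case added for illustration, not part of the original derivation), set y = x. The quadratic forms then reduce to score variances under the case and control genotype frequencies, here written with the shorthand V_p(x) and V_q(x), which is notation introduced only for this check:

```latex
\begin{aligned}
x' Δ_{r} x &= r \Big[ \sum_{i=0}^{2} x_{i}^{2} p_{i} - \Big( \sum_{i=0}^{2} x_{i} p_{i} \Big)^{2} \Big] = r \, V_{p}(x), \qquad
x' Δ_{s} x = s \, V_{q}(x), \\
var_{H_{1}}(Z_{x}) &= \frac{n}{ω_{x}^{2}} \Big[ \frac{V_{p}(x)}{r} + \frac{V_{q}(x)}{s} \Big]
 = \frac{V_{p}(x)/r + V_{q}(x)/s}{(1/r + 1/s) \, V_{k}(x)} .
\end{aligned}
```

Under the null hypothesis p_i = q_i = k_i, so V_p(x) = V_q(x) = V_k(x) and the variance equals 1, as expected for a standardized test statistic.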

APPENDIX B: Two-marker Joint Genotype Frequencies under the Haplotype Risk Model

Suppose the disease risk is conferred by haplotypes consisting of two linked markers, with marker #1 having allele types B and b and marker #2 having allele types C and c. We designate the haplotype BC as the high-risk variant. Denote the haplotype frequencies for BC, Bc, bC, and bc as p₁, p₂, p₃, and p₄, respectively. Let R₁ and R₂ denote the ORs for having one copy and two copies of the high-risk haplotype, respectively. We assume that HWE holds in the control group. Table 2 gives the joint genotype frequencies. From the table, we can see that the frequencies of BB, Bb, and bb in the control group at marker #1 are (p₁+p₂)², 2(p₁+p₂)(p₃+p₄), and (p₃+p₄)², respectively; the frequencies of BB, Bb, and bb in the case group at marker #1 are $(R_{2} p_{1}^{2} + 2 R_{1} p_{1} p_{2} + p_{2}^{2}) / ξ$, $2 (R_{1} p_{1} p_{3} + R_{1} p_{1} p_{4} + R_{1} p_{2} p_{3} + p_{2} p_{4}) / ξ$, and (p₃+p₄)²/ξ, respectively, where $ξ = R_{2} p_{1}^{2} + 2 R_{1} p_{1} p_{2} + p_{2}^{2} + 2 (R_{1} p_{1} p_{3} + R_{1} p_{1} p_{4} + R_{1} p_{2} p_{3} + p_{2} p_{4}) + {(p_{3} + p_{4})}^{2}$.

Table 2.

Two-marker joint genotype frequencies

Genotype pair	Frequency
BBCC	p₁²
BBCc	2p₁p₂
BBcc	p₂²
BbCC	2p₁p₃
BbCc	2(p₁p₄+p₂p₃)
Bbcc	2p₂p₄
bbCC	p₃²
bbCc	2p₃p₄
bbcc	p₄²


REFERENCES

Conneely KN, Boehnke M. So many correlated tests, so little time! Rapid adjustment of p-values for multiple correlated tests. Am J Hum Genet. 2007;81:1158–1168. doi: 10.1086/522036.
Edwards BJ, Haynes C, Levenstien MA, Finch SJ, Gordon D. Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies. BMC Genet. 2005;8:1–18. doi: 10.1186/1471-2156-6-18.
Freidlin B, Zheng G, Li Z, Gastwirth JL. Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum Hered. 2002;53:146–152. doi: 10.1159/000064976.
Guedj M, Della-Chiesa E, Picard F, Nuel G. Computing power in case-control association studies through the use of quadratic approximations: application to meta-statistics. Ann Hum Genet. 2006;71:262–270. doi: 10.1111/j.1469-1809.2006.00316.x.
Hunter DJ, Kraft P, Jacobs KB, Cox DG, Yeager M, Hankinson SE, Wacholder S, Wang Z, Welch R, Hutchinson A, et al. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet. 2007;39:870–874. doi: 10.1038/ng2075.
Li Q, Zheng G, Li Z, Yu K. Efficient approximation of p-value of the maximum of correlated tests, with applications to genome-wide association studies. Ann Hum Genet. 2008a;72:397–406. doi: 10.1111/j.1469-1809.2008.00437.x.
Li Q, Yu K, Li Z, Zheng G. MAX-rank: a simple and robust genome-wide scan for case-control association studies. Hum Genet. 2008b;123:617–623. doi: 10.1007/s00439-008-0514-8.
Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308:385–389. doi: 10.1126/science.1109557.
Kuo CL, Feingold E. What’s the best statistic for a simple test of genetic association in a case-control study? Joint Statistical Meetings, Biometrics Section; August 2–7; 2008.
Pfeiffer RM, Gail MH. Sample size calculations for population- and family-based case-control association studies on marker genotypes. Genet Epidemiol. 2003;25:136–148. doi: 10.1002/gepi.10245.
Sladek R, Rocheleau G, Rung J, Dina C, Shen L, Serre D, Boutin P, Vincent D, Belisle A, Hadjadj S, et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature. 2007;445:881–885. doi: 10.1038/nature05616.
The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, Minichiello MJ, Fearnhead P, Yu K, Chatterjee N, et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet. 2007;39:645–649. doi: 10.1038/ng2022.
