Abstract
Choosing an appropriate single-marker association test is critical to the success of case-control genetic association studies. An ideal single-marker analysis should have robust performance across a wide range of potential disease risk models. MAX was designed specifically to achieve such robustness. In this work, we derived the power calculation formula for MAX and conducted a comprehensive power comparison between MAX and two other commonly used single-marker tests, the one-degree-of-freedom (1-df) Cochran-Armitage trend test and the 2-df Pearson Chi-squared test. We used a single-marker disease risk model and a two-marker haplotype risk model to explore the performances of the above three tests. We found that each test has its own “sweet” spots. Among the three tests considered, MAX appears to have the most robust performance.
Keywords: association, chi-square, genetic model, MAX, power, robustness
1. Introduction
In case-control genetic association studies (CCGAS), single-marker analysis, which tests the association between the outcome and an individual SNP, is often used. When there are no other covariates to be adjusted for, two tests are usually applied: the 1-degree-of-freedom (1-df) Cochran-Armitage trend test (CATT) (Klein et al. 2005; WTCCC, 2007), whose usual version (CATTA) corresponds to the score test derived from an additive disease model (on the logit scale), and the 2-df Pearson Chi-squared test (Chi-2df), which compares the 3-category genotype frequencies between the case and control groups (Yeager et al. 2007). In addition to these two tests, MAX, which takes the maximum of three CATTs derived under dominant, recessive, and additive models as the test statistic, has also been proposed for the association test (Sladek et al. 2007). A detailed definition of each test is given below. When there are other covariates to be adjusted for, the counterpart of each of these tests can be derived from the standard logistic regression model that models the effect of the genotype, coded according to the assumed disease model, with adjustment for the covariates. An important common feature of CATTA, Chi-2df, and MAX is that their 2-sided testing results are independent of the choice of the risk allele.
The significance level (p-value) of CATTA and Chi-2df can be obtained easily from their theoretical asymptotic distributions. The calculation of the p-value for MAX is somewhat more involved and requires a multiple-integration or permutation procedure (Conneely & Boehnke, 2007; Li et al. 2008a). Although all three tests have been used in recent genome-wide association studies (GWAS) (e.g., Klein et al. 2005; Hunter et al. 2007; Sladek et al. 2007; WTCCC, 2007; Yeager et al. 2007), there is no consensus as to which one is generally preferable, and there is little discussion in the literature of the analytic power of MAX or of comparisons of MAX with the other tests. In this work, we derive an analytic formula for calculating the power of MAX. This formula, together with the power formulas that already exist for CATTA and Chi-2df, enables us to conduct a comprehensive power comparison among the three tests.
Comparisons of various single-marker analyses have been reported by several groups (Freidlin et al. 2002; Guedj et al. 2006; Kuo & Feingold, 2008). The distinguishing contribution of this work is the addition of the promising MAX test to the comparison. In particular, we compare the asymptotic powers of these tests under a broad range of single-locus and multi-locus disease models. The comparison results should shed more light on the relative merits of the three tests under various disease risk models and provide guidance for the analysis of future CCGAS.
2. Test statistics definition
We focus first on situations where there are no covariates to be adjusted for. Assume that there are r cases and s controls in a CCGAS, and that there are two alleles, G and g, at a given SNP locus with the possible genotypes gg, Gg, and GG. The notations for genotype counts in the case and control groups are given in Table 1. Based upon Table 1, the general form of the CATT can be written as
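$$Z_x = \frac{\sum_{i=0}^{2} x_i \left( r_i/r - s_i/s \right)}{\sqrt{\left( 1/r + 1/s \right) \left[ \sum_{i=0}^{2} x_i^2 \theta_i - \left( \sum_{i=0}^{2} x_i \theta_i \right)^2 \right]}} \qquad (1)$$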
where x=(x0,x1,x2)' is a genotype score vector for the coding of genotypes gg, Gg, and GG, and θi=(ri+si)/(r+s), i=0,1,2. The genotype score vector x is chosen by the investigator. It should be pointed out that the CATT given by (1) is equivalent to the score statistic for testing the null hypothesis H0: β=0 derived from the following standard logistic regression that models the effect of genotypes represented by x,
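$$\log \frac{\Pr(\text{case} \mid \text{genotype } i)}{\Pr(\text{control} \mid \text{genotype } i)} = \alpha + \beta x_i, \quad i = 0, 1, 2. \qquad (2)$$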
Based upon (2), the three commonly assumed genetic models (recessive, additive, and dominant) correspond to the following assignments of the genotype score vector x: R=(R0,R1,R2)'=(0,0,1)', A=(A0,A1,A2)'=(0,0.5,1)', and D=(D0,D1,D2)'=(0,1,1)', respectively. Among the three resulting CATTs, ZA, called CATTA and derived under the additive model, is usually preferred over ZD and ZR because it does not require specifying which allele is the high-risk allele (assuming a two-sided test is performed); it is therefore the version of CATT generally used in CCGAS. The p-value for Zx can be obtained from the standard normal distribution.
Table 1. Genotype counts in the case and control groups.
Group | gg | Gg | GG | Total |
---|---|---|---|---|
Case | r0 | r1 | r2 | r |
Control | s0 | s1 | s2 | s |
Total | n0 | n1 | n2 | n |
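As a concrete illustration of statistic (1) in the notation of Table 1, the following Python sketch evaluates Zx for the three score vectors. It assumes the usual closed form of the CATT written with the pooled genotype frequencies θi; the function names and the genotype counts are hypothetical and are not taken from any study.

```python
import math

# Score vectors for genotypes ordered (gg, Gg, GG).
SCORES = {"recessive": (0, 0, 1), "additive": (0, 0.5, 1), "dominant": (0, 1, 1)}


def catt(cases, controls, scores):
    """Cochran-Armitage trend statistic Z_x for counts ordered (gg, Gg, GG)."""
    r, s = sum(cases), sum(controls)
    n = r + s
    # Pooled genotype frequencies theta_i = (r_i + s_i) / (r + s).
    theta = [(ri + si) / n for ri, si in zip(cases, controls)]
    num = sum(x * (ri / r - si / s) for x, ri, si in zip(scores, cases, controls))
    var = (sum(x * x * t for x, t in zip(scores, theta))
           - sum(x * t for x, t in zip(scores, theta)) ** 2)
    return num / math.sqrt((1 / r + 1 / s) * var)


def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


cases = (350, 480, 170)     # hypothetical (r0, r1, r2)
controls = (400, 470, 130)  # hypothetical (s0, s1, s2)
for model, x in SCORES.items():
    z = catt(cases, controls, x)
    print(f"{model:9s} Z = {z:7.3f}  p = {two_sided_p(z):.4f}")
```

Relabelling the alleles (reversing the order of the counts) only flips the sign of the additive statistic, which is why the two-sided CATTA does not depend on the choice of risk allele.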
If the true underlying disease model were known, the CATT Zx with the matching score vector would be the most efficient test. In reality, the true disease model is unknown. For a more robust test that performs well over a wide range of disease models, the following test statistic, called MAX, has been proposed (Freidlin et al. 2002; Sladek et al. 2007; Li et al. 2008a, 2008b):
$$Z_{MAX} = \max\left( |Z_R|, |Z_A|, |Z_D| \right).$$
There are several ways to evaluate the significance level of MAX. For example, the multiple-integration procedure, which is available in R, can be used (Conneely & Boehnke, 2007), and it is computationally feasible in the context of GWAS. A more computationally challenging approach is through a permutation procedure (Sladek et al. 2007). Li et al. (2008a) derived an analytic upper bound that is reasonably accurate for small p-values.
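The permutation route mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any cited paper: MAX is taken as the largest absolute CATT over the recessive, additive, and dominant scores, case/control labels are shuffled with the genotype totals held fixed, and the counts, function names, and number of permutations are hypothetical.

```python
import math
import random

SCORES = ((0, 0, 1), (0, 0.5, 1), (0, 1, 1))  # recessive, additive, dominant


def catt(cases, controls, x):
    """CATT statistic for genotype counts ordered (gg, Gg, GG)."""
    r, s = sum(cases), sum(controls)
    n = r + s
    theta = [(a + b) / n for a, b in zip(cases, controls)]
    num = sum(xi * (a / r - b / s) for xi, a, b in zip(x, cases, controls))
    var = (sum(xi * xi * t for xi, t in zip(x, theta))
           - sum(xi * t for xi, t in zip(x, theta)) ** 2)
    return num / math.sqrt((1 / r + 1 / s) * var)


def max_stat(cases, controls):
    """MAX: largest absolute CATT over the three score vectors."""
    return max(abs(catt(cases, controls, x)) for x in SCORES)


def permutation_p(cases, controls, n_perm=1000, seed=1):
    """Permutation p-value of MAX, shuffling case/control labels."""
    rng = random.Random(seed)
    totals = [a + b for a, b in zip(cases, controls)]
    genotypes = [g for g in range(3) for _ in range(totals[g])]
    r = sum(cases)
    observed = max_stat(cases, controls)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(genotypes)
        perm_cases = [0, 0, 0]
        for g in genotypes[:r]:  # the first r shuffled subjects act as cases
            perm_cases[g] += 1
        perm_controls = [totals[g] - perm_cases[g] for g in range(3)]
        if max_stat(perm_cases, perm_controls) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


cases = (70, 95, 35)     # hypothetical counts
controls = (80, 94, 26)  # hypothetical counts
p_perm = permutation_p(cases, controls)
print(f"MAX = {max_stat(cases, controls):.3f}, permutation p = {p_perm:.4f}")
```

The smallest p-value this scheme can report is 1/(n_perm+1), which is why permutation becomes expensive at genome-wide significance levels.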
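Another robust test is the 2-df Chi-squared test. Using the notation of Table 1, we can define the Chi-2df test as
$$\chi^2_{2df} = \sum_{i=0}^{2} \left[ \frac{\left( r_i - r n_i/n \right)^2}{r n_i/n} + \frac{\left( s_i - s n_i/n \right)^2}{s n_i/n} \right] \qquad (3)$$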
The significance level of the Chi-2df test can be evaluated through the 2-df Chi-squared distribution.
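For reference, the 2-df Pearson statistic and its p-value can be computed directly from the counts in Table 1. The sketch below assumes the standard Pearson form (observed minus expected, squared, over expected, summed over the six cells); the function names and counts are hypothetical. It relies on the fact that the survival function of the 2-df Chi-squared distribution is exp(−x/2).

```python
import math


def chi2_2df(cases, controls):
    """Pearson chi-squared statistic for a 2 x 3 genotype table."""
    r, s = sum(cases), sum(controls)
    n = r + s
    stat = 0.0
    for ri, si in zip(cases, controls):
        ni = ri + si
        exp_case, exp_ctrl = r * ni / n, s * ni / n  # expected counts under H0
        stat += (ri - exp_case) ** 2 / exp_case + (si - exp_ctrl) ** 2 / exp_ctrl
    return stat


def chi2_2df_pvalue(stat):
    """P(chi-squared with 2 df >= stat) = exp(-stat / 2)."""
    return math.exp(-stat / 2)


cases = (350, 480, 170)     # hypothetical (r0, r1, r2)
controls = (400, 470, 130)  # hypothetical (s0, s1, s2)
stat = chi2_2df(cases, controls)
print(f"Chi-2df = {stat:.3f}, p = {chi2_2df_pvalue(stat):.4f}")
```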
3. The formula for power calculation
Under a given disease model, we denote the expected genotype frequencies of (gg,Gg,GG) for cases and controls as (p0,p1,p2) and (q0,q1,q2), respectively. The analytic power calculation for CATTA (ZA) can be found in Freidlin et al. (2002) and Pfeiffer & Gail (2003).
The power calculation for the Chi-2df test is also straightforward. At significance level α, the rejection region is [η,∞), where η is the 1−α quantile of the 2-df Chi-squared distribution. Under a given disease model, the Chi-2df test statistic (defined by (3)) in general follows a non-central 2-df Chi-squared distribution (Edwards et al. 2005) with the non-centrality parameter
$$\lambda = r s \sum_{i=0}^{2} \frac{(p_i - q_i)^2}{r p_i + s q_i},$$
so the power for the Chi-2df test is
$$\Pr\left( \chi^2_2(\lambda) \ge \eta \right).$$
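The Chi-2df power can be evaluated without special libraries, because a non-central Chi-squared variable with 2 df is a Poisson mixture of central Chi-squared variables with even df, whose tail probabilities have closed forms. The sketch below assumes the usual large-sample non-centrality parameter λ = rs Σ (pi−qi)^2/(r pi+s qi); the function names are hypothetical, and the example frequencies follow a recessive model with MAF 0.5 and R2=1.3.

```python
import math


def ncx2_sf_2df(x, lam, terms=200):
    """P(X >= x) for a non-central chi-squared with 2 df and non-centrality lam.

    Uses X ~ sum_j Poisson(j; lam/2) * chi2(2 + 2j), with the closed-form tail
    of chi2 with 2m df: exp(-x/2) * sum_{k<m} (x/2)^k / k!.
    """
    half_x, half_lam = x / 2, lam / 2
    pois = math.exp(-half_lam)  # Poisson weight at j = 0
    term = math.exp(-half_x)    # exp(-x/2) (x/2)^k / k! at k = 0
    central_sf = term           # tail of the central chi2 with 2 df
    total = pois * central_sf
    for j in range(1, terms):
        pois *= half_lam / j
        term *= half_x / j
        central_sf += term      # tail of the central chi2 with 2 + 2j df
        total += pois * central_sf
    return total


def chi2df_power(p, q, r, s, alpha=0.05):
    """Asymptotic power of the 2-df test for case/control frequencies p and q."""
    eta = -2 * math.log(alpha)  # 1 - alpha quantile of the central chi2, 2 df
    lam = r * s * sum((pi - qi) ** 2 / (r * pi + s * qi) for pi, qi in zip(p, q))
    return ncx2_sf_2df(eta, lam)


q = (0.25, 0.5, 0.25)              # controls: HWE with MAF 0.5
w = (q[0], q[1] * 1.0, q[2] * 1.3)  # recessive model, (R1, R2) = (1, 1.3)
p = tuple(wi / sum(w) for wi in w)
power = chi2df_power(p, q, 1000, 1000)
print(f"Chi-2df power = {power:.3f}")
```

With r=s=1,000 and α=0.05 this gives a power of about 0.645. The truncation at 200 Poisson terms is ample for the non-centrality values that arise here, but very large λ would need a more careful implementation.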
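Finally, we derive the power calculation formula for MAX. We denote the rejection region of MAX at significance level α by [γ,∞), where γ (≥0) satisfies Prnull(ZMAX ≥γ)=α. Since (ZR,ZA,ZD)' asymptotically follows a multivariate normal distribution under the null hypothesis with mean vector (0,0,0)' and the covariance matrix Δ given by Freidlin et al. (2002), we can obtain the threshold γ by solving the following equation:
$$\Pr\nolimits_{null}\left( |Z_R| < \gamma, |Z_A| < \gamma, |Z_D| < \gamma \right) = \int_{-\gamma}^{\gamma} \int_{-\gamma}^{\gamma} \int_{-\gamma}^{\gamma} \phi(v; 0, \Delta) \, dv_R \, dv_A \, dv_D = 1 - \alpha,$$
where φ(v; 0, Δ) is the density of the trivariate normal distribution with mean vector 0 and covariance matrix Δ.
This can be accomplished easily using existing functions for multivariate normal probabilities in R (for example, in the package mvtnorm).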
Under the disease model with the expected genotype frequencies (p0,p1,p2) and (q0,q1,q2) in cases and controls, respectively, (ZR,ZA,ZD)' asymptotically follows a multivariate normal distribution with mean vector μ=(μR,μA,μD)' and covariance matrix Λ. The mean vector is given by
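$$\mu_x = \frac{\sum_{i=0}^{2} x_i (p_i - q_i)}{\sqrt{\left( 1/r + 1/s \right) \left[ \sum_{i=0}^{2} x_i^2 \theta_i - \left( \sum_{i=0}^{2} x_i \theta_i \right)^2 \right]}} \qquad (4)$$
with the score vector x chosen as (0, 0, 1), (0, 0.5, 1), and (0, 1, 1) for μR, μA, and μD, respectively, and with θi=(r pi+s qi)/(r+s), i=0,1,2. The definition of the covariance matrix Λ is more complicated; it is presented in the Appendix together with its detailed derivation.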
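To make the mean vector concrete, the following sketch evaluates the asymptotic mean of Zx for the three score vectors. It assumes the mean is the CATT of equation (1) with the counts replaced by their expectations, so that θi=(r pi+s qi)/(r+s). The CATTA power line is only a rough approximation that treats ZA as having unit variance under the alternative; it is not the paper's full calculation, and all names are hypothetical.

```python
import math

SCORES = {"R": (0, 0, 1), "A": (0, 0.5, 1), "D": (0, 1, 1)}


def catt_mean(p, q, r, s, x):
    """Asymptotic mean of Z_x for case/control genotype frequencies p and q."""
    theta = [(r * pi + s * qi) / (r + s) for pi, qi in zip(p, q)]
    num = sum(xi * (pi - qi) for xi, pi, qi in zip(x, p, q))
    var = (sum(xi * xi * t for xi, t in zip(x, theta))
           - sum(xi * t for xi, t in zip(x, theta)) ** 2)
    return num / math.sqrt((1 / r + 1 / s) * var)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def approx_two_sided_power(mu, z_crit=1.959964):
    """Rough power of a two-sided 5% test, ASSUMING unit variance under H1."""
    return norm_cdf(mu - z_crit) + norm_cdf(-mu - z_crit)


q = (0.25, 0.5, 0.25)         # controls: HWE with MAF 0.5
w = (q[0], q[1], q[2] * 1.3)  # recessive model, (R1, R2) = (1, 1.3)
p = tuple(wi / sum(w) for wi in w)
mu = {name: catt_mean(p, q, 1000, 1000, x) for name, x in SCORES.items()}
power_A = approx_two_sided_power(mu["A"])
print({k: round(v, 3) for k, v in mu.items()}, round(power_A, 3))
```

For this recessive example the recessive-score mean (about 2.62) is clearly larger than the additive-score mean (about 2.17), which is the loss of efficiency that MAX is designed to recover.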
The power of the MAX test for the alternative hypothesis H1 can be written as
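$$\text{Power} = 1 - \int_{-\gamma}^{\gamma} \int_{-\gamma}^{\gamma} \int_{-\gamma}^{\gamma} \phi(v; \mu, \Lambda) \, dv_R \, dv_A \, dv_D \qquad (5)$$
where v=(vR,vA,vD)' and φ(v; μ, Λ) is the density of the trivariate normal distribution with mean vector μ and covariance matrix Λ.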
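Formula (5) requires the covariance matrix Λ and a trivariate normal integral. As an independent cross-check, one can estimate the same two quantities, the threshold γ and the power, by simulation. The sketch below is a Monte Carlo stand-in, not an implementation of (5): it draws multinomial genotype counts, takes γ as the empirical 1−α quantile of MAX under a null in which both groups share the pooled frequencies (an assumption of this sketch), and reports the fraction of alternative replicates exceeding γ. All names, the seed, and the replicate count are hypothetical.

```python
import math
import random

SCORES = ((0, 0, 1), (0, 0.5, 1), (0, 1, 1))  # recessive, additive, dominant


def max_stat(cases, controls):
    """MAX statistic: largest absolute CATT over the three score vectors."""
    r, s = sum(cases), sum(controls)
    n = r + s
    theta = [(a + b) / n for a, b in zip(cases, controls)]
    best = 0.0
    for x in SCORES:
        num = sum(xi * (a / r - b / s) for xi, a, b in zip(x, cases, controls))
        var = (sum(xi * xi * t for xi, t in zip(x, theta))
               - sum(xi * t for xi, t in zip(x, theta)) ** 2)
        if var > 0:  # skip degenerate tables
            best = max(best, abs(num) / math.sqrt((1 / r + 1 / s) * var))
    return best


def draw_counts(rng, freqs, size):
    """One multinomial draw of genotype counts (gg, Gg, GG)."""
    sample = rng.choices((0, 1, 2), weights=freqs, k=size)
    return [sample.count(g) for g in (0, 1, 2)]


def simulate_max(rng, p, q, r, s, reps):
    return [max_stat(draw_counts(rng, p, r), draw_counts(rng, q, s))
            for _ in range(reps)]


def max_power_mc(p, q, r, s, alpha=0.05, reps=1000, seed=1):
    """Monte Carlo estimates of the MAX threshold gamma and of its power."""
    rng = random.Random(seed)
    pooled = [(r * pi + s * qi) / (r + s) for pi, qi in zip(p, q)]
    null = sorted(simulate_max(rng, pooled, pooled, r, s, reps))
    gamma = null[math.ceil((1 - alpha) * reps) - 1]  # empirical quantile
    alt = simulate_max(rng, p, q, r, s, reps)
    return gamma, sum(z >= gamma for z in alt) / reps


q = (0.25, 0.5, 0.25)         # controls: HWE with MAF 0.5
w = (q[0], q[1], q[2] * 1.3)  # recessive model, (R1, R2) = (1, 1.3)
p = tuple(wi / sum(w) for wi in w)
gamma, power = max_power_mc(p, q, 1000, 1000)
print(f"gamma ~ {gamma:.2f}, estimated MAX power ~ {power:.2f}")
```

With 1,000 replicates the power estimate carries a Monte Carlo error of a few percentage points, so it is useful as a sanity check on an analytic calculation rather than as a replacement for it.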
4. Power comparison
We assume that the case and control sample sizes are r=s=1,000. We first conduct the comparison under a single-marker disease risk model. We let the minor allele frequency (MAF) f for a particular SNP in the study population be in the range of 5–50%. For MAF=f, we let the genotype frequencies of (gg, Gg, GG) in the control population be (q0,q1,q2)=((1−f)^2, 2f(1−f), f^2). This is reasonable for the study of a rare disease in a source population where Hardy-Weinberg equilibrium holds. Let the odds ratios (ORs) for having 1 copy and 2 copies of the high-risk allele be R1 and R2, respectively. We have R2=R1^2 for an additive model (on the logit scale), R2=R1>1 for a dominant model, and R2>R1=1 for a recessive model. Given (R1,R2), the genotype frequencies of (gg,Gg,GG) in the case population, (p0,p1,p2), are (q0,q1R1,q2R2)/(q0+q1R1+q2R2).
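The single-marker setup translates directly into code. The sketch below builds the control frequencies from the MAF under Hardy-Weinberg equilibrium and the case frequencies from the odds ratios; the function names and the example value R=1.3 are illustrative choices rather than part of the original analysis.

```python
def control_freqs(f):
    """Control genotype frequencies (gg, Gg, GG) under HWE; G has frequency f."""
    return ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)


def case_freqs(q, R1, R2):
    """Case genotype frequencies: controls reweighted by the odds ratios."""
    weights = (q[0], q[1] * R1, q[2] * R2)
    total = sum(weights)
    return tuple(w / total for w in weights)


def odds_ratios(model, R):
    """(R1, R2) for the three standard models with per-model effect R > 1."""
    if model == "additive":   # additive on the logit scale
        return R, R * R
    if model == "dominant":
        return R, R
    if model == "recessive":
        return 1.0, R
    raise ValueError(f"unknown model: {model}")


for model in ("additive", "dominant", "recessive"):
    R1, R2 = odds_ratios(model, 1.3)
    q = control_freqs(0.3)
    p = case_freqs(q, R1, R2)
    print(model, [round(v, 4) for v in p])
```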
In addition to the single-marker disease risk model, we compare the power of the three single-marker tests under the following 2-marker haplotype risk model. Suppose the disease risk is conferred by haplotypes consisting of two linked markers, with marker #1 having allele types B and b, and with marker #2 having allele types C and c. We designate the haplotype BC as the high-risk variant (corresponding to the high-risk allele in the single-marker risk model). As with the single-marker risk model, we can define the haplotype risk model as dominant, recessive, or additive. For example, if R1 and R2 denote the ORs for having one copy and two copies of the high-risk haplotype, respectively, we have R2=R1^2 for the additive haplotype risk model. To simplify the power comparison setup, we let p1 be the BC haplotype frequency in the study population and assume the other three 2-marker haplotypes have the same haplotype frequency. We further assume the independence of the two haplotypes within a subject in the study population. In the Appendix, we provide the formula for calculation of (p0,p1,p2) and (q0,q1,q2), the genotype frequencies of (bb, Bb, BB) in the case and the control populations, respectively.
Fig. 1 shows the power curves of the above-considered association tests under each of three commonly assumed single-marker risk models (additive, dominant, and recessive) at a significance level of 0.05. From Fig. 1, we can see that MAX is always more powerful than Chi-2df, and in some cases it is associated with up to a 5% power increase. Comparing CATTA with MAX, we see that CATTA is slightly more powerful than MAX under the additive model, but in most cases the advantage is negligible. Under the recessive model, MAX (as well as the Chi-2df) is noticeably superior to CATTA. Under the dominant model, it is interesting to see that neither CATTA nor MAX dominates the other. CATTA is more favorable when the risk allele is relatively rare, while MAX becomes more attractive as the risk allele frequency gets larger.
Besides the three commonly used disease models, we also compared the power under a single-marker risk model with all possible combinations of two odds ratios R1 and R2 , with each ranging from 1 to 1.5. Fig. 2 summarizes the power comparison results. Clearly, there is no test that can outperform the others in all of the single-marker risk models considered. When the risk allele is relatively rare (say, MAF less than 0.2), all three tests have comparable power under various single-marker models, although CATTA outperforms the others in most of the R1,R2 region. As the risk allele gets more common, CATTA becomes less powerful than both MAX and Chi-2df under the single-marker risk model when R1>R2, although whether this kind of disease risk model is reasonable is debatable. MAX and Chi-2df have similar performances under all the considered choices of risk models and MAFs, with MAX performing more favorably under the risk model when R1<R2, and less favorably when R1>R2.
Power comparison results under the 2-marker haplotype risk models are given in Fig. 3. Similar to what we observed in Fig. 1, MAX appears to have the most robust performance. Although MAX is slightly less powerful than CATTA under the additive haplotype risk model, it has a noticeable power advantage over CATTA (more than 10% higher) under the dominant haplotype risk model. Also, from Fig. 3, we notice that MAX is consistently better than the Chi-2df, although the percentage increase in power is limited.
5. Discussion
Choosing the right single-marker analysis is a critical step for the success of CCGAS. Because of the uncertainty about the true underlying disease risk model, robust tests that perform well under a wide range of disease models are preferred over those that are too sensitive to the model assumptions. MAX was designed specifically to achieve such robustness. Compared with the commonly used CATTA and Chi-2df, the power of MAX is less well understood, even though its type I error rate has been thoroughly investigated by Li et al. (2008). In this work, we derived the power calculation formula for MAX. Based on this formula, as well as the ones already existing for CATTA and Chi-2df, we conducted a comprehensive power comparison among the three tests. Not surprisingly, we found that each test has its own “sweet” spots. But MAX appears to have the most robust performance when the underlying genetic model is recessive, additive, or dominant. Under various overdominant models, the Chi-2df and MAX have very similar performance, with the power of the Chi-2df slightly higher than that of MAX.
In order to assess the statistical significance of MAX, Sladek et al. (2007) used a permutation procedure to estimate the p-value of MAX for each SNP. To ensure a reliable estimate for any p-value falling below the level of 10^−6, we would need to carry out more than 10^7 permutation steps, which would be time-consuming and computationally challenging for a large-scale CCGAS. Alternatively, multiple integration (Conneely & Boehnke, 2007) and an efficient approximation method (Li et al. 2008a) have been proposed to evaluate the statistical significance of MAX. For the integration procedure (Conneely & Boehnke, 2007), one can use the R package “mvtnorm”, which can be freely downloaded from the website http://cran.r-project.org/. The efficient approximation approach (Li et al. 2008a), which is based on a one-dimensional integral, is user-friendly and can be implemented in many languages and software packages, such as C, C++, R, Matlab, and SAS.
Since the MAX test did not perform as well as the Chi-2df test under the overdominant model (R1>R2), we also considered its extension, called MAX4, which is the maximum of four trend tests under four genetic models (recessive, additive, dominant, and overdominant) with scores (0,0,1), (0,0.5,1), (0,1,1), and (0,1,0), respectively. We conducted some simulation studies to compare the asymptotic power of MAX4 with that of CATTA, Chi-2df, and MAX. Table 3 shows the results. It can be seen from Table 3 that MAX4 has the best performance among the four tests under the overdominant model at most of the allele frequencies considered, but it is slightly less powerful than MAX under most of the settings for the other three models. The choice between MAX and MAX4 depends on the likelihood of the overdominant model in real applications.
Table 3. Power of CATTA, Chi-2df, MAX, and MAX4 under four single-marker risk models, by minor allele frequency (MAF).
Model (R1,R2) / Test | MAF 0.1 | MAF 0.2 | MAF 0.3 | MAF 0.4 | MAF 0.5 |
---|---|---|---|---|---|
Recessive model, (R1,R2)=(1,1.3) | | | | | |
CATTA | 0.059 | 0.114 | 0.237 | 0.411 | 0.584 |
Chi-2df | 0.080 | 0.177 | 0.334 | 0.506 | 0.645 |
MAX | 0.080 | 0.18 | 0.345 | 0.526 | 0.667 |
MAX4 | 0.078 | 0.17 | 0.325 | 0.501 | 0.648 |
Additive model, (R1,R2)=(1.2,1.44) | | | | | |
CATTA | 0.433 | 0.658 | 0.766 | 0.812 | 0.820 |
Chi-2df | 0.341 | 0.552 | 0.669 | 0.722 | 0.732 |
MAX | 0.368 | 0.595 | 0.714 | 0.766 | 0.776 |
MAX4 | 0.362 | 0.574 | 0.684 | 0.733 | 0.742 |
Dominant model, (R1,R2)=(1.3,1.3) | | | | | |
CATTA | 0.643 | 0.767 | 0.750 | 0.656 | 0.503 |
Chi-2df | 0.562 | 0.727 | 0.746 | 0.693 | 0.585 |
MAX | 0.582 | 0.749 | 0.767 | 0.714 | 0.604 |
MAX4 | 0.591 | 0.752 | 0.763 | 0.702 | 0.584 |
Overdominant model, (R1,R2)=(1.3,1.1) | | | | | |
CATTA | 0.584 | 0.627 | 0.491 | 0.277 | 0.106 |
Chi-2df | 0.544 | 0.695 | 0.710 | 0.668 | 0.600 |
MAX | 0.542 | 0.669 | 0.650 | 0.560 | 0.443 |
MAX4 | 0.566 | 0.713 | 0.721 | 0.670 | 0.588 |
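The difference between MAX and MAX4 is easy to see on a table in which heterozygotes are enriched among cases. The sketch below computes the four trend statistics and both maxima. The counts are hypothetical and chosen so that the additive statistic is essentially zero; obtaining p-values for MAX4 would need its own null distribution, which is not covered here.

```python
import math

SCORES4 = {
    "recessive": (0, 0, 1),
    "additive": (0, 0.5, 1),
    "dominant": (0, 1, 1),
    "overdominant": (0, 1, 0),
}


def catt(cases, controls, x):
    """CATT statistic for genotype counts ordered (gg, Gg, GG)."""
    r, s = sum(cases), sum(controls)
    n = r + s
    theta = [(a + b) / n for a, b in zip(cases, controls)]
    num = sum(xi * (a / r - b / s) for xi, a, b in zip(x, cases, controls))
    var = (sum(xi * xi * t for xi, t in zip(x, theta))
           - sum(xi * t for xi, t in zip(x, theta)) ** 2)
    return num / math.sqrt((1 / r + 1 / s) * var)


def max_stats(cases, controls):
    """Return the four trend statistics plus MAX (three scores) and MAX4."""
    z = {m: catt(cases, controls, x) for m, x in SCORES4.items()}
    max3 = max(abs(z[m]) for m in ("recessive", "additive", "dominant"))
    max4 = max(abs(v) for v in z.values())
    return z, max3, max4


cases = (300, 500, 200)     # hypothetical: heterozygotes enriched in cases
controls = (350, 400, 250)  # hypothetical
z, max3, max4 = max_stats(cases, controls)
print({m: round(v, 2) for m, v in z.items()})
print(f"MAX = {max3:.2f}, MAX4 = {max4:.2f}")
```

In this constructed example the additive trend test sees no signal at all, MAX picks up a moderate signal through the recessive score, and MAX4 picks up the strongest signal through the overdominant score.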
When there are covariates to be adjusted for, the corresponding MAX test can be derived from the logistic regression model. Li et al. (2008a) suggested a procedure for evaluating the p-value of the covariate-adjusted MAX test. Although the power comparison was conducted without any adjustment for covariates, we expect similar conclusions will still hold when covariate effects need to be adjusted for.
ACKNOWLEDGEMENTS
We would like to thank the editor and two anonymous reviewers for their insightful comments, which improved our presentation. We also thank B.J. Stone for her valuable help. K Yu, X Liang, and Q Li are supported by the Intramural Program of the National Institutes of Health. Q Li is supported in part by the Knowledge Innovation Program of the Chinese Academy of Sciences, Nos. 30465W0 and 30475V0.
APPENDIX
APPENDIX A: The Covariance between Zx and Zy
Theorem
Let x=(x0,x1,x2)' and y=(y0,y1,y2)' be any two score vectors. The asymptotic covariance between Zx and Zy can be written as
where ,
Proof
Let r⃗=(r0,r1,r2)' and s⃗=(s0,s1,s2)'. Then we have
APPENDIX B: Two-marker Joint Genotype Frequencies under the Haplotype Risk Model
Suppose the disease risk is conferred by haplotypes consisting of two linked markers, with marker #1 having allele types B and b and marker #2 having allele types C and c. We designate the haplotype BC as the high-risk variant. Denote the haplotype frequencies for BC, Bc, bC, and bc as p1, p2, p3, and p4, respectively. Let R1 and R2 denote the ORs for having one copy and two copies of the high-risk haplotype, respectively. We assume that HWE holds in the control group. Table 2 gives the joint genotype frequencies. From the table, we can see that the frequencies of BB, Bb, and bb in the control group at marker #1 are (p1+p2)^2, 2(p1+p2)(p3+p4), and (p3+p4)^2, respectively; the frequencies of BB, Bb, and bb in the case group at marker #1 are (R2 p1^2+2R1 p1p2+p2^2)/ξ, 2(R1 p1+p2)(p3+p4)/ξ, and (p3+p4)^2/ξ, respectively, where ξ=R2 p1^2+2R1 p1(1−p1)+(1−p1)^2.
Table 2.
Genotype pair | Frequency |
---|---|
BBCC | p1^2 |
BBCc | 2p1p2 |
BBcc | p2^2 |
BbCC | 2p1p3 |
BbCc | 2(p1p4+p2p3) |
Bbcc | 2p2p4 |
bbCC | p3^2 |
bbCc | 2p3p4 |
bbcc | p4^2 |
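The marker #1 genotype frequencies implied by the haplotype risk model can be computed as in the sketch below. It assumes, as in the single-marker setup, that each joint genotype in cases is the control (HWE) frequency reweighted by the odds ratio for its number of BC copies and then normalized; the function name and the example parameter values are hypothetical.

```python
def marker1_freqs(p1, p2, p3, p4, R1, R2):
    """Genotype frequencies (bb, Bb, BB) at marker #1 in cases and controls.

    p1..p4 are the frequencies of haplotypes BC, Bc, bC, bc; BC is high-risk.
    """
    pB, pb = p1 + p2, p3 + p4
    controls = (pb ** 2, 2 * pB * pb, pB ** 2)
    # Normalizing constant: sum of OR-weighted joint genotype frequencies.
    xi = R2 * p1 ** 2 + 2 * R1 * p1 * (1 - p1) + (1 - p1) ** 2
    cases = (
        pb ** 2 / xi,                                     # bb: no BC copy
        2 * pb * (R1 * p1 + p2) / xi,                     # Bb: at most one BC
        (R2 * p1 ** 2 + 2 * R1 * p1 * p2 + p2 ** 2) / xi,  # BB: 0, 1, or 2 BC
    )
    return cases, controls


# BC has frequency 0.2; the other three haplotypes share the remainder equally.
p1 = 0.2
p2 = p3 = p4 = (1 - p1) / 3
R1, R2 = 1.2, 1.44  # additive haplotype risk model (R2 = R1^2)
cases, controls = marker1_freqs(p1, p2, p3, p4, R1, R2)
print("cases   :", [round(v, 4) for v in cases])
print("controls:", [round(v, 4) for v in controls])
```

Because only the BC haplotype carries risk, the B allele at marker #1 is an imperfect proxy for it, and the resulting single-marker case frequencies need not follow the same dominant, recessive, or additive pattern as the haplotype model.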
REFERENCES
- Conneely KN, Boehnke M. So many correlated tests, so little time! Rapid adjustment of p-values for multiple correlated tests. Am J Hum Genet. 2007;81:1158–1168. doi: 10.1086/522036.
- Edwards BJ, Haynes C, Levenstien MA, Finch SJ, Gordon D. Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies. BMC Genet. 2005;6:18. doi: 10.1186/1471-2156-6-18.
- Freidlin B, Zheng G, Li Z, Gastwirth JL. Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum Hered. 2002;53:146–152. doi: 10.1159/000064976.
- Guedj M, Della-Chiesa E, Picard F, Nuel G. Computing power in case-control association studies through the use of quadratic approximations: application to meta-statistics. Ann Hum Genet. 2006;71:262–270. doi: 10.1111/j.1469-1809.2006.00316.x.
- Hunter DJ, Kraft P, Jacobs KB, Cox DG, Yeager M, Hankinson SE, Wacholder S, Wang Z, Welch R, Hutchinson A, et al. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet. 2007;39:870–874. doi: 10.1038/ng2075.
- Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308:385–389. doi: 10.1126/science.1109557.
- Kuo CL, Feingold E. What’s the best statistic for a simple test of genetic association in a case-control study? Joint Statistical Meetings, Biometrics Section; August 2–7; 2008.
- Li Q, Zheng G, Li Z, Yu K. Efficient approximation of p-value of the maximum of correlated tests, with applications to genome-wide association studies. Ann Hum Genet. 2008a;72:397–406. doi: 10.1111/j.1469-1809.2008.00437.x.
- Li Q, Yu K, Li Z, Zheng G. MAX-rank: a simple and robust genome-wide scan for case-control association studies. Hum Genet. 2008b;123:617–623. doi: 10.1007/s00439-008-0514-8.
- Pfeiffer RM, Gail MH. Sample size calculations for population- and family-based case-control association studies on marker genotypes. Genet Epidemiol. 2003;25:136–148. doi: 10.1002/gepi.10245.
- Sladek R, Rocheleau G, Rung J, Dina C, Shen L, Serre D, Boutin P, Vincent D, Belisle A, Hadjadj S, et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature. 2007;445:881–885. doi: 10.1038/nature05616.
- The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
- Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, Minichiello MJ, Fearnhead P, Yu K, Chatterjee N, et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet. 2007;39:645–649. doi: 10.1038/ng2022.