Summary
Sasieni (1997, Biometrics) stated that, when Hardy-Weinberg equilibrium (HWE) holds in the combined case-control samples, the allelic test is asymptotically equivalent to the trend test (for the additive model) for testing genetic association, and hence the allelic test should not be used. Guedj et al. (2008, Ann. Hum. Genet.) show that the allelic test and the trend test are asymptotically equivalent when HWE holds in the population. It is known that, when HWE does not hold, the trend test can still be used while the allelic test is no longer valid. Therefore, the allelic test is either not valid or is asymptotically equivalent to the trend test. It appears that the allelic test is a nuisance test. Can it be retired from the analysis of case-control association studies? It all depends on data and model assumptions. We give conditions under which the allelic test and the trend test are asymptotically equivalent under both null and alternative hypotheses.
Keywords: Allele-based test, trend test, disequilibrium coefficient, genetic model, robust tests
Introduction
To test genetic association using the case-control study design, the data for a single bi-allelic marker can be presented in a 2 × 3 table when genotypes are counted or in a 2 × 2 table when alleles are counted. The trend test (ZT) and the allele-based test (ZA) are usually used to test for association in the 2 × 3 and 2 × 2 tables, respectively. The trend test compares the genotype distributions of cases and controls using scores appropriate for the additive model, whereas the allelic test compares the allele frequencies of cases and controls.
Sasieni (1997) compared ZT and ZA and gave a condition under which the two tests are asymptotically equivalent. The condition can be interpreted as Hardy-Weinberg equilibrium (HWE) in the combined case-control samples. Without this condition, ZA is not a valid test, and its asymptotic null distribution may not be a Chi-square distribution with 1 degree of freedom. The trend test ZT, however, can be used regardless of HWE. Schaid & Jacobsen (1999) further showed that ZA is biased under the null hypothesis of no association when HWE does not hold. Recently, Guedj et al. (2008) proved the asymptotic equivalence of ZT and ZA when HWE holds in the population, and their arguments were simplified by Knapp (2008). Based on these results, when HWE holds in the population either ZT or ZA can be used, because they are asymptotically equivalent, while only ZT can be used when HWE does not hold (see also Li et al., 2008a, for a correction of the allelic test in the presence of allelic correlation). The allelic test thus appears to be a nuisance test. Does this mean that ZA can be retired from the analysis of case-control association studies? We study this question here and find that the answer depends on the assumptions made. In particular, we give conditions under which the two tests are asymptotically equivalent.
The Algebraic Relationship between the Allelic and Trend Tests
Using the notation of Guedj et al. (2008), denote the genotype counts for three genotypes (G0, G1, G2) = (aa, Aa, AA) as (D0, D1, D2) in cases and (H0, H1, H2) in controls. Let ND = D0 + D1 + D2, NH = H0 + H1 + H2, N = ND + NH, and Ni = Di + Hi for i = 0, 1, 2. Hence, a 2 × 3 table is formed by counting genotypes. On the other hand, by counting alleles, the numbers of alleles a and A are 2D0 + D1 and 2D2 + D1 in cases, and 2H0 + H1 and 2H2 + H1 in controls, respectively. Therefore, a 2 × 2 table is formed.
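For concreteness, the two tabulations just described can be sketched in a few lines of Python. The function names and the genotype counts below are hypothetical and serve only as an illustration.

```python
"""Build the 2 x 3 genotype table and the 2 x 2 allele table for one marker.

Genotype order follows the text: (G0, G1, G2) = (aa, Aa, AA).
"""


def genotype_table(cases, controls):
    """Return the 2 x 3 table [[D0, D1, D2], [H0, H1, H2]]."""
    if len(cases) != 3 or len(controls) != 3:
        raise ValueError("expected three genotype counts per group")
    return [list(cases), list(controls)]


def allele_table(cases, controls):
    """Return the 2 x 2 table [[#a, #A] in cases, [#a, #A] in controls].

    Each aa subject contributes two a alleles, each Aa subject one a and
    one A, and each AA subject two A alleles.
    """
    d0, d1, d2 = cases
    h0, h1, h2 = controls
    return [[2 * d0 + d1, 2 * d2 + d1],
            [2 * h0 + h1, 2 * h2 + h1]]


# Hypothetical counts, for illustration only.
cases = (300, 500, 200)     # (D0, D1, D2), N_D = 1000
controls = (450, 450, 100)  # (H0, H1, H2), N_H = 1000
print(genotype_table(cases, controls))
print(allele_table(cases, controls))
```

The allele table has 2ND and 2NH entries in its two rows, which is why the allelic test effectively "doubles the sample size" relative to the genotype table.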
Denote ND/N → ψ ∈ (0, 1) as N → ∞, k = Pr(disease), gi = Pr(Gi), and fi = Pr(disease|Gi) for i = 0, 1, 2. The genotype counts (D0, D1, D2) and (H0, H1, H2) follow the multinomial distributions mul(ND; P0, P1, P2) and mul(NH; Q0, Q1, Q2), respectively, where $P_i = \Pr(G_i \mid \text{case}) = g_i f_i / k$ and $Q_i = \Pr(G_i \mid \text{control}) = g_i (1 - f_i)/(1 - k)$. Thus,
$$k P_i + (1 - k) Q_i = g_i, \qquad i = 0, 1, 2. \tag{1}$$
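Identity (1) follows from $g_i f_i + g_i(1 - f_i) = g_i$. A minimal Python sketch can check it numerically; the allele frequency and penetrances are illustrative assumptions, and HWE is used only to generate the gi, since (1) holds for any genotype frequencies.

```python
"""Genotype distributions in cases and controls from population quantities."""


def case_control_genotype_probs(g, f):
    """Return (k, P, Q) with P_i = g_i f_i / k and Q_i = g_i (1 - f_i) / (1 - k)."""
    k = sum(gi * fi for gi, fi in zip(g, f))  # prevalence Pr(disease)
    P = [gi * fi / k for gi, fi in zip(g, f)]
    Q = [gi * (1 - fi) / (1 - k) for gi, fi in zip(g, f)]
    return k, P, Q


p = 0.3                                      # Pr(A), illustrative
g = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # HWE genotype frequencies (aa, Aa, AA)
f = [0.05, 0.08, 0.20]                       # hypothetical penetrances f_0, f_1, f_2
k, P, Q = case_control_genotype_probs(g, f)

# Equation (1): k P_i + (1 - k) Q_i = g_i for every genotype.
max_error = max(abs(k * Pi + (1 - k) * Qi - gi) for Pi, Qi, gi in zip(P, Q, g))
print(k, P, Q, max_error)
```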
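The allelic test (ZA) and the trend test (ZT) can be written as

$$Z_A = \frac{2 N_D N_H (\hat p_{DA} - \hat p_{HA})^2}{N \hat p (1 - \hat p)}, \qquad Z_T = \frac{2 N_D N_H (\hat p_{DA} - \hat p_{HA})^2}{N \{\hat p (1 - \hat p) + \hat P_2 - \hat p^2\}},$$

where p̂DA = (2D2 + D1)/(2ND), p̂HA = (2H2 + H1)/(2NH), p̂ = (2N2 + N1)/(2N), and P̂2 = N2/N.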
Sasieni (1997) derived an exact algebraic relationship between ZA and ZT, from which ZA ≡ ZT provided $N_1^2 = 4 N_0 N_2$, a condition that was interpreted as HWE in the combined samples. In reality, even when HWE holds in the combined samples, $N_1^2 = 4 N_0 N_2$ holds only asymptotically. Thus, under HWE in the combined samples, ZA and ZT are asymptotically equivalent whether or not the candidate marker is associated with the disease. Here the asymptotic equivalence of ZA and ZT (under either the null or the alternative hypothesis) is defined as ZA/ZT → 1 in probability as N → ∞. The assumption of HWE in the population is also often used; how the above condition is related to HWE in the population is discussed later. To study HWE in the population, Guedj et al. (2008) obtained a different expression,

$$\frac{Z_A}{Z_T} = 1 + \frac{\hat P_2 - \hat p^2}{\hat p (1 - \hat p)},$$

where P̂2 = N2/N. Under the null hypothesis, and when HWE holds in the population, P̂2 − p̂² → 0 in probability as N → ∞, as shown by Guedj et al. (2008) and Knapp (2008). Hence, since p̂(1 − p̂) converges in probability to a positive constant, ZA and ZT are asymptotically equivalent under the null hypothesis when HWE holds in the population (each has an asymptotic Chi-square distribution with 1 degree of freedom). It should be pointed out that P̂2 − p̂² → 0 holds only under the null hypothesis of no association. Therefore, this asymptotic equivalence establishes only the validity of the allelic test ZA as a test for association, not its power relative to ZT.
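The statistics and the ratio identity can be checked numerically. This Python sketch assumes the 1-df Chi-square forms of ZA and ZT that are consistent with the ratio of Guedj et al. (2008), uses hypothetical counts, and also uses the identity P̂2 − p̂² = (4N0N2 − N1²)/(4N²), which follows directly from the definitions of P̂2 and p̂. The helper names are illustrative.

```python
import math


def allelic_and_trend(cases, controls):
    """Return (Z_A, Z_T), both 1-df chi-square statistics, from genotype counts.

    cases = (D0, D1, D2), controls = (H0, H1, H2), genotypes ordered (aa, Aa, AA).
    """
    d0, d1, d2 = cases
    h0, h1, h2 = controls
    nd, nh = d0 + d1 + d2, h0 + h1 + h2
    n = nd + nh
    n1, n2 = d1 + h1, d2 + h2
    p_da = (2 * d2 + d1) / (2 * nd)  # frequency of A in cases
    p_ha = (2 * h2 + h1) / (2 * nh)  # frequency of A in controls
    p = (2 * n2 + n1) / (2 * n)      # pooled frequency of A
    big_p2 = n2 / n                  # pooled frequency of AA
    num = 2 * nd * nh * (p_da - p_ha) ** 2 / n
    z_a = num / (p * (1 - p))
    z_t = num / (p * (1 - p) + big_p2 - p ** 2)
    return z_a, z_t


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


cases, controls = (300, 500, 200), (450, 450, 100)  # hypothetical counts
z_a, z_t = allelic_and_trend(cases, controls)
print(f"Z_A = {z_a:.3f} (p = {chi2_1df_pvalue(z_a):.2e})")
print(f"Z_T = {z_t:.3f} (p = {chi2_1df_pvalue(z_t):.2e})")

# Ratio identity of Guedj et al. (2008), written with pooled totals N0, N1, N2:
# P2_hat - p_hat^2 = (4 N0 N2 - N1^2) / (4 N^2).
n0, n1, n2 = (d + h for d, h in zip(cases, controls))
n = n0 + n1 + n2
p_hat = (2 * n2 + n1) / (2 * n)
ratio_formula = 1 + (4 * n0 * n2 - n1 ** 2) / (4 * n ** 2) / (p_hat * (1 - p_hat))
print(z_a / z_t, ratio_formula)
```

In this form the two statistics share a numerator and differ only in the variance estimate: ZA uses the binomial variance p̂(1 − p̂), which presumes HWE, while ZT uses the empirical variance of the genotype score, which is why ZT remains valid without HWE.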
The Allelic and Trend Tests under the Alternative Hypothesis
Note that ZA and ZT are asymptotically equivalent when HWE holds in the combined samples, under both the null and alternative hypotheses (Sasieni, 1997). Here, however, we follow the approach of Guedj et al. (2008) and examine the limit of P̂2 − p̂² under the alternative hypothesis. Note that P̂2 = (D2 + H2)/N = (D2/ND)(ND/N) + (H2/NH)(NH/N) → ψP2 + (1 − ψ)Q2 in probability as N → ∞, and p̂ = (2N2 + N1)/(2N) → ψ(P2 + P1/2) + (1 − ψ)(Q2 + Q1/2) in probability as N → ∞. Denote these limits by P2* = ψP2 + (1 − ψ)Q2 and p* = ψ(P2 + P1/2) + (1 − ψ)(Q2 + Q1/2). (Under H0, Pi = Qi for every i, so P2* = Pr(AA) and p* = Pr(A); thus P̂2 and p̂ are unbiased estimates of Pr(AA) and Pr(A) under H0.) Under the alternative hypothesis H1, P̂2 − p̂² converges in probability to P2* − (p*)², which is generally nonzero even when HWE holds in the population, except in the specific situation described below. Therefore, the two tests may have different power.
We will identify a new condition under which ZA and ZT are asymptotically equivalent under H1 assuming HWE in the population. Then we compare our condition with that of Sasieni (1997). Define disequilibrium coefficients in cases and controls as ΔD = P2 − (P2 + P1/2)² and ΔH = Q2 − (Q2 + Q1/2)², respectively. When HWE holds in the case (or control) population, ΔD = 0 (or ΔH = 0). Note that under H1, ΔD and ΔH cannot both equal zero when HWE holds in the population (Song & Elston, 2006; Zheng & Ng, 2008). Then, under H1,
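$$\hat P_2 - \hat p^2 \to \psi \Delta_D + (1 - \psi) \Delta_H + \psi (1 - \psi) \{(P_2 + P_1/2) - (Q_2 + Q_1/2)\}^2 \quad \text{in probability}, \tag{2}$$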
where (P2 + P1/2) − (Q2 + Q1/2) = Pr(A|case) − Pr(A|control) ≠ 0 under H1. The sign of ZA − ZT is determined by that of P̂2 − p̂², and hence, asymptotically, by the sign of the right hand side (RHS) of (2). Equation (2) therefore gives a condition for the asymptotic equivalence of ZA and ZT under H1: when the RHS of (2) is zero, ZA and ZT are asymptotically equivalent under H1, even without HWE in the population. A simple sufficient condition is that HWE holds in the population and ψ = k, where the latter condition means that the proportion of cases (ND/N) in the case-control samples is an unbiased estimate of the disease prevalence k. Indeed, when ψ = k, applying (1) to the RHS of (2) shows that it equals g2 − (g2 + g1/2)², which is zero when HWE holds in the population.
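The decomposition (2) and the special case ψ = k can be checked numerically. The Python sketch below computes the probability limit of P̂2 − p̂² both directly, as P2* − (p*)², and through the RHS of (2). The population values (Pr(A) = 0.3 under HWE, penetrances f = (0.05, 0.08, 0.20)) are illustrative assumptions.

```python
"""Check the decomposition (2) of the limit of P2_hat - p_hat^2 under H1."""


def case_control_probs(g, f):
    """Return (k, P, Q): prevalence and genotype distributions in cases/controls."""
    k = sum(gi * fi for gi, fi in zip(g, f))
    P = [gi * fi / k for gi, fi in zip(g, f)]
    Q = [gi * (1 - fi) / (1 - k) for gi, fi in zip(g, f)]
    return k, P, Q


def limit_two_ways(P, Q, psi):
    """Probability limit of P2_hat - p_hat^2, computed directly and via (2)."""
    a = P[2] + P[1] / 2  # Pr(A | case)
    b = Q[2] + Q[1] / 2  # Pr(A | control)
    delta_d = P[2] - a ** 2
    delta_h = Q[2] - b ** 2
    direct = psi * P[2] + (1 - psi) * Q[2] - (psi * a + (1 - psi) * b) ** 2
    via_2 = psi * delta_d + (1 - psi) * delta_h + psi * (1 - psi) * (a - b) ** 2
    return direct, via_2


p = 0.3
g = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # HWE in the population
k, P, Q = case_control_probs(g, [0.05, 0.08, 0.20])

print(limit_two_ways(P, Q, psi=0.5))  # the two forms agree; nonzero here
print(limit_two_ways(P, Q, psi=k))    # both about 0: HWE in population and psi = k
```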
Under H1, we can link the above results to the genetic model. When HWE holds in the population, Wittke-Thompson et al. (2005) and Zheng & Ng (2008) proved that: (i) ΔD > 0 and ΔH < 0 under the recessive model (f1 = f0), (ii) ΔD < 0 and ΔH < 0 under the additive model (2f1 = f0 + f2), (iii) ΔD = 0 and ΔH < 0 under the multiplicative model (f1² = f0f2), and (iv) ΔD < 0 and ΔH > 0 under the dominant model (f1 = f2). Define genotype relative risks (GRRs) λi = fi/f0 for i = 1, 2. From the signs of ΔD and ΔH and the RHS of (2), we see that under the recessive model the RHS of (2) is positive (i.e., asymptotically ZA > ZT) if 1 − ψ is small enough. For the dominant model, on the other hand, the RHS of (2) is positive (asymptotically ZA > ZT) if ψ is small enough. Because both ΔD and ΔH are negative under the additive model, and only the last term of (2) is positive, ZA may be less than ZT for common choices of parameter values.
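The sign pattern (i) to (iv) can be illustrated numerically. This Python sketch assumes HWE in the population, k = 0.10, Pr(A) = 0.3 and λ2 = 2 (the settings used for Table 1), derives λ1 from the model, and solves f0 from the prevalence constraint k = Σ gi fi. The function names are hypothetical.

```python
"""Signs of Delta_D and Delta_H under four genetic models (HWE in population)."""
import math


def model_lambda1(model, lam2):
    """GRR lambda1 implied by the genetic model and lambda2."""
    return {"REC": 1.0, "ADD": (1 + lam2) / 2,
            "MUL": math.sqrt(lam2), "DOM": lam2}[model]


def deltas(model, lam2, k=0.10, p=0.3):
    """Return (Delta_D, Delta_H), with f0 solved from the prevalence k."""
    g = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    lam = [1.0, model_lambda1(model, lam2), lam2]
    f0 = k / sum(gi * li for gi, li in zip(g, lam))  # k = sum_i g_i f_i
    f = [f0 * li for li in lam]
    P = [gi * fi / k for gi, fi in zip(g, f)]
    Q = [gi * (1 - fi) / (1 - k) for gi, fi in zip(g, f)]
    delta_d = P[2] - (P[2] + P[1] / 2) ** 2
    delta_h = Q[2] - (Q[2] + Q[1] / 2) ** 2
    return delta_d, delta_h


for model in ("REC", "ADD", "MUL", "DOM"):
    dd, dh = deltas(model, lam2=2.0)
    print(f"{model}: Delta_D = {dd:+.5f}, Delta_H = {dh:+.5f}")
```

Under the additive model ΔH is very close to zero for these values, which is consistent with the additive entries of Table 1 lying just below 1.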
Assuming HWE in the population (but ψ ≠ k), Table 1 reports the probability limit of ZA/ZT under H1, namely 1 + (RHS of (2))/{p*(1 − p*)}, where p* = ψ(P2 + P1/2) + (1 − ψ)(Q2 + Q1/2) is the probability limit of p̂. If this limit exceeds 1, ZA is asymptotically more powerful than ZT, because the asymptotic power of ZA is then greater than the asymptotic power of ZT. To calculate Table 1, we chose the prevalence k = 0.10, the frequency of allele A p = 0.3, and the proportion of cases ψ = 0.05, 0.30, and 0.50. The alternative hypothesis was given by λ2 = 2 or 4, and λ1 was calculated from the given genetic model and the value of λ2. Four genetic models were considered: recessive (REC), additive (ADD), multiplicative (MUL), and dominant (DOM).
Table 1. Probability limit of ZA/ZT under H1, assuming HWE in the population, with k = 0.10 and p = 0.3. Values above 1 indicate that ZA is asymptotically more powerful than ZT.

| GRR λ2 | ψ | REC | ADD | MUL | DOM |
|---|---|---|---|---|---|
| 2.0 | 0.05 | 0.989 | 0.999 | 0.998 | 1.008 |
| 2.0 | 0.30 | 1.041 | 0.999 | 1.005 | 0.968 |
| 2.0 | 0.50 | 1.079 | 0.994 | 1.007 | 0.934 |
| 4.0 | 0.05 | 0.972 | 0.999 | 0.993 | 1.014 |
| 4.0 | 0.30 | 1.099 | 0.994 | 1.020 | 0.941 |
| 4.0 | 0.50 | 1.181 | 0.977 | 1.028 | 0.876 |
Table 1 shows that ZT is more powerful than ZA under the additive model in every setting considered. For the other three models, the allelic test ZA can be more powerful than the trend test ZT, in particular under the recessive model with moderate to large ψ (0.30 and 0.50). For example, when the numbers of cases and controls are equal (ψ = 0.50) and λ2 = 4 under the recessive model, the limit of ZA/ZT is 1.181. In this setting ZT is less powerful than ZA, even though the two tests are asymptotically equivalent under H0.
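The entries of Table 1 can be recomputed from (1), (2) and the ratio expression of Guedj et al. (2008). The following Python sketch is our reconstruction under the settings stated above, not the authors' code; the function name `limit_ratio` is hypothetical. It should give values close to those in Table 1.

```python
"""Probability limit of Z_A / Z_T under H1 with HWE in the population.

Assumes lim Z_A/Z_T = 1 + (RHS of (2)) / {p*(1 - p*)}, with k = 0.10,
Pr(A) = 0.3, and lambda1 implied by the genetic model.
"""
import math

LAMBDA1 = {
    "REC": lambda lam2: 1.0,
    "ADD": lambda lam2: (1 + lam2) / 2,
    "MUL": lambda lam2: math.sqrt(lam2),
    "DOM": lambda lam2: lam2,
}


def limit_ratio(model, lam2, psi, k=0.10, p=0.3):
    """Return the probability limit of Z_A / Z_T for one cell of Table 1."""
    g = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    lam = [1.0, LAMBDA1[model](lam2), lam2]
    f0 = k / sum(gi * li for gi, li in zip(g, lam))  # prevalence constraint
    f = [f0 * li for li in lam]
    P = [gi * fi / k for gi, fi in zip(g, f)]
    Q = [gi * (1 - fi) / (1 - k) for gi, fi in zip(g, f)]
    a, b = P[2] + P[1] / 2, Q[2] + Q[1] / 2
    rhs = (psi * (P[2] - a ** 2) + (1 - psi) * (Q[2] - b ** 2)
           + psi * (1 - psi) * (a - b) ** 2)          # RHS of (2)
    p_star = psi * a + (1 - psi) * b                  # limit of p_hat
    return 1 + rhs / (p_star * (1 - p_star))


for lam2 in (2.0, 4.0):
    for psi in (0.05, 0.30, 0.50):
        row = [f"{limit_ratio(m, lam2, psi):.3f}" for m in ("REC", "ADD", "MUL", "DOM")]
        print(lam2, psi, *row)
```

Setting `psi=0.10` (that is, ψ = k) returns 1 for every model, matching the condition derived from (2).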
Comparison with the Condition of Sasieni (1997)
To establish the asymptotic equivalence of the allelic and trend tests under both null and alternative hypotheses, Sasieni (1997) provided the condition that HWE must hold in the combined case-control samples. Guedj et al. (2008) and Knapp (2008) demonstrated that the allelic test is valid when HWE holds in the population. In this note we show that, in addition to the condition of HWE in the population, if the proportion of cases in the samples equals the disease prevalence, then the two tests are asymptotically equivalent under the alternative hypothesis. Note that the case-control samples are obtained retrospectively from the case population and control population. From the point of view of a prospective case-control design, the condition of Sasieni (1997) is equivalent to HWE in the population, because the case-control samples can be regarded as random samples from the entire population. Then, the proportion of cases in the samples is indeed an unbiased estimate of the prevalence. Thus, the condition of Sasieni (1997) not only requires HWE in the population but also reflects the sampling of the retrospective study. Our condition, equivalent to that of Sasieni (1997) under the prospective design, is a modification of that of Sasieni (1997) in the retrospective study.
Can the Allelic Test be Retired?
At its 100th anniversary (1908–2008), HWE still plays a very important role in genetic studies and analyses. HWE in the population is a basic assumption required by many statistical procedures (Sasieni, 1997; Guedj et al., 2008; Wittke-Thompson et al., 2005; Song & Elston, 2006; Zheng & Ng, 2008). For case-control data, because of the selective sampling scheme and because the disease prevalence is unknown, whether HWE holds in the population cannot be tested. When the true disease prevalence is known and HWE holds in the population, as we have shown, a case-control study can be designed with the proportion of cases equal to the prevalence. The allelic and trend tests are then asymptotically equivalent under both the null and alternative hypotheses, so there is no need to apply the allelic test.
In practice, however, we do not know whether HWE holds in the population. Zou & Donner (2006) discussed several issues arising from testing HWE using the observed data. Even when HWE holds in the population, our results show that the allelic test can be asymptotically more powerful than the trend test because of the sampling in case-control studies. Thus, for most analyses of case-control association studies, the allelic test cannot be retired. Rather, we propose applying both the allelic and trend tests to test for case-control genetic association. If both tests are significant, or neither is, the conclusion is easy to draw. If only one test is significant, however, it is important to report the results of both tests, because reporting only the significant p-value would distort its interpretation and inflate the false positive rate.
Discussion
Both the allelic test and trend test are useful in practice and they are based on the additive genetic model. For large association studies (such as genome-wide association studies), the genetic models for markers with true associations are usually unknown. Using allelic or trend tests based on the additive model is not robust. In this case, Guedj et al. (2008) suggested unbiased and exact tests (Guedj et al., 2006). In addition, Pearson’s Chi-square test with two degrees of freedom is also often used. On the other hand, robust tests are also studied and have been implemented, e.g., the constrained likelihood ratio test (Wang and Sheffield, 2005) and maximum tests (Freidlin et al., 2002; Gonzalez et al., 2008; Li et al., 2008b). In particular, Zheng et al. (2006) showed by simulation studies that the maximum of three trend tests (optimal for the recessive, additive and dominant models), called MAX or MAX3, is always more powerful than Pearson’s Chi-square test when the genetic models are restricted to the above three genetic models. Hence, for large genetic association studies using case-control data, MAX or other robust tests are more efficient and preferable (Li et al., 2008c).
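To illustrate the MAX3 statistic mentioned above, the following Python sketch computes three Cochran-Armitage trend statistics with scores (0, 0, 1), (0, 1/2, 1) and (0, 1, 1), the usual choices for the recessive, additive and dominant models, and takes the maximum absolute value. It computes only the statistic: MAX3 does not have a 1-df Chi-square null distribution, and its p-value requires a method that accounts for the correlation among the three tests (for example, the approximation of Li et al. (2008b) or permutation), which is not implemented here. The counts and function names are hypothetical.

```python
import math


def trend_z(cases, controls, scores):
    """Signed Cochran-Armitage trend statistic for genotype scores (x0, x1, x2)."""
    nd, nh = sum(cases), sum(controls)
    n = nd + nh
    totals = [d + h for d, h in zip(cases, controls)]
    sx_d = sum(x * d for x, d in zip(scores, cases))
    sx = sum(x * t for x, t in zip(scores, totals))
    sxx = sum(x * x * t for x, t in zip(scores, totals))
    numerator = math.sqrt(n) * (n * sx_d - nd * sx)
    denominator = math.sqrt(nd * nh * (n * sxx - sx ** 2))
    return numerator / denominator


# Scores commonly used for the recessive, additive and dominant models.
SCORES = {"REC": (0, 0, 1), "ADD": (0, 0.5, 1), "DOM": (0, 1, 1)}


def max3(cases, controls):
    """Return MAX3 = max |Z| over the three trend tests, plus each signed Z."""
    zs = {name: trend_z(cases, controls, x) for name, x in SCORES.items()}
    return max(abs(z) for z in zs.values()), zs


cases, controls = (300, 500, 200), (450, 450, 100)  # hypothetical counts
m, zs = max3(cases, controls)
print(m, zs)
```

With the additive scores, the squared statistic equals the trend test ZT.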
Acknowledgments
The author would like to thank Xiaofeng Zhu, Jungnam Joo, Myron Waclawiw and Nancy Geller, and two reviewers for their helpful comments.
References
- Freidlin B, Zheng G, Li Z, Gastwirth JL. Trend tests for case-control studies of genetic markers: power, sample size and robustness. Human Heredity. 2002;53:146–152. doi: 10.1159/000064976.
- Gonzalez JR, Carrasco JL, Dudbridge F, Armengol L, Estivill X, Moreno V. Maximizing association statistics over genetic models. Genetic Epidemiology. 2008;32:246–254. doi: 10.1002/gepi.20299.
- Guedj M, Wojcik J, Della-Chiesa E, Nuel G, Forner K. A fast, unbiased and exact allelic test for case-control association studies. Human Heredity. 2006;61:210–221. doi: 10.1159/000094776.
- Guedj M, Nuel G, Prum B. A note on allelic tests in case-control association studies. Annals of Human Genetics. 2008;72:407–409. doi: 10.1111/j.1469-1809.2008.00438.x.
- Knapp M. On the asymptotic equivalence of allelic and trend statistic under Hardy-Weinberg equilibrium. Annals of Human Genetics. 2008. doi: 10.1111/j.1469-1809.2008.00453.x. In press.
- Li Z, Zhang H, Zheng G, Gastwirth JL, Gail MH. Excess false positive rate caused by population stratification and disease rate heterogeneity in case-control association studies. Computational Statistics and Data Analysis. 2008a. doi: 10.1016/j.csda.2008.02.021. In press.
- Li Q, Zheng G, Li Z, Yu K. Efficient approximation of P-value of the maximum of correlated tests, with applications to genome-wide association studies. Annals of Human Genetics. 2008b;72:397–406. doi: 10.1111/j.1469-1809.2008.00437.x.
- Li Q, Yu K, Li Z, Zheng G. MAX-rank: a simple and robust genome-wide scan for case-control association studies. Human Genetics. 2008c;123:617–623. doi: 10.1007/s00439-008-0514-8.
- Sasieni PD. From genotypes to genes: doubling the sample size. Biometrics. 1997;53:1253–1261.
- Schaid DJ, Jacobsen SJ. Biased tests of association: comparison of allele frequencies when departing from Hardy-Weinberg proportions. American Journal of Epidemiology. 1999;149:706–711. doi: 10.1093/oxfordjournals.aje.a009878.
- Song K, Elston RC. A powerful method of combining measures of association and Hardy-Weinberg disequilibrium for fine-mapping in case-control studies. Statistics in Medicine. 2006;25:105–126. doi: 10.1002/sim.2350.
- Wang K, Sheffield VC. A constrained-likelihood approach to marker-trait association studies. American Journal of Human Genetics. 2005;77:768–780. doi: 10.1086/497434.
- Wittke-Thompson JK, Pluzhnikov A, Cox N. Rational inferences about departure from Hardy-Weinberg equilibrium. American Journal of Human Genetics. 2005;76:967–986. doi: 10.1086/430507.
- Zheng G, Freidlin B, Gastwirth JL. Comparison of robust tests for genetic association using case-control studies. IMS Lecture Notes - Monograph Series (2nd special issue in honor of E.L. Lehmann). 2006. pp. 320–336.
- Zheng G, Ng HKT. Genetic model selection in two-phase analysis for case-control association studies. Biostatistics. 2008;9:391–399. doi: 10.1093/biostatistics/kxm039.
- Zou GY, Donner A. The merits of testing Hardy-Weinberg equilibrium in the analysis of unmatched case-control data: a cautionary note. Annals of Human Genetics. 2006;70:923–933. doi: 10.1111/j.1469-1809.2006.00267.x.