Main Text
To the Editor: The population-based case-control study is a useful approach to evaluating genetic association with many common and complex diseases. In general, one first uses the generalized linear model to fit the data and then uses an asymptotic test to detect the true association. In addition to this regression-based analysis, when Hardy-Weinberg equilibrium (HWE) holds in the population, testing HWE in cases has been used for indicating the association. Because the regression-based analyses (including the trend test and the likelihood-ratio test) are generally more powerful than testing HWE in cases, they are often employed in case-control studies. Less attention is paid to testing HWE in cases.
In the July 2008 issue of The American Journal of Human Genetics, Wang and Shete1 proposed a novel approach of using the tail strength to combine the p value of the likelihood-ratio test (LRT) for association and the p value of an exact test for the deviation from HWE in cases. Taylor and Tibshirani2 originally proposed the tail strength as a measure of the overall strength of association for a large number of hypotheses in microarray analyses and genome-wide association studies (GWAS). Compared to Fisher's combination of p values3, which weights each p value equally, the tail strength weights each ordered p value by its expectation under the null hypothesis. The tail strength can be used for combining independent and dependent p values and is not restricted to any special genetic model underlying the data. Wang and Shete1 combined the two p values by using the tail strength and extended the original tail strength by using the medians of the ordered p values as weights. They derived asymptotic null distributions for the tail strengths by applying the additive model and using the mean and median as weights, respectively. Their results showed significant improvement in terms of the power when the tail strengths were used. They also showed that the type I errors were under control, although we notice that almost all reported type I errors in their tables are less than the nominal levels.
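As a minimal sketch of the quantities described above, the following assumes the Taylor and Tibshirani form of the tail strength, TS = (1/m) Σ (1 − p(k)(m + 1)/k), and a median-weighted analogue for exactly two p values. The function names are hypothetical, and the median-weighted form is an assumption about how the extension is defined, not a transcription of the published formula.

```python
import math


def tail_strength_mean(p_values):
    """Tail strength with the null expectations k/(m+1) as weights (assumed form)."""
    p_sorted = sorted(p_values)
    m = len(p_sorted)
    return sum(1.0 - p * (m + 1) / k for k, p in enumerate(p_sorted, start=1)) / m


def tail_strength_median_two(p_a, p_b):
    """Median-weighted variant for exactly two p values (assumed form).

    Under the null, the smaller of two independent uniforms is Beta(1, 2)
    with median 1 - sqrt(1/2); the larger is Beta(2, 1) with median sqrt(1/2).
    """
    p_min, p_max = sorted((p_a, p_b))
    med_min = 1.0 - math.sqrt(0.5)
    med_max = math.sqrt(0.5)
    return 0.5 * ((1.0 - p_min / med_min) + (1.0 - p_max / med_max))


def fisher_statistic(p_values):
    """Fisher's combination, -2 * sum(log p), which weights each p value equally."""
    return -2.0 * sum(math.log(p) for p in p_values)
```

Small p values push both tail-strength variants toward 1, whereas p values sitting at their null expectations (or medians) give a value near 0.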
Normally, when the tail strength is used as a test statistic, its asymptotic null distribution is approximated by Monte-Carlo simulation procedures. Simulation-based approaches to determining the tail probabilities or p values of complex statistics have limitations for applications in GWAS.4,5 In this situation, deriving their asymptotic distributions is important. Although Wang and Shete1 derived the asymptotic null distributions and critical values for their tail-strength statistics, they assumed in their derivations that the two p values were independent even though in the introduction section they mentioned that they would use the tail strength to combine two dependent p values. When the two p values are correlated, their asymptotic null distributions may be inappropriate. Using two test statistics different from those in Wang and Shete,1 Zheng and Ng6 noticed that the correlation between the p values of the trend test and testing HWE between cases and controls (HWDTT7) could also vary from the recessive (REC) model to the additive (ADD) model, the multiplicative (MUL) model, and the dominant (DOM) model. As we mentioned before, Wang and Shete1 considered the tail strengths based only on the ADD model. However, the performance of testing HWE in cases would also vary across the genetic models. For example, it is known that testing HWE cannot detect association under the MUL model even though testing HWE has been used for detecting association.8–10
Therefore, the performance of the tail strength of Wang and Shete1 can be potentially affected by two factors that were either ignored or not examined in their article. One is the correlation between the two p values of the LRT and the test for HWE in cases, and the other is the unknown underlying genetic models. In this letter, using Monte-Carlo simulation procedures, we study the correlations between the p values of the LRT and the exact test for Hardy-Weinberg proportion in cases under the four genetic models. If the two p values are indeed correlated, we examine the performance of the tail-strength statistics of Wang and Shete1 under the null and alternative hypotheses. The analytical formula of the correlation, if any, between the LRT and the exact test for HWE used in Wang and Shete1 is difficult to obtain. Therefore, we consider the combination of the p values of the trend test and chi-square test for HWE between cases and controls (HWDTT), from which the asymptotic correlation between the two p values has been obtained.6,7 This new tail strength with the correlation is denoted by TSC. We further derive its asymptotic null distribution and critical value (see Appendix A). Comparison between our TSC and that of Wang and Shete1 is obtained by Monte-Carlo simulations under the null and alternative hypotheses. We also denote the tail strengths based on the mean and median in Wang and Shete1 by TS and TSM, respectively.
Here we report the main results from our simulation study. In the simulation, we assumed that HWE holds in the population. In each replicate, 500 cases and 500 controls were generated under the null hypothesis with the baseline penetrance (the probability of disease for a genotype with zero risk alleles) fixed at 0.02 and the minor-allele frequency (MAF) ranging from 0.1 to 0.5 in increments of 0.1. We used a total of 10,000 replicates to estimate the null correlations between the two p values, the type I error rates, and power. The nominal levels 0.01 and 0.05 were used. For LRT statistics, we considered 1-degree-of-freedom tests. Therefore, for each genetic model under the alternative hypothesis (REC, ADD/MUL, and DOM), an optimal test is available for the LRT or trend test. We considered three LRTs and three trend tests, optimal for the three genetic models. Therefore, a total of nine tail strengths were considered in the simulation: TS, TSM, and TSC each have three versions, depending on the targeted genetic model. The results of the null correlations between the two p values in Wang and Shete1 and corresponding type I errors are reported in Table 1 for the nominal level 0.01 and in Table 2 for the nominal level 0.05.
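The data-generating step described above can be sketched as follows. This is an illustration under stated assumptions, not the R program used for the reported results: the function names are hypothetical, the risk allele is taken to be the minor allele, and the heterozygote relative risk implied by each model (REC: 1; ADD: (1 + gamma2)/2; MUL: the square root of gamma2; DOM: gamma2) is the usual coding, assumed here.

```python
import random


def genotype_probs(maf, gamma2, model, baseline=0.02):
    """Return (case_probs, control_probs) over genotypes with 0, 1, 2 risk alleles."""
    q = maf  # assumption: the risk allele is the minor allele
    hwe = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]  # HWE holds in the population
    # Heterozygote relative risk implied by each genetic model (assumed coding).
    gamma1 = {
        "REC": 1.0,
        "ADD": (1.0 + gamma2) / 2.0,
        "MUL": gamma2 ** 0.5,
        "DOM": gamma2,
    }[model]
    penetrance = [baseline, baseline * gamma1, baseline * gamma2]
    prevalence = sum(g * f for g, f in zip(hwe, penetrance))
    cases = [g * f / prevalence for g, f in zip(hwe, penetrance)]
    controls = [g * (1 - f) / (1 - prevalence) for g, f in zip(hwe, penetrance)]
    return cases, controls


def sample_counts(probs, n, rng):
    """Draw multinomial genotype counts for n individuals."""
    counts = [0, 0, 0]
    for g in rng.choices([0, 1, 2], weights=probs, k=n):
        counts[g] += 1
    return counts
```

Setting `gamma2 = 1` gives the null hypothesis, under which both cases and controls follow the population HWE proportions; one replicate is then `sample_counts(cases, 500, rng)` together with `sample_counts(controls, 500, rng)`.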
Table 1.
Simulated Null Correlations of the Two p Values of Wang and Shete1 and the Asymptotic Type I Errors with Nominal Level 0.01
| MAF | Model | Simulated Null Correlations | TS | TSM | TSC |
|---|---|---|---|---|---|
| 0.1 | REC | 0.2702 | 0.0262 | 0.0272 | 0.0067 |
| 0.1 | ADD | −0.0049 | 0.0056 | 0.0057 | 0.0081 |
| 0.1 | DOM | 0.0238 | 0.0072 | 0.007 | 0.009 |
| 0.2 | REC | 0.2327 | 0.0248 | 0.0251 | 0.0123 |
| 0.2 | ADD | 0.0018 | 0.0092 | 0.0092 | 0.0108 |
| 0.2 | DOM | 0.0328 | 0.0129 | 0.0128 | 0.0116 |
| 0.3 | REC | 0.1672 | 0.0268 | 0.026 | 0.0131 |
| 0.3 | ADD | 0.0187 | 0.0104 | 0.0101 | 0.0112 |
| 0.3 | DOM | 0.0505 | 0.017 | 0.017 | 0.0092 |
| 0.4 | REC | 0.1454 | 0.0225 | 0.0225 | 0.0118 |
| 0.4 | ADD | −0.0149 | 0.0076 | 0.0074 | 0.0083 |
| 0.4 | DOM | 0.0716 | 0.0157 | 0.0153 | 0.0092 |
| 0.5 | REC | 0.1037 | 0.0197 | 0.0201 | 0.0128 |
| 0.5 | ADD | −0.0047 | 0.0074 | 0.0077 | 0.0103 |
| 0.5 | DOM | 0.0919 | 0.0174 | 0.0175 | 0.0081 |
TS uses means as weights, TSM uses medians as weights, and TSC is the proposed test with the correlations. Three genetic models, which are only used for constructing the optimal LRTs (for TS and TSM) and optimal-trend tests (for TSC), are considered.
Table 2.
Simulated Null Correlations of the Two p Values of Wang and Shete1 and the Asymptotic Type I Errors with Nominal Level 0.05
| MAF | Model | Simulated Null Correlations | TS | TSM | TSC |
|---|---|---|---|---|---|
| 0.1 | REC | 0.2702 | 0.0841 | 0.0832 | 0.0436 |
| 0.1 | ADD | −0.0049 | 0.0394 | 0.0403 | 0.0477 |
| 0.1 | DOM | 0.0238 | 0.0409 | 0.0408 | 0.0462 |
| 0.2 | REC | 0.2327 | 0.0765 | 0.0772 | 0.0476 |
| 0.2 | ADD | 0.0018 | 0.0443 | 0.0441 | 0.0557 |
| 0.2 | DOM | 0.0328 | 0.0513 | 0.0524 | 0.0472 |
| 0.3 | REC | 0.1672 | 0.0727 | 0.072 | 0.0482 |
| 0.3 | ADD | 0.0187 | 0.0438 | 0.0438 | 0.0468 |
| 0.3 | DOM | 0.0505 | 0.0519 | 0.0524 | 0.0531 |
| 0.4 | REC | 0.1454 | 0.0695 | 0.0678 | 0.0458 |
| 0.4 | ADD | −0.0149 | 0.0519 | 0.0516 | 0.0521 |
| 0.4 | DOM | 0.0716 | 0.0574 | 0.0569 | 0.0494 |
| 0.5 | REC | 0.1037 | 0.0722 | 0.0704 | 0.0481 |
| 0.5 | ADD | −0.0047 | 0.0474 | 0.0483 | 0.0509 |
| 0.5 | DOM | 0.0919 | 0.0647 | 0.0633 | 0.0475 |
TS uses means as weights, TSM uses medians as weights, and TSC is the proposed test with the correlations. Three genetic models, which are only used for constructing the optimal LRTs (for TS and TSM) and optimal-trend tests (for TSC), are considered.
The results in Tables 1 and 2 follow similar patterns. Thus, we focus on Table 1. The simulated null correlations between the p value of LRT and the p value of the exact HWE test in cases indicate that the null correlations are not zero when the LRT is optimal for the REC or DOM models, but they are close to zero for the ADD (MUL) model. Hence, the type I errors of the TS and TSM of Wang and Shete1 are under control when the LRT is optimal for the ADD (MUL) model but are largely inflated when the LRTs are optimal for the REC and DOM models, especially for the REC model. Note that Wang and Shete1 only considered the LRT optimal for the ADD model. Therefore, their type I errors were under control. On the other hand, the type I errors of TSC, which takes care of the correlations, are close to the nominal level regardless of the targeted genetic models.
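To illustrate how a null correlation between two such p values can be estimated by simulation, here is a minimal sketch. It swaps in the Cochran-Armitage trend test and an asymptotic chi-square-type HWE statistic in cases for the LRT and exact test used in the tables, so its output is not expected to reproduce the tabulated correlations. All function names and the default of 2,000 replicates are assumptions.

```python
import math
import random


def two_sided_p(z):
    """Two-sided normal p value, 2 * Phi(-|z|), via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def trend_z(cases, controls, x):
    """Cochran-Armitage trend statistic with genotype scores (0, x, 1)."""
    scores = (0.0, x, 1.0)
    r, s = sum(cases), sum(controls)
    n = r + s
    totals = [a + b for a, b in zip(cases, controls)]
    num = sum(w * (s * a - r * b) for w, a, b in zip(scores, cases, controls))
    spread = (n * sum(w * w * t for w, t in zip(scores, totals))
              - sum(w * t for w, t in zip(scores, totals)) ** 2)
    if spread <= 0:
        return 0.0  # degenerate table: every individual has the same score
    return math.sqrt(n) * num / math.sqrt(r * s * spread)


def hwe_z_cases(cases):
    """Signed HWE statistic in cases: sqrt(n) * D / (p * (1 - p))."""
    n = sum(cases)
    p = (2 * cases[2] + cases[1]) / (2.0 * n)
    if p <= 0.0 or p >= 1.0:
        return 0.0  # monomorphic sample carries no HWE information
    d = cases[2] / n - p * p
    return math.sqrt(n) * d / (p * (1.0 - p))


def draw(probs, n, rng):
    """Multinomial genotype counts for n individuals."""
    counts = [0, 0, 0]
    for g in rng.choices((0, 1, 2), weights=probs, k=n):
        counts[g] += 1
    return counts


def pearson_r(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def null_correlation(maf, x, n_cases=500, n_controls=500, reps=2000, seed=1):
    """Estimate the null correlation between the trend-test and HWE p values."""
    rng = random.Random(seed)
    hwe = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    p_trend, p_hwe = [], []
    for _ in range(reps):
        cases, controls = draw(hwe, n_cases, rng), draw(hwe, n_controls, rng)
        p_trend.append(two_sided_p(trend_z(cases, controls, x)))
        p_hwe.append(two_sided_p(hwe_z_cases(cases)))
    return pearson_r(p_trend, p_hwe)
```

For example, `null_correlation(0.3, 0.0)` targets the REC-optimal trend test at MAF 0.3, and `x = 0.5` or `x = 1.0` targets the ADD (MUL) or DOM model.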
We also conducted simulations to compare the powers of the TS, TSM, and TSC. For the TS and TSM, the correlations between the two p values were not incorporated. Thus, on the basis of results in Tables 1 and 2, their powers could be inflated under the REC and DOM models, but not under the ADD and MUL models. The powers are presented for the TS, TSM, and TSC (from left to right) under the REC model (Figure 1) and ADD model (Figure 2). The plots for the MUL and DOM models can be found in the Supplemental Data available online (Figures S1 and S2, respectively). The parameter values of the simulations under the alternative hypotheses are similar to those in Tables 1 and 2, except that the genotype relative risk (gamma2, which is defined as the ratio of penetrances with two risk alleles to those with zero risk alleles) ranges from 1 to 2, and the MAF is fixed at 0.3. The “asymptotic” and “simulated” powers in the figures were based on the critical values obtained from 10,000 parametric bootstrap samples and 10,000 permutations, respectively.
Figure 1.

The Asymptotic and Simulated Powers under the REC Model
The tests from left to right are TS, TSM, and TSC. Gamma2 is the ratio of the penetrance with two risk alleles to that with zero risk alleles.
Figure 2.

The Asymptotic and Simulated Powers under the ADD Model
The tests from left to right are TS, TSM, and TSC. Gamma2 is the ratio of the penetrance with two risk alleles to that with zero risk alleles.
Figure 1 (under the REC model) shows that TS and TSM have similar powers and are more powerful than TSC. This could be due to the fact that TS and TSM had inflated type I errors as shown in Tables 1 and 2. On the other hand, Figure 2 shows that the powers of TS, TSM, and TSC are similar under the ADD model because the three statistics had similar type I errors. For the TS and TSM, the bootstrap and permutation procedures yield similar powers under the ADD, MUL, and DOM models but have slightly different powers under the REC model.
We also studied the empirical powers of the TSC, the optimal trend test, the robust test MAX3 of Freidlin et al.,11 and the classical Pearson's test for association under the four genetic models. The description and summary of our findings are given in Appendix B. The results show that the TSC has a moderate power improvement under the REC model but loses substantial power under the ADD and MUL models. This can be explained by the fact that testing HWE has little power under the ADD and MUL models.
In summary, the tail strength may improve power under some specific genetic models after correction for the correlation. However, when the underlying genetic model is unknown, robust statistics are preferable.6,11
Appendix A
The Asymptotic Null Distribution of the TSC with the Correlation
Denote the HWDTT by Z∗, which is a statistic testing HWE between cases and controls and was proposed by Song and Elston.7 Denote the trend test as Zx, where x = 0, 0.5, and 1 for the REC, ADD (MUL), and DOM models, respectively.11–13 Under the null hypothesis H0, (Z∗, Zx) follows the bivariate normal distribution N(0, Σ1) with the density function f1, where $\Sigma_1 = \begin{pmatrix} 1 & \rho_x \\ \rho_x & 1 \end{pmatrix}$, and (−Z∗, Zx) follows the bivariate normal distribution N(0, Σ2) with the density function f2, where $\Sigma_2 = \begin{pmatrix} 1 & -\rho_x \\ -\rho_x & 1 \end{pmatrix}$. The expressions for ρx were given in Zheng and Ng for different x values.6 The following derivation can be adapted to the tail strength of any two correlated p values.
The p value of Z∗ is P∗ = 2Φ(− |z∗|), and the p value of Zx is Px = 2Φ(− |zx|), where Φ is the cumulative distribution function of the standard normal N(0, 1), and z∗ and zx are observed statistics. Then the joint distribution of P∗ and Px is:
Thus, its density function can be written as
Therefore, the ordered p values have the cumulative function given by
The density function of the ordered p values is given by
(A1)
Once we obtain the above joint distribution g(x(1), x(2)), we can use the results of Wang and Shete1 to obtain the asymptotic null distribution for TSC:
The density function of TSC is given by
where g is given in Equation (A1). We also consider a test for departure from HWE only by using cases in Appendix B. In this case, the above formulas can also be used except that the correlation needs to be modified accordingly.
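As a hedged sketch, the steps described in this appendix can be written out as follows under the stated bivariate normal model, with ρx the null correlation of Z∗ and Zx. The shorthand a(p), the symbols F, f, and G, and the explicit two-p-value form of the tail strength (mean weights k/3) are assumptions made here for illustration and are not necessarily the authors' notation.

```latex
% Assumed notation: a(p) = \Phi^{-1}(1 - p/2); \varphi is the standard normal density.
\begin{align*}
F(p_1, p_2) &= \Pr(P_* \le p_1,\; P_x \le p_2)
  = \Pr\bigl(|Z_*| \ge a(p_1),\; |Z_x| \ge a(p_2)\bigr) \\
 &= 2 \int_{a(p_1)}^{\infty} \int_{a(p_2)}^{\infty}
    \bigl[ f_1(u, v) + f_2(u, v) \bigr] \, dv \, du, \\
f(p_1, p_2) &= \frac{\partial^2 F}{\partial p_1 \, \partial p_2}
  = \frac{f_1\bigl(a(p_1), a(p_2)\bigr) + f_2\bigl(a(p_1), a(p_2)\bigr)}
         {2 \, \varphi\bigl(a(p_1)\bigr) \, \varphi\bigl(a(p_2)\bigr)}, \\
G(x_{(1)}, x_{(2)}) &= F(x_{(1)}, x_{(2)}) + F(x_{(2)}, x_{(1)}) - F(x_{(1)}, x_{(1)}),
  \qquad 0 \le x_{(1)} \le x_{(2)} \le 1, \\
g(x_{(1)}, x_{(2)}) &= f(x_{(1)}, x_{(2)}) + f(x_{(2)}, x_{(1)}), \\
\mathrm{TSC} &= \frac{1}{2} \sum_{k=1}^{2} \Bigl( 1 - \frac{3}{k} P_{(k)} \Bigr)
  = 1 - \tfrac{3}{2} P_{(1)} - \tfrac{3}{4} P_{(2)}, \\
\Pr(\mathrm{TSC} \le t) &= \iint_{\{0 \le x_{(1)} \le x_{(2)} \le 1,\;
    1 - \frac{3}{2} x_{(1)} - \frac{3}{4} x_{(2)} \le t\}}
    g(x_{(1)}, x_{(2)}) \, dx_{(1)} \, dx_{(2)}.
\end{align*}
```

As a consistency check, setting ρx = 0 gives f1 = f2 = φ(u)φ(v), so that f = 1 on the unit square and g = 2 on the triangle x(1) ≤ x(2), which is the case of two independent uniform p values.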
Appendix B
Power Comparison between the Optimal-Trend Test, MAX3, Pearson's Test, and the TSC Tests
We compared the performance of several test statistics under the alternative hypotheses with the genotype relative risk 1.5, the disease prevalence 0.1, and 500 cases and 500 controls. The nominal level was 0.05. All critical values were obtained from the simulation with 100,000 replicates. The estimated powers were obtained from 10,000 replicates.
We considered four different genetic models: REC, ADD, MUL, and DOM models. Under each model the optimal-trend test was used.11–13 These optimal-trend tests may not be realistic when the underlying genetic model is unknown. Thus, for comparison, we included two robust tests: MAX3, proposed by Freidlin et al.11, and the classical Pearson's test with 2 degrees of freedom. For the tail strength, we considered TSC (the tail strength with the correlation). Two TSCs were considered. One is discussed in the text (denoted by TSC2, where HWDTT is used), and the second one only uses cases to detect departure from HWE (denoted by TSC1).
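For concreteness, the two robust statistics named above can be sketched as follows, assuming the standard Cochran-Armitage form of the trend statistic with scores (0, x, 1). The function names are hypothetical. The sketch returns the MAX3 statistic only, since its critical value is obtained by simulation, as described in this appendix.

```python
import math


def trend_z(cases, controls, x):
    """Cochran-Armitage trend statistic with genotype scores (0, x, 1)."""
    scores = (0.0, x, 1.0)
    r, s = sum(cases), sum(controls)
    n = r + s
    totals = [a + b for a, b in zip(cases, controls)]
    num = sum(w * (s * a - r * b) for w, a, b in zip(scores, cases, controls))
    spread = (n * sum(w * w * t for w, t in zip(scores, totals))
              - sum(w * t for w, t in zip(scores, totals)) ** 2)
    if spread <= 0:
        return 0.0  # degenerate table: every individual has the same score
    return math.sqrt(n) * num / math.sqrt(r * s * spread)


def max3(cases, controls):
    """MAX3: the largest absolute trend statistic over x = 0, 0.5, 1."""
    return max(abs(trend_z(cases, controls, x)) for x in (0.0, 0.5, 1.0))


def pearson_chi2(cases, controls):
    """Pearson chi-square for the 2 x 3 table and its 2-df p value.

    With 2 degrees of freedom the chi-square tail probability is exp(-chi2 / 2).
    Empty genotype columns are skipped, in which case the 2-df reference
    distribution is no longer appropriate.
    """
    r, s = sum(cases), sum(controls)
    n = r + s
    chi2 = 0.0
    for a, b in zip(cases, controls):
        t = a + b
        if t == 0:
            continue
        expected_a, expected_b = r * t / n, s * t / n
        chi2 += (a - expected_a) ** 2 / expected_a + (b - expected_b) ** 2 / expected_b
    return chi2, math.exp(-chi2 / 2.0)
```

Genotype counts are passed in the order (0, 1, 2 risk alleles), for example `max3([100, 200, 200], [200, 200, 100])`.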
The results from the simulations are reported in Table S1. The results show that TSC1 is usually more powerful than TSC2. Note that TSC1 is more powerful than the optimal-trend test under the REC model when the MAF is small to moderate, but it is much less powerful than the optimal-trend test under the ADD and MUL models because testing HWE has little power under these two models. TSC1 retains some power under the DOM model, but it is slightly less powerful than the optimal-trend test. When the genetic model is unknown, however, the optimal-trend test cannot be used. We therefore also compare TSC1 with the robust test MAX3, which does not require knowledge of the genetic model. Table S1 shows that, except for the REC model, MAX3 is more powerful than TSC1.
Supplemental Data
Supplemental Data include two figures and one table and are available with this article online at http://www.ajhg.org/.
Web Resources
The URL for data presented herein is as follows:
The R program (TSC.txt) used in the simulation can be downloaded from the website: www.statisticalsource.com/software/TSC1.txt.
Acknowledgments
We would like to thank Yaning Yang for helpful discussions that brought the tail strength to our attention. The work of Y. Zang and W.K. Fung was partially supported by The Croucher Foundation and China Natural Science Foundation (no. 10701067).
References
- 1. Wang J., Shete S. A test for genetic association that incorporates information about deviation from Hardy-Weinberg proportions in cases. Am. J. Hum. Genet. 2008;83:53–63. doi: 10.1016/j.ajhg.2008.06.010.
- 2. Taylor J., Tibshirani R. A tail strength measure for assessing the overall univariate significance in a dataset. Biostatistics. 2006;7:167–181. doi: 10.1093/biostatistics/kxj009.
- 3. Elston R.C. On Fisher method of combining p-values. Biometrical J. 1991;33:339–345.
- 4. Sladek R., Rocheleau G., Rung J., Dina C., Shen L., Serre D., Boutin P., Vincent D., Belisle A., Hadjadj S. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature. 2007;445:881–885. doi: 10.1038/nature05616.
- 5. Conneely K.N., Boehnke M. So many correlated tests, so little time! Rapid adjustment of p values for multiple correlated tests. Am. J. Hum. Genet. 2007;81:1158–1168. doi: 10.1086/522036.
- 6. Zheng G., Ng H.K.T. Genetic model selection in two-phase analysis for case-control association studies. Biostatistics. 2008;9:391–399. doi: 10.1093/biostatistics/kxm039.
- 7. Song K., Elston R.C. A powerful method of combining measures of association and Hardy-Weinberg disequilibrium for fine-mapping in case-control studies. Stat. Med. 2006;25:105–126. doi: 10.1002/sim.2350.
- 8. Nielsen D.M., Ehm M.G., Weir B.S. Detecting marker-disease association by testing for Hardy-Weinberg disequilibrium at a marker locus. Am. J. Hum. Genet. 1998;63:1531–1540. doi: 10.1086/302114.
- 9. Wittke-Thompson J.K., Pluzhnikov A., Cox N.J. Rational inferences about departure from Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 2005;76:967–986. doi: 10.1086/430507.
- 10. Wang T., Zhu X., Elston R.C. Improving power in contrasting linkage-disequilibrium patterns between cases and controls. Am. J. Hum. Genet. 2007;80:911–920. doi: 10.1086/516794.
- 11. Freidlin B., Zheng G., Li Z., Gastwirth J.L. Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum. Hered. 2002;53:146–152. doi: 10.1159/000064976.
- 12. Sasieni P.D. From genotype to genes: doubling the sample size. Biometrics. 1997;53:1253–1261.
- 13. Zheng G., Freidlin B., Li Z., Gastwirth J.L. Choice of scores in trend tests for case-control studies of candidate-gene associations. Biometrical J. 2003;45:335–348.
