Biostatistics (Oxford, England). 2013 Oct 29;15(2):284–295. doi: 10.1093/biostatistics/kxt045

Fisher's method of combining dependent statistics using generalizations of the gamma distribution with applications to genetic pleiotropic associations

Qizhai Li 1,*, Jiyuan Hu 2, Juan Ding 3, Gang Zheng 4
PMCID: PMC3944971  PMID: 24174580

Abstract

A classical approach to combining independent test statistics is Fisher's combination of p-values, which follows the $\chi^2$ distribution. When the test statistics are dependent, the gamma distribution (GD) is commonly used for the Fisher's combination test (FCT). We propose to use two generalizations of the GD: the generalized and the exponentiated GDs. We study some properties of misusing the GD for the FCT to combine dependent statistics when one of the two proposed distributions is true. Our results show that both generalizations have better control of type I error rates than the GD, which tends to have inflated type I error rates at more extreme tails. In practice, common model selection criteria (e.g. Akaike information criterion/Bayesian information criterion) can be used to help select a better distribution to use for the FCT. A simple strategy for applying the two generalizations of the GD in genome-wide association studies is discussed. Applications of the results to genetic pleiotropic associations are described, where multiple traits are tested for association with a single marker.

Keywords: Dependent tests, Fisher's combination, Gamma distributions, Genetic pleiotropic associations, Genome-wide association studies, Type I error

1. Introduction

Combining independent test statistics is common in biomedical research. One approach is to combine the p-values of one-sided tests using Fisher's method (Fisher, 1932), referred to here as Fisher's combination test (FCT). It has optimal Bahadur efficiency (Little and Folks, 1971). In general, however, its results are harder to interpret when two-sided tests are combined. If one combines m independent p-values, the null distribution of the FCT is the $\chi^2$ distribution with 2m degrees of freedom, $\chi^2_{2m}$. If the m test statistics are dependent, the null distribution of the FCT is no longer $\chi^2_{2m}$.

Most research on the FCT has focused on combining independent p-values and on comparing the FCT with alternative approaches. In many genetic studies, however, test statistics are often dependent. In the analysis of Affymetrix expression array data, the FCT was employed to combine probe-level two-sample t-tests, where the probes of the same probe set are correlated (Hess and Iyer, 2007). In genetic pleiotropic association studies, association between a marker and multiple traits collected from the same subjects is tested. The FCT was used to combine the dependent single-trait analyses and was approximated using a gamma distribution (GD) (Zheng and others, 2012). Combining case–control and family-based designs, or several genetic association studies, can improve the power to detect disease-associated markers (Infante-Rivard and others, 2009; Pfeiffer and others, 2009). When the two study designs share the cases, or multiple association studies share controls, the dependent test statistics can be combined using the FCT. In genetic linkage and association studies under model uncertainty, multiple correlated tests are obtained under various genetic models (Joo and others, 2010).

Several approaches to handling the dependence in the FCT have been discussed, of which the GD is the most commonly used. Assuming that the m tests follow a multivariate normal distribution, Brown (1975) and Kost and McDermott (2002) considered fitting the FCT with a GD, $\mathrm{Gamma}(a, b)$, where a and b are the scale and shape parameters, respectively. Yang (2010) compared the GD, normal approximations, and permutation methods for the FCT with dependent p-values and showed that the GD approximation and permutation methods perform well overall; the permutation methods also approximate the exact distribution of the FCT. Zheng and others (2012) derived an approximate GD for the FCT to test pleiotropic associations.

Since $\chi^2_{2m}$ is the GD with scale 2 and shape m, i.e. $\mathrm{Gamma}(2, m)$, the two-parameter $\mathrm{Gamma}(a, b)$ is more flexible for the FCT when the tests are dependent. In this paper, we consider two generalizations of the GD with a third parameter: the generalized GD (GGD), with an extra shape parameter, and the exponentiated GD (EGD), with an extra power parameter. We study some properties of using the GD when the GGD or EGD is the true distribution for the FCT, which provide insight into the performance of the FCT. Our results show that both the GGD and EGD control the type I error better than the GD, especially at extremely small significance levels. Applications of the FCT to genetic pleiotropic associations are described. A simple strategy for applying the GGD and EGD to genome-wide association studies (GWASs) with mixed types of traits is discussed. Simulations and applications to real data illustrate the performance of the proposed methods.

2. Fisher's combination of p-values

Let m test statistics $Z_i$, $i = 1, \dots, m$, be combined. Denote the p-value of $Z_i$ as $p_i$ ($i = 1, \dots, m$). The FCT statistic, $T$, is given by

$$T = -2\sum_{i=1}^{m} \log p_i. \qquad (2.1)$$

If all the p-values are independent, $T \sim \chi^2_{2m}$ under the null hypothesis $H_0$. Let $X \sim \mathrm{Gamma}(a, b)$ with density function $f(x; a, b) = x^{b-1}e^{-x/a}/\{a^{b}\,\Gamma(b)\}$ for $x > 0$, where $\Gamma(\cdot)$ is the gamma function, and distribution function $F(x; a, b) = \int_0^x f(s; a, b)\,ds$. When the p-values are dependent, we model $T \sim \mathrm{Gamma}(a, b)$ under $H_0$, where $(a, b)$ are estimated using the maximum likelihood estimates (MLEs) or the moment estimates (Kost and McDermott, 2002; Zheng and others, 2012). In our applications, all p-values are allowed to be two-sided because the m p-values are obtained from testing m different null hypotheses. The following results are also valid when m one-sided p-values are combined using the FCT.
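As a minimal sketch of (2.1), the Python below computes $T$ and, for the independent case only, its $\chi^2_{2m}$ upper-tail probability. It relies on the identity $\chi^2_{2m} = \mathrm{Gamma}(2, m)$, whose tail has the closed form $e^{-t/2}\sum_{j=0}^{m-1}(t/2)^j/j!$, so no statistics library is needed. The function names are illustrative and not taken from the paper; for dependent p-values this tail would be replaced by a fitted GD, GGD, or EGD.

```python
import math
from typing import Sequence


def fisher_statistic(p_values: Sequence[float]) -> float:
    """Fisher's combination statistic T = -2 * sum(log p_i), as in (2.1)."""
    if len(p_values) == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_even_df_sf(t: float, m: int) -> float:
    """P(chi^2_{2m} > t), using chi^2_{2m} = Gamma(scale 2, shape m).

    For integer shape m the upper tail is exp(-t/2) * sum_{j<m} (t/2)^j / j!.
    """
    half = t / 2.0
    term, total = 1.0, 0.0
    for j in range(m):
        total += term            # adds (t/2)^j / j!
        term *= half / (j + 1)   # next term (t/2)^(j+1) / (j+1)!
    return math.exp(-half) * total


def fisher_pvalue_independent(p_values: Sequence[float]) -> float:
    """Combined p-value; valid only when the p-values are independent under H0."""
    return chi2_even_df_sf(fisher_statistic(p_values), len(p_values))


if __name__ == "__main__":
    ps = [0.01, 0.04]
    print(f"T = {fisher_statistic(ps):.4f}")
    print(f"combined p = {fisher_pvalue_independent(ps):.6f}")
```

For two p-values the combined p-value reduces to $x(1 - \log x)$ with $x = p_1p_2$, which gives a quick hand check of the sketch.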

3. Generalizations of the GD

3.1. Applying the GGD

3.1.1. The GGD, $\mathrm{GGD}(a, b, k)$

Its density function is $f(x; a, b, k) = \dfrac{k\,x^{bk-1}}{a^{bk}\,\Gamma(b)}\exp\{-(x/a)^{k}\}$ for $x > 0$, where $a > 0$ is the scale parameter and both $b > 0$ and $k > 0$ are shape parameters (Stacy, 1962); it reduces to $\mathrm{Gamma}(a, b)$ when $k = 1$. Its inference has been discussed in Stacy and Mihram (1965) and Huang and Hwang (2006). We use the MLEs of $(a, b, k)$ and model $T \sim \mathrm{GGD}(a, b, k)$.
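The GGD parameterization can be checked through a standard representation: if $Y$ is gamma with unit scale and shape $b$, then $X = aY^{1/k}$ has the density above, and $E(X^r) = a^r\Gamma(b + r/k)/\Gamma(b)$. The minimal Python sketch below assumes exactly that parameterization (scale a, shapes b and k); the helper names are hypothetical, and MLE fitting is not shown.

```python
import math
import random


def ggd_logpdf(x: float, a: float, b: float, k: float) -> float:
    """log f(x; a, b, k) = log k + (bk - 1) log x - (x/a)^k - bk log a - log Gamma(b)."""
    if x <= 0.0:
        return -math.inf
    return (math.log(k) + (b * k - 1.0) * math.log(x) - (x / a) ** k
            - b * k * math.log(a) - math.lgamma(b))


def ggd_sample(a: float, b: float, k: float, rng: random.Random) -> float:
    """Draw X = a * Y**(1/k), where Y is gamma with shape b and unit scale."""
    y = rng.gammavariate(b, 1.0)  # random.gammavariate(alpha=shape, beta=scale)
    return a * y ** (1.0 / k)


def ggd_raw_moment(r: float, a: float, b: float, k: float) -> float:
    """E(X^r) = a^r * Gamma(b + r/k) / Gamma(b)."""
    return a ** r * math.exp(math.lgamma(b + r / k) - math.lgamma(b))


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [ggd_sample(0.5, 0.5, 0.5, rng) for _ in range(50_000)]
    print(sum(draws) / len(draws), ggd_raw_moment(1, 0.5, 0.5, 0.5))
```

Setting k = 1 in `ggd_logpdf` recovers the gamma log-density of Section 2, which is a convenient sanity check on the parameterization.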

3.1.2. Fitting the GGD with the GD

Let $X_1, \dots, X_n$ be a simple random sample from $\mathrm{GGD}(a, b, k)$ for some $k > 0$. Denote the sample mean and variance as $\bar{X}$ and $S^2$, respectively. Write $\mu_r = \Gamma(b + r/k)/\Gamma(b)$ for $r = 1, 2$, so that $E(X^r) = a^r\mu_r$. Then, from Huang and Hwang (2006), as $n \to \infty$,

$$\bar{X} \xrightarrow{P} a\mu_1, \qquad S^2 \xrightarrow{P} a^2(\mu_2 - \mu_1^2).$$

If we fit the sample with the GD $\mathrm{Gamma}(\hat{a}, \hat{b})$ using the moment estimates,

$$\hat{a} = \frac{S^2}{\bar{X}}, \qquad \hat{b} = \frac{\bar{X}^2}{S^2}. \qquad (3.1)$$

Denote the probability limits of $\hat{a}$ and $\hat{b}$ as $a^*$ and $b^*$. Then,

$$a^* = \frac{a(\mu_2 - \mu_1^2)}{\mu_1}, \qquad b^* = \frac{\mu_1^2}{\mu_2 - \mu_1^2}. \qquad (3.2)$$

Note that $a^* \neq a$ and $b^* \neq b$ unless $k = 1$. The following result provides more insight into fitting the GGD with the GD; the proof is given in supplementary material available at Biostatistics online. The monotonicity of Inline graphic in Inline graphic is shown with a condition Inline graphic, which is satisfied in our applications. In the proof, however, we show graphically that the result holds even if Inline graphic.

Theorem 3.1 —

Let Inline graphic and Inline graphic be defined above and Inline graphic be the moments estimates for the GD. Then Inline graphic and Inline graphic are asymptotically uncorrelated. Given Inline graphic, Inline graphic is a decreasing function in k when Inline graphic and Inline graphic is an increasing function in k.
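The monotonicity in Theorem 3.1 can be illustrated numerically. The sketch below evaluates the probability limits $(a^*, b^*)$ of the GD moment estimates, as in (3.2), for a GGD with scale a and shapes b and k, using $\mu_r = \Gamma(b + r/k)/\Gamma(b)$. It only illustrates chosen parameter values and is not a proof; the function name is hypothetical.

```python
import math
from typing import Tuple


def ggd_moment_matched_gamma(a: float, b: float, k: float) -> Tuple[float, float]:
    """Probability limits (a*, b*) of the GD moment estimates under GGD(a, b, k).

    With mu_r = Gamma(b + r/k) / Gamma(b): E(X) = a * mu_1 and
    Var(X) = a^2 (mu_2 - mu_1^2); the GD moment fit uses scale = Var / E
    and shape = E^2 / Var.
    """
    mu1 = math.exp(math.lgamma(b + 1.0 / k) - math.lgamma(b))
    mu2 = math.exp(math.lgamma(b + 2.0 / k) - math.lgamma(b))
    spread = mu2 - mu1 ** 2              # Var(X) / a^2
    return a * spread / mu1, mu1 ** 2 / spread


if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0, 5.0):
        a_star, b_star = ggd_moment_matched_gamma(0.5, 0.5, k)
        print(f"k = {k}: a* = {a_star:.3f}, b* = {b_star:.3f}")
```

For $(a, b, k) = (0.5, 0.5, 0.5)$ the limits are exactly $(4, 0.09375)$, the first GD entry of Table 1, and over k = 0.5, 1, 2, 5 the sketch shows $a^*$ decreasing and $b^*$ increasing.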

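Next, we compute the p-value of the FCT statistic T using the GD when the GGD is the true distribution. Let $T \sim \mathrm{GGD}(a, b, k)$ under $H_0$ for some $k > 0$. If we fit $T$ with $\mathrm{Gamma}(a^*, b^*)$ given in (3.2), then the p-values of an observed value $t$ of $T$ using the GGD and GD are given, respectively, by

$$p_{\mathrm{GGD}}(t) = 1 - \frac{\gamma\{b, (t/a)^k\}}{\Gamma(b)}, \qquad p_{\mathrm{GD}}(t) = 1 - \frac{\gamma(b^*, t/a^*)}{\Gamma(b^*)}, \qquad (3.3)$$

where $\gamma(s, x) = \int_0^x u^{s-1}e^{-u}\,du$ is the lower incomplete gamma function.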

To study the analytical behavior of the p-values, we assume that all parameters in the distributions are known; in practice, they are estimated from the data. Theorem 3.2 shows that using the GD for T with dependent p-values may not be appropriate at extremely small significance levels when the GGD is the true distribution. In particular, when $k > 1$, the type I error rate using the GD tends to be much larger than that using the GGD. A proof is given in supplementary material available at Biostatistics online.

Theorem 3.2 —

Let $T \sim \mathrm{GGD}(a, b, k)$ for some known $(a, b, k)$ with p-value $p_{\mathrm{GGD}}(t)$. If we calculate the p-value $p_{\mathrm{GD}}(t)$ with $\mathrm{Gamma}(a^*, b^*)$, where $(a^*, b^*)$ are given in (3.2), then $\lim_{t \to \infty} p_{\mathrm{GD}}(t)/p_{\mathrm{GGD}}(t) = 0$ if $k < 1$; 1 if $k = 1$; and $\infty$ if $k > 1$.
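To see Theorem 3.2 at work, the sketch below computes $p_{\mathrm{GD}}(t)/p_{\mathrm{GGD}}(t)$ at the value t where $p_{\mathrm{GGD}}(t) = \alpha$. It assumes $b = 1/2$, because then $p_{\mathrm{GGD}}(t) = \mathrm{erfc}\{(t/a)^{k/2}\}$ and t can be found by bisection with the standard library alone. The GD tail uses the usual series and continued-fraction (modified Lentz) evaluation of the regularized upper incomplete gamma function, as described in Numerical Recipes. This is a numerical illustration, not a proof, and all helper names are hypothetical.

```python
import math
from typing import Tuple


def gamma_q(s: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(s, x) = Gamma(s, x) / Gamma(s)."""
    if x <= 0.0:
        return 1.0
    log_pref = s * math.log(x) - x - math.lgamma(s)
    if x < s + 1.0:
        # Series for the lower part P(s, x); Q = 1 - P.
        term = total = 1.0 / s
        for n in range(1, 10_000):
            term *= x / (s + n)
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_pref)
    # Modified Lentz evaluation of the continued fraction for Q(s, x).
    tiny = 1e-300
    bb = x + 1.0 - s
    c, d = 1.0 / tiny, 1.0 / bb
    h = d
    for i in range(1, 10_000):
        an = -i * (i - s)
        bb += 2.0
        d = an * d + bb
        d = tiny if abs(d) < tiny else d
        c = bb + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_pref) * h


def ggd_moment_matched_gamma(a: float, b: float, k: float) -> Tuple[float, float]:
    """(a*, b*) from (3.2)."""
    mu1 = math.exp(math.lgamma(b + 1.0 / k) - math.lgamma(b))
    mu2 = math.exp(math.lgamma(b + 2.0 / k) - math.lgamma(b))
    spread = mu2 - mu1 ** 2
    return a * spread / mu1, mu1 ** 2 / spread


def tail_ratio(alpha: float, a: float, k: float) -> float:
    """p_GD(t) / p_GGD(t) at the t with p_GGD(t) = alpha, for GGD(a, 1/2, k)."""
    # With b = 1/2, p_GGD(t) = Q(1/2, y) = erfc(sqrt(y)), y = (t/a)^k; bisect on y.
    lo, hi = 0.0, 1.0e4
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(math.sqrt(mid)) > alpha:
            lo = mid
        else:
            hi = mid
    y = 0.5 * (lo + hi)
    t = a * y ** (1.0 / k)
    a_star, b_star = ggd_moment_matched_gamma(a, 0.5, k)
    return gamma_q(b_star, t / a_star) / math.erfc(math.sqrt(y))


if __name__ == "__main__":
    for k in (0.5, 1.0, 5.0):
        print(k, [round(tail_ratio(al, 0.5, k), 3) for al in (0.05, 1e-3, 1e-4)])
```

With a = 0.5, the ratio stays at 1 for k = 1, grows as α shrinks when k = 5, and falls below 1 when k = 0.5, the same pattern as the GGD columns of Table 1.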

3.2. Applying the EGD

An alternative approach is the EGD, denoted as $\mathrm{EGD}(a, b, k)$, whose distribution function is $\{F(x; a, b)\}^k$ with a power parameter $k > 0$ (de Pascoa and others, 2011). We can find the MLEs for $(a, b, k)$ under $H_0$ and model $T \sim \mathrm{EGD}(a, b, k)$. In hypothesis testing, $\{F(x; a, b)\}^k$ is also referred to as a Lehmann alternative (Lehmann, 1953). Like the GGD, the EGD approximates the distribution of $T$ more flexibly than the GD does. For $X \sim \mathrm{EGD}(a, b, k)$ and some $r > 0$,

$$E(X^r) = k\int_0^\infty x^r \{F(x; a, b)\}^{k-1} f(x; a, b)\,dx,$$

where $f$ and $F$ are the gamma density and distribution functions defined in Section 2.

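Let $X_1, \dots, X_n$ be a simple random sample from $\mathrm{EGD}(a, b, k)$ for some $k > 0$. If we fit the sample with $\mathrm{Gamma}(\hat{a}, \hat{b})$, then by the moment estimates and the same notation as in (3.1) and (3.2), we have

$$a^* = \frac{E(X^2) - \{E(X)\}^2}{E(X)}, \qquad b^* = \frac{\{E(X)\}^2}{E(X^2) - \{E(X)\}^2}. \qquad (3.4)$$

Note that $a^* \neq a$ and $b^* \neq b$ unless $k = 1$. Like (3.3), the p-values of an observed FCT value $t$ using the EGD and the GD can be written, respectively, as

$$p_{\mathrm{EGD}}(t) = 1 - \{F(t; a, b)\}^k, \qquad p_{\mathrm{GD}}(t) = 1 - F(t; a^*, b^*). \qquad (3.5)$$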

Although (3.4) has the same form as (3.2), results analogous to Theorem 3.1 are not available for the EGD because its moments involve double integrals. Numerical results presented later and also online show that the GD parameters $(a^*, b^*)$ in (3.4) satisfy $a^* > a$ and $b^* < b$ when $k < 1$, and $a^* < a$ and $b^* > b$ when $k > 1$. A result similar to Theorem 3.2 is given below, with the proof given in supplementary material available at Biostatistics online.

Theorem 3.3 —

Let Inline graphic for some known Inline graphic with p-value Inline graphic. If we calculate p-value Inline graphic with Inline graphic, where Inline graphic are given in (3.4), then Inline graphic if Inline graphic; 1 if Inline graphic; and Inline graphic if Inline graphic.

Since we observe that the GD scale parameter $a^*$ in (3.4) exceeds $a$ when $k < 1$, $p_{\mathrm{GD}}$ tends to inflate the type I error rate at extreme tail values when the EGD is the true distribution and $k < 1$.
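The EGD moments in (3.4) generally need numerical integration. As a sketch, the code below restricts to b = 1, an assumption made so that $F(x; a, 1) = 1 - e^{-x/a}$ is available in closed form (the exponentiated exponential case), and evaluates $E(X^r) = \int_0^\infty r x^{r-1}\{1 - F(x; a, 1)^k\}\,dx$ with a midpoint rule. The function name, grid size, and truncation point are illustrative choices rather than the authors' procedure.

```python
import math
from typing import Tuple


def egd_exp_moment_matched_gamma(a: float, k: float, n_grid: int = 200_000,
                                 upper: float = 60.0) -> Tuple[float, float]:
    """(a*, b*) of (3.4) for EGD(a, b = 1, k), whose CDF is (1 - exp(-x/a))**k.

    Raw moments use E(X^r) = integral_0^inf r x^(r-1) (1 - F(x)) dx, evaluated by the
    midpoint rule on [0, upper * a]; the neglected tail is negligible for upper = 60.
    """
    h = upper * a / n_grid
    m1 = m2 = 0.0
    for i in range(n_grid):
        x = (i + 0.5) * h
        surv = 1.0 - (-math.expm1(-x / a)) ** k   # 1 - F_EGD(x)
        m1 += surv                                # integrand for E(X)
        m2 += 2.0 * x * surv                      # integrand for E(X^2)
    m1 *= h
    m2 *= h
    var = m2 - m1 ** 2
    return var / m1, m1 ** 2 / var


if __name__ == "__main__":
    for k in (0.5, 5.0):
        a_star, b_star = egd_exp_moment_matched_gamma(1.0, k)
        print(f"k = {k}: a* = {a_star:.3f}, b* = {b_star:.3f}")
```

For $(a, b, k) = (1, 1, 0.5)$ the exact values are $a^* = (4 - \pi^2/3)/(2 - 2\log 2) \approx 1.157$ and $b^* \approx 0.530$, so $a^* > a$ and $b^* < b$; at k = 5 the inequalities reverse. Both agree closely with the corresponding EGD entries of Table 1.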

3.3. Numerical results

We present the numerical values of Inline graphic, Inline graphic, Inline graphic, and Inline graphic given a significance level $\alpha$ and the parameters $(a, b, k)$ of either the GGD or EGD. When calculating Inline graphic and Inline graphic, the value t of the test statistic is chosen so that $p_{\mathrm{GGD}}(t) = \alpha$ or $p_{\mathrm{EGD}}(t) = \alpha$ given $(a, b, k)$. The results, reported in Table 1, confirm the analytic results and show that using the GD has a more substantial impact on the type I error when the data are drawn from the GGD or EGD with Inline graphic than with Inline graphic. Table 1 also shows that the patterns of Inline graphic and Inline graphic move in opposite directions depending on whether $k < 1$ or $k > 1$. Using the GD when the GGD (or the EGD) is the true distribution results in inflated type I error at extremely small significance levels when $k > 1$ (or when $k < 1$).

Table 1.

Numerical values of the parameters $(a^*, b^*)$ of the GD fitted by moments to $\mathrm{GGD}(a, b, k)$ and to $\mathrm{EGD}(a, b, k)$, respectively, and of the p-value ratios $p_{\mathrm{GD}}/p_{\mathrm{GGD}}$ (GGD columns) and $p_{\mathrm{GD}}/p_{\mathrm{EGD}}$ (EGD columns) given a significance level $\alpha$

                    GD fitted to      GD fitted to      α = 0.05         α = 0.001        α = 0.0001
a    b    k         GGD: (a*, b*)     EGD: (a*, b*)     GGD     EGD      GGD     EGD      GGD     EGD
0.5  0.5  0.5       (4.000, 0.094)    (0.549, 0.259)    1.187   0.983    0.644   1.080    0.115   1.211
0.5  0.5  5.0       (0.045, 8.062)    (0.353, 1.967)    1.424   1.009    14.35   0.562    79.67   0.324
1.0  1.0  0.5       (10.00, 0.200)    (1.158, 0.530)    1.229   0.981    0.462   1.175    0.075   1.413
1.0  1.0  5.0       (0.049, 19.07)    (0.639, 3.572)    1.394   0.978    10.13   0.433    46.19   0.208
0.5  1.0  0.5       (5.000, 0.200)    (0.579, 0.530)    1.229   0.981    0.462   1.177    0.075   1.414
0.5  1.0  5.0       (0.024, 19.07)    (0.320, 3.572)    1.389   0.978    10.44   0.431    44.58   0.211

4. Application to genetic pleiotropic association

We describe how to use the proposed methods with an application to genetic pleiotropic association studies, in which multiple traits are tested for associations with a single marker. When multiple traits measured from the same individuals are available, testing pleiotropic associations is more powerful than testing association with an individual trait (Klei and others, 2008; Zhu and Zhang, 2009). We consider binary and continuous traits here, although other types of traits, e.g. ordinal traits, can also be considered (Zhang and others, 2010). Several approaches have been discussed for genetic pleiotropic associations; combining dependent univariate statistics is a useful approach when the statistics do not follow a multivariate normal distribution (Yang and others, 2010).

4.1. Test statistics for pleiotropic association

Let the alleles of a genetic marker be A and B. Denote the traits as Inline graphic (Inline graphic). Assume that the first U traits are binary and the other Inline graphic traits are continuous. For the ith individual (Inline graphic), denote the data as Inline graphic, where Inline graphic is the disease indicator for the uth binary trait (Inline graphic), Inline graphic is the vth continuous trait value (Inline graphic), and Inline graphic is the genotype score (0, 1, or 2), given by the number of copies of allele B in the genotype.

For testing association with the uth binary trait with $R_u$ cases and $S_u$ controls ($u = 1, \dots, U$), the trend test (Sasieni, 1997; Freidlin and others, 2002) can be written as

$$Z_u = \frac{\sqrt{n}\sum_{j=0}^{2} j\,(S_u r_{uj} - R_u s_{uj})}{\sqrt{R_u S_u\left[n\sum_{j=0}^{2} j^2 n_j - \left(\sum_{j=0}^{2} j\,n_j\right)^2\right]}},$$

where $n = R_u + S_u$, $r_{uj}$ and $s_{uj}$ are the counts of genotype score $j$ in cases and controls, respectively, and $n_j = r_{uj} + s_{uj}$ ($j = 0, 1, 2$). Under $H_0$, $Z_u \sim N(0, 1)$ asymptotically. For testing association with the vth continuous trait, the Inline graphic-test can be written as

4.1.

where Inline graphic, Inline graphic, Inline graphic, Inline graphic is an Inline graphic-dimensional vector with the elements being 1, and Inline graphic. Under Inline graphic, Inline graphic follows an asymptotic Inline graphic-distribution with degrees of freedom Inline graphic and Inline graphic.
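Returning to the binary traits, the trend test can be computed directly from genotype scores and disease indicators. The minimal sketch below assumes scores (0, 1, 2) and the standard form $Z = \sqrt{n}\sum_j j(S r_j - R s_j)/\{RS[n\sum_j j^2 n_j - (\sum_j j n_j)^2]\}^{1/2}$ with R cases, S controls, case counts $r_j$, and control counts $s_j$, and returns Z with a two-sided p-value from the N(0, 1) approximation. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def trend_test(genotypes: Sequence[int], disease: Sequence[int]) -> Tuple[float, float]:
    """Trend test with scores 0/1/2; returns (Z, two-sided p) under Z ~ N(0, 1)."""
    if len(genotypes) != len(disease) or len(genotypes) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    r = [0, 0, 0]  # genotype counts among cases (disease == 1)
    s = [0, 0, 0]  # genotype counts among controls (disease == 0)
    for g, d in zip(genotypes, disease):
        (r if d == 1 else s)[g] += 1
    n, R, S = len(genotypes), sum(r), sum(s)
    nj = [r[j] + s[j] for j in range(3)]
    numerator = math.sqrt(n) * sum(j * (S * r[j] - R * s[j]) for j in range(3))
    denom_sq = R * S * (n * sum(j * j * nj[j] for j in range(3))
                        - sum(j * nj[j] for j in range(3)) ** 2)
    if denom_sq <= 0:
        raise ValueError("undefined for a monomorphic marker or a single group")
    z = numerator / math.sqrt(denom_sq)
    return z, math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
```

An equivalent view, useful as a check, is that this Z equals $\sqrt{n}$ times the sample Pearson correlation between genotype score and disease status.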

Denote the p-values of Inline graphic and Inline graphic as Inline graphic and Inline graphic. All p-values in this application can be either two-sided or one-sided because they are obtained from testing different associations, each corresponding to one trait. The FCT $T$ is then computed from all of these p-values as in (2.1). Under $H_0$, we assume $T \sim \mathrm{GGD}(a, b, k)$ using the GGD or $T \sim \mathrm{EGD}(a, b, k)$ using the EGD, where $(a, b, k)$ are estimated separately for the two distributions.

4.2. Estimation procedures

4.2.1. For a single marker

The simulation procedure given in Zheng and others (2012) is modified below to estimate $(a, b, k)$ for both the GGD and EGD for a single marker. First, we simulate data under $H_0$ from the observed data. Then, assuming that either the GGD or EGD is the true distribution, we find the MLEs of its parameters. In the simulation, the correlation structure among the traits is retained, but all associations between the traits and the marker are removed. Denote the density function of either the GGD or EGD as $f(x; a, b, k)$.

In the $j$th replicate ($j = 1, \dots, B$ for some large $B$), one keeps the observed traits of the ith individual ($i = 1, \dots, n$) but simulates his/her genotype from a multinomial distribution $\mathrm{Mult}(1; p_0, p_1, p_2)$, where $p_g$ is the observed frequency of genotype $g$ for $g = 0, 1, 2$. Then $T$ is computed from the simulated genotypes, denoted as $T_j$ ($j = 1, \dots, B$). The empirical likelihood function with $B$ replicates is $L(a, b, k) = \prod_{j=1}^{B} f(T_j; a, b, k)$, and the MLEs $(\hat{a}, \hat{b}, \hat{k})$ are found numerically. The MLEs depend on both the minor allele frequency (MAF) of the marker and the correlations among the traits. In our simulation, however, the traits are fixed. Thus, conditional on the traits (and their correlations), the MLEs for two markers differ only through their MAFs. The above simulation does not require Hardy–Weinberg equilibrium (HWE). Under HWE, one can instead generate each genotype from $\mathrm{Bin}(2, p)$, where p is the estimated MAF of the marker.

Like a permutation test, the above simulation procedure can be used to approximate the exact null distribution of the FCT, because $\{T_1, \dots, T_B\}$ forms an empirical distribution of the FCT when $B$ is large.
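The single-marker procedure above can be sketched as follows, under simplifying assumptions that go beyond the text. Genotypes are redrawn i.i.d. from the observed genotype frequencies (no HWE), and every trait is tested with $\sqrt{n}$ times the genotype-trait correlation under a N(0, 1) approximation, which equals the trend test for a 0/1 trait but is only an asymptotic stand-in for the continuous-trait test. Only a moment-fitted GD and the empirical p-value are shown; numerical maximization of the GGD or EGD likelihood is omitted. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def corr_pvalue(g: Sequence[int], y: Sequence[float]) -> float:
    """Two-sided p-value of Z = sqrt(n) * corr(g, y) under a N(0, 1) approximation."""
    n = len(g)
    mg, my = sum(g) / n, sum(y) / n
    sgy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
    sgg = sum((gi - mg) ** 2 for gi in g)
    syy = sum((yi - my) ** 2 for yi in y)
    if sgg == 0.0 or syy == 0.0:
        return 1.0  # monomorphic simulated marker or constant trait: no evidence
    z = math.sqrt(n) * sgy / math.sqrt(sgg * syy)
    return math.erfc(abs(z) / math.sqrt(2.0))


def fct(g: Sequence[int], traits: Sequence[Sequence[float]]) -> float:
    """Fisher's combination T = -2 * sum(log p) over all traits."""
    return -2.0 * sum(math.log(corr_pvalue(g, y)) for y in traits)


def null_fct_sample(genotypes: Sequence[int], traits: Sequence[Sequence[float]],
                    n_rep: int, rng: random.Random) -> List[float]:
    """Keep traits fixed; redraw genotypes i.i.d. from observed genotype frequencies."""
    n = len(genotypes)
    freqs = [sum(1 for g in genotypes if g == j) / n for j in range(3)]
    return [fct(rng.choices([0, 1, 2], weights=freqs, k=n), traits)
            for _ in range(n_rep)]


def gamma_moment_fit(sample: Sequence[float]) -> Tuple[float, float]:
    """Moment estimates (scale a, shape b) of a gamma fit to the null FCT values."""
    mean = sum(sample) / len(sample)
    var = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
    return var / mean, mean ** 2 / var


def empirical_pvalue(t_obs: float, null_sample: Sequence[float]) -> float:
    """Permutation-style p-value (1 + #{T_j >= t_obs}) / (1 + B)."""
    return (1 + sum(1 for t in null_sample if t >= t_obs)) / (1 + len(null_sample))
```

As a sanity check on how dependence enters, two identical traits make $T = -4\log p$, whose null distribution is exponential with mean 4, so the moment fit should land near $(a, b) = (4, 1)$ instead of the independent-case $(2, 2)$.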

4.2.2. For a GWAS

In a GWAS, hundreds of thousands to millions of markers with different MAFs are tested. Although the exact FCT method described in Section 4.2.1 can achieve performance similar to that of the GGD or EGD, it is not computationally efficient for a GWAS, because a separate simulation with a large number of replicates must be run for each marker to reach a typical genome-wide significance level Inline graphic. We propose a computationally efficient method, based on the estimation procedure in Section 4.2.1, for applying the GGD and EGD to GWASs. The method is given below, with technical details and justifications given in supplementary material available at Biostatistics online (Section 5.2 and Tables S2–S3).

For the GGD, we first find the MLEs $(\hat{a}, \hat{b}, \hat{k})$ for each MAF from 0.1 to 0.5 in increments of 0.05. We then take the average of the MLEs over the nine MAFs as the final estimates, denoted as $(\bar{a}, \bar{b}, \bar{k})$. For the EGD, we first obtain Inline graphic and Inline graphic as for the GGD, but we take Inline graphic as the maximum of all Inline graphic across the nine MAFs. Then we apply Inline graphic as the final estimates.
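The combination step of this GWAS strategy is simple bookkeeping once per-MAF MLEs are available from the Section 4.2.1 procedure (not repeated here). The sketch below assumes the estimates are supplied as $(a, b, k)$ triples keyed by MAF, and it assumes that the EGD parameter taken as a maximum is the power parameter k while a and b are averaged; that reading of the text is an assumption. Names are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple

Params = Tuple[float, float, float]  # (a, b, k)

# Nine MAF values: 0.10, 0.15, ..., 0.50.
MAF_GRID = [round(0.10 + 0.05 * i, 2) for i in range(9)]


def combine_ggd(per_maf: Dict[float, Params]) -> Params:
    """Average each GGD parameter over the MAF grid."""
    rows = [per_maf[maf] for maf in MAF_GRID]
    return (mean(r[0] for r in rows), mean(r[1] for r in rows),
            mean(r[2] for r in rows))


def combine_egd(per_maf: Dict[float, Params]) -> Params:
    """Average a and b over the grid, but keep the largest power parameter k.

    Which EGD parameter is maximized is an assumption (here, the power parameter k).
    """
    rows = [per_maf[maf] for maf in MAF_GRID]
    return (mean(r[0] for r in rows), mean(r[1] for r in rows),
            max(r[2] for r in rows))
```

Because every marker then shares one fitted distribution, each GWAS p-value needs only a single tail evaluation rather than its own simulation.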

4.3. Real applications

4.3.1. Candidate genes

We consider the genetic data of the Genetic Analysis Workshop 16 (GAW16), which consist of 868 rheumatoid arthritis (RA) positive cases and 1194 controls, and 531 689 single-nucleotide polymorphisms (SNPs) (Amos and others, 2009). In addition to the RA status, two quantitative traits (anti-cyclic citrullinated peptide (anti-CCP) antibody and rheumatoid factor IgM) are available, but only for the individuals with RA. It is known that anti-CCP and IgM levels are higher among individuals with RA than among those without (Huizinga and others, 2005). For this application, we first consider testing association with anti-CCP and IgM. Because the controls have no measures of anti-CCP and IgM, we follow what Zheng and others (2012) did for anti-CCP: we impute the unmeasured values of both traits in the controls by the respective minimum observed trait values of the cases and apply the Inline graphic-test to each trait as if the imputed values were observed. Then the FCT is applied to testing the pleiotropic association. The null hypothesis is that an SNP is associated with neither trait. Zheng and others (2012) showed that this imputation has no impact under the null hypothesis. Here, we focus on the testing problem; if we treated this as an estimation problem, replacing a large amount of the data (controls) with a single value would likely produce biased estimates. All values (observed or imputed) are log-transformed. We focus on the SNPs listed in Zheng and others (2012, Table VII), but the overall significance level for each SNP test is Inline graphic. The results are reported here in Table 2.

Table 2.

Estimates for the parameters of the GD, GGD, and EGD and their p-values for pleiotropic associations with anti-CCP and IgM Inline graphic with imputations of the missing quantitative traits in the controls using the FCT

              GD                                    GGD                                          EGD
SNP           (a, b)          AIC/BIC      p-value  (a, b, k)              AIC/BIC      p-value  (a, b, k)               AIC/BIC      p-value
rs653667      (2.801, 1.433)  32.89/55.95  1.53E-06 (0.331, 4.142, 0.599)  30.45/65.04  2.24E-05 (3.967, 0.128, 15.161)  32.14/66.74  7.72E-06
rs2454170     (2.783, 1.434)  37.57/60.64  1.45E-07 (0.353, 4.047, 0.606)  33.56/68.16  4.84E-06 (3.895, 0.141, 13.766)  36.02/70.62  1.10E-06
rs4375229     (2.771, 1.447)  54.82/77.88  2.59E-11 (0.366, 4.014, 0.611)  43.67/78.26  3.48E-08 (4.002, 0.078, 25.269)  48.59/83.19  2.14E-09
rs12743229    (2.767, 1.443)  33.67/56.74  1.02E-06 (0.317, 4.218, 0.597)  31.01/65.61  1.68E-05 (3.855, 0.144, 13.716)  33.02/67.62  4.86E-06
rs4567320     (2.799, 1.429)  16.40/39.46  6.03E-03 (0.336, 4.116, 0.601)  18.28/52.88  7.79E-03 (3.933, 0.135, 14.363)  18.46/53.06  6.61E-03
rs11264329    (2.774, 1.441)  9.10/32.17   2.56E-01 (0.382, 3.932, 0.614)  11.14/45.74  2.41E-01 (4.021, 0.094, 20.658)  11.11/45.70  2.41E-01
rs429201      (2.786, 1.440)  28.62/51.69  1.29E-05 (0.387, 3.942, 0.617)  27.81/62.41  7.74E-05 (3.937, 0.118, 16.647)  28.82/63.42  3.98E-05
rs10917678    (2.758, 1.446)  24.81/47.88  8.65E-05 (0.322, 4.194, 0.598)  24.92/59.51  3.18E-04 (3.896, 0.130, 15.124)  25.70/60.30  1.85E-04
rs1323120     (2.790, 1.437)  52.22/75.29  9.56E-11 (0.346, 4.081, 0.604)  42.06/76.65  7.80E-08 (4.013, 0.094, 20.659)  46.70/81.29  5.54E-09
rs17505650    (2.802, 1.431)  31.64/54.71  2.86E-06 (0.341, 4.100, 0.602)  29.69/64.29  3.20E-05 (4.004, 0.100, 19.516)  31.12/65.72  1.29E-05
rs2985441     (2.774, 1.442)  58.02/81.09  5.22E-12 (0.379, 3.960, 0.614)  45.55/80.15  1.37E-08 (3.943, 0.119, 16.343)  51.31/85.91  5.45E-10
rs1923946     (2.749, 1.451)  15.31/38.37  1.03E-02 (0.354, 4.063, 0.609)  17.32/51.91  1.21E-02 (3.917, 0.112, 17.603)  17.44/52.04  1.07E-02
rs2802822     (2.780, 1.437)  11.97/35.04  5.69E-02 (0.381, 3.944, 0.614)  14.18/48.78  5.50E-02 (3.942, 0.127, 15.284)  14.20/48.80  5.27E-02
rs1567602     (2.793, 1.433)  40.91/63.98  2.74E-08 (0.295, 4.307, 0.588)  35.33/69.93  2.11E-06 (3.998, 0.105, 18.547)  38.23/72.83  3.76E-07
rs1567602 (2.793, 1.433) 40.91/63.98 2.74EInline graphic08 (0.295, 4.307, 0.588) 35.33/69.93 2.11EInline graphic06 (3.998, 0.105, 18.547) 38.23/72.83 3.76EInline graphic07

We present the moments estimates for the GD and the MLEs for the GGD and EGD (based on Section 4.2.1) along with common model selection criteria: Akaike information criterion (AIC) and Bayesian information criterion (BIC). The results show that the estimates for the GD and GGD are consistent across the SNPs. For the EGD, only the estimates for Inline graphic are consistent but not those for k. Using AIC/BIC, the GGD and/or EGD are often a better fit for the FCT than the GD. When AIC/BIC values are similar among the three distributions, they also have similar p-values. However, when the GD is a worse fit, its p-values are more significant. This indicates that we need to be cautious when using the GD for combining dependent statistics in the FCT.
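For reference, AIC $= 2p - 2\log L$ and BIC $= p\log N - 2\log L$, where p is the number of parameters (2 for the GD, 3 for the GGD and EGD), N is the number of observations used in the fit, and smaller values are better. The minimal sketch below selects the distribution with the smallest criterion; the helper names are hypothetical, and the example values are copied from Table 2.

```python
import math
from typing import Dict


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2p - 2 log L (smaller is better)."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: p log N - 2 log L (smaller is better)."""
    return n_params * math.log(n_obs) - 2.0 * loglik


def best_model(criterion: Dict[str, float]) -> str:
    """Name of the distribution with the smallest criterion value."""
    return min(criterion, key=lambda name: criterion[name])


if __name__ == "__main__":
    # AIC and BIC values for rs4375229, copied from Table 2.
    print(best_model({"GD": 54.82, "GGD": 43.67, "EGD": 48.59}))  # GGD (by AIC)
    print(best_model({"GD": 77.88, "GGD": 78.26, "EGD": 83.19}))  # GD (by BIC)
```

For rs4375229 the two criteria disagree, so BIC's heavier penalty on the third parameter can matter when choosing among the GD, GGD, and EGD.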

For a comparison, Table S4 in supplementary material available at Biostatistics online shows results analogous to Table 2 except that there is no imputation, so controls are not used in the Inline graphic-test. In this case, the two tests seem to be independent and the GD is the better model to use, although the p-values using all three distributions are similar. However, the p-values without imputations tend to be less significant than those with imputations because the former ignores the potential associations of these SNPs with RA. For another comparison, in supplementary material available at Biostatistics online, we report Table S5 for pleiotropic association with three traits (RA, anti-CCP, and IgM) without imputations. The three tests seem to be independent and result in similar p-values although the GD is a better fit for the FCT. When comparing the p-values using the GGD and EGD in Table 2 with those in Table S5, we note that p-values in Table 2 are often more significant, which indicates the benefit of the imputation as a way of testing pleiotropic association compared with testing all three traits.

4.3.2. GWAS application

We used the same dataset as in the previous application except that we analyzed all the SNPs from chromosomes 5, 6, and 10 after standard GWAS quality control procedures, where chromosome 6 also contains the well-known HLA region associated with RA. We applied our genome-wide methods in Section 4.2.2 and plotted the p-values using the GD, EGD, and GGD in Figure S2 in supplementary material available at Biostatistics online. Except for the HLA region, for which the GD, EGD, and GGD all have peaks, there are no significant pleiotropic associations at the GWAS level Inline graphic using the EGD and GGD. But there is a significant association on chromosome 5 and some marginally significant associations on chromosome 10 using the GD. For example, SNP rs6596147 on chromosome 5 has p-values Inline graphic (GD), Inline graphic (EGD), and Inline graphic (GGD). Based on the analytical and simulation results (presented next), one should be cautious when reporting this association based on the GD alone.

5. Simulations

We present part of the simulation results here; the rest are available online and summarized here. Based on our application, we simulate m test statistics $Z_i$, for $i = 1, \dots, m$, from a multivariate normal distribution with zero means, unit variances, and a set of pairwise nonnegative correlations among the tests. We also add results for m = 3 test statistics for some simulations. The GD, GGD, and EGD are each used for the FCT given in (2.1). In Table 3, we report the moment estimates $(\hat{a}, \hat{b})$ for the GD and the MLEs $(\hat{a}, \hat{b}, \hat{k})$ for both the EGD and GGD, each based on 100 000 samples of T given the correlations and m. Results with Inline graphic, 10 are reported online (Tables S6–S7 of supplementary material available at Biostatistics online). The results show that, when all correlations are 0, the estimates are close to $a = 2$ and $b = m$ (with $k$ close to 1) under each distribution, in which case $T \sim \chi^2_{2m}$, so all three distributions can be used. The results are consistent with the analytical results and show that the fitted distributions are influenced by the correlations.
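The GD part of this simulation can be sketched with the standard library. The sketch assumes an equicorrelated design, $Z_i = \sqrt{\rho}\,W + \sqrt{1 - \rho}\,E_i$ with independent standard normals W and $E_i$, which gives unit variances and pairwise correlation ρ, and it uses two-sided p-values as in Section 2. Only the GD moment fit is shown; the GGD and EGD MLEs would need a numerical optimizer. Function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def simulate_fct(m: int, rho: float, n_rep: int, rng: random.Random) -> List[float]:
    """FCT values from m equicorrelated N(0, 1) statistics with two-sided p-values.

    Z_i = sqrt(rho) * W + sqrt(1 - rho) * E_i has unit variance and pairwise
    correlation rho (requires 0 <= rho <= 1).
    """
    s_common, s_own = math.sqrt(rho), math.sqrt(1.0 - rho)
    out = []
    for _ in range(n_rep):
        w = rng.gauss(0.0, 1.0)                          # shared component
        t = 0.0
        for _ in range(m):
            z = s_common * w + s_own * rng.gauss(0.0, 1.0)
            p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided p-value
            t -= 2.0 * math.log(p)
        out.append(t)
    return out


def gamma_moment_fit(sample: Sequence[float]) -> Tuple[float, float]:
    """Moment estimates (scale a, shape b) for a gamma fit."""
    mean = sum(sample) / len(sample)
    var = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
    return var / mean, mean ** 2 / var


if __name__ == "__main__":
    rng = random.Random(2013)
    for rho in (0.0, 0.5, 0.9):
        a_hat, b_hat = gamma_moment_fit(simulate_fct(2, rho, 100_000, rng))
        print(f"rho = {rho}: (a, b) = ({a_hat:.3f}, {b_hat:.3f})")
```

For m = 2 the fitted $(a, b)$ should land near the GD column of Table 3, up to Monte Carlo error, e.g. near (2, 2) at ρ = 0 and near (2.45, 1.63) at ρ = 0.5.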

Table 3.

The moment estimates $(a, b)$ for the GD and the MLEs $(a, b, k)$ for the GGD and EGD given ρ and m

m  ρ  GD (a, b)  GGD (a, b, k)  EGD (a, b, k)
2 0 (2.011, 1.987) (1.951, 2.029, 0.990) (2.050, 1.889, 1.062)
0.3 (2.163, 1.848) (1.529, 2.315, 0.897) (2.501, 1.178, 1.712)
0.5 (2.453, 1.628) (0.791, 3.101, 0.732) (3.359, 0.392, 5.190)
0.7 (2.955, 1.354) (0.222, 4.583, 0.554) (4.127, 0.082, 22.58)
0.9 (3.603, 1.109) (0.127, 4.772, 0.482) (4.744, 0.065, 22.59)
3 Inline graphic (2.009, 2.981) (1.876, 3.106, 0.979) (2.057, 2.832, 1.064)
Inline graphic (2.224, 2.691) (1.076, 4.006, 0.823) (2.760, 1.512, 2.036)
Inline graphic (2.828, 2.110) (0.337, 5.623, 0.619) (4.340, 0.336, 8.492)
Inline graphic (3.171, 1.885) (0.097, 7.664, 0.509) (4.765, 0.170, 16.37)
Inline graphic (3.122, 1.919) (0.106, 7.556, 0.515) (4.649, 0.231, 11.94)
Inline graphic (3.176, 1.880) (0.176, 6.565, 0.550) (4.830, 0.153, 17.97)

We also compare their empirical type I error rates with 1 million replicates and two nominal levels (0.05 and 0.0001) using the estimates in Table 3. The results are reported in Table 4 for m = 3 and show that all type I error rates are close to the nominal level when α = 0.05. When α = 0.0001, using the GD inflates type I error rates and using the GGD tends to be conservative, whereas the type I error rates of the EGD remain close to the nominal level. Q–Q plots of the p-values in Inline graphic scale using the GD, GGD, and EGD are given in Figure 1 for Inline graphic. Q–Q plots with Inline graphic, Inline graphic and Inline graphic are reported online (Figures S3–S5 of supplementary material available at Biostatistics online). To summarize, the FCT using the GGD is more conservative with Inline graphic than with Inline graphic and Inline graphic. The EGD has better size overall than the GGD. The GD has inflated type I error rates at small significance levels when dependent statistics are combined. When Inline graphic, the performance of the EGD and GGD is less reliable than when Inline graphic. More discussion on Inline graphic can be found online.

Table 4.

Empirical type I error rates with α = 0.05 (×10^-2) and with α = 0.0001 (×10^-4) using the GD, GGD, and EGD given ρ when m = 3

                 α = 0.05 (×10^-2)        α = 0.0001 (×10^-4)
ρ                GD     GGD    EGD        GD     GGD    EGD
Inline graphic 4.99 4.98 4.98 0.90 0.81 0.83
Inline graphic 5.03 5.01 5.01 1.88 1.11 1.05
Inline graphic 5.19 5.20 5.20 2.42 0.60 0.63
Inline graphic 5.10 5.15 5.11 2.96 0.58 1.17
Inline graphic 5.17 5.11 5.51 2.92 0.54 1.17
Inline graphic 5.11 5.18 5.47 2.77 0.57 0.99

Fig. 1.


Q–Q plots of p-values in Inline graphic scale using the GD, GGD, and EGD with the correlations given in Table 3 and Inline graphic. The order of the plots from the top to the bottom is the same as that in the rows in Table 3 (with Inline graphic).

The simulations for our GWAS strategies in Section 4.2.2 and the results are reported online. The results show that, when Inline graphic, using the EGD and GGD and the GWAS strategies has better control of type I error rates than using the GD, except for Inline graphic, under which type I error rates are similar using all three distributions. Simulation results with negative correlations can be found online.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Funding

The GAW16 data were gathered with the support of grants from the National Institutes of Health (NO1-AR-2-2263 and RO1-AR-44422, Peter K. Gregersen, PI), and the National Arthritis Foundation. The use of the data was approved by GAW16. Research of Qizhai Li is partially supported by the National Natural Science Foundation of China, Nos 61134013, 11371353.


Acknowledgements

We thank two reviewers for their careful reading and thoughtful comments, which greatly improved our presentation. We would like to thank Neal Jeffries for his careful reading and editing of our manuscript. Conflict of Interest: None declared.

References

  1. Amos C. I., Chen W. V., Seldin M. F., Remmers E. F., Taylor K. E., Criswell L. A., Lee A. T., Plenge R. M., Kastner D. L., Gregersen P. K. Data for genetic analysis workshop 16 problem 1, association analysis of rheumatoid arthritis data. BMC Proceedings. 2009;3(Suppl 7):S2. doi: 10.1186/1753-6561-3-s7-s2.
  2. Brown M. B. A method for combining non-independent, one-sided tests of significance. Biometrics. 1975;31:987–992.
  3. de Pascoa M. A. R., Marcelino A. R., Ortega E. M. M., Cordeiro G. M. The Kumaraswamy generalized gamma distribution with application in survival analysis. Statistical Methodology. 2011;8:411–433.
  4. Fisher R. A. Statistical Methods for Research Workers. 4th edition. London: Oliver and Boyd; 1932.
  5. Freidlin B., Zheng G., Li Z., Gastwirth J. L. Trend tests for case-control studies of genetic markers: power, sample size and robustness. Human Heredity. 2002;53:146–152. doi: 10.1159/000064976.
  6. Hess A., Iyer H. Fisher's combined p-value for detecting differentially expressed genes using Affymetrix expression arrays. BMC Genomics. 2007;8:96. doi: 10.1186/1471-2164-8-96.
  7. Huizinga T. W., Amos C. I., van der Helm-van Mil A. H., Chen W., van Gaalen F. A., Jawaheer D., Schreuder G. M., Wener M., Breedveld F. C., Ahmad N., and others. Refining the complex rheumatoid arthritis phenotype based on specificity of the HLA-DRB1 shared epitope for antibodies to citrullinated proteins. Arthritis and Rheumatism. 2005;52:3433–3438. doi: 10.1002/art.21385.
  8. Huang P., Hwang T. On new moment estimation of parameters of the generalized Gamma distribution using its characterization. The Taiwanese Journal of Mathematics. 2006;10:1083–1093.
  9. Infante-Rivard C., Mirea L., Bull S. B. Combining case-control and case-trio data from the same population in genetic association analyses: overview of approaches and illustration with a candidate gene study. The American Journal of Epidemiology. 2009;170:657–664. doi: 10.1093/aje/kwp180.
  10. Joo J., Kwak M., Chen Z., Zheng G. Efficiency robust statistics for genetic linkage and association studies under genetic model uncertainty. Statistics in Medicine. 2010;29:158–180. doi: 10.1002/sim.3759.
  11. Klei L., Luca D., Devlin B., Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genetic Epidemiology. 2008;32:9–19. doi: 10.1002/gepi.20257.
  12. Kost J. T., McDermott M. P. Combining dependent p-values. Statistics & Probability Letters. 2002;60:183–190.
  13. Lehmann E. L. The power of rank tests. The Annals of Mathematical Statistics. 1953;24:28–43.
  14. Little R. C., Folks J. L. Asymptotic optimality of Fisher's method of combining independent tests. Journal of the American Statistical Association. 1971;66:802–806.
  15. Pfeiffer R. M., Gail M. H., Pee D. On combining data from genome-wide association studies to discover disease-associated SNPs. Statistical Science. 2009;24:547–560.
  16. Sasieni P. D. From genotypes to genes: doubling the sample size. Biometrics. 1997;53:1253–1261.
  17. Stacy E. W. A generalization of the gamma distribution. The Annals of Mathematical Statistics. 1962;33:623–659.
  18. Stacy E. W., Mihram G. A. Parameter estimation for a generalized gamma distribution. Technometrics. 1965;7:349–358.
  19. Yang J. J. Distribution of Fisher's combination statistic when the tests are dependent. Journal of Statistical Computation and Simulation. 2010;80:1–12.
  20. Yang Q., Wu H., Guo C. Y., Fox C. S. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genetic Epidemiology. 2010;34:444–454. doi: 10.1002/gepi.20497.
  21. Zhang H. P., Liu C.-T., Wang X. An association test for multiple traits based on the generalized Kendall's tau. Journal of the American Statistical Association. 2010;105:473–481. doi: 10.1198/jasa.2009.ap08387.
  22. Zheng G., Wu C. O., Kwak M., Jiang W., Joo J., Lima J. A. C. Joint analysis of binary and quantitative traits with data sharing and outcome-dependent sampling. Genetic Epidemiology. 2012;36:263–273. doi: 10.1002/gepi.21619.
  23. Zhu W., Zhang H. P. Why do we test multiple traits in genetic association studies? (with discussion). The Journal of the Korean Statistical Society. 2009;38:1–10. doi: 10.1016/j.jkss.2008.10.006.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press