Abstract
A classical approach to combine independent test statistics is Fisher's combination of p-values, which follows the $\chi^2$ distribution. When the test statistics are dependent, the gamma distribution (GD) is commonly used for the Fisher's combination test (FCT). We propose to use two generalizations of the GD: the generalized and the exponentiated GDs. We study some properties of misusing the GD for the FCT to combine dependent statistics when one of the two proposed distributions is true. Our results show that both generalizations have better control of type I error rates than the GD, which tends to have inflated type I error rates at more extreme tails. In practice, common model selection criteria (e.g. Akaike information criterion/Bayesian information criterion) can be used to help select a better distribution to use for the FCT. A simple strategy for applying the two generalizations of the GD in genome-wide association studies is discussed. Applications of the results to genetic pleiotropic associations are described, where multiple traits are tested for association with a single marker.
Keywords: Dependent tests, Fisher's combination, Gamma distributions, Genetic pleiotropic associations, Genome-wide association studies, Type I error
1. Introduction
Combining independent test statistics is common in biomedical research. One approach is to combine the p-values of one-sided tests using Fisher's method (Fisher, 1932), referred to here as the Fisher's combination test (FCT). It has optimal Bahadur efficiency (Little and Folks, 1971). However, in general, it has a disadvantage in the interpretation of the results for combining two-sided tests. If one combines m independent p-values, the null distribution of the FCT is the $\chi^2$ distribution with 2m degrees of freedom, $\chi^2_{2m}$. If the m test statistics are dependent, the null distribution of the FCT is no longer $\chi^2_{2m}$.
Most research on the FCT has focused on combining independent p-values and comparisons between the FCT and alternative approaches. In many genetic studies, test statistics are often dependent. In the analysis of Affymetrix expression array data, the FCT was employed to combine probe level two-sample t-tests, where the probes of the same probe set are correlated (Hess and Iyer, 2007). In genetic pleiotropic association studies, the association is tested between multiple traits collected from the same subjects and a marker. The FCT was used to combine the dependent single-trait analyses and was approximated using a gamma distribution (GD) (Zheng and others, 2012). Combining case–control and family-based designs or several genetic association studies can improve power to detect disease-associated markers (Infante-Rivard and others, 2009; Pfeiffer and others, 2009). When the two study designs share the cases or multiple association studies share controls, the dependent test statistics can be combined using the FCT. In genetic linkage and association studies under model uncertainty, multiple correlated tests are obtained under various genetic models (Joo and others, 2010).
Several approaches to handle the dependence in the FCT have been discussed. The GD is most commonly used. Assuming that the m tests follow a multivariate normal distribution, Brown (1975) and Kost and McDermott (2002) considered fitting the FCT with a GD, $\mathrm{GD}(\alpha, \beta)$, where $\alpha$ and $\beta$ are the scale and shape parameters, respectively. Yang (2010) compared the GD, normal approximations, and permutation methods for the FCT with dependent p-values and showed that the GD approximation and permutation methods perform well overall; the permutation methods also approximate the exact distribution of the FCT. Zheng and others (2012) derived an approximate GD for the FCT to test pleiotropic associations.
Since $\chi^2_{2m} = \mathrm{GD}(2, m)$, the two-parameter $\mathrm{GD}(\alpha, \beta)$ is more flexible for the FCT when the tests are dependent. In this paper, we consider two generalizations of the GD with a third parameter. One is the generalized GD (GGD) with an extra shape parameter, and the other is the exponentiated GD (EGD) with an extra power parameter. Some properties of using the GD when the GGD or EGD is the true distribution for the FCT are studied, which provide insight into the performance of the FCT. Our results show that both the GGD and EGD have better control of type I error than the GD, especially at extremely small significance levels. Applications of the FCT to genetic pleiotropic associations are described. A simple strategy to apply the GGD and EGD to genome-wide association studies (GWASs) with several mixed types of traits is discussed. Simulations and applications using real data illustrate the performance of the proposed methods.
2. Fisher's combination of p-values
Let $m$ test statistics $T_j$, $j = 1, \ldots, m$, be combined. Denote the p-value of $T_j$ as $p_j$ ($j = 1, \ldots, m$). The FCT test statistic, $T$, is given by

$$T = -2 \sum_{j=1}^{m} \log p_j. \tag{2.1}$$

If all the p-values are independent, $T \sim \chi^2_{2m}$ under the null hypothesis $H_0$. Let $X \sim \mathrm{GD}(\alpha, \beta)$ with the density function $f(x; \alpha, \beta) = x^{\beta - 1} e^{-x/\alpha} / \{\alpha^{\beta} \Gamma(\beta)\}$ for $x > 0$, where $\Gamma(\cdot)$ is the gamma function, and the distribution function is $F(x; \alpha, \beta)$. When the p-values are dependent, model $T$ with $T \sim \mathrm{GD}(\alpha, \beta)$ under $H_0$, where $(\alpha, \beta)$ are estimated using the maximum likelihood estimates (MLEs) or the moments estimates (Kost and McDermott, 2002; Zheng and others, 2012). In our applications, all p-values are allowed to be two-sided because all m p-values are obtained from testing m different null hypotheses. The following results, however, are valid when m one-sided p-values are combined using the FCT.
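As a minimal sketch of the independent case, the Python snippet below computes Fisher's statistic and its $\chi^2_{2m}$ p-value. It relies on the closed-form upper tail of a chi-square distribution with an even number of degrees of freedom, so only the standard library is needed. The function names and the example p-values are illustrative assumptions and do not come from the paper.

```python
import math

def fisher_statistic(p_values):
    """Fisher's combination statistic: -2 times the sum of log p-values."""
    return -2.0 * sum(math.log(p) for p in p_values)

def chi2_sf_even_df(t, m):
    """Upper tail P(X > t) for a chi-square with 2m degrees of freedom.

    For even degrees of freedom the tail has the closed form
    exp(-t/2) * sum_{j=0}^{m-1} (t/2)^j / j!.
    """
    half = t / 2.0
    term, total = 1.0, 1.0
    for j in range(1, m):
        term *= half / j
        total += term
    return math.exp(-half) * total

def fisher_p_value(p_values):
    """Return (statistic, combined p-value) assuming independent p-values."""
    t = fisher_statistic(p_values)
    return t, chi2_sf_even_df(t, len(p_values))

# Illustrative p-values only (not from the paper).
t, p_combined = fisher_p_value([0.01, 0.04, 0.20])
print(f"T = {t:.4f}, combined p-value = {p_combined:.6f}")
```

This reference distribution is valid only under independence; with dependent p-values the same statistic is referred to a fitted GD, GGD, or EGD instead.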
3. Generalizations of the GD
3.1. Applying the GGD
3.1.1. The GGD
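The density function of the GGD, denoted as $\mathrm{GGD}(\alpha, \beta, k)$, is $f(x; \alpha, \beta, k) = k x^{k\beta - 1} \exp\{-(x/\alpha)^k\} / \{\alpha^{k\beta} \Gamma(\beta)\}$ for $x > 0$, where $\alpha > 0$ is the scale parameter and both $\beta > 0$ and $k > 0$ are shape parameters (Stacy, 1962). Its inference has been discussed in Stacy and Mihram (1965) and Huang and Hwang (2006). We use the MLEs of $(\alpha, \beta, k)$ and model $T$ with $T \sim \mathrm{GGD}(\alpha, \beta, k)$.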
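The log-likelihood that the MLEs maximize can be sketched as follows, assuming the Stacy (1962) parameterization with scale $\alpha$ and shape parameters $\beta$ and $k$, so that $k = 1$ recovers the GD. The function names are illustrative assumptions, and the numerical maximization itself is left to whichever optimizer is at hand.

```python
import math

def ggd_logpdf(x, alpha, beta, k):
    """Log density of the generalized gamma distribution at x > 0.

    Assumed form: k * x^(k*beta - 1) * exp(-(x/alpha)^k)
                  / (alpha^(k*beta) * Gamma(beta)).
    """
    if x <= 0 or alpha <= 0 or beta <= 0 or k <= 0:
        return float("-inf")
    return (math.log(k)
            + (k * beta - 1.0) * math.log(x)
            - (x / alpha) ** k
            - k * beta * math.log(alpha)
            - math.lgamma(beta))

def ggd_loglik(sample, alpha, beta, k):
    """Log-likelihood of an i.i.d. sample under GGD(alpha, beta, k)."""
    return sum(ggd_logpdf(x, alpha, beta, k) for x in sample)

# With k = 1, alpha = 2, beta = 2 the GGD is the chi-square distribution
# with 4 degrees of freedom, whose density at x = 4 is exp(-2).
print(ggd_logpdf(4.0, 2.0, 2.0, 1.0))
```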
3.1.2. Fitting the GGD with the GD
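Let $X_1, \ldots, X_n$ be a simple random sample with the GGD $\mathrm{GGD}(\alpha, \beta, k)$ for some $k > 0$. Denote the sample mean and variance as $\bar{X}$ and $S^2$, respectively. Then, from Huang and Hwang (2006), as $n \to \infty$,

$$\bar{X} \to \alpha\,\frac{\Gamma(\beta + 1/k)}{\Gamma(\beta)}, \qquad S^2 \to \alpha^2 \left\{ \frac{\Gamma(\beta + 2/k)}{\Gamma(\beta)} - \frac{\Gamma^2(\beta + 1/k)}{\Gamma^2(\beta)} \right\}.$$

If we fit the sample with the GD $\mathrm{GD}(\alpha, \beta)$, by the moments estimates,

$$\hat{\alpha} = \frac{S^2}{\bar{X}}, \qquad \hat{\beta} = \frac{\bar{X}^2}{S^2}. \tag{3.1}$$

Denote $\mu_1 = \Gamma(\beta + 1/k)/\Gamma(\beta)$ and $\mu_2 = \Gamma(\beta + 2/k)/\Gamma(\beta)$. Then,

$$\hat{\alpha} \to \alpha^* = \alpha\,\frac{\mu_2 - \mu_1^2}{\mu_1}, \qquad \hat{\beta} \to \beta^* = \frac{\mu_1^2}{\mu_2 - \mu_1^2}. \tag{3.2}$$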
Note that $\alpha^* \neq \alpha$ and $\beta^* \neq \beta$ unless $k = 1$. The following result provides more insight into fitting the GGD with the GD; the proof is given in supplementary material available at Biostatistics online. The monotonicity of $\alpha^*$ in $k$ is shown under a condition that is satisfied in our applications. In the proof, however, we show graphically that the result holds even if the condition is not satisfied.
Theorem 3.1 —
Let
and
be defined above and
be the moments estimates for the GD. Then
and
are asymptotically uncorrelated. Given
,
is a decreasing function in k when
and
is an increasing function in k.
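The limiting GD parameters obtained when the moments estimates are applied to GGD data can be evaluated directly from gamma functions. The sketch below assumes the GGD moments $E(X^r) = \alpha^r \Gamma(\beta + r/k)/\Gamma(\beta)$ and the moments estimates scale = variance/mean and shape = mean²/variance. The function name is an illustrative assumption.

```python
import math

def gd_limits_under_ggd(alpha, beta, k):
    """Limits (alpha*, beta*) of the GD moments estimates for GGD data.

    Uses E(X^r) = alpha^r * Gamma(beta + r/k) / Gamma(beta) for r = 1, 2.
    """
    m1 = math.exp(math.lgamma(beta + 1.0 / k) - math.lgamma(beta))
    m2 = math.exp(math.lgamma(beta + 2.0 / k) - math.lgamma(beta))
    mean = alpha * m1
    var = alpha ** 2 * (m2 - m1 ** 2)
    return var / mean, mean ** 2 / var   # (scale, shape)

# Trace the two limits as k varies, holding alpha = beta = 0.5 fixed.
for k in (0.5, 1.0, 2.0, 5.0):
    a_star, b_star = gd_limits_under_ggd(0.5, 0.5, k)
    print(f"k = {k}: alpha* = {a_star:.3f}, beta* = {b_star:.3f}")
```

For $\alpha = \beta = 0.5$, the values at $k = 0.5$ and $k = 5$ can be compared with the GGD entries in the first two rows of Table 1, and $k = 1$ returns the original $(\alpha, \beta)$.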
Next, we compute the p-value of a test statistic $T = t$ using the GD when the GGD is the true distribution. Let $T \sim \mathrm{GGD}(\alpha, \beta, k)$ under $H_0$ for some $k > 0$. If we fit $T$ with $\mathrm{GD}(\alpha^*, \beta^*)$ given in (3.2), then the p-values for $T = t$ using the GGD and GD are given, respectively, by

$$p_{\mathrm{GGD}}(t) = \int_t^{\infty} f(x; \alpha, \beta, k)\,dx, \qquad p_{\mathrm{GD}}(t) = \int_t^{\infty} f(x; \alpha^*, \beta^*)\,dx, \tag{3.3}$$

where $f(x; \alpha, \beta, k)$ and $f(x; \alpha^*, \beta^*)$ denote the GGD and GD density functions. We assume that all the parameters in the distributions are known for studying the analytical behavior of the p-values. In practice, these parameters are estimated using the data. Theorem 3.2 shows that using the GD for $T$ with dependent p-values may not be appropriate for extremely small significance levels when the GGD is the true distribution. In particular, when $k < 1$, the type I error rate using the GD tends to be much larger than that using the GGD. A proof is given in supplementary material available at Biostatistics online.
Theorem 3.2 —
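Let $T \sim \mathrm{GGD}(\alpha, \beta, k)$ for some known $(\alpha, \beta, k)$ with p-value $p_{\mathrm{GGD}}(t)$. If we calculate the p-value $p_{\mathrm{GD}}(t)$ with $\mathrm{GD}(\alpha^*, \beta^*)$, where $(\alpha^*, \beta^*)$ are given in (3.2), then, as $t \to \infty$, $p_{\mathrm{GD}}(t)/p_{\mathrm{GGD}}(t) \to 0$ if $k < 1$; 1 if $k = 1$; and $\infty$ if $k > 1$.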
3.2. Applying the EGD
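An alternative approach is the EGD, denoted as $\mathrm{EGD}(\alpha, \beta, k)$, whose distribution function is given by $F^k(x; \alpha, \beta)$, where $F(x; \alpha, \beta)$ is the distribution function of $\mathrm{GD}(\alpha, \beta)$ and $k > 0$ is a power parameter (de Pascoa and others, 2011). We can find the MLEs for $(\alpha, \beta, k)$ under $H_0$ and model $T$ with $T \sim \mathrm{EGD}(\alpha, \beta, k)$. In hypothesis testing, $F^k$ is also referred to as Lehmann alternatives (Lehmann, 1953). Like the GGD, it is more flexible to approximate $T$ with the EGD than with the GD. For $X \sim \mathrm{EGD}(\alpha, \beta, k)$ and some $r > 0$,

$$E(X^r) = k \int_0^{\infty} x^r F^{k-1}(x; \alpha, \beta)\, f(x; \alpha, \beta)\,dx,$$

where $f(x; \alpha, \beta)$ is the density function of $\mathrm{GD}(\alpha, \beta)$.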
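Let $X_1, \ldots, X_n$ be a simple random sample with the EGD $\mathrm{EGD}(\alpha, \beta, k)$ for some $k > 0$. If we fit the sample with $\mathrm{GD}(\alpha, \beta)$, then by the moments estimates and the same notation as in (3.1) and (3.2), we have

$$\hat{\alpha} \to \alpha^* = \frac{E(X^2) - \{E(X)\}^2}{E(X)}, \qquad \hat{\beta} \to \beta^* = \frac{\{E(X)\}^2}{E(X^2) - \{E(X)\}^2}, \tag{3.4}$$

where $E(X)$ and $E(X^2)$ are the first two moments of the EGD. Note that $\alpha^* \neq \alpha$ and $\beta^* \neq \beta$ unless $k = 1$. Like (3.3), the p-values of observing the FCT $T = t$ using the EGD and the GD can be written, respectively, as

$$p_{\mathrm{EGD}}(t) = 1 - F^k(t; \alpha, \beta), \qquad p_{\mathrm{GD}}(t) = 1 - F(t; \alpha^*, \beta^*), \tag{3.5}$$

where $F$ is the distribution function of the GD. Although (3.4), similar to (3.2), is obtained, results similar to Theorem 3.1 are not available for the EGD due to the double integrations in the moments. Numerical results presented later and also online show that $\alpha^* > \alpha$ and $\beta^* < \beta$ when $k < 1$, and $\alpha^* < \alpha$ and $\beta^* > \beta$ when $k > 1$. A result similar to Theorem 3.2 is given below with proof given in supplementary material available at Biostatistics online.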
Theorem 3.3 —
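Let $T \sim \mathrm{EGD}(\alpha, \beta, k)$ for some known $(\alpha, \beta, k)$ with p-value $p_{\mathrm{EGD}}(t)$. If we calculate p-value $p_{\mathrm{GD}}(t)$ with $\mathrm{GD}(\alpha^*, \beta^*)$, where $(\alpha^*, \beta^*)$ are given in (3.4), then, as $t \to \infty$, $p_{\mathrm{GD}}(t)/p_{\mathrm{EGD}}(t) \to \infty$ if $\alpha^* > \alpha$; 1 if $k = 1$; and $0$ if $\alpha^* < \alpha$.

Since we observe $\alpha^* < \alpha$ when $k > 1$, $p_{\mathrm{GD}}(t)$ tends to inflate the type I error rate at extreme tail values when the EGD is the true distribution and $k > 1$.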
3.3. Numerical results
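We present the numerical values of $\alpha^*$, $\beta^*$, $p_{\mathrm{GD}}/p_{\mathrm{GGD}}$, and $p_{\mathrm{GD}}/p_{\mathrm{EGD}}$ given a significance level and the parameters $(\alpha, \beta, k)$ of either the GGD or EGD. When calculating $p_{\mathrm{GD}}/p_{\mathrm{GGD}}$ and $p_{\mathrm{GD}}/p_{\mathrm{EGD}}$, the value of the test statistic $t$ is chosen so that $p_{\mathrm{GGD}}(t)$ or $p_{\mathrm{EGD}}(t)$ equals the significance level given $(\alpha, \beta, k)$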
. The results are reported in Table 1, which confirm the analytic results, and show that there is more substantial impact on type I error using the GD when the data are drawn from the GGD or EGD with
than with
. Table 1 also shows that the patterns of $p_{\mathrm{GD}}/p_{\mathrm{GGD}}$ and $p_{\mathrm{GD}}/p_{\mathrm{EGD}}$ have opposite directions depending on whether $k < 1$ or $k > 1$. Using the GD for the GGD (or the EGD) results in inflation of the type I error at extremely small significance levels when $k < 1$ (or when $k > 1$).
Table 1.
Numerical values of the parameters $(\alpha^*, \beta^*)$ in (3.2) and (3.4) given $\mathrm{GGD}(\alpha, \beta, k)$ and $\mathrm{EGD}(\alpha, \beta, k)$, respectively, and of $p_{\mathrm{GD}}/p_{\mathrm{GGD}}$ and $p_{\mathrm{GD}}/p_{\mathrm{EGD}}$ given a significance level

| $\alpha$ | $\beta$ | k | $(\alpha^*, \beta^*)$, GGD | $(\alpha^*, \beta^*)$, EGD | 0.05: $p_{\mathrm{GD}}/p_{\mathrm{GGD}}$ | 0.05: $p_{\mathrm{GD}}/p_{\mathrm{EGD}}$ | 0.001: $p_{\mathrm{GD}}/p_{\mathrm{GGD}}$ | 0.001: $p_{\mathrm{GD}}/p_{\mathrm{EGD}}$ | 0.0001: $p_{\mathrm{GD}}/p_{\mathrm{GGD}}$ | 0.0001: $p_{\mathrm{GD}}/p_{\mathrm{EGD}}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.5 | 0.5 | (4.000, 0.094) | (0.549, 0.259) | 1.187 | 0.983 | 0.644 | 1.080 | 0.115 | 1.211 |
| 0.5 | 0.5 | 5.0 | (0.045, 8.062) | (0.353, 1.967) | 1.424 | 1.009 | 14.35 | 0.562 | 79.67 | 0.324 |
| 1.0 | 1.0 | 0.5 | (10.00, 0.200) | (1.158, 0.530) | 1.229 | 0.981 | 0.462 | 1.175 | 0.075 | 1.413 |
| 1.0 | 1.0 | 5.0 | (0.049, 19.07) | (0.639, 3.572) | 1.394 | 0.978 | 10.13 | 0.433 | 46.19 | 0.208 |
| 0.5 | 1.0 | 0.5 | (5.000, 0.200) | (0.579, 0.530) | 1.229 | 0.981 | 0.462 | 1.177 | 0.075 | 1.414 |
| 0.5 | 1.0 | 5.0 | (0.024, 19.07) | (0.320, 3.572) | 1.389 | 0.978 | 10.44 | 0.431 | 44.58 | 0.211 |
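The GGD entries of Table 1 can be approximated with the standard library alone. The sketch below makes three assumptions: the GGD upper tail equals the regularized upper incomplete gamma function $Q(\beta, (t/\alpha)^k)$; the reported ratio is the GD p-value divided by the GGD p-value; and $t$ is found by bisection so that the GGD p-value equals the significance level. The incomplete gamma routine is a textbook series and continued-fraction evaluation. All function names are illustrative.

```python
import math

def upper_gamma_reg(a, x):
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0.0:
        return 1.0
    log_pref = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Power series for the lower function P(a, x); return 1 - P.
        term = 1.0 / a
        total = term
        n = a
        for _ in range(100000):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_pref)
    # Continued fraction (modified Lentz method) for Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 100000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_pref)

def gd_limits_under_ggd(alpha, beta, k):
    """Limits (alpha*, beta*) of the GD moments estimates for GGD data."""
    m1 = math.exp(math.lgamma(beta + 1.0 / k) - math.lgamma(beta))
    m2 = math.exp(math.lgamma(beta + 2.0 / k) - math.lgamma(beta))
    mean, var = alpha * m1, alpha ** 2 * (m2 - m1 ** 2)
    return var / mean, mean ** 2 / var

def p_ggd(t, alpha, beta, k):
    """Upper-tail p-value under GGD(alpha, beta, k)."""
    return upper_gamma_reg(beta, (t / alpha) ** k)

def p_gd(t, alpha, beta):
    """Upper-tail p-value under GD with scale alpha and shape beta."""
    return upper_gamma_reg(beta, t / alpha)

def gd_to_ggd_ratio(alpha, beta, k, level):
    """p_GD(t) / p_GGD(t) at the t where p_GGD(t) equals `level`."""
    lo, hi = 0.0, 1.0
    while p_ggd(hi, alpha, beta, k) > level:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p_ggd(mid, alpha, beta, k) > level:
            lo = mid
        else:
            hi = mid
    a_star, b_star = gd_limits_under_ggd(alpha, beta, k)
    return p_gd(0.5 * (lo + hi), a_star, b_star) / level

for k in (0.5, 5.0):
    for level in (0.05, 0.001, 0.0001):
        print(k, level, round(gd_to_ggd_ratio(0.5, 0.5, k, level), 3))
```

A ratio below 1 means that the GD reports a smaller p-value than the true GGD, which corresponds to an inflated type I error rate.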
4. Application to genetic pleiotropic association
We describe how to use the proposed methods with an application to genetic pleiotropic association studies, in which multiple traits are tested for associations with a single marker. When multiple traits measured from the same individuals are available, testing pleiotropic associations is more powerful than testing association with an individual trait (Klei and others, 2008; Zhu and Zhang, 2009). We consider binary and continuous traits here, although other types of traits, e.g. ordinal traits, can also be considered (Zhang and others, 2010). Several approaches have been discussed for genetic pleiotropic associations; combining dependent univariate statistics is a useful approach when the statistics do not follow a multivariate normal distribution (Yang and others, 2010).
4.1. Test statistics for pleiotropic association
Let the alleles of a genetic marker be A and B. Denote the traits as
(
). Assume that the first U traits are binary and the other
traits are continuous. For the ith individual (
), denote the data as
, where
is the disease indicator for the uth binary trait (
),
is the vth continuous trait value (
), and
is the 0, 1, or 2 genotype score corresponding to the numbers of allele B in the genotype.
For testing association with the uth binary trait with
cases and
controls (
), the trend test (Sasieni, 1997; Freidlin and others, 2002) can be written as
![]() |
where
and
are the counts of
in cases and controls, respectively, and
(
). Under
,
asymptotically. For testing association with the vth continuous trait, the
$F$-test can be written as
![]() |
where
,
,
,
is an
-dimensional vector with the elements being 1, and
. Under
,
follows an asymptotic $F$-distribution with degrees of freedom
and
.
Denote the p-values of
and
as
and
. All p-values in this application can be either two-sided or one-sided because all p-values are obtained from testing different associations, where each p-value corresponds to one trait. Then the FCT is
. Under
, we assume
using the GGD or
using the EGD, where
are estimated separately for the two distributions.
4.2. Estimation procedures
4.2.1. For a single marker
The simulation procedure given in Zheng and others (2012) is modified below to estimate
for both the GGD and EGD for a single marker. First, we simulate data under
from the observed data. Then, we assume that either the GGD or EGD is the true distribution, so we can find the MLEs of the parameters for one of the two distributions. In the simulation, the correlation structure among the traits is retained but the associations between the traits and the marker are all removed. Denote the density function for either the GGD or EGD as
.
In the
th replicate (
for some large
), one keeps the observed traits
for the ith individual (
), but simulates his/her genotype
from a multinomial distribution
, where
for
. Then
is computed using
, denoted as
(
). The empirical likelihood function with
replicates is
. Then find the MLEs
numerically. The MLEs depend on both the minor allele frequency (MAF) of the marker and the correlations among the traits. However, in our simulation, the traits are fixed. Thus, conditional on the traits (and their correlations), the MLEs for any two markers with different MAFs only depend on the MAFs. The above simulation does not require Hardy–Weinberg equilibrium (HWE). With HWE, one can generate
where p is the estimated MAF of the marker.
Like a permutation test, the above simulation procedure can be used to find an approximate exact distribution for the FCT because
forms an empirical distribution of the FCT when
is large.
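The replicate loop of this procedure can be sketched as below for the Hardy–Weinberg case. The traits are kept fixed and only the genotype scores are redrawn, so trait correlations are retained while any trait-marker association is removed. The sketch assumes genotype probabilities $((1-p)^2, 2p(1-p), p^2)$ for scores 0, 1, and 2, with $p$ the frequency of allele B. The `statistic` callback and `toy_statistic` are hypothetical stand-ins for the FCT computed from the single-trait tests.

```python
import random

def hwe_genotype_probs(p):
    """Genotype probabilities for scores 0, 1, 2 under HWE, allele frequency p."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]

def simulate_genotypes(n, probs, rng):
    """Draw n genotype scores independently of the traits."""
    return rng.choices([0, 1, 2], weights=probs, k=n)

def null_replicates(traits, probs, statistic, n_rep, seed=0):
    """Recompute `statistic` on fixed traits with freshly simulated genotypes."""
    rng = random.Random(seed)
    n = len(traits)
    return [statistic(traits, simulate_genotypes(n, probs, rng))
            for _ in range(n_rep)]

def toy_statistic(traits, genotypes):
    """Hypothetical placeholder: squared covariance of trait and genotype."""
    n = len(traits)
    y_bar = sum(traits) / n
    g_bar = sum(genotypes) / n
    cov = sum((y - y_bar) * (g - g_bar) for y, g in zip(traits, genotypes)) / n
    return cov * cov

# Illustrative trait values only.
traits = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, 2.0, -0.7]
replicates = null_replicates(traits, hwe_genotype_probs(0.3), toy_statistic, n_rep=1000)
print(len(replicates), min(replicates), max(replicates))
```

The resulting replicates would then be passed to a likelihood routine for the GGD or EGD, or used directly as an empirical null distribution.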
4.2.2. For a GWAS
In a GWAS, hundreds of thousands to millions of markers with different MAFs are tested. Although the exact FCT method described in Section 4.2.1 can achieve performance similar to that of the GGD or EGD, it is not computationally efficient for a GWAS because a set of simulations with a large number of replicates has to be run for each marker with a typical genome-wide significance level
. We propose a computationally efficient method based on our estimation procedure in Section 4.2.1 for applying the GGD and EGD to GWASs. The method is given below with technical details and justifications given in supplementary material available at Biostatistics online (Section 5.2 and Tables S2–S3).
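For the GGD, we first find the MLEs $(\hat{\alpha}, \hat{\beta}, \hat{k})$ for each MAF from 0.1 to 0.5 with an increment of 0.05. Then we take the average of the MLEs over the nine MAFs as the final estimates, denoted as $(\bar{\alpha}, \bar{\beta}, \bar{k})$. For the EGD, we first obtain $\bar{\alpha}$ and $\bar{\beta}$ similar to the GGD, but we take $k_{\max}$ as the maximum of all $\hat{k}$ across the nine MAFs. Then we apply $(\bar{\alpha}, \bar{\beta}, k_{\max})$ as the final estimates.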
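The aggregation step over the MAF grid can be sketched as follows. The sketch assumes that per-MAF estimates are stored as (scale, shape, third parameter) triples and that, for the EGD, the power parameter is the one replaced by its maximum. The numbers in the example are hypothetical placeholders, not estimates from the paper.

```python
from statistics import mean

# Nine MAFs from 0.10 to 0.50 in steps of 0.05.
MAF_GRID = [round(0.1 + 0.05 * i, 2) for i in range(9)]

def aggregate_ggd(estimates_by_maf):
    """Average each of the three GGD parameters over the MAF grid."""
    triples = list(estimates_by_maf.values())
    return tuple(mean(t[i] for t in triples) for i in range(3))

def aggregate_egd(estimates_by_maf):
    """Average the first two EGD parameters; take the maximum of the third."""
    triples = list(estimates_by_maf.values())
    return (mean(t[0] for t in triples),
            mean(t[1] for t in triples),
            max(t[2] for t in triples))

# Hypothetical per-MAF estimates, for illustration only.
fake_ggd = {maf: (0.30 + 0.1 * maf, 4.0 + maf, 0.60) for maf in MAF_GRID}
fake_egd = {maf: (3.9 + 0.1 * maf, 0.12, 14.0 + 10.0 * maf) for maf in MAF_GRID}
print(aggregate_ggd(fake_ggd))
print(aggregate_egd(fake_egd))
```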
4.3. Real applications
4.3.1. Candidate genes
We consider the genetic data of the Genetic Analysis Workshop 16 (GAW16), which consist of 868 rheumatoid arthritis (RA) positive cases and 1194 controls, and 531 689 single-nucleotide polymorphisms (SNPs) (Amos and others, 2009). In addition to the RA status, two quantitative traits (anti-cyclic citrullinated peptide (anti-CCP) antibody and rheumatoid factor IgM) are also available only for the individuals with RA. It is known that anti-CCP and IgM levels are higher among individuals with RA than those without (Huizinga and others, 2005). For this application, we first consider testing association with anti-CCP and IgM. Because the controls have no measures of anti-CCP and IgM, as Zheng and others (2012) did for anti-CCP, we impute the unmeasured values of both traits of the controls by the respective minimum observed trait values of the cases and apply the
$F$-test to each trait as if the imputed values were observed. Then the FCT is applied to testing the pleiotropic association. The null hypothesis is that an SNP is not associated with either trait. Zheng and others (2012) showed that this imputation has no impact under the null hypothesis. Here, we focus on the testing problem; if we consider this an estimation problem, then replacing a large amount of the data (controls) with a single value will likely produce biased estimates. All values (observed or imputed) are log-transformed. We focus on the SNPs listed in Zheng and others (2012, Table VII), but the overall significance level for each SNP test is
. The results are reported here in Table 2.
Table 2.
Estimates for the parameters of GD, GGD, and EGD and their p-values for pleiotropic associations with anti-CCP and IgM with imputations of the missing quantitative traits in the controls using the FCT

| SNP | GD estimates | GD AIC/BIC | GD p-value | GGD estimates | GGD AIC/BIC | GGD p-value | EGD estimates | EGD AIC/BIC | EGD p-value |
|---|---|---|---|---|---|---|---|---|---|
| rs653667 | (2.801, 1.433) | 32.89/55.95 | 1.53E-06 | (0.331, 4.142, 0.599) | 30.45/65.04 | 2.24E-05 | (3.967, 0.128, 15.161) | 32.14/66.74 | 7.72E-06 |
| rs2454170 | (2.783, 1.434) | 37.57/60.64 | 1.45E-07 | (0.353, 4.047, 0.606) | 33.56/68.16 | 4.84E-06 | (3.895, 0.141, 13.766) | 36.02/70.62 | 1.10E-06 |
| rs4375229 | (2.771, 1.447) | 54.82/77.88 | 2.59E-11 | (0.366, 4.014, 0.611) | 43.67/78.26 | 3.48E-08 | (4.002, 0.078, 25.269) | 48.59/83.19 | 2.14E-09 |
| rs12743229 | (2.767, 1.443) | 33.67/56.74 | 1.02E-06 | (0.317, 4.218, 0.597) | 31.01/65.61 | 1.68E-05 | (3.855, 0.144, 13.716) | 33.02/67.62 | 4.86E-06 |
| rs4567320 | (2.799, 1.429) | 16.40/39.46 | 6.03E-03 | (0.336, 4.116, 0.601) | 18.28/52.88 | 7.79E-03 | (3.933, 0.135, 14.363) | 18.46/53.06 | 6.61E-03 |
| rs11264329 | (2.774, 1.441) | 9.10/32.17 | 2.56E-01 | (0.382, 3.932, 0.614) | 11.14/45.74 | 2.41E-01 | (4.021, 0.094, 20.658) | 11.11/45.70 | 2.41E-01 |
| rs429201 | (2.786, 1.440) | 28.62/51.69 | 1.29E-05 | (0.387, 3.942, 0.617) | 27.81/62.41 | 7.74E-05 | (3.937, 0.118, 16.647) | 28.82/63.42 | 3.98E-05 |
| rs10917678 | (2.758, 1.446) | 24.81/47.88 | 8.65E-05 | (0.322, 4.194, 0.598) | 24.92/59.51 | 3.18E-04 | (3.896, 0.130, 15.124) | 25.70/60.30 | 1.85E-04 |
| rs1323120 | (2.790, 1.437) | 52.22/75.29 | 9.56E-11 | (0.346, 4.081, 0.604) | 42.06/76.65 | 7.80E-08 | (4.013, 0.094, 20.659) | 46.70/81.29 | 5.54E-09 |
| rs17505650 | (2.802, 1.431) | 31.64/54.71 | 2.86E-06 | (0.341, 4.100, 0.602) | 29.69/64.29 | 3.20E-05 | (4.004, 0.100, 19.516) | 31.12/65.72 | 1.29E-05 |
| rs2985441 | (2.774, 1.442) | 58.02/81.09 | 5.22E-12 | (0.379, 3.960, 0.614) | 45.55/80.15 | 1.37E-08 | (3.943, 0.119, 16.343) | 51.31/85.91 | 5.45E-10 |
| rs1923946 | (2.749, 1.451) | 15.31/38.37 | 1.03E-02 | (0.354, 4.063, 0.609) | 17.32/51.91 | 1.21E-02 | (3.917, 0.112, 17.603) | 17.44/52.04 | 1.07E-02 |
| rs2802822 | (2.780, 1.437) | 11.97/35.04 | 5.69E-02 | (0.381, 3.944, 0.614) | 14.18/48.78 | 5.50E-02 | (3.942, 0.127, 15.284) | 14.20/48.80 | 5.27E-02 |
| rs1567602 | (2.793, 1.433) | 40.91/63.98 | 2.74E-08 | (0.295, 4.307, 0.588) | 35.33/69.93 | 2.11E-06 | (3.998, 0.105, 18.547) | 38.23/72.83 | 3.76E-07 |
We present the moments estimates for the GD and the MLEs for the GGD and EGD (based on Section 4.2.1) along with common model selection criteria: Akaike information criterion (AIC) and Bayesian information criterion (BIC). The results show that the estimates for the GD and GGD are consistent across the SNPs. For the EGD, only the estimates for
are consistent but not those for k. Using AIC/BIC, the GGD and/or EGD are often a better fit for the FCT than the GD. When AIC/BIC values are similar among the three distributions, they also have similar p-values. However, when the GD is a worse fit, its p-values are more significant. This indicates that we need to be cautious when using the GD for combining dependent statistics in the FCT.
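For readers who want to reproduce this kind of comparison, the two criteria can be computed from a maximized log-likelihood as sketched below. The sketch uses the textbook definitions AIC = 2p − 2 log L and BIC = p log n − 2 log L, where smaller values indicate a better fit. The log-likelihood values in the example are hypothetical placeholders, and the paper's exact computation may differ in detail.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion."""
    return 2.0 * n_params - 2.0 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2.0 * loglik

def best_model(scores):
    """Name of the model with the smallest criterion value."""
    return min(scores, key=scores.get)

# Hypothetical maximized log-likelihoods: (log-likelihood, number of parameters).
fits = {"GD": (-2450.0, 2), "GGD": (-2441.0, 3), "EGD": (-2444.0, 3)}
n_obs = 1000
aic_scores = {name: aic(ll, p) for name, (ll, p) in fits.items()}
bic_scores = {name: bic(ll, p, n_obs) for name, (ll, p) in fits.items()}
print(best_model(aic_scores), best_model(bic_scores))
```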
For a comparison, Table S4 in supplementary material available at Biostatistics online shows results analogous to Table 2 except that there is no imputation, so controls are not used in the
$F$-test. In this case, the two tests seem to be independent and the GD is the better model to use, although the p-values using all three distributions are similar. However, the p-values without imputations tend to be less significant than those with imputations because the former ignores the potential associations of these SNPs with RA. For another comparison, in supplementary material available at Biostatistics online, we report Table S5 for pleiotropic association with three traits (RA, anti-CCP, and IgM) without imputations. The three tests seem to be independent and result in similar p-values although the GD is a better fit for the FCT. When comparing the p-values using the GGD and EGD in Table 2 with those in Table S5, we note that p-values in Table 2 are often more significant, which indicates the benefit of the imputation as a way of testing pleiotropic association compared with testing all three traits.
4.3.2. GWAS application
We used the same dataset as in the previous application except that we analyzed all the SNPs from chromosomes 5, 6, and 10 after standard GWAS quality control procedures, where chromosome 6 also contains the well-known HLA region associated with RA. We applied our genome-wide methods in Section 4.2.2 and plotted the p-values using the GD, EGD, and GGD in Figure S2 in supplementary material available at Biostatistics online. Except for the HLA region, for which the GD, EGD, and GGD all have peaks, there are no significant pleiotropic associations at the GWAS level
using the EGD and GGD. But there is a significant association on chromosome 5 and some marginally significant associations on chromosome 10 using the GD. For example, SNP rs6596147 on chromosome 5 has p-values
(GD),
(EGD), and
(GGD). Based on the analytical and simulation results (presented next), one should be cautious when reporting this association based on the GD alone.
5. Simulations
We present part of the simulation results here and summarize the rest, which are reported online. Based on our application, we simulate m test statistics
, for
, from a multivariate normal distribution with mean zeros, unit variances, and a set of pair-wise positive correlations
among the tests. We also added results for
test statistics for some simulations. The GD, GGD, and EGD are, respectively, used for the FCT given in (2.1). In Table 3, we report the moments estimates
for the GD and the MLEs
for both the EGD and GGD with 100 000 samples of
given
and
. Results with
, 10 are reported online (Tables S6–S7 of supplementary material available at Biostatistics online). The results show that
(
,
) using either distribution when
(
), in which case
so that all three distributions can be used. The results are consistent with the analytical results and show that the distributions are influenced by
.
Table 3.
The moments estimates $(\hat{\alpha}, \hat{\beta})$ for GD and the MLEs $(\hat{\alpha}, \hat{\beta}, \hat{k})$ for EGD and GGD given the correlations and m

| m | Correlation | GD | GGD | EGD |
|---|---|---|---|---|
| 2 | 0 | (2.011, 1.987) | (1.951, 2.029, 0.990) | (2.050, 1.889, 1.062) |
| 2 | 0.3 | (2.163, 1.848) | (1.529, 2.315, 0.897) | (2.501, 1.178, 1.712) |
| 2 | 0.5 | (2.453, 1.628) | (0.791, 3.101, 0.732) | (3.359, 0.392, 5.190) |
| 2 | 0.7 | (2.955, 1.354) | (0.222, 4.583, 0.554) | (4.127, 0.082, 22.58) |
| 2 | 0.9 | (3.603, 1.109) | (0.127, 4.772, 0.482) | (4.744, 0.065, 22.59) |
| 3 | | (2.009, 2.981) | (1.876, 3.106, 0.979) | (2.057, 2.832, 1.064) |
| 3 | | (2.224, 2.691) | (1.076, 4.006, 0.823) | (2.760, 1.512, 2.036) |
| 3 | | (2.828, 2.110) | (0.337, 5.623, 0.619) | (4.340, 0.336, 8.492) |
| 3 | | (3.171, 1.885) | (0.097, 7.664, 0.509) | (4.765, 0.170, 16.37) |
| 3 | | (3.122, 1.919) | (0.106, 7.556, 0.515) | (4.649, 0.231, 11.94) |
| 3 | | (3.176, 1.880) | (0.176, 6.565, 0.550) | (4.830, 0.153, 17.97) |
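The GD column of Table 3 can be approximated for two tests with a small Monte Carlo sketch. It assumes bivariate standard normal test statistics with correlation rho, two-sided p-values, and the moments estimates scale = variance/mean and shape = mean²/variance. The function names, seeds, and replicate counts are illustrative choices.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def simulate_fct(rho, n_rep, seed=1):
    """Fisher statistics from two standard normal tests with correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    stats = []
    for _ in range(n_rep):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + s * rng.gauss(0.0, 1.0)
        stats.append(-2.0 * (math.log(two_sided_p(z1)) + math.log(two_sided_p(z2))))
    return stats

def gd_moment_fit(sample):
    """Moments estimates (scale, shape) of a gamma distribution."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return var / mean, mean * mean / var

for rho in (0.0, 0.5, 0.9):
    scale, shape = gd_moment_fit(simulate_fct(rho, 20000))
    print(f"rho = {rho}: scale = {scale:.3f}, shape = {shape:.3f}")
```

With rho = 0 the fit should be close to scale 2 and shape 2, the $\chi^2_4$ case. Positive correlation leaves the mean of the statistic at 4 but increases its variance, so the fitted scale grows and the fitted shape shrinks.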
We also compare their empirical type I error rates with 1 million replicates and two nominal levels (0.05 and 0.0001) using the estimates in Table 3. The results are reported in Table 4 for
, which show that all type I error rates are close to the nominal level when
. When
, using the GD inflates type I error rates while using the GGD tends to be conservative, but the type I error rates of the EGD are still close to the nominal level. Q–Q plots of the p-values in
scale using the GD, GGD, and EGD are given in Figure 1 for
. Q–Q plots with
,
and
are reported online (Figures S3–S5 of supplementary material available at Biostatistics online). To summarize, the size of the FCT using the GGD is more conservative with
than with
and
. The EGD has overall better size than the GGD. The GD has inflated type I error rates with a small significance level when dependent statistics are combined. When
, the performance of the EGD and GGD is less reliable than when
. More discussion on
can be found online.
Table 4.
Empirical type I error rates at the nominal levels 0.05 (in units of $10^{-2}$) and 0.0001 (in units of $10^{-4}$) using GD, GGD, and EGD given the correlations when $m = 3$

| Correlation | GD (0.05) | GGD (0.05) | EGD (0.05) | GD (0.0001) | GGD (0.0001) | EGD (0.0001) |
|---|---|---|---|---|---|---|
| | 4.99 | 4.98 | 4.98 | 0.90 | 0.81 | 0.83 |
| | 5.03 | 5.01 | 5.01 | 1.88 | 1.11 | 1.05 |
| | 5.19 | 5.20 | 5.20 | 2.42 | 0.60 | 0.63 |
| | 5.10 | 5.15 | 5.11 | 2.96 | 0.58 | 1.17 |
| | 5.17 | 5.11 | 5.51 | 2.92 | 0.54 | 1.17 |
| | 5.11 | 5.18 | 5.47 | 2.77 | 0.57 | 0.99 |
Fig. 1.
Q–Q plots of p-values in
scale using the GD, GGD, and EGD with the correlations given in Table 3 and
. The order of the plots from the top to the bottom is the same as that in the rows in Table 3 (with
).
The simulations for our GWAS strategies in Section 4.2.2 and the results are reported online. The results show that, when
, using the EGD and GGD and the GWAS strategies has better control of type I error rates than using the GD, except for
, under which type I error rates are similar using all three distributions. Simulation results with negative correlations can be found online.
Supplementary material
Supplementary material is available at http://biostatistics.oxfordjournals.org.
Funding
The GAW16 data were gathered with the support of grants from the National Institutes of Health (NO1-AR-2-2263 and RO1-AR-44422, Peter K. Gregersen, PI), and the National Arthritis Foundation. The use of the data was approved by GAW16. Research of Qizhai Li is partially supported by the National Natural Science Foundation of China, Nos 61134013, 11371353.
Acknowledgements
We thank two reviewers for their careful reading and thoughtful comments, which greatly improved our presentation. We would also like to thank Neal Jeffries for his careful reading and editing of our manuscript. Conflict of Interest: None declared.
References
- Amos C. I., Chen W. V., Seldin M. F., Remmers E. F., Taylor K. E., Criswell L. A., Lee A. T., Plenge R. M., Kastner D. L., Gregersen P. K. Data for genetic analysis workshop 16 problem 1, association analysis of rheumatoid arthritis data. BMC Proceedings. 2009;3(Suppl7):S2. doi: 10.1186/1753-6561-3-s7-s2.
- Brown M. B. A method for combining non-independent, one-sided tests of significance. Biometrics. 1975;31:987–992.
- de Pascoa M. A. R., Marcelino A. R., Ortega E. M. M., Cordeiro G. M. The Kumaraswamy generalized gamma distribution with application in survival analysis. Statistical Methodology. 2011;8:411–433.
- Fisher R. A. Statistical Methods for Research Workers. 4th edition. London: Oliver and Boyd; 1932.
- Freidlin B., Zheng G., Li Z., Gastwirth J. L. Trend tests for case-control studies of genetic markers: power, sample size and robustness. Human Heredity. 2002;53:146–152. doi: 10.1159/000064976.
- Hess A., Iyer H. Fisher's combined p-value for detecting differentially expressed genes using Affymetrix expression arrays. BMC Genomics. 2007;8:96. doi: 10.1186/1471-2164-8-96.
- Huizinga T. W., Amos C. I., van der Helm-van Mil A. H., Chen W., van Gaalen F. A., Jawaheer D., Schreuder G. M., Wener M., Breedveld F. C., Ahmad N. and others. Refining the complex rheumatoid arthritis phenotype based on specificity of the HLA-DRB1 shared epitope for antibodies to citrullinated proteins. Arthritis and Rheumatism. 2005;52:3433–3438. doi: 10.1002/art.21385.
- Huang P., Hwang T. On new moment estimation of parameters of the generalized Gamma distribution using its characterization. The Taiwanese Journal of Mathematics. 2006;10:1083–1093.
- Infante-Rivard C., Mirea L., Bull S. B. Combining case-control and case-trio data from the same population in genetic association analyses: overview of approaches and illustration with a candidate gene study. The American Journal of Epidemiology. 2009;170:657–664. doi: 10.1093/aje/kwp180.
- Joo J., Kwak M., Chen Z., Zheng G. Efficiency robust statistics for genetic linkage and association studies under genetic model uncertainty. Statistics in Medicine. 2010;29:158–180. doi: 10.1002/sim.3759.
- Klei L., Luca D., Devlin B., Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genetic Epidemiology. 2008;32:9–19. doi: 10.1002/gepi.20257.
- Kost J. T., McDermott M. P. Combining dependent p-values. Statistics & Probability Letters. 2002;60:183–190.
- Lehmann E. L. The power of rank tests. The Annals of Mathematical Statistics. 1953;24:28–43.
- Little R. C., Folks J. L. Asymptotic optimality of Fisher's method of combining independent tests. Journal of the American Statistical Association. 1971;66:802–806.
- Pfeiffer R. M., Gail M. H., Pee D. On combining data from genome-wide association studies to discover disease-associated SNPs. Statistical Science. 2009;24:547–560.
- Sasieni P. D. From genotypes to genes: doubling the sample size. Biometrics. 1997;53:1253–1261.
- Stacy E. W. A generalization of the gamma distribution. The Annals of Mathematical Statistics. 1962;33:623–659.
- Stacy E. W., Mihram G. A. Parameter estimation for a generalized gamma distribution. Technometrics. 1965;7:349–358.
- Yang J. J. Distribution of Fisher's combination statistic when the tests are dependent. Journal of Statistical Computation and Simulation. 2010;80:1–12.
- Yang Q., Wu H., Guo C. Y., Fox C. S. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genetic Epidemiology. 2010;34:444–454. doi: 10.1002/gepi.20497.
- Zhang H. P., Liu C.-T., Wang X. An association test for multiple traits based on the generalized Kendall's tau. Journal of the American Statistical Association. 2010;105:473–481. doi: 10.1198/jasa.2009.ap08387.
- Zheng G., Wu C. O., Kwak M., Jiang W., Joo J., Lima J. A. C. Joint analysis of binary and quantitative traits with data sharing and outcome-dependent sampling. Genetic Epidemiology. 2012;36:263–273. doi: 10.1002/gepi.21619.
- Zhu W., Zhang H. P. Why do we test multiple traits in genetic association studies? (with discussion). The Journal of the Korean Statistical Society. 2009;38:1–10. doi: 10.1016/j.jkss.2008.10.006.