Biostatistics (Oxford, England). 2012 Jun 25;13(4):776–790. doi: 10.1093/biostatistics/kxs015

Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test

Tianxi Cai 1,*, Xihong Lin 2, Raymond J Carroll 3
PMCID: PMC3440238  PMID: 22734045

Abstract

In recent years, genome-wide association studies (GWAS) and gene-expression profiling have generated a large number of valuable datasets for assessing how genetic variations are related to disease outcomes. With such datasets, it is often of interest to assess the overall effect of a set of genetic markers, assembled based on biological knowledge. Genetic marker-set analyses have been advocated as more reliable and powerful approaches compared with the traditional marginal approaches (Curtis and others, 2005. Pathways to the analysis of microarray data. TRENDS in Biotechnology 23, 429–435; Efroni and others, 2007. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One 2, 425). Procedures for testing the overall effect of a marker-set have been actively studied in recent years. For example, score tests derived under an Empirical Bayes (EB) framework (Liu and others, 2007. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models. Biometrics 63, 1079–1088; Liu and others, 2008. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC bioinformatics 9, 292–2; Wu and others, 2010. Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86, 929) have been proposed as powerful alternatives to the standard Rao score test (Rao, 1948. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44, 50–57). The advantages of these EB-based tests are most apparent when the markers are correlated, due to the reduction in the degrees of freedom. 
In this paper, we propose an adaptive score test which up- or down-weights the contributions from each member of the marker-set based on the Z-scores of their effects. Such an adaptive procedure gains power over the existing procedures when the signal is sparse and the correlation among the markers is weak. By combining evidence from both the EB-based score test and the adaptive test, we further construct an omnibus test that attains good power in most settings. The null distributions of the proposed test statistics can be approximated well either via simple perturbation procedures or via distributional approximations. Through extensive simulation studies, we demonstrate that the proposed procedures perform well in finite samples. We apply the tests to a breast cancer genetic study to assess the overall effect of the FGFR2 gene on breast cancer risk.

Keywords: Adaptive procedures, Empirical Bayes, GWAS, Pathway analysis, Score test, SNP sets

1. Introduction

With rapid advances in high-throughput technology, modern genetic studies have provided datasets that can be used to identify genetic variants associated with various diseases such as cancer, auto-immune diseases, cardiovascular diseases, and psychiatric disorders (e.g. Baum and others 2007; Frayling 2007; Hunter and others 2007; Rioux and others 2007; Yeager and others 2007; Sullivan and others 2008; Wallace and others 2008). These studies, while providing valuable resources for investigating the genetic risk of diseases, pose a major challenge in identifying important genetic variants, due to the large number of genetic markers under investigation.

The standard approach to screening for important genetic markers is based on single-marker marginal analyses, which may suffer from low power and poor reproducibility (Vo and others, 2007). The lack of power is in part attributed to the fact that multiple genetic markers may relate to the phenotype simultaneously, and most have weak or no effects. To overcome such difficulties, biological knowledge-based methods have been advocated to integrate prior information into statistical learning (Brown and others, 2000). One useful strategy is marker-set analysis, where a set of genetic markers is assembled based on prior knowledge, such as multiple variants in a gene or a pathway. The results from marker-set analysis are often more reliable, reproducible, and powerful than the results from individual marker analysis (Curtis and others, 2005; Efroni and others, 2007; Wu and others, 2010), and have attractive interpretations.

To identify marker-sets associated with disease outcomes, one may test for the overall effect of a marker-set, of dimension p, on the phenotype of interest. A convenient way to do this is via a standard p-degrees of freedom (DF) Rao score test (Rao, 1948) for the significance of the global effect. However, when p is not small, such a test may have little power, especially when the markers are correlated. One may also consider combining p-values of univariate tests to obtain an overall assessment (Zaykin and others, 2002; Lin, 2005; Nyholt, 2004; Moskvina and Schmidt, 2008). Such an approach often works well when the signal is extremely sparse, but may not be powerful when multiple markers jointly affect the outcome. To improve the power of the standard score test, modified score tests have been derived under an empirical Bayes (EB) framework (Carlin and Louis, 1997) for various types of models (Thomas and others, 1992; Commenges, 1994; Goeman and others, 2005; Liu and others, 2007, 2008; Wu and others, 2010). Such modifications achieve the power gain by taking advantage of the between-marker correlation and thus reducing the effective DF of the test. However, when the true signals are sparse and/or when the markers within a set have weak correlations, these tests also suffer from a lack of power. For example, in the case–control National Cancer Institute Cancer Genetic Markers of Susceptibility (CGEMS) study (Hunter and others, 2007) to be discussed in Section 4, one is interested in studying the association between the FGFR2 gene and breast cancer risk. There are 41 typed single-nucleotide polymorphisms (SNPs) in FGFR2, including those within a 30-kb region of the gene. As the number of SNPs is not small and many of these SNPs have weak linkage disequilibrium (LD), i.e. weak correlation, the standard score test and the EB-based test will suffer loss of power.

To overcome these difficulties, we propose an adaptive score test procedure that incorporates the strength of the signals from the individual markers within a set. The adaptive procedure, defined in Section 2, rescales the marker values by the Z-score of an initial estimator of their association with the outcome and thus down-weights the non-informative markers. We study its theoretical distribution and provide simple procedures for approximating the null distribution of the test statistic. Our numerical studies suggest that the adaptive test performs well with respect to both empirical size and power when the true signals are relatively sparse and the markers are weakly correlated. However, when the true signals are not sparse and the correlation among the markers is high, the adaptive procedure suffers from the variability in the initial estimator and thus may perform worse compared with the EB score test. In Section 3, we derive an automatic omnibus procedure that combines information from both types of score tests. In Section 4, we examine the overall effect of the FGFR2 gene on the risk of breast cancer using the data from the CGEMS study. Findings from our numerical studies in Section 5 indicate that the omnibus test pays little price for selecting between the EB and adaptive tests. Some concluding remarks are given in Section 6.

2. Methods

2.1. Data structure and standard score tests

Suppose data consist of n independent and identically distributed random vectors, {(Y_i, V_i, U_i), i=1,…,n}, where Y_i is the response, V_i=(V_{i1},…,V_{ip})^T is the set of new markers under investigation, and U_i=(U_{i1},…,U_{ip_u}) represents other covariates such as confounders with U_{i1}=1, for the ith subject. To test the overall effect of the marker set V on the response Y conditional on U, e.g. the overall effect of the 41 SNPs in the FGFR2 gene on breast cancer risk, we consider the quasilikelihood model (McCullagh and Nelder, 1989) with

2.1. (2.1)

where Inline graphic, μ(⋅), and Inline graphic are pre-specified mean and variance functions, and Inline graphic is the vector of unknown effects for U and V. Common examples of model (2.1) include linear, logistic, Gamma, and Poisson regression.

Our primary interest lies in testing whether the set of genetic markers V is associated with Y after adjusting for confounders U. That is, we aim to test the null hypothesis,

H0: β0 = 0. (2.2)

In standard statistical theory, score-type testing first fits the model under H0, producing an estimate Inline graphic that is the solution to the quasilikelihood score equation under H0, defined as follows. Let Inline graphic be the derivative of μ(x), Inline graphic and Inline graphic. Then Inline graphic solves

2.1. (2.3)

An estimate of σ2 under H0 is

2.1. (2.4)

The standardized score statistic is then defined as

2.1. (2.5)

There are two ways to form a test statistic based on (2.5). Let Inline graphic, partition Inline graphic with top left block p_u×p_u matrix Inline graphic, bottom right block p×p matrix Inline graphic, and off-diagonal blocks Inline graphic and Inline graphic, and Inline graphic. The standard score test statistic is

2.1. (2.6)

and under H0, it is asymptotically central Inline graphic. The local power of this score test is also well known (Harris and Peers, 1980; Rao, 2005). See Martinez and others (2010) for a recent explication.
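The p-DF score test described above can be illustrated for the linear-regression special case. The following is only a minimal sketch, assuming an identity mean function and constant variance; the function name `rao_score_test`, the residual-variance divisor n − p_u, and the simulated data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np
from scipy import stats


def rao_score_test(y, U, V):
    """p-DF score test of H0: beta = 0 in the linear model y = U a + V b + e.

    Illustrative assumption: identity link and constant variance.
    """
    n, p = V.shape
    # Fit the null model (U only) by least squares.
    alpha_hat, *_ = np.linalg.lstsq(U, y, rcond=None)
    resid = y - U @ alpha_hat
    # Null variance estimate (the divisor n - p_u is an assumption).
    sigma2 = resid @ resid / (n - U.shape[1])
    # Score for beta evaluated at the null fit.
    score = V.T @ resid
    # Project U out of V so the covariance accounts for estimating alpha.
    coef, *_ = np.linalg.lstsq(U, V, rcond=None)
    V_res = V - U @ coef
    info = V_res.T @ V_res
    stat = float(score @ np.linalg.solve(info, score) / sigma2)
    # Compare with a central chi-square with p degrees of freedom.
    return stat, float(stats.chi2.sf(stat, df=p))


rng = np.random.default_rng(0)
n, p = 500, 5
U = np.ones((n, 1))                      # intercept only
V = rng.normal(size=(n, p))              # hypothetical marker values
stat_null, p_null = rao_score_test(rng.normal(size=n), U, V)
stat_alt, p_alt = rao_score_test(V[:, 0] + rng.normal(size=n), U, V)
```

Because the reference distribution always has p degrees of freedom, the statistic spends one degree of freedom on every marker regardless of whether it carries signal, which is the cost discussed next.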

The simplicity of the standard score test is convenient, but comes at potentially considerable cost because of the p-DF. In genomic applications, p is typically not small and some markers in a set have no effects. For example, there are p=41 SNPs in the FGFR2 gene. Such a large DF carries with it a potential loss of power, as we see in our simulations (Section 5).

Another way to use the score statistic (2.5) is to employ an EB framework by imposing a working assumption that the {β0j,j=1,…,p} are independent and follow an arbitrary distribution with mean 0 and variance τ. We hence test the null hypothesis (2.2) by testing whether τ=0. The working distributional assumption, while useful for deriving a globally valid testing procedure, is not required to hold. This leads to the score test

2.1. (2.7)

where for any vector a, a_ℓ denotes the ℓth element of a, and ∥a∥_q denotes the L_q vector norm. These types of score test statistics have been shown previously to be powerful alternatives to the standard score test under various settings (Goeman and others, 2005; Liu and others, 2007; Kwee and others, 2008; Wu and others, 2010). The distribution of (2.7) is given in Section 2.3.
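To make the reduction in effective DF concrete, the sketch below computes a variance-component-type score statistic for the linear model as the squared L2 norm of the score, and approximates its null distribution by a scaled χ2 whose scale and DF come from the eigenvalues of the residualized marker cross-product. This is an illustration under stated assumptions (identity link, constant variance, a hypothetical function name `eb_score_test`, and no particular normalization), not a transcription of (2.7).

```python
import numpy as np
from scipy import stats


def eb_score_test(y, U, V):
    """Squared-norm score statistic with a scaled chi-square null (sketch)."""
    n = len(y)
    alpha_hat, *_ = np.linalg.lstsq(U, y, rcond=None)
    resid = y - U @ alpha_hat
    sigma2 = resid @ resid / (n - U.shape[1])
    score = V.T @ resid
    stat = float(score @ score / sigma2)
    # Under H0, stat behaves like sum_j lam_j * chi2_1 with lam_j the
    # eigenvalues of the cross-product of V after projecting out U.
    coef, *_ = np.linalg.lstsq(U, V, rcond=None)
    V_res = V - U @ coef
    lam = np.linalg.eigvalsh(V_res.T @ V_res)
    # Match the first two moments to c * chi2_d.
    c = float(np.sum(lam**2) / np.sum(lam))
    d = float(np.sum(lam) ** 2 / np.sum(lam**2))
    return stat, float(stats.chi2.sf(stat / c, df=d)), d


rng = np.random.default_rng(0)
n, p, rho = 500, 10, 0.8
shared = rng.normal(size=(n, 1))
# Hypothetical markers with a common pairwise correlation of about rho.
V = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=(n, p))
U = np.ones((n, 1))
y = rng.normal(size=n)
stat, pval, d_eff = eb_score_test(y, U, V)
```

With ten strongly correlated markers the effective DF `d_eff` is far below p=10, whereas for uncorrelated markers it would be close to p; this is why the gain from this type of test is largest under high correlation.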

2.2. An efficient adaptive score test

As shown in the simulations of Section 5, the EB-based score test (2.7) is quite powerful when the signal is not sparse. However, in settings where the signal is sparse and the markers are weakly correlated with each other, this test has limited power due to the high DF paid for the non-informative markers. To overcome this difficulty, we propose an adaptive score statistic that incorporates information on the signal strength of each marker. Specifically, we first obtain an initial root-n consistent estimator of β0, denoted by Inline graphic. Without loss of generality, we assume that the vector of element-wise variances of Inline graphic, κ^2, can be consistently estimated by Inline graphic. Our proposed rescaling factor is Inline graphic, where ⊘ represents element-wise division. Here, Inline graphic is the Z-statistic for testing β_0ℓ=0, which reflects the strength of the signal β_0ℓ relative to the noise. The adaptive score test is then constructed by rescaling Inline graphic in Inline graphic element-wise by Inline graphic such that non-informative markers will be down-weighted towards zero, thus effectively eliminating these markers from the score statistic. We define our adaptive score test statistic as

2.2. (2.8)

where ⊙ denotes the element-wise product. If the ℓth component Inline graphic of Inline graphic is not significantly different from zero, then the ℓth marker plays little role in the test statistic.

For the initial estimator, one may obtain Inline graphic as the standard quasilikelihood estimator. Specifically, let Inline graphic be the solution to Inline graphic, where

2.2. (2.9)

is the quasilikelihood score equation under the alternative. A simple choice for the variance estimator of Inline graphic is

2.2. (2.10)

where Inline graphic could be estimated under H0 as in (2.4), Inline graphic is the ℓth row vector of Inline graphic and Inline graphic. Then one may let Inline graphic and Inline graphic in the adaptive score test (2.8), where Inline graphic.

When p is not small and the signals are moderate or weak, Inline graphic may have large variation and thus may lead to power loss for the test rescaled with Inline graphic. To overcome such difficulties, we propose to improve the power by considering a ridge penalized quasilikelihood (PQL) estimator Inline graphic, where Inline graphic is the solution to Inline graphic, and λ is a tuning parameter with λλ0≥0. The estimator Inline graphic corresponds to the maximum PQL estimator under the random effects model Inline graphic (Green, 1987; Breslow and Clayton, 1993). In practice, one may choose an optimal Inline graphic based on procedures such as generalized cross-validation. The adaptive score test (2.8) may also be constructed based on Inline graphic along with the variance estimators Inline graphic, where Inline graphic is the ℓth row vector of Inline graphic.
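The rescaling idea can be sketched for the linear model as follows. This is a hedged illustration rather than the paper's exact statistic: it assumes an identity link, a fixed ridge parameter `lam` instead of one chosen by generalized cross-validation, a statistic equal to the squared norm of the Z-scores multiplied element-wise by the score, and a null distribution obtained by replacing the null residuals with Gaussian noise. The function name `adaptive_score_test` is hypothetical.

```python
import numpy as np


def adaptive_score_test(y, U, V, lam=1.0, B=2000, seed=0):
    """Z-score-rescaled score statistic with a resampled null (sketch)."""
    rng = np.random.default_rng(seed)
    n, p = V.shape

    def resid_on_U(M):
        # Residuals of M (vector or matrix) after regressing on U.
        coef, *_ = np.linalg.lstsq(U, M, rcond=None)
        return M - U @ coef

    r = resid_on_U(y)
    V_res = resid_on_U(V)
    sigma = np.sqrt(r @ r / (n - U.shape[1]))
    # Ridge map A: beta_hat = A @ r; row norms give standard errors under H0.
    A = np.linalg.solve(V_res.T @ V_res + lam * np.eye(p), V_res.T)
    se = sigma * np.sqrt(np.sum(A**2, axis=1))

    def statistic(res):
        z = (A @ res) / se                    # Z-scores of the ridge estimate
        score = V_res.T @ res / np.sqrt(n)    # standardized score vector
        return float(np.sum((z * score) ** 2))

    stat = statistic(r)
    # Null draws: Gaussian noise projected off U and scaled by sigma-hat.
    R_star = sigma * resid_on_U(rng.normal(size=(n, B)))
    null = np.array([statistic(R_star[:, b]) for b in range(B)])
    pval = (1 + np.sum(null >= stat)) / (B + 1)
    return stat, float(pval)


rng = np.random.default_rng(0)
n, p = 500, 20
U = np.ones((n, 1))
V = rng.normal(size=(n, p))
# Sparse signal: only the first hypothetical marker has an effect.
y_alt = 0.5 * V[:, 0] + rng.normal(size=n)
stat, pval = adaptive_score_test(y_alt, U, V, lam=1.0, B=1000, seed=1)
```

Markers whose Z-scores are near zero contribute almost nothing to the sum, so a single strong marker dominates the statistic in this example.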

2.3. Distribution of the test statistic under the null and under the local alternative

In the Appendix of the supplementary material available at Biostatistics online, we derive the distributions of the test statistics (2.7)–(2.8) under H0 and more generally under the local alternative H1n:β0=n−1/2b0. Define

2.3. (2.11)

Inline graphic, z0=b0κ, and Inline graphic. Furthermore, let εw denote a Inline graphic random vector, Inline graphic and Av, respectively, denote the ℓth row of Inline graphic and Inline graphic. Then Inline graphic converges to

2.3. (2.12)

in distribution. The same arguments as given in the Appendix of the supplementary material available at Biostatistics online can be used to show that under H1n, the EB-score test statistic (2.7) converges in distribution to Inline graphic. Under H0, b0=0 and thus Inline graphic in (2.8) and Inline graphic in (2.7) converge in distribution to Inline graphic and Inline graphic, respectively, where Inline graphic.

2.4. Implementation

In general, it is straightforward to approximate the null distribution of Inline graphic in (2.8) via perturbation by repeatedly generating realizations of Inline graphic, where Inline graphic and Inline graphic are defined by replacing Inline graphic with its empirical counterpart, and where Inline graphic and Inline graphic is a vector of independent N(0,1) random variables. This is because the distribution of Inline graphic given the observed data and the unconditional distribution of εw are asymptotically the same. Mimicking the Satterthwaite approximation, we find that in our numerical studies, the null distribution of Inline graphic can be well approximated by a rescaled χ2 distribution, i.e. Inline graphic. The scale parameter c0 and the degrees of freedom d0 can be estimated by matching the first two moments of Inline graphic. Similarly, the null distribution of the EB score test Inline graphic in (2.7) can be approximated by Inline graphic.
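The moment-matching step can be written down directly: a c0·χ2 variable with d0 degrees of freedom has mean c0·d0 and variance 2·c0²·d0, so c0 = variance/(2·mean) and d0 = 2·mean²/variance. The sketch below applies this to a set of null realizations; the function names and the simulated draws are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def scaled_chi2_fit(null_samples):
    """Match the first two moments of null draws to c0 * chi2(d0)."""
    m = np.mean(null_samples)
    v = np.var(null_samples, ddof=1)
    c0 = v / (2.0 * m)
    d0 = 2.0 * m**2 / v
    return float(c0), float(d0)


def scaled_chi2_pvalue(stat, c0, d0):
    """Upper-tail probability of stat under the fitted c0 * chi2(d0)."""
    return float(stats.chi2.sf(stat / c0, df=d0))


# Check on draws whose true distribution is 3 * chi2(5).
rng = np.random.default_rng(0)
draws = 3.0 * rng.chisquare(5, size=200_000)
c0_hat, d0_hat = scaled_chi2_fit(draws)
p_tail = scaled_chi2_pvalue(3.0 * 20.0, c0_hat, d0_hat)
```

Once c0 and d0 are fitted from a moderate number of perturbations, tail probabilities smaller than 1/B can be read off the fitted distribution, although the accuracy of the approximation in the far tail is not guaranteed.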

2.5. Theoretical differences between the tests

Some insight as to the difference between our adaptive test and the ordinary score test can be gained by considering the case in which there are no additional confounders, so that U=1 and λ0=0. In this case, Inline graphic, Inline graphic, Inline graphic and Inline graphic, where Σ_v=var(V). Since E(V)=0, Inline graphic. Thus Inline graphic and Inline graphic. Now, let Inline graphic denote the ℓth diagonal element of Inline graphic, Inline graphic, write Inline graphic and let Inline graphic. It follows that Inline graphic and Inline graphic. Therefore, we may simplify the distribution of Inline graphic as

2.5.

Furthermore, if the components of V are uncorrelated, then Inline graphic, Inline graphic and Inline graphic are equivalent to Inline graphic. On the other hand, the distribution of (2.7) scaled by (a_0σ)^−2 can be written as Inline graphic. Thus, for the orthogonal case, asymptotically, the EB score statistic Inline graphic is a weighted sum of p independent 1-DF χ2 random variables with non-centrality parameters Inline graphic; whereas the adaptive score statistic is a weighted sum of p independent squared 1-DF χ2 random variables with non-centrality parameters Inline graphic.

3. An efficient adaptive omnibus test

3.1. Theoretical local power calculations

To compare the performance of the EB score test and the adaptive test, we consider the simple setting of linear regression with σ2=1, U=1 (an intercept only), and V multivariate normal with mean zero, unit variance, and a common correlation ρ. In Figure 1, we present the power curves under the local alternative for ρ=0.0, 0.2, and 0.5 for two extreme settings: (i) when signals are sparse with b0=(b,0,…,0)T and (ii) when all covariates contribute equally with Inline graphic. The adaptive procedure outperforms the EB score test under the sparse setting, but the ordering is reversed when all p covariates contribute equally. The relative performance of the adaptive and EB score test procedures also varies with the correlation ρ: the lower the correlation, the greater the advantage of the adaptive procedure.

Figure 1.


Theoretical power curves for the adaptive (solid curves), EB-score (dotted curves), and omnibus combining Inline graphic and Inline graphic (dot-dashed curves) tests under local alternatives with various levels of correlation (corr.): 0.0 (thin gray curves); 0.2 (black curves); and 0.5 (thick gray curves).

3.2. Omnibus test and implementation

In general, the relative performance of the EB-based score test and the adaptive procedure depends on the sparsity of b0 and the between-marker correlation. In practice, without prior information on these factors, it is unclear which procedure should be chosen for a given dataset. To overcome this difficulty, we propose to automatically combine evidence from the EB score test and the adaptive test by taking the minimum p-value and comparing it with its null distribution. Specifically, let Inline graphic and Inline graphic be the respective p-values based on the EB score test and the adaptive score test, where Inline graphic and Inline graphic are estimators of Inline graphic and Inline graphic, respectively. Then the omnibus test is based on the minimum p-value, Inline graphic, which converges in distribution to Inline graphic under H0. Our simulation studies in Section 5 indicate that the omnibus test pays a relatively low price with respect to power for utilizing two tests. Our theoretical power analysis under the aforementioned two settings also supports this finding, as shown in Figure 1.

In practice, the null distribution of Inline graphic can be approximated easily via perturbation methods. Let Inline graphic denote B perturbed realizations of Inline graphic, where for each b, Inline graphic and Inline graphic are perturbed with the same vector of independent normal variables Inline graphic. Then the null distribution of Inline graphic can be approximated by the empirical distribution of Inline graphic across b=1,…,B, where Inline graphic and Inline graphic are obtained by fitting rescaled χ2 distributions to Inline graphic and Inline graphic. When Inline graphic is very small, it may be challenging to obtain its p-value using resampling, because a large B would be required to ensure adequate approximation. For such settings, we propose to approximate the null distribution of Inline graphic using a normal mixture. Specifically, we fit a ν0-component normal mixture, Inline graphic, to Inline graphic and obtain maximum likelihood estimates for {(π_ν, μ_ν, σ_ν), ν=1,…,ν_0}, denoted by Inline graphic. Then the p-value can be estimated by Inline graphic. Through our empirical studies, we find that this approximation works well and hence could be useful when aiming to control the type I error at low levels. Similar strategies could be used to approximate the distribution of Inline graphic to improve the χ2 approximation, especially for the tail probabilities. Here, the number of components ν0 can be either pre-specified or chosen adaptively using criteria such as BIC. In practice, we find that ν0=3 works well for approximating the distribution of Inline graphic.
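The resampling version of the minimum p-value procedure can be sketched as follows, assuming that paired null realizations of the two statistics (generated from the same perturbation vectors) are already available. The function names and the toy null draws are hypothetical, and the normal-mixture refinement is not included.

```python
import numpy as np
from scipy import stats


def scaled_chi2_pvalues(values, null_samples):
    """P-values from a moment-matched c0 * chi2(d0) fit to the null draws."""
    m = np.mean(null_samples)
    v = np.var(null_samples, ddof=1)
    c0, d0 = v / (2.0 * m), 2.0 * m**2 / v
    return stats.chi2.sf(np.asarray(values) / c0, df=d0)


def omnibus_min_p(stat_eb, stat_ad, null_eb, null_ad):
    """Omnibus p-value: compare the observed min-p with its perturbed null."""
    p_obs = min(float(scaled_chi2_pvalues(stat_eb, null_eb)),
                float(scaled_chi2_pvalues(stat_ad, null_ad)))
    # Row b of each null array must come from the same perturbation draw,
    # so that the dependence between the two statistics is preserved.
    p_null = np.minimum(scaled_chi2_pvalues(null_eb, null_eb),
                        scaled_chi2_pvalues(null_ad, null_ad))
    return (1 + int(np.sum(p_null <= p_obs))) / (len(p_null) + 1)


# Toy paired null draws built from shared Gaussian vectors (illustrative only).
rng = np.random.default_rng(1)
Z = rng.normal(size=(5000, 10))
null_eb = np.sum(Z**2, axis=1)
null_ad = np.sum(Z**4, axis=1)
p_typical = omnibus_min_p(np.median(null_eb), np.median(null_ad), null_eb, null_ad)
p_extreme = omnibus_min_p(200.0, np.median(null_ad), null_eb, null_ad)
```

The smallest p-value this resampling scheme can return is 1/(B+1), which is the limitation that motivates a parametric approximation to the null distribution of the minimum p-value when very small p-values are needed.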

4. Example: the FGFR2 gene for the risk of breast cancer

We applied our proposed procedures to assess the association between the FGFR2 gene and the risk of sporadic postmenopausal breast cancer using the genome-wide association studies (GWAS) data of the Nurses’ Health Study, which was part of the aforementioned CGEMS study (Hunter and others, 2007). Using the Illumina HumanHap500 array, this study initially genotyped 1183 women with postmenopausal invasive breast cancer and 1185 individually matched controls. Data for analysis consist of 1091 cases and 1110 controls with complete information. The loci reported as potentially associated with breast cancer in Hunter and others (2007) include several SNPs in FGFR2 or its intron 2. Here, we examine the overall effect of the FGFR2 gene, consisting of 41 typed SNPs, including those within the 30-kb region of the gene, on the risk of breast cancer. The analysis adjusted for age group, hormone usage, age at menarche, and the first 4 eigenvectors generated from EIGENSTRAT principal components analysis (Price and others, 2006) to account for population stratification.

We first fit marginal logistic regression models to the data, one SNP at a time, adjusting for these covariates. The log odds ratio estimates along with their 95% confidence intervals obtained from the 41 marginal models are shown in Supplementary material, Figure S1 (see supplementary material available at Biostatistics online). Out of these 41 SNPs, 14 SNPs have marginal p-value < 0.05 and 3 SNPs, rs2420946, rs1219648, and rs2981579, have p-value < 10^−5. The SNP rs1219648 has been previously shown to be highly associated with increased risk of breast cancer, while both rs2420946 and rs2981579 are in high LD with rs1219648 (Hunter and others, 2007). An experimental rationale has been presented indicating that this SNP is part of a haplotype that increases risk for ER+ breast cancer by increasing FGFR2 transcription.

To assess the overall effect of the gene, we employed the aforementioned procedures including the univariate test using the minimum of these p-values. Since this gene is highly associated with breast cancer risk, we used 100 000 perturbation samples along with the normal mixture approach to approximate the p-values. The univariate test gives an overall p-value of 4.0×10^−5. On the other hand, our adaptive test gives a p-value of 4.3×10^−8 and the EB score test a similar p-value of 7.2×10^−7, while the standard p-DF score test has no power in detecting the signal, with a p-value of 0.12. The omnibus test combining Inline graphic and Inline graphic yielded a p-value of 1.8×10^−7 based on the normal mixture approximation. However, a larger number of perturbation samples would be required to obtain more accurate estimates of the p-values for these tests.

5. Simulation studies

5.1. Setup and null case

We conducted simulation studies to assess the performance of the proposed score test. For simplicity, we considered the setting that U=1. To mimic the GWAS setting, we generated V based on the LD structure of two genes: (i) the ASAH1 gene with high LD and (ii) the FGFR2 gene with moderate to low LD. Based on the Illumina 500 K platform, we included p=14 SNPs of the ASAH1 gene and p=31 SNPs of the FGFR2 gene, whose LD heat maps are shown in Supplementary material, Figure S2 (see supplementary material available at Biostatistics online). The response variable Y was generated from the linear regression model Inline graphic with ϵ ∼ N(0,4) generated independently of V. For each configuration, we generated 4000 datasets to calculate the empirical size when the null hypothesis is true and 1000 datasets to calculate the empirical power when it is not. For each dataset, the resampling procedure was carried out with B=5000 and B=1000, for the null and alternative settings, respectively. We considered n=200, 500, and 1000.

As a benchmark, we also report results on the univariate test, whose significance is determined by comparing the observed minimum p-value of p univariate tests to its corresponding null distribution. For each simulated dataset, we carried out the following test procedures: (i) the p-DF score test Inline graphic; (ii) the EB based score test Inline graphic in (2.7); (iii) adaptive score test in (2.8) rescaled with ridge initial estimator Inline graphic; and (iv) univariate test (Uni). In addition, we consider various omnibus tests based on the minimum p-value among the two or three p-values from (ii), (iii), and (iv). Note that we considered only ridge estimators as our initial estimator, since the standard quasilikelihood estimator is unstable due to the high collinearity between the SNPs. To examine how well the χ2 distribution approximates the null distribution of Inline graphic and Inline graphic, we also provided the empirical size and power based on the approximation. We also examined the performance of the normal mixture approximation to the distribution of the omnibus test statistics as well as the distribution of Inline graphic. Unless noted otherwise, p-values for all other test statistics are based on the perturbation procedure, which can conveniently account for various types of correlations.

First, to examine the validity of our proposed testing procedure in finite samples, we generated data under the H0 model with β0=0 to assess the size of the score test. The empirical sizes of the aforementioned tests at type I error rates of 1% and 5% are summarized in Table 1. Across all the configurations, the empirical sizes are close to the nominal levels for all procedures except for the standard p-DF test, which is often overly conservative due to the correlation among the V. Furthermore, it appears that the χ2-based approximation works reasonably well in practice for approximating the distribution of Inline graphic at type I error rate 5%. However, it appears that at a lower error rate of 1%, the χ2 approximation is slightly anti-conservative, while the normal mixture approximation appears to provide a better approximation and works well for approximating the distribution of other minimum p-value test statistics.

Table 1.

Empirical sizes (in %) at target sizes of 1% and 5% for the score tests when V is simulated based on the 14 tagSNPs of the ASAH1 gene and 31 tagSNPs of the FGFR2 gene on the Illumina chip. Here Inline graphic is the p-DF score test in (2.6), Inline graphic is the test (2.7), Inline graphic is the test at (2.8) with initial ridge estimate, “Uni” is the test that is based on the minimum p-value across p univariate tests, and minA1,…,Ak represents the omnibus test that takes the minimum p-value of the tests based on A1,…,Ak. The p-values are obtained via perturbation (ptb), chi-square approximation (χ2), and normal mixture approximation (Mix).

Column groups, left to right: Inline graphic (ptb, χ2); Inline graphic (ptb, χ2); Uni; Inline graphic (ptb, χ2, Mix); Inline graphic (ptb, Mix); Inline graphic (ptb, Mix); Inline graphic (ptb, Mix).

n     Size  ptb   χ2    ptb   χ2    Uni   ptb   χ2    Mix   ptb   Mix   ptb   Mix   ptb   Mix
ASAH1 gene
200   1     0.3   0.2   1.1   1.3   0.8   1.1   1.6   1.3   1.1   1.2   0.9   0.8   0.9   0.8
500   1     0.4   0.3   1.0   1.4   0.8   1.1   1.5   1.3   1.0   1.1   0.7   0.7   0.8   0.8
1000  1     0.7   0.2   0.9   1.1   0.7   1.0   1.3   1.1   1.0   1.1   0.8   0.8   0.8   0.8
200   5     3.1   1.2   5.5   5.4   4.4   5.7   5.6   5.4   5.8   5.6   4.5   4.5   4.4   4.3
500   5     4.3   1.7   5.2   5.1   4.7   5.2   5.2   5.0   5.5   5.2   4.7   4.5   4.6   4.4
1000  5     3.7   1.5   4.6   4.5   4.5   5.2   5.1   4.9   5.0   4.7   4.6   4.5   4.5   4.3
FGFR2 gene
200   1     0.1   0.1   0.7   0.9   0.7   0.6   0.9   0.8   0.6   0.6   0.7   0.7   0.7   0.7
500   1     0.3   0.5   0.9   1.1   1.0   0.9   1.2   1.0   0.9   1.0   1.1   1.0   1.1   1.1
1000  1     0.6   0.6   0.8   1.0   1.0   0.9   1.3   1.0   0.7   0.7   0.9   0.9   0.9   1.0
200   5     1.7   1.8   5.1   5.2   5.2   4.3   4.4   4.2   4.4   4.2   5.1   5.0   5.1   4.8
500   5     2.8   3.1   5.0   5.1   5.2   4.9   5.0   4.7   4.7   4.5   5.2   5.0   5.2   4.8
1000  5     3.8   3.4   4.6   4.5   5.3   5.1   5.2   4.9   5.0   4.7   5.2   5.0   5.2   4.9

5.2. Power comparisons

For empirical power analyses, we let Inline graphic, for ι=0,…,p−q, where Inline graphic indexes either ASAH1 or FGFR2. Hence ι+1 represents the starting position of the causal variants, q determines the sparsity of the signal, and Inline graphic reflects the strength of the signal. We consider four choices of q and Inline graphic: (i) dense signal q=⌈0.8p⌉, bASAH1=3.5, bFGFR2=2.8; (ii) moderately sparse signal q=⌈0.5p⌉, bASAH1=4.1, bFGFR2=3.5; (iii) sparse signal q=⌈0.1p⌉, bASAH1=7.1, bFGFR2=5.4; and (iv) single causal variant q=1, bASAH1=bFGFR2=10.6. The pattern of the results is similar across the three sample sizes and hence we present only results for n=500.
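The four signal configurations translate into concrete numbers of causal variants for each gene. The short sketch below works through the arithmetic, taking "sparsity" to mean the percentage of non-causal markers, 100(1 − q/p); this definition is inferred from the figure captions rather than stated explicitly, and the function name is illustrative.

```python
import math


def signal_settings(p):
    """Return (q, sparsity in %) for the four simulation settings."""
    qs = [math.ceil(0.8 * p), math.ceil(0.5 * p), math.ceil(0.1 * p), 1]
    return [(q, round(100 * (1 - q / p))) for q in qs]


asah1 = signal_settings(14)   # p = 14 SNPs in ASAH1
fgfr2 = signal_settings(31)   # p = 31 SNPs in FGFR2
print(asah1)  # [(12, 14), (7, 50), (2, 86), (1, 93)]
print(fgfr2)  # [(25, 19), (16, 48), (4, 87), (1, 97)]
```

These values match the q and sparsity levels quoted in the captions of Figures 2 and 3.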

Since the SNPs in the ASAH1 gene are generally in high LD with each other, we summarize the power of the tests averaged over the entire range of ι in Figure 2. As we expect from the theoretical analysis, the EB test is the most powerful under the dense signal setting with 14% sparsity. For the sparse settings, the adaptive test is at least as powerful as other procedures. Across all settings, the standard p-DF test is the least powerful. The most robust test is the omnibus test based on Inline graphic, the minimum p-value from Inline graphic and Inline graphic, which always has power similar to the best among (a)–(d).

Figure 2.


Empirical power (in %) for various tests using the ASAH1 gene under local alternatives averaged over all the choices of ι’s. (a) q=12, bASAH1=3.5 (14% sparsity); (b) q=7, bASAH1=4.1 (50% sparsity); (c) q=2, bASAH1=7.1 (86% sparsity); and (d) q=1, bASAH1=10.6 (93% sparsity).

For the FGFR2 gene, the LD is generally weak among the SNPs, but the correlation structure changes over different regions. To gauge the general pattern of how the correlation might affect the test performances, we let ℘(ι) denote the average correlation between the causal and non-causal SNPs for a given ι and summarize the power by averaging over different levels of ℘(ι). In Figure 3, we present the power averaged over the set of ι with low ℘(⋅) and the set of ι with moderate ℘(⋅). When ℘(⋅) is low and the signal is moderately sparse, the adaptive test is more powerful than its competitors. For example, when the sparsity is 19%, the average power is 54% for the EB test and 60% for the adaptive test while the univariate test has a power of 40%. The univariate test is generally less powerful except when there is only a single causal variant and ℘(⋅) is low. When ℘(⋅) is moderate, the EB test and the adaptive test have more similar performances and the univariate test generally is less powerful. Similar to the results for the ASAH1 gene, the test based on Inline graphic almost always achieves power close to the best among (a)–(d), except when ℘(⋅) is low with extreme sparsity. Other omnibus tests based on Inline graphic and Inline graphic can be less powerful whenever the univariate test does not work well, where Inline graphic denotes the p-value based on the univariate test. For example, for the FGFR2 gene with 19% sparsity and low ℘(⋅), the powers of the univariate test and the omnibus test that includes the univariate test are only ∼ 43%, while the power of Inline graphic is close to 57%.

Figure 3.

Empirical power (in %) for various tests using the FGFR2 gene under local alternatives averaged over the set of ι with low ℘(⋅) (low) and the set of ι with moderate ℘(⋅) (moderate). Four settings of q and b were considered: (a) q=25, bFGFR2=3.5 (19% sparsity); (b) q=16, bFGFR2=2.8 (48% sparsity); (c) q=4, bFGFR2=5.4 (87% sparsity); and (d) q=1, bFGFR2=10.6 (97% sparsity).

6. Discussions

In this paper, we proposed an adaptive score test procedure to test for the effect of a set of genetic markers, by rescaling the design matrix of the genetic markers with an initial estimator of the marker effects. When compared with the EB score test in (2.7), the adaptive test in (2.8) has higher power when the signal is sparse and the between-marker correlation is low. The null distribution of Inline graphic can be estimated via a simple Monte-Carlo procedure. In practice, we find that the null distributions of Inline graphic and Inline graphic can also be approximated well by skewed χ2 distributions with DF Inline graphic and Inline graphic, respectively. Furthermore, under the local alternative, the distributions of Inline graphic and Inline graphic can be approximated by non-central χ2 distributions with non-centrality parameters Inline graphic and Inline graphic, respectively. However, providing theoretical justification for why such an approximation works well for Inline graphic is challenging and warrants further research.
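As a rough illustration of how a χ2-type approximation can be fitted to Monte Carlo draws of a null statistic, the sketch below matches the first two moments of the draws to a scaled χ2 distribution (a Satterthwaite-type fit). This is an assumption made for illustration only: it is not necessarily the approximation used in this paper, whose DF expressions are not reproduced here. The function names `fit_scaled_chi2` and `demo_draws`, and the simulated draws themselves, are hypothetical.

```python
import random
import statistics


def fit_scaled_chi2(draws):
    """Moment-match draws to c * chi-square(df).

    Uses mean = c * df and variance = 2 * c^2 * df, solved for c and df.
    """
    mean = statistics.fmean(draws)
    var = statistics.variance(draws)
    scale = var / (2.0 * mean)
    df = 2.0 * mean ** 2 / var
    return scale, df


def demo_draws(n_draws=50000, scale=3.0, df=4, seed=0):
    """Hypothetical stand-in for Monte Carlo null draws: scale * chi-square(df)."""
    rng = random.Random(seed)
    return [
        scale * sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]


scale_hat, df_hat = fit_scaled_chi2(demo_draws())
print(f"fitted scale = {scale_hat:.2f}, fitted DF = {df_hat:.2f}")
```

In this toy setting the fitted scale and DF should land close to the values used to simulate the draws. With real perturbation draws, the quality of such a fit would need to be checked against the empirical tail.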

The χ2 approximations allow us to assess the relative performance of these two testing procedures by comparing Inline graphic to Inline graphic under various settings. In simulation studies (results not reported) with a compound symmetry correlation structure for V with correlation ℘, we compared how ℘ and signal sparsity affect the results. The adaptive test is most effective when ℘ is low and sparsity is high, since under such settings Inline graphic but Inline graphic. As ℘ increases, Inline graphic decreases quickly, but Inline graphic remains almost constant. As the sparsity of the signal increases, Inline graphic decreases while Inline graphic increases, but the change in the non-centrality parameters appears to be smaller with larger ℘, particularly for Inline graphic. Thus, when the correlation increases, the EB-based test gains power by maintaining low DF while the adaptive test pays the price for having higher DF, in part due to the increased difficulty in estimating β0.
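The trade-off between DF and non-centrality described above can be illustrated with a small simulation. The sketch below estimates the power of a χ2 test by Monte Carlo for a fixed non-centrality parameter and two different DF. The DF and non-centrality values are hypothetical and are not taken from the paper, and the helper names `simulate_chi2` and `empirical_power` are assumptions introduced for this example.

```python
import math
import random


def simulate_chi2(df, ncp, n_draws, rng):
    """Draw from a (non-central) chi-square: df squared normals, one shifted by sqrt(ncp)."""
    shift = math.sqrt(ncp)
    draws = []
    for _ in range(n_draws):
        total = (rng.gauss(0.0, 1.0) + shift) ** 2
        for _ in range(df - 1):
            total += rng.gauss(0.0, 1.0) ** 2
        draws.append(total)
    return draws


def empirical_power(df, ncp, alpha=0.05, n_draws=20000, seed=1):
    """Monte Carlo power of a level-alpha chi-square test with the given DF and ncp."""
    rng = random.Random(seed)
    # Critical value: empirical (1 - alpha) quantile of the central chi-square.
    null = sorted(simulate_chi2(df, 0.0, n_draws, rng))
    crit = null[min(int((1 - alpha) * n_draws), n_draws - 1)]
    # Power: fraction of non-central draws exceeding the critical value.
    alt = simulate_chi2(df, ncp, n_draws, rng)
    return sum(x > crit for x in alt) / n_draws


# Hypothetical values: same non-centrality, low versus high DF.
print("power with DF = 2 :", empirical_power(2, 8.0))
print("power with DF = 20:", empirical_power(20, 8.0))
```

With the non-centrality held fixed, the test with fewer DF has noticeably higher power, which is the mechanism by which a low-DF test can outperform a test with a larger non-centrality parameter but more DF.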

To get more intuition behind these relative performances, we consider the local alternative β0 = n^(−1/2) b0 and the setting with orthogonal normal design, U=1, and σ2=1. Due to the complexity of the power functions, we focus on the setting where p is not small, for ease of approximation. One can show that Inline graphic is approximately χ2 with

6.

while Inline graphic and Inline graphic, where Inline graphic. Comparing the non-sparse case with Inline graphic to the sparse case with b0 = (b, 0, …, 0)^T, one finds that the EB test has the same power at these two local alternatives since Inline graphic in both cases. On the other hand, the adaptive test has much greater power in the sparse setting since Inline graphic for the sparse case and Inline graphic in the non-sparse case. Hence, when the signal is sparse and b is not small, the adaptive test gains power by amplifying the strong signals, which is reflected in the increased non-centrality parameter Inline graphic. On the other hand, when the signal is not sparse, b^4 − 2pb^2 could be substantially smaller than 0 and hence lead to a power loss when compared with the EB test.

In general, the ridge-rescaled adaptive test has more power than the test rescaled by the standard quasi-likelihood estimator, especially when the correlation among the V is high. The omnibus test, which combines information from both the EB-based score test and the adaptive test, appears to pick out the winner at relatively little cost, at least for the settings we have examined. It will be interesting to extend the proposed procedures to accommodate rare variants from next-generation sequencing studies. When the minor allele frequencies of the rare variants are too low, the proposed weight vector Inline graphic based on simple initial estimators may be unstable. Alternative weights that account for rare variants and may increase power warrant further research.

As shown in the data example section, when the p-value is extremely small, it remains numerically difficult to obtain a good estimate of the tail probability for the omnibus test due to the requirement of a large number of perturbations. On the other hand, our proposed perturbation procedure would enable us to easily obtain the overall type I error-adjusted p-values when multiple marker sets are under investigation. By generating the same set of Inline graphic for all the marker sets, one can obtain the null distribution of the minimum p-value across all marker sets and compare the observed p-value to this null distribution to estimate the adjusted p-value. For approximating the tail probabilities, we find that a normal mixture works well for approximating the distribution of the minimum p-value, both under H0 and under the alternative.
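The minimum p-value adjustment across marker sets can be sketched as follows. The sketch assumes that, for each of B perturbation draws, a perturbed statistic has already been computed for every marker set from the same draw, so that each row of `perturbed` preserves the dependence between sets. The function names, the rank-based p-values floored at 1/B, and the toy numbers are all assumptions made for illustration, not the authors' implementation.

```python
import bisect


def upper_tail_pvalues(null_draws, values):
    """Empirical upper-tail p-values of `values` against `null_draws`, floored at 1/n."""
    ordered = sorted(null_draws)
    n = len(ordered)
    # n - bisect_left(...) counts the null draws that are >= v.
    return [max(n - bisect.bisect_left(ordered, v), 1) / n for v in values]


def min_p_adjusted(observed, perturbed):
    """Adjusted p-value for the smallest per-set p-value.

    observed:  one test statistic per marker set.
    perturbed: B rows, one column per marker set; every row is assumed to come
               from the same perturbation draw applied to all marker sets.
    """
    n_sets = len(observed)
    columns = [[row[m] for row in perturbed] for m in range(n_sets)]
    # Per-set p-values for the observed statistics.
    observed_p = [
        upper_tail_pvalues(columns[m], [observed[m]])[0] for m in range(n_sets)
    ]
    # Per-set p-values for each perturbed statistic, then the row-wise minimum.
    null_p = [upper_tail_pvalues(col, col) for col in columns]
    null_min_p = [
        min(null_p[m][b] for m in range(n_sets)) for b in range(len(perturbed))
    ]
    observed_min = min(observed_p)
    adjusted = sum(p <= observed_min for p in null_min_p) / len(null_min_p)
    return observed_p, adjusted


# Toy example: two marker sets and four perturbation draws (hypothetical numbers).
observed = [3.0, 0.5]
perturbed = [[1.0, 1.0], [2.0, 0.0], [4.0, 2.0], [0.0, 3.0]]
observed_p, adjusted = min_p_adjusted(observed, perturbed)
print("per-set p-values:", observed_p)
print("adjusted p-value for the minimum:", adjusted)
```

Because the adjusted p-value is read off the empirical distribution of the row-wise minimum, its resolution is limited to 1/B, which is the numerical difficulty with extremely small p-values noted above.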

Funding

Research was supported by grants from the National Institutes of Health (R01-GM079330 to T.C.) and the National Science Foundation (DMS-0854970 to T.C.); the National Cancer Institute (R37-CA076404 and P01-CA134294 to X.L.); and the National Cancer Institute (R37-CA057030 to R.J.C.) and Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST), to R.J.C.

Supplementary Material

Supplementary Data

Acknowledgments

Conflict of Interest: None declared.

References

  1. Baum A. E., Akula N., Cabanero M., Cardona I., Corona W., Klemens B., Schulze T. G., Cichon S., Rietschel M., Nöthen M. M., and others. A genome-wide association study implicates diacylglycerol kinase eta (DGKH) and several other genes in the etiology of bipolar disorder. Molecular Psychiatry. 2007;13:197–207. doi: 10.1038/sj.mp.4002012.
  2. Breslow N. E., Clayton D. G. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
  3. Brown M. P. S., Grundy W. N., Lin D., Cristianini N., Sugnet C. W., Furey T. S., Ares M., Haussler D. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proceedings of the National Academy of Sciences. 2000;97:262–267. doi: 10.1073/pnas.97.1.262.
  4. Carlin B. P., Louis T. A. Bayes and empirical Bayes methods for data analysis. Statistics and Computing. 1997;7:153–154.
  5. Commenges D. Robust genetic linkage analysis based on a score test of homogeneity: the weighted pairwise correlation statistic. Genetic Epidemiology. 1994;11:189–200. doi: 10.1002/gepi.1370110208.
  6. Curtis R. K., Oresic M., Vidal-Puig A. Pathways to the analysis of microarray data. TRENDS in Biotechnology. 2005;23:429–435. doi: 10.1016/j.tibtech.2005.05.011.
  7. Efroni S., Schaefer C. F., Buetow K. H. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One. 2007;2:425. doi: 10.1371/journal.pone.0000425.
  8. Frayling T. M. Genome-wide association studies provide new insights into type 2 diabetes aetiology. Nature Reviews Genetics. 2007;8:657–662. doi: 10.1038/nrg2178.
  9. Goeman J. J., Oosting J., Cleton-Jansen A. M., Anninga J. K., van Houwelingen H. C. Testing association of a pathway with survival using gene expression data. Bioinformatics. 2005;21:1950–1957. doi: 10.1093/bioinformatics/bti267.
  10. Green P. J. Penalized likelihood for general semi-parametric regression models. International Statistical Review/Revue Internationale de Statistique. 1987:245–259.
  11. Harris P., Peers H. W. The local power of the efficient scores test statistic. Biometrika. 1980;67:525.
  12. Hunter D. J., Kraft P., Jacobs K. B., Cox D. G., Yeager M., Hankinson S. E., Wacholder S., Wang Z., Welch R., Hutchinson A., and others. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nature Genetics. 2007;39:870–874. doi: 10.1038/ng2075.
  13. Kwee L. C., Liu D., Lin X., Ghosh D., Epstein M. P. A powerful and flexible multilocus association test for quantitative traits. The American Journal of Human Genetics. 2008;82:386–397. doi: 10.1016/j.ajhg.2007.10.010.
  14. Lin D. Y. An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics. 2005;21:781–787. doi: 10.1093/bioinformatics/bti053.
  15. Liu D., Ghosh D., Lin X. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinformatics. 2008;9:292. doi: 10.1186/1471-2105-9-292.
  16. Liu D., Lin X., Ghosh D. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models. Biometrics. 2007;63:1079–1088. doi: 10.1111/j.1541-0420.2007.00799.x.
  17. Martinez J. G., Carroll R. J., Muller S., Sampson J. N., Chatterjee N. A note on the effect on power of score tests via dimension reduction by penalized regression under the null. The International Journal of Biostatistics. 2010;6:12. doi: 10.2202/1557-4679.1231.
  18. McCullagh P., Nelder J. A. Generalized Linear Models. New York: Chapman & Hall/CRC; 1989.
  19. Moskvina V., Schmidt K. M. On multiple-testing correction in genome-wide association studies. Genetic Epidemiology. 2008;32:567–573. doi: 10.1002/gepi.20331.
  20. Nyholt D. R. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. The American Journal of Human Genetics. 2004;74:765–769. doi: 10.1086/383251.
  21. Price A. L., Patterson N. J., Plenge R. M., Weinblatt M. E., Shadick N. A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006;38:904–909. doi: 10.1038/ng1847.
  22. Rao C. R. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society. 1948;44:50–57.
  23. Rao C. R. Score test: Historical review and recent developments. Advances in Ranking and Selection, Multiple Comparisons, and Reliability. 2005:3–20.
  24. Rioux J. D., Xavier R. J., Taylor K. D., Silverberg M. S., Goyette P., Huett A., Green T., Kuballa P., Barmada M. M., Datta L. W., and others. Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nature Genetics. 2007;39:596–604. doi: 10.1038/ng2032.
  25. Sullivan P. F., de Geus E. J. C., Willemsen G., James M. R., Smit J. H., Zandbelt T., Arolt V., Baune B. T., Blackwood D., Cichon S., and others. Genome-wide association for major depressive disorder: a possible role for the presynaptic protein piccolo. Molecular Psychiatry. 2008;14:359–375. doi: 10.1038/mp.2008.125.
  26. Thomas D., Langholz B., Clayton D., Pitkäniemi J., Tuomilehto-Wolf E., Tuomilehto J. Empirical Bayes methods for testing associations with large numbers of candidate genes in the presence of environmental risk factors, with applications to HLA associations in IDDM. Annals of Medicine. 1992;24:387–392. doi: 10.3109/07853899209147843.
  27. Vo T. M., Phan J. H., Huynh K. N. T., Wang M. D. Reproducibility of differential gene detection across multiple microarray studies. In: Engineering in Medicine and Biology Society, 2007. 29th Annual International Conference of the IEEE. 2007. pp. 4231–4234.
  28. Wallace C., Newhouse S. J., Braund P., Zhang F., Tobin M., Falchi M., Ahmadi K., Dobson R. J., Marçano A. C. B., Hajat C., and others. Genome-wide association study identifies genes for biomarkers of cardiovascular disease: serum urate and dyslipidemia. The American Journal of Human Genetics. 2008;82:139–149. doi: 10.1016/j.ajhg.2007.11.001.
  29. Wu M. C., Kraft P., Epstein M. P., Taylor D. M., Chanock S. J., Hunter D. J., Lin X. Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics. 2010;86:929. doi: 10.1016/j.ajhg.2010.05.002.
  30. Yeager M., Orr N., Hayes R. B., Jacobs K. B., Kraft P., Wacholder S., Minichiello M. J., Fearnhead P., Yu K., Chatterjee N., and others. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nature Genetics. 2007;39:645–649. doi: 10.1038/ng2022.
  31. Zaykin D. V., Zhivotovsky L. A., Westfall P. H., Weir B. S. Truncated product method for combining p-values. Genetic Epidemiology. 2002;22:170–185. doi: 10.1002/gepi.0042.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
