Abstract
Studying the effects of groups of single nucleotide polymorphisms (SNPs), as in a gene, genetic pathway, or network, can provide novel insight into complex diseases like breast cancer, uncovering new genetic associations and augmenting the information that can be gleaned from studying SNPs individually. Common challenges in set-based genetic association testing include weak effect sizes, correlation between SNPs in a SNP-set, and scarcity of signals, with individual SNP effects often ranging from extremely sparse to moderately sparse in number. Motivated by these challenges, we propose the Generalized Berk-Jones (GBJ) test for the association between a SNP-set and outcome. The GBJ extends the Berk-Jones statistic by accounting for correlation among SNPs, and it provides advantages over the Generalized Higher Criticism test when signals in a SNP-set are moderately sparse. We also provide an analytic p-value calculation for SNP-sets of any finite size, and we develop an omnibus statistic that is robust to the degree of signal sparsity. An additional advantage of our work is the ability to conduct inference using individual SNP summary statistics from a genome-wide association study (GWAS). We evaluate the finite sample performance of the GBJ through simulation and apply the method to identify breast cancer risk genes in a GWAS conducted by the Cancer Genetic Markers of Susceptibility Consortium. Our results suggest evidence of association between FGFR2 and breast cancer and also identify other potential susceptibility genes, complementing conventional SNP-level analysis.
Keywords: Breast cancer, FGFR2 gene, Gene-level test, Generalized higher criticism, Sparse alternative
1. Introduction
Genome-Wide Association Studies (GWAS) have been successful in identifying the associations between thousands of Single Nucleotide Polymorphisms (SNPs) and a variety of complex traits (Manolio et al. 2009). A traditional GWAS analysis tests for the effect of each SNP separately, and this approach has shown that individual SNP effects are often weak across the genome (Visscher et al. 2012). Recently, set-based tests that jointly analyze a group of SNPs - e.g. SNPs in a gene, pathway, or network - have become increasingly popular as complementary tools that can boost analysis power in GWAS (Wu et al. 2010). These tests are also standard for rare variant analysis in whole genome sequencing studies (Lee et al. 2014).
SNPs can be aggregated into sets based on a variety of genomic features. For example, they can be grouped by physical position, such as location in a gene or Linkage Disequilibrium (LD) block, or similar biological functions, such as membership in a genetic pathway or protein network. Set-based analyses then allow for some natural advantages over individual SNP methods. Besides reducing the number of multiple comparisons across the genome, SNP-set methods can increase power by pooling sparse and weak effects into a stronger aggregated signal, as well as by incorporating biological information into the test (Wu et al. 2011). In addition, set-based interpretations of association may be more meaningful than their single-marker counterparts, such as in a gene-level or pathway-level analysis.
A number of set-based tests for genetic association studies have been developed in recent years, including burden tests (Li and Leal 2008), the Sequence Kernel Association Test (SKAT) (Wu et al. 2011), the minimum p-value test (MinP) (Conneely and Boehnke 2007), and the Generalized Higher Criticism (GHC) (Barnett et al. 2017). SKAT and burden tests are examples of methods more suitable for detecting dense signals. If the signals reside in only a few SNPs that are not correlated with noise SNPs, then the power of SKAT and burden tests will suffer.
While certain SNP-sets may contain a large number of signals, it is more common that genomic constructs formed with GWAS data will have only a few signal SNPs. Interestingly, within tests designed for this sparse alternative setting, there are still subtle differences in performance. Under extreme sparsity, as in the case of only one or very few signals in the entire set, the minimum p-value test and the GHC have good power for detecting a SNP-set effect. However GHC and MinP can lose power under moderate sparsity settings, which are relatively common in gene and pathway level analyses of GWAS data. For example, in the Cancer Genetic Markers of Susceptibility (CGEMS) GWAS for breast cancer risk (Hunter et al. 2007), four out of 42 SNPs in the FGFR2 gene, a suspected breast cancer risk locus, showed strong evidence of association without reaching genome-wide significance. It is hence of substantial interest to develop testing procedures that can reliably detect associations across a range of alternatives in the sparse signal regime.
When factors in a set are independent, several goodness-of-fit type methods have been proposed to perform set-based tests in the presence of sparse signals (Donoho and Jin 2004; Jager and Wellner 2007; Walther 2013). These methods test for the effect of a set by aggregating evidence from marginal test statistics, and they have been shown to possess attractive asymptotic properties when the size of a set goes to infinity. Specifically, they reach a so-called detection boundary when signals are sparse. In a certain sense, they are able to detect the weakest signals detectable by any statistical procedure under the sparse alternative. The class of tests with this ability includes the Higher Criticism (HC) (Donoho and Jin 2004) and the Berk-Jones (BJ) (Berk and Jones 1979). Compared to the HC and BJ tests, the minimum p-value test, for example, is known to attain the detection boundary over a smaller portion of the sparse regime. In terms of finite sample performance, it has been demonstrated through simulation that Berk-Jones outperforms Higher Criticism over a range of moderately sparse alternatives when marginal test statistics are independent (Walther 2013; Li and Siegmund 2015). Donoho and Jin (2004) and Moscovich-Eiger et al. (2016) provide explanations for this result by showing that HC disproportionately weights evidence from the most extreme observed marginal test statistic, at the cost of losing sensitivity to detect signals in other locations.
However, a direct application of BJ to SNP-set testing is not desirable, as standard p-value calculations for BJ rely on independence between elements in a set (Moscovich-Eiger and Nadler 2017). This assumption is violated by LD-induced correlation between neighboring SNPs in a SNP-set. In addition, we will see that even if we correct the inference procedure of the Berk-Jones, the power of BJ can fall dramatically in high LD settings.
To overcome the challenges posed by correlated SNPs, this paper proposes the Generalized Berk-Jones (GBJ) statistic for testing the association between a SNP-set and outcome. GBJ accounts for LD among SNPs in a set while still preserving the ability to detect moderately sparse and weak signals in finite samples. In fact, GBJ reduces to the Berk-Jones statistic when all SNPs in a set are independent. GBJ can also be applied to SNP-set tests using either individual-level genotype data or GWAS summary statistics from individual SNP analysis. To facilitate use, we additionally provide an analytic p-value calculation for GBJ. Our method is more computationally efficient than permutation and is shown to be accurate even at the extremely small levels required for genome-wide significance.
Additional insight into the strengths and weaknesses of GBJ is provided by studying the rejection regions of SNP-set tests developed for sparse alternatives. The rejection regions allow us to quantitatively describe how the power of each test is susceptible to changes in parameters such as the amount of correlation between SNPs or the size of the SNP-set. Since in practice we never have knowledge of the type of alternative, we also propose an omnibus test that combines GBJ, GHC, MinP, and SKAT for added robustness to different degrees of sparsity. An extensive simulation study demonstrates that GBJ outperforms alternative methods in testing SNP-set effects when signals are weak and moderately sparse, and we also show that the omnibus test is robust to a wide range of sparsity levels.
The remainder of the paper is organized as follows. Section 2 discusses the SNP-set testing framework using both individual-level data and GWAS summary statistics. In Section 3 we propose the Generalized Berk-Jones statistic for testing the association between a SNP-set and outcome. We also provide an analytic p-value calculation for GBJ and develop the omnibus test. Section 4 compares the rejection regions of GBJ and other tests designed for the sparse regime. In Section 5 we demonstrate the finite sample performance of GBJ through simulation. Section 6 presents an analysis of the CGEMS data, and we conclude with a discussion in Section 7.
2. The SNP-Set Testing Framework
2.1. Individual-Level Genotype Data
We begin by describing the SNP-set testing framework using individual-level data on genotype, outcome, and other covariates for n total subjects. Suppose for subject i, i = 1,…,n, we observe the outcome Yi, a genotype vector Gi = (Gi1,…,Gid)T of d SNPs in a SNP-set, and a vector of q covariates Xi = (Xi1,…,Xiq)T. Assume that Yi conditional on (Gi, Xi) follows a distribution in the exponential family (McCullagh and Nelder 1989) with the density function f(Yi) = exp{(Yiθi − b(θi))/ai(ϕ)+c(Yi, ϕ)}, where ai(·), b(·), and c(·) are known functions, θi is a canonical parameter, and ϕ is a dispersion parameter. Let μi = E(Yi|Gi, Xi) denote the conditional mean of Yi and assume it follows the Generalized Linear Model (GLM)
$$g(\mu_i) = X_i^T \alpha + G_i^T \beta,$$
where g(·) is a differentiable monotone link function, α is the q × 1 vector of covariate effects, and β = (β1,…,βd)T is the d × 1 vector of SNP effects. For simplicity, we consider only canonical link functions here. In matrix notation, the data take the form Y = (Y1, …, Yn)T, Gn×d=[G1, …, Gn]T, and Xn×q=[X1, …, Xn]T.
The null hypothesis of no association between a SNP-set and outcome, after controlling for covariates, is given by H0 : β = 0. Both the size of the set d and the sparsity of signals can vary greatly between sets, e.g. from gene to gene, and the number of nonzero βj is unknown. Our aim is to develop a test suitable for different levels of sparsity while also accounting for the correlation among individual SNP test statistics.
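A marginal score statistic for βj, j = 1,…,d, is
$$Z_j = \frac{G_{.j}^T (Y - \hat{\mu}_0)}{\sqrt{G_{.j}^T P G_{.j}}},$$
where G.j denotes the jth column vector of G, P = W − WX(XTWX)−1XTW is the projection matrix, W = diag{a1(ϕ)ν(μ̂01), …, an(ϕ)ν(μ̂0n)}, μ̂0 = g−1(Xα̂0), α̂0 is the MLE of α under the null hypothesis, and ν(μi) = b″(θi) is the variance function.
Under the null, the d marginal score test statistics have an asymptotic multivariate normal distribution
$$Z = (Z_1, \dots, Z_d)^T \sim MVN(\mathbf{0}, \Sigma),$$
where Σjj = 1 for all j, and for j ≠ k we can estimate (Barnett et al. 2017)
$$\hat{\Sigma}_{jk} = \frac{G_{.j}^T P G_{.k}}{\sqrt{G_{.j}^T P G_{.j}}\,\sqrt{G_{.k}^T P G_{.k}}}. \tag{1}$$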
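As a concrete illustration of the score statistics and of Equation (1), the following is a minimal sketch for the simplest special case only: a normally distributed outcome with an intercept as the sole covariate, so that W is a constant multiple of the identity and P reduces to a scaled centering matrix. The function name `score_stats_intercept_only` is hypothetical, and the sketch assumes that no SNP is monomorphic in the sample.

```python
import math

def score_stats_intercept_only(y, G):
    """Marginal score statistics Z and the estimate of Sigma from Equation (1).

    Assumes a linear model with an intercept as the only covariate, so that
    P is proportional to the centering matrix I - 11^T/n.
    y: list of n outcomes; G: list of n genotype rows, each of length d.
    """
    n, d = len(y), len(G[0])
    ybar = sum(y) / n
    resid = [yi - ybar for yi in y]                # Y - mu_hat_0 under the null
    sigma2 = sum(r * r for r in resid) / (n - 1)   # null estimate of residual variance
    col_means = [sum(row[j] for row in G) / n for j in range(d)]
    Gc = [[row[j] - col_means[j] for j in range(d)] for row in G]
    # Centered cross-products; G_j^T P G_k equals sigma2 * cross[j][k] here.
    cross = [[sum(Gc[i][j] * Gc[i][k] for i in range(n)) for k in range(d)]
             for j in range(d)]
    Z = [sum(G[i][j] * resid[i] for i in range(n)) / math.sqrt(sigma2 * cross[j][j])
         for j in range(d)]
    # The scale sigma2 cancels in the ratio of Equation (1).
    Sigma = [[cross[j][k] / math.sqrt(cross[j][j] * cross[k][k]) for k in range(d)]
             for j in range(d)]
    return Z, Sigma
```

In this special case the estimate of Σ is simply the sample correlation matrix of the genotypes; additional covariates or a non-Gaussian outcome would require the full projection matrix P.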
2.2. GWAS Summary Statistics
Many GWAS may not release individual-level data due to logistical challenges or data confidentiality agreements. Instead it is much more likely that a marginal test statistic for association with the outcome is available for each individual SNP (Pasaniuc and Price 2016). It is hence of great interest to be able to perform SNP-set testing using precomputed Zj from across the genome. Thus we propose an estimate of Σ for precomputed summary statistics.
Suppose that the original individual-level genotypes G and other covariates X are not available. We would like to approximate (1) with external data possessing a similar distribution. The large consortiums that generate many summary statistic datasets often include only principal components and a few other covariates in X. A substitute for the principal components and genotype data could be generated from publicly available reference panels, for example from the 1000 Genomes dataset (1000 Genomes Project Consortium 2015).
Assume now that we possess a matrix of reference genotype data from nr subjects of the same ethnicity at the same d SNPs in our SNP-set of interest. Denote this matrix by G(r) = [G(r).1, …, G(r).d], where G(r).j refers to the nr × 1 genotype vector for SNP j from the reference panel. Assume also that we are able to calculate or have been supplied with principal components derived from the reference panel. These external data can act as a proxy for the corresponding terms in Equation (1). That is, replace G.j and X with G(r).j and X(r), where X(r) = [1, PC1, …, PCm], PC1, …, PCm are the first m principal component vectors calculated from the reference panel, and m is the same number of principal components as was used to control for population stratification (Price et al. 2006) in the original GWAS analysis of the data.
Then the last term to approximate is W, which we propose to estimate by setting each μ̂0i equal to the sample mean of the outcome. For a normally distributed outcome, this is exact as ν(·) = 1. For a binary outcome, population stratification is generally the primary confounder of the SNP-outcome relationship in GWAS, and μ̂0i often varies slowly with the principal components, so this approximation for W is practically reasonable for replicating the structure of the original data. Ultimately we are approximating G.jTPG.k in the numerator of (1) by the same quadratic form computed from G(r), X(r), and the approximated W, up to a scale parameter. The denominator is similarly approximated, with the scale terms cancelling out in the final result. We have found that this strategy performs well in practice (see Supplementary Materials) even without a substitute for the non-principal component covariates in X, as the genotypes and principal components often provide the main influences on Σ.
3. The Generalized Berk-Jones Test for SNP-Set Effects
3.1. The Berk-Jones Statistic
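We briefly review the Berk-Jones statistic in this section to help introduce the Generalized Berk-Jones statistic in Section 3.2. The BJ statistic is designed to test for H0 : β = 0 against the alternative that a nonempty subset of the βj are nonzero, assuming the marginal test statistics are independent. Let Φ̄(t) = 1 − Φ(t) denote the survival function of a standard normal random variable and Φ̄−1(t) denote its inverse. Let |Z|(j) denote the order statistics of the vector that results from applying the absolute value operator to each element of Z, so that |Z|(1) is the smallest value of Z in magnitude.
Set S(t) = Σj 1(|Zj| ≥ t), where the sum runs over j = 1,…,d, which is the number of marginal test statistics with a magnitude greater than or equal to some threshold t. For a fixed t ≥ 0, if Zj ~ N(0, 1) independently for all j, then S(t) has a binomial distribution with size d and mean parameter π = 2Φ̄(t). This viewpoint motivates the Berk-Jones statistic for independent observations as:
$$
\begin{aligned}
BJ_d &= \max_{1 \le j \le d/2} \log\left[\frac{\Pr\{S(|Z|_{(d-j+1)}) = j \mid E(Z) = \hat{\mu}_j \cdot J_d,\ \mathrm{cov}(Z) = I\}}{\Pr\{S(|Z|_{(d-j+1)}) = j \mid E(Z) = 0 \cdot J_d,\ \mathrm{cov}(Z) = I\}}\right] \cdot 1\left\{2\bar{\Phi}(|Z|_{(d-j+1)}) < \frac{j}{d}\right\} \\
&= \max_{1 \le j \le d/2} d\left[\frac{j}{d}\log\left\{\frac{j/d}{2\bar{\Phi}(|Z|_{(d-j+1)})}\right\} + \left(1 - \frac{j}{d}\right)\log\left\{\frac{1 - j/d}{1 - 2\bar{\Phi}(|Z|_{(d-j+1)})}\right\}\right] \cdot 1\left\{2\bar{\Phi}(|Z|_{(d-j+1)}) < \frac{j}{d}\right\}
\end{aligned} \tag{2}
$$
where Jd is the d × 1 vector of ones and μ̂j > 0 solves the equation
$$\frac{j}{d} = 1 - \left\{\Phi(|Z|_{(d-j+1)} - \hat{\mu}_j) - \Phi(-|Z|_{(d-j+1)} - \hat{\mu}_j)\right\} \tag{3}$$
and the second line uses the characterization of S(t) as a binomial random variable. BJ and similar tests are also commonly presented using the empirical distribution function of the p-values associated with Z1, …, Zd (Donoho and Jin 2004).
We see that BJ can roughly be explained as the maximum of a one-sided likelihood ratio test on the mean parameter of S(t), performed over the larger half of observed test statistic magnitudes. At t = |Z|(d−j+1) we have π = 2Φ̄(|Z|(d−j+1)) under the binomial likelihood null, and we have π = j/d under the binomial likelihood alternative. We say binomial likelihood null and alternative to make clear that we are talking about an interpretation of the Berk-Jones statistic and to distinguish from the actual set-based null and alternative hypotheses being tested.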
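The binomial likelihood ratio interpretation can be made concrete with a short sketch that evaluates the closed form of the one-sided Berk-Jones statistic for independent test statistics. The function names `norm_sf` and `berk_jones` are hypothetical, and the code is an illustration under the independence assumption rather than a reference implementation.

```python
import math

def norm_sf(x):
    """Standard normal survival function, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def berk_jones(z):
    """One-sided Berk-Jones statistic for independent marginal statistics z."""
    d = len(z)
    abs_sorted = sorted(abs(v) for v in z)   # |Z|_(1) <= ... <= |Z|_(d)
    best = 0.0
    for j in range(1, d // 2 + 1):           # larger half of the magnitudes
        t = abs_sorted[d - j]                # |Z|_(d-j+1) in 1-based indexing
        pi0 = 2.0 * norm_sf(t)               # mean parameter under the binomial null
        pi1 = j / d                          # mean parameter under the alternative
        if pi0 >= pi1:                       # one-sided indicator not satisfied
            continue
        if pi0 == 0.0:                       # tail probability underflowed to zero
            return math.inf
        lr = d * (pi1 * math.log(pi1 / pi0)
                  + (1.0 - pi1) * math.log((1.0 - pi1) / (1.0 - pi0)))
        best = max(best, lr)
    return best
```

When no index j satisfies the one-sided condition, the function returns 0, which matches the role of the indicator in the definition.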
Note that the Higher Criticism test differs from the Berk-Jones by replacing the likelihood ratio statistic in (2) with the corresponding Pearson Chi-square statistic. Let k be the number of causal SNPs in a set. The sparse regime is designated as k < d^{1/2}, and we call d^{1/4} < k < d^{1/2} moderately sparse, with k ≤ d^{1/4} referred to as extremely sparse. Donoho and Jin (2004) showed that, when the Zj are all mutually independent, both HC and BJ are able to reach the detection boundary over the entire sparse signal regime as d → ∞. Walther (2013) and Li and Siegmund (2015) showed that the BJ statistic generally has better power than HC when the size of the set d is finite and the signals are moderately sparse.
If Z1, …, Zd are correlated, as they will be for test statistics arising from neighboring SNPs in a gene, then S(t) no longer has a binomial distribution under the null. In this case, the standard Berk-Jones statistic no longer has a meaningful interpretation, and we may expect it to lose efficiency. In fact, we will show later that the rejection region of the Berk-Jones has a less desirable shape under various correlation structures, leading to a significant loss in power when the test statistics are not independent. Therefore we are interested in developing a modified BJ statistic that can account for correlation among the marginal test statistics in a set and thus possesses rejection regions that are more robust to arbitrary correlation structures.
3.2. The Generalized Berk-Jones Statistic
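We now propose the Generalized Berk-Jones statistic for testing the association between a SNP-set and outcome. Following the spirit of Berk-Jones, GBJ considers a likelihood ratio type statistic on the mean parameter of S(t), but the key difference is that GBJ explicitly accounts for the correlation structure of Z1, …, Zd. More precisely, we define the GBJ statistic as:
$$GBJ_d = \max_{1 \le j \le d/2} \log\left[\frac{\Pr\{S(|Z|_{(d-j+1)}) = j \mid E(Z) = \hat{\mu}_j \cdot J_d,\ \mathrm{cov}(Z) = \Sigma\}}{\Pr\{S(|Z|_{(d-j+1)}) = j \mid E(Z) = 0 \cdot J_d,\ \mathrm{cov}(Z) = \Sigma\}}\right] \cdot 1\left\{2\bar{\Phi}(|Z|_{(d-j+1)}) < \frac{j}{d}\right\},$$
where μ̂j is the solution of Equation (3).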
While BJ and GBJ are related to goodness-of-fit tests, in keeping with the signal detection setting of genetic association studies, we have focused on one-sided (Berk and Jones 1979) versions of the statistics intended to detect an overrepresentation of test statistics with large magnitude.
When the Zj are correlated, S(t) follows either an underdispersed or overdispersed binomial distribution instead of the standard binomial. However, finding the exact distribution of S(t) when Σ ≠ I is difficult. For a general Σ, computing Pr{S(t) = m} requires iterating through d choose m terms and is very time consuming. In special cases, such as when Σ has an exchangeable correlation structure with Σ = (1−ρ)I + ρ11T, the calculation is much easier (Tong 2012). However, these scenarios occur rarely, if ever, in practice.
We propose to approximate the full distribution of S(t) using an Extended Beta-Binomial (EBB) distribution (Prentice 1986). The Extended Beta-Binomial is a reparameterization and extension of the standard Beta-Binomial (α, β) distribution with the standard Beta-Binomial being a special case of the EBB. A random variable V ~ EBB (d, λ, γ) has probability mass function
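$$\Pr(V = v) = \binom{d}{v} \frac{\prod_{k=0}^{v-1}(\lambda + \gamma k)\ \prod_{k=0}^{d-v-1}(1 - \lambda + \gamma k)}{\prod_{k=0}^{d-1}(1 + \gamma k)}, \qquad v = 0, 1, \dots, d, \tag{4}$$
where we follow the convention that a product running from k = 0 to a equals 1 for a < 0. The mean of V is given by E(V) = dλ and the variance is Var(V) = dλ(1-λ){1+(d-1)γ(1+γ)−1}.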
The Extended Beta-Binomial distribution reduces to the Beta-Binomial distribution if we set λ = α(α+β)−1 and γ = (α+β)−1 for α,β> 0. Because the standard Beta-Binomial distribution requires α,β> 0, it cannot accommodate underdispersion and never reduces to the binomial distribution. In contrast, the EBB allows for both overdispersion and underdispersion, and it reduces exactly to the binomial distribution when γ = 0. This mechanism allows our GBJ statistic to reduce to the Berk-Jones when there is no correlation among the observations.
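A short sketch of the Extended Beta-Binomial probability mass function in the product form of Prentice (1986) may help make the reduction to the binomial concrete. The function name `ebb_pmf` is hypothetical, and the code assumes that γ lies in the range where all factors are nonnegative, that is, γ ≥ max{−λ/(d−1), −(1−λ)/(d−1)}.

```python
import math

def ebb_pmf(v, d, lam, gamma):
    """Pr(V = v) for V ~ EBB(d, lam, gamma); empty products are taken to be 1."""
    num = 1.0
    for k in range(v):              # k = 0, ..., v-1
        num *= lam + gamma * k
    for k in range(d - v):          # k = 0, ..., d-v-1
        num *= 1.0 - lam + gamma * k
    den = 1.0
    for k in range(d):              # k = 0, ..., d-1
        den *= 1.0 + gamma * k
    return math.comb(d, v) * num / den
```

Setting `gamma = 0` makes every factor in the denominator equal to 1 and leaves the binomial probability, while a small negative `gamma` gives a variance below the binomial variance.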
3.3. Calculation of the Generalized Berk-Jones Statistic
We now describe more precisely the mechanics of calculating the GBJ statistic. To begin, check if the condition 2Φ̄(|Z|(d−j+1)) < j/d is satisfied at any j ≤ d/2. If the condition is never satisfied, then the observed value of GBJ is 0 and we do not need to perform any more computation. The following steps should only be taken on indices j ≤ d/2 where the condition is satisfied.
At each qualifying j, we approximate the distribution of S(|Z|(d−j+1)) by an Extended Beta-Binomial random variable under both the binomial likelihood null and the binomial likelihood alternative. Denote these two variables by V0,j ~ EBB(d, λ0,j, γ0,j) and Va,j ~ EBB(d, λa,j, γa,j). We solve for (λ0,j, γ0,j) through the moment matching equations
$$E_0\{S(|Z|_{(d-j+1)})\} = d\lambda_{0,j}, \qquad Var_0\{S(|Z|_{(d-j+1)})\} = d\lambda_{0,j}(1 - \lambda_{0,j})\{1 + (d-1)\gamma_{0,j}(1 + \gamma_{0,j})^{-1}\},$$
where E0 and Var0 denote the expectation and variance conditional on Z ~ MVN(0·Jd, Σ). Similarly, we solve for (λa,j, γa,j) using the same equations except with Ea and Vara, which are the expectation and variance conditional on Z ~ MVN(μ̂j·Jd, Σ).
The first moment matching equation is simple to solve, since clearly E0{S(|Z|(d−j+1))} = 2dΦ̄(|Z|(d−j+1)) and Ea{S(|Z|(d−j+1))} = j. The variance term in the second equation is more difficult. We can use Theorem 1 of Barnett et al. (2017) for Var0. For Vara, we need the following theorem:
Theorem 1. Let Z ~ MVN(μ·Jd, Σ), and take be the probabilists’ Hermite polynomials. Define , where ρk.l is the (k, l) element of Σ, and let ρk,k = 1 for all k = 1, 2, … d. Then the variance of S(t) is:
The proof of this theorem is given in the Supplementary Materials. The terms in the infinite sum shrink very quickly, and in practice we see good accuracy using only the first ten.
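To illustrate how a Hermite polynomial series of this kind can be evaluated, the sketch below computes Var{S(t)} for Z ~ MVN(μ·Jd, Σ) using Mehler's expansion of the bivariate normal density. This is one standard route to such a series and is offered as an assumption-laden illustration; it is not claimed to reproduce the exact expression or indexing of Theorem 1. The function names and the default of 20 series terms are hypothetical choices.

```python
import math

def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def _hermite(x, n_max):
    """Probabilists' Hermite polynomials He_0(x), ..., He_{n_max}(x)."""
    h = [1.0, x]
    for m in range(1, n_max):
        h.append(x * h[m] - m * h[m - 1])   # He_{m+1} = x He_m - m He_{m-1}
    return h[: n_max + 1]

def var_exceedance_count(t, mu, Sigma, n_terms=20):
    """Var{S(t)}, S(t) = #{k : |Z_k| >= t}, with common mean mu and correlation Sigma."""
    d = len(Sigma)
    lo, hi = -t - mu, t - mu                # |Z_k| < t  iff  lo < Z_k - mu < hi
    p = 1.0 - (_cdf(hi) - _cdf(lo))         # Pr(|Z_k| >= t)
    total = d * p * (1.0 - p)               # sum of the d marginal variances
    he_lo, he_hi = _hermite(lo, n_terms), _hermite(hi, n_terms)
    for n in range(1, n_terms + 1):
        # Integral of He_n(x) * pdf(x) over (lo, hi)
        c_n = he_lo[n - 1] * _pdf(lo) - he_hi[n - 1] * _pdf(hi)
        # Sum of the n-th powers of all off-diagonal correlations
        r_n = sum(Sigma[k][l] ** n for k in range(d) for l in range(d) if k != l)
        total += r_n * c_n * c_n / math.factorial(n)
    return total
```

With Σ = I every off-diagonal power sum is zero, and the function returns the binomial variance dp(1 − p).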
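After matching all four parameters (λ0,j, γ0,j, λa,j, γa,j), we calculate
$$\log\left\{\frac{\Pr(V_{a,j} = j)}{\Pr(V_{0,j} = j)}\right\},$$
where V0,j and Va,j are the Extended Beta-Binomial variables matched under the binomial likelihood null and alternative, respectively.
The maximum value of this log ratio among all qualifying j is then the observed Generalized Berk-Jones statistic.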
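The moment matching step and the resulting log ratio can be sketched as follows, taking the null and alternative means and variances of S(|Z|(d−j+1)) as given inputs. The helper names `match_ebb` and `gbj_term` are hypothetical. The sketch assumes d ≥ 2 and that the matched γ falls inside the valid Extended Beta-Binomial range; a practical implementation would need to guard against values outside that range.

```python
import math

def ebb_pmf(v, d, lam, gamma):
    """Pr(V = v) for V ~ EBB(d, lam, gamma); empty products are taken to be 1."""
    num = 1.0
    for k in range(v):
        num *= lam + gamma * k
    for k in range(d - v):
        num *= 1.0 - lam + gamma * k
    den = 1.0
    for k in range(d):
        den *= 1.0 + gamma * k
    return math.comb(d, v) * num / den

def match_ebb(d, mean, var):
    """Solve E(V) = d*lam and Var(V) = d*lam*(1-lam)*{1 + (d-1)*gamma/(1+gamma)}."""
    lam = mean / d
    ratio = (var / (d * lam * (1.0 - lam)) - 1.0) / (d - 1)   # equals gamma/(1+gamma)
    gamma = ratio / (1.0 - ratio)
    return lam, gamma

def gbj_term(j, d, mean0, var0, var_a):
    """Log ratio of matched EBB probabilities of observing j exceedances.

    mean0, var0: null mean and variance of S at the threshold |Z|_(d-j+1).
    var_a: variance under the alternative, where the mean equals j.
    """
    lam0, gamma0 = match_ebb(d, mean0, var0)
    lam_a, gamma_a = match_ebb(d, float(j), var_a)
    return math.log(ebb_pmf(j, d, lam_a, gamma_a) / ebb_pmf(j, d, lam0, gamma0))
```

When both variances equal their binomial values, the matched γ is zero and `gbj_term` returns the corresponding Berk-Jones likelihood ratio term, which mirrors the reduction of GBJ to BJ under independence.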
3.4. Analytic P-value Calculation
Let Gd be a general supremum-based global statistic such as the GBJ statistic. Suppose Gd is constructed using independent marginal test statistics Z1,…,Zd. Denote the observed value of this statistic by g, where higher values of g indicate more evidence for the alternative. As noted by Moscovich-Eiger and Nadler (2017), the p-value for g can often be written
$$\text{p-value} = 1 - \Pr\{|Z|_{(1)} \le b_1, |Z|_{(2)} \le b_2, \dots, |Z|_{(d)} \le b_d \mid Z \sim MVN(0 \cdot J_d, I)\},$$
where 0≤b1≤b2≤…≤bd are ‘boundary points’ that come from inversion of the test statistic. The points b1, b2, …, bd will depend on g and d and will be different for different choices of a global statistic, but for the sake of presentation, we will suppress this dependency in the notation. Moscovich-Eiger and Nadler (2017) proposed a method that can calculate the p-value of Gd very quickly if Z1,…,Zd are independent. However, when Z1,…,Zd are correlated, their techniques for a fast calculation are not applicable.
An exact p-value for GBJ, and for any global test applied to correlated observations, must take into account the covariance structure of Z. The p-value for these tests is then
$$\text{p-value} = 1 - \Pr\{|Z|_{(1)} \le b_1, |Z|_{(2)} \le b_2, \dots, |Z|_{(d)} \le b_d \mid Z \sim MVN(0 \cdot J_d, \Sigma)\}, \tag{5}$$
where bj is understood to depend on Σ as well. We are unaware of any computationally feasible expressions to calculate the joint distribution of the order statistics |Z|(1),…,|Z|(d) when d is moderate or large. The Supplementary Materials provide a procedure to compute this probability by partitioning the region into d! separate sections, but the method is very computationally intensive and not feasible for use with d > 10.
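However, an alternative way to write the rejection region of Equation (5) is
$$\Pr\{|Z|_{(1)} \le b_1, \dots, |Z|_{(d)} \le b_d\} = \Pr\{S(b_1) \le d - 1, S(b_2) \le d - 2, \dots, S(b_d) \le 0\}. \tag{6}$$
The right hand side of (6) suggests that the quantity can be calculated recursively. Indeed, define b0 = 0 and qj,a = Pr{S(bj) = a, S(bj−1) ≤ d − (j−1), …, S(b1) ≤ d − 1} for a ≤ d − j. The quantity in (6) is just qd,0 and can be calculated recursively as
$$
\begin{aligned}
q_{j,a} &= \sum_{m=a}^{d-j+1} \Pr\{S(b_j) = a \mid S(b_{j-1}) = m, S(b_{j-2}) \le d - j + 2, \dots, S(b_1) \le d - 1\} \cdot q_{j-1,m} \\
&\approx \sum_{m=a}^{d-j+1} \Pr\{S(b_j) = a \mid S(b_{j-1}) = m\} \cdot q_{j-1,m}
\end{aligned} \tag{7}
$$
for j > 1, with q1,a = Pr{S(b1) = a|S(b0) = d}. The last line of (7) becomes an equality when all the test statistics are independent, and the approximation appears effective for the genetic data considered in our manuscript (see Supplementary Materials). We further use an EBB approximation for the distribution of S(bj) conditional on S(bj−1) = m, with the equations
$$E\{S(b_j) \mid S(b_{j-1}) = m\} = m\lambda_j, \qquad Var\{S(b_j) \mid S(b_{j-1}) = m\} = m\lambda_j(1 - \lambda_j)\{1 + (m-1)\gamma_j(1 + \gamma_j)^{-1}\}$$
to match the moments. Finally, set Pr{S(bj) = a | S(bj−1) = m} := Pr(Vj = a) where Vj ~ EBB(m, λj, γj). Evaluation of Pr(|Zk|, |Zl| ≥ bj) follows from steps similar to the proof of Theorem 1.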
Note that we can generalize the scheme described above to calculate p-values for many different supremum-based global tests by adopting the general approach of Moscovich-Eiger and Nadler (2017). As long as the test statistic can be inverted to create the bounds b1, …, bd, we can use equation (7) to calculate its p-value when applied to correlated observations. In particular, we can use this procedure to perform p-value calculations for the HC, GHC, BJ, and GBJ statistics.
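The recursion is easiest to see in the independent case, where the conditional distribution of S(bj) given S(bj−1) = m is exactly binomial and the recursion is exact. The sketch below covers only that case; for correlated statistics, the binomial step would be replaced by the Extended Beta-Binomial approximation described above. The function names are hypothetical.

```python
import math

def two_sided_tail(b):
    """Pr(|Z| >= b) for a standard normal Z."""
    return math.erfc(b / math.sqrt(2.0))

def binom_pmf(a, m, p):
    return math.comb(m, a) * p ** a * (1.0 - p) ** (m - a)

def pvalue_independent(bounds):
    """1 - Pr(|Z|_(j) <= b_j for all j) for independent N(0, 1) statistics.

    bounds: nondecreasing boundary points b_1 <= ... <= b_d.
    """
    d = len(bounds)
    q = [0.0] * (d + 1)
    q[d] = 1.0                     # S(b_0) = d with probability 1, since b_0 = 0
    prev_tail = 1.0                # Pr(|Z| >= b_0)
    for j in range(1, d + 1):
        tail = two_sided_tail(bounds[j - 1])
        # Chance that a statistic exceeding b_{j-1} also exceeds b_j
        p = tail / prev_tail if prev_tail > 0.0 else 0.0
        new_q = [0.0] * (d + 1)
        for a in range(0, d - j + 1):          # enforce S(b_j) <= d - j
            new_q[a] = sum(binom_pmf(a, m, p) * q[m] for m in range(a, d + 1))
        q, prev_tail = new_q, tail
    return 1.0 - q[0]              # q[0] now holds q_{d,0}
```

For d = 1 the function returns the usual two-sided normal p-value at b1, which gives a quick sanity check.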
3.5. The Omnibus Test
While we will show that the Generalized Berk-Jones test possesses an attractive finite sample rejection region when signals are moderately sparse, GBJ may also lose power in the presence of very sparse or dense signals. As SNP-set inference involves testing for a composite alternative H1 : β ≠ 0, there is no uniformly optimal test for both sparse and dense alternatives. Since signal sparsity varies between genes, the best test will also change from gene to gene, but it is unknown prior to scanning the genome. Thus we propose an omnibus test that offers robust power over a range of different sparsity levels.
The omnibus test is constructed by combining the SKAT, GBJ, GHC, and minimum p-value statistics. The motivation for choosing these four methods is to combine tests that are known to have good power when signals are dense, moderately sparse, very sparse, and the sparsest possible, respectively. The MinP method uses the set’s largest marginal test statistic in magnitude |Z|(d) as a test statistic. When the Zj are independent, Donoho and Jin (2004) showed that MinP asymptotically reaches the same detection boundary as HC and BJ in the very sparse regime k ≤ d^{1/4} but not the moderately sparse regime d^{1/4} < k < d^{1/2}. In finite samples, MinP can have better power than the other three tests when there are only one or two causal SNPs. SKAT aggregates individual-level genotypes through a kernel function and performs a variance component test treating the SNP effects β as random effects. In our notation, the unweighted linear kernel SKAT test statistic can be written as Q = (Y − μ̂0)TGGT(Y − μ̂0). SKAT is known to demonstrate high power when signals in a SNP-set are dense.
The omnibus test first applies each of the four tests to the same SNP-set, and then it carries forward the smallest p-value from the four tests as a test statistic. Specifically, the omnibus test statistic is defined as:
$$OMNI = \min(p_{GBJ}, p_{GHC}, p_{SKAT}, p_{MinP}),$$
where pGBJ, pGHC, pSKAT, and pMinP denote the p-values of the four tests applied on the same SNP-set. As these tests are applied to the same data, the four p-values will be correlated.
Calculations of the p-value for OMNI must again account for the correlation between tests. We employ a Gaussian copula approximation for the joint distribution of the inverse-normal transformed p-values:
$$p_{OMNI} = 1 - \Phi_M\{\Phi^{-1}(1 - OMNI), \dots, \Phi^{-1}(1 - OMNI); R\},$$
where ΦM(·;R) denotes the joint cumulative distribution function of a multivariate normal distribution with mean vector zero and correlation matrix R. The correlation matrix R of the four component test statistics is estimated through parametric bootstrap under the null. For each subject i in the study, we simulate a new outcome based on the null mean μ̂0i. When individual-level data are not available, we take μ̂0i to be the same constant for all subjects as an approximation. Then each of the four tests is applied with the simulated outcome instead of the original one. The original design matrix, or approximated design matrix if working with summary statistics, is used each time. Each of the four p-values is subtracted from 1 and then inverse-normal transformed; under the null hypothesis the four transformed values have marginal normal distributions with mean zero. As we only need to estimate the correlation matrix R, only a small number of parametric bootstrap samples are needed. In practice, this procedure is repeated 100 times, and then we set R equal to the sample correlations of the inverse-normal transformed statistics. We will see that this omnibus test performs well across a variety of settings.
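The copula step can be sketched as follows, assuming the correlation matrix R has already been estimated. To stay within the Python standard library, this sketch replaces the multivariate normal distribution function with a Monte Carlo estimate, which is a substitution made here for illustration and is only adequate for moderate p-values; a numerical multivariate normal integration routine would be needed at genome-wide significance levels. The function names, the number of draws, and the seed are hypothetical, and R is assumed to be positive definite with the minimum p-value strictly between 0 and 1.

```python
import math
import random
from statistics import NormalDist

def cholesky(R):
    """Lower-triangular L with L L^T = R, for a positive definite R."""
    k = len(R)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

def omnibus_pvalue(pvals, R, n_draws=50000, seed=0):
    """Gaussian copula p-value for the minimum of correlated p-values."""
    omni = min(pvals)
    x = NormalDist().inv_cdf(1.0 - omni)     # threshold on the inverse-normal scale
    L = cholesky(R)
    k = len(pvals)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Correlated normal draw T = L e; count draws whose maximum exceeds x
        if any(sum(L[i][j] * e[j] for j in range(i + 1)) > x for i in range(k)):
            exceed += 1
    return exceed / n_draws                  # estimates 1 - Phi_M(x, ..., x; R)
```

With R equal to the identity, the estimate should be close to 1 − (1 − OMNI)^4, the value for four independent tests.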
4. Rejection Region Analysis of Different SNP-Set Tests
We study in this section the finite sample rejection regions for the BJ, GBJ, HC, and GHC tests, and we advocate for viewing these statistics as boundary-defining algorithms. Consider a fixed SNP-set s with size d = ds and correlation structure Σ = Σs and suppose we wish to conduct inference at level α = 0.01. Using the p-value calculation from above, we can employ standard root-finding routines to find the observed value gs that would result in a GBJ p-value of 0.01 for a SNP-set with these characteristics. Then inverting gs to find boundary points b1, b2, …, bds as in Section 3.4 constructs a rejection region in terms of |Z|(1), …, |Z|(ds). That is, if the observed value of |Z|(j) were larger than bj for any j = 1, 2, …, ds, then the GBJ p-value for this set would be less than 0.01. Plotting the bounds for various tests using different values of α, d, and Σ shows us exactly what types of signals a given test is well-powered to detect at level α. For the same setting, a test with smaller bounds is preferred, as it will provide more finite sample power.
To numerically compare the rejection boundaries, consider a simplified model of SNP-set correlation structure where the set is partitioned into only two sections. Let one section be the independence section, where all SNPs in this portion are completely independent of all other SNPs in the set. Let the other section be the correlated section, where all SNPs in this portion have common pairwise correlation ρ with other SNPs in the section. For our numerical study, ρ = 0.3 for the correlated section. We investigate SNP-sets of size d = 20 and 100, correlated sections which contain 50% and 75% of the SNPs, and tests at size α = 0.01. These parameters are chosen to represent reasonable boundaries on the correlation structures seen in common GWAS data; Dawson et al. (2002) estimated that the average r2 between SNPs separated by 100kb is around 0.1.
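The two-section correlation structure used in this numerical study is simple to construct, as in the following sketch. The function name `two_section_sigma` is hypothetical, and placing the correlated SNPs first in the ordering is an arbitrary choice that does not affect the order-statistic-based tests.

```python
def two_section_sigma(d, frac_correlated, rho):
    """Correlation matrix with an exchangeable section and an independent section.

    The first round(d * frac_correlated) SNPs share pairwise correlation rho;
    the remaining SNPs are independent of every other SNP in the set.
    """
    m = round(d * frac_correlated)
    return [[1.0 if i == j else (rho if (i < m and j < m) else 0.0)
             for j in range(d)]
            for i in range(d)]

# Example: d = 20 SNPs, 75% in the correlated section, rho = 0.3
Sigma = two_section_sigma(20, 0.75, 0.3)
```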
The rejection regions for each SNP-set are plotted in Figure 1. At the jth coordinate on the x-axis, if the observed |Z|(j) lies above the boundary of a particular test at that coordinate, then we would reject the null for that test at level α = 0.01. The lines on the graph are added to aid in visualization, but there should be no interpretation of interpolation between the points. It does not make sense to think of the boundary at |Z|(2.5), for example. While standard methods for inference on HC and BJ are invalid in the presence of correlation, valid p-values for these tests can be computed with the same ideas we have introduced for GBJ inference, specifically following equations (5)–(7). Thus we can show that HC and BJ sometimes have much less desirable rejection regions when SNPs in a set are correlated.
Fig. 1.
Rejection region of Berk-Jones, Generalized Berk-Jones, Higher Criticism, and Generalized Higher Criticism tests, plotted according to the order statistics of the absolute values of the test statistics. If the jth smallest test statistic in magnitude is greater than the boundary point for a given test at any point j on the x-axis, then we would reject the null using that test at level α = 0.01. The difference between BJ and GBJ becomes much more pronounced as both the size of the set and the amount of correlation increase.
One of the clearest trends from Figure 1 is that the HC and GHC boundaries are lower for a small region around |Z|(d), and then the BJ and GBJ boundaries quickly become smaller as we move left. This behavior indicates that HC and GHC are better at detecting the sparsest alternatives with only a very few signals, as those signals would almost always manifest as the test statistics with the largest magnitude. In contrast, the plots demonstrate that BJ and GBJ can have more power to detect weaker, less sparse signals that may be more easily found by examining the test statistic which is, say, fifth or tenth largest in magnitude. The boundaries of HC and GHC can drop below BJ and GBJ again for the smallest observed magnitudes, but signals would only be found in these observations if they are particularly dense, a setting that is not the focus of our efforts. The intuition we can glean from this figure is closely aligned with the theoretical development of Donoho and Jin (2004) and the simulations of Li and Siegmund (2015) when the marginal test statistics Zj are independent. These authors showed that HC is attuned to detect sparse signals arising at the very tail of the observed distribution, while BJ has more power as the number of signals rises.
These results also show why BJ is likely to have low power for detecting sparse signals when the level of correlation is high. When 75% of the SNPs are correlated, the rejection boundary for BJ at the largest few observations is the highest by multiple orders of magnitude on the p-value scale. It would not be desirable to apply BJ in these types of settings, as the test loses an extremely large amount of sensitivity to detect signals in the most outlying values. BJ is still likely to be suitable for detecting dense signals in these situations. Here, GBJ acts as a compromise between BJ and GHC under high correlation. GBJ provides a much lower boundary than BJ at the tail in exchange for slightly higher boundaries near the middle. Thus, GBJ can detect both sparse and dense signals in this example. On the other hand, GBJ provides a slightly higher boundary than GHC at the tail in exchange for lower boundaries past the tail, so it trades some power in the extremely sparse regime for more power to detect moderately sparse signals.
We see that choosing a different statistic is essentially choosing a different boundary-setting algorithm, and this choice should ideally be informed by parameters such as the amount of correlation and estimated sparsity level. Ultimately these plots illustrate that there is no single best global test for all types of alternatives. A genome-wide analysis strategy using the omnibus test will be likely to demonstrate robust power across different sparsity settings, correlation structures, and SNP-set sizes. An alternative visualization with plots on the p-value scale is provided in the Supplementary Materials.
5. Simulation Results
5.1. Type I Error of the Generalized Berk-Jones Test
We first illustrate that our p-value calculations for the GBJ and omnibus tests are accurate enough to control the Type I error rate at levels required to declare genome-wide significance of a SNP-set. To replicate the setting of traditional GWAS data, we perform the size simulation on random regions across chromosome 5 that correspond to known gene sizes. We also conduct the size simulation on a high-LD subset of the FGFR2 gene and a low-LD subset of the FGFR2 gene to parse LD-related effects. We choose FGFR2 because it contains both high and low LD regions and because it will later be the most significant gene in our analysis of the CGEMS data. All SNP-sets contain genotypes simulated with HAPGEN2 (Su et al. 2011) using the CEU population from HapMap3 as a reference panel.
In all simulations the outcome is generated as Yi ~ N(0, 1) for i = 1, 2, …, 2000, and we fit the linear regression model (1) with β = 0 and Xi = 1. Each simulation is repeated 40 million times, and we report the Type I error down to 10^-6. Table 1 shows that our GBJ p-value calculation is accurate and maintains the nominal size for correlation structures seen in actual data. The p-value calculation for the omnibus test is slightly anti-conservative at the most stringent significance levels and slightly conservative at larger significance levels.
Table 1.
Type I error of GBJ computed over 40 million simulations. The strong LD setting denoted by SLD refers to eight SNPs from FGFR2 that are highly correlated. The weak LD setting denoted by WLD refers to eight SNPs from FGFR2 that demonstrate only a small amount of correlation with each other. The Chr5 setting refers to over 500 regions on chromosome 5 corresponding to known gene sizes.
| Nominal α | GBJ, WLD | GBJ, SLD | GBJ, Chr5 | OMNI, WLD | OMNI, SLD | OMNI, Chr5 |
|---|---|---|---|---|---|---|
| 1×10^-2 | 9.67×10^-3 | 8.50×10^-3 | 8.45×10^-3 | 7.04×10^-3 | 8.13×10^-3 | 7.04×10^-3 |
| 1×10^-3 | 9.83×10^-4 | 9.19×10^-4 | 8.38×10^-4 | 7.07×10^-4 | 8.87×10^-4 | 6.92×10^-4 |
| 1×10^-4 | 9.84×10^-5 | 9.73×10^-5 | 8.55×10^-5 | 8.26×10^-5 | 1.06×10^-4 | 7.69×10^-5 |
| 1×10^-5 | 9.75×10^-6 | 1.04×10^-5 | 9.25×10^-6 | 1.09×10^-5 | 1.36×10^-5 | 9.87×10^-6 |
| 1×10^-6 | 1.02×10^-6 | 1.10×10^-6 | 1.17×10^-6 | 1.50×10^-6 | 1.65×10^-6 | 1.58×10^-6 |
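When reading Table 1, it can help to know how much Monte Carlo noise to expect at each nominal level. The following is a minimal sketch, assuming that each column is based on 40 million independent null replicates and that a normal approximation to the binomial is adequate; the function name `mc_interval` and the choice of the GBJ, WLD column are illustrative and not part of the original analysis.

```python
import math

N_SIM = 40_000_000  # number of null simulations stated in the text


def mc_interval(alpha, n_sim=N_SIM, z=1.96):
    """Approximate 95% range for an empirical Type I error rate when the
    true rate equals the nominal alpha (normal approximation to the binomial)."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_sim)
    return alpha - z * se, alpha + z * se


# Empirical sizes copied from the "GBJ, WLD" column of Table 1.
gbj_wld = {1e-2: 9.67e-3, 1e-3: 9.83e-4, 1e-4: 9.84e-5, 1e-5: 9.75e-6, 1e-6: 1.02e-6}

for alpha, observed in gbj_wld.items():
    lo, hi = mc_interval(alpha)
    expected_rejections = alpha * N_SIM
    inside = lo <= observed <= hi
    print(f"alpha={alpha:.0e}  expected rejections={expected_rejections:,.0f}  "
          f"range=({lo:.3e}, {hi:.3e})  observed={observed:.2e}  inside={inside}")
```

At the 10^-6 level only about 40 rejections are expected, so the second significant digit of the empirical size is noisy; at the 10^-2 level the simulation error is far smaller, so departures from the nominal level are easier to distinguish from noise.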
5.2. Power of GBJ Under Varying Hypothetical Correlation Structures and Sparsity Settings
To study the power of GBJ, we first conduct simulations under a variety of hypothetical correlation structures and sparsity settings. The performance of GBJ is compared to the minimum p-value (MinP) test, GHC, SKAT, and the omnibus test described in Section 3.5. The MinP p-value is calculated by casting MinP as a boundary-defining test with bj = |Z|(d) for all j, and then computation proceeds through the methods described in Section 3.4. The GBJ and GHC p-values are likewise calculated with the method of Section 3.4. For SKAT, we use the corresponding R package.
To study how power is impacted by different correlation structures between the SNPs, we utilize block correlation structures that are slightly more complex than those used for the rejection region analysis in Section 4. Specifically, consider a set of causal SNPs that are correlated amongst themselves with common pairwise correlation ρ1. All other SNPs are then non-causal, and we allow half of them to have an exchangeable correlation structure with correlation ρ3; the other half of the non-causal SNPs are completely independent of all other non-causal SNPs. Finally the pairwise correlation between a causal SNP and a non-causal SNP is set at ρ2. The three correlations ρ1, ρ2, ρ3 will vary between 0 and 0.3. All SNPs are generated to have minor allele frequency of 0.3.
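The block structure described above can be written down directly as a matrix. The sketch below is one possible construction, assuming the causal SNPs are placed first and the correlated half of the non-causal SNPs comes next; the function names `block_correlation` and `is_positive_definite` are hypothetical helpers, not part of any package used in the paper.

```python
import numpy as np


def block_correlation(d, k, rho1, rho2, rho3):
    """Correlation matrix for d SNPs, of which the first k are causal.

    - causal SNPs: exchangeable with correlation rho1
    - first half of the non-causal SNPs: exchangeable with correlation rho3
    - second half of the non-causal SNPs: uncorrelated with other non-causal SNPs
    - every causal / non-causal pair: correlation rho2
    """
    sigma = np.eye(d)
    n_corr = (d - k) // 2                  # size of the correlated noise block
    causal = slice(0, k)
    noise_corr = slice(k, k + n_corr)
    sigma[causal, causal] = rho1
    sigma[noise_corr, noise_corr] = rho3
    sigma[causal, k:] = rho2
    sigma[k:, causal] = rho2
    np.fill_diagonal(sigma, 1.0)           # restore unit variances
    return sigma


def is_positive_definite(sigma, tol=1e-10):
    """True when the smallest eigenvalue of the symmetric matrix exceeds tol."""
    return bool(np.linalg.eigvalsh(sigma).min() > tol)


# Causal and noise SNPs correlated within groups, groups independent of each other.
within_only = block_correlation(d=100, k=10, rho1=0.3, rho2=0.0, rho3=0.3)
print(is_positive_definite(within_only))   # valid correlation matrix

# Adding rho2 = 0.3 to the same block layout breaks positive definiteness.
with_cross = block_correlation(d=100, k=10, rho1=0.3, rho2=0.3, rho3=0.3)
print(is_positive_definite(with_cross))
```

In this illustrative configuration the second matrix is not positive definite, which is consistent with the need to move to a fully exchangeable structure when ρ2 = 0.3.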
We demonstrate the effects of signal sparsity by using a large SNP-set of d=100 SNPs and varying the number of causal SNPs from k = 1 to k = 10. This allows us to examine power profiles in the very sparse regime (one to three causal SNPs), in the moderately sparse regime (four to nine causal SNPs), and at the edge of the dense regime (ten or more causal SNPs). The true disease model is
| (8) |
where all the βj are the same and depend on the number of causal SNPs k. We reduce βj slightly as k increases in order to keep the power of each test below one throughout the entire sparse regime. The full details on effect size for each simulation are available in the Supplementary Materials. Figure 2 considers the case where the noise SNPs are independent, and Figure 3 considers the case where the noise SNPs are correlated. We perform 500 simulations at each value of k and test at α = 0.01. The plotted power curves are smoothed versions of the empirical power.
Fig. 2.
Power of set-based tests when noise SNPs are independent. On the left, all SNPs are completely independent of each other. On the right, causal SNPs are correlated within themselves at ρ1 = 0.3. As the number of causal SNPs increases, we slightly decrease the effect size to keep power bounded away from one.
Fig. 3.
Power of set-based tests when there is correlation within noise SNPs (left) and across all SNPs (right). The correlation structure on the right is slightly simpler than the previous three structures, as we switch to an exchangeable correlation matrix in order to accommodate ρ2 = 0.3 while still ensuring the correlation matrix is positive definite. As the number of causal SNPs increases, we decrease the effect size to keep power bounded away from one.
The first notable trend appearing in Figure 2 is the effect of sparsity on power. Among the non-omnibus tests, we see that GHC and MinP perform well when the number of causal SNPs is low, as these tests often have the most power in the very sparse regime. In both panels of Figure 2, the transition to GBJ having more power than GHC and MinP occurs in the moderately sparse regime. Then as the number of causal SNPs increases into the dense regime, SKAT begins to catch up and eventually becomes the most powerful test. This behavior matches our intuition as well as previously published simulation results. GHC and MinP place excess weight on the most outlying observations, so they are well-tuned to detect very sparse signals. The rejection region of GBJ is better suited to finding moderately sparse signals, and SKAT is known to perform well with dense signals.
The relationship between sparsity and power can be modified by the total amount of correlation. In the left panel of Figure 3 we set ρ1 = ρ3 = 0.3, which corresponds to the situation where causal SNPs and non-causal SNPs are correlated within themselves, but the two groups are independent of each other. In this case, MinP and GHC become the top-performing tests for a larger range of sparsity settings, with GBJ losing some of its advantage in the moderately sparse regime. SKAT has almost no power in these situations, as the signals are sparse and there is no correlation between causal and non-causal SNPs. It appears that a large amount of correlation between the non-causal SNPs is detrimental to the performance of GBJ. An explanation for this behavior can be found in the rejection region analysis of Figure 1. We see that the bounds of GBJ appear less favorable compared to GHC when the amount of correlation is high. Since over half of the SNPs in Figure 3 are correlated with ρ1 = ρ3 = 0.3, these settings represent a much larger amount of total correlation than was present in Figure 2.
In the right panel of Figure 3 we investigate the setting of ρ2 ≠ 0 by using an exchangeable correlation structure, and SKAT dominates as the most powerful test across almost all sparsity levels. Here we break slightly from the above framework by using exchangeable correlation to accommodate ρ2 = 0.3 while still ensuring the correlation matrix of the SNPs is positive definite. GBJ is a close second to SKAT under most sparsity settings with these parameters. SKAT is known to have good performance in the presence of LD between causal SNPs and noise SNPs, which makes signals appear to be dense. The increased density of signals also buoys the performance of GBJ compared to GHC and minimum p-value, which perform the worst under exchangeable correlation.
Never losing too much power compared to the best test, the omnibus test’s largest advantage appears to be its robustness to LD structure and sparsity. This behavior is expected as our omnibus approach integrates information from tests that perform well across multiple sparsity settings. Thus we would anticipate that OMNI is more resilient than any single test.
GBJ also demonstrates good power in a variety of situations, and its overall strength and robustness across the entire sparse regime are attractive properties. In particular, GBJ is the best-performing test when the level of sparsity is moderate and there is weak correlation among the noise SNPs. GBJ is disadvantaged against MinP and GHC when signals are extremely sparse or there is excess correlation among noise SNPs, but it outperforms these tests as signals become more dense. Relative to SKAT, GBJ provides slightly less power as signals reach the dense regime or when there is moderate correlation between causal and non-causal SNPs, but GBJ is much more robust than SKAT when signals are sparse and not correlated with noise SNPs. Similar to OMNI, GBJ does not fall far behind the best test in any given situation, which suggests that GBJ is a good choice when the signal sparsity is unknown. Power for the standard BJ is not plotted in the interest of space, but it behaves like a dense test, similar to SKAT, under correlation. This behavior again matches Figure 1, which showed that BJ is more suited to detect dense signals as the amount of correlation increases. The strong effect of correlation structure on power is further explored in the Supplementary Materials, where we note the number of causal SNPs needed to produce a power of at least 80% when we hold the effect size constant at βj = 0.1 and βj = 0.15. For example, under exchangeable correlation and βj = 0.1, all tests reach 80% power with only three causal SNPs, but under independence the best tests require seven causal SNPs to reach 80% power.
5.3. Power of GBJ under Actual Chromosome 5 Correlation Structures
We conduct one final simulation to investigate the power of Generalized Berk-Jones under the unstructured LD patterns found in real GWAS data. In this simulation, we choose blocks of 40 SNPs at random locations on chromosome 5, and then genotype data are generated using HAPGEN2. We choose 40 to again approximately match the characteristics of FGFR2. There are 2000 blocks chosen for each sparsity level, and 10 simulations are performed on each block, for a total of 20000 simulations at each number of causal SNPs. Again the effect size is decreased as the number of causal SNPs increases, with the outcome still generated from equation (8). Testing is performed at α = 1×10^-5 to mimic a practical analysis.
We see in the left panel of Figure 4 that GBJ, GHC, and the omnibus test all have very similar power curves in this setting, while SKAT and MinP lag slightly behind. As the number of causal SNPs increases, GBJ demonstrates the best power by a small amount. These results are rather homogeneous because the sparsity levels are coarser and because the parameters are a mix of the values defined in Figures 2 and 3.
Fig. 4.
Power of set-based tests with correlation structures found in actual chromosome 5 data. Left panel presents all simulations and right panel shows only SNP-sets where median |ρ3|< 0.1. The effect size begins at βj = 0.15 and falls to βj = 0.1 so that power remains below one as the number of causal SNPs increases. GBJ is the best-performing test in both panels as the number of causal SNPs increases.
Recall that GBJ extended its advantage over GHC when there was less correlation among noise SNPs. When restricting our analysis to the blocks that have median |ρ3| < 0.1, we see a more substantial superiority of GBJ over GHC in the moderately sparse regime, similar to Figure 2. Thus it is possible to recover the patterns of the structured simulation in real genotype data. Obviously median |ρ3| is not a perfect summary measure, as it cannot single-handedly capture all the parameters in a 40 × 40 correlation matrix. Further parsing of the data would be necessary to see larger differences in performance. In a practical setting, we might switch between tests based on certain SNP-set characteristics, such as applying GBJ when the set is large and likely to have moderately sparse signals. These results do again demonstrate the robustness of GBJ across multiple situations, as it provides the most power across a large portion of the sparse regime.
6. Gene-Level Analysis of the CGEMS GWAS Data
The CGEMS breast cancer dataset contains a case-control sample of 1145 breast cancer cases, all postmenopausal women with European ancestry, and 1142 controls recruited from the Nurses’ Health Study. These women were genotyped at approximately 550000 SNPs with the Illumina HumanHap500 array. The dataset was originally analyzed by Hunter et al. (2007) using the single-marker GWAS approach. The authors did not find any individual SNP that reached the genome-wide significance level of 5×10^-8, but they highlighted FGFR2 as a strong candidate for future studies based on four SNPs in the gene that showed suggestive association with breast cancer. Such a situation succinctly illustrates the burden of adjusting for multiple comparisons when testing individual SNPs. Gene-level analysis provides an attractive alternative strategy that can reduce the number of comparisons and also aggregate evidence of signals across multiple SNPs in a gene. Here we perform a gene-level analysis to study the association between genes and breast cancer risk.
Since individual-level genotype data were available for this study, we first calculated the marginal test statistics for each SNP using the model in Section 2.1. Specifically, we fit a logistic regression model with four covariates - age and the first three genotype principal components to control for population structure (Price et al. 2006). Then, for each of 14991 genes, we collected the marginal test statistics for all SNPs located within the region defined by that gene. Each gene with more than one marginal SNP test statistic was analyzed with GBJ, GHC, SKAT, MinP, and the omnibus test.
In Table 2, we rank the top ten genes according to the smallest p-value produced by any of the five tests. Diagnostic QQ-plots are available in the Supplementary Materials. In this sample, GBJ provides the strongest evidence of association for the top four genes and five of the top ten. Most of these genes are ranked highly by multiple other methods; however, no other method provides the lowest p-value for more than two of the top ten genes. In fact, GHC and MinP produce the smallest p-value only once between the two of them. One possible explanation for the underperformance of GHC and MinP is that there may be multiple tagged SNPs surrounding the true causal loci for each of these genes, which could create a lack of extremely sparse alternatives.
Table 2.
Top significant genes in gene-level analysis of CGEMS breast cancer GWAS data, ranked by minimum p-value produced by any of the five tests. The test that produces the smallest p-value for each gene is highlighted in bold.
| Gene | GHC | GBJ | MinP | SKAT | OMNI | d |
|---|---|---|---|---|---|---|
| FGFR2 | 2.84×10^-5 | **4.58×10^-6** | 8.20×10^-5 | 3.32×10^-5 | 2.58×10^-5 | 35 |
| CNGA3 | 3.00×10^-4 | **4.04×10^-5** | 1.75×10^-3 | 8.34×10^-5 | 1.84×10^-4 | 26 |
| PTCD3 | 1.21×10^-4 | **5.50×10^-5** | 3.16×10^-4 | 1.87×10^-4 | 6.83×10^-5 | 12 |
| POLR1A | 9.58×10^-5 | **6.19×10^-5** | 4.62×10^-4 | 4.23×10^-4 | 3.87×10^-4 | 17 |
| ZNF263 | 4.89×10^-4 | 3.90×10^-4 | 8.09×10^-4 | 1.26×10^-3 | **6.84×10^-5** | 3 |
| VWA3B | 4.20×10^-4 | 2.32×10^-4 | 1.43×10^-3 | **1.48×10^-4** | 4.87×10^-4 | 51 |
| TBK1 | 7.04×10^-4 | 3.35×10^-4 | 1.27×10^-3 | **1.48×10^-4** | 6.05×10^-4 | 11 |
| ABCA1 | 3.74×10^-3 | **1.65×10^-4** | 7.92×10^-3 | 4.99×10^-4 | 2.26×10^-4 | 63 |
| MMRN1 | 2.31×10^-4 | 5.51×10^-4 | **1.72×10^-4** | 3.34×10^-2 | 7.73×10^-4 | 10 |
| TIGD7 | 5.79×10^-4 | 3.78×10^-4 | 1.32×10^-3 | 1.33×10^-3 | **2.05×10^-4** | 4 |
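The ranking rule and the per-test counts quoted for Table 2 can be reproduced from the table itself. The short sketch below copies the ten rows into a dictionary and tallies which test gives the smallest p-value for each gene; the variable names are illustrative only.

```python
from collections import Counter

# p-values copied from Table 2 (d, the number of SNPs, is omitted).
pvals = {
    "FGFR2":  {"GHC": 2.84e-5, "GBJ": 4.58e-6, "MinP": 8.20e-5, "SKAT": 3.32e-5, "OMNI": 2.58e-5},
    "CNGA3":  {"GHC": 3.00e-4, "GBJ": 4.04e-5, "MinP": 1.75e-3, "SKAT": 8.34e-5, "OMNI": 1.84e-4},
    "PTCD3":  {"GHC": 1.21e-4, "GBJ": 5.50e-5, "MinP": 3.16e-4, "SKAT": 1.87e-4, "OMNI": 6.83e-5},
    "POLR1A": {"GHC": 9.58e-5, "GBJ": 6.19e-5, "MinP": 4.62e-4, "SKAT": 4.23e-4, "OMNI": 3.87e-4},
    "ZNF263": {"GHC": 4.89e-4, "GBJ": 3.90e-4, "MinP": 8.09e-4, "SKAT": 1.26e-3, "OMNI": 6.84e-5},
    "VWA3B":  {"GHC": 4.20e-4, "GBJ": 2.32e-4, "MinP": 1.43e-3, "SKAT": 1.48e-4, "OMNI": 4.87e-4},
    "TBK1":   {"GHC": 7.04e-4, "GBJ": 3.35e-4, "MinP": 1.27e-3, "SKAT": 1.48e-4, "OMNI": 6.05e-4},
    "ABCA1":  {"GHC": 3.74e-3, "GBJ": 1.65e-4, "MinP": 7.92e-3, "SKAT": 4.99e-4, "OMNI": 2.26e-4},
    "MMRN1":  {"GHC": 2.31e-4, "GBJ": 5.51e-4, "MinP": 1.72e-4, "SKAT": 3.34e-2, "OMNI": 7.73e-4},
    "TIGD7":  {"GHC": 5.79e-4, "GBJ": 3.78e-4, "MinP": 1.32e-3, "SKAT": 1.33e-3, "OMNI": 2.05e-4},
}

# Test with the smallest p-value for each gene.
best_test = {gene: min(tests, key=tests.get) for gene, tests in pvals.items()}

# Number of genes for which each test is the most significant.
wins = Counter(best_test.values())

# Genes ordered by their minimum p-value across the five tests
# (Python's sort is stable, so tied genes keep their table order).
ranked = sorted(pvals, key=lambda gene: min(pvals[gene].values()))

print(best_test)
print(wins)
print(ranked)
```

The tally gives GBJ five genes, SKAT and the omnibus test two each, MinP one, and GHC none, in agreement with the description in the text.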
The lowest p-value for any gene over all five tests is produced by testing FGFR2 with GBJ, supporting the conclusions of Hunter et al. (2007). Since FGFR2 appears to have signals coming from at least four different SNPs and contains 35 SNPs in total, it would seem to fall into the category of moderate signal sparsity, where GBJ has good performance. Thus we might have expected beforehand that GBJ would be the most powerful test for this gene. FGFR2 has been further validated as a breast cancer associated locus in multiple follow-up studies (Meyer et al. 2008; Liang et al. 2008).
Besides FGFR2, genes such as PTCD3 and POLR1A have also been implicated as risk loci in independent investigations (Boehm et al. 2007; Jia et al. 2011). The overlap of our findings with other studies and other statistics provides a level of reassurance that GBJ performs well in identifying truly significant genes and not simply spurious associations. In contrast, ABCA1 is an example of a gene that may not have received further scrutiny if we were not utilizing the GBJ test. ABCA1 expression has been linked with breast cancer risk (Smith and Land 2012), but MinP and GHC do not provide the same strength of evidence that GBJ does. It seems likely that there are more than a few signal SNPs in ABCA1, especially since ABCA1 contains a relatively large number of SNPs compared to the other genes in this dataset.
Perhaps due to the limited sample size, no test produces a p-value low enough to be declared significant after Bonferroni correction for 14991 genes. Still, this analysis highlights the advantages of Generalized Berk-Jones compared to alternative tests. The GBJ p-value for FGFR2 does come very close to the Bonferroni-corrected level (3.34×10^-6), and it certainly provides more evidence of association than the individual SNP statistics. Additionally, GBJ often gives the highest measure of significance, and never the lowest, in the genes displayed. While these results do not necessarily indicate better performance than the other methods presented, the findings are consistent with our observations from the simulation study, and we do see that GBJ demonstrates robustness across different set sizes and LD patterns in a real dataset. Power simulations using the parameters from Figure 4 suggest that the GBJ power to detect four causal SNPs at the Bonferroni-corrected level is approximately 23%, 53%, and 74% with sample sizes of n = 1000, 2000, and 4000, respectively (see Supplementary Materials).
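The Bonferroni-corrected level quoted above can be checked with a few lines of arithmetic. The sketch assumes a family-wise error rate of 0.05, which is not stated explicitly in the text but is the value that reproduces the reported threshold for 14991 genes.

```python
N_GENES = 14991          # number of genes in the correction, from the text
FWER = 0.05              # assumed family-wise error rate

threshold = FWER / N_GENES
fgfr2_gbj = 4.58e-6      # GBJ p-value for FGFR2 from Table 2

print(f"Bonferroni threshold: {threshold:.3e}")
print(f"FGFR2 GBJ p-value / threshold: {fgfr2_gbj / threshold:.2f}")
print(f"Significant after correction: {fgfr2_gbj < threshold}")
```

Under this assumption the FGFR2 p-value exceeds the threshold by a factor of less than 1.5, which is the sense in which it comes close to gene-level significance without reaching it.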
7. Discussion
We have proposed the Generalized Berk-Jones statistic to test for association between a SNP-set and an outcome. Our GBJ generalizes the standard Berk-Jones by modifying the BJ statistic to directly account for the correlation between individual SNPs. This modification results in a test that is more powerful when SNPs are in LD. We also provide an analytic p-value calculation for GBJ and generalize it to a class of supremum-based global tests, allowing valid inference for HC, GHC, BJ, and other methods when these procedures are applied as SNP-set tests using correlated marginal test statistics. Both the p-value calculation and the Generalized Berk-Jones test are implemented in the R package GBJ, available on the CRAN repository. Rejection region analysis demonstrates that GBJ can be described as a compromise between Berk-Jones and Higher Criticism-type tests in terms of finite sample performance.
While our numerical analysis shows situations where GBJ does not set the lowest boundary at either |Z|(d) or |Z|(d/2), GBJ generally comes very close to the lowest boundary at both locations, which affords it both robustness to signal sparsity and power to detect moderately sparse signals. GHC and HC often set the lowest boundary around |Z|(d), but in return they concede a large amount of volume past the first few most extreme observations, which lowers power in the moderately sparse regime. BJ frequently sets the lowest boundary past the tail, but its tail boundary can be orders of magnitude larger than that of GBJ, HC, and GHC. Bounds in the expected signal regions must be viewed holistically, so slightly lower bounds at a few locations are not necessarily desirable if the price is much higher bounds in other signal locations, as in the case of BJ. Thus GBJ offers good power to detect moderately sparse effects without losing too much power when individual SNP signals are extremely sparse.
Simulation results reinforce the conclusions we find from examining the rejection regions of GHC and GBJ. Additionally we see that the MinP test performance is quite good when signals are very sparse, similar to GHC, but MinP does not perform as well as GHC when signals become more dense. SKAT has a unique power profile, as it can be particularly powerful when signals are dense or there is correlation between causal and non-causal SNPs, but it is also not robust to different correlation structures and will often have very little power in sparse settings where there is no correlation between causal and non-causal SNPs. The omnibus test offers robust power across different sparsity levels, and while it is rarely the best test, it also never shows the worst power. When applied to data from the CGEMS study, we see that GBJ often produces the most significant p-values, perhaps owing to its versatility across different parameter settings.
In practice, it is often important to estimate power and perform sample size calculations prior to conducting an association study. An exact power calculation for GBJ relies on computing an expression similar to Equation (5) and is difficult when the size of a set is not small. However, as mentioned in Section 6, power can be quickly approximated by conducting simulations similar to those in Figures 2, 3, and 4. To evaluate power for a genome-wide analysis, genes can be drawn at random and their correlation structures can be determined using data from publicly available reference panels or any array of interest. An exact analytical power calculation for larger sets is currently in development.
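The simulation-based approach to power can be sketched in a few lines. The example below is a simplified illustration and not the GBJ procedure: it uses max|Z| (a MinP-type statistic) as a stand-in global test, draws marginal statistics directly from a multivariate normal with an assumed correlation matrix, and calibrates the critical value by Monte Carlo. The function name `simulate_power`, the exchangeable correlation of 0.3, and the mean shift of 4 on three statistics are all hypothetical choices; a real analysis would substitute the GBJ statistic and derive the mean vector from genotype effect sizes and LD.

```python
import numpy as np


def simulate_power(sigma, mu, alpha=0.05, n_sim=5000, seed=0):
    """Monte Carlo power of the max|Z| test for Z ~ N(mu, sigma)."""
    rng = np.random.default_rng(seed)
    d = sigma.shape[0]
    chol = np.linalg.cholesky(sigma)

    # Null draws calibrate the critical value of max|Z| empirically.
    z_null = rng.standard_normal((n_sim, d)) @ chol.T
    crit = np.quantile(np.abs(z_null).max(axis=1), 1.0 - alpha)

    # Alternative draws share the correlation structure but have shifted means.
    z_alt = rng.standard_normal((n_sim, d)) @ chol.T + mu
    return float(np.mean(np.abs(z_alt).max(axis=1) > crit))


d = 20
sigma = 0.3 * np.ones((d, d)) + 0.7 * np.eye(d)   # exchangeable correlation 0.3

mu_alt = np.zeros(d)
mu_alt[:3] = 4.0                                  # three shifted statistics

power = simulate_power(sigma, mu_alt)
size = simulate_power(sigma, np.zeros(d))         # rejection rate under the null
print(f"estimated size: {size:.3f}, estimated power: {power:.3f}")
```

The same loop can be repeated over correlation matrices estimated from a reference panel to approximate power across genes of interest.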
In demonstrating that the BJ statistic can be adapted for increased robustness to correlation, we have also demonstrated that these types of boundary-defining algorithms can be modified to increase finite sample power under specific set-level parameters. It would be of interest to develop different boundary-defining methods that offer more favorable rejection regions in narrow but well-defined settings. For example, it may be possible to define tests that outperform GHC or MinP over finer partitions of the extremely sparse regime. In a similar vein, it would be interesting to understand the boundary shapes for other previously proposed boundary algorithms (Jager and Wellner 2007) in the class of Berk-Jones and Higher Criticism. While many of these algorithms share the same asymptotic guarantees as BJ and HC, little is known about their comparative finite sample performance, especially when observations in a set are correlated. These other methods might also have great value in specific settings as mentioned above. It would additionally be very valuable to develop SNP-set tests that are optimal in certain senses for arbitrary sparsity and correlation.
As genomic data collection techniques continue to evolve, it may be necessary to adapt the GBJ as well. In particular, the rise of whole genome sequencing and fine mapping studies is leading to the discovery of more SNPs with extremely rare minor alleles. Marginal test statistics generated from these SNPs are known to be non-Gaussian in finite samples, and thus they will not possess the distribution we assume for GBJ. GBJ will need to account for null distributions that are not standard normal before SNP-sets containing rare variants can be tested.
Acknowledgments
This work was supported by the National Institutes of Health grants R35-CA197449, P01-CA134294, U19-CA203654, and R01-HL113338. The authors would like to thank the editor, associate editor, and referees for helpful comments that have improved the paper.
Footnotes
SUPPLEMENTARY MATERIAL
The supplementary materials provide the proof of Theorem 1 from Section 3.3, offer further details on how to calculate the exact p-value from Equation (5) in Section 3.4, demonstrate the accuracy of the p-value calculation from Section 3.4, give an alternative visualization of the rejection region plots from Section 4, list the exact simulation parameters and provide further power results from Section 5, show diagnostic QQ-plots from the analysis of Section 6, and evaluate the accuracy of the summary statistic correlation approximation using data from Section 6.
Contributor Information
Ryan Sun, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115.
Xihong Lin, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115; Department of Statistics, Harvard University, Cambridge, MA 02138.
References
- 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature, 526(7571):68–74.
- Barnett I, Mukherjee R, and Lin X (2017). The generalized higher criticism for testing SNP-set effects in genetic association studies. Journal of the American Statistical Association, 112(517):64–76.
- Berk RH and Jones DH (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 47(1):47–59.
- Boehm JS, Zhao JJ, Yao J, Kim SY, Firestein R, Dunn IF, Sjostrom SK, Garraway L, Weremowicz S, and Richardson A (2007). Integrative genomic approaches identify IKBKE as a breast cancer oncogene. Cell, 129:1065.
- Conneely K and Boehnke M (2007). So many correlated tests, so little time! Rapid adjustment of p-values for multiple correlated tests. The American Journal of Human Genetics, 81:1158.
- Dawson E, Abecasis GR, Bumpstead S, Chen Y, Hunt S, Beare DM, Pabial J, Dibling T, Tinsley E, Kirby S, Carter D, Papaspyridonos M, Livingstone S, Ganske R, Lõhmussaar E, Zernant J, Tonisson N, Remm M, Mägi R, Puurand T, Vilo J, Kurg A, Rice K, Deloukas P, Mott R, Metspalu A, Bentley DR, Cardon LR, and Dunham I (2002). A first-generation linkage disequilibrium map of human chromosome 22. Nature, 418(6897):544–548.
- Donoho D and Jin J (2004). Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics, 32(3):962–994.
- Hunter D, Kraft P, Jacobs K, Cox D, Yeager M, Hankinson S, Wacholder S, Wang Z, Welch R, Hutchinson A, and Wang J (2007). A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nature Genetics, 39(7):870–874.
- Jager L and Wellner JA (2007). Goodness-of-fit tests via phi-divergences. The Annals of Statistics, 35(5):2018–2053.
- Jia P, Zheng S, Long J, Zheng W, and Zhao Z (2011). dmGWAS: dense module searching for genome-wide association studies in protein-protein interaction networks. Bioinformatics, 27(1):95–102.
- Lee S, Abecasis GR, Boehnke M, and Lin X (2014). Rare-variant association analysis: study designs and statistical tests. The American Journal of Human Genetics, 95(1):5–23.
- Li B and Leal SM (2008). Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. The American Journal of Human Genetics, 83:311.
- Li J and Siegmund D (2015). Higher criticism: p-values and criticism. Annals of Statistics, 43(3):1323–1350.
- Liang J, Chen P, Hu Z, Zhou X, Chen L, Li M, Wang Y, Tang J, Wang H, and Shen H (2008). Genetic variants in fibroblast growth factor receptor 2 (FGFR2) contribute to susceptibility of breast cancer in Chinese women. Carcinogenesis, 29(12):2341–2346.
- Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, and McCarthy MI (2009). Finding the missing heritability of complex diseases. Nature, 461(7265):747–753.
- McCullagh P and Nelder JA (1989). Generalized Linear Models. CRC Press.
- Meyer KB, Maia A-T, O’Reilly M, Teschendorff AE, Chin S-F, Caldas C, and Ponder BA (2008). Allele-specific up-regulation of FGFR2 increases susceptibility to breast cancer. PLoS Biology, 1(5):e108.
- Moscovich-Eiger A and Nadler B (2017). Fast calculation of boundary crossing probabilities for Poisson processes. Statistics & Probability Letters, 123:177–182.
- Moscovich-Eiger A, Nadler B, and Spiegelman C (2016). On the exact Berk-Jones statistics and their p-value calculation. Electronic Journal of Statistics, 10(2):2329–2354.
- Pasaniuc B and Price AL (2016). Dissecting the genetics of complex traits using summary association statistics. Nature Reviews Genetics, 18:117–127.
- Prentice RL (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81(394):321–327.
- Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, and Reich D (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 38(8):904–909.
- Smith B and Land H (2012). Anticancer activity of the cholesterol exporter ABCA1 gene. Cell Reports, 23:580–590.
- Su Z, Marchini J, and Donnelly P (2011). HAPGEN2: simulation of multiple disease SNPs. Bioinformatics, 27(16):2304–2305.
- Tong YL (2012). The multivariate normal distribution. Springer Science & Business Media.
- Visscher PM, Brown MA, McCarthy MI, and Yang J (2012). Five years of GWAS discovery. The American Journal of Human Genetics, 90(1):7–24.
- Walther G (2013). The average likelihood ratio for large-scale multiple testing and detecting sparse mixtures. In From Probability to Statistics and Back: High-Dimensional Models and Processes, volume 9, pages 317–326, Beachwood, OH: IMS.
- Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, and Lin X (2010). Powerful SNP-set analysis for case-control genome-wide association studies. The American Journal of Human Genetics, 86(6):929–942.
- Wu MC, Lee S, Cai T, Li Y, Boehnke M, and Lin X (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. The American Journal of Human Genetics, 89(1):82–93.