Abstract
We develop a new method for large-scale frequentist multiple testing with Bayesian prior information. We find optimal p-value weights that maximize the average power of the weighted Bonferroni method. Due to the nonconvexity of the optimization problem, previous methods that account for uncertain prior information are suitable for only a small number of tests. For a Gaussian prior on the effect sizes, we give an efficient algorithm that is guaranteed to find the optimal weights nearly exactly. Our method can discover new loci in genome-wide association studies and compares favourably to competitors. An open-source implementation is available.
Keywords: Genome-wide association study, Multiple testing, Nonconvex optimization, p-value weighting, Weighted Bonferroni method
1. Introduction
The research presented in this paper is motivated by the genetics of human longevity. Genome-wide association studies of longevity compare long-lived individuals with matched controls (Brooks-Wilson, 2013). More than 500 000 genetic variants have been tested for their association with longevity, which amounts to a large multiple hypothesis testing problem. In addition to multiplicity, the sample size is small, usually of the order of a few hundred. As a consequence, only a few loci have been replicably associated with human longevity, and they do not explain the heritability of the trait (Hjelmborg et al., 2006).
The multiplicity may be countered by testing only a few candidate variants selected based on prior scientific knowledge. In a separate work in preparation, led by the second author, we find that a more general genome-wide test helps to improve power in a study of longevity. We leverage prior information from genome-wide association studies of age-related diseases, such as coronary artery disease and diabetes. For this task, we develop a new large-scale method of frequentist multiple testing with Bayesian prior information. In this paper we provide the theory for this method.
Our method is a novel p-value weighting scheme; p-value weighting is a general methodology for multiple testing that leverages independent prior information to improve power (Roeder & Wasserman, 2009; Gui et al., 2012). Suppose that we test the hypotheses H_i via the p-values P_i for i = 1, ..., J. For a significance level α, the weighted Bonferroni method declares the ith hypothesis to be significant if P_i ≤ αw_i/J. The nonnegative weights w_i are based on independent data. The familywise error rate, the probability of making at least one error, is controlled if the weights average to 1, as it is at most Σ_{i=1}^J αw_i/J = α.
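As a sketch of this rule (not the authors' implementation), the weighted Bonferroni test can be written as follows, with the weights required to be nonnegative and to average to one:

```python
import numpy as np

def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Reject hypothesis i when P_i <= alpha * w_i / J.

    The weights must be nonnegative and average to one, so that the
    familywise error rate is at most alpha.
    """
    pvals = np.asarray(pvals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    J = len(pvals)
    assert np.all(weights >= 0) and abs(weights.mean() - 1.0) < 1e-8
    return pvals <= alpha * weights / J

# Unit weights recover the ordinary Bonferroni method.
rej = weighted_bonferroni([0.004, 0.2, 0.03], [1.0, 1.0, 1.0], alpha=0.05)
```

With unequal weights, hypotheses receiving weight above one are tested at a more generous threshold, paid for by the hypotheses receiving weight below one.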
In previous work, optimal weights have been found in a Gaussian model of hypothesis testing. Let the test statistics in the current study be T_i ~ N(μ_i, 1), where the μ_i are the means, or effect sizes; we test the null hypotheses H_i: μ_i ≥ 0 against the alternatives μ_i < 0. We have some information about the μ_i from prior studies. Roeder & Wasserman (2009) and Rubin et al. (2006) considered a model where the μ_i are known exactly from the prior data, and the weights are allowed to depend on the μ_i. In such a model they found the optimal weights for the weighted Bonferroni method, which maximize the expected number of discoveries. We show that this amounts to solving a convex optimization problem.
The assumption that the μ_i are known precisely is problematic: if they were known, there would be no need for a follow-up study. In practice, empirical estimates of the μ_i are used. However, the fixed-μ_i weights do not take into account the uncertainty in the estimates. Instead, we account for uncertainty explicitly by considering the model with uncertain prior information in the form μ_i ~ N(η_i, σ_i²). Only the prior means η_i and standard errors σ_i are known from independent data, not the precise effect sizes. Finding the optimal weights, which we call Bayes weights, is then a nonconvex optimization problem.
Westfall et al. (1998) formulated a general framework that includes this problem as a special case and allows, for instance, for Student t-distributed priors. They used a direct numerical solver, a quasi-Newton optimization method whose cost grows rapidly with the number of tests, to find the weights. Published examples using this approach are typically small (Westfall et al., 1998; Westfall & Soper, 2001). This method of computing the weights does not scale up to our problems, which involve more than 500 000 genetic variants. Further, the generic quasi-Newton method has no guarantee of finding the global optimum of the nonconvex problem.
Our key contribution here is to provide an efficient method of finding the weights that maximize average power for the weighted Bonferroni method, in the model with Gaussian priors. We solve the optimization problem exactly for small per-test significance levels q = α/J, below a problem-dependent value that is large in practice. For larger q, we can solve the problem for a nearby level q′ close to q. The cost per iteration of our algorithm is O(J) in the first case and O(J log J) in the second case. We observe that a nearly constant number of iterations is used, regardless of J. We find it remarkable that this problem admits a near-exact solution.
For large-scale problems, this approach leads to a method for multiple testing that controls a frequentist error measure while also taking into account Bayesian prior information. This method follows George Box's advice to be Bayesian when predicting but frequentist when testing (Box, 1980). Similar ideas were used previously by Carlin & Louis (1985); see §2. As mentioned, a more general formulation was also considered in Westfall et al. (1998) and Westfall & Soper (2001). We show that our approach is feasible for large-scale problems.
When prior information is uncertain, we show via simulations that the new method has more power and is more stable than competitors. We also show theoretically that weighting leads to substantially improved power. We apply the method to genome-wide association studies. By analysing several such datasets, we show that our method has advantages in terms of power and easier tuning compared to other methods.
With rapidly increasing volumes of data available as prior information for any given study, our method should be useful for other problems in biology and elsewhere. The data analysis and computational results in this paper are reproducible, and an open-source implementation of the method is available from the authors.
2. Related work
There is a large literature on statistical methods for multiple testing with prior information, some of which is reviewed in Roeder & Wasserman (2009) and Gui et al. (2012). Spjøtvoll (1972) devised optimal single-step multiple testing procedures maximizing average or minimal power and controlling the familywise error rate. Later it was recognized that Spjøtvoll's results are equivalent to optimal p-value weighting methods. For instance, Benjamini & Hochberg (1997) developed extensions of Spjøtvoll's methods for p-value weighting, allowing for weights also in the importance of the hypotheses.
Leveraging Spjøtvoll's results, Rubin et al. (2006) and Roeder & Wasserman (2009) found an explicit formula for optimal weights of the weighted Bonferroni method in the Gaussian model T_i ~ N(μ_i, 1), assuming that the effects are known exactly. In practice the effects are estimated, but the weights do not take this into account. These weights are optimal for average power, and the method is efficient enough for large applications. Eskin (2008) and Darnell et al. (2012) applied the framework of Roeder & Wasserman (2009) to genome-wide association studies; they accounted for correlations between the tests but assumed that the effects are known exactly.
Another popular approach is to test only the top candidates from a prior study, often known as two-stage testing or a candidate study. It can be viewed as a p-value weighting method where some of the weights equal zero. A specific version for genome-wide association studies has been called the proxy-phenotype method (Rietveld et al., 2014).
In the literature on carcinogenicity trials, related methods have been devised to select tumour sites based on historical data (Carlin & Louis, 1985; Louis & Bailey, 1990); the methods are explicitly Bayesian with regard to historical data and frequentist in analysing current data. These models and methods differ from ours, and focus on pairwise comparisons based on Fisher's exact test (Louis & Bailey, 1990).
Westfall et al. (1998) considered a Gaussian model for the effects in hypothesis testing, in which prior distributions are known for the means. They formulated the problem of finding the weights that maximize expected power for the weighted Bonferroni method, and this was followed up for binary data in Westfall & Soper (2001), motivated by carcinogenicity trials. As mentioned in §1, published studies using their optimization methods are typically small.
Less work exists on weighted methods beyond the single-step Bonferroni method, or beyond the control of the familywise error rate. The step-down method of Holm (1979) can use weights, and Westfall & Krishen (2001) and Westfall et al. (2004) discuss the choice of optimal weights. Genovese et al. (2006) showed that the weighted Benjamini–Hochberg procedure controls the false discovery rate, and Roquain & Van De Wiel (2009) proposed a method of choosing weights optimally, assuming fixed known effects. Peña et al. (2011) developed a general framework for optimal multiple decision functions for the control of familywise error rate and false discovery rate, assuming exact knowledge of the alternatives.
In this paper we focus on the familywise error rate, because it is the standard measure of error controlled in our motivating application, and because in this case it is already challenging to find the optimal weights accounting for uncertainty on a large scale. Extension of this work to the Benjamini–Hochberg procedure and to false discovery rate control is left for future research.
3. Theoretical results
3.1. Background
We work in the Gaussian means model of hypothesis testing: we observe test statistics T_i ~ N(μ_i, 1) (i = 1, ..., J) and test each null hypothesis H_i: μ_i ≥ 0 against the alternative μ_i < 0. The p-value for testing H_i is P_i = Φ(T_i), where Φ denotes the standard normal cumulative distribution function.
For a weight vector w = (w_1, ..., w_J) and a significance level q, the weighted Bonferroni procedure rejects H_i if P_i ≤ qw_i. Usually this corresponds to q = α/J for a familywise error rate of α. For general weights, the expected number of false rejections, known as the per-family error rate, equals Σ_{i: μ_i ≥ 0} pr(P_i ≤ qw_i) ≤ q Σ_{i=1}^J w_i. If Σ_{i=1}^J w_i ≤ J, the expected number of false rejections is at most qJ = α. By Markov's inequality, this implies that the familywise error rate is at most α. Hence the weighted Bonferroni method controls the familywise error rate. This result does not require independence of the T_i. We assume always that w_i ≥ 0, and usually that Σ_{i=1}^J w_i = J. Without loss of generality, we restrict the weights to the interval [0, 1/q].
Let us denote the number of rejections by R(w) = Σ_{i=1}^J 1{P_i ≤ qw_i}, where 1{·} is the indicator function. The optimal weights maximizing the expected number of discoveries, assuming a priori known effects μ_i, were found explicitly by Roeder & Wasserman (2009) and independently by Rubin et al. (2006). Denoting by E_μ the expectation with respect to T_i ~ N(μ_i, 1), they solved the constrained optimization problem

maximize_w E_μ{R(w)} = Σ_{i=1}^J Φ{Φ^{-1}(qw_i) − μ_i}  subject to  Σ_{i=1}^J w_i = J,  w_i ≥ 0.   (1)
It was not noted previously that this problem is convex. The objective is a sum of terms of the form Φ{Φ^{-1}(qw) − μ}, whose concavity in w for μ < 0 follows directly by differentiation. Yet, by simple Lagrangian optimization, the above papers showed that if all μ_i < 0, the optimal weights are w_i = w(μ_i), where

w(μ) = q^{-1} Φ(μ/2 + c/μ).   (2)

Here c is the unique normalizing constant such that the weights sum to J. Interestingly, the weights are not monotonic as a function of μ, but are largest for intermediate values of |μ|. As noted by Roeder & Wasserman (2009), formula (2) is a direct consequence of Spjøtvoll's theory of optimality in multiple testing (Spjøtvoll, 1972). Accordingly, we call these weights the Spjøtvoll weights.
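A minimal numerical sketch of formula (2), not the authors' code: the normalizing constant c can be found by a scalar root-finder, since the mean weight is monotone in c.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def spjotvoll_weights(mu, q):
    """Spjotvoll weights w_i = Phi(mu_i/2 + c/mu_i)/q for negative means,
    with c chosen so that the weights average to one (formula (2))."""
    mu = np.asarray(mu, dtype=float)
    assert np.all(mu < 0)

    def excess(c):
        # Mean weight minus one; decreasing in c because mu < 0.
        return norm.cdf(mu / 2 + c / mu).mean() / q - 1.0

    c = brentq(excess, -50.0, 50.0)
    return norm.cdf(mu / 2 + c / mu) / q

w = spjotvoll_weights([-1.0, -2.0, -3.0, -4.0], q=0.05 / 4)
```

On this example the largest weight goes to the intermediate mean −2, illustrating the non-monotonicity noted above.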
3.2. Weighting leads to substantial power gain
To illustrate theoretically that p-value weighting can lead to increased power, we compare the power of optimal weighting with that of unweighted testing in a sparse mixture model.

First, we note that p-value weighting exploits the heterogeneity of the tests. In the simplest case there are only large and small effects, say a common negative mean μ < 0 and the null value 0. We consider the J → ∞ limit, and for simplicity we suppose that εJ is an integer. Let the fractions of large and small effects be ε and 1 − ε, respectively, so that εJ of the means equal μ and the remaining (1 − ε)J equal zero. We solve for the optimal weights.
Proposition 1 —
There is a set of optimal p-value weights that gives the same weights to the same means, i.e., weights w_0 and w_1 to the means 0 and μ, respectively, where

w_1 = min{1/ε, Φ(μ/2)/q},   w_0 = (1 − εw_1)/(1 − ε).

Further, the per-test power of the optimal p-value weighting method is

εΦ{Φ^{-1}(q/ε) − μ}  if q ≤ εΦ(μ/2),   and   q + ε{1 − 2Φ(μ/2)}  otherwise.

If the absolute effect size |μ| is small enough that q ≤ εΦ(μ/2), all the weight is placed on the larger means, which is the behaviour we would expect intuitively. However, if |μ| is large enough that εΦ(μ/2) < q, then it is advantageous to place some weight on the small means, because a large absolute effect size will be detected with high probability even with reduced weight.
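The two-group trade-off can be explored numerically. The sketch below (hypothetical parameter values, not the paper's code) maximizes the per-test power directly over the weight w_1 placed on the nonzero means, and compares the result with unweighted Bonferroni testing.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def two_group_power(eps, mu, q):
    """Maximum per-test power when a fraction eps of the means equal
    mu < 0 and the rest equal 0, with common weights w1 (on mu) and
    w0 (on 0) satisfying (1 - eps) * w0 + eps * w1 = 1."""
    def neg_power(w1):
        w0 = (1.0 - eps * w1) / (1.0 - eps)
        # Null means contribute q * w0; alternatives contribute
        # Phi(Phi^{-1}(q * w1) - mu).
        return -((1 - eps) * q * w0 + eps * norm.cdf(norm.ppf(q * w1) - mu))
    upper = min(1.0 / eps, 1.0 / q)
    res = minimize_scalar(neg_power, bounds=(1e-12, upper), method="bounded")
    return -res.fun

pow_w = two_group_power(eps=0.01, mu=-3.0, q=1e-4)
# Unweighted Bonferroni power in the same model:
pow_u = (1 - 0.01) * 1e-4 + 0.01 * norm.cdf(norm.ppf(1e-4) + 3.0)
```

For these values the optimally weighted procedure is roughly three times as powerful as the unweighted one, consistent with the power gains discussed next.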
In Fig. 1(a) we plot the ratio P_w/P_u as a function of (ε, μ) for a fixed q, where P_w is the optimal power from Proposition 1 and P_u is the power of unweighted Bonferroni testing. For most effect sizes μ and fractions ε, we see a power gain of at least 50% relative to unweighted Bonferroni testing. Moreover, there is a hotspot where the power gain can be three- to four-fold. Optimal weighting can lead to a significant gain in power.
Fig. 1.
Power gain and nonconvexity: (a) contour plot of the power ratio of optimal to unweighted testing for sparse means; (b) plots of four different instances of the function that is summed in the optimization objective: the nonconvex summand w ↦ Φ[{Φ^{-1}(qw) − η}/γ] is plotted for four representative pairs (η, σ) (solid, dashed, dotted and dot-dashed).
3.3. Weights with imperfect prior knowledge
In the previous sections it was assumed that the effects
are known precisely. We now assume that we have uncertain prior information in the form
.
Following Westfall et al. (1998), we maximize the expected power
averaged with respect to the random
and
. Introducing
, the optimization problem, which we call the Bayes weights problem, becomes
![]() |
(3) |
This objective function is not concave if any σ_i > 0. To help with visualization, the summand w ↦ Φ[{Φ^{-1}(qw) − η}/γ] is plotted in Fig. 1(b) for four parameter pairs (η, σ). On the interval [0, 1/q], the function is first concave and then convex.
Our main contribution is to solve this problem efficiently for large J. The results in this respect are two-fold. First, we can solve the problem exactly in the special case where q is sufficiently small. Second, we have a nearly exact solution for arbitrary q. Starting with the simpler first case, we define, for λ > 0,

c(η, σ; λ) = −η/σ² − (γ/σ²){η² + 2σ² log(λγ/q)}^{1/2},   γ = (1 + σ²)^{1/2}.

A weighted one-sided test rejecting when P_i ≤ qw_i can be written equivalently in terms of the critical values as T_i ≤ c_i, with c_i = Φ^{-1}(qw_i). It turns out that the critical values corresponding to the optimal Bayes weights can be expressed in terms of c(η_i, σ_i; λ), when q is small enough that

Σ_{i=1}^J Φ{c(η_i, σ_i; q)} ≥ qJ.   (4)
In our data analysis examples and simulations, this mild restriction requires only that α = qJ be below thresholds far larger than the significance levels used in practice. In the next result we give the exact optimal weights for small q when all σ_i > 0.
Theorem 1 —
If the significance level q is small enough that (4) holds, then the optimal Bayes weights maximizing the average power (3) are w_i = Φ{c(η_i, σ_i; λ)}/q, where λ is the unique constant such that Σ_{i=1}^J w_i = J.
In the Supplementary Material, we solve this problem by maximizing the Lagrangian. Two key properties that we use are joint separability of the objective function and constraint, and analytic tractability of the Gaussian density.
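A numerical sketch of this construction, under our reading of the closed form and assuming all η_i < 0, all σ_i > 0 and a small level q (not the authors' released implementation): the constant λ is found by a one-dimensional search, since the mean weight is decreasing in λ.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bayes_weights(eta, sigma, q):
    """Bayes p-value weights w_i = Phi{c(eta_i, sigma_i; lam)}/q, with the
    Lagrange constant lam chosen so that the weights average to one."""
    eta = np.asarray(eta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    gamma = np.sqrt(1.0 + sigma ** 2)

    def weights(lam):
        inner = eta ** 2 + 2.0 * sigma ** 2 * np.log(lam * gamma / q)
        w = np.full(eta.shape, 1.0 / q)   # no stationary point: boundary weight
        ok = inner >= 0
        c = (-eta[ok] - gamma[ok] * np.sqrt(inner[ok])) / sigma[ok] ** 2
        w[ok] = norm.cdf(c) / q
        return w

    # The mean weight decreases as lam grows; bisect on log(lam).
    excess = lambda t: weights(np.exp(t)).mean() - 1.0
    t = brentq(excess, np.log(q) - 30.0, np.log(q) + 30.0)
    return weights(np.exp(t))

w = bayes_weights(eta=[-1.0, -2.0, -3.0], sigma=[1.0, 1.0, 1.0], q=1e-3)
```

Each evaluation of the mean weight costs O(J), in line with the per-iteration cost stated in §1.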
Figure 2 displays an instance of the optimal weights as a function of the prior mean η and the standard deviation σ. In the theorem the weights are a function of (η_i, σ_i), but they can also be viewed as a function of a generic pair (η, σ) via the natural map w(η, σ) = Φ{c(η, σ; λ)}/q. As the standard error σ becomes small, our weights tend to the Spjøtvoll weights.
Proposition 2 —
For any λ > 0 and η < 0, the Bayes weight function defined by w(η, σ) = Φ{c(η, σ; λ)}/q tends, as σ → 0, to the Spjøtvoll weight function defined in (2) with constant c = log(λ/q).
Fig. 2.
Bayes weights: (a) surface plot and (b) contour plot of the Bayes weight function w(η, σ) defined in Theorem 1; the Spjøtvoll weights lie on the segment σ = 0.
With σ_i > 0, the weights are regularized: more extreme weights are shrunk towards a common value in a nonlinear way. For finite σ_i, our weights can be viewed as a smooth interpolation between Spjøtvoll weights and uniform weights. It is reasonable to think at first that as all σ_i → ∞, the best weight allocation becomes the uniform one. However, this is not the case: a symmetry-breaking phenomenon occurs due to nonconvexity.

Consider a weight vector that equals 1/q for ⌊qJ⌋ indices, and assume that qJ is not an integer. Distribute the remaining strictly positive weight equally among the remaining hypotheses. It is now easy to see that the hypotheses with weights equal to 1/q are always rejected, so their power equals 1. For the remaining hypotheses the power Φ[{Φ^{-1}(qw_i) − η_i}/γ_i] tends to Φ(0) = 1/2 as σ_i → ∞. This shows that the limiting power of this unbalanced weighting scheme is ⌊qJ⌋ + (J − ⌊qJ⌋)/2. For uniform weighting, the power tends to 1/2 as σ_i → ∞, for each hypothesis. This shows that the limiting power of uniform weighting is J/2. Hence, the power of the skewed weighting scheme is larger than that of uniform weighting. This illustrates the symmetry-breaking phenomenon caused by the extreme nonconvexity of the optimization problem.
Fortunately, the situation is better when condition (4) holds. In addition to being easy to check for any given parameters η_i and σ_i, we now show that the constraint is mild. Often we want to keep α = qJ small even if J is large, because α bounds the expected number of false rejections that we tolerate. In this regime, the condition holds as long as there are a few average-sized negative prior means η_i. We denote by Φ^{-1} the normal quantile function.
Proposition 3 —
Condition (4) holds if there are l distinct indices i with negative η_i, for which

c(η_i, σ_i; q) ≥ Φ^{-1}(qJ/l).

If |η_i| ≥ σ_i(2 log γ_i)^{1/2}, then {η_i² + 2σ_i² log γ_i}^{1/2} ≤ 2^{1/2}|η_i|, so the simple condition holds provided that

(2^{1/2}γ_i − 1)|η_i|/σ_i² ≤ −Φ^{-1}(qJ/l).

For instance, if α = qJ = 0·05 and l = 10, then Φ^{-1}(α/l) = Φ^{-1}(0·005) ≈ −2·58. If, moreover, σ_i = 1, so that γ_i = 2^{1/2} and 2^{1/2}γ_i − 1 = 1, and |η_i| ≥ (log 2)^{1/2} ≈ 0·83, then we need only ten effect sizes with η_i between about −2·58 and −0·83. This is a weak requirement.
When q is small, we use a damped Newton method to find the right constant λ from Theorem 1 via a one-dimensional line search. The function evaluations cost O(J) per iteration, and empirically we find that the algorithm takes only a small number of iterations to converge, independently of J. We can solve problems involving more than two million tests in a few seconds on a desktop computer.
Now we present our result for the general case.
Theorem 2 —
For any q, the nonconvex Bayes weights problem can be solved for a nearby level q′ for which the computed weights are exactly optimal. The optimal weights and q′ can be found in O(J log J) steps.
This result is relevant when α = qJ, the expected number of errors under the null hypothesis, is controlled at a threshold greater than 1/2. Our weights will be optimal for a q′ that is close to q. We see from the proof that even for large q, q′ often equals q. The method also returns the value of q′, which the user can inspect. It is then the user's decision as to whether to perform multiple testing adjustment at the original level q or at the new level q′.
The analysis of nonconvex optimization problems is challenging. It seems remarkable that the nonconvex Bayes weights problem admits a nearly exact solution.
4. Simulation studies
4.1. Bayes weights are more powerful than competing weighting schemes
We perform two simulation studies to explore the empirical performance of our method. First, we show that Bayes weights increase power more reliably than two other weighting schemes, namely exponential weights and filtering.
For Bayes weights, we multiply the prior variances by a dispersion factor φ, i.e., we use φσ_i² in place of σ_i². The default value for this tuning parameter is φ = 1 and, as discussed in §5.3, we recommend use of the default value in most cases. The purpose of changing the dispersion is to explore the robustness of our method with respect to misspecification of the prior variances. The dispersion ranges from 0 to 4, and Spjøtvoll weights correspond to φ = 0.
Exponential weights with tilt parameter β are defined as w_i ∝ exp(β|η_i|), normalized so that the weights average to one. This weighting scheme was proposed by Roeder et al. (2006), who also suggested a default value for the tilt; we consider a range of tilts β. As noted by Roeder et al. (2006), exponential weights are sensitive to large means. To guard against this sensitivity, we truncate weights larger than 1/q and redistribute their excess weight among the next largest weights.
Filtering methods test only the most significant prior effects, those with η_i below a threshold η_0 ≤ 0, using the unweighted Bonferroni method. These methods can be viewed as weighting schemes in which some weights are zero. Such methods are known under many names, such as two-stage testing, screening, or proxy-phenotype methods (Rietveld et al., 2014). We adopt the term filtering used by Bourgon et al. (2010), who filter based on independent information in the current dataset rather than prior information. The threshold η_0 ranges over negative values up to 0. If η_0 is so extreme that fewer than a fixed minimal number of hypotheses would be tested, then we instead test that many of the most significant hypotheses.
In the simulation, we generate J random prior means η_i and variances σ_i² independently across tests, and we set q = α/J. For any weight vector w, we calculate the power as the objective in (3) divided by J, to reflect the average power per test.
The results are shown in Fig. 3(a). Each method can improve the power over unweighted testing. However, Bayes weights yield more power than the other methods. The best power is attained when the dispersion φ is equal to 1, but good power is reached in a large neighbourhood of this value. Our weights are robust with respect to misspecification of the tuning parameter.
Fig. 3.
(a) Power of four p-value weighting methods plotted as a function of their parameter: unweighted (solid), Bayes (dashed) as a function of the dispersion φ, exponential (dotted) as a function of the tilt β, and filtering (dot-dashed) as a function of the threshold η_0; the Spjøtvoll weights correspond to the point φ = 0 on the Bayes weights curve. (b) Power comparison for sparse means: deterministic (left) and average (right) power plotted as a function of the proportion of large means ε, for the unweighted (solid), Spjøtvoll (dashed) and Bayes (dotted) methods.
In particular, taking uncertainty into account helps. Spjøtvoll weights, which assume fixed and known effects and are represented in the figure as regularized weights with φ = 0, have less power than Bayes weights with positive φ, for a wide range of φ.
The remaining two methods, filtering and exponential weights, have disadvantages. While filtering yields a gain in power for well-chosen thresholds η_0, it also leads to a substantial power loss for poorly chosen ones. For sufficiently extreme η_0 the power levels off at a low value, because only the fixed minimal number of top hypotheses are selected. Another significant disadvantage is that there seems to be no principled way to choose η_0 a priori without additional assumptions. Similarly, exponential weighting leads to at most a small gain in power, and it usually leads to a power loss.
We conclude that Bayes weights are robust with respect to the choice of the tuning parameter and have uniformly good power. In contrast, exponential weighting and filtering are more sensitive, and their power can drop substantially.
4.2. Bayes weights have a worst-case advantage
We show that Bayes weights have a worst-case advantage compared with Spjøtvoll weights. We use the sparse means model of §3.2: we generate J means μ_i, each equal to a common negative value μ with probability ε and to 0 otherwise, and take η_i = μ_i as prior means. We set α = 0·05 and vary ε from 0 to 0·1. We set all σ_i equal to a common value σ and consider σ = 1/2 or 1.
Spjøtvoll weights are optimal for the deterministic problem (1), while Bayes weights are optimal for the average problem (3). We evaluate these weighting schemes by calculating the power that they do not maximize, i.e., the average power (3) for Spjøtvoll weights and the deterministic power (1) for Bayes weighting. We also compute the power of the unweighted Bonferroni method.
The results are displayed in Fig. 3(b). Bayes weights lose only a little power compared to the optimal Spjøtvoll weights. In contrast, Spjøtvoll weights lose a lot of power relative to Bayes weights, which maximize the worst-case power. Bayes weights show a maximin property. Further, as shown in the Supplementary Material, Spjøtvoll weights lose power for small ε because they set the weights equal to zero on the small means.
5. Application to genome-wide association studies
5.1. Review of genome-wide association studies
We adapt our framework to genome-wide association studies, relying on basic notions of quantitative genetics (see, e.g., Lynch & Walsh, 1998). In this section we present in detail the methodology for this application, while also illustrating the steps of using our framework for specific problems.
We study a quantitative trait y in a population, with the goal of understanding the effects of single nucleotide polymorphisms x_i on the trait. We assume that x_i has mean 0 and known variance; here x_i denotes the centred minor allele count of variant i for an individual. We rely on the linear model for the effect of the ith variant on the trait: y = x_iβ_i + ε_i. In this model y is the phenotype of a randomly sampled individual from the population, so y is random, β_i is a fixed unknown constant, and ε_i is the residual error. This error is a zero-mean random variable that is independent of x_i, with variance σ²_{ε,i}.
Suppose that we observe a sample of n independent and identically distributed observations from this model. We use the standard linear regression estimate β̂_i, which for a large sample size has an approximate distribution β̂_i ~ N[β_i, σ²_{ε,i}/{n var(x_i)}]. To standardize, we divide by σ_{ε,i}/{n var(x_i)}^{1/2}, where var(x_i) is the variance of x_i.
With these steps, we have framed our problem in the Gaussian means model. Writing T_i for the standardized estimate and μ_i = β_i{n var(x_i)}^{1/2}/σ_{ε,i}, we have T_i ~ N(μ_i, 1), which has the required form. Let us also define the standardized effect size b_i = β_i{var(x_i)}^{1/2}/σ_{ε,i}, so that μ_i = n^{1/2}b_i; this quantity will be of key importance.
5.2. Prior information
To use prior information, assume that we also have a prior trait y′ which is measured on a different, independent sample from the same population. With the same assumptions on y′, we can write y′ = x_iβ_i′ + ε_i′. Here β_i′ is a fixed unknown constant, and ε_i′ is the residual error. Suppose that we have independent samples of size n′ and n for the prior and current traits. If we define T_i′ and b_i′ by analogy to the definitions for y, we can write T_i′ ~ N(n′^{1/2}b_i′, 1).
We model the relatedness of the two traits as a relation between the standardized effect sizes b_i and b_i′, which do not depend on the sample size. If the two traits are closely related, the first-order approximation is equality, or b_i = b_i′. This model captures the pleiotropy between the two traits (Solovieff et al., 2013).

The final step is to compute the distribution of μ_i given the prior data T_i′. For this we need to choose a prior for b_i′, and for simplicity we will use a flat prior.
We now have all ingredients for the model of Gaussian hypothesis testing with uncertain information. Specifically, we have μ_i | T_i′ ~ N(η_i, σ_i²), where η_i = (n/n′)^{1/2}T_i′ and σ_i² = n/n′.
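This conditional distribution can be sanity-checked by simulation. The sketch below uses hypothetical sample sizes, with a very diffuse normal on the shared standardized effect b as a stand-in for the flat prior:

```python
import numpy as np

# Monte Carlo check (sketch) of mu | T' ~ N{(n/n')^{1/2} T', n/n'}.
rng = np.random.default_rng(0)
n, n_prior = 1000, 4000
b = rng.normal(0.0, 2.0, size=2_000_000)          # diffuse stand-in prior
t_prior = rng.normal(np.sqrt(n_prior) * b, 1.0)   # prior-study z-scores
mu = np.sqrt(n) * b                               # current-study means

sel = np.abs(t_prior - 3.0) < 0.05                # condition on T' near 3
post_mean, post_var = mu[sel].mean(), mu[sel].var()
# Theory: mean (n/n')^{1/2} * 3 = 1.5 and variance n/n' = 0.25.
```

The empirical conditional mean and variance of μ agree with (n/n′)^{1/2}T′ and n/n′ up to Monte Carlo error.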
The uncertainty in the prior information may be larger than n/n′, owing to overdispersion; allowing for this is one way to weaken the first-order approximation b_i = b_i′. To allow for overdispersion, we recall the dispersion parameter φ used in our simulations, and model the prior variance as σ_i² = φn/n′. The default value φ = 1 is recommended in most cases. Finally, we compute the Bayes weights w_i with parameters η_i and σ_i², and we run the weighted Bonferroni method on the current p-values. This fully specifies the method, which is summarized in the following algorithm.
Algorithm 1 —
Bayes-weighted Bonferroni multiple testing in genome-wide association studies
Let T_i′ be the prior effect size estimates for i = 1, ..., J.
Let n′ and n be the prior and current sample sizes.
Let P_i be the current p-values.
Let α be the significance threshold; the default value is α = 0·05.
Let φ be the dispersion; the default value is φ = 1.
Set the prior means and variances: η_i = (n/n′)^{1/2}T_i′ and σ_i² = φn/n′.
Compute the Bayes weights w_i, defined via (3), with parameters η_i and σ_i².
Output the indices i such that P_i ≤ αw_i/J.
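The plumbing of Algorithm 1 can be sketched as follows (hypothetical function names; `weight_fn(eta, sigma, q)` stands in for any routine that solves the Bayes weights problem (3) and returns nonnegative weights averaging to one):

```python
import numpy as np

def gwas_weighted_bonferroni(t_prior, n_prior, n_curr, pvals,
                             weight_fn, alpha=0.05, phi=1.0):
    """Sketch of Algorithm 1: build prior means and variances from prior
    GWAS z-scores, compute weights, run the weighted Bonferroni test."""
    t_prior = np.asarray(t_prior, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    J = len(pvals)
    ratio = n_curr / n_prior
    eta = np.sqrt(ratio) * t_prior               # prior means
    sigma = np.sqrt(phi * ratio) * np.ones(J)    # prior standard deviations
    w = weight_fn(eta, sigma, alpha / J)
    return np.flatnonzero(pvals <= alpha * w / J)

# With unit weights the procedure reduces to ordinary Bonferroni testing.
uniform = lambda eta, sigma, q: np.ones_like(eta)
hits = gwas_weighted_bonferroni([-4.0, -1.0, 0.5], 4000, 1000,
                                [1e-8, 0.2, 0.9], uniform)
```

Passing the weight computation as an argument keeps the error-controlling part of the pipeline independent of how the nonconvex weight problem is solved.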
5.3. Practical remarks
It is important that we retain Type I error control even when the modelling assumptions fail. The only requirement is that we have marginally valid p-values. We list two common deviations from our model. First, summary data for genome-wide association studies sometimes include only the magnitude of the effects and not their sign. In this case we have two choices: we could assume that the directions of effects are the same, and perform a one-tailed test of the current effect in the prior direction; alternatively, we could do a two-tailed test by including the tests with prior parameters (η_i, σ_i²) and (−η_i, σ_i²) for each i, for a total of 2J tests. Large effects will often be in the same direction, whereas small effects may change direction between the prior and current studies. Our procedure for dealing with two-sided effects may lead to minor power loss while retaining Type I error control. Second, in some cases the prior and current traits can be of different types; for instance, the prior trait could be binary and the current trait quantitative. In such a situation, the model b_i = b_i′ should be re-examined, but it is still convenient to use as a first approximation.
We recommend using the default value of the tuning parameter, φ = 1, in all but exceptional cases. This value was derived from a natural Bayesian model, and our simulations and data analysis show that it provides good performance in most cases. The same numerical results demonstrate that our method is not too sensitive to the choice of tuning parameter. If the relationship between the two traits is thought to be weak, one could use a larger value of φ. If the uncertainty in the prior information is less than that suggested by the usual model, one could use a smaller value of φ. If the value φ = 1 was tried first, the results of that analysis should also be reported.
One may wish to use the weighted Benjamini–Hochberg method with our weights (Genovese et al., 2006), but in general this will be underpowered, as optimal weights for stepwise methods differ greatly from those for single-step methods (Westfall & Soper, 2001). However, in the special case of very small q, in our data analysis examples we have observed that the weights often become monotonically increasing in the magnitude of the prior effect size, and thus are similar to the optimal weights for stepwise methods.
6. Data analysis
6.1. Data sources
We illustrate the application of our method by analysing data from publicly available genome-wide association studies. We use the p-values, recorded for 500 000 to 2·5 million genetic variants, from five studies: CARDIoGRAM and C4D for coronary artery disease (Schunkert et al., 2011; Coronary Artery Disease Genetics Consortium, 2011), blood lipids (Teslovich et al., 2010), schizophrenia (Schizophrenia Psychiatric Genome-Wide Association Study Consortium, 2011), and estimated glomerular filtration rate creatinine (Köttgen et al., 2010); see the Supplementary Material.
We analyse three pairs of datasets, with a specific motivation for each. First, we use CARDIoGRAM as prior information for C4D. This is a positive control for our method, since both studies measure coronary artery disease. We choose C4D as the target because it has a smaller sample; hence prior information may increase power more substantially.
Second, we use the blood lipids study as prior information for the schizophrenia study. Andreassen et al. (2013) demonstrated improved power with this pair. They used a fully Bayesian method, and our goal is to evaluate the power improvement using a frequentist method. There is a small overlap between the controls of the two studies.
Third, we use the creatinine study as prior information for the C4D study. Heart disease and renal disease are comorbid (Silverberg et al., 2004), so this set-up may improve power.
6.2. Methods and additional details
We run weighted Bonferroni multiple testing for each of five weighting schemes. The prior data are the z-scores T_i′ = Φ^{-1}(P_i′), where P_i′ is the ith prior p-value. The familywise error rate is controlled at α = 0·05, so that the per-test p-value thresholds α/J are approximately 2 × 10^{-8} to 10^{-7}.
The first four weighting schemes are: unweighted Bonferroni testing, where all weights equal unity; Spjøtvoll weights with parameters η_i; Bayes weights with dispersion φ = 0·1, 1 or 10; and exponential weights (Roeder et al., 2006), introduced in §4.1, with tilt β = 1, 2 or 4.
The fifth and last weighting scheme is filtering, which selects the smallest p-values from the prior study and tests their hypotheses in the current study. We use three prior p-value thresholds, 10^{-2}, 10^{-4} and 10^{-6}. Rietveld et al. (2014) proposed a method for choosing the optimal p-value threshold for filtering, which requires the genotypic correlation between the two traits and the additive heritability of the current trait. For complex traits, these parameters are usually estimated with large uncertainty, and substantial domain expertise is needed to specify them.
We prune the significant single nucleotide polymorphisms for linkage disequilibrium using the DistiLD database (Palleja et al., 2012). Specifically, for each weighting scheme we select one locus from each linkage disequilibrium block that contains significant loci. Our data analysis pipeline is given in the Supplementary Material.
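The pruning step can be sketched as keeping one representative locus per linkage disequilibrium block. The block identifiers and locus names below are hypothetical stand-ins for a DistiLD lookup.

```python
def prune_by_ld_block(significant_loci, block_of):
    """Keep one locus per linkage disequilibrium block.

    `block_of` maps each significant locus to its LD block identifier
    (a hypothetical stand-in for a DistiLD database query); the first
    significant locus seen in each block is retained."""
    seen, kept = set(), []
    for locus in significant_loci:
        block = block_of[locus]
        if block not in seen:
            seen.add(block)
            kept.append(locus)
    return kept

# Four significant loci falling into two LD blocks yield two
# approximately independent loci after pruning.
loci = ["rs1", "rs2", "rs3", "rs4"]
blocks = {"rs1": "chr9:22k", "rs2": "chr9:22k",
          "rs3": "chr1:55k", "rs4": "chr1:55k"}
print(prune_by_ld_block(loci, blocks))
```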
We compute a score for each weighting scheme, with each of its parameter settings, on each dataset. The score on a dataset equals +1 if the weighting scheme increases the number of detections relative to unweighted testing, 0 if it leaves the number unchanged, and -1 otherwise. The score of a weighting scheme with a given parameter setting is the sum of these scores across datasets, and the total score of the weighting scheme is the sum of its scores across parameter settings.
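The scoring rule can be illustrated numerically; the detection counts below are toy values, not those of Table 1.

```python
def dataset_score(n_weighted, n_unweighted):
    """+1 if the weighting scheme finds more loci than unweighted
    testing on this dataset, 0 if the same number, -1 otherwise."""
    return int(n_weighted > n_unweighted) - int(n_weighted < n_unweighted)

def scheme_score(weighted_counts, unweighted_counts):
    """Score of one parameter setting: sum of per-dataset scores."""
    return sum(dataset_score(w, u)
               for w, u in zip(weighted_counts, unweighted_counts))

# Toy counts on three datasets: more, fewer, and equally many loci
# than unweighted testing, giving a score of +1 - 1 + 0 = 0.
unweighted = [4, 4, 4]
weighted   = [10, 1, 4]
print(scheme_score(weighted, unweighted))
```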
6.3. Results
Table 1 shows the number of significant loci for each pair of studies and for each weighting scheme. We also present the results pruned for linkage disequilibrium, which act as a proxy for the number of independent loci found.
Table 1.
Number of significant loci for five methods on three examples: the top portion of the table shows results pruned for linkage disequilibrium, the middle portion shows results without pruning, and the bottom portion reports the score of each method

| Parameter | Un | Spjot | Bayes 1/10 | Bayes 1 | Bayes 10 | Exp 1 | Exp 2 | Exp 4 | Filter 10^-2 | Filter 10^-4 | Filter 10^-6 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| *Pruned* | | | | | | | | | | | |
| CG → C4D | 4 | 11 | 10 | 8 | 4 | 4 | 5 | 4 | 10 | 10 | 6 |
| Lipids → SCZ | 4 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | 2 | 2 | 2 |
| eGFRcrea → C4D | 4 | 2 | 2 | 4 | 4 | 4 | 5 | 4 | 1 | 0 | 1 |
| *Unpruned* | | | | | | | | | | | |
| CG → C4D | 29 | 45 | 44 | 39 | 29 | 32 | 34 | 27 | 40 | 48 | 34 |
| Lipids → SCZ | 116 | 214 | 214 | 223 | 123 | 92 | 0 | 0 | 217 | 96 | 39 |
| eGFRcrea → C4D | 29 | 18 | 18 | 23 | 29 | 29 | 28 | 19 | 1 | 0 | 1 |
| *Scoring* | | | | | | | | | | | |
| Score | 0 | 0 | 0 | 1 | 1 | 0 | 0 | -1 | 0 | -1 | -1 |
| Total | 0 | 0 | 2 | | | -1 | | | -2 | | |

Un, unweighted; Spjot, Spjøtvoll; Bayes, Bayesian with dispersion 1/10, 1 or 10; Exp, exponential with tilt 1, 2 or 4; Filter, filtering with prior p-value threshold 10^-2, 10^-4 or 10^-6; CG, CARDIoGRAM; SCZ, schizophrenia study; eGFRcrea, creatinine study. The total score of each weighting scheme is the sum of its scores across parameter settings.
The results are somewhat inconclusive. In the positive control example, all weighting schemes except exponential weighting detect more loci than unweighted testing; Spjøtvoll weighting and filtering lead to the largest number of loci. In the blood lipids example, the methods generally detect fewer pruned loci, except for Bayes weights with dispersion 10. The methods can detect both a larger and a smaller number of unpruned loci, except in the case of Bayes weights, which uniformly increase the number of loci. For the eGFR creatinine example, exponential weights behave best. We also see that Bayes weights with the default dispersion never perform worse than both unweighted testing and Spjøtvoll weights, and for the unpruned lipids example they are better.
If we allow tuning of parameters for the three weighting schemes that have such a parameter, Bayes weights show good performance: they are either first or second in all examples. This shows that our method is robust with respect to the choice of tuning parameter.
Finally, only Bayes weights with dispersion 1 or 10 have a positive score. The total score, summed across parameter settings, is also positive only for Bayes weights. Judging from these results, our method shows promise. However, from this analysis alone we cannot establish conclusively the relative merits of the methods. In future work it will be necessary to evaluate p-value weighting methods on more datasets.
Supplementary material
Supplementary material available at Biometrika online includes proofs of the theoretical results, software implementations in R and MATLAB, and code to reproduce the simulations and data analysis results.
Acknowledgments
Kristen Fortney and Stuart Kim are also affiliated with the Department of Genetics, Stanford University. This research was partially supported by the U.S. National Science Foundation and National Institutes of Health. We are grateful for the reviewers' constructive comments, which have helped to improve the paper.
References
- Andreassen O. A., Djurovic S., Thompson W. K., Schork A. J., Kendler K. S., O'Donovan M. C., Rujescu D., Werge T., van de Bunt M. & Morris A. P. et al. (2013). Improved detection of common variants associated with schizophrenia by leveraging pleiotropy with cardiovascular-disease risk factors. Am. J. Hum. Genet. 92, 197–209.
- Benjamini Y. & Hochberg Y. (1997). Multiple hypotheses testing with weights. Scand. J. Statist. 24, 407–18.
- Bourgon R., Gentleman R. & Huber W. (2010). Independent filtering increases detection power for high-throughput experiments. Proc. Nat. Acad. Sci. 107, 9546–51.
- Box G. E. P. (1980). Sampling and Bayes' inference in scientific modelling and robustness (with Discussion). J. R. Statist. Soc. A 143, 383–430.
- Brooks-Wilson A. R. (2013). Genetics of healthy aging and longevity. Hum. Genet. 132, 1323–38.
- Carlin B. J. & Louis T. A. (1985). Controlling error rates by using conditional expected power to select tumor sites. In Proc. Biopharm. Sect., Am. Statist. Assoc. Alexandria, Virginia: American Statistical Association, pp. 11–8.
- Coronary Artery Disease Genetics Consortium (2011). A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nature Genet. 43, 339–44.
- Darnell G., Duong D., Han B. & Eskin E. (2012). Incorporating prior information into association studies. Bioinformatics 28, i147–53.
- Eskin E. (2008). Increasing power in association studies by using linkage disequilibrium structure and molecular function as prior information. Genome Res. 18, 653–60.
- Genovese C. R., Roeder K. & Wasserman L. (2006). False discovery control with p-value weighting. Biometrika 93, 509–24.
- Gui J., Tosteson T. D. & Borsuk M. E. (2012). Weighted multiple testing procedures for genomic studies. BioData Mining 5, article no. 4.
- Hjelmborg J., Iachine I., Skytthe A., Vaupel J. W., McGue M., Koskenvuo M., Kaprio J., Pedersen N. L. & Christensen K. (2006). Genetic influence on human lifespan and longevity. Hum. Genet. 119, 312–21.
- Holm S. (1979). A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6, 65–70.
- Köttgen A., Pattaro C., Böger C. A., Fuchsberger C., Olden M., Glazer N. L., Parsa A., Gao X., Yang Q. & Smith A. V. et al. (2010). New loci associated with kidney function and chronic kidney disease. Nature Genet. 42, 376–84.
- Louis T. A. & Bailey J. K. (1990). Controlling error rates using prior information and marginal totals to select tumor sites. J. Statist. Plan. Infer. 24, 297–316.
- Lynch M. & Walsh B. (1998). Genetics and Analysis of Quantitative Traits. Sunderland: Sinauer Associates.
- Palleja A., Horn H., Eliasson S. & Jensen L. J. (2012). DistiLD Database: Diseases and traits in linkage disequilibrium blocks. Nucleic Acids Res. 40, D1036–40.
- Peña E. A., Habiger J. D. & Wu W. (2011). Power-enhanced multiple decision functions controlling family-wise error and false discovery rates. Ann. Statist. 39, 556–83.
- Rietveld C. A., Esko T., Davies G., Pers T. H., Turley P., Benyamin B., Chabris C. F., Emilsson V., Johnson A. D. & Lee J. J. et al. (2014). Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proc. Nat. Acad. Sci. 111, 13790–4.
- Roeder K. & Wasserman L. (2009). Genome-wide significance levels and weighted hypothesis testing. Statist. Sci. 24, 398–413.
- Roeder K., Bacanu S.-A., Wasserman L. & Devlin B. (2006). Using linkage genome scans to improve power of association in genome scans. Am. J. Hum. Genet. 78, 243–52.
- Roquain E. & Van De Wiel M. A. (2009). Optimal weighting for false discovery rate control. Electron. J. Statist. 3, 678–711.
- Rubin D., Dudoit S. & Van der Laan M. (2006). A method to increase the power of multiple testing procedures through sample splitting. Statist. Applic. Genet. Molec. Biol. 5, 1–19.
- Schizophrenia Psychiatric Genome-Wide Association Study Consortium (2011). Genome-wide association study identifies five new schizophrenia loci. Nature Genet. 43, 969–76.
- Schunkert H., König I. R., Kathiresan S., Reilly M. P., Assimes T. L., Holm H., Preuss M., Stewart A. F., Barbalic M. & Gieger C. et al. (2011). Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature Genet. 43, 333–8.
- Silverberg D., Wexler D., Blum M., Schwartz D. & Iaina A. (2004). The association between congestive heart failure and chronic renal disease. Curr. Opin. Nephrol. Hypertens. 13, 163–70.
- Solovieff N., Cotsapas C., Lee P. H., Purcell S. M. & Smoller J. W. (2013). Pleiotropy in complex traits: Challenges and strategies. Nature Rev. Genet. 14, 483–95.
- Spjøtvoll E. (1972). On the optimality of some multiple comparison procedures. Ann. Math. Statist. 43, 398–411.
- Teslovich T. M., Musunuru K., Smith A. V., Edmondson A. C., Stylianou I. M., Koseki M., Pirruccello J. P., Ripatti S., Chasman D. I. & Willer C. J. et al. (2010). Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–13.
- Westfall P. H. & Krishen A. (2001). Optimally weighted, fixed sequence and gatekeeper multiple testing procedures. J. Statist. Plan. Infer. 99, 25–40.
- Westfall P. H. & Soper K. A. (2001). Using priors to improve multiple animal carcinogenicity tests. J. Am. Statist. Assoc. 96, 827–34.
- Westfall P. H., Krishen A. & Young S. S. (1998). Using prior information to allocate significance levels for multiple endpoints. Statist. Med. 17, 2107–19.
- Westfall P. H., Kropf S. & Finos L. (2004). Weighted FWE-controlling methods in high-dimensional situations. In Recent Developments in Multiple Comparison Procedures, Y. Benjamini, F. Bretz and S. Sarkar, eds. Beachwood, Ohio: Institute of Mathematical Statistics, pp. 143–54.