Multiple Testing under Dependence via Semiparametric Graphical Models

Jie Liu; Chunming Zhang; Elizabeth Burnside; David Page

Author manuscript; available in PMC: 2014 Dec 31.

Published in final edited form as: JMLR Workshop Conf Proc. 2014 Dec 31;2014:955–963.

PMCID: PMC4190841 NIHMSID: NIHMS612860 PMID: 25309970

Abstract

It has been shown that graphical models can be used to leverage the dependence in large-scale multiple testing problems with significantly improved performance (Sun & Cai, 2009; Liu et al., 2012). These graphical models are fully parametric and require that we know the parameterization of f₁, the density function of the test statistic under the alternative hypothesis. However, in practice f₁ is often heterogeneous and cannot be estimated with a simple parametric distribution. We propose a novel semiparametric approach for multiple testing under dependence, which estimates f₁ adaptively. This semiparametric approach exactly generalizes the local FDR procedure (Efron et al., 2001) and connects with the BH procedure (Benjamini & Hochberg, 1995). A variety of simulations show that our semiparametric approach outperforms both classical procedures, which assume independence, and parametric approaches, which capture dependence.

1. Introduction

High-throughput computational biology studies, such as gene expression analysis and genome-wide association studies, often involve large-scale multiple testing problems that exhibit dependence, in the sense that whether the null hypothesis of one test is true depends on the ground truth of other tests. Recently, new multiple testing procedures have been proposed in which such dependence is explicitly captured by graphical models, such as hidden Markov models (Sun & Cai, 2009) and Markov-random-field-coupled mixture models (Liu et al., 2012). These graphical models are fully parametric: they assume that we know not only the parametric form of f₀ but also the parametric form of f₁.¹ A fully parametric graphical model is then learned, and the multiple testing problem becomes an inference problem on the graphical model. This parametric approach is effective in some simple situations, but its assumptions about f₁ often make it impractical, as discussed next.

A long tradition in hypothesis testing is to derive test statistics and calculate P-values entirely under the null hypothesis ℋ₀. The distribution of the test statistic under ℋ₁ can be difficult to derive. Take, for instance, the two-proportion z-test, which tests whether two Bernoulli variables have the same parameter (e.g., P(heads) in coin flips); it is widely used in case-control studies (e.g., comparing the minor allele frequencies in cases and controls). Under ℋ₀ (the two proportions are equal), the test statistic X asymptotically follows a standard normal 𝒩(0, 1). Under ℋ₁ (the two proportions differ), X asymptotically follows a noncentral unit-variance normal 𝒩(μ, 1) (μ ≠ 0), where μ depends on the odds ratio of the genetic marker. When multiple genetic markers are tested, f₀ remains 𝒩(0, 1), but f₁ becomes a mixture of Gaussians because the associated markers can have different odds ratios and therefore different μ values (i.e., different effect sizes). In this situation, f₁ is no longer a simple parametric distribution. Figure 1 plots the estimated f₁ in a real-world genome-wide association study on breast cancer; clearly, it would be inappropriate to estimate this f₁ with a simple parametric distribution. This is not a problem for classical multiple testing procedures such as the BH procedure, whose P-values are calculated under ℋ₀, but it is a serious problem for graphical-model-based procedures, which require f₁ to be estimated parametrically. The key question, therefore, is whether we can still use graphical models to leverage the dependence among the hypotheses without making assumptions about f₁.

Figure 1. Estimated f₁ in a real-world genome-wide association study on breast cancer.

In this paper, we propose a semiparametric graphical model to leverage the dependence among the hypotheses. In our model, f₁ is estimated nonparametrically and the remaining parts are estimated parametrically. Section 2 summarizes the terminology, and Section 3 introduces the algorithmic details. Section 4 shows that two widely used multiple testing procedures, the BH procedure (Benjamini & Hochberg, 1995) and the local FDR procedure (Efron et al., 2001), estimate their parameters in the same semiparametric way to avoid assumptions about f₁. This unification demonstrates that the most appropriate way of using graphical models to capture the dependence is the semiparametric model in our paper rather than the fully parametric models (Sun & Cai, 2009; Liu et al., 2012). Simulations in Section 5 show that, compared with the baseline procedures, our semiparametric approach controls the false discovery rate and reduces the false non-discovery rate. In Section 6, we apply the procedure to a real-world genome-wide association study on breast cancer and identify a number of genetic variants.

2. Preliminaries

FDR, FNR, Validity and Efficiency

When we test m hypotheses simultaneously, the possible outcomes can be classified as in Table 1 according to their ground truth and whether the hypotheses are rejected. The false discovery rate (FDR), E(N₁₀/R | R>0)P(R>0), is the expected proportion of incorrectly rejected null hypotheses among all rejections (Benjamini & Hochberg, 1995). The false non-discovery rate (FNR), E(N₀₁/S | S>0)P(S>0), is the expected proportion of false non-rejections among the hypotheses that are not rejected (Genovese & Wasserman, 2002). An FDR procedure is valid if it controls FDR at a nominal level α. Of two valid procedures, one is more efficient than the other if it has a smaller FNR. In multiple testing problems, we would like to control FDR at the nominal level and reduce FNR as much as possible.

Table 1.

Classification of tested hypotheses

	Retained	Rejected	Total
ℋ₀ is true	N₀₀	N₁₀	m₀
ℋ₀ is false	N₀₁	N₁₁	m₁
Total	S	R	m
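
As a minimal sketch of the quantities in Table 1, the following Python computes, for a single replication, the false discovery proportion N₁₀/R, the false non-discovery proportion N₀₁/S and the number of true positives N₁₁; averaging the proportions over replications estimates FDR and FNR. The function name and the convention of setting a proportion to 0 when its denominator is 0 (which mirrors the P(R>0) and P(S>0) factors) are our own choices.

```python
def error_proportions(theta, rejected):
    """Return (FDP, FNP, true positives) for one replication.

    theta[i] is 1 if H_i is non-null (ground truth) and 0 if null.
    rejected[i] is True if H_i is rejected.
    """
    n10 = sum(1 for t, r in zip(theta, rejected) if t == 0 and r)      # false rejections
    n11 = sum(1 for t, r in zip(theta, rejected) if t == 1 and r)      # true rejections
    n01 = sum(1 for t, r in zip(theta, rejected) if t == 1 and not r)  # false non-rejections
    R = n10 + n11
    S = len(theta) - R
    # A ratio of 0 when the denominator is 0 gives E(N10/R | R>0) P(R>0)
    # (and likewise for FNR) once averaged over replications.
    fdp = n10 / R if R > 0 else 0.0
    fnp = n01 / S if S > 0 else 0.0
    return fdp, fnp, n11
```

For example, with θ = (0, 0, 1, 1, 0) and only the first and third hypotheses rejected, R = 2 and S = 3, so FDP = 1/2, FNP = 1/3 and there is one true positive.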


Dependence in Multiple Testing

Classical multiple testing procedures usually assume independence among the hypotheses. The effects of dependence on multiple testing have been investigated with a focus on validity, namely how to control FDR at the nominal level when dependence exists (Benjamini & Yekutieli, 2001; Finner & Roters, 2002; Reiner et al., 2003; Owen, 2005; Sarkar, 2006; Efron, 2007; Farcomeni, 2007; Romano et al., 2008; Strimmer, 2008; Wu, 2008; Blanchard & Roquain, 2009). Despite the challenges it poses for FDR control, dependence also brings opportunities for decreasing FNR. This efficiency issue has also been investigated (Yekutieli & Benjamini, 1999; Genovese et al., 2006; Benjamini & Heller, 2007; Zhang et al., 2011), and these studies indicate that FNR can be decreased by leveraging the dependence among hypotheses. Several approaches have been proposed, such as dependence kernels (Leek & Storey, 2008), factor models (Friguet et al., 2009) and principal factor approximation (Fan et al., 2012). Sun & Cai (2009) use a hidden Markov model to explicitly leverage chain dependence structures. Liu et al. (2012) extend such graphical-model-based approaches to general dependence structures via a Markov-random-field-coupled mixture model. Capturing the dependence in multiple testing in such an explicit manner is innovative, but it relies on the strong assumption that we know the parameterization of f₁, which is unrealistic in all but the simplest situations. An improper assumption about f₁ may make the testing procedure too liberal (e.g., Figure 4 of Sun & Cai (2009)) or too conservative (e.g., Figure 3 of Liu et al. (2012)). In this paper, we build on the approach of Liu et al. (2012) and take the major step of relaxing this assumption by estimating f₁ adaptively.

3. Methods

3.1. Graphical models for Multiple Testing

Let x = (x₁, …, x_m) be a vector of test statistics from hypotheses (ℋ₁, …, ℋ_m) whose ground truth is denoted by a latent Bernoulli vector θ = (θ₁, …, θ_m) ∈ {0, 1}^m, with θ_i = 0 denoting that the hypothesis ℋ_i is null and θ_i = 1 denoting that ℋ_i is non-null. In the work of Liu et al. (2012), the dependence among these hypotheses is represented as a binary Markov random field (MRF) on θ. The structure of the MRF is assumed to be known and is described by an undirected graph 𝒢(𝒱, ℰ) with node set 𝒱 and edge set ℰ. The dependence between ℋ_i and ℋ_j is denoted by an edge connecting node i and node j. The strength of dependence is captured by a potential function on this edge (parametrized by ϕ_ij, 0<ϕ_ij<1). The degree of prior belief that ℋ_i is null is captured by a node potential function (parametrized by π_i, 0<π_i<1). Suppose that the probability density function of the test statistic x_i | θ_i=0 is f₀, and the density of x_i | θ_i=1 is f₁. Then (x, θ; π, ϕ, f₀, f₁) forms an MRF-coupled mixture model, where π and ϕ parametrize the node and edge potential functions of the MRF. In the MRF-coupled mixture model, x is observed and θ is hidden. We also need to estimate π, ϕ and f₁.²
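
To make the model concrete, here is a minimal Python sketch of the log of the unnormalized joint density of (θ, x). The text does not spell out the potential functions, so the sketch assumes a hypothetical parameterization: the node potential is π for θ_i = 0 and 1 − π for θ_i = 1, and the edge potential is ϕ when θ_i = θ_j and 1 − ϕ otherwise (under which ϕ = 0.5 makes the edges uninformative). f₀ is taken to be 𝒩(0, 1), and f₁ is any callable density.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_joint_unnormalized(theta, x, edges, pi, phi, f1, f0=normal_pdf):
    """Log of the unnormalized P(theta, x) in an MRF-coupled mixture model.

    Assumed (hypothetical) potentials:
      node: pi if theta_i == 0 else 1 - pi
      edge: phi if theta_i == theta_j else 1 - phi
    Emissions: f0 for null nodes (theta_i = 0), f1 for non-null nodes.
    """
    total = 0.0
    for t, xi in zip(theta, x):
        total += math.log(pi if t == 0 else 1.0 - pi)    # node potential
        total += math.log(f0(xi) if t == 0 else f1(xi))  # emission density
    for i, j in edges:
        total += math.log(phi if theta[i] == theta[j] else 1.0 - phi)  # edge potential
    return total
```

The partition function is deliberately omitted; this normalizing constant is what makes parameter learning for the MRF difficult.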

For the reasons discussed in Section 1, it is often difficult to estimate f₁ with a simple parametric distribution. To avoid an assumption about f₁, we estimate f₁ adaptively in an indirect, nonparametric way, as introduced in Section 3.2. We then estimate π and ϕ via a contrastive-divergence-style algorithm, as introduced in Section 3.3. The graphical model is therefore learned semiparametrically: f₁ is learned nonparametrically, and the MRF part is learned by estimating the parameters π and ϕ. Finally, we perform marginal inference of θ | x with the learned model and reject hypotheses with a step-up procedure to control FDR, as introduced in Section 3.4. Figure 2³ shows the semiparametric MRF-coupled mixture model for three dependent hypotheses ℋ_i, ℋ_j and ℋ_k.

Figure 2. The semiparametric graphical model for hypotheses ℋ_i, ℋ_j and ℋ_k with observed test statistics (x_i, x_j, x_k) and latent ground truth (θ_i, θ_j, θ_k).

3.2. Nonparametric Estimation of f₁

We cannot estimate f₁ directly from the observed x because the ground truth θ is hidden. However, we can estimate the marginal density f of the test statistics nonparametrically from the observed x via kernel density estimation. We can therefore estimate f₁ indirectly using the law of total probability

f(x) = p_0 f_0(x) + (1 - p_0) f_1(x), \qquad (1)

where p₀ is the proportion of null hypotheses. Since we know f₀ in advance (e.g. 𝒩(0,1)), we only need to estimate f and p₀ so as to estimate f₁.

Estimating p₀: We estimate p₀ with the method of Storey (2002), namely

\hat{p}_0(\lambda) = \frac{W(\lambda)}{(1 - \lambda)\, m}, \qquad (2)

where λ ∈ [0, 1) is a tuning parameter and W(λ) is the number of hypotheses whose P-values are above λ. The motivation for this estimator is that the P-values of null hypotheses are uniformly distributed on the interval (0, 1). If we assume that all hypotheses with P-values greater than λ are null, then W(λ)/(1 − λ) estimates the total number of null hypotheses, and the right-hand side of (2) estimates p₀. Because some non-null hypotheses may also have P-values greater than λ, especially when λ is small, p̂₀(λ) tends to overestimate p₀. There is therefore a bias-variance trade-off in the choice of λ: a larger λ yields less bias but more variance. Storey et al. (2004) showed that the BH procedure coupled with p̂₀(λ) maintains strong control of FDR under mild conditions. In our simulations, we test different λ values, and the results show that the performance of our multiple testing procedure is insensitive to the choice of λ.
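
Equation (2) translates directly into code. This Python sketch takes a list of P-values; the cap at 1 is a common practical safeguard that is our addition and not part of (2).

```python
def storey_p0(pvalues, lam=0.8):
    """Storey (2002) estimate of the null proportion p0, as in eq. (2).

    W(lam) counts the P-values above lam; null P-values are Uniform(0, 1),
    so about (1 - lam) * m0 of them should exceed lam.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(pvalues)
    w = sum(1 for p in pvalues if p > lam)
    # Capping at 1 is an assumed safeguard: W(lam) / ((1 - lam) m) can exceed 1.
    return min(1.0, w / ((1.0 - lam) * m))
```

For example, with m = 10 P-values of which one exceeds λ = 0.8, the estimate is 1/(0.2 × 10) = 0.5.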

Estimating f: Since all the test statistics x are observed, we can estimate f directly via kernel density estimation (Rosenblatt, 1956). Any kernel function and bandwidth may be used as long as they provide a reasonable estimate, and a Gaussian kernel would be a natural choice. In our experiments, however, we use the Epanechnikov kernel because its computational burden is low and it is optimal in the sense of minimizing the asymptotic mean integrated squared error (Epanechnikov, 1969). This yields f̂, the nonparametric estimate of f.
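
A minimal pure-Python sketch of an Epanechnikov kernel density estimate, with kernel K(u) = 0.75(1 − u²) on |u| ≤ 1 and 0 elsewhere. The text leaves the bandwidth open; the default below, h = 2.34 σ̂ n^(−1/5), is a standard rule of thumb for this kernel that we assume for illustration only.

```python
import statistics


def epanechnikov_kde(data, bandwidth=None):
    """Return a function x -> f_hat(x), the Epanechnikov kernel density estimate.

    If bandwidth is None, use the rule of thumb h = 2.34 * sd * n ** (-1/5)
    (an assumption; it requires at least two data points).
    """
    n = len(data)
    if bandwidth is not None:
        h = bandwidth
    else:
        h = 2.34 * statistics.stdev(data) * n ** (-0.2)

    def f_hat(x):
        total = 0.0
        for xi in data:
            u = (x - xi) / h
            if abs(u) <= 1.0:
                total += 0.75 * (1.0 - u * u)  # kernel K(u), zero outside [-1, 1]
        return total / (n * h)

    return f_hat
```

Because the kernel has compact support, each evaluation only accumulates contributions from data points within h of x, which is part of why its computational burden is low.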

Estimating f₁: With the estimated p̂₀ and f̂, we estimate f₁ as

\hat{f}_1(x) = \frac{\hat{f}(x) - \hat{p}_0 f_0(x)}{1 - \hat{p}_0}. \qquad (3)
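
Equation (3) combines the two estimates. In the sketch below, f_hat is any density estimate of f (such as a kernel estimate) and f₀ is assumed to be 𝒩(0, 1). Clipping negative values at 0 is our assumption: with finite samples, f̂ can fall below p̂₀ f₀, which would make (3) negative.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def estimate_f1(f_hat, p0_hat, f0=std_normal_pdf):
    """Return x -> (f_hat(x) - p0_hat * f0(x)) / (1 - p0_hat), as in eq. (3)."""
    if not 0.0 <= p0_hat < 1.0:
        raise ValueError("p0_hat must lie in [0, 1)")

    def f1_hat(x):
        # Clip at 0 (assumption) so the result stays a valid density value.
        return max(0.0, (f_hat(x) - p0_hat * f0(x)) / (1.0 - p0_hat))

    return f1_hat
```

For example, if f̂ is exactly 0.8 𝒩(0, 1) + 0.2 𝒩(3, 1) and p̂₀ = 0.8, the returned function equals the 𝒩(3, 1) density.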

3.3. Parametric Estimation of ϕ and π

The pairwise potential functions ϕ and the node potential functions π parametrize the Markov random field part of the model. In the simulations, we tie all the pairwise potential functions together, i.e. ϕ={ϕ}. In the real-world application in Section 6, we assume there are three types of edges (high correlation edges, medium correlation edges and low correlation edges), and there are three parameters, ϕ={ϕ_h, ϕ_m, ϕ_l}, corresponding to the three levels of correlation. We also tie all the node potentials in both the simulations and the real-world application, i.e. π={π}.

Parameter learning for MRFs is generally difficult because of the partition function. Currently, state-of-the-art parameter learning algorithms are based on contrastive divergence (Hinton, 2002), such as the persistent contrastive divergence (PCD) algorithm (Tieleman, 2008). Contrastive divergence algorithms iteratively update the parameters by generating particles from the current parameter estimates and comparing the moments of the particles with the moments of the data. Contrastive divergence is related to pseudo-likelihood (Besag, 1975) and ratio matching (Hyvärinen, 2007a;b). However, contrastive divergence algorithms cannot be applied directly to our model because θ is hidden. We therefore modify the PCD algorithm as follows. Suppose we have generated particles for θ as in the standard PCD algorithm. We further generate particles for x from f₀ and f̂₁, conditional on the generated particles for θ. We then update the parameters by comparing the moments of the particles for x with the moments of the observed x. A systematic review of parameter learning in hidden Markov random fields is given in Liu et al. (2014).
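
The particle-generation part of the modified PCD step can be sketched as follows for tied parameters ϕ and π. The moment-matching parameter update is described only at a high level in the text, so it is not sketched here. The potential functions are a hypothetical parameterization (node potential π for θ_i = 0 and 1 − π for θ_i = 1; edge potential ϕ if the endpoints agree and 1 − ϕ otherwise), f₀ is assumed to be 𝒩(0, 1), and sample_f1 is a hypothetical caller-supplied sampler for f̂₁.

```python
import random


def gibbs_sweep_prior(theta, neighbors, pi, phi, rng):
    """One in-place Gibbs sweep targeting the MRF prior P(theta; pi, phi).

    neighbors[i] lists the nodes adjacent to node i. Assumed potentials:
    node pi (theta_i = 0) or 1 - pi (theta_i = 1); edge phi if the two
    endpoints agree, 1 - phi otherwise.
    """
    for i in range(len(theta)):
        w0, w1 = pi, 1.0 - pi
        for j in neighbors[i]:
            w0 *= phi if theta[j] == 0 else 1.0 - phi
            w1 *= phi if theta[j] == 1 else 1.0 - phi
        theta[i] = 1 if rng.random() < w1 / (w0 + w1) else 0


def pcd_particles(particles, neighbors, pi, phi, sample_f1, rng):
    """Advance persistent theta-particles by one sweep, then draw x-particles.

    Each x_i is drawn from f0 = N(0, 1) (assumed) when theta_i = 0 and from
    f1_hat via the hypothetical callable sample_f1(rng) when theta_i = 1.
    """
    x_particles = []
    for theta in particles:
        gibbs_sweep_prior(theta, neighbors, pi, phi, rng)  # persistent: updated in place
        x_particles.append([rng.gauss(0.0, 1.0) if t == 0 else sample_f1(rng)
                            for t in theta])
    return x_particles
```

Keeping the θ-particles across iterations, rather than restarting them, is what makes the scheme "persistent"; which moments of the x-particles are then compared with the observed x is not specified in the text.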

3.4. Inference of θ and FDR Control

After we estimate f₁, ϕ and π, the MRF-coupled mixture model is fully specified, and the next important step is to calculate the posterior probability that ℋ_i is null given all the observed statistics x, namely P(θ_i=0 | x) for i = 1, …, m. This quantity is termed the local index of significance (LIS) (Sun & Cai, 2009); it reduces to the local false discovery rate P(θ_i=0 | x_i) when the hypotheses are independent. In our simulations and the real-world application, we use a Markov chain Monte Carlo (MCMC) algorithm to perform posterior inference for P(θ_i=0 | x).
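
As a sketch of the MCMC step, the Python function below estimates LIS by Gibbs sampling θ | x and recording how often each θ_i equals 0. The specific sampler, the burn-in and sample sizes, the greedy initialization, and the node and edge potentials (π or 1 − π on nodes; ϕ if neighbours agree, else 1 − ϕ on edges) are all assumptions for illustration; the text only states that an MCMC algorithm is used.

```python
import math
import random


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def lis_gibbs(x, neighbors, pi, phi, f1, f0=std_normal_pdf,
              n_burn=200, n_keep=1000, seed=0):
    """Estimate LIS_i = P(theta_i = 0 | x) by Gibbs sampling theta | x.

    neighbors[i] lists the nodes adjacent to node i. Assumed potentials:
    node pi (theta_i = 0) or 1 - pi (theta_i = 1); edge phi if the two
    endpoints agree, 1 - phi otherwise.
    """
    rng = random.Random(seed)
    m = len(x)
    e0 = [pi * f0(v) for v in x]          # node potential times null emission
    e1 = [(1.0 - pi) * f1(v) for v in x]  # node potential times alternative emission
    theta = [0 if a >= b else 1 for a, b in zip(e0, e1)]  # greedy start
    null_counts = [0] * m
    for sweep in range(n_burn + n_keep):
        for i in range(m):
            w0, w1 = e0[i], e1[i]
            for j in neighbors[i]:
                w0 *= phi if theta[j] == 0 else 1.0 - phi
                w1 *= phi if theta[j] == 1 else 1.0 - phi
            theta[i] = 1 if rng.random() < w1 / (w0 + w1) else 0
        if sweep >= n_burn:  # record only post-burn-in sweeps
            for i in range(m):
                if theta[i] == 0:
                    null_counts[i] += 1
    return [c / n_keep for c in null_counts]
```

With no edges, each θ_i is sampled independently from its own conditional, so the estimate approaches the local false discovery rate, consistent with the reduction of LIS noted above.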

After we calculate the posterior marginal probabilities of θ (i.e., LIS), we use a step-up procedure (Sun & Cai, 2009) to decide which hypotheses should be rejected so as to control FDR at the nominal level α. We first sort the LIS values in increasing order. Suppose LIS₍₁₎ ≤ LIS₍₂₎ ≤ … ≤ LIS_(m) are the ordered LIS values and ℋ₍₁₎, ℋ₍₂₎, …, ℋ_(m) are the corresponding hypotheses. Let

k = \max\left\{ i : \frac{1}{i} \sum_{j=1}^{i} \mathrm{LIS}_{(j)} \le \alpha \right\}. \qquad (4)

Then we reject ℋ_(i) for i = 1, …, k.
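
The step-up rule in (4) can be written in a few lines of Python. The mean of the k smallest LIS values is the posterior expected proportion of false discoveries among those k hypotheses, so the rule rejects as many hypotheses as possible while keeping that average at or below α. The function name is ours.

```python
def lis_step_up(lis, alpha=0.10):
    """Return the (sorted) indices rejected by the step-up rule in eq. (4).

    Sort LIS increasingly, find the largest k such that the mean of the k
    smallest LIS values is <= alpha, and reject those k hypotheses.
    """
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    k, running = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        running += lis[idx]  # sum of the `rank` smallest LIS values
        if running / rank <= alpha:
            k = rank
    return sorted(order[:k])
```

For example, with LIS values (0.5, 0.01, 0.2, 0.05, 0.9) and α = 0.10, the running means of the sorted values are 0.01, 0.03, 0.087, 0.19 and 0.332, so k = 3 and the hypotheses with LIS 0.01, 0.05 and 0.2 are rejected.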

4. Connections with Classical Multiple Testing Procedures

We show that both the local FDR procedure (Efron et al., 2001) and the BH procedure (Benjamini & Hochberg, 2000; Genovese & Wasserman, 2004) can be regarded as semiparametric graphical models that do not consider dependence among the hypotheses. The local FDR procedure uses Bayes' theorem to calculate the posterior probability that ℋ_i is null given its observed test statistic x_i, namely

P(\mathcal{H}_i \text{ is null} \mid X_i = x_i) = \frac{p_0 f_0(x_i)}{p_0 f_0(x_i) + (1 - p_0) f_1(x_i)}. \qquad (5)

This posterior probability is termed the local false discovery rate (Efron & Tibshirani, 2002); our LIS reduces to it under the assumption of independence. Efron & Tibshirani (2002) recommend using empirical Bayes inference (Robbins, 1956) to calculate the local false discovery rate as

P(\mathcal{H}_i \text{ is null} \mid X_i = x_i) = \frac{\hat{p}_0 f_0(x_i)}{\hat{f}(x_i)}, \qquad (6)

where f̂ is the empirical density of the test statistic and p̂₀ is an estimate of p₀; by (1), the denominator of (5) equals f(x_i), which f̂(x_i) estimates. If we use θ_i to denote the ground truth of ℋ_i, its local false discovery rate is P(θ_i = 0 | X_i = x_i), so it can be represented by the graphical model in Figure 3(a). This model is exactly our semiparametric model in Figure 2, except that there are no pairwise potentials capturing dependence, because the local FDR procedure assumes independence among the hypotheses. The model for the local FDR procedure is also semiparametric because f₁ is estimated nonparametrically. Note also that the parameter π in our model reduces to the prior parameter p₀ in this simplified model.
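
Equation (6) is a one-line computation once f̂ and p̂₀ are available. In this sketch f₀ is assumed to be 𝒩(0, 1), f_hat is any callable density estimate, and capping at 1 is our assumption, since a ratio of two estimates can exceed 1.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def local_fdr(x, p0_hat, f_hat, f0=std_normal_pdf):
    """Empirical Bayes local FDR, eq. (6): p0_hat * f0(x) / f_hat(x), capped at 1."""
    return [min(1.0, p0_hat * f0(v) / f_hat(v)) for v in x]
```

If f_hat is the true mixture density p₀f₀ + (1 − p₀)f₁ and p̂₀ = p₀, the output coincides with (5).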

Figure 3. Plate representation of the semiparametric graphical models for (a) the local FDR procedure and (b) the BH procedure.

We now show that the BH procedure is also a semiparametric model, but one in which the observed statistic is modeled by a cumulative distribution function (CDF). Let P₍₁₎ ≤ … ≤ P_(m) be the ordered P-values from the m tests, and let P₍₀₎ = 0. The BH procedure rejects every hypothesis whose P-value satisfies P ≤ P*, with

P^{*} = \max\left\{ P_{(i)} : P_{(i)} \le \frac{i}{m} \frac{\alpha}{p_0} \right\}, \qquad (7)

which controls FDR at the level α (Benjamini & Hochberg, 1995; Storey, 2002; Genovese & Wasserman, 2002). The inequality in (7) can be rewritten as

\frac{p_0 P_{(i)}}{i/m} \le \alpha. \qquad (8)

Orienting the test statistic so that small values of x are evidence against ℋ₀, a P-value is the CDF of f₀ evaluated at the test statistic x, and i/m is the empirical CDF of f evaluated at the test statistic of ℋ_(i). Hence (8) can be rewritten as

\frac{p_0 F_0(x)}{\hat{F}(x)} \le \alpha, \qquad (9)

where F₀ and F are the CDFs of f₀ and f, respectively, and F̂ is an empirical version of F. The left-hand side of (9) is also an empirical Bayes inference, analogous to (6). Therefore, both the BH procedure and the local FDR procedure can be interpreted as empirical Bayes inference; the difference is that the BH procedure uses CDFs whereas the local FDR procedure uses density functions. Thus, we can represent the BH procedure by the graphical model in Figure 3(b). This model is also semiparametric because F₁ is estimated nonparametrically. Therefore, both the local FDR procedure and the BH procedure are semiparametric graphical models that do not consider dependence among the hypotheses.
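
The BH rule can be sketched directly in the form of (8), where the rank ratio i/m plays the role of the empirical CDF F̂ in (9). Setting p0 = 1 gives the original BH procedure; passing an estimate such as Storey's p̂₀(λ) gives the adaptive version. The function name and defaults are ours.

```python
def adaptive_bh(pvalues, alpha=0.10, p0=1.0):
    """BH step-up with a null-proportion estimate p0, following eqs. (7)-(8).

    Rejects every hypothesis with P <= P*, where P* is the largest ordered
    P-value P_(i) satisfying p0 * P_(i) / (i / m) <= alpha.
    """
    m = len(pvalues)
    p_star = None  # plays the role of P_(0) = 0: nothing rejected yet
    for i, p in enumerate(sorted(pvalues), start=1):
        # Inequality (8), i.e. p0 * F0(x) / F_hat(x) <= alpha in the form of (9).
        if p0 * p / (i / m) <= alpha:
            p_star = p
    if p_star is None:
        return []
    return [j for j, p in enumerate(pvalues) if p <= p_star]
```

With the ten P-values (0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216) and α = 0.045, this rejects two hypotheses with p0 = 1 but five with p0 = 0.5. In the latter case the third and fourth ordered P-values fail (8) on their own, yet they are still rejected because the fifth satisfies it; this is the step-up behaviour encoded by the maximum in (7).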

5. Simulations

We explore the empirical performance of our multiple testing procedure and three baseline procedures: the local FDR procedure (Efron et al., 2001), the BH procedure (Benjamini & Hochberg, 2000; Genovese & Wasserman, 2004) and the procedure based on a parametric graphical model (Liu et al., 2012). Because the ground-truth parameters are known in simulation, we consider two versions of our multiple testing approach, namely an oracle procedure and a data-driven procedure. The oracle procedure knows the true parameters of the graphical model (including ϕ, π and f₁), whereas the data-driven procedure does not and has to estimate the graphical model in the semiparametric way introduced in Sections 3.2 and 3.3. Both the BH procedure and the local FDR procedure need an estimate of p₀; for a fair comparison, we use the same estimator as in Section 3.2. The local FDR procedure also needs an estimate of f, which we obtain in the same way as in our data-driven procedure.

We choose the setup to be consistent with previous work (Sun & Cai, 2009; Liu et al., 2012) when possible. We consider two dependence structures, namely a chain structure and a grid structure. For the chain structure, we choose the number of hypotheses m=10,000. For the grid structure, we choose a 100×100 grid, which also yields 10,000 hypotheses. We test two levels of dependence strength, i.e. ϕ=0.8 and ϕ=0.6. We set π to be 0.4. We first simulate the ground truth of the hypotheses θ from P(θ; ϕ, π) and then simulate the test statistics x from P(x|θ; f₀, f₁). We assume that the observed x_i under the null hypothesis (namely θ_i=0) is from a standard normal 𝒩(0, 1). We test two different models for x_i under the alternative hypothesis (namely θ_i=1) as follows.

Model 1

x_i|θ_i=1 comes from a mixture of Gaussians

\frac{1}{3}\mathcal{N}(1, 1) + \frac{1}{3}\mathcal{N}(\mu, 1) + \frac{1}{3}\mathcal{N}(5, 1). \qquad (10)

In total, we test nine values for μ, namely 1.4, 1.8, 2.2, 2.6, 3.0, 3.4, 3.8, 4.2 and 4.6. Different μ values yield different f₁ with different shapes.

Model 2

x_i | θ_i=1 comes from a Gaussian 𝒩(μ, 1), where μ has a Gamma(2.0, β) prior with shape 2.0 and scale parameter β. We test six different values for β, namely 1.0, 1.2, 1.4, 1.6, 1.8 and 2.0. This model is designed to mimic the common situation in GWAS in which common genetic variants have small effect sizes and rare genetic variants have large effect sizes (Manolio et al., 2009).
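
The two alternative models can be simulated as below, taking the ground truth θ as given (drawing θ from the MRF prior is a separate step). The function names and seeding are hypothetical, and for Model 2 we assume μ is drawn independently for each non-null hypothesis. Python's random.gammavariate(alpha, beta) uses beta as the scale, matching the Gamma(2.0, β) parameterization, so under Model 2 the mean of x_i given θ_i = 1 is 2β; under Model 1 it is (1 + μ + 5)/3.

```python
import random


def draw_alternative(model, param, rng):
    """Draw one x_i from f1.

    Model 1: equal-weight mixture of N(1, 1), N(mu, 1), N(5, 1); param = mu.
    Model 2: mu ~ Gamma(shape 2.0, scale beta), then x ~ N(mu, 1); param = beta.
    """
    if model == 1:
        center = rng.choice([1.0, param, 5.0])  # each component with probability 1/3
    elif model == 2:
        center = rng.gammavariate(2.0, param)   # second argument is the scale
    else:
        raise ValueError("model must be 1 or 2")
    return rng.gauss(center, 1.0)


def simulate_statistics(theta, model, param, seed=0):
    """x_i ~ N(0, 1) when theta_i = 0, else x_i ~ f1 under the chosen model."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) if t == 0 else draw_alternative(model, param, rng)
            for t in theta]
```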

We compare the procedures on three measures. First, we check whether they are valid, namely whether the FDR they yield is controlled at the nominal level α. The nominal FDR level α is 0.10, which is consistent with the multiple testing literature (Efron, 2010). Second, we compare the FNR they yield. The third measure is the average number of true positives (ATP). Valid procedures with a lower FNR and a higher ATP are considered more efficient (or powerful). In the simulations, each experiment is replicated 500 times and the average results are reported.

Performance under chain structure

The performance of the five procedures under the chain dependence structure is shown in Figures 4 and 5, which correspond to Model 1 and Model 2, respectively. All five procedures are valid. The parametric procedure (Liu et al., 2012) is conservative, which agrees with the observations in Figure 3(1d) of Liu et al. (2012). Our semiparametric data-driven procedure, the BH procedure and the local FDR procedure are slightly conservative. Based on the FNR and ATP plots, the oracle procedure slightly outperforms the semiparametric data-driven procedure. These two completely dominate the three baselines, which indicates the benefit of leveraging dependence among the hypotheses via the semiparametric graphical model. We also observe that the advantage of the oracle procedure and our semiparametric data-driven procedure over the local FDR procedure is larger when ϕ = 0.8 than when ϕ = 0.6. The reason is that as ϕ decreases from 0.8 to 0.6, the dependence among the hypotheses weakens, and we benefit less from leveraging it. When ϕ = 0.5, the edge potentials in our graphical model are no longer informative, the node potentials become the priors in the local FDR procedure, and our procedure reduces exactly to the local FDR procedure.

Figure 4. Performance of the procedures under Model 1 when (1) ϕ = 0.8 and (2) ϕ = 0.6 in terms of (a) FDR, (b) FNR and (c) ATP when the dependence structure is chain.

Figure 5. Performance of the procedures under Model 2 when (1) ϕ = 0.8 and (2) ϕ = 0.6 in terms of (a) FDR, (b) FNR and (c) ATP when the dependence structure is chain.

Performance under grid structure

The performance of the five procedures under the grid dependence structure is shown in Figures 6 and 7, which correspond to Model 1 and Model 2, respectively. All five procedures are valid. The parametric procedure is considerably conservative, which agrees with the observations in Figure 3(3d) of Liu et al. (2012). Again, our semiparametric data-driven procedure substantially outperforms the three baselines in all configurations, demonstrating the benefit of leveraging dependence among the hypotheses via the semiparametric graphical model. The gap between our semiparametric data-driven procedure and the baselines is even larger than under the chain structure. The reason is that in the grid structure each hypothesis has more neighbors than in the chain structure, so we can benefit more from leveraging the dependence among the hypotheses.

Figure 6. Performance of the procedures under Model 1 when (1) ϕ = 0.8 and (2) ϕ = 0.6 in terms of (a) FDR, (b) FNR and (c) ATP when the dependence structure is grid.

Figure 7. Performance of the procedures under Model 2 when (1) ϕ = 0.8 and (2) ϕ = 0.6 in terms of (a) FDR, (b) FNR and (c) ATP when the dependence structure is grid.

Robustness of λ

In the previous simulations, λ is fixed at 0.8. We test two other values, λ = 0.2 and λ = 0.5, and repeat the previous simulations. Figure 8 shows the performance of our semiparametric procedure under the chain dependence structure and Model 1 with ϕ = 0.8. Our data-driven semiparametric procedure is valid for all three values of λ and is slightly conservative for most configurations. Moreover, its FNR and ATP are almost the same for the three values of λ. Therefore, although our approach requires choosing λ, its performance is robust to this choice. Robustness to λ was also observed by Storey (2002). The sensitivity analyses for λ in other configurations yield similar observations and are given in Appendix 1 (in the supplementary materials).

Figure 8. Performance of our procedure when λ = 0.2 (dotted lines), 0.5 (dashed lines) and 0.8 (solid lines).

Efficiency of Ranking

Although ranking the hypotheses by the probability that ℋ₀ is false is a secondary goal in multiple testing, readers may wonder how well our semiparametric procedure ranks the hypotheses. For the oracle procedure, the parametric procedure (Liu et al., 2012) and our semiparametric procedure, we rank the hypotheses by the posterior probability that ℋ₀ is false, namely 1 − LIS. For the BH procedure, we use 1 − P-value. For the local FDR procedure, we use 1 − lfdr. Figure 9 plots the ROC curves and PR curves of the five procedures for μ = 1.4 and ϕ = 0.8 in the chain structure under Model 1. The oracle procedure produces the most efficient ranking, followed by the semiparametric procedure and the parametric procedure. The rankings yielded by the local FDR and BH procedures are less efficient. The ROC curves and PR curves under other configurations show similar behavior and are provided in Appendix 2 (in the supplementary materials).

Run Time

In the chain-structure simulations, it took our data-driven procedure about 10 hours to finish the 500 replications sequentially (for one μ value in (10)) on one 3GHz CPU. In the grid-structure simulations, it took our procedure around 30 hours to finish the 500 replications sequentially (for one μ value in (10)) on one 3GHz CPU.

6. Application

We apply our procedure to a real-world GWAS on breast cancer (Hunter et al., 2007), which involves 528,173 SNPs for 1,145 cases and 1,142 controls. In total, we test 528,173 hypotheses, and they are dependent because nearby SNPs tend to be highly correlated. We query the squared correlation coefficients (r² values) among the SNPs from HapMap (International HapMap Consortium, 2003) and build the dependence structure as follows. Each SNP becomes a node in the graph, and each SNP is connected with the SNP that has the highest r² value with it. We further categorize the edges into a high correlation edge set ℰ_h (r² above 0.8), a medium correlation edge set ℰ_m (r² between 0.5 and 0.8) and a low correlation edge set ℰ_l (r² between 0.25 and 0.5). We have three parameters (ϕ_h, ϕ_m and ϕ_l) for the three sets of edges.
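
The edge construction can be sketched as follows. The input format (a dict from an unordered SNP pair to its r²), the treatment of boundary values (r² = 0.8 and r² = 0.5 are binned as medium, r² = 0.25 as low), tie-breaking, and dropping best partners with r² below 0.25 are our assumptions; the text does not specify them.

```python
def build_ld_edges(r2):
    """Build the high/medium/low edge sets from pairwise r^2 values.

    r2 maps frozenset({snp_a, snp_b}) to that pair's r^2. Each SNP is linked
    to its highest-r^2 partner, and the resulting edge is binned by r^2.
    """
    best = {}  # snp -> (partner, r^2) with the largest r^2 seen so far
    for pair, value in r2.items():
        a, b = tuple(pair)
        for snp, other in ((a, b), (b, a)):
            if snp not in best or value > best[snp][1]:
                best[snp] = (other, value)
    edges = {"high": set(), "medium": set(), "low": set()}
    for snp, (other, value) in best.items():
        edge = frozenset((snp, other))  # undirected: both endpoints give the same edge
        if value > 0.8:
            edges["high"].add(edge)
        elif value >= 0.5:
            edges["medium"].add(edge)
        elif value >= 0.25:
            edges["low"].add(edge)
        # best partners with r^2 < 0.25 are dropped (assumption)
    return edges
```

Because two SNPs that are each other's best partner yield the same frozenset, mutual best pairs contribute a single edge.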

When we apply our procedure to the dataset, the individual test is a two-proportion z-test. We set λ = 0.8, and p₀ is estimated to be 0.978, which means that about 2.2% of the SNPs are estimated to be associated with breast cancer. The estimated f₁ in this study is plotted in Figure 1. The whole experiment takes around 30 hours on a single processor. Our procedure reports 20 SNPs with LIS values below 0.01, 18 of which fall into five clusters. All 18 SNPs have very small P-values from the two-proportion z-test and are located close to the other SNPs in their cluster. The first cluster on Chr2, the cluster on Chr4, the cluster on Chr9 and the cluster on Chr10 were identified in the studies of Hunter et al. (2007) and Satrom et al. (2009). The second cluster on Chr2 is associated with a telomere, and telomeres are known to be related to breast cancer (Svenson et al., 2008). We further use a second cohort to validate the 18 SNPs, and 16 of them show a moderate level of association in the second cohort. More details are provided in Appendix 3 (in the supplementary materials). We also note that there is related work on estimating less conservative significance thresholds for controlling the family-wise error rate in GWAS (Salyakina et al., 2005; Han & Eskin, 2010).

7. Conclusion

We propose a novel semiparametric graphical model to leverage the dependence in multiple testing problems. Although our semiparametric approach may seem incremental over previous fully parametric approaches (Sun & Cai, 2009; Liu et al., 2012) from the viewpoint of graphical models, the modification is nontrivial for multiple testing, for both methodological and practical reasons. Methodologically, our semiparametric approach naturally generalizes the local FDR procedure and connects with the BH procedure: we show that both procedures estimate their parameters in the same semiparametric way to avoid assumptions about f₁. This methodological unification demonstrates that such a modification is necessary for multiple testing. Practically, our semiparametric approach no longer requires investigators to know the parameterization of f₁, which is generally unknown in real problems. Improper parameterization assumptions for f₁ can make the fully parametric approach either too liberal, which makes the procedure invalid, or too conservative, which makes the procedure lose power, as illustrated by both our simulations and previous work (Sun & Cai, 2009; Liu et al., 2012). Our semiparametric approach better controls FDR and is more powerful. For these reasons, we suggest that investigators choose the semiparametric approach for large-scale multiple testing problems if (i) they suspect that there is dependence among the hypotheses, and (ii) there is no suitable parametric distribution for f₁.

Supplementary Material

Supplement

NIHMS612860-supplement-Supplement.pdf (338.9 KB, PDF)

Acknowledgements

The authors gratefully acknowledge the support of NIH grants R01GM097618, R01LM011028, R01CA165229, R01LM010921, P30CA014520, UL1TR000427, NSF grants DMS-1106586 and DMS-1308872, and Wisconsin Alumni Research Foundation.

Footnotes

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32.

¹ f₀ and f₁ are the probability density functions of the test statistic under the null hypothesis ℋ₀ and the alternative hypothesis ℋ₁, respectively. In the HMM (Sun & Cai, 2009) and the MRF-coupled mixture model (Liu et al., 2012), f₀ and f₁ are the emission probabilities for state 0 and state 1, respectively.

² f₀ is usually known in hypothesis testing.

³ We slightly modify Figure 1 of Liu et al. (2012).

Contributor Information

Jie Liu, Email: JIELIU@CS.WISC.EDU, Department of Computer Sciences, University of Wisconsin-Madison.

Chunming Zhang, Email: CMZHANG@STAT.WISC.EDU, Department of Statistics, University of Wisconsin-Madison.

Elizabeth Burnside, Email: EBURNSIDE@UWHEALTH.ORG, Department of Radiology, University of Wisconsin-Madison.

David Page, Email: PAGE@BIOSTAT.WISC.EDU, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison.

References

Benjamini Yoav, Heller Ruth. False discovery rates for spatial signals. JASA. 2007;102:1272–1281.
Benjamini Yoav, Hochberg Yosef. Controlling the false discovery rate: A practical and powerful approach to multiple testing. JRSS-B. 1995;57(1):289–300.
Benjamini Yoav, Hochberg Yosef. On the adaptive control of the false discovery rate in multiple testing with independent statistics. J EDUC BEHAV STAT. 2000;25(1):60–83.
Benjamini Yoav, Yekutieli Daniel. The control of the false discovery rate in multiple testing under dependency. ANN STAT. 2001;29:1165–1188.
Besag Julian. Statistical analysis of non-lattice data. JRSS-D. 1975;24(3):179–195.
Blanchard Gilles, Roquain Étienne. Adaptive false discovery rate control under independence and dependence. J MACH LEARN RES. 2009;10:2837–2871.
Efron Bradley. Correlation and large-scale simultaneous significance testing. JASA. 2007;102(477):93–103.
Efron Bradley. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge University Press; 2010.
Efron Bradley, Tibshirani Robert. Empirical Bayes methods and false discovery rates for microarrays. GENET EPIDEMIO. 2002;23(1):70–86. doi: 10.1002/gepi.1124.
Efron Bradley, Tibshirani Robert, Storey John D, Tusher Virginia. Empirical Bayes analysis of a microarray experiment. JASA. 2001;96:1151–1160.
Epanechnikov VA. Non-parametric estimation of a multivariate probability density. THEOR PROBAB APPL. 1969;14(1):153–158.
Fan Jianqing, Han Xu, Gu Weijie. Control of the false discovery rate under arbitrary covariance dependence. JASA. 2012;107(499):1019–1045. doi: 10.1080/01621459.2012.720478.
Farcomeni Alessio. Some results on the control of the false discovery rate under dependence. SCAND J STAT. 2007;34(2):275–297.
Finner H, Roters M. Multiple hypotheses testing and expected number of type I errors. ANN STAT. 2002;30:220–238.
Friguet Chloé, Kloareg Maela, Causeur David. A factor model approach to multiple testing under dependence. JASA. 2009;104(488):1406–1415.
Genovese Christopher, Wasserman Larry. Operating characteristics and extensions of the false discovery rate procedure. JRSS-B. 2002;64:499–517.
Genovese Christopher, Wasserman Larry. A stochastic process approach to false discovery control. ANN STAT. 2004;32:1035–1061.
Genovese Christopher, Roeder Kathryn, Wasserman Larry. False discovery control with p-value weighting. BIOMETRIKA. 2006;93:509–524.
Han Buhm, Eskin Eleazar. Multiple testing in genetic epidemiology. Encyclopedia of Life Sciences. 2010.
Hinton Geoffrey. Training products of experts by minimizing contrastive divergence. NEURAL COMPUT. 2002;14:1771–1800. doi: 10.1162/089976602760128018.
Hunter David J, Kraft Peter, Chanock Stephen J. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. NAT GENET. 2007;39(7):870–874. doi: 10.1038/ng2075.
Hyvärinen Aapo. Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. IEEE T NEURAL NETWOR. 2007a;18(5):1529–1531. doi: 10.1109/tnn.2007.895819.
Hyvärinen Aapo. Some extensions of score matching. COMPUT STAT DATA AN. 2007b;51(5):2499–2512.
International HapMap Consortium. The International HapMap Project. NATURE. 2003;426:789–796. doi: 10.1038/nature02168.
Leek Jeffrey T, Storey John D. A general framework for multiple testing dependence. P NATL ACAD SCI USA. 2008;105(48):18718–18723. doi: 10.1073/pnas.0808709105.
Liu Jie, Zhang Chunming, McCarty Catherine, Peissig Peggy, Burnside Elizabeth, Page David. Graphical-model based multiple testing under dependence, with applications to genome-wide association studies. UAI. 2012.
Liu Jie, Zhang Chunming, Burnside Elizabeth, Page David. Learning heterogeneous hidden Markov random fields. AISTATS. 2014.
Manolio Teri A, Collins Francis S, et al. Finding the missing heritability of complex diseases. NATURE. 2009;461(7265):747–753. doi: 10.1038/nature08494.
Owen Art B. Variance of the number of false discoveries. JRSS-B. 2005;67:411–426.
Reiner Anat, Yekutieli Daniel, Benjamini Yoav. Identifying differentially expressed genes using false discovery rate controlling procedures. BIOINFORMATICS. 2003;19(3):368–375. doi: 10.1093/bioinformatics/btf877.
Robbins Herbert. An empirical Bayes approach to statistics. The 3rd Berkeley Symposium I. 1956:157–163.
Romano Joseph, Shaikh Azeem, Wolf Michael. Control of the false discovery rate under dependence using the bootstrap and subsampling. TEST. 2008;17:417–442.
Rosenblatt Murray. Remarks on some nonparametric estimates of a density function. ANN MATH STAT. 1956;27(3):832–837.
Salyakina Daria, Seaman Shaun R, Browning Brian L, Dudbridge Frank, Müller-Myhsok Bertram. Evaluation of Nyholt's procedure for multiple testing correction. HUM HERED. 2005;60(1):19–25. doi: 10.1159/000087540.
Sarkar Sanat K. False discovery and false nondiscovery rates in single-step multiple testing procedures. ANN STAT. 2006;34(1):394–415.
Satrom Pal, Biesinger Jacob, Larson Garrett P. A risk variant in an miR-125b binding site in BMPR1B is associated with breast cancer pathogenesis. CANCER RES. 2009;69(18):7459–7465. doi: 10.1158/0008-5472.CAN-09-1201.
Storey John D. A direct approach to false discovery rates. JRSS-B. 2002;64:479–498.
Storey John D, Taylor Jonathan E, Siegmund David. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. JRSS-B. 2004;66(1):187–205.
Strimmer Korbinian. A unified approach to false discovery rate estimation. BMC BIOINFORMATICS. 2008;9(1):303. doi: 10.1186/1471-2105-9-303.
Sun Wenguang, Cai T Tony. Large-scale multiple testing under dependence. JRSS-B. 2009;71:393–424.
Svenson Ulrika, Nordfjall Katarina, Roos Goran. Breast cancer survival is associated with telomere length in peripheral blood cells. CANCER RES. 2008;68(10):3618–3623. doi: 10.1158/0008-5472.CAN-07-6497.
Tieleman Tijmen. Training restricted Boltzmann machines using approximations to the likelihood gradient. ICML. 2008:1064–1071.
Wu Wei Biao. On false discovery control under dependence. ANN STAT. 2008;36(1):364–380.
Yekutieli Daniel, Benjamini Yoav. Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. J STAT PLAN INFER. 1999;82:171–196.
Zhang Chunming, Fan Jianqing, Yu Tao. Multiple testing via FDRL for large-scale imaging data. ANN STAT. 2011;39(1):613–642. doi: 10.1214/10-AOS848SUPP.
