Author manuscript; available in PMC: 2014 Oct 3.
Published in final edited form as: Uncertain Artif Intell. 2012;2012:511–522.

Graphical-model Based Multiple Testing under Dependence, with Applications to Genome-wide Association Studies

Jie Liu 1, Peggy Peissig 2, Chunming Zhang 3, Elizabeth Burnside 4, Catherine McCarty 5, David Page 6
PMCID: PMC4184466  NIHMSID: NIHMS393555  PMID: 25285046

Abstract

Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests remains a challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure which is based on a Markov-random-field-coupled mixture model. The ground truth of hypotheses is represented by a latent binary Markov random field, and the observed test statistics appear as the coupled mixture variables. The parameters in our model can be automatically learned by a novel EM algorithm. We use an MCMC algorithm to infer the posterior probability that each hypothesis is null (termed local index of significance), and the false discovery rate can be controlled accordingly. Simulations show that the numerical performance of multiple testing can be improved substantially by using our procedure. We apply the procedure to a real-world genome-wide association study on breast cancer, and we identify several SNPs with strong association evidence.

1 Introduction

Observations from large-scale multiple testing problems often exhibit dependence. For instance, in genome-wide association studies, researchers collect hundreds of thousands of highly correlated genetic markers (single-nucleotide polymorphisms, or SNPs) with the purpose of identifying the subset of markers associated with a heritable disease or trait. In functional magnetic resonance imaging studies of the brain, thousands of spatially correlated voxels are collected while subjects are performing certain tasks, with the purpose of detecting the relevant voxels. The most popular family of large-scale multiple testing procedures is the false discovery rate analysis, such as the p-value thresholding procedures (Benjamini & Hochberg, 1995, 2000; Genovese & Wasserman, 2004), the local false discovery rate procedure (Efron et al., 2001), and the positive false discovery rate procedure (Storey, 2002, 2003). However, all these classical multiple testing procedures ignore the correlation structure among the individual factors, and the question is whether we can reduce the false non-discovery rate by leveraging the dependence, while still controlling the false discovery rate in multiple testing.

Graphical models provide an elegant way of representing dependence. With recent advances in graphical models, especially more efficient algorithms for inference and parameter learning, it is feasible to use these models to leverage the dependence between individual tests in multiple testing problems. One influential paper (Sun & Cai, 2009) in the statistics community uses a hidden Markov model to represent the dependence structure, and has shown its optimality under certain conditions and its strong empirical performance. It is the first graphical model (and the only one so far) used in multiple testing problems. However, their procedure can only deal with a sequential dependence structure, and the dependence parameters are homogeneous. In this paper, we propose a multiple testing procedure based on a Markov-random-field-coupled mixture model which allows arbitrary dependence structures and heterogeneous dependence parameters. This extension requires more sophisticated algorithms for parameter learning and inference. For parameter learning, we design an EM algorithm with MCMC in the E-step and the persistent contrastive divergence algorithm (Tieleman, 2008) in the M-step. We use the MCMC algorithm to infer the posterior probability that each hypothesis is null (termed local index of significance or LIS). Finally, the false discovery rate can be controlled by thresholding the LIS.

Section 2 introduces related work and our procedure. Sections 3 and 4 evaluate our procedure on a variety of simulations, and the empirical results show that the numerical performance can be improved substantially by using our procedure. In Section 5, we apply the procedure to a real-world genome-wide association study (GWAS) on breast cancer, and we identify several SNPs with strong association evidence. We finally conclude in Section 6.

2 Method

2.1 Terminology and Previous Work

Suppose that we carry out m tests whose results can be categorized as in Table 1. False discovery rate (FDR), defined as E(N10/R | R > 0) P(R > 0), depicts the expected proportion of incorrectly rejected null hypotheses (Benjamini & Hochberg, 1995). False non-discovery rate (FNR), defined as E(N01/S | S > 0) P(S > 0), depicts the expected proportion of false non-rejections in those tests whose null hypotheses are not rejected (Genovese & Wasserman, 2002). An FDR procedure is valid if it controls FDR at a nominal level, and optimal if it has the smallest FNR among all the valid FDR procedures (Sun & Cai, 2009). The effects of correlation on multiple testing have been discussed, under different assumptions, with a focus on the validity issue (Benjamini & Yekutieli, 2001; Finner & Roters, 2002; Owen, 2005; Sarkar, 2006; Efron, 2007; Farcomeni, 2007; Romano et al., 2008; Wu, 2008; Blanchard & Roquain, 2009). The efficiency issue has also been investigated (Yekutieli & Benjamini, 1999; Genovese et al., 2006; Benjamini & Heller, 2007; Zhang et al., 2011), indicating that FNR could be decreased by considering dependence in multiple testing. Several approaches have been proposed, such as dependence kernels (Leek & Storey, 2008), factor models (Friguet et al., 2009) and principal factor approximation (Fan et al., 2012). Sun & Cai (2009) explicitly use a hidden Markov model (HMM) to represent the dependence structure and analyze the optimality under the compound decision framework (Sun & Cai, 2007). However, their procedure can only deal with sequential dependence, and it uses only a single dependence parameter throughout. In this paper, we replace the HMM with a Markov-random-field-coupled mixture model, which allows richer and more flexible dependence structures. The Markov-random-field-coupled mixture models are related to the hidden Markov random field models used in many image segmentation problems (Zhang et al., 2001; Celeux et al., 2003; Chatzis & Varvarigou, 2008).

Table 1. Classification of tested hypotheses

            Not rejected    Rejected    Total
Null        N00             N10         m0
Non-null    N01             N11         m1
Total       S               R           m
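As an illustration of how the counts in Table 1 turn into error rates, the following minimal sketch computes the false discovery proportion N10/R and the false non-discovery proportion N01/S for a single replication, using the convention that a ratio is 0 when its denominator is 0. Averaging these proportions over replications then estimates FDR and FNR as defined above. The function name and argument layout are illustrative assumptions, not part of the original procedure.

```python
def error_proportions(truth, rejected):
    """Per-replication error proportions from the counts in Table 1.

    truth[i] is 0 for a true null and 1 for a non-null hypothesis;
    rejected[i] is True when hypothesis i is rejected.
    Returns (N10/R, N01/S, N11), with a ratio set to 0 if its denominator is 0.
    """
    n10 = sum(1 for t, r in zip(truth, rejected) if t == 0 and r)      # false rejections
    n01 = sum(1 for t, r in zip(truth, rejected) if t == 1 and not r)  # missed non-nulls
    n11 = sum(1 for t, r in zip(truth, rejected) if t == 1 and r)      # true positives
    n_rejected = sum(1 for r in rejected if r)                         # R
    n_kept = len(rejected) - n_rejected                                # S
    fdp = n10 / n_rejected if n_rejected > 0 else 0.0
    fnp = n01 / n_kept if n_kept > 0 else 0.0
    return fdp, fnp, n11
```

Setting the ratio to 0 when R = 0 (or S = 0) is what makes the average over replications match E(N10/R | R > 0) P(R > 0).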

2.2 Our Multiple Testing Procedure

Let x = (x1, …, xm) be a vector of test statistics from a set of hypotheses (ℋ1, …, ℋm). The ground truth of these hypotheses is denoted by a latent Bernoulli vector θ = (θ1, …, θm) ∈ {0, 1}^m, with θi = 0 denoting that the hypothesis ℋi is null and θi = 1 denoting that the hypothesis ℋi is non-null. The dependence among these hypotheses is represented as a binary Markov random field (MRF) on θ. The structure of the MRF can be described by an undirected graph 𝒢(𝒱, ℰ) with the node set 𝒱 and the edge set ℰ. The dependence between ℋi and ℋj is denoted by an edge in ℰ connecting node i and node j, and the strength of dependence is parameterized by the potential function on the edge. Suppose that the probability density function of the test statistic xi given θi = 0 is f0, and the density of xi given θi = 1 is f1. Then, x is an MRF-coupled mixture. The mixture model is parameterized by a parameter set 𝓋 = (ϕ, ψ), where ϕ parameterizes the binary MRF and ψ parameterizes f0 and f1. For example, if f0 is standard normal 𝒩(0, 1) and f1 is noncentered normal 𝒩(µ, 1), then ψ only contains parameter µ. Figure 1 shows the MRF-coupled mixture model for three dependent hypotheses ℋi, ℋj and ℋk.

Figure 1.

The MRF-coupled mixture model for three dependent hypotheses ℋi, ℋj and ℋk with observed test statistics (xi, xj and xk) and latent ground truth (θi, θj and θk). The dependence is captured by potential functions parameterized by ϕij, ϕjk and ϕik, and coupled mixtures are parameterized by ψ.

In our MRF-coupled mixture model, x is observable, and θ is hidden. With the parameter set 𝓋 = (ϕ, ψ), the joint probability density over x and θ is

$$P(x, \theta \mid \phi, \psi) = P(\theta; \phi) \prod_{i=1}^{m} P(x_i \mid \theta_i; \psi). \tag{1}$$

Define the marginal probability that ℋi is null given all the observed statistics x under the parameters in 𝓋, P_𝓋(θi = 0 | x), to be the local index of significance (LIS) for ℋi (Sun & Cai, 2009). If we can accurately calculate the posterior marginal probabilities of θ (or LIS), then we can use a step-up procedure to control FDR at the nominal level α as follows (Sun & Cai, 2009). We first sort the LIS values from smallest to largest. Suppose LIS(1), LIS(2), …, LIS(m) are the ordered LIS values, and the corresponding hypotheses are ℋ(1), ℋ(2), …, ℋ(m). Let

$$k = \max\left\{ i : \frac{1}{i} \sum_{j=1}^{i} \mathrm{LIS}_{(j)} \le \alpha \right\}.$$

Then we reject ℋ(i) for i = 1, …, k.
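The step-up rule above can be sketched in a few lines. This is a minimal illustration that assumes the LIS values have already been computed; the function name `lis_step_up` is hypothetical.

```python
def lis_step_up(lis, alpha):
    """Return the indices of the hypotheses rejected by the LIS step-up rule.

    lis[i] is the (estimated) posterior probability that hypothesis i is null.
    We find the largest k such that the mean of the k smallest LIS values
    is at most alpha, and reject the k hypotheses with the smallest LIS.
    """
    order = sorted(range(len(lis)), key=lambda i: lis[i])  # ascending LIS
    running_sum = 0.0
    k = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lis[idx]
        if running_sum / rank <= alpha:
            k = rank  # keep the largest rank that satisfies the bound
    return sorted(order[:k])
```

Because the values are sorted in increasing order, the running mean never decreases, so the last rank that satisfies the bound is the maximum k in the definition.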

Therefore, the key inferential problem that we need to solve is that of computing the posterior marginal distribution of the hidden variables θi given the test statistics x, namely P_𝓋(θi = 0 | x), for i = 1, …, m. It is a typical inference problem if the parameters in 𝓋 are known. Section 2.3 provides possible inference algorithms for calculating P_𝓋(θi = 0 | x) for given 𝓋. However, 𝓋 is usually unknown in real-world applications, and we need to estimate it. Section 2.4 provides a novel EM algorithm for parameter learning in our MRF-coupled mixture model.

2.3 Posterior Inference

Now we are interested in calculating P_𝓋(θi = 0 | x) for a given parameter set 𝓋. One popular family of inference algorithms is the sum-product family (Kschischang et al., 2001), also known as belief propagation (Yedidia et al., 2000). For loop-free graphs, belief propagation algorithms provide exact inference results with a computational cost linear in the number of variables. In our MRF-coupled mixture model, the structure of the latent MRF is described by a graph 𝒢(𝒱, ℰ). When 𝒢 is chain structured, the instantiation of belief propagation is the forward-backward algorithm (Baum et al., 1970). When 𝒢 is tree structured, the instantiation of belief propagation is the upward-downward algorithm (Crouse et al., 1998). For graphical models with cycles, loopy belief propagation (Murphy et al., 1999; Weiss, 2000) and the tree-reweighted algorithm (Wainwright et al., 2003a) can be used for approximate inference. Other inference algorithms for graphical models include junction trees (Lauritzen & Spiegelhalter, 1988), sampling methods (Gelfand & Smith, 1990), and variational methods (Jordan et al., 1999). Recent papers (Schraudolph & Kamenetsky, 2009; Schraudolph, 2010) discuss exact inference algorithms on binary Markov random fields which allow loops. In our simulations, we use belief propagation when the graph 𝒢 has no loops. When 𝒢 has loops (e.g. in the simulations on genetic data and the real-world application), we use a Markov chain Monte Carlo (MCMC) algorithm to perform inference for P_𝓋(θi = 0 | x).
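To make the MCMC option concrete, here is a minimal sketch of one possible sampler. The text does not specify which MCMC scheme is used, so the single-site Gibbs sampler below is an assumption, as are the function names, the edge weights stored as a dictionary, and the choice f0 = 𝒩(0, 1), f1 = 𝒩(µ, 1) with an MRF of the form P(θ) ∝ exp{Σ ϕij I(θi = θj)}. The LIS of each hypothesis is estimated as the fraction of post-burn-in samples in which θi = 0.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def gibbs_lis(x, edges, mu, n_iter=2000, burn_in=200, seed=0):
    """Estimate LIS_i = P(theta_i = 0 | x) by single-site Gibbs sampling.

    x: list of test statistics.
    edges: dict mapping (i, j) -> phi_ij, the weight on I(theta_i == theta_j).
    mu: mean of the test statistic under the alternative (unit variance assumed).
    """
    rng = random.Random(seed)
    m = len(x)
    nbrs = [[] for _ in range(m)]
    for (i, j), phi in edges.items():
        nbrs[i].append((j, phi))
        nbrs[j].append((i, phi))
    theta = [0] * m
    null_hits = [0] * m
    for it in range(n_iter):
        for i in range(m):
            # Log-odds of theta_i = 1 versus 0 contributed by the neighbors.
            prior = sum(phi if theta[j] == 1 else -phi for j, phi in nbrs[i])
            # log f1(x_i) - log f0(x_i) for N(mu, 1) versus N(0, 1).
            lik = -0.5 * (x[i] - mu) ** 2 + 0.5 * x[i] ** 2
            theta[i] = 1 if rng.random() < sigmoid(prior + lik) else 0
        if it >= burn_in:
            for i in range(m):
                null_hits[i] += 1 - theta[i]
    return [h / (n_iter - burn_in) for h in null_hits]
```

The returned values can be passed directly to an LIS thresholding rule. For loop-free graphs, exact belief propagation would be preferable to sampling.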

2.4 Parameters and Parameter Learning

In our procedure, the dependence among these hypotheses is represented by a graphical model on the latent vector θ parameterized by ϕ, and observed test statistics x are represented by the coupled mixture parameterized by ψ. In Sun and Cai's work on HMMs, ϕ is the transition parameter and ψ is the emission parameter. One implicit assumption in their work is that the transition parameter and the emission parameter stay the same for all i (i = 1, …, m). Our extension to MRFs also allows us to untie these parameters. In the second set of basic simulations in Section 3, we make ϕ and ψ heterogeneous and investigate how this affects the numerical performance. In the simulations on genetic data in Section 4 and the real-world GWAS application in Section 5, we have different parameters for SNP pairs with different levels of correlation.

In our model, learning (ϕ, ψ) is difficult for two reasons. First, learning parameters is difficult by nature in undirected graphical models due to the global normalization constant (Wainwright et al., 2003b; Welling & Sutton, 2005). State-of-the-art MRF parameter learning methods include MCMC-MLE (Geyer, 1991), contrastive divergence (Hinton, 2002) and variational methods (Ganapathi et al., 2008). Several new sampling methods with higher efficiency have been recently proposed, such as persistent contrastive divergence (Tieleman, 2008), fast-weight contrastive divergence (Tieleman & Hinton, 2009), tempered transitions (Salakhutdinov, 2009), and particle-filtered MCMC-MLE (Asuncion et al., 2010). In our procedure, we use the persistent contrastive divergence algorithm to estimate parameters ϕ. Another difficulty is that θ is latent and we only have one observed training sample x. We use an EM algorithm to solve this problem. In the E-step, we run our MCMC algorithm in Section 2.3 to infer the latent θ based on the currently estimated parameters 𝓋 = (ϕ, ψ). In the M-step, we run the persistent contrastive divergence (PCD) algorithm (Tieleman, 2008) to estimate ϕ from the currently inferred θ. Note that PCD is also an iterative algorithm, and we run it until it converges in each M-step. In the M-step, we also do a maximum likelihood estimation of ψ from the currently inferred θ and observed x. We run the EM algorithm until both ϕ and ψ converge. Although this EM algorithm involves intensive computation in both E-step and M-step, it converges very quickly in our experiments.
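The PCD update for ϕ in the M-step can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the text: a single shared parameter ϕ with P(θ; ϕ) ∝ exp{ϕ Σ I(θi = θj)}, one inferred configuration `theta_hat` standing in for the E-step output, and a hypothetical learning-rate schedule. The gradient of the log-likelihood is the agreement count in the inferred θ minus its expectation under the model, and PCD approximates that expectation with persistent particles that are advanced by a few Gibbs sweeps before every update.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def agreements(theta, edges):
    # Sufficient statistic: number of edges whose endpoints agree.
    return sum(1 for i, j in edges if theta[i] == theta[j])

def gibbs_sweep(theta, nbrs, phi, rng):
    # One sweep under P(theta) proportional to exp(phi * agreements).
    for i in range(len(theta)):
        ones = sum(theta[j] for j in nbrs[i])
        zeros = len(nbrs[i]) - ones
        theta[i] = 1 if rng.random() < sigmoid(phi * (ones - zeros)) else 0

def pcd_fit_phi(theta_hat, edges, n_particles=20, n_updates=200, k=5,
                lr0=0.05, seed=0):
    """Estimate a shared phi from one configuration theta_hat with PCD-k."""
    rng = random.Random(seed)
    m = len(theta_hat)
    nbrs = [[] for _ in range(m)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # Persistent particles: initialized once, never reset between updates.
    particles = [[rng.randint(0, 1) for _ in range(m)]
                 for _ in range(n_particles)]
    data_stat = agreements(theta_hat, edges)
    phi = 0.0
    for step in range(n_updates):
        for p in particles:
            for _ in range(k):
                gibbs_sweep(p, nbrs, phi, rng)
        model_stat = sum(agreements(p, edges) for p in particles) / n_particles
        lr = lr0 / (1.0 + step / 50.0)  # decreasing learning rate (assumed schedule)
        phi += lr * (data_stat - model_stat) / len(edges)
    return phi
```

In a full EM loop this routine would be called once per M-step, alongside a maximum likelihood update of ψ from the inferred θ and the observed x.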

3 Basic Simulations

In the basic simulations, we investigate the numerical performance of our multiple testing approach on different fabricated dependence structures where we can control the ground truth parameters. We first simulate θ from P(θ; ϕ) and then simulate x from P(x|θ; ψ) under a variety of settings of 𝓋 = (ϕ, ψ). Because we have the ground truth parameters, we have two versions of our multiple testing approach, namely the oracle procedure (OR) and the data-driven procedure (LIS). The oracle procedure knows the true parameters 𝓋 in the graphical models, whereas the data-driven procedure does not and has to estimate 𝓋. The baseline procedures include the BH procedure (Benjamini & Hochberg, 1995) and the adaptive p-value procedure (AP) (Benjamini & Hochberg, 2000; Genovese & Wasserman, 2004) which are compared by Sun & Cai (2009). We include another baseline procedure, the local false discovery rate procedure (localFDR) (Efron et al., 2001). The adaptive p-value procedure requires a consistent estimate of the proportion of the true null hypotheses. The localFDR procedure requires a consistent estimate of the proportion of the true null hypotheses and the knowledge of the distribution of the test statistics under the null and under the alternative. In our simulations, we endow AP and localFDR with the ground truth values of these in order to let these baseline procedures achieve their best performance.
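For reference, the BH baseline can be sketched as below. This is the standard Benjamini-Hochberg step-up rule applied to p-values; the function name is illustrative.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the indices rejected by the Benjamini-Hochberg procedure.

    With p-values sorted in increasing order, find the largest rank k with
    p_(k) <= k * alpha / m and reject the k hypotheses with smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])
```

Unlike the LIS rule, this procedure uses each p-value in isolation and ignores any dependence between the hypotheses.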

In the simulations, we assume that the observed xi under the null hypothesis (namely θi = 0) is standard-normally distributed and that xi under the alternative hypothesis (namely θi = 1) is normally distributed with mean µ and standard deviation 1.0. We choose the setup and parameters to be consistent with the work of Sun & Cai (2009) when possible. In total, we consider three MRF models, namely a chain-structured MRF, a tree-structured MRF and a grid-structured MRF. For chain-MRF, we choose the number of hypotheses m = 3,000. For tree-MRF, we choose perfect binary trees of height 12, which yields a total of 8,191 hypotheses. For grid-MRF, we choose the number of rows and the number of columns to be 100, which yields a total of 10,000 hypotheses. In all the experiments, we choose the number of replications N = 500, which is the same as in the work of Sun & Cai (2009). In total, we have three sets of simulations with different goals as follows.

Basic simulation 1

We stay consistent with Sun & Cai (2009) in the simulations except that we use the three MRF models. In all three structures, (θi), i = 1, …, m, is generated from the MRFs whose potentials on the edges are $\begin{pmatrix} \phi & 1-\phi \\ 1-\phi & \phi \end{pmatrix}$. Therefore, ϕ only contains the parameter ϕ, and ψ only includes the parameter µ.
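For the chain structure, this data-generating step can be sketched as follows. On a chain, a pairwise potential with entries ϕ on the diagonal and 1 − ϕ off the diagonal has rows that sum to one, so the MRF coincides with a Markov chain that starts uniformly and keeps its previous state with probability ϕ. The sketch relies on that equivalence, so it covers the chain case only; the function name and seed handling are assumptions.

```python
import random

def simulate_chain(m, phi, mu, seed=0):
    """Simulate (theta, x) for the chain-MRF setting of basic simulation 1.

    theta follows a two-state Markov chain with uniform start and
    probability phi of keeping the previous state; x_i ~ N(0, 1) when
    theta_i = 0 and x_i ~ N(mu, 1) when theta_i = 1.
    """
    rng = random.Random(seed)
    theta = [rng.randint(0, 1)]
    for _ in range(m - 1):
        stay = rng.random() < phi
        theta.append(theta[-1] if stay else 1 - theta[-1])
    x = [rng.gauss(mu if t == 1 else 0.0, 1.0) for t in theta]
    return theta, x
```

For example, `simulate_chain(3000, 0.8, 2.0)` produces one replication at the chain size used here; the grid structure has loops and would need a sampler such as Gibbs instead.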

Basic simulation 2

One assumption in basic simulation 1 is that the parameters ϕ and µ are homogeneous in the sense that they stay the same for all i (i = 1, …, m). This assumption is carried over from the work of Sun & Cai (2009). However, in many real-world applications, the transition parameters can be different across the multiple hypotheses. Similarly, the test statistics for the non-null hypotheses, although normally distributed and standardized, could have different µ values. Therefore, we investigate the situation where the parameters can vary across hypotheses. The simulations are carried out for all three dependence structures described above. In the first set of simulations, instead of fixing ϕ, we choose ϕ’s uniformly distributed on the interval (0.8−Δ(ϕ)/2, 0.8+Δ(ϕ)/2). In the second set of simulations, instead of fixing µ, we choose µ’s uniformly distributed on the interval (2.0−Δ(µ)/2, 2.0+Δ(µ)/2). The oracle procedure knows the true parameters. The data-driven procedure does not know the parameters, and assumes the parameters are homogeneous.

Basic simulation 3

Another implicit assumption in basic simulation 1 is that each individual test in the multiple testing problem is exact. Many widely used hypothesis tests, such as Pearson’s χ² test and the likelihood ratio test, are asymptotic in the sense that we only know the limiting distribution of the test statistics for large samples. As an example, we simulate the two-proportion z-test in this section and show how the sample size affects the performance of the procedures when the individual test is asymptotic. Suppose that we have n samples (half of them are positive samples and half of them are negative samples). For each sample, we have m Bernoulli distributed attributes. A fraction of the attributes are relevant. If the attribute A is relevant, then the probability of “heads” in the positive samples (p_A^+) is different from that in the negative samples (p_A^−). p_A^+ and p_A^− are the same if A is non-relevant. For each individual test, the null hypothesis is that the attribute is not relevant, and the alternative hypothesis is otherwise. The two-proportion z-test can be used to test whether p_A^+ − p_A^− is zero, which yields an asymptotic 𝒩(0, 1) under the null and 𝒩(µ, 1) under the alternative (µ is nonzero). In the simulations, we fix µ, but vary the sample size n, and apply the aforementioned tree-MRF structure (m = 8,191). The oracle procedure and localFDR only know the limiting distribution of the test statistics and assume the test statistics exactly follow the limiting distributions even when the sample size is small.
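A minimal sketch of the individual test is given below. It uses the common pooled-variance form of the two-proportion z statistic; the text does not state which variance estimate is used, so the pooled form, the function name, and the zero returned for a degenerate standard error are assumptions.

```python
import math

def two_proportion_z(heads_pos, n_pos, heads_neg, n_neg):
    """Pooled two-proportion z statistic for one Bernoulli attribute.

    heads_pos / n_pos estimates the probability of "heads" in the positive
    samples, heads_neg / n_neg the probability in the negative samples.
    """
    p_pos = heads_pos / n_pos
    p_neg = heads_neg / n_neg
    pooled = (heads_pos + heads_neg) / (n_pos + n_neg)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_pos + 1.0 / n_neg))
    if se == 0.0:
        return 0.0  # all outcomes identical: no evidence of a difference
    return (p_pos - p_neg) / se
```

The statistic is only approximately standard normal under the null, and the approximation degrades for small n, which is the effect this simulation is designed to expose.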

Figure 2 shows the numerical results in basic simulation 1. Figures (1a)-(1f) are for the chain structure. Figures (2a)-(2f) are for tree structure. Figures (3a)-(3f) are for the grid structure. In Figures (1a)-(1c), (2a)-(2c) and (3a)-(3c), we set µ = 2 and plot FDR, FNR and the average number of true positives (ATP) when we vary ϕ between 0.2 and 0.8. In Figures (1d)-(1f), (2d)-(2f) and (3d)-(3f), we set ϕ = 0.8 and plot FDR, FNR and ATP when we vary µ between 1.0 and 4.0. The nominal FDR level is set to be 0.10. From Figure 2, we can observe comparable numerical results between the chain structure and tree structure. The FDR levels of all five procedures are controlled at 0.10 and BH is conservative. From the plots for FNR and ATP, we can observe that the data-driven procedure performs almost the same as the oracle procedure, and they dominate the p-value thresholding procedures BH and AP. The oracle procedure and the data-driven procedure also dominate localFDR except when ϕ = 0.5, when they perform comparably. This is to be expected because the dependence structure is no longer informative when ϕ is 0.5. In this situation when the hypotheses are independent, our procedure reduces to the localFDR procedure. As ϕ departs from 0.5 and approaches either 0 or 1.0, the difference between OR/LIS and the baselines gets larger. When the individual hypotheses are easy to test (large µ values), the differences between them are not substantial. When we turn to the grid structure, the numerical performance is similar to that in the chain structure and the tree structure except for two observations. First, the data-driven procedure does not appear to control the FDR at 0.1 when µ is small (e.g. µ = 1.0), although the oracle procedure does, which indicates the parameter estimation in the EM algorithm is difficult when µ is small. 
In other words, with a limited number of hypotheses, it is difficult to estimate the pairwise potential parameters if the test statistics of the non-nulls do not look much different from the test statistics of the nulls. The second observation is that the slopes of the FNR curve and ATP curve for the grid structure are different from those in the chain and tree structures. The reason is that the connectivity in the grid structure is higher than that in the chain and tree. Therefore we can observe that even when the individual hypotheses are difficult to test (small µ values), the FNR is still low because each individual hypothesis has more neighbors in the grid than in the chain or tree, and the neighbors are informative.

Figure 2.

Comparison of BH(○), AP(Δ), localFDR(×), OR (+), and LIS (□) in basic simulation 1: (1) chain-MRF, (2) tree-MRF, (3) grid-MRF, (a) FDR vs ϕ, (b) FNR vs ϕ, (c) ATP vs ϕ, (d) FDR vs µ, (e) FNR vs µ, (f) ATP vs µ.

Figure 3.

Comparison of BH(○), AP(Δ), localFDR(×), OR (+), and LIS (□) in basic simulation 2: (1) chain-MRF, (2) tree-MRF, (3) grid-MRF, (a) FDR vs Δ(ϕ), (b) FNR vs Δ(ϕ), (c) ATP vs Δ(ϕ), (d) FDR vs Δ(µ), (e) FNR vs Δ(µ), (f) ATP vs Δ(µ).

Figure 3 shows the numerical performance in basic simulation 2. Figures (1a)-(1f), (2a)-(2f), and (3a)-(3f) correspond to the chain structure, the tree structure and the grid structure respectively. In Figures (1a)-(1c), (2a)-(2c), and (3a)-(3c), we set µ = 2 and vary Δ(ϕ) between 0 and 0.4. In Figures (1d)-(1f), (2d)-(2f), and (3d)-(3f), we set ϕ = 0.8 and vary Δ(µ) between 0 and 4.0. Again, the nominal FDR level is set to be 0.10. From Figure 3, we observe that all five procedures control FDR at the nominal level and BH is conservative when the transition parameter ϕ is heterogeneous. However, the data-driven procedure becomes more and more conservative as we increase the variance of ϕ in the grid structure. Nevertheless, the data-driven procedure does not lose much efficiency compared with the oracle procedure based on FNR and ATP. Both the data-driven procedure and the oracle procedure dominate the three baselines. When the µ parameter is heterogeneous, all five procedures are still valid, but the data-driven procedure becomes more and more conservative as we increase the variance of µ. The data-driven procedure can be more conservative than the BH procedure when Δ(µ) is large enough. The conservativeness appears most severe in the grid structure. However, when we look at the FNR and ATP, the data-driven procedure still dominates BH, AP and localFDR substantially in all the situations, although the data-driven procedure loses a certain amount of efficiency compared with the oracle procedure when the variance of µ gets large.

Figure 4 shows the results from basic simulation 3. The oracle procedure and localFDR are liberal when the sample size is small. This is because when the sample size is small, there exists a discrepancy between the true distribution of the test statistic and the limiting distribution. Quite surprisingly, the data-driven procedure stays valid. The reason is that the data-driven procedure can estimate the parameters from data. The data-driven procedure and the oracle procedure still have comparable performance and enjoy a much lower level of FNR compared with the baselines. For all the basic simulations, we set the nominal FDR level to be 0.10. We have also replicated the basic simulations by setting the nominal level to be 0.05, and similar conclusions can be made.

Figure 4.

Comparing BH(○), AP(Δ), localFDR(×), OR(+), and LIS(□) in basic simulation 3: (a)FDR vs n, (b)FNR vs n, (c)ATP vs n.

4 Simulations on Genetic Data

Unlike the fabricated dependence structures in the basic simulations in Section 3, the dependence structure in the simulations on genetic data in this section is real. We simulate the linkage disequilibrium structure of a segment on human chromosome 22, and treat a test of whether a SNP is associated as one individual test. We follow the simulation settings in the work of Wu et al. (2010). We use HAPGEN2 (Su et al., 2011) and the CEU sample of HapMap (The International HapMap Consortium, 2003) (Release 22) to generate SNP genotype data at each of the 2,420 loci between bp 14431347 and bp 17999745 on Chromosome 22. A total of 685 out of 2,420 SNPs can be genotyped with the Affymetrix 6.0 array. These are the typed SNPs that we use for our simulations. Within the overall 2,420 SNPs, we randomly select 10 SNPs to be the causal SNPs. All the SNPs on the Affymetrix 6.0 array whose r² values, according to HapMap, with any of the causal SNPs are above t are set to be the associated SNPs. In the simulations, we report results for three different t values, namely 0.8, 0.5 and 0.25. We also simulate three different genetic models (additive model, dominant model, and recessive model) with different levels of relative risk (1.2 and 1.3). In total, we simulate 250 cases and 250 controls. The experiment is replicated 100 times and the average result is provided. With the simulated data, we apply our multiple testing procedure (LIS) and three baseline procedures: the BH procedure, the adaptive p-value procedure (AP), and the local false discovery rate procedure (localFDR). Because the dependence structure is real and the ground truth parameters are unknown to us, we do not have the oracle procedure in the simulations on genetic data.

With the simulated genetic data, we use two commonly used tests in genetic association studies, namely the two-proportion z-test and Cochran-Armitage’s trend test (CATT) (Cochran, 1954; Armitage, 1955; Slager & Schaid, 2001; Freidlin et al., 2002) as the individual tests for the association of each SNP. CATT also yields an asymptotic 𝒩(0, 1) under the null and 𝒩(µ, 1) under the alternative (µ is nonzero). Therefore, we parameterize ψ = (µ1, σ1²), where µ1 and σ1² are the mean and variance of the test statistics under the alternative. The graph structure is built as follows. Each SNP becomes a node in the graph. For each SNP, we connect it with the SNP that has the highest r² value with it. There are in total 490 edges in the graph. We further categorize the edges into a high correlation edge set ℰh (r² above 0.8), medium correlation edge set ℰm (r² between 0.5 and 0.8) and low correlation edge set ℰl (r² between 0.25 and 0.5). We have three different parameters (ϕh, ϕm, and ϕl) for the three sets of edges. Then the density of θ in formula (1) takes the form

$$P(\theta; \phi) \propto \exp\Big\{ \sum_{(i,j) \in \mathcal{E}_h} \phi_h I(\theta_i = \theta_j) + \sum_{(i,j) \in \mathcal{E}_m} \phi_m I(\theta_i = \theta_j) + \sum_{(i,j) \in \mathcal{E}_l} \phi_l I(\theta_i = \theta_j) \Big\},$$

where I(θi = θj) is an indicator variable that indicates whether θi and θj take the same value. In the MCMC algorithm, we run the Markov chain for 20,000 iterations with a burn-in of 100 iterations. In the PCD algorithm, we generate 100 particles. In each iteration of PCD learning, the particles move forward for 5 iterations (the n parameter in PCD-n). The learning rate in PCD gradually decreases as suggested by Tieleman (2008). The EM algorithm converges after about 10 to 20 iterations, which usually take less than 10 minutes on a 3.00GHz CPU.
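The exponent of this density can be evaluated with a short routine such as the one below. It is a sketch under assumptions: edges are passed as (i, j, r²) triples, the treatment of values exactly on a class boundary is a guess because the text only gives the ranges, and edges with r² at or below 0.25 are assumed to carry no parameter.

```python
def edge_class(r2):
    """Map an r-squared value to an edge class label ('h', 'm', 'l') or None."""
    if r2 > 0.8:
        return "h"   # high correlation
    if r2 > 0.5:
        return "m"   # medium correlation (boundary handling is an assumption)
    if r2 > 0.25:
        return "l"   # low correlation
    return None      # assumed to contribute nothing to the density

def log_potential(theta, edges, phi):
    """Unnormalized log-density of theta under the three-class MRF.

    theta: list of 0/1 labels, one per SNP.
    edges: list of (i, j, r2) triples.
    phi: dict with keys 'h', 'm', 'l' holding the three dependence parameters.
    """
    total = 0.0
    for i, j, r2 in edges:
        label = edge_class(r2)
        if label is not None and theta[i] == theta[j]:
            total += phi[label]
    return total
```

The normalization constant is deliberately left out: it cancels in the Gibbs conditionals used for inference, and it is the quantity that makes direct maximum likelihood estimation of ϕ hard.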

Figure 5 shows the performance of the procedures in the additive models with the homozygous relative risk set to 1.2 and 1.3. The test statistics are from a two-proportion z-test. We have also replicated the simulations on Cochran-Armitage’s trend test, and the results are almost the same. In Figure 5, table (1) summarizes the empirical FDR and the total number of true positives (#TP) of our LIS procedure, BH, AP and localFDR (lfdr), in the additive models with different (homozygous) relative risk levels, when we vary t and when we vary the nominal FDR level α. We regard a SNP having r² above t with any causal SNP as an associated SNP, and we regard a rejection of the null hypothesis for an associated SNP as a true positive. Our LIS procedure and localFDR are valid while being conservative. BH and AP appear liberal in some of the configurations. In any of the circumstances, our LIS procedure can identify more associated SNPs than the baselines. We can find a clue to why our procedure LIS is being conservative from the results in Figure 3. In basic simulation 2, we observe that when the parameters µ and ϕ are heterogeneous and we carry out the data-driven procedure under the homogeneous parameter assumption, the data-driven procedure is conservative. The discrepancy between the nominal FDR level and the empirical FDR level increases as the parameters move further away from homogeneity. Although we assign three different parameters ϕh, ϕm, and ϕl to ℰh, ℰm and ℰl respectively, the edges within the same set (e.g. ℰl) may still be heterogeneous. The fact that the LIS procedure recaptures more true positives than the baselines while remaining more conservative in many configurations indicates that the local indices of significance provide a ranking more efficient than the ranking provided by the p-values from the individual tests. 
Therefore, we further plot the ROC curves and precision-recall (PR) curves when we rank SNPs by LIS and by the p-values from the two-proportion z-test. The ROC curve and PR curve are vertically averaged from 100 replications. Subfigures (2a)-(2f) are for the additive model with homozygous relative risk level set to be 1.2. Subfigures (3a)-(3f) are for the additive model with homozygous relative risk level set to be 1.3. It is observed that the curves from LIS dominate those from the p-values from individual tests in most places, which further suggests that LIS provides a more efficient ranking of the SNPs than the individual tests.

Figure 5.

Comparison of BH, AP, localFDR and LIS in the additive models when we vary relative risk rr, t and the nominal FDR level α. Table (1) summarizes results. Subfigures (2a)-(2f) show ROC and PR curves of LIS (solid red lines) and individual p-values (dashed green lines) with rr = 1.2. Subfigures (3a)-(3f) show ROC and PR curves of LIS (solid red lines) and individual p-values (dashed green lines) with rr = 1.3.

Figure 6 shows the performance of the procedures in the dominant model and the recessive model with the homozygous relative risk set to be 1.2. The test statistics are from a two-proportion z-test. In Figure 6, table (1) summarizes the empirical FDR and the total number of true positives (#TP) of our LIS procedure, BH, AP and localFDR (lfdr) in the dominant model and the recessive model when we vary t and when we vary the nominal FDR level α. Our LIS procedure and localFDR are valid while being conservative in all configurations, and they appear more conservative in the recessive model than in the dominant model. On the other hand, BH and AP appear liberal in the recessive model. Our LIS procedure still confers an advantage over the baselines in the dominant model. The LIS procedure also recaptures almost the same number of true positives as BH and AP while maintaining a much lower FDR in the recessive model. Again, we further plot the ROC curves and precision-recall curves when we rank SNPs by LIS and by the p-values from individual tests. Subfigures (2a)-(2f) are for the dominant model. Subfigures (3a)-(3f) are for the recessive model. It is also observed that the curves from LIS dominate those from the p-values from individual tests in most places, which also suggests that LIS provides a more efficient ranking.

Figure 6.

Comparison of BH, AP, localFDR and LIS in the dominant model and the recessive model with different t values and different nominal FDR levels α. Table (1) summarizes the results. Subfigures (2a)-(2f) show ROC and PR curves of LIS (solid red lines) and individual p-values (dashed green lines) in the dominant model. Subfigures (3a)-(3f) show ROC and PR curves of LIS and individual p-values in the recessive model.

5 Real-world Application

Our primary GWAS dataset on breast cancer is from NCI’s Cancer Genetics Markers of Susceptibility (CGEMS) (Hunter et al., 2007). A total of 528,173 SNPs were genotyped for 1,145 cases and 1,142 controls on the Illumina HumanHap500 array. Our secondary GWAS dataset comes from Marshfield Clinic. The Personalized Medicine Research Project (McCarty et al., 2005), sponsored by Marshfield Clinic, was used as the sampling frame to identify 162 breast cancer cases and 162 controls. The project was reviewed and approved by the Marshfield Clinic IRB. Subjects were selected using clinical data from the Marshfield Clinic Cancer Registry and Data Warehouse. Cases were defined as women having a confirmed diagnosis of breast cancer. Both the cases and controls had to have at least one mammogram within 12 months prior to having a biopsy. The subjects also had DNA samples that were genotyped using the Illumina HumanHap660 array, as part of the eMERGE (electronic MEdical Records and Genomics) network (McCarty et al., 2011).

We apply our multiple testing procedure to the CGEMS data. The settings of the procedure are the same as in the simulations on genetic data in Section 4. The individual test is the two-proportion z-test. Our procedure reports 32 SNPs with an LIS value of 0.0 (an estimated probability of 1.0 of being associated). We further calculate the per-allele odds ratios of these SNPs on the Marshfield data, and 14 of them have an odds ratio of around 1.2 or above. The details of the 14 SNPs are given in the supplementary material. There are two clusters among them. First, rs3870371, rs7830137 and rs920455 (on chromosome 8) are located near each other and near the gene hyaluronan synthase 2 (HAS2), which has been shown to be associated with invasive breast cancer by many studies (Udabage et al., 2005; Li et al., 2007; Bernert et al., 2011). The other cluster includes rs11200014, rs2981579, rs1219648, and rs2420946 on chromosome 10. They are exactly the 4 SNPs reported by Hunter et al. (2007). Their associated gene FGFR2 is also well known to be associated with breast cancer. SNP rs4866929 on chromosome 5 is also very likely to be associated because it is highly correlated (r² = 0.957) with SNP rs981782 (not included in our data), which was identified from a much larger dataset (4,398 cases and 4,316 controls, with a follow-up confirmation stage on 21,860 cases and 22,578 controls) by Easton et al. (2007).
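For readers who want to reproduce the two per-SNP quantities used here, the following sketch computes a two-proportion z-statistic and a per-allele odds ratio from allele counts. It assumes that the z-test compares the minor-allele frequency in cases against that in controls with a pooled variance estimate; the text does not state which proportions are compared, so this is one plausible reading. The function names are hypothetical and the counts in the example are made up for illustration.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-statistic for x1/n1 versus x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


def per_allele_odds_ratio(case_minor, case_total, ctrl_minor, ctrl_total):
    """Odds of carrying the minor allele in cases relative to controls.

    Counts are allele counts (two alleles per genotyped subject).
    """
    case_major = case_total - case_minor
    ctrl_major = ctrl_total - ctrl_minor
    return (case_minor * ctrl_major) / (case_major * ctrl_minor)


# Made-up allele counts for 100 cases and 100 controls (200 alleles each).
z = two_proportion_z(60, 200, 40, 200)
odds_ratio = per_allele_odds_ratio(60, 200, 40, 200)
```

With these illustrative counts the odds ratio is 60·160 / (140·40) ≈ 1.71 and the z-statistic is about 2.31. On real data, the same functions would be applied SNP by SNP, with the z-statistics feeding the multiple testing procedure and the odds ratios used for the follow-up check on the secondary dataset.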

6 Conclusion

In this paper, we use an MRF-coupled mixture model to leverage the dependence in multiple testing problems, and we show its improved numerical performance in a variety of simulations and its applicability to a real-world GWAS problem. A theoretical question of interest is whether this graphical-model-based procedure is optimal in the sense that it has the smallest FNR among all valid procedures. The optimality of the oracle procedure can be proved under the compound decision framework (Sun & Cai, 2007, 2009), as long as an exact inference algorithm exists or an approximate inference algorithm can be guaranteed to converge to the correct marginal probabilities. The asymptotic optimality of the data-driven procedure (the FNR yielded by the data-driven procedure approaches the FNR yielded by the oracle procedure as the number of tests m → ∞) requires consistent estimates of the unknown parameters in the graphical models. Parameter learning in undirected models is more complicated than in directed models because of the normalization constant. To the best of our knowledge, the asymptotic properties of parameter learning for hidden MRFs and MRF-coupled mixture models have not been investigated. Therefore, we cannot yet prove the asymptotic optimality of the data-driven procedure, although we observe its close-to-oracle performance in the basic simulations.

Supplementary Material

Acknowledgements

The authors acknowledge the support of the Wisconsin Genomics Initiative, NCI grant R01CA127379-01 and its ARRA supplement 3R01CA127379-03S1, NIGMS grant R01GM097618-01, NLM grant R01LM011028-01, NIEHS grant 5R01ES017400-03, eMERGE grant 1U01HG004608-01, NSF grant DMS-1106586 and the UW Carbone Cancer Center.

Contributor Information

Jie Liu, Computer Sciences, UW-Madison.

Peggy Peissig, Marshfield Clinic Research Foundation.

Chunming Zhang, Statistics, UW-Madison.

Elizabeth Burnside, Radiology, UW-Madison.

Catherine McCarty, Essentia Institute of Rural Health.

David Page, BMI & CS, UW-Madison.

References

  1. Armitage P. Tests for linear trends in proportions and frequencies. BIOMETRICS. 1955;11:375–386.
  2. Asuncion AU, Liu Q, Ihler AT, Smyth P. Particle filtered MCMC-MLE with connections to contrastive divergence. ICML. 2010
  3. Baum LE, Petrie T, Soules G, Weiss N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. ANN MATH STAT. 1970;41(1):164–171.
  4. Benjamini Y, Heller R. False discovery rates for spatial signals. J AM STAT ASSOC. 2007;102:1272–1281.
  5. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J ROY STAT SOC B. 1995;57(1):289–300.
  6. Benjamini Y, Hochberg Y. On the adaptive control of the false discovery rate in multiple testing with independent statistics. J EDUC BEHAV STAT. 2000;25(1):60–83.
  7. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. ANN STAT. 2001;29:1165–1188.
  8. Bernert B, Porsch H, Heldin P. Hyaluronan synthase 2 (HAS2) promotes breast cancer cell invasion by suppression of tissue metalloproteinase inhibitor 1 (TIMP-1). J BIOL CHEM. 2011;286(49):42349–42359. doi: 10.1074/jbc.M111.278598.
  9. Blanchard G, Roquain E. Adaptive false discovery rate control under independence and dependence. J MACH LEARN RES. 2009;10:2837–2871.
  10. Celeux G, Forbes F, Peyrard N. EM procedures using mean field-like approximations for Markov model-based image segmentation. Pattern Recognition. 2003;36:131–144.
  11. Chatzis SP, Varvarigou TA. A fuzzy clustering approach toward hidden Markov random field models for enhanced spatially constrained image segmentation. IEEE Transactions on Fuzzy Systems. 2008;16:1351–1361.
  12. Cochran WG. Some methods for strengthening the common chi-square tests. BIOMETRICS. 1954;10:417–451.
  13. Crouse MS, Nowak RD, Baraniuk RG. Wavelet-based statistical signal processing using hidden Markov models. IEEE T SIGNAL PROCES. 1998;46(4):886–902.
  14. Easton DF, Pooley KA, Dunning AM, Pharoah PDP, Thompson D, Ballinger DG, Struewing JP, Morrison J, Field H, Luben R, Wareham N, Ahmed S, Healey CS, Bowman R, Meyer KB, Haiman CA, Kolonel LK, Henderson BE, Le Marchand L, Brennan P, Sangrajrang S, Gaborieau V, Odefrey F, Shen C-Y, Wu P-E, Wang H-C, Eccles D, Evans GD, Peto J, Fletcher O, Johnson N, Seal S, Stratton MR, Rahman N, Chenevix-Trench G, Bojesen SE, Nordestgaard BG, Axelsson CK, Garcia-Closas M, Brinton L, Chanock S, Lissowska J, Peplonska B, Nevanlinna H, Fagerholm R, Eerola H, Kang D, Yoo K-Y, Noh D-Y, Ahn S-H, Hunter DJ, Hankinson SE, Cox DG, Hall P, Wedren S, Liu J, Low Y-L, Bogdanova N, Schürmann P, Dörk T, Tollenaar RAEM, Jacobi CE, Devilee P, Klijn JGM, Sigurdson AJ, Doody MM, Alexander BH, Zhang J, Cox A, Brock IW, Macpherson G, Reed MWR, Couch FJ, Goode EL, Olson JE, Meijers-Heijboer H, van den Ouweland A, Uitterlinden A, Rivadeneira F, Milne RL, Ribas G, Gonzalez-Neira A, Benitez J, Hopper JL, Mccredie M, Southey M, Giles GG, Schroen C, Justenhoven C, Brauch H, Hamann U, Ko Y-D, Spurdle AB, Beesley J, Chen X, Mannermaa A, Kosma V-M, Kataja V, Hartikainen J, Day NE, Cox DR, Ponder BAJ. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007;447:1087–1093. doi: 10.1038/nature05887.
  15. Efron B. Correlation and large-scale simultaneous significance testing. J AM STAT ASSOC. 2007;102(477):93–103.
  16. Efron B, Tibshirani R, Storey JD, Tusher V. Empirical Bayes analysis of a microarray experiment. J AM STAT ASSOC. 2001;96:1151–1160.
  17. Fan J, Han X, Gu W. Control of the false discovery rate under arbitrary covariance dependence (to appear) J AM STAT ASSOC. 2012 doi: 10.1080/01621459.2012.720478.
  18. Farcomeni A. Some results on the control of the false discovery rate under dependence. SCAND J STAT. 2007;34(2):275–297.
  19. Finner H, Roters M. Multiple hypotheses testing and expected number of type I errors. ANN STAT. 2002;30:220–238.
  20. Freidlin B, Zheng G, Li Z, Gastwirth JL. Trend tests for case-control studies of genetic markers: power, sample size and robustness. HUM HERED. 2002;53(3):146–152. doi: 10.1159/000064976.
  21. Friguet C, Kloareg M, Causeur D. A factor model approach to multiple testing under dependence. J AM STAT ASSOC. 2009;104(488):1406–1415.
  22. Ganapathi V, Vickrey D, Duchi J, Koller D. Constrained approximate maximum entropy learning of Markov random fields. UAI. 2008
  23. Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. J AM STAT ASSOC. 1990;85(410):398–409.
  24. Genovese C, Wasserman L. Operating characteristics and extensions of the false discovery rate procedure. J ROY STAT SOC B. 2002;64:499–517.
  25. Genovese C, Wasserman L. A stochastic process approach to false discovery control. ANN STAT. 2004;32:1035–1061.
  26. Genovese C, Roeder K, Wasserman L. False discovery control with p-value weighting. BIOMETRIKA. 2006;93:509–524.
  27. Geyer CJ. Markov chain Monte Carlo maximum likelihood. COMP SCI STAT. 1991:156–163.
  28. Hinton G. Training products of experts by minimizing contrastive divergence. NEURAL COMPUT. 2002;14:1771–1800. doi: 10.1162/089976602760128018.
  29. Hunter DJ, Kraft P, Jacobs KB, Cox DG, Yeager M, Hankinson SE, Wacholder S, Wang Z, Welch R, Hutchinson A, Wang J, Yu K, Chatterjee N, Orr N, Willett WC, Colditz GA, Ziegler RG, Berg CD, Buys SS, Mccarty CA, Feigelson HS, Calle EE, Thun MJ, Hayes RB, Tucker M, Gerhard DS, Fraumeni JF, Hoover RN, Thomas G, Chanock SJ. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. NAT GENET. 2007;39(7):870–874. doi: 10.1038/ng2075.
  30. Jordan MI, Ghahramani Z, Jaakkola T, Saul LK. An introduction to variational methods for graphical models. MACH LEARN. 1999;37:183–233.
  31. Kschischang F, Frey B, Loeliger H-A. Factor graphs and the sum-product algorithm. IEEE T INFORM THEORY. 2001;47(2):498–519.
  32. Lauritzen SL, Spiegelhalter DJ. Local computations with probabilities on graphical structures and their application to expert systems. J ROY STAT SOC B. 1988;50(2):157–224.
  33. Leek JT, Storey JD. A general framework for multiple testing dependence. P NATL ACAD SCI USA. 2008;105(48):18718–18723. doi: 10.1073/pnas.0808709105.
  34. Li Y, Li L, Brown TJ, Heldin P. Silencing of hyaluronan synthase 2 suppresses the malignant phenotype of invasive breast cancer cells. INT J CANCER. 2007;120(12):2557–2567. doi: 10.1002/ijc.22550.
  35. McCarty C, Wilke R, Giampietro P, Wesbrook S, Caldwell M. Marshfield Clinic Personalized Medicine Research Project (PMRP): design, methods and recruitment for a large population-based biobank. PERS MED. 2005;2:49–79. doi: 10.1517/17410541.2.1.49.
  36. McCarty CA, Chisholm RL, Chute CG, Kullo IJ, Jarvik GP, Larson EB, Li R, Masys DR, Ritchie MD, Roden DM, Struewing JP, Wolf WA, eMERGE Team. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC MED GENET. 2011;4(1):13. doi: 10.1186/1755-8794-4-13.
  37. Murphy KP, Weiss Y, Jordan MI. Loopy belief propagation for approximate inference: An empirical study. UAI. 1999:467–475.
  38. Owen AB. Variance of the number of false discoveries. J ROY STAT SOC B. 2005;67:411–426.
  39. Romano J, Shaikh A, Wolf M. Control of the false discovery rate under dependence using the bootstrap and subsampling. TEST. 2008;17:417–442.
  40. Salakhutdinov R. Learning in Markov random fields using tempered transitions. NIPS. 2009:1598–1606.
  41. Sarkar SK. False discovery and false nondiscovery rates in single-step multiple testing procedures. ANN STAT. 2006;34(1):394–415.
  42. Schraudolph NN. Polynomial-time exact inference in NP-hard binary MRFs via reweighted perfect matching. AISTATS. 2010
  43. Schraudolph NN, Kamenetsky D. Efficient exact inference in planar Ising models. NIPS. 2009
  44. Slager SL, Schaid DJ. Case-control studies of genetic markers: power and sample size approximations for Armitage’s test for trend. HUM HERED. 2001;52(3):149–153. doi: 10.1159/000053370.
  45. Storey JD. A direct approach to false discovery rates. J ROY STAT SOC B. 2002;64:479–498.
  46. Storey JD. The positive false discovery rate: A Bayesian interpretation and the q-value. ANN STAT. 2003;31(6):2013–2035.
  47. Su Z, Marchini J, Donnelly P. HAPGEN2: simulation of multiple disease SNPs. BIOINFORMATICS. 2011 doi: 10.1093/bioinformatics/btr341.
  48. Sun W, Cai TT. Oracle and adaptive compound decision rules for false discovery rate control. J AM STAT ASSOC. 2007;102(479):901–912.
  49. Sun W, Cai TT. Large-scale multiple testing under dependence. J ROY STAT SOC B. 2009;71:393–424.
  50. The International HapMap Consortium. The international HapMap project. NATURE. 2003;426:789–796. doi: 10.1038/nature02168.
  51. Tieleman T. Training restricted Boltzmann machines using approximations to the likelihood gradient. ICML. 2008:1064–1071.
  52. Tieleman T, Hinton G. Using fast weights to improve persistent contrastive divergence. ICML. 2009:1033–1040.
  53. Udabage L, Brownlee GR, Nilsson SK, Brown TJ. The over-expression of HAS2, Hyal-2 and CD44 is implicated in the invasiveness of breast cancer. EXP CELL RES. 2005;310(1):205–217. doi: 10.1016/j.yexcr.2005.07.026.
  54. Wainwright MJ, Jaakkola TS, Willsky AS. Tree-based reparameterization framework for analysis of sum-product and related algorithms. IEEE T INFORM THEORY. 2003a;49:2003.
  55. Wainwright MJ, Jaakkola TS, Willsky AS. Tree-reweighted belief propagation algorithms and approximate ML estimation via pseudomoment matching. AISTATS. 2003b
  56. Weiss Y. Correctness of local probability propagation in graphical models with loops. NEURAL COMPUT. 2000;12(1):1–41. doi: 10.1162/089976600300015880.
  57. Welling M, Sutton C. Learning in Markov random fields with contrastive free energies. AISTATS. 2005
  58. Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X. Powerful SNP-set analysis for case-control genomewide association studies. AM J HUM GENET. 2010;86(6):929–942. doi: 10.1016/j.ajhg.2010.05.002.
  59. Wu WB. On false discovery control under dependence. ANN STAT. 2008;36(1):364–380.
  60. Yedidia JS, Freeman WT, Weiss Y. Generalized belief propagation. NIPS. MIT Press; 2000. pp. 689–695.
  61. Yekutieli D, Benjamini Y. Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. J STAT PLAN INFER. 1999;82:171–196.
  62. Zhang C, Fan J, Yu T. Multiple testing via FDRL for large-scale imaging data. ANN STAT. 2011;39(1):613–642. doi: 10.1214/10-AOS848SUPP.
  63. Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging. 2001 doi: 10.1109/42.906424.
