Author manuscript; available in PMC: 2017 May 11.
Published in final edited form as: Biometrika. 2015 Mar 2;102(2):247–266. doi: 10.1093/biomet/asu074

Testing Differential Networks with Applications to Detecting Gene-by-Gene Interactions

Yin Xia, Tianxi Cai, T Tony Cai
PMCID: PMC5426514  NIHMSID: NIHMS848446  PMID: 28502988

Summary

Model organisms and human studies have led to increasing empirical evidence that interactions among genes contribute broadly to genetic variation of complex traits. In the presence of gene-by-gene interactions, the dimensionality of the feature space becomes extremely high relative to the sample size. This imposes a significant methodological challenge in identifying gene-by-gene interactions. In the present paper, through a Gaussian graphical model framework, we translate the problem of identifying gene-by-gene interactions associated with a binary trait D into an inference problem on the difference of two high-dimensional precision matrices, which summarize the conditional dependence network structures of the genes. We propose a procedure for testing the differential network globally that is particularly powerful against sparse alternatives. In addition, a multiple testing procedure with false discovery rate control is developed to infer the specific structure of the differential network. Theoretical justification is provided to ensure the validity of the proposed tests and optimality results are derived under sparsity assumptions. A simulation study demonstrates that the proposed tests maintain the desired error rates under the null and have good power under the alternative. The methods are applied to a breast cancer gene expression study.

Keywords: Differential network, false discovery rate, Gaussian graphical model, gene-by-gene interaction, high-dimensional precision matrix, large-scale multiple testing

1. Introduction

High-throughput technologies, enabling comprehensive monitoring of a biological system, have fundamentally transformed biomedical research. Studies using such technologies have led to successful molecular classifications of diseases into clinically relevant subtypes and to genetic signatures predictive of disease progression and treatment response (e.g., van’t Veer et al., 2002; Gregg et al., 2008; Hu et al., 2009). Irrespective of the technology used, analysis of high-throughput data typically considers one marker at a time and yields a list of differentially expressed genes or proteins. On the other hand, epistasis, or interaction between genes, has long been recognized as crucial to understanding the genetic architecture of disease phenotypes (Phillips, 2008; Eichler et al., 2010). Increasing empirical evidence from model organisms and human studies suggests that gene-by-gene interactions may make an important contribution to the total genetic variation of complex traits (Zerba et al., 2000; Marchini et al., 2005). In this paper, we are specifically interested in gene-by-gene interactions in the sense of the interactive effects of two genes on a binary disease trait D.

In the presence of gene-by-gene interactions, the dimensionality of the feature space becomes extremely high relative to the sample size. This, together with the variability of the data, poses a significant methodological challenge to identifying gene-by-gene interactions in currently available studies, which typically have limited sample sizes and power. Recent developments in interaction modeling have led to several useful methods, including multifactor dimensionality reduction (Ritchie et al., 2001; Moore, 2004), polymorphism interaction analysis (Mechanic et al., 2008), random forests (Breiman, 2001), variants of logistic regression with interactive effects (Chatterjee et al., 2006; Chapman & Clayton, 2007; Kooperberg & Ruczinski, 2005; Kooperberg & LeBlanc, 2008) and sure independence screening (Fan & Lv, 2008). However, to cope with the high dimensionality, most of these methods use multistage procedures and marginal assessments of the effect of a gene pair without simultaneously accounting for the effects of other genes. Multistage procedures may have limited power to detect genes that affect the outcome through interactions with other genes but lack strong main effects. Interactive effects detected by models that consider only one pair of genes at a time, without conditioning on other genes, may also be false identifications caused by the discrepancy between conditional and unconditional effects. Furthermore, none of the existing methods provides false discovery rate control in the presence of interactions. Because of the large number of tests, the power of multiple testing procedures using the standard Bonferroni or naive false discovery rate corrections can dissipate quickly.

In this paper, through a Gaussian graphical model framework, we translate the problem of identifying gene-by-gene interactions associated with a binary trait D into the comparison of two high-dimensional precision matrices. Let G denote a p × 1 vector of genomic markers and assume that, conditional on D = d, $G \sim N(\mu_d, \Sigma_d)$ for d = 1, 2. Then the posterior risk given G is

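$$\mathrm{pr}(D=1\mid G) = g\Big\{\text{constant} - \tfrac{1}{2}G^{\mathrm T}(\Omega_1 - \Omega_2)G + G^{\mathrm T}(\Omega_1\mu_1 - \Omega_2\mu_2)\Big\},$$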

where $g(x) = e^x/(1+e^x)$ and $\Omega_d = (\omega_{i,j,d}) = \Sigma_d^{-1}$ is the precision matrix of G conditional on D = d. Hence, an interaction between the gene pair (i, j) affects the disease risk if and only if $\delta_{i,j} = \omega_{i,j,1} - \omega_{i,j,2} \neq 0$. The difference between the two precision matrices, denoted by $\Delta = (\delta_{i,j}) = \Omega_1 - \Omega_2$, is called the differential network. This type of model for a differential network has been used by Li et al. (2007) and Danaher et al. (2014). We therefore propose to test for gene-by-gene interactions both by testing the global hypotheses

$$H_0: \Delta = 0 \quad\text{versus}\quad H_1: \Delta \neq 0, \qquad (1)$$

and by simultaneously testing the hypotheses

$$H_{0,i,j}: \delta_{i,j} = 0 \quad\text{versus}\quad H_{1,i,j}: \delta_{i,j} \neq 0, \qquad 1 \le i < j \le p,$$

while controlling for the overall false discovery rate at a pre-specified level.
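As a numerical sanity check of the posterior risk decomposition above, the following Python sketch (plain lists, standard library only; the function names are ours, not the paper's) verifies that the G-dependent part of the log-ratio of the two Gaussian densities equals the quadratic and linear terms inside g{·}. The normalizing constants and the prior log-odds, which make up the "constant", are deliberately not computed.

```python
def bilinear(v, M, w):
    """Return v^T M w for lists v, w and a square list-of-lists M."""
    return sum(v[a] * M[a][b] * w[b] for a in range(len(v)) for b in range(len(w)))


def gaussian_log_kernel(G, mu, Omega):
    """G-dependent part of the N(mu, Omega^{-1}) log-density: -(G - mu)^T Omega (G - mu) / 2."""
    diff = [g - m for g, m in zip(G, mu)]
    return -0.5 * bilinear(diff, Omega, diff)


def logit_terms(G, mu1, mu2, Omega1, Omega2):
    """-G^T (Omega1 - Omega2) G / 2 + G^T (Omega1 mu1 - Omega2 mu2)."""
    p = len(G)
    delta = [[Omega1[a][b] - Omega2[a][b] for b in range(p)] for a in range(p)]
    linear = [sum(Omega1[a][b] * mu1[b] - Omega2[a][b] * mu2[b] for b in range(p))
              for a in range(p)]
    return -0.5 * bilinear(G, delta, G) + sum(g * c for g, c in zip(G, linear))


def leftover(G, mu1, mu2, Omega1, Omega2):
    """Log-kernel difference minus the G-dependent terms; it should not depend on G."""
    return (gaussian_log_kernel(G, mu1, Omega1) - gaussian_log_kernel(G, mu2, Omega2)
            - logit_terms(G, mu1, mu2, Omega1, Omega2))
```

With two genes and zero means, the coefficient of $G_1G_2$ in the log-odds computed this way is $-\delta_{1,2}$, so the product term enters the risk only when the two precision matrices differ in that entry.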

Few authors have considered testing the equality of two precision matrices in the high-dimensional setting. The global null hypothesis Δ = 0, or equivalently Ω1 = Ω2, corresponds to the hypothesis that none of the gene pairs have interactive effects on D. The equality of two precision matrices is equivalent to the equality of two covariance matrices, and the latter has been studied under various alternatives. Under the dense alternative, where Σ1 and Σ2 differ in a large number of entries, various sum-of-square type testing procedures have been proposed (Schott, 2007; Srivastava & Yanagihara, 2010; Li & Chen, 2012). Under the sparse alternative with Σ1 and Σ2 differing only in a small number of entries, Cai et al. (2013) introduced a particularly powerful test. However, in the gene-by-gene interaction setting, the goal is to identify the structure of the differential network. In such cases, it is often reasonable to assume that Δ is sparse, while Σ1 – Σ2 is not. Hence, testing procedures that can leverage information on the sparsity of Δ may improve power. Furthermore, due to the fundamental difference between conditional and unconditional dependences, the various procedures for testing the covariance matrices may not be well adapted to testing specific entries of the precision matrices.

The first goal of this paper is to develop a global test for H0 : Δ = 0 that is powerful against sparse alternatives. We then develop a multiple testing procedure for simultaneously testing the hypotheses $\{H_{0,i,j}: 1 \le i < j \le p\}$ with false discovery rate control to infer the structure of the differential network. In the high-dimensional setting, there is no sample precision matrix that one can use to approximate Ωd. We propose to infer Ωd by relating its elements to the coefficients of a set of regression models for G conditional on D = d. We then construct test statistics based on the covariances between the residuals from the fitted regression models. The testing procedures are easy to implement. A Matlab implementation is available in the Supplementary Material.

2. Global Testing of Differential Networks

2.1. Notation and Definitions

In this section we consider testing the global hypothesis (1). We begin with notation and definitions that will be used in the rest of the paper. Let $X_k \in \mathbb R^p$ and $Y_k \in \mathbb R^p$ denote G given D = 1 and D = 2, respectively, with $X_k \sim N(\mu_1, \Sigma_1)$ for $k = 1,\ldots,n_1$ and $Y_k \sim N(\mu_2, \Sigma_2)$ for $k = 1,\ldots,n_2$, where $\Sigma_d = (\sigma_{i,j,d})$ for d = 1, 2, and $\{X_k: k = 1,\ldots,n_1\}$ and $\{Y_k: k = 1,\ldots,n_2\}$ are independent observations from the two populations. Let $X = (X_1,\ldots,X_{n_1})^{\mathrm T}$ and $Y = (Y_1,\ldots,Y_{n_2})^{\mathrm T}$ denote the data matrices. Let $\Omega_d = (\omega_{i,j,d}) = \Sigma_d^{-1}$ for d = 1, 2.

For subscripts, we use the convention that i stands for the ith entry of a vector and (i, j) for the entry in the ith row and jth column of a matrix, k indexes the kth sample and d indexes the binary trait. Let $\beta_{i,1} = (\beta_{1,i,1},\ldots,\beta_{p-1,i,1})^{\mathrm T}$ denote the regression coefficients of $X_{k,i}$ regressed on the remaining entries of $X_k$, and let $\beta_{i,2} = (\beta_{1,i,2},\ldots,\beta_{p-1,i,2})^{\mathrm T}$ denote the regression coefficients of $Y_{k,i}$ regressed on the remaining entries of $Y_k$.

For any p × 1 vector $\mu_d$, let $\mu_{-i,d}$ denote the (p − 1) × 1 vector obtained by removing the ith entry from $\mu_d$. For a symmetric matrix A, let $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ denote the largest and smallest eigenvalues of A. For any p × q matrix A, $A_{i,-j}$ denotes the ith row of A with its jth entry removed and $A_{-i,j}$ denotes the jth column of A with its ith entry removed. The matrix $A_{-i,-j}$ denotes the (p − 1) × (q − 1) matrix obtained by removing the ith row and jth column of A. For an n × p data matrix $U = (U_1,\ldots,U_n)^{\mathrm T}$, let $U_{\cdot,-i} = (U_{1,-i}^{\mathrm T},\ldots,U_{n,-i}^{\mathrm T})^{\mathrm T}$ with dimension n × (p − 1), $\bar U_{\cdot,-i} = n^{-1}\sum_{k=1}^n U_{k,-i}$ with dimension 1 × (p − 1), $U_{(i)} = (U_{1,i},\ldots,U_{n,i})^{\mathrm T}$ with dimension n × 1, $\bar U_{(i)} = (\bar U_i,\ldots,\bar U_i)^{\mathrm T}$ with dimension n × 1, where $\bar U_i = n^{-1}\sum_{k=1}^n U_{k,i}$, and $\bar U_{(\cdot,-i)} = (\bar U_{\cdot,-i}^{\mathrm T},\ldots,\bar U_{\cdot,-i}^{\mathrm T})^{\mathrm T}$ with dimension n × (p − 1). For the tuning parameters, let $\lambda_{n_d,i,d}$ denote the ith tuning parameter for group d of the binary trait, which depends on the sample size $n_d$.

For a vector $\beta = (\beta_1,\ldots,\beta_p)^{\mathrm T} \in \mathbb R^p$, define the $\ell_q$ norm by $|\beta|_q = (\sum_{i=1}^p|\beta_i|^q)^{1/q}$ for $1 \le q < \infty$, with $|\beta|_\infty = \max_{1\le i\le p}|\beta_i|$. A vector β is called k-sparse if it has at most k nonzero entries. For a matrix $\Omega = (\omega_{i,j})_{p\times p}$, the matrix $\ell_1$ norm is the maximum absolute column sum, $\|\Omega\|_{L_1} = \max_{1\le j\le p}\sum_{i=1}^p|\omega_{i,j}|$, the elementwise $\ell_\infty$ norm is $\|\Omega\|_\infty = \max_{1\le i,j\le p}|\omega_{i,j}|$, and the elementwise $\ell_1$ norm is $\|\Omega\|_1 = \sum_{i=1}^p\sum_{j=1}^p|\omega_{i,j}|$. We say that a matrix Ω is k-sparse if each row and each column has at most k nonzero entries. For a set $\mathcal H$, denote by $|\mathcal H|$ its cardinality. For two sequences of real numbers $\{a_n\}$ and $\{b_n\}$, write $a_n = O(b_n)$ if there exists a constant C such that $|a_n| \le C|b_n|$ for all n, write $a_n = o(b_n)$ if $\lim_{n\to\infty}a_n/b_n = 0$, and write $a_n \asymp b_n$ if there are positive constants c and C such that $c \le a_n/b_n \le C$ for all n.
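For concreteness, the norms and the sparsity notion just defined can be computed as follows. This is a minimal Python sketch on list-of-lists matrices; the function names are hypothetical.

```python
import math


def vec_lq_norm(beta, q):
    """|beta|_q; q = math.inf gives the maximum absolute entry."""
    if q == math.inf:
        return max(abs(b) for b in beta)
    return sum(abs(b) ** q for b in beta) ** (1.0 / q)


def matrix_L1_norm(Omega):
    """||Omega||_{L1}: maximum absolute column sum of a p x p matrix."""
    p = len(Omega)
    return max(sum(abs(Omega[i][j]) for i in range(p)) for j in range(p))


def elementwise_inf_norm(Omega):
    """||Omega||_inf: largest absolute entry."""
    return max(abs(x) for row in Omega for x in row)


def elementwise_l1_norm(Omega):
    """||Omega||_1: sum of all absolute entries."""
    return sum(abs(x) for row in Omega for x in row)


def is_k_sparse_matrix(Omega, k):
    """True if every row and every column has at most k nonzero entries."""
    p = len(Omega)
    rows_ok = all(sum(1 for x in row if x != 0) <= k for row in Omega)
    cols_ok = all(sum(1 for i in range(p) if Omega[i][j] != 0) <= k for j in range(p))
    return rows_ok and cols_ok
```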

2.2. Testing Procedure

It is well known (e.g., Anderson, 2003, Section 2.5) that in the Gaussian setting the precision matrix can be described in terms of regression models. Specifically, we may write

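$$X_{k,i} = \alpha_{i,1} + X_{k,-i}\beta_{i,1} + \varepsilon_{k,i,1} \quad (i = 1,\ldots,p;\ k = 1,\ldots,n_1), \qquad (2)$$
$$Y_{k,i} = \alpha_{i,2} + Y_{k,-i}\beta_{i,2} + \varepsilon_{k,i,2} \quad (i = 1,\ldots,p;\ k = 1,\ldots,n_2), \qquad (3)$$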

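where $\varepsilon_{k,i,d} \sim N(0, \sigma_{i,i,d} - \Sigma_{i,-i,d}\Sigma_{-i,-i,d}^{-1}\Sigma_{-i,i,d})$ (d = 1, 2) are independent of $X_{k,-i}$ and $Y_{k,-i}$, respectively, and $\alpha_{i,d} = \mu_{i,d} - \Sigma_{i,-i,d}\Sigma_{-i,-i,d}^{-1}\mu_{-i,d}$. The regression coefficient vectors $\beta_{i,d}$ and the error terms $\varepsilon_{k,i,d}$ satisfy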

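$$\beta_{i,d} = -\omega_{i,i,d}^{-1}\Omega_{-i,i,d}, \qquad r_{i,j,d} = \mathrm{cov}(\varepsilon_{k,i,d}, \varepsilon_{k,j,d}) = \frac{\omega_{i,j,d}}{\omega_{i,i,d}\,\omega_{j,j,d}},$$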

where cov(·,·) denotes the population covariance. Since the null hypothesis H0 : Δ = 0 is equivalent to the hypothesis

$$H_0: \max_{1\le i\le j\le p}|\omega_{i,j,1} - \omega_{i,j,2}| = 0,$$

a natural approach to testing H0 is to first construct estimators of $\omega_{i,j,d}$ and then base the test on the maximum standardized differences. We first construct estimators of $r_{i,j,d}$.
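The identities displayed above for $\beta_{i,d}$ and $r_{i,j,d}$ can be checked numerically. The Python sketch below is our illustration: a small Gauss-Jordan inverse stands in for a linear-algebra library, and all names are hypothetical. It computes the population regression coefficients from $\Sigma = \Omega^{-1}$ and compares them, and the covariances of the resulting residuals, with the precision-matrix expressions.

```python
def inverse(A):
    """Inverse of a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [1.0 if r == c else 0.0 for c in range(n)]
         for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def beta_from_covariance(Sigma, i):
    """Population coefficients of X_i on X_{-i}: Sigma_{-i,-i}^{-1} Sigma_{-i,i}."""
    idx = [a for a in range(len(Sigma)) if a != i]
    S_inv = inverse([[Sigma[a][b] for b in idx] for a in idx])
    s = [Sigma[a][i] for a in idx]
    return [sum(S_inv[r][c] * s[c] for c in range(len(idx))) for r in range(len(idx))]


def beta_from_precision(Omega, i):
    """The identity beta_i = -Omega_{-i,i} / omega_{i,i}."""
    return [-Omega[a][i] / Omega[i][i] for a in range(len(Omega)) if a != i]


def residual_cov(Sigma, i, j):
    """cov(eps_i, eps_j) where eps_m = X_m - X_{-m} beta_m, with beta_m computed from Sigma."""
    p = len(Sigma)

    def weights(m):
        # eps_m written as a linear combination w^T X of the variables.
        w = [0.0] * p
        w[m] = 1.0
        others = [l for l in range(p) if l != m]
        for l, coef in zip(others, beta_from_covariance(Sigma, m)):
            w[l] = -coef
        return w

    wi, wj = weights(i), weights(j)
    return sum(wi[a] * Sigma[a][b] * wj[b] for a in range(p) for b in range(p))
```

For any positive definite Ω, the two coefficient computations agree and `residual_cov(Sigma, i, j)` matches $\omega_{i,j}/(\omega_{i,i}\omega_{j,j})$, including the residual variance $1/\omega_{i,i}$ when i = j.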

Let $\hat\beta_{i,d} = (\hat\beta_{1,i,d},\ldots,\hat\beta_{p-1,i,d})^{\mathrm T}$ be estimators of $\beta_{i,d}$ satisfying

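$$\max_{1\le i\le p}|\hat\beta_{i,d} - \beta_{i,d}|_1 = o_p\{(\log p)^{-1}\}, \qquad (4)$$
$$\max_{1\le i\le p}|\hat\beta_{i,d} - \beta_{i,d}|_2 = o_p\{(n_d\log p)^{-1/4}\}. \qquad (5)$$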

Estimators β^i,d that satisfy (4) and (5) can be obtained easily via methods such as the lasso and Dantzig selector. See Section 2.3 for details. Define the residuals by

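$$\hat\varepsilon_{k,i,1} = X_{k,i} - \bar X_i - (X_{k,-i} - \bar X_{\cdot,-i})\hat\beta_{i,1}, \qquad \hat\varepsilon_{k,i,2} = Y_{k,i} - \bar Y_i - (Y_{k,-i} - \bar Y_{\cdot,-i})\hat\beta_{i,2}.$$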

A natural estimator of ri,j,d is the sample covariance between the residuals,

$$\tilde r_{i,j,d} = \frac{1}{n_d}\sum_{k=1}^{n_d}\hat\varepsilon_{k,i,d}\,\hat\varepsilon_{k,j,d}. \qquad (6)$$

However, when $i \neq j$, $\tilde r_{i,j,d}$ tends to be biased because of the correlation induced by the estimated parameters, so it is desirable to construct a bias-corrected estimator. Lemma 2 shows that

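$$\tilde r_{i,j,d} = \tilde R_{i,j,d} - \tilde r_{i,i,d}(\hat\beta_{i,j,d} - \beta_{i,j,d}) - \tilde r_{j,j,d}(\hat\beta_{j-1,i,d} - \beta_{j-1,i,d}) + o_p\{(n_d\log p)^{-1/2}\},$$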

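where $\tilde R_{i,j,d}$ is the empirical covariance between $\{\varepsilon_{k,i,d}: k = 1,\ldots,n_d\}$ and $\{\varepsilon_{k,j,d}: k = 1,\ldots,n_d\}$. For $1 \le i < j \le p$, $\beta_{i,j,d} = -\omega_{i,j,d}/\omega_{j,j,d}$ and $\beta_{j-1,i,d} = -\omega_{i,j,d}/\omega_{i,i,d}$. Thus, we propose the following bias-corrected estimator of $r_{i,j,d}$: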

$$\hat r_{i,j,d} = -(\tilde r_{i,j,d} + \tilde r_{i,i,d}\hat\beta_{i,j,d} + \tilde r_{j,j,d}\hat\beta_{j-1,i,d}), \qquad 1 \le i < j \le p. \qquad (7)$$

The bias of $\hat r_{i,j,d}$ is of order $\max\{r_{i,j,d}(\log p/n_d)^{1/2}, (n_d\log p)^{-1/2}\}$.

For i = j, note that $r_{i,i,d} = 1/\omega_{i,i,d}$. We show in Lemma 2 that

$$\max_{1\le i\le p}|\tilde r_{i,i,d} - r_{i,i,d}| = O_p\{(\log p/n_d)^{1/2}\},$$

which implies that $\hat r_{i,i,d} = \tilde r_{i,i,d}$ is a nearly unbiased estimator of $r_{i,i,d}$. A natural estimator of $\omega_{i,j,d}$ can then be defined by

$$T_{i,j,d} = \frac{\hat r_{i,j,d}}{\hat r_{i,i,d}\,\hat r_{j,j,d}}, \qquad 1 \le i \le j \le p. \qquad (8)$$
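Equations (6) to (8) translate directly into code. The following is a minimal Python sketch, assuming the fitted residuals and coefficients for one group are already available. Indices are 0-based, so for i < j the paper's $\hat\beta_{i,j,d}$ and $\hat\beta_{j-1,i,d}$ become `beta[j][i]` and `beta[i][j - 1]`. The function name and data layout are our own choices, not the authors' Matlab code.

```python
def bias_corrected_T(eps, beta):
    """Compute T_{i,j} = r_hat_{i,j} / (r_hat_{i,i} r_hat_{j,j}) from fitted residuals.

    eps  : n rows, each holding the p residuals (eps_hat_{k,1}, ..., eps_hat_{k,p}).
    beta : p lists; beta[i] holds the p - 1 fitted coefficients of variable i on the
           other variables, in increasing order of their (0-based) indices.
    Returns the p x p matrices T and r_hat.
    """
    n, p = len(eps), len(eps[0])
    # Equation (6): sample covariance of the residuals (they are already centred).
    r_tilde = [[sum(row[i] * row[j] for row in eps) / n for j in range(p)] for i in range(p)]
    r_hat = [[0.0] * p for _ in range(p)]
    for i in range(p):
        r_hat[i][i] = r_tilde[i][i]  # diagonal entries need no correction
        for j in range(i + 1, p):
            b_ij = beta[j][i]      # coefficient of variable i in the regression of j
            b_ji = beta[i][j - 1]  # coefficient of variable j in the regression of i
            # Equation (7): bias-corrected residual covariance.
            r_hat[i][j] = r_hat[j][i] = -(r_tilde[i][j] + r_tilde[i][i] * b_ij
                                          + r_tilde[j][j] * b_ji)
    # Equation (8).
    T = [[r_hat[i][j] / (r_hat[i][i] * r_hat[j][j]) for j in range(p)] for i in range(p)]
    return T, r_hat
```

As a check, feeding in residuals whose empirical covariance equals $r_{i,j} = \omega_{i,j}/(\omega_{i,i}\omega_{j,j})$ for Ω = [[2, 0.5], [0.5, 1]], together with the population coefficients −0.25 (variable 1 on variable 2) and −0.5 (variable 2 on variable 1), returns T equal to Ω.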

We test H0 : Δ = 0 based on the estimators $\mathcal T = \{T_{i,j,1} - T_{i,j,2}: 1 \le i \le j \le p\}$.

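The estimators $T_{i,j,1} - T_{i,j,2}$ in $\mathcal T$ are heteroscedastic and possibly have a wide range of variability. We therefore first standardize $T_{i,j,1} - T_{i,j,2}$ before combining information from all entries in $\mathcal T$. Let $U_{i,j,d} = (1/n_d)\sum_{k=1}^{n_d}\{\varepsilon_{k,i,d}\varepsilon_{k,j,d} - E(\varepsilon_{k,i,d}\varepsilon_{k,j,d})\}$ and $\tilde U_{i,j,d} = (r_{i,j,d} - U_{i,j,d})/(r_{i,i,d}r_{j,j,d})$. It will be shown in Lemma 2 that, uniformly in $1 \le i \le j \le p$,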

$$|T_{i,j,d} - \tilde U_{i,j,d}| = O_p\{(\log p/n_d)^{1/2}\}\,r_{i,j,d} + o_p\{(n_d\log p)^{-1/2}\}.$$

Let $\theta_{i,j,d} = \mathrm{var}(\tilde U_{i,j,d})$. Note that

$$\theta_{i,j,d} = \mathrm{var}\{\varepsilon_{k,i,d}\varepsilon_{k,j,d}/(r_{i,i,d}r_{j,j,d})\}/n_d = (1 + \rho_{i,j,d}^2)/(n_d r_{i,i,d}r_{j,j,d}),$$

where $\rho_{i,j,d}^2 = \beta_{i,j,d}^2 r_{i,i,d}/r_{j,j,d}$. We then estimate $\theta_{i,j,d}$ by

$$\hat\theta_{i,j,d} = (1 + \hat\beta_{i,j,d}^2\hat r_{i,i,d}/\hat r_{j,j,d})/(n_d\hat r_{i,i,d}\hat r_{j,j,d}).$$

Define the standardized statistics

$$W_{i,j} = \frac{T_{i,j,1} - T_{i,j,2}}{(\hat\theta_{i,j,1} + \hat\theta_{i,j,2})^{1/2}}, \qquad 1 \le i \le j \le p. \qquad (9)$$

Finally, we propose the following test statistic for testing the global null hypothesis H0,

$$M_n = \max_{1\le i\le j\le p}W_{i,j}^2 = \max_{1\le i\le j\le p}\frac{(T_{i,j,1} - T_{i,j,2})^2}{\hat\theta_{i,j,1} + \hat\theta_{i,j,2}}. \qquad (10)$$
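A sketch of the standardization in (9) and the statistic (10), using the same 0-based conventions. One assumption is ours: $\hat\theta_{i,j,d}$ is given through $\hat\beta_{i,j,d}$, which is undefined when i = j, so for diagonal entries we set $\rho^2_{i,i,d} = 1$. This is what the bivariate normal identity $\mathrm{var}(\varepsilon_i\varepsilon_j) = r_{i,i}r_{j,j} + r_{i,j}^2$ gives at i = j. Function names are hypothetical.

```python
import math


def theta_hat(r_hat, beta, n):
    """Variance estimates theta_hat_{i,j} for 0-based i <= j (upper triangle only)."""
    p = len(r_hat)
    theta = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            if i == j:
                rho2 = 1.0  # assumption: rho^2_{i,i} = r_{i,i}^2 / (r_{i,i} r_{i,i}) = 1
            else:
                b_ij = beta[j][i]  # coefficient of variable i in the regression of j
                rho2 = b_ij ** 2 * r_hat[i][i] / r_hat[j][j]
            theta[i][j] = (1.0 + rho2) / (n * r_hat[i][i] * r_hat[j][j])
    return theta


def max_statistic(T1, T2, theta1, theta2):
    """Standardized differences W_{i,j}, equation (9), and M_n = max W_{i,j}^2, equation (10)."""
    p = len(T1)
    W = {}
    for i in range(p):
        for j in range(i, p):
            W[(i, j)] = (T1[i][j] - T2[i][j]) / math.sqrt(theta1[i][j] + theta2[i][j])
    return W, max(w * w for w in W.values())
```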

The asymptotic properties of $M_n$ are studied in detail in Section 3. Intuitively, the $W_{i,j}$ are approximately standard normal under H0 and only weakly dependent under suitable conditions. Thus $M_n$ is the maximum of the squares of p(p + 1)/2 such random variables, so its value should be close to $2\log\{p(p+1)/2\} \approx 4\log p$ under H0. We show in Section 3 that, under certain regularity conditions, $M_n - 4\log p + \log\log p$ converges to a type I extreme value distribution under H0 : Δ = 0.

Based on the limiting null distribution of Mn, which will be developed in Section 3.1, we define the test ψα by

$$\Psi_\alpha = I(M_n \ge q_\alpha + 4\log p - \log\log p), \qquad (11)$$

where $q_\alpha$ is the $1-\alpha$ quantile of the type I extreme value distribution with cumulative distribution function $\exp\{-(8\pi)^{-1/2}e^{-t/2}\}$, i.e.,

$$q_\alpha = -\log(8\pi) - 2\log\log(1-\alpha)^{-1}. \qquad (12)$$

The hypothesis H0 is rejected whenever ψα = 1.
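The critical value in (11) and (12) is straightforward to compute. A minimal Python sketch, using the standard library only (names are illustrative):

```python
import math


def q_alpha(alpha):
    """1 - alpha quantile of the law with CDF exp{-(8 pi)^{-1/2} exp(-t/2)}, equation (12)."""
    return -math.log(8 * math.pi) - 2 * math.log(math.log(1.0 / (1.0 - alpha)))


def global_test(M_n, p, alpha=0.05):
    """Psi_alpha in (11): returns 1 to reject H0: Delta = 0, and 0 otherwise."""
    threshold = q_alpha(alpha) + 4 * math.log(p) - math.log(math.log(p))
    return int(M_n >= threshold)
```

For p = 100 and α = 0.05, $q_\alpha \approx 2.72$ and the rejection threshold $q_\alpha + 4\log p - \log\log p$ is about 19.61.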

2.3. Data-driven estimation of regression coefficients

The testing procedure requires estimates of the regression coefficients $\beta_{i,d}$ for i = 1,…,p and d = 1, 2. Various estimators have been studied in the literature, including the lasso and the Dantzig selector. Here, we use the lasso, solving the optimization problems

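$$\hat\beta_{i,1} = D_{i,1}^{-1/2}\,\arg\min_{u\in\mathbb R^{p-1}}\Big\{\frac{1}{2n_1}\big|(X_{\cdot,-i} - \bar X_{(\cdot,-i)})D_{i,1}^{-1/2}u - (X_{(i)} - \bar X_{(i)})\big|_2^2 + \lambda_{n_1,i,1}|u|_1\Big\}, \qquad (13)$$
$$\hat\beta_{i,2} = D_{i,2}^{-1/2}\,\arg\min_{v\in\mathbb R^{p-1}}\Big\{\frac{1}{2n_2}\big|(Y_{\cdot,-i} - \bar Y_{(\cdot,-i)})D_{i,2}^{-1/2}v - (Y_{(i)} - \bar Y_{(i)})\big|_2^2 + \lambda_{n_2,i,2}|v|_1\Big\}, \qquad (14)$$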

where $D_{i,d} = \mathrm{diag}(\hat\Sigma_{-i,-i,d})$ and $\lambda_{n_d,i,d} = \kappa_d(\hat\sigma_{i,i,d}\log p/n_d)^{1/2}$, d = 1, 2. Then, by Proposition 4.2 of Liu (2013), under Condition (C1) given in Section 3 and a mild condition on the sparsity of $\beta_{i,d}$ (i = 1,…, p; d = 1, 2), the convergence rates in (4) and (5) are guaranteed for any $\kappa_d > 2$. The result is formally stated in Corollary 1. In practice, $\kappa_d = 2$ works well for global testing of H0 : Δ = 0. For the multiple testing procedure with false discovery rate control, a data-driven algorithm that selects $\kappa_d$ adaptively is proposed in Section 5.
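The following plain-Python coordinate-descent sketch solves (13) for a single node i. The choice of solver is ours; the paper only specifies the objective. We also assume that the sample variances in $D_{i,d}$ and $\hat\sigma_{i,i,d}$ use divisor n and that no column is constant. Because the columns are standardized by $D_{i,d}^{-1/2}$, each coordinate update reduces to a single soft-thresholding step.

```python
import math


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def nodewise_lasso(X, i, kappa, max_iter=1000, tol=1e-12):
    """Sketch of (13): lasso of column i of X on the other columns, rescaled by D^{-1/2}.

    X is a list of n rows with p entries. Returns the p - 1 coefficients beta_hat_i
    in increasing order of the other columns' indices.
    """
    n, p = len(X), len(X[0])
    others = [c for c in range(p) if c != i]
    means = [sum(row[c] for row in X) / n for c in range(p)]
    y = [row[i] - means[i] for row in X]                       # X_(i) - Xbar_(i)
    cols = [[row[c] - means[c] for row in X] for c in others]  # centred columns of X_{.,-i}
    scales = [math.sqrt(sum(v * v for v in col) / n) for col in cols]  # diag(D_i)^{1/2}
    Z = [[v / s for v in col] for col, s in zip(cols, scales)]  # each satisfies |z|^2 / n = 1
    sigma_ii = sum(v * v for v in y) / n
    lam = kappa * math.sqrt(sigma_ii * math.log(p) / n)        # lambda = kappa (sigma_ii log p / n)^{1/2}
    u = [0.0] * len(Z)
    resid = y[:]                                               # current residual y - Z u
    for _ in range(max_iter):
        max_change = 0.0
        for c, z in enumerate(Z):
            # Exact univariate minimizer, using |z|^2 / n = 1.
            rho = sum(zk * rk for zk, rk in zip(z, resid)) / n + u[c]
            new = soft_threshold(rho, lam)
            delta = new - u[c]
            if delta != 0.0:
                resid = [rk - delta * zk for rk, zk in zip(resid, z)]
                u[c] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return [uc / s for uc, s in zip(u, scales)]                # beta_hat = D^{-1/2} u
```

For two identical columns, the solution has the closed form $\hat\beta = 1 - \kappa(\log p/n)^{1/2}$ when this is positive. For example, with n = 4, p = 2 and κ = 2, it equals $1 - (\log 2)^{1/2} \approx 0.167$.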

2.4. Discussion

The global test ψα given in (11) is based on estimators of $\omega_{i,j,1} - \omega_{i,j,2}$. Here we estimate $\omega_{i,j,d}$ in two steps. We first estimate $r_{i,j,d} = \omega_{i,j,d}/(\omega_{i,i,d}\omega_{j,j,d})$ by the bias-corrected residual covariance $\hat r_{i,j,d}$ defined in (7). We then rescale it as in (8).

Liu (2013) considered multiple testing of entries of a single precision matrix Ω = (ωi,j). In the one-sample case, ωi,j = 0 is equivalent to ri,j= ωi,j/(ωi,iωj,j) = 0 under the null and ri,j is easier to estimate. The procedure in Liu (2013) is based on the estimation of ri,j instead of ωi,j. However, in Section 4 we will also consider multiple testing between two groups, and ωi,j,1= ωi,j,2 is not equivalent to ri,j,1= ri,j,2. Thus, it is necessary to construct testing procedures based directly on estimators of ωi,j,1 − ωi,j,2.

Testing the global hypothesis H0 : Ω1 = Ω2 is equivalent to testing H0 : Σ1 = Σ2, which has been well studied (Schott, 2007; Srivastava & Yanagihara, 2010; Li & Chen, 2012; Cai et al., 2013). In particular, Cai et al. (2013) constructed a global test for H0 : Σ1 = Σ2 that is powerful against the alternative where Σ1 − Σ2 is sparse. However, in many applications, the goal is to learn the structure of the differential network, and we are interested in both testing the global hypothesis H0 : Ω1 = Ω2 and multiple testing of the entrywise hypotheses H0,i,j : ωi,j,1 = ωi,j,2. In such cases, it is often reasonable to assume that Δ = Ω1 − Ω2 is sparse, but Σ1 − Σ2 is not. Hence, testing procedures for H0 : Σ1 = Σ2 cannot leverage information on the sparsity of Δ and more importantly do not naturally lead to a multiple testing procedure for simultaneously testing the entrywise hypotheses H0,i,j : ωi,j,1 = ωi,j,2.

3. Theoretical Results for the Global Test

3.1. Asymptotic Null Distribution of Mn

In this section, we analyze the properties of the new test for testing the global null hypothesis H0 : Δ = 0, including the null distribution of the test statistic Mn, the asymptotic size and power. We are particularly interested in the power of the new test under the alternative with Δ sparse. We further show that the power is minimax rate optimal.

Under assumptions (C1) and (C2), Theorem 1 shows that, under H0, $M_n - 4\log p + \log\log p$ converges weakly to a Gumbel random variable with distribution function $\exp\{-(8\pi)^{-1/2}e^{-t/2}\}$.

  • (C1)

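    Assume that $\log p = o(n^{1/5})$, $n_1 \asymp n_2$, and, for some constant $C_0 > 0$, $C_0^{-1} \le \lambda_{\min}(\Omega_d) \le \lambda_{\max}(\Omega_d) \le C_0$ for d = 1, 2. There exists some τ > 0 such that $|A_\tau| = o(p^{1/16})$, where $A_\tau = \{(i,j): |\omega_{i,j,d}| \ge (\log p)^{-2-\tau},\ 1 \le i < j \le p, \text{ for } d = 1 \text{ or } 2\}$.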

  • (C2)

    Let $D_d$ be the diagonal of $\Omega_d$ and let $(\eta_{i,j,d}) = R_d = D_d^{-1/2}\Omega_d D_d^{-1/2}$ for d = 1, 2. Assume that $\max_{1\le i<j\le p}|\eta_{i,j,d}| \le \eta_d$ for some constant $0 < \eta_d < 1$.

Condition (C1) on the eigenvalues is a common assumption in the high-dimensional setting and implies that most of the variables are not highly correlated with each other. Condition (C2) is also mild. For example, if $\max_{1\le i<j\le p}|\eta_{i,j,d}| = 1$, then $\Omega_d$ is singular. The following theorem states the asymptotic null distribution of $M_n$.

Theorem 1

Suppose that (C1), (C2), (4) and (5) hold. Then under H0, for any $t \in \mathbb R$,

$$\mathrm{pr}(M_n - 4\log p + \log\log p \le t) \to \exp\{-(8\pi)^{-1/2}\exp(-t/2)\}, \quad \text{as } n_1, n_2, p \to \infty, \qquad (15)$$

where Mn is defined in equation (10). Under H0, the convergence in (15) is uniform for all {Xk : k = 1,…, n1} and {Yk : k = 1,…, n2} satisfying (C1), (C2), (4) and (5).

Equations (4) and (5) are mild conditions on the estimator of $\beta_{i,d}$ that suffice to obtain the limiting distribution in Theorem 1. As discussed in Section 2.3, these conditions can be guaranteed, for example, by the lasso estimator.

Corollary 1

Suppose that (C1) and (C2) hold and $\max_{1\le i\le p}|\beta_{i,d}|_0 = o\{n^{1/2}/(\log p)^{3/2}\}$. Then under H0, for any $\kappa_d > 2$ in (13) and (14), and for any $t \in \mathbb R$,

$$\mathrm{pr}(M_n - 4\log p + \log\log p \le t) \to \exp\{-(8\pi)^{-1/2}\exp(-t/2)\}, \quad n_1, n_2, p \to \infty, \qquad (16)$$

where Mn is defined in (10).

3.2. Power Analysis

We now turn to an analysis of the power of the test ψα given in (11). We shall define the following class of precision matrices:

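$$\mathcal U(c) = \Big\{(\Omega_1, \Omega_2): \max_{1\le i\le j\le p}\frac{|\omega_{i,j,1} - \omega_{i,j,2}|}{(\theta_{i,j,1} + \theta_{i,j,2})^{1/2}} \ge c(\log p)^{1/2}\Big\}. \qquad (17)$$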

The next theorem shows that the null parameter set, in which Ω1 = Ω2, is asymptotically distinguishable from $\mathcal U(4)$ by the test ψα. That is, H0 is rejected by the test ψα with overwhelming probability if $(\Omega_1, \Omega_2) \in \mathcal U(4)$.

Theorem 2

Let the test ψα be given as in (11). Suppose that (C1), (4) and (5) hold. Then

$$\inf_{(\Omega_1,\Omega_2)\in\mathcal U(4)}\mathrm{pr}(\Psi_\alpha = 1) \to 1, \quad n, p \to \infty.$$

The following result shows that this lower bound is rate-optimal. Let $\mathcal T_\alpha$ be the set of all α-level tests, i.e., $\mathrm{pr}(T_\alpha = 1) \le \alpha$ under H0 for all $T_\alpha \in \mathcal T_\alpha$.

Theorem 3

Suppose that log p = o(n). Let α, β > 0 and α + β < 1. Then there exists a constant c0 > 0 such that for all sufficiently large n and p,

$$\inf_{(\Omega_1,\Omega_2)\in\mathcal U(c_0)}\ \sup_{T_\alpha\in\mathcal T_\alpha}\mathrm{pr}(T_\alpha = 1) \le 1 - \beta.$$

Theorem 3 shows that, if $c_0$ is sufficiently small, then no α-level test can reject the null hypothesis correctly, uniformly over $(\Omega_1, \Omega_2) \in \mathcal U(c_0)$, with probability tending to one. Hence the order $(\log p)^{1/2}$ in the lower bound of $\max_{1\le i\le j\le p}|\omega_{i,j,1} - \omega_{i,j,2}|/(\theta_{i,j,1} + \theta_{i,j,2})^{1/2}$ in (17) cannot be improved.

4. Multiple Testing with False Discovery Rate Control

If the global null hypothesis is rejected, it is often of interest to investigate the structure of the differential network Δ. A natural approach is to carry out simultaneous testing on the elements of Δ. In this section, we introduce a multiple testing procedure with false discovery rate control for testing the $(p^2 - p)/2$ hypotheses

$$H_{0,i,j}: \delta_{i,j} = 0 \quad\text{versus}\quad H_{1,i,j}: \delta_{i,j} \neq 0, \qquad 1 \le i < j \le p. \qquad (18)$$

The test statistics are the standardized differences $W_{i,j} = (T_{i,j,1} - T_{i,j,2})/(\hat\theta_{i,j,1} + \hat\theta_{i,j,2})^{1/2}$ defined in (9). Let t be a threshold such that $H_{0,i,j}$ is rejected if $|W_{i,j}| \ge t$. Let $\mathcal H_0 = \{(i,j): \delta_{i,j} = 0, 1 \le i < j \le p\}$ be the set of true nulls. Denote by $R_0(t) = \sum_{(i,j)\in\mathcal H_0}I(|W_{i,j}| \ge t)$ the total number of false positives, and by $R(t) = \sum_{1\le i<j\le p}I(|W_{i,j}| \ge t)$ the total number of rejections. The false discovery proportion and false discovery rate are defined as

$$\mathrm{FDP}(t) = \frac{R_0(t)}{R(t) \vee 1}, \qquad \mathrm{FDR}(t) = E\{\mathrm{FDP}(t)\}.$$

An ideal choice of t would reject as many true positives as possible while controlling the false discovery rate and false discovery proportion at the pre-specified level α. That is, we select

$$t_0 = \inf\{0 \le t \le 2(\log p)^{1/2}: \mathrm{FDP}(t) \le \alpha\}.$$

Since $\mathcal H_0$ is unknown, we estimate $\sum_{(i,j)\in\mathcal H_0}I\{|W_{i,j}| \ge t\}$ by $2\{1 - \Phi(t)\}|\mathcal H_0|$ as in Liu (2013), where Φ(t) is the standard normal cumulative distribution function. Because Δ is sparse, $|\mathcal H_0|$ can in turn be estimated by $(p^2 - p)/2$. This leads to the following multiple testing procedure.

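  1. Calculate the test statistics $W_{i,j}$.

  2. For a given $0 \le \alpha \le 1$, calculate
    $$\hat t = \inf\Big\{0 \le t \le 2(\log p)^{1/2}: \frac{2\{1 - \Phi(t)\}(p^2 - p)/2}{R(t) \vee 1} \le \alpha\Big\}.$$
    If $\hat t$ does not exist, set $\hat t = 2(\log p)^{1/2}$.
  3. For $1 \le i < j \le p$, reject $H_{0,i,j}$ if and only if $|W_{i,j}| \ge \hat t$.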
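The three steps can be sketched in Python as follows, using the standard library only; the names are illustrative. As a simplification that we assume rather than take from the paper, the infimum in step 2 is searched only over the observed values $|W_{i,j}| \le 2(\log p)^{1/2}$. Because R(t) is a step function and the numerator decreases in t, this should produce the same rejection set as the continuous infimum.

```python
import math
from bisect import bisect_left
from statistics import NormalDist


def fdr_threshold(W, p, alpha):
    """t_hat from step 2; W maps pairs (i, j), i < j, to W_{i,j}."""
    q = (p * p - p) / 2
    t_max = 2 * math.sqrt(math.log(p))
    abs_w = sorted(abs(w) for w in W.values())
    Phi = NormalDist().cdf
    for t in abs_w:  # candidate thresholds in increasing order
        if t > t_max:
            break
        R = len(abs_w) - bisect_left(abs_w, t)  # R(t) = #{|W_{i,j}| >= t}
        if 2 * (1 - Phi(t)) * q / max(R, 1) <= alpha:
            return t
    return t_max  # no admissible candidate: fall back to 2 (log p)^{1/2}


def fdr_reject(W, p, alpha):
    """Step 3: the pairs (i, j) whose null hypothesis H_{0,i,j} is rejected."""
    t_hat = fdr_threshold(W, p, alpha)
    return {ij for ij, w in W.items() if abs(w) >= t_hat}
```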

The following theorem shows that, under regularity conditions, the above procedure controls the false discovery proportion and false discovery rate at the pre-specified level α asymptotically.

Theorem 4

Let

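$$\mathcal S_\rho = \Big\{(i,j): 1 \le i < j \le p,\ \frac{|\omega_{i,j,1} - \omega_{i,j,2}|}{(\theta_{i,j,1} + \theta_{i,j,2})^{1/2}} \ge (\log p)^{1/2+\rho}\Big\}.$$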

Suppose that, for some ρ > 0 and some δ > 0, $|\mathcal S_\rho| \ge [1/\{(8\pi)^{1/2}\alpha\} + \delta](\log\log p)^{1/2}$. Suppose that $|A_{T_0}| = o(p^\nu)$ for any ν > 0, where $A_\tau$ is given in Condition (C1). Assume that $q_0 = |\mathcal H_0| \ge cp^2$ for some c > 0, and that (4) and (5) hold. Let $q = (p^2 - p)/2$. Then, under (C1) with $p \le cn^r$ for some c > 0 and r > 0, we have

$$\lim_{(n,p)\to\infty}\frac{\mathrm{FDR}(\hat t)}{\alpha q_0/q} = 1, \qquad \frac{\mathrm{FDP}(\hat t)}{\alpha q_0/q} \to 1$$

in probability, as (n, p) → ∞.

The condition $|\mathcal S_\rho| \ge [1/\{(8\pi)^{1/2}\alpha\} + \delta](\log\log p)^{1/2}$ in Theorem 4 is mild. There are $(p^2 - p)/2$ hypotheses in total, and the condition requires only a few entries whose standardized difference has magnitude exceeding $\{(\log p)^{1/2+\rho}/n\}^{1/2}$ for some constant ρ > 0. The technical condition $|A_{T_0}| = o(p^\nu)$ for any ν > 0 ensures that most of the regression residuals are not highly correlated with each other under the null hypotheses $H_{0,i,j}: \delta_{i,j} = 0$.

The basic idea for the proof of Theorem 4 is similar to that in Liu (2013). However, the setting here is more complicated as ωi,j,1 and ωi,j,2 are not necessarily zero under H0,i,j : δi,j = 0. So the coordinates of the regression residuals in (2) and (3) can be correlated with each other. Thus slightly stronger conditions are needed and the proof is more involved.

5. Simulation Study

The proposed testing procedures are easy to implement, and the Matlab code is available in the Supplementary Material. We carry out a simulation study to investigate the numerical performance, including the size and power, of the global test Ψα and the false discovery rate controlled multiple testing procedure.

We first introduce the matrix models used in the simulations. Let $D = (D_{i,j})$ be a diagonal matrix with $D_{i,i} \sim \mathrm{Unif}(0.5, 2.5)$ for i = 1,…,p. The following four models under the null, $\Omega_1 = \Omega_2 = \Omega^{(m)} = (\omega^{(m)}_{i,j})$ (m = 1,…,4), are used to study the size of the tests.

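  • Model 1: $\Omega^{*(1)} = (\omega^{*(1)}_{i,j})$, where $\omega^{*(1)}_{i,i} = 1$, $\omega^{*(1)}_{i,i+1} = \omega^{*(1)}_{i+1,i} = 0.6$, $\omega^{*(1)}_{i,i+2} = \omega^{*(1)}_{i+2,i} = 0.3$ and $\omega^{*(1)}_{i,j} = 0$ otherwise; $\Omega^{(1)} = D^{1/2}\Omega^{*(1)}D^{1/2}$.

  • Model 2: $\Omega^{*(2)} = (\omega^{*(2)}_{i,j})$, where $\omega^{*(2)}_{i,j} = \omega^{*(2)}_{j,i} = 0.5$ for $i = 10(k-1)+1$ and $10(k-1)+2 \le j \le 10(k-1)+10$, $1 \le k \le p/10$, and $\omega^{*(2)}_{i,j} = 0$ otherwise; $\Omega^{(2)} = D^{1/2}(\Omega^{*(2)} + \delta I)/(1+\delta)D^{1/2}$ with $\delta = |\lambda_{\min}(\Omega^{*(2)})| + 0.05$.

  • Model 3: $\Omega^{*(3)} = (\omega^{*(3)}_{i,j})$, where $\omega^{*(3)}_{i,i} = 1$, $\omega^{*(3)}_{i,j} = 0.8 \times \mathrm{Bernoulli}(1, 0.05)$ for $i < j$ and $\omega^{*(3)}_{j,i} = \omega^{*(3)}_{i,j}$; $\Omega^{(3)} = D^{1/2}(\Omega^{*(3)} + \delta I)/(1+\delta)D^{1/2}$ with $\delta = |\lambda_{\min}(\Omega^{*(3)})| + 0.05$.

  • Model 4: $\Sigma^{*(4)} = (\sigma^{*(4)}_{i,j})$, where $\sigma^{*(4)}_{i,i} = 1$, $\sigma^{*(4)}_{i,j} = 0.5$ for $2(k-1)+1 \le i \neq j \le 2k$, where $k = 1,\ldots,[p/2]$, and $\sigma^{*(4)}_{i,j} = 0$ otherwise; $\Omega^{(4)} = D^{1/2}\{(\Sigma^{*(4)} + \delta I)/(1+\delta)\}^{-1}D^{1/2}$ with $\delta = |\lambda_{\min}(\Sigma^{*(4)})| + 0.05$.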
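Model 1 can be generated as shown below. This Python sketch reads $D_{i,i} \sim \mathrm{Unif}(0.5, 2.5)$ as independent draws and fixes a seed for reproducibility; both are our assumptions. Models 2 to 4 additionally need $\lambda_{\min}$ for the shift δ, which requires an eigenvalue routine and is not shown.

```python
import math
import random


def model1_precision(p, seed=0):
    """Model 1: banded Omega* (1 on the diagonal, 0.6 and 0.3 on the first two
    off-diagonals), rescaled as D^{1/2} Omega* D^{1/2} with D_ii ~ Unif(0.5, 2.5)."""
    rng = random.Random(seed)
    d = [rng.uniform(0.5, 2.5) for _ in range(p)]
    star = [[0.0] * p for _ in range(p)]
    for i in range(p):
        star[i][i] = 1.0
        if i + 1 < p:
            star[i][i + 1] = star[i + 1][i] = 0.6
        if i + 2 < p:
            star[i][i + 2] = star[i + 2][i] = 0.3
    # (D^{1/2} Omega* D^{1/2})_{ij} = sqrt(D_ii D_jj) * omega*_{ij}
    return [[math.sqrt(d[i] * d[j]) * star[i][j] for j in range(p)] for i in range(p)]
```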

For global testing of H0 : Δ = 0, the sample sizes are $n_1 = n_2 = 100$, and the dimension p takes the values 50, 100, 200 and 400. For each model, data are generated from multivariate normal distributions with mean zero and covariance matrices $\Sigma_1 = \Omega_1^{-1}$ and $\Sigma_2 = \Omega_2^{-1}$. The nominal significance level for all the tests is $\alpha_1 = 0.05$.

To evaluate the power of the proposed tests, let $U = (u_{i,j})$ be a matrix with eight random nonzero entries. The locations of four nonzero entries are selected randomly from the upper triangle of U, each with a value drawn uniformly from the set $[-2\omega(\log p/n)^{1/2}, -\omega(\log p/n)^{1/2}] \cup [\omega(\log p/n)^{1/2}, 2\omega(\log p/n)^{1/2}]$, where $\omega = \max_{1\le i\le p}\omega^{(m)}_{i,i}$. The other four nonzero entries, in the lower triangle, are determined by symmetry. We use the following four pairs of precision matrices $(\Omega_1^{(m)}, \Omega_2^{(m)})$ (m = 1,…,4) to study the power of the test, where $\Omega_1^{(m)} = \Omega^{(m)} + \delta I$ and $\Omega_2^{(m)} = \Omega^{(m)} + U + \delta I$, with $\delta = |\min\{\lambda_{\min}(\Omega^{(m)} + U), \lambda_{\min}(\Omega^{(m)})\}| + 0.05$. The empirical sizes and powers (in percent) for the four models, reported in Table 1, are estimated from 1000 replications.
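The sparse perturbation U can be drawn as follows. In this Python sketch, which is our illustration, "upper triangle" is read as the strict upper triangle. A value uniform on $[-2a, -a] \cup [a, 2a]$, with $a = \omega(\log p/n)^{1/2}$, is produced as a random sign times a uniform magnitude on $[a, 2a]$. The eigenvalue shift δ is omitted.

```python
import math
import random


def sparse_perturbation(p, n, omega_max, n_pairs=4, seed=0):
    """U with 2 * n_pairs nonzero entries placed symmetrically; each upper-triangle
    value is uniform on [-2a, -a] U [a, 2a] with a = omega_max (log p / n)^{1/2}."""
    rng = random.Random(seed)
    a = omega_max * math.sqrt(math.log(p) / n)
    upper = [(i, j) for i in range(p) for j in range(i + 1, p)]
    U = [[0.0] * p for _ in range(p)]
    for i, j in rng.sample(upper, n_pairs):
        # A random sign times a uniform magnitude is uniform on the two-interval set.
        U[i][j] = U[j][i] = rng.choice((-1.0, 1.0)) * rng.uniform(a, 2 * a)
    return U
```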

Table 1.

Empirical sizes and powers (%) for global testing with α1 = 0.05, n1 = n2 = 100, and 1000 replications.

p Model 1 Model 2 Model 3 Model 4
Size

50 3.8 3.9 5.4 4.4
100 3.6 4.4 4.1 3.8
200 3.4 3.6 3.7 3.9
400 3.5 3.7 3.6 3.5

Power

50 100 98.7 95.6 81.6
100 99.7 96.6 95.1 77.8
200 93.1 88.2 93.6 72.1
400 86.3 73.1 77.7 70.7

Table 1 shows that the sizes of the global test $\Psi_{\alpha_1}$ are close to the nominal level in all cases, indicating that the null distribution of the test statistic $M_n$ is well approximated by its asymptotic distribution. The empirical sizes are slightly below the nominal level in some models because of the correlation among the variables. Similar phenomena have also been observed in Cai et al. (2013) and are theoretically justified by their Proposition 1. Table 1 also shows that the proposed test is powerful in all settings, even though the two precision matrices differ in only eight entries and the differences are of order $(\log p/n)^{1/2}$ in magnitude.

In addition, we consider nearer alternatives by generating the nonzero entries randomly and uniformly from the set $[-\omega(2\log p/n)^{1/2}, \omega(2\log p/n)^{1/2}]$. The power results are summarized in Table 2. Under the nearer alternatives, the magnitude of the standardized difference of Ω1 − Ω2 is smaller, and as a result the power is lower.

Table 2.

Empirical power (%) for global testing under nearer alternatives.

p Model 1 Model 2 Model 3 Model 4
Power under nearer alternative

50 90.3 71.6 58.9 20.6
100 89.4 70.3 60.8 22.8
200 81.9 55.2 54.2 21.7
400 73.5 54.7 57.7 17.5

More extensive simulation results are presented in the Supplementary Material. The proposed test significantly outperforms both the test of Cai et al. (2013), which is powerful when Σ1 − Σ2 is sparse under the alternative, and the test of Li & Chen (2012), which is powerful when Σ1 − Σ2 is dense under the alternative.

For simultaneous testing of the individual entries of the differential network Δ with false discovery rate control, we select $\lambda_{n_d,i,d}$ in (13) and (14) adaptively, following the principle of making $\sum_{(i,j)\in\mathcal H_0}I(|W_{i,j}| \ge t)$ and $\{2 - 2\Phi(t)\}|\mathcal H_0|$ as close as possible. The algorithm is as follows.

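  1. For any given $i \in \{1,\ldots,p\}$, let $\lambda_{n_1,i,1} = (s/20)(\hat\sigma_{i,i,1}\log p/n_1)^{1/2}$ and $\lambda_{n_2,i,2} = (s/20)(\hat\sigma_{i,i,2}\log p/n_2)^{1/2}$ for s = 1,…, 40. For each s, calculate $\hat\beta_{i,d}(s)$ for i = 1,…,p and d = 1, 2. Based on the estimated regression coefficients, construct the corresponding standardized differences $W_{i,j}(s)$ for each s.

  2. Choose
    $$\hat s = \arg\min_s\sum_{l=1}^{10}\left(\frac{\sum_{1\le i<j\le p}I\big[|W_{i,j}(s)| \ge \Phi^{-1}\big(1 - l[1 - \Phi\{(\log p)^{1/2}\}]/10\big)\big]}{l\,p(p-1)[1 - \Phi\{(\log p)^{1/2}\}]/10} - 1\right)^2.$$

The tuning parameters are then chosen to be $\lambda_{n_1,i,1} = (\hat s/20)(\hat\sigma_{i,i,1}\log p/n_1)^{1/2}$ and $\lambda_{n_2,i,2} = (\hat s/20)(\hat\sigma_{i,i,2}\log p/n_2)^{1/2}$.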
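Step 2 of the tuning algorithm can be sketched as follows, given the standardized differences $W_{i,j}(s)$ already computed for each s. The sum runs over pairs i < j, which matches the denominator: at the lth threshold $t_l$, the denominator equals $2\{1 - \Phi(t_l)\}(p^2 - p)/2$. Function names are hypothetical; `statistics.NormalDist` supplies Φ and $\Phi^{-1}$.

```python
import math
from statistics import NormalDist


def selection_criterion(W, p):
    """Criterion minimized over s in step 2; W maps pairs (i, j), i < j, to W_{i,j}(s)."""
    nd = NormalDist()
    c = 1 - nd.cdf(math.sqrt(math.log(p)))  # 1 - Phi{(log p)^{1/2}}
    total = 0.0
    for l in range(1, 11):
        t_l = nd.inv_cdf(1 - l * c / 10)
        observed = sum(1 for w in W.values() if abs(w) >= t_l)
        expected = l * p * (p - 1) * c / 10  # = 2{1 - Phi(t_l)} (p^2 - p)/2
        total += (observed / expected - 1) ** 2
    return total


def select_s(W_by_s, p):
    """Return s_hat, the minimizer of the criterion; W_by_s maps s (1..40) to W(s)."""
    return min(W_by_s, key=lambda s: selection_criterion(W_by_s[s], p))
```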

Pairwise comparisons among the four models are considered. The sample sizes are $n_1 = n_2 = 100$, and the dimension is p = 50, 100 or 200. The false discovery rate level is set at $\alpha_2 = 0.1$. The empirical false discovery rate and power (in percent), summarized in Table 3, are estimated from 100 replications. Power is measured by the average, over the 100 replications, of the proportion of nonzero entries detected:

$$\frac{1}{100}\sum_{l=1}^{100}\frac{\sum_{(i,j)\in\mathcal H_1}I(|W_{i,j,l}| \ge \hat t)}{|\mathcal H_1|},$$

where $W_{i,j,l}$ denotes the standardized difference in the lth replication and $\mathcal H_1$ denotes the set of nonzero locations. In all six cases, the empirical false discovery rates are close to $\alpha_2$ across all dimensions. The procedure is powerful when the dimension p is low, and it retains high power for the comparisons between Model 1 and Models 2 and 4. However, for the comparison between Model 2 and Model 3, the power is low in high dimensions. This is because every $|\omega_{i,j,1} - \omega_{i,j,2}|/(\theta_{i,j,1}n_1 + \theta_{i,j,2}n_2)^{1/2}$ is smaller than 0.25 when p = 200 and D = I. Similarly, most nonzero entries of the standardized difference for Models 2 and 4 are smaller than 0.24, which makes the nonzero locations difficult to detect. Furthermore, under the same scenario, $\omega_{i,j}/(\theta_{i,j}n)^{1/2}$ is always smaller than 0.16 for Model 3, so detection becomes harder whenever Model 3 is compared with another model. This explains the low power of all comparisons involving Model 3.
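The empirical false discovery rate and power in Table 3 are averages over replications of the false discovery proportion and of the proportion of nonzero entries detected. The following is a minimal Python sketch, assuming each replication supplies its statistics $W_{i,j,l}$ and its threshold $\hat t$ (names are illustrative).

```python
def empirical_fdr_and_power(replications, nonzero):
    """Average FDP and power over replications.

    replications : list of (W, t_hat) pairs; W maps pairs (i, j), i < j, to W_{i,j,l}.
    nonzero      : set of truly nonzero pairs (H_1); every other pair is a true null.
    """
    fdp_sum = power_sum = 0.0
    for W, t_hat in replications:
        rejected = {ij for ij, w in W.items() if abs(w) >= t_hat}
        false_pos = len(rejected - nonzero)
        fdp_sum += false_pos / max(len(rejected), 1)           # FDP = R_0 / (R v 1)
        power_sum += len(rejected & nonzero) / len(nonzero)    # share of H_1 detected
    m = len(replications)
    return fdp_sum / m, power_sum / m
```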

Table 3.

Empirical false discovery rate and power (%) with α2 = 0.1, n1 = n2 = 100, and 100 replications.

p Models 1, 2 Models 1, 3 Models 2, 3 Models 1, 4 Models 2, 4 Models 3, 4
Empirical False Discovery Rate

50 10.5 11.0 12.6 12.2 11.5 10.2
100 9.5 10.0 12.1 11.8 11.4 9.5
200 9.7 10.4 11.2 11.7 11.6 10.3

Power

50 67.9 65.6 35.7 55.0 30.2 26.1
100 64.2 38.3 19.3 51.4 25.1 18.2
200 61.1 20.6 17.1 46.1 21.7 11.3

6. Real Data Analysis

High-throughput technology and the massively parallel measurement of mRNA expression have catalyzed a new era of genomic biomarkers. A number of prominent genomic markers have been identified to assist in predicting breast cancer patient survival in clinical practice, and, increasingly, pharmacogenomic endpoints are being incorporated into the design of clinical trials (Olopade et al., 2008). Molecular pathways of pathogenesis for breast cancer have also been increasingly discovered and curated (Nathanson et al., 2001). However, the role of gene-by-gene interactions, within and across pathways, in breast cancer survival remains unclear. Here, we apply our procedures to identify gene-by-gene interactions important for breast cancer survival.

For illustration, we consider 32 pathways from the molecular signature database that are related to breast cancer survival. Examples include the MAPK/ERK, WNT, TGF-β, PI3K-AKT-mTOR and ATRBRCA pathways. Existing literature has indicated that a defect in the MAPK pathway may lead to uncontrolled growth, which is a step necessary for the development of all cancers (Santen et al., 2002; Downward, 2003). Mutations or deregulated expression of genes in the WNT pathway can induce cancer (Klaus & Birchmeier, 2008). The TGF-β signaling pathway is critical to a plethora of cellular processes, including cell proliferation, apoptosis and differentiation (Shi & Massagué, 2003). An increase in TGF-β2 expression is associated with response to tamoxifen in breast cancer patients (Buck & Knabbe, 2006). The ATRBRCA pathway describes the role of BRCA1, BRCA2 and ATR in cancer susceptibility (Venkitaraman, 2002). BRCA1 and BRCA2 are the best-known genes linked to breast cancer risk. Hence, these pathways may play critical roles in breast cancer progression. To examine the interactions between genes in these pathways, we applied our procedure to a recent breast cancer gene expression study of 295 patients with primary breast carcinomas from the Netherlands Cancer Institute (van de Vijver et al., 2002). Across the 32 pathways, a total of p = 754 genes have available data in this study. The two populations we consider are the short-term survivors, defined as the 78 patients who died within 5 years, and the long-term survivors, defined as the 69 patients who survived more than 10 years. We are particularly interested in using the proposed procedures to identify gene pairs with interactive effects on the binary cancer survival trait.
In this setting, the sparsity assumption about βi,k’s is reasonable as it is generally believed that transcriptional regulation of a single gene is generally defined by a small set of regulatory elements (Segal et al., 2003; Dobra et al., 2004).

Based on our proposed procedures, we identified nine gene-by-gene interactions (gene pairs) as significant at a false discovery rate level of 0.1. An interaction here does not simply indicate co-expression between a pair of genes; instead, it represents a difference between the co-expression patterns among the long-term survivors and among the short-term survivors. As shown in Figure 1, most of the genes involved in these interactions belong to five major pathways, the MAPK, WNT, TGF-β, Apoptosis and ATRBRCA pathways, although many of these genes belong to multiple pathways. One of the identified interactions lies within a pathway, and the remaining eight represent cross-talk between pathways, some of which has been documented previously. Five interactions are between the MAPK signaling pathway and the WNT, TGF-β, Apoptosis, ATRBRCA and MTA3 pathways. These cross-talks are not surprising, since MAPK modulates a wide range of processes including gene expression, mitosis, proliferation, metabolism and apoptosis (Wada & Penninger, 2004). Several recent studies suggest extensive cross-talk between the WNT and MAPK signaling pathways in cancer. For example, hyperactivation of MAPK signaling results in down-regulation of the WNT signal transduction pathway in melanoma, suggesting negative cross-talk between the two pathways, while in colorectal cancer, stimulating the WNT pathway leads to activation of the MAPK pathway through Ras stabilization, representing positive cross-talk (Guardavaccaro & Clevers, 2012). The observed interactive effect between the WNT and MAPK pathways suggests that cross-talk between these two pathways may play an important role in breast cancer survival. The interaction between the tumor suppressor gene BRCA2 and the MAPK pathway has been documented in experiments with prostate cancer cells, in which upregulation of BRCA2 was linked to increased MAPK activity (Moro et al., 2007).
In the WNT pathway, the WNT1 gene promotes cell survival in various cell types, and blocking WNT1 signaling has been shown experimentally to induce apoptotic cell death (You et al., 2004). Thus the interaction between the WNT1 gene and the PRKACB gene in the Apoptosis pathway may also be important for breast cancer survival.

Fig. 1.


Identified gene-by-gene interactions for the breast cancer example. Dashed lines between gene pairs represent detected interactions. Genes inside each circle belong to the same pathway, whose name is also shown.

A. Appendix: Proofs

A·1. Technical Lemmas

We prove the main results in this appendix. We begin by collecting technical lemmas; the proofs of Lemmas A2, A3 and A4 are given in the supplementary material. The first lemma is the classical Bonferroni inequality.

Lemma A1 (Bonferroni inequality)

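Let $B=\bigcup_{t=1}^{p}B_t$. For any integer $k<\lfloor p/2\rfloor$, we have

$$\sum_{t=1}^{2k}(-1)^{t-1}F_t\le\mathrm{pr}(B)\le\sum_{t=1}^{2k-1}(-1)^{t-1}F_t,$$

where $F_t=\sum_{1\le i_1<\cdots<i_t\le p}\mathrm{pr}(B_{i_1}\cap\cdots\cap B_{i_t})$.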
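Lemma A1 can be sanity-checked numerically. The sketch below is only an illustration, not part of the proof: it assumes the events $B_t$ are independent (so that each intersection probability is a product), uses hypothetical probabilities, and verifies that the truncated alternating sums with $2k$ and $2k-1$ terms bracket $\mathrm{pr}(B)$.

```python
from itertools import combinations
from math import prod


def bonferroni_terms(probs):
    """F_1, ..., F_p of Lemma A1, assuming independent events so that
    pr(B_{i1} ∩ ... ∩ B_{it}) is the product of the individual probabilities."""
    return [sum(prod(c) for c in combinations(probs, t))
            for t in range(1, len(probs) + 1)]


def union_probability(probs):
    """Exact pr(B_1 ∪ ... ∪ B_p) for independent events."""
    return 1.0 - prod(1.0 - q for q in probs)


def bonferroni_bounds(probs, k):
    """(lower, upper): alternating sums with 2k and 2k - 1 terms, as in Lemma A1."""
    F = bonferroni_terms(probs)
    lower = sum((-1) ** (t - 1) * F[t - 1] for t in range(1, 2 * k + 1))
    upper = sum((-1) ** (t - 1) * F[t - 1] for t in range(1, 2 * k))
    return lower, upper


probs = [0.05, 0.10, 0.02, 0.07, 0.03, 0.08, 0.04]   # hypothetical pr(B_t), p = 7
exact = union_probability(probs)
for k in range(1, len(probs) // 2):                  # k < floor(p / 2)
    lower, upper = bonferroni_bounds(probs, k)
    print(f"k={k}: {lower:.6f} <= {exact:.6f} <= {upper:.6f}")
```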

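For $d=1,2$, let $U_{i,j,d}=n_d^{-1}\sum_{k=1}^{n_d}(\varepsilon_{k,i,d}\varepsilon_{k,j,d}-E\varepsilon_{k,i,d}\varepsilon_{k,j,d})$, and define $\tilde U_{i,j,d}=(r_{i,j,d}-U_{i,j,d})/(r_{i,i,d}r_{j,j,d})$ for $1\le i<j\le p$ and $\tilde U_{i,i,d}=(r_{i,i,d}+U_{i,i,d})/r_{i,i,d}^{2}$.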

Lemma A2

Suppose that Conditions (C1), (4) and (5) hold. Then

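$$\max_{1\le i\le p}|\tilde r_{i,i,d}-r_{i,i,d}|=O_p\{(\log p/n_d)^{1/2}\},$$

and

$$\tilde r_{i,j,d}=R_{i,j,d}-r_{i,i,d}(\hat\beta_{i,j,d}-\beta_{i,j,d})-r_{j,j,d}(\hat\beta_{j-1,i,d}-\beta_{j-1,i,d})+o_p\{(n_d\log p)^{-1/2}\},$$

for $1\le i<j\le p$, where $R_{i,j,d}$ is the empirical covariance between $\{\varepsilon_{k,i,d}:k=1,\ldots,n_d\}$ and $\{\varepsilon_{k,j,d}:k=1,\ldots,n_d\}$. Consequently, uniformly in $1\le i<j\le p$,

$$\hat r_{i,j,d}-(\omega_{i,i,d}\hat\sigma_{i,i,d,\varepsilon}+\omega_{j,j,d}\hat\sigma_{j,j,d,\varepsilon}-1)r_{i,j,d}=-U_{i,j,d}+o_p\{(n_d\log p)^{-1/2}\},$$
$$|T_{i,j,d}-\tilde U_{i,j,d}|=O_p\{(\log p/n_d)^{1/2}\}\,r_{i,j,d}+o_p\{(n_d\log p)^{-1/2}\},$$

and uniformly in $1\le i\le p$,

$$|T_{i,i,d}-\tilde U_{i,i,d}|=o_p\{(n_d\log p)^{-1/2}\},$$

where $\hat r_{i,j,d}$ is defined in (7), $(\hat\sigma_{i,j,d,\varepsilon})=n_d^{-1}\sum_{k=1}^{n_d}(\varepsilon_{k,d}-\bar\varepsilon_d)(\varepsilon_{k,d}-\bar\varepsilon_d)^T$, $\varepsilon_{k,d}=(\varepsilon_{k,1,d},\ldots,\varepsilon_{k,p,d})^T$ and $\bar\varepsilon_d=n_d^{-1}\sum_{k=1}^{n_d}\varepsilon_{k,d}$.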

Lemma A3

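Let $X_k\sim N(\mu_1,\Sigma_1)$ for $k=1,\ldots,n_1$ and $Y_k\sim N(\mu_2,\Sigma_2)$ for $k=1,\ldots,n_2$. Define

$$\tilde\Sigma_1=(\tilde\sigma_{i,j,1})_{p\times p}=\frac{1}{n_1}\sum_{k=1}^{n_1}(X_k-\mu_1)(X_k-\mu_1)^T,\qquad\tilde\Sigma_2=(\tilde\sigma_{i,j,2})_{p\times p}=\frac{1}{n_2}\sum_{k=1}^{n_2}(Y_k-\mu_2)(Y_k-\mu_2)^T.$$

Then, for some constant $C>0$, $\tilde\sigma_{i,j,1}-\tilde\sigma_{i,j,2}$ satisfies the large deviation bound

$$\mathrm{pr}\left[\max_{(i,j)\in S}\frac{(\tilde\sigma_{i,j,1}-\tilde\sigma_{i,j,2}-\sigma_{i,j,1}+\sigma_{i,j,2})^2}{\mathrm{var}\{(X_{k,i}-\mu_{1,i})(X_{k,j}-\mu_{1,j})\}/n_1+\mathrm{var}\{(Y_{k,i}-\mu_{2,i})(Y_{k,j}-\mu_{2,j})\}/n_2}\ge x^2\right]\le C|S|\{1-\Phi(x)\}+O(p^{-1})$$

uniformly for $0\le x\le(8\log p)^{1/2}$ and any subset $S\subseteq\{(i,j):1\le i\le j\le p\}$.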
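To see what the normalization in Lemma A3 does, the following Monte Carlo sketch (an illustration under settings we chose, not the proof) draws two independent samples with $\mu_1=\mu_2=0$ and $\Sigma_1=\Sigma_2=I$. For a pair $i\ne j$ this gives $\sigma_{i,j,1}=\sigma_{i,j,2}=0$ and both variances in the denominator equal to one, so the standardized difference should behave roughly like a standard normal variable and its square should exceed $x^2$ with probability close to $2\{1-\Phi(x)\}$. The sample sizes, replicate count and seed are arbitrary.

```python
import math
import random


def standardized_difference(x_i, x_j, y_i, y_j, var_1=1.0, var_2=1.0, true_diff=0.0):
    """(sigma~_{ij,1} - sigma~_{ij,2} - true_diff) / {var_1/n1 + var_2/n2}^{1/2},
    with known zero means, for a single pair (i, j) as in Lemma A3."""
    n1, n2 = len(x_i), len(y_i)
    s1 = sum(a * b for a, b in zip(x_i, x_j)) / n1   # sigma~_{i,j,1}
    s2 = sum(a * b for a, b in zip(y_i, y_j)) / n2   # sigma~_{i,j,2}
    return (s1 - s2 - true_diff) / math.sqrt(var_1 / n1 + var_2 / n2)


def simulate(reps=2000, n1=60, n2=40, seed=1):
    """Replicates of the statistic when every coordinate is an independent N(0, 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        x_i = [rng.gauss(0, 1) for _ in range(n1)]
        x_j = [rng.gauss(0, 1) for _ in range(n1)]
        y_i = [rng.gauss(0, 1) for _ in range(n2)]
        y_j = [rng.gauss(0, 1) for _ in range(n2)]
        out.append(standardized_difference(x_i, x_j, y_i, y_j))
    return out


z = simulate()
x = 1.96
empirical = sum(v * v >= x * x for v in z) / len(z)
nominal = math.erfc(x / math.sqrt(2))                 # 2{1 - Phi(x)}
print(f"empirical {empirical:.3f} vs 2(1 - Phi(x)) = {nominal:.3f}")
```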

The following lemma is needed for false discovery rate control in Theorem 4.

Lemma A4

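Let $V_{i,j}=(U_{i,j,2}-U_{i,j,1})\{\mathrm{var}(\varepsilon_{k,i,1}\varepsilon_{k,j,1})/n_1+\mathrm{var}(\varepsilon_{k,i,2}\varepsilon_{k,j,2})/n_2\}^{-1/2}$. Under the same conditions as in Theorem 4, we have, for any $\varepsilon>0$,

$$\sup_{0\le t\le t_p}\mathrm{pr}\left[\left|\frac{\sum_{(i,j)\in\mathcal H_0\setminus A_\tau}\{I(|V_{i,j}|\ge t)-\mathrm{pr}(|V_{i,j}|\ge t)\}}{2q_0\{1-\Phi(t)\}}\right|\ge\varepsilon\right]=o(1),$$
$$\int_0^{t_p}\mathrm{pr}\left[\left|\frac{\sum_{(i,j)\in\mathcal H_0\setminus A_\tau}\{I(|V_{i,j}|\ge t)-\mathrm{pr}(|V_{i,j}|\ge t)\}}{2q_0\{1-\Phi(t)\}}\right|\ge\varepsilon\right]dt=o(v_p),$$

where $t_p=(4\log p-\log_2p-\log_3p)^{1/2}$ and $v_p=1/\{\log p\,(\log_4p)^2\}^{1/2}$.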

A·2. Proof of Theorem 1

Without loss of generality, throughout this section we assume that $\omega_{i,i,d}=1$ for $d=1,2$ and $i=1,\ldots,p$. Let $A=\{(i,j):1\le i\le j\le p\}$. Condition (C1) implies $|A_\tau|=o(p^{1/16})$. To prove Theorem 1, we first show that the terms in $A_\tau$ are negligible. We then use Lemma A1, together with a Gaussian approximation technique, to show that

$$\mathrm{pr}\Big(\max_{(i,j)\in A\setminus A_\tau}W_{i,j}^2-4\log p+\log\log p\le t\Big)\to\exp\{-(8\pi)^{-1/2}\exp(-t/2)\},$$

where $W_{i,j}$ is defined in equation (9).

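Let $V_{i,j}=(U_{i,j,2}-U_{i,j,1})/\{\mathrm{var}(\varepsilon_{k,i,1}\varepsilon_{k,j,1})/n_1+\mathrm{var}(\varepsilon_{k,i,2}\varepsilon_{k,j,2})/n_2\}^{1/2}$, where, for $d=1,2$, $\mathrm{var}(\varepsilon_{k,i,d}\varepsilon_{k,j,d})=r_{i,i,d}r_{j,j,d}(1+\rho_{i,j,d}^2)$ with $\rho_{i,j,d}^2=\beta_{i,j,d}^2r_{i,i,d}/r_{j,j,d}$. The proof of Lemma A2 yields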

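$$\max_{1\le i\le p}|\hat r_{i,i,d}-r_{i,i,d}|=O_p\{(\log p/n)^{1/2}\},\qquad(A1)$$

and $\max_{1\le i\le p}|\hat r_{i,i,d}-R_{i,i,d}|=o_p\{(n_d\log p)^{-1/2}\}$, where $n=\max\{n_1,n_2\}$. Note that

$$\max_{1\le i\le j\le p}|\hat\beta_{i,j,d}^2\hat r_{i,i,d}/\hat r_{j,j,d}-\rho_{i,j,d}^2|=o_p(1/\log p),\qquad(A2)$$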

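and $\max_{1\le i\le j\le p}|\omega_{i,i,d}\hat\sigma_{i,i,d,\varepsilon}+\omega_{j,j,d}\hat\sigma_{j,j,d,\varepsilon}-2|=O_p\{(\log p/n)^{1/2}\}$. Also note that $|\omega_{i,j,d}|=o\{(\log p)^{-1}\}$ for $(i,j)\in A\setminus A_\tau$. Then, by Lemma A2, it is easy to see that, under Conditions (C1), (4) and (5), $\max_{(i,j)\in A\setminus A_\tau}\big||W_{i,j}|-|V_{i,j}|\big|=o_p\{(\log p)^{-1/2}\}$. For $(i,j)\in A_\tau$, Lemma A2 gives $W_{i,j}=V_{i,j}+b_{i,j}+o_p\{(\log p)^{-1/2}\}$, where $b_{i,j}=2\{\omega_{i,j}(\hat\sigma_{i,i,1,\varepsilon}-\hat\sigma_{i,i,2,\varepsilon})+\omega_{i,j}(\hat\sigma_{j,j,1,\varepsilon}-\hat\sigma_{j,j,2,\varepsilon})\}/(\hat\theta_{i,j,1}+\hat\theta_{i,j,2})^{1/2}$, $(\hat\sigma_{i,j,d,\varepsilon})=n_d^{-1}\sum_{k=1}^{n_d}(\varepsilon_{k,d}-\bar\varepsilon_d)(\varepsilon_{k,d}-\bar\varepsilon_d)^T$, $\varepsilon_{k,d}=(\varepsilon_{k,1,d},\ldots,\varepsilon_{k,p,d})^T$ and $\bar\varepsilon_d=n_d^{-1}\sum_{k=1}^{n_d}\varepsilon_{k,d}$. Note that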

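$$|b_{i,j}|\le2\left(\frac{2\rho_{i,j}^2}{1+\rho_{i,j}^2}\right)^{1/2}\left[\frac{|\sigma_{i,i,1,\varepsilon}-\sigma_{i,i,2,\varepsilon}|}{\{\mathrm{var}(\varepsilon_{k,i,1}^2)/n_1+\mathrm{var}(\varepsilon_{k,i,2}^2)/n_2\}^{1/2}}+\frac{|\sigma_{j,j,1,\varepsilon}-\sigma_{j,j,2,\varepsilon}|}{\{\mathrm{var}(\varepsilon_{k,j,1}^2)/n_1+\mathrm{var}(\varepsilon_{k,j,2}^2)/n_2\}^{1/2}}\right]+o\{(\log p)^{-1/2}\},$$

where $\sigma_{i,i,d,\varepsilon}=n_d^{-1}\sum_{k=1}^{n_d}\varepsilon_{k,i,d}^2$. Thus, we have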

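$$\mathrm{pr}\Big(\max_{(i,j)\in A_\tau}W_{i,j}^2\ge4\log p-\log\log p+t\Big)\le\mathrm{Card}(A_\tau)\{\mathrm{pr}(V_{i,j}^2\ge\log p/8)+\mathrm{pr}(b_{i,j}^2\ge2\log p)\}=o(1),$$

where the last equality is a direct result of Lemma A3. Thus it suffices to prove that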

$$\mathrm{pr}\Big(\max_{(i,j)\in A\setminus A_\tau}V_{i,j}^2-4\log p+\log\log p\le t\Big)\to\exp\{-(8\pi)^{-1/2}\exp(-t/2)\}.$$
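The limiting distribution in the last display is easy to invert. As a sketch, assuming (as in the proof of Theorem 2 below) that $q_\alpha$ denotes the $1-\alpha$ quantile of $\exp\{-(8\pi)^{-1/2}\exp(-t/2)\}$, solving $\exp\{-(8\pi)^{-1/2}\exp(-t/2)\}=1-\alpha$ gives $q_\alpha=-\log(8\pi)-2\log\log(1-\alpha)^{-1}$. The helper `global_threshold` is a name we introduce for the resulting cut-off $q_\alpha+4\log p-\log\log p$.

```python
import math


def limit_cdf(t):
    """exp{-(8 pi)^{-1/2} exp(-t/2)}: the limiting distribution in Theorem 1."""
    return math.exp(-math.exp(-t / 2) / math.sqrt(8 * math.pi))


def q_alpha(alpha):
    """(1 - alpha) quantile of limit_cdf, from solving limit_cdf(t) = 1 - alpha."""
    return -math.log(8 * math.pi) - 2 * math.log(math.log(1 / (1 - alpha)))


def global_threshold(p, alpha):
    """Hypothetical helper: q_alpha + 4 log p - log log p."""
    return q_alpha(alpha) + 4 * math.log(p) - math.log(math.log(p))


print(q_alpha(0.05))                 # roughly 2.72
print(global_threshold(754, 0.05))   # p = 754 genes, as in the breast cancer example
```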

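We arrange the indices $\{(i,j):(i,j)\in A\setminus A_\tau\}$ in any order and denote them by $\{(i_m,j_m):m=1,\ldots,q\}$ with $q=\mathrm{Card}(A\setminus A_\tau)$. Let $n_1/n_2\le K$ with $K\ge1$, let $\theta_{m,d}=\mathrm{var}(\varepsilon_{k,i_m,d}\varepsilon_{k,j_m,d})$ for $d=1,2$, and define $Z_{k,m}=(n_1/n_2)\{\varepsilon_{k,i_m,2}\varepsilon_{k,j_m,2}-E(\varepsilon_{k,i_m,2}\varepsilon_{k,j_m,2})\}$ for $1\le k\le n_2$, $Z_{k,m}=-\{\varepsilon_{k-n_2,i_m,1}\varepsilon_{k-n_2,j_m,1}-E(\varepsilon_{k-n_2,i_m,1}\varepsilon_{k-n_2,j_m,1})\}$ for $n_2+1\le k\le n_1+n_2$, $V_m=(n_1^2\theta_{m,2}/n_2+n_1\theta_{m,1})^{-1/2}\sum_{k=1}^{n_1+n_2}Z_{k,m}$ and $\hat V_m=(n_1^2\theta_{m,2}/n_2+n_1\theta_{m,1})^{-1/2}\sum_{k=1}^{n_1+n_2}\hat Z_{k,m}$, where $\hat Z_{k,m}=Z_{k,m}I(|Z_{k,m}|\le\tau_n)-E\{Z_{k,m}I(|Z_{k,m}|\le\tau_n)\}$ and $\tau_n=32K_1\log(p+n)$. Note that $\max_{(i,j)\in A\setminus A_\tau}V_{i,j}^2=\max_{1\le m\le q}V_m^2$, and that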

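$$\begin{aligned}\max_{1\le m\le q}n^{-1/2}\sum_{k=1}^{n_1+n_2}E[|Z_{k,m}|I\{|Z_{k,m}|\ge32K_1\log(p+n)\}]&\le Cn^{1/2}\max_{1\le k\le n_1+n_2}\max_{1\le m\le q}E[|Z_{k,m}|I\{|Z_{k,m}|\ge32K_1\log(p+n)\}]\\&\le Cn^{1/2}(p+n)^{-4}\max_{1\le k\le n_1+n_2}\max_{1\le m\le q}E[|Z_{k,m}|\exp\{|Z_{k,m}|/(8K_1)\}]\\&\le Cn^{1/2}(p+n)^{-4}.\end{aligned}$$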
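The pooled array $Z_{k,m}$ is a device for writing $V_m$ as one sum of $n_1+n_2$ independent terms. The sketch below checks, for a single pair $(i_m,j_m)$ and ignoring the truncation at $\tau_n$, that $(n_1^2\theta_{m,2}/n_2+n_1\theta_{m,1})^{-1/2}\sum_kZ_{k,m}$ equals the direct form $(U_{i,j,2}-U_{i,j,1})/(\theta_{m,1}/n_1+\theta_{m,2}/n_2)^{1/2}$, and that dividing each $Z_{k,m}$ by $(n_1\theta_{m,2}/n_2+\theta_{m,1})^{1/2}$ turns it into $n_1^{-1/2}\sum_k\tilde Z_{k,m}$. It assumes the reading of $Z_{k,m}$ in which first-sample terms enter with a minus sign and are indexed by $k-n_2$; the residual products are simulated, and the function names are ours.

```python
import math
import random


def v_direct(prod1, prod2, mean1, mean2, theta1, theta2):
    """(U_2 - U_1) / {theta1/n1 + theta2/n2}^{1/2}; prod_d holds eps_{k,i,d} * eps_{k,j,d}."""
    n1, n2 = len(prod1), len(prod2)
    u1 = sum(x - mean1 for x in prod1) / n1
    u2 = sum(x - mean2 for x in prod2) / n2
    return (u2 - u1) / math.sqrt(theta1 / n1 + theta2 / n2)


def pooled_z(prod1, prod2, mean1, mean2):
    """Z_{k,m}, k = 1..n1+n2: rescaled second sample first, then the negated first sample."""
    n1, n2 = len(prod1), len(prod2)
    z = [(n1 / n2) * (x - mean2) for x in prod2]   # 1 <= k <= n2
    z += [-(x - mean1) for x in prod1]             # n2 + 1 <= k <= n1 + n2
    return z


def v_from_pooled(prod1, prod2, mean1, mean2, theta1, theta2):
    n1, n2 = len(prod1), len(prod2)
    z = pooled_z(prod1, prod2, mean1, mean2)
    return sum(z) / math.sqrt(n1 ** 2 * theta2 / n2 + n1 * theta1)


def v_from_rescaled(prod1, prod2, mean1, mean2, theta1, theta2):
    """n1^{-1/2} times the sum of Z~_{k,m} = Z_{k,m} / (n1 theta2 / n2 + theta1)^{1/2}."""
    n1, n2 = len(prod1), len(prod2)
    scale = math.sqrt(n1 * theta2 / n2 + theta1)
    return sum(v / scale for v in pooled_z(prod1, prod2, mean1, mean2)) / math.sqrt(n1)


rng = random.Random(3)
n1, n2 = 30, 20                                    # n1 / n2 = 1.5 <= K
prod1 = [rng.gauss(0, 1) * rng.gauss(0, 1) for _ in range(n1)]
prod2 = [rng.gauss(0, 1) * rng.gauss(0, 1) for _ in range(n2)]
args = (prod1, prod2, 0.0, 0.0, 1.0, 1.0)          # means 0 and theta_{m,1} = theta_{m,2} = 1
print(v_direct(*args), v_from_pooled(*args), v_from_rescaled(*args))
```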

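Hence, $\mathrm{pr}\{\max_{1\le m\le q}|V_m-\hat V_m|\ge(\log p)^{-1}\}\le\mathrm{pr}(\max_{1\le m\le q}\max_{1\le k\le n_1+n_2}|Z_{k,m}|\ge\tau_n)=O(p^{-1})$. By the fact that $|\max_{1\le m\le q}V_m^2-\max_{1\le m\le q}\hat V_m^2|\le2\max_{1\le m\le q}|\hat V_m|\max_{1\le m\le q}|V_m-\hat V_m|+\max_{1\le m\le q}|V_m-\hat V_m|^2$, it suffices to prove that, for any $t\in\mathbb R$, as $n,p\to\infty$,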

$$\mathrm{pr}\Big(\max_{1\le m\le q}\hat V_m^2-4\log p+\log\log p\le t\Big)\to\exp\{-(8\pi)^{-1/2}\exp(-t/2)\}.\qquad(A3)$$
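The reduction to (A3) rests on the elementary bound $|\max_mV_m^2-\max_m\hat V_m^2|\le2\max_m|\hat V_m|\max_m|V_m-\hat V_m|+\max_m|V_m-\hat V_m|^2$, which follows from $|a^2-b^2|=|a-b|\,|a+b|\le|a-b|(2|b|+|a-b|)$. A quick randomized check of this deterministic inequality, with simulated vectors standing in for $V_m$ and $\hat V_m$:

```python
import random


def max_square_gap(v, v_hat):
    """|max_m V_m^2 - max_m V^_m^2|."""
    return abs(max(x * x for x in v) - max(x * x for x in v_hat))


def gap_bound(v, v_hat):
    """2 max|V^_m| max|V_m - V^_m| + max|V_m - V^_m|^2."""
    d = max(abs(a - b) for a, b in zip(v, v_hat))
    return 2 * max(abs(b) for b in v_hat) * d + d * d


rng = random.Random(11)
worst = 0.0
for _ in range(1000):
    v_hat = [rng.gauss(0, 1) for _ in range(50)]
    v = [b + rng.gauss(0, 0.1) for b in v_hat]     # small perturbation, mimicking truncation
    worst = max(worst, max_square_gap(v, v_hat) - gap_bound(v, v_hat))
print("largest violation (should be <= 0):", worst)
```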

By Lemma A1, for any integer $l$ with $0<l<q/2$,

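$$\sum_{d=1}^{2l}(-1)^{d-1}\sum_{1\le m_1<\cdots<m_d\le q}\mathrm{pr}\Big(\bigcap_{j=1}^{d}F_{m_j}\Big)\le\mathrm{pr}\Big(\max_{1\le m\le q}\hat V_m^2\ge y_p\Big)\le\sum_{d=1}^{2l-1}(-1)^{d-1}\sum_{1\le m_1<\cdots<m_d\le q}\mathrm{pr}\Big(\bigcap_{j=1}^{d}F_{m_j}\Big),\qquad(A4)$$

where $y_p=4\log p-\log\log p+t$ and $F_{m_j}=(\hat V_{m_j}^2\ge y_p)$. Let $\tilde Z_{k,m}=\hat Z_{k,m}/(n_1\theta_{m,2}/n_2+\theta_{m,1})^{1/2}$ for $m=1,\ldots,q$ and $W_k=(\tilde Z_{k,m_1},\ldots,\tilde Z_{k,m_d})$ for $1\le k\le n_1+n_2$. Define $|a|_{\min}=\min_{1\le i\le d}|a_i|$ for any vector $a\in\mathbb R^d$. Then we have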

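$$\mathrm{pr}\Big(\bigcap_{j=1}^{d}F_{m_j}\Big)=\mathrm{pr}\Big(\Big|n_1^{-1/2}\sum_{k=1}^{n_1+n_2}W_k\Big|_{\min}\ge y_p^{1/2}\Big).$$

Then it follows from Theorem 1 in Zaïtsev (1987) that

$$\mathrm{pr}\Big(\Big|n_1^{-1/2}\sum_{k=1}^{n_1+n_2}W_k\Big|_{\min}\ge y_p^{1/2}\Big)\le\mathrm{pr}\{|N_d|_{\min}\ge y_p^{1/2}-\varepsilon_n(\log p)^{-1/2}\}+c_1d^{5/2}\exp\Big\{-\frac{n^{1/2}\varepsilon_n}{c_2d^3\tau_n(\log p)^{1/2}}\Big\},\qquad(A5)$$

where $c_1>0$ and $c_2>0$ are constants, $\varepsilon_n\to0$ is to be specified later, and $N_d=(N_{m_1},\ldots,N_{m_d})$ is a normal random vector with $E(N_d)=0$ and $\mathrm{cov}(N_d)=(n_2/n_1)\mathrm{cov}(W_1)+\mathrm{cov}(W_{n_2+1})$. Recall that $d$ is a fixed integer that does not depend on $n$ or $p$. Because $\log p=o(n^{1/5})$, we can let $\varepsilon_n\to0$ sufficiently slowly that, for any large $M>0$,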

$$c_1d^{5/2}\exp\Big\{-\frac{n^{1/2}\varepsilon_n}{c_2d^3\tau_n(\log p)^{1/2}}\Big\}=O(p^{-M}).\qquad(A6)$$

Combining (A4), (A5) and (A6) we have

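$$\mathrm{pr}\Big(\max_{1\le m\le q}\hat V_m^2\ge y_p\Big)\le\sum_{d=1}^{2l-1}(-1)^{d-1}\sum_{1\le m_1<\cdots<m_d\le q}\mathrm{pr}\{|N_d|_{\min}\ge y_p^{1/2}-\varepsilon_n(\log p)^{-1/2}\}+o(1).\qquad(A7)$$

Similarly, using Theorem 1 in Zaïtsev (1987) again, we obtain

$$\mathrm{pr}\Big(\max_{1\le m\le q}\hat V_m^2\ge y_p\Big)\ge\sum_{d=1}^{2l}(-1)^{d-1}\sum_{1\le m_1<\cdots<m_d\le q}\mathrm{pr}\{|N_d|_{\min}\ge y_p^{1/2}+\varepsilon_n(\log p)^{-1/2}\}-o(1).\qquad(A8)$$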

We recall the following lemma, which is shown in the supplementary material of Cai et al. (2013).

Lemma A5

For any fixed integer d ≥ 1 and real number t ∈ ℝ,

$$\sum_{1\le m_1<\cdots<m_d\le q}\mathrm{pr}\{|N_d|_{\min}\ge y_p^{1/2}\pm\varepsilon_n(\log p)^{-1/2}\}=\frac{1}{d!}\{(8\pi)^{-1/2}\exp(-t/2)\}^d\{1+o(1)\}.\qquad(A9)$$
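For $d=1$, (A9) reduces to a one-dimensional tail calculation that can be checked directly. The sketch below takes $N_1\sim N(0,1)$, drops the $\varepsilon_n(\log p)^{-1/2}$ perturbation, and uses $q=p(p+1)/2$ (the size of $A$, ignoring the few indices in $A_\tau$). It reports the ratio of $q\,\mathrm{pr}(|N_1|\ge y_p^{1/2})$ to $(8\pi)^{-1/2}\exp(-t/2)$; the ratio approaches one only slowly, because of the $\log\log p$ term in $y_p$.

```python
import math


def a9_ratio(p, t):
    """q * pr(|N| >= y_p^{1/2}) / {(8 pi)^{-1/2} exp(-t/2)}, q = p(p + 1)/2, N ~ N(0, 1)."""
    y_p = 4 * math.log(p) - math.log(math.log(p)) + t
    q = p * (p + 1) / 2
    tail = math.erfc(math.sqrt(y_p / 2))            # pr(|N| >= sqrt(y_p))
    limit = math.exp(-t / 2) / math.sqrt(8 * math.pi)
    return q * tail / limit


for p in (10 ** 3, 10 ** 4, 10 ** 6):
    print(p, [round(a9_ratio(p, t), 4) for t in (-1.0, 0.0, 2.0)])
```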

It then follows from Lemma A5, (A7) and (A8) that

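$$\limsup_{n,p\to\infty}\mathrm{pr}\Big(\max_{1\le m\le q}\hat V_m^2\ge y_p\Big)\le\sum_{d=1}^{2l-1}(-1)^{d-1}\frac{1}{d!}\{(8\pi)^{-1/2}\exp(-t/2)\}^d,$$
$$\liminf_{n,p\to\infty}\mathrm{pr}\Big(\max_{1\le m\le q}\hat V_m^2\ge y_p\Big)\ge\sum_{d=1}^{2l}(-1)^{d-1}\frac{1}{d!}\{(8\pi)^{-1/2}\exp(-t/2)\}^d$$

for any positive integer $l$. Since $\sum_{d=1}^{\infty}(-1)^{d-1}x^d/d!=1-\exp(-x)$, letting $l\to\infty$ yields (A3), and Theorem 1 is proved.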
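The squeeze in this final step can be seen numerically. With $x=(8\pi)^{-1/2}\exp(-t/2)$, the bounds with $2l$ and $2l-1$ terms bracket $1-e^{-x}$ for every $l$ and close in on it quickly, which is why $\mathrm{pr}(\max_m\hat V_m^2<y_p)$ tends to $\exp\{-(8\pi)^{-1/2}\exp(-t/2)\}$. The values of $t$ below are arbitrary.

```python
import math


def partial_sum(x, terms):
    """sum_{d=1}^{terms} (-1)^{d-1} x^d / d!"""
    return sum((-1) ** (d - 1) * x ** d / math.factorial(d) for d in range(1, terms + 1))


def bracket(t, l):
    """(lower, limit, upper): Bonferroni-type bounds with 2l and 2l - 1 terms around 1 - exp(-x)."""
    x = math.exp(-t / 2) / math.sqrt(8 * math.pi)
    limit = -math.expm1(-x)                         # 1 - exp(-x), computed stably
    return partial_sum(x, 2 * l), limit, partial_sum(x, 2 * l - 1)


for l in range(1, 4):
    print(l, bracket(0.0, l))
```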

A·3. Proof of Theorem 2

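Let $M_n^1=\max_{1\le i\le j\le p}\{T_{i,j,1}-T_{i,j,2}-(\omega_{i,j,1}-\omega_{i,j,2})\}^2/(\hat\theta_{i,j,1}+\hat\theta_{i,j,2})$. It follows from the proof of Theorem 1 that $\mathrm{pr}(M_n^1\le4\log p-2^{-1}\log\log p)\to1$ as $n,p\to\infty$. By (A1), (A2) and the inequalities $\max_{1\le i\le j\le p}(\omega_{i,j,1}-\omega_{i,j,2})^2/(\hat\theta_{i,j,1}+\hat\theta_{i,j,2})\le2M_n^1+2M_n$ and $\max_{1\le i\le j\le p}|\omega_{i,j,1}-\omega_{i,j,2}|/(\hat\theta_{i,j,1}+\hat\theta_{i,j,2})^{1/2}\ge4(\log p)^{1/2}$, we obtain $M_n\ge8\log p-M_n^1\ge4\log p+2^{-1}\log\log p$ with probability tending to one, which exceeds $q_\alpha+4\log p-\log\log p$ for large $p$. Hence $\mathrm{pr}(M_n\ge q_\alpha+4\log p-\log\log p)\to1$ as $n,p\to\infty$.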
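The final implication is asymptotic: the lower bound $4\log p+2^{-1}\log\log p$ on $M_n$ beats the threshold only once $\tfrac32\log\log p>q_\alpha$. The sketch below evaluates that margin; it takes $q_\alpha$ to be the $1-\alpha$ quantile $-\log(8\pi)-2\log\log(1-\alpha)^{-1}$ of the Theorem 1 limit, which is our assumption about notation defined in the main text. For small $p$ the margin can be negative.

```python
import math


def q_alpha(alpha):
    """(1 - alpha) quantile of exp{-(8 pi)^{-1/2} exp(-t/2)}."""
    return -math.log(8 * math.pi) - 2 * math.log(math.log(1 / (1 - alpha)))


def power_margin(p, alpha):
    """Lower bound on M_n minus the threshold q_alpha + 4 log p - log log p."""
    log_p, loglog_p = math.log(p), math.log(math.log(p))
    # (Δω)^2/θ >= 16 log p and (Δω)^2/θ <= 2 M_n^1 + 2 M_n, with M_n^1 <= 4 log p - (1/2) log log p
    m_n_lower = 16 * log_p / 2 - (4 * log_p - 0.5 * loglog_p)
    threshold = q_alpha(alpha) + 4 * log_p - loglog_p
    return m_n_lower - threshold                   # equals 1.5 log log p - q_alpha


for p in (100, 1000, 10 ** 6):
    print(p, round(power_margin(p, 0.05), 3))
```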

A·4. Proof of Theorem 3

To prove the lower bound, we first construct a least favorable configuration for testing between $\Omega_1$ and $\Omega_2$, and then apply the arguments of Baraud (2002).

Let $\mathcal M$ denote the set of all subsets of $\{1,\ldots,p\}$ with cardinality $k_p=p^r$, for $r<1/2$. Let $\hat m$ be a random subset of $\{1,\ldots,p\}$ uniformly distributed on $\mathcal M$. We construct a class of $\Omega_1$, $\mathcal N=\{\Omega_{\hat m}:\hat m\in\mathcal M\}$, such that $\omega_{i,j}=0$ for $i\ne j$ and $1/\omega_{i,i}-1=\rho1_{i\in\hat m}$ for $i,j=1,\ldots,p$, where $\rho=c(\log p/n)^{1/2}$ and $c>0$ will be specified later. Let $\Omega_2=I$ and let $\Omega_1$ be uniformly distributed on $\mathcal N$. Let $\mu_\rho$ be the distribution of $\Omega_1-I$. Note that $\mu_\rho$ is a probability measure on $\{\Delta\in\mathcal S(p^r):\|\Delta\|_F^2=p^r\rho^2\}$, where $\mathcal S(p^r)$ is the class of matrices with $p^r$ nonzero entries. Let $d\mathrm{pr}_1(\{X_n,Y_n\})$ and $d\mathrm{pr}_2(\{X_n,Y_n\})$ be the likelihood functions with precision matrices $\Omega_1$ and $\Omega_2$, respectively. Then we have

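$$L_{\mu_\rho}=L_{\mu_\rho}(\{X_n,Y_n\})=E_{\mu_\rho}\Big\{\frac{d\mathrm{pr}_1(\{X_n,Y_n\})}{d\mathrm{pr}_2(\{X_n,Y_n\})}\Big\},$$

where $E_{\mu_\rho}(\cdot)$ denotes expectation with respect to $\Omega_1$. By the arguments in Baraud (2002), it suffices to show that $E(L_{\mu_\rho}^2)\le1+o(1)$. It is easy to check that

$$L_{\mu_\rho}=E_{\hat m}\Big[\prod_{i=1}^{n}|\Sigma_{\hat m}|^{-1/2}\exp\Big\{-\frac12Z_i^T(\Omega_{\hat m}-I)Z_i\Big\}\Big],$$

where $\Sigma_{\hat m}=\Omega_{\hat m}^{-1}$ and $Z_1,\ldots,Z_n$ are independent $N(0,I)$ vectors. Thus, we have

$$\begin{aligned}E(L_{\mu_\rho}^2)&=E\Big(\binom{p}{k_p}^{-1}\sum_{m\in\mathcal M}\Big[\prod_{i=1}^{n}|\Sigma_m|^{-1/2}\exp\{-Z_i^T(\Omega_m-I)Z_i/2\}\Big]\Big)^2\\&=\binom{p}{k_p}^{-2}\sum_{m,m'\in\mathcal M}E\Big[\prod_{i=1}^{n}|\Sigma_m|^{-1/2}|\Sigma_{m'}|^{-1/2}\exp\{-Z_i^T(\Omega_m+\Omega_{m'}-2I)Z_i/2\}\Big].\end{aligned}$$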

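Set $\Omega_m+\Omega_{m'}-2I=(a_{i,j})$. It is easy to show that $a_{i,j}=0$ for $i\ne j$, $a_{j,j}=0$ if $j\in(m\cup m')^c$, $a_{j,j}=2\{1/(1+\rho)-1\}$ if $j\in m\cap m'$, and $a_{j,j}=1/(1+\rho)-1$ if $j\in(m\setminus m')\cup(m'\setminus m)$. Let $t=|m\cap m'|$. Then

$$\begin{aligned}E(L_{\mu_\rho}^2)&=\binom{p}{k_p}^{-1}\sum_{t=0}^{k_p}\binom{k_p}{t}\binom{p-k_p}{k_p-t}\frac{1}{(1+\rho)^{k_pn}}(1+\rho)^{(k_p-t)n}\Big(\frac{1+\rho}{1-\rho}\Big)^{tn/2}\\&\le\frac{p^{k_p}(p-k_p)!}{p!}\sum_{t=0}^{k_p}\binom{k_p}{t}\Big(\frac{k_p}{p}\Big)^{t}\Big(\frac{1}{1-\rho^2}\Big)^{tn/2}\\&=\Big\{1+\frac{k_p}{p}(1-\rho^2)^{-n/2}\Big\}^{k_p}\{1+o(1)\},\end{aligned}$$

for r < 1/2. Thus, by letting c be sufficiently small, we have

$$E(L_{\mu_\rho}^2)\le\exp\{k_p\log(1+k_pp^{c^2-1})\}\{1+o(1)\}\le\exp(k_p^2p^{c^2-1})\{1+o(1)\}=1+o(1),$$

where the last step holds because $k_p^2p^{c^2-1}=p^{2r+c^2-1}\to0$ whenever $c^2<1-2r$.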
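The second-moment computation can be checked numerically for moderate sizes. The sketch below (i) verifies that, per observation, the determinant factors and the Gaussian expectations $E\exp(-aZ^2/2)=(1+a)^{-1/2}$ over the three kinds of diagonal entry $a_{j,j}$ multiply to $(1-\rho^2)^{-t/2}$, and (ii) compares the exact hypergeometric sum for $E(L_{\mu_\rho}^2)$ with the upper bound $p^{k_p}(p-k_p)!/p!\,\{1+(k_p/p)(1-\rho^2)^{-n/2}\}^{k_p}$. The values $p=1000$, $r=0.3$, $n=200$, $c=0.5$ and the rounding $k_p=\mathrm{round}(p^r)$ are illustrative choices of ours.

```python
import math


def per_observation_factor(rho, k, t):
    """|Sigma_m|^{-1/2} |Sigma_m'|^{-1/2} E exp{-Z^T (Omega_m + Omega_m' - 2I) Z / 2}
    for one Z ~ N(0, I), with |m| = |m'| = k and |m ∩ m'| = t."""
    det_part = (1 + rho) ** (-k)          # each |Sigma_m|^{-1/2} equals (1 + rho)^{-k/2}
    a_sym = 1 / (1 + rho) - 1             # a_jj on the 2(k - t) coordinates of the symmetric difference
    a_int = 2 * (1 / (1 + rho) - 1)       # a_jj on the t coordinates of m ∩ m'
    # For scalar Z_j ~ N(0, 1) and 1 + a > 0: E exp(-a Z_j^2 / 2) = (1 + a)^{-1/2}
    return det_part * (1 + a_sym) ** (-(k - t)) * (1 + a_int) ** (-t / 2)


def exact_second_moment(p, k, n, rho):
    """sum_t C(k,t) C(p-k,k-t) / C(p,k) * (1 - rho^2)^{-t n / 2}."""
    total = math.comb(p, k)
    return sum(math.comb(k, t) * math.comb(p - k, k - t) / total
               * (1 - rho ** 2) ** (-t * n / 2) for t in range(k + 1))


def second_moment_bound(p, k, n, rho):
    """p^k (p - k)!/p! * {1 + (k/p)(1 - rho^2)^{-n/2}}^k."""
    falling_ratio = math.prod(p / (p - i) for i in range(k))   # p^k (p - k)!/p!
    return falling_ratio * (1 + (k / p) * (1 - rho ** 2) ** (-n / 2)) ** k


p, r, n, c = 1000, 0.3, 200, 0.5
k = round(p ** r)
rho = c * math.sqrt(math.log(p) / n)
print(k, exact_second_moment(p, k, n, rho), second_moment_bound(p, k, n, rho))
```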

A·5. Proof of Theorem 4

We first show that $\hat t$, as defined in Section 4, lies in the range $(0,2(\log p)^{1/2})$. Then we show that $R_0(t)$, defined in Section 4, is close to $2\{1-\Phi(t)\}|\mathcal H_0|$, by first showing that the terms in $A_\tau$ are negligible. We then focus on the set $\mathcal H_0\setminus A_\tau$ and prove the result using Lemma A4.
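Section 4 is not reproduced in this appendix, so the following is only a sketch under an assumption: we take $\hat t$ to be the smallest $t\in[0,2(\log p)^{1/2}]$ with $G(t)\{(p^2-p)/2\}/\max\{\sum_{i<j}I(|W_{i,j}|\ge t),1\}\le\alpha$, where $G(t)=2\{1-\Phi(t)\}$, and we fall back to $t=2(\log p)^{1/2}$ when no such $t$ exists. The hypothetical helper `fdr_threshold` scans only the observed $|W_{i,j}|$ values; because the rejection count changes only at those values, this should give the same rejection set as a search over all $t$. The toy input is made up.

```python
import math
from bisect import bisect_left


def G(t):
    """G(t) = 2{1 - Phi(t)}, computed as erfc(t / sqrt(2)) for numerical stability."""
    return math.erfc(t / math.sqrt(2))


def fdr_threshold(w, p, alpha):
    """Hypothetical helper. w holds W_{i,j} for 1 <= i < j <= p.
    Returns (t_hat, number of pairs with |W_{i,j}| >= t_hat)."""
    n_pairs = (p * p - p) / 2
    b_p = 2 * math.sqrt(math.log(p))
    abs_w = sorted(abs(x) for x in w)                     # ascending
    for t in abs_w:                                       # first feasible value is the smallest
        if t > b_p:
            break
        rejections = len(abs_w) - bisect_left(abs_w, t)   # #{|W_{i,j}| >= t}
        if G(t) * n_pairs / max(rejections, 1) <= alpha:
            return t, rejections
    return b_p, len(abs_w) - bisect_left(abs_w, b_p)      # assumed fallback t = 2 (log p)^{1/2}


# p = 20 genes gives 190 pairs: five strong signals among 185 null-looking values.
w = [0.1] * 185 + [3.3] * 5
print(fdr_threshold(w, 20, 0.1))
```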

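Under the conditions of Theorem 4, we have $\sum_{1\le i<j\le p}I\{|W_{i,j}|\ge2(\log p)^{1/2}\}\ge[1/\{(8\pi)^{1/2}\alpha\}+\delta](\log_2p)^{1/2}$ with probability tending to one. Hence, with probability tending to one,

$$\frac{(p^2-p)/2}{\max\big[\sum_{1\le i<j\le p}I\{|W_{i,j}|\ge2(\log p)^{1/2}\},1\big]}\le\frac{p^2-p}{2}\left\{\frac{1}{(8\pi)^{1/2}\alpha}+\delta\right\}^{-1}(\log_2p)^{-1/2}.$$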

Let $t_p=(4\log p-\log_2p-\log_3p)^{1/2}$. Because $1-\Phi(t_p)\sim\{(2\pi)^{1/2}t_p\}^{-1}\exp(-t_p^2/2)$, we have $\mathrm{pr}(1\le\hat t\le t_p)\to1$ by the definition of $\hat t$ in the false discovery rate control algorithm in Section 4. Note that, for $0\le\hat t\le t_p$, we have

$$\frac{2\{1-\Phi(\hat t)\}(p^2-p)/2}{\max\big[\sum_{1\le i<j\le p}I\{|W_{i,j}|\ge\hat t\},1\big]}=\alpha.$$
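The claim $\mathrm{pr}(\hat t\le t_p)\to1$ amounts to the criterion falling below $\alpha$ at $t=t_p$. Reading $\log_k$ as the $k$-fold iterated natural logarithm (our assumption, which needs $p>e^e$ so that $\log_3p>0$), the sketch evaluates $2\{1-\Phi(t_p)\}\{(p^2-p)/2\}$ divided by the lower bound $[1/\{(8\pi)^{1/2}\alpha\}+\delta](\log_2p)^{1/2}$ on the number of large $|W_{i,j}|$; asymptotically this tends to $\alpha/\{1+\delta(8\pi)^{1/2}\alpha\}<\alpha$. The inputs $\alpha=\delta=0.1$ are illustrative, and $p=754$ matches the number of genes in the data example.

```python
import math


def iterated_log(p, k):
    """log_k p: the natural logarithm applied k times (log_1 p = log p)."""
    value = float(p)
    for _ in range(k):
        value = math.log(value)
    return value


def t_p(p):
    """t_p = (4 log p - log_2 p - log_3 p)^{1/2}."""
    return math.sqrt(4 * math.log(p) - iterated_log(p, 2) - iterated_log(p, 3))


def ratio_at_t_p(p, alpha, delta):
    """G(t_p) (p^2 - p)/2 divided by [1/{(8 pi)^{1/2} alpha} + delta] (log_2 p)^{1/2}."""
    g = math.erfc(t_p(p) / math.sqrt(2))                  # 2{1 - Phi(t_p)}
    count_bound = (1 / (math.sqrt(8 * math.pi) * alpha) + delta) * math.sqrt(iterated_log(p, 2))
    return g * (p * p - p) / 2 / count_bound


alpha, delta = 0.1, 0.1
limit = alpha / (1 + delta * math.sqrt(8 * math.pi) * alpha)
for p in (754, 10 ** 6, 10 ** 12):
    print(p, round(ratio_at_t_p(p, alpha, delta), 5), "limit", round(limit, 5))
```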

Thus, to prove Theorem 4, it suffices to prove that $|\sum_{(i,j)\in\mathcal H_0}\{I(|W_{i,j}|\ge t)-G(t)\}|/\{q_0G(t)\}\to0$ in probability, uniformly in $0\le t\le\{4\log p+o(\log p)\}^{1/2}$, where $G(t)=2\{1-\Phi(t)\}$. We consider two cases.

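  1. If $t=\{4\log p+o(\log p)\}^{1/2}$, the proof of Theorem 1 yields $\mathrm{pr}(\max_{(i,j)\in A_\tau}W_{i,j}^2\ge t^2)=o(1)$. Thus, it suffices to prove that $|\sum_{(i,j)\in\mathcal H_0\setminus A_\tau}\{I(|W_{i,j}|\ge t)-G(t)\}|/\{q_0G(t)\}\to0$ in probability. For $(i,j)\in\mathcal H_0\setminus A_\tau$, the proof of Theorem 1 gives $\max_{(i,j)\in\mathcal H_0\setminus A_\tau}|W_{i,j}-V_{i,j}|=o_p\{(\log p)^{-1/2}\}$. Thus, it suffices to show that
    $$\left|\frac{\sum_{(i,j)\in\mathcal H_0\setminus A_\tau}\varepsilon_{i,j}(t)}{q_0G(t)}\right|\to0\qquad(A10)$$
    in probability, where $\varepsilon_{i,j}(t)=I(|V_{i,j}|\ge t)-G(t)$.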
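  2. If $t\le(C\log p)^{1/2}$ with $C<4$, we have
    $$\left|\frac{\sum_{(i,j)\in A_\tau\cap\mathcal H_0}\{I(|W_{i,j}|\ge t)-I(|V_{i,j}|\ge t)\}}{q_0G(t)}\right|\le2|A_\tau\cap\mathcal H_0|\,O(p^{C/2-2})\to0$$
    in probability. Thus, it is again enough to show that
    $$\left|\frac{\sum_{(i,j)\in\mathcal H_0\setminus A_\tau}\varepsilon_{i,j}(t)}{q_0G(t)}\right|\to0\qquad(A11)$$
    in probability. Define $\mathcal H_0'=\mathcal H_0\setminus A_\tau$. Let $0\le t_0<\cdots<t_m=t_p$ be such that $t_l-t_{l-1}=v_p$ for $l=1,\ldots,m-1$ and $t_m-t_{m-1}\le v_p$. Thus $m\sim t_p/v_p$. For any $t$ such that $t_{l-1}\le t\le t_l$, we have
    $$\frac{\sum_{(i,j)\in\mathcal H_0'}I(|V_{i,j}|\ge t_l)}{q_0G(t_l)}\frac{G(t_l)}{G(t_{l-1})}\le\frac{\sum_{(i,j)\in\mathcal H_0'}I(|V_{i,j}|\ge t)}{q_0G(t)}\le\frac{\sum_{(i,j)\in\mathcal H_0'}I(|V_{i,j}|\ge t_{l-1})}{q_0G(t_{l-1})}\frac{G(t_{l-1})}{G(t_l)}.$$
    Since $t_pv_p\to0$, we have $G(t_{l-1})/G(t_l)=1+o(1)$ uniformly in $l$, so it suffices to prove that $\max_{0\le l\le m}|\sum_{(i,j)\in\mathcal H_0'}\varepsilon_{i,j}(t_l)|/\{q_0G(t_l)\}\to0$ in probability. Note that
    $$\mathrm{pr}\left\{\max_{0\le l\le m}\left|\frac{\sum_{(i,j)\in\mathcal H_0'}\varepsilon_{i,j}(t_l)}{q_0G(t_l)}\right|\ge\varepsilon\right\}\le\sum_{l=0}^{m}\mathrm{pr}\left\{\left|\frac{\sum_{(i,j)\in\mathcal H_0'}\varepsilon_{i,j}(t_l)}{q_0G(t_l)}\right|\ge\varepsilon\right\}\le\frac{1}{v_p}\int_0^{t_p}\mathrm{pr}\left\{\left|\frac{\sum_{(i,j)\in\mathcal H_0'}\varepsilon_{i,j}(t)}{q_0G(t)}\right|\ge\varepsilon\right\}dt+\sum_{l=m-1}^{m}\mathrm{pr}\left\{\left|\frac{\sum_{(i,j)\in\mathcal H_0'}\varepsilon_{i,j}(t_l)}{q_0G(t_l)}\right|\ge\varepsilon\right\}.$$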

Thus, by (A5) with $d=1$ and Lemma A4, Theorem 4 is proved.
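The sandwich inequality in case 2 holds for any data, because $t\mapsto\sum I(|V_{i,j}|\ge t)$ and $G$ are both non-increasing; the grid only has to be fine enough that $G(t_{l-1})/G(t_l)$ stays close to one. The sketch below checks the inequality on simulated stand-ins for $|V_{i,j}|$ (absolute values of independent standard normals) over an illustrative grid with spacing 0.25 up to 3, rather than the asymptotic $v_p$ and $t_p$.

```python
import math
import random


def G(t):
    return math.erfc(t / math.sqrt(2))              # 2{1 - Phi(t)}


def normalized_count(abs_v, t):
    """sum I(|V_{i,j}| >= t) / {q_0 G(t)} with q_0 = len(abs_v)."""
    return sum(1 for x in abs_v if x >= t) / (len(abs_v) * G(t))


def make_grid(t_max, step):
    """0 = t_0 < ... < t_m = t_max with spacing step, except a possibly shorter last gap."""
    grid = [0.0]
    while grid[-1] + step < t_max:
        grid.append(grid[-1] + step)
    grid.append(t_max)
    return grid


def sandwich_holds(abs_v, grid, points_per_cell=5, tol=1e-12):
    """Check lower <= middle <= upper at several t inside each cell [t_{l-1}, t_l]."""
    for a, b in zip(grid, grid[1:]):                 # a = t_{l-1}, b = t_l
        lower = normalized_count(abs_v, b) * G(b) / G(a)
        upper = normalized_count(abs_v, a) * G(a) / G(b)
        for s in range(points_per_cell + 1):
            t = a + (b - a) * s / points_per_cell
            middle = normalized_count(abs_v, t)
            if not (lower <= middle * (1 + tol) and middle <= upper * (1 + tol)):
                return False
    return True


rng = random.Random(7)
abs_v = [abs(rng.gauss(0, 1)) for _ in range(5000)]
grid = make_grid(3.0, 0.25)
print(sandwich_holds(abs_v, grid))
```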


Supplementary Material

Supplementary material available at Biometrika online includes more extensive simulation results comparing the numerical performance of the proposed global test with that of other tests, the proofs of Lemmas 2, 3 and 4, and the Matlab code for numerical implementation.

Contributor Information

Yin Xia, Department of Statistics & Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27514, USA.

Tianxi Cai, Department of Biostatistics, Harvard School of Public Health, Harvard University, Boston, Massachusetts 02115, USA.

T. Tony Cai, Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.

References

  1. Anderson TW. An Introduction to Multivariate Statistical Analysis. 3rd ed. New York: Wiley-Interscience; 2003.
  2. Baraud Y. Non-asymptotic minimax rates of testing in signal detection. Bernoulli. 2002;8:577–606.
  3. Breiman L. Random forests. Mach Learn. 2001;45:5–32.
  4. Buck MB, Knabbe C. TGF-beta signaling in breast cancer. Ann N Y Acad Sci. 2006;1089:119–126. doi: 10.1196/annals.1386.024.
  5. Cai T, Liu W, Xia Y. Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J Am Statist Assoc. 2013;108:265–277.
  6. Chapman J, Clayton D. Detecting association using epistatic information. Genet Epidemiol. 2007;31:894–909. doi: 10.1002/gepi.20250.
  7. Chatterjee N, Kalaylioglu Z, Moslehi R, Peters U, Wacholder S. Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions. Am J Hum Genet. 2006;79:1002–1016. doi: 10.1086/509704.
  8. Danaher P, Wang P, Witten DM. The joint graphical lasso for inverse covariance estimation across multiple classes. J R Statist Soc B. 2014;76:373–397. doi: 10.1111/rssb.12033.
  9. Dobra A, Hans C, Jones B, Nevins JR, Yao G, West M. Sparse graphical models for exploring gene expression data. J Multivariate Anal. 2004;90:196–212.
  10. Downward J. Targeting RAS signalling pathways in cancer therapy. Nat Rev Cancer. 2003;3:11–22. doi: 10.1038/nrc969.
  11. Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, Nadeau JH. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2010;11:446–450. doi: 10.1038/nrg2809.
  12. Fan J, Lv J. Sure independence screening for ultra-high dimensional feature space (with discussion). J R Statist Soc B. 2008;70:849–911. doi: 10.1111/j.1467-9868.2008.00674.x.
  13. Gregg JP, Lit L, Baron CA, Hertz-Picciotto I, Walker W, Davis RA, Croen LA, Ozonoff S, Hansen R, Pessah IN, et al. Gene expression changes in children with autism. Genomics. 2008;91:22–29. doi: 10.1016/j.ygeno.2007.09.003.
  14. Guardavaccaro D, Clevers H. Wnt/β-Catenin and MAPK Signaling: Allies and enemies in different battlefields. Sci Signal. 2012;5:pe15. doi: 10.1126/scisignal.2002921.
  15. Hu VW, Sarachana T, Kim KS, Nguyen A, Kulkarni S, Steinberg ME, Luu T, Lai Y, Lee NH. Gene expression profiling differentiates autism case–controls and phenotypic variants of autism spectrum disorders: evidence for circadian rhythm dysfunction in severe autism. Autism Res. 2009;2:78–97. doi: 10.1002/aur.73.
  16. Klaus A, Birchmeier W. Wnt signalling and its impact on development and cancer. Nat Rev Cancer. 2008;8:387–398. doi: 10.1038/nrc2389.
  17. Kooperberg C, Leblanc M. Increasing the power of identifying gene × gene interactions in genomewide association studies. Genet Epidemiol. 2008;32:255–263. doi: 10.1002/gepi.20300.
  18. Kooperberg C, Ruczinski I. Identifying interacting SNPs using Monte Carlo logic regression. Genet Epidemiol. 2005;28:157–170. doi: 10.1002/gepi.20042.
  19. Li J, Chen SX. Two sample tests for high-dimensional covariance matrices. Ann Statist. 2012;40:908–940.
  20. Li KC, Palotie A, Yuan S, Bronnikov D, Chen D, Wei X, Choi OW, Saarela J, Peltonen L. Finding disease candidate genes by liquid association. Genome Biol. 2007;8:R205. doi: 10.1186/gb-2007-8-10-r205.
  21. Liu W. Gaussian graphical model estimation with false discovery rate control. Ann Statist. 2013;41:2948–2978.
  22. Marchini J, Donnelly P, Cardon L. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat Genet. 2005;37:413–417. doi: 10.1038/ng1537.
  23. Mechanic L, Luke B, Goodman J, Chanock S, Harris C. Polymorphism Interaction Analysis (PIA): a method for investigating complex gene-gene interactions. BMC Bioinformatics. 2008;9:146. doi: 10.1186/1471-2105-9-146.
  24. Moore J. Computational analysis of gene-gene interactions using multifactor dimensionality reduction. Expert Rev Mol Diagn. 2004;4:795–803. doi: 10.1586/14737159.4.6.795.
  25. Moro L, Arbini AA, Marra E, Greco M. Constitutive activation of MAPK/ERK inhibits prostate cancer cell proliferation through upregulation of BRCA2. Int J Oncol. 2007;30:217–224. doi: 10.3892/ijo.30.1.217.
  26. Nathanson K, Wooster R, Weber B. Breast cancer genetics: what we know and what we need. Nat Med. 2001;7:552–556. doi: 10.1038/87876.
  27. Olopade O, Grushko T, Nanda R, Huo D. Advances in Breast Cancer: Pathways to Personalized Medicine. Clin Cancer Res. 2008;14:7988. doi: 10.1158/1078-0432.CCR-08-1211.
  28. Phillips PC. Epistasis: the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 2008;9:855–867. doi: 10.1038/nrg2452.
  29. Ritchie M, Hahn L, Roodi N, Bailey L, Dupont W, Parl F, Moore J. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet. 2001;69:138–147. doi: 10.1086/321276.
  30. Santen RJ, Song RX, McPherson R, Kumar R, Adam L, Jeng MH, Yue W. The role of mitogen-activated protein (MAP) kinase in breast cancer. J Steroid Biochem Mol Biol. 2002;80:239–256. doi: 10.1016/s0960-0760(01)00189-3.
  31. Schott JR. A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Comput Stat Data An. 2007;51:6535–6542.
  32. Segal E, Shapira M, Regev A, Pe’er D, Botstein D, Koller D, Friedman N. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet. 2003;34:166–176. doi: 10.1038/ng1165.
  33. Shi Y, Massagué J. Mechanisms of TGF-β signaling from cell membrane to the nucleus. Cell. 2003;113:685–700. doi: 10.1016/s0092-8674(03)00432-x.
  34. Srivastava MS, Yanagihara H. Testing the equality of several covariance matrices with fewer observations than the dimension. J Multivariate Anal. 2010;101:1319–1329.
  35. van de Vijver M, He Y, van’t Veer L, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002;347:1999–2009. doi: 10.1056/NEJMoa021967.
  36. van’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415:530–536. doi: 10.1038/415530a.
  37. Venkitaraman AR. Cancer susceptibility and the functions of BRCA1 and BRCA2. Cell. 2002;108:171–182. doi: 10.1016/s0092-8674(02)00615-3.
  38. Wada T, Penninger JM. Mitogen-activated protein kinases in apoptosis regulation. Oncogene. 2004;23:2838–2849. doi: 10.1038/sj.onc.1207556.
  39. You L, He B, Uematsu K, Xu Z, Mazieres J, Lee A, McCormick F, Jablons DM. Inhibition of Wnt-1 signaling induces apoptosis in β-catenin-deficient mesothelioma cells. Cancer Res. 2004;64:3474–3478. doi: 10.1158/0008-5472.CAN-04-0115.
  40. Zaïtsev AY. On the Gaussian approximation of convolutions under multidimensional analogues of S. N. Bernstein’s inequality conditions. Probab Theory Rel Fields. 1987;74:535–566.
  41. Zerba K, Ferrell R, Sing C. Complex adaptive systems and human health: the influence of common genotypes of the apolipoprotein E (ApoE) gene polymorphism and age on the relational order within a field of lipid metabolism traits. Hum Genet. 2000;107:466–475. doi: 10.1007/s004390000394.
