Abstract
Comparing the population means of two groups of network data is of paramount importance in a wide range of scientific applications. Numerous existing network inference solutions focus on global testing of entire networks, without comparing individual network links. The observed data often take the form of vectors or matrices, and the problem is formulated as comparing two covariance or precision matrices under a normal or matrix normal distribution. Moreover, many tests suffer from limited power under a small sample size. In this article, we tackle the problem of network comparison, in terms of both global and simultaneous inference, when the data come in a different format, i.e., in the form of a collection of symmetric matrices, each of which encodes the network structure of an individual subject. Such a data format commonly arises in applications such as brain connectivity analysis and clinical genomics. We no longer require the underlying data to follow a normal distribution, but instead impose some moment conditions that are easily satisfied for numerous types of network data. Furthermore, we propose a power enhancement procedure, and show that it can control the false discovery rate while having the potential to substantially enhance the power of the test. We investigate the efficacy of our testing procedures through both an asymptotic analysis and a simulation study under a finite sample size. We further illustrate our method with examples of brain connectivity analysis.
Keywords: Auxiliary information, False discovery rate, Multiple testing, Network data, Power enhancement
1. Introduction
With the recent prevalence of network data, the problem of comparing two populations of networks is gaining increasing attention. Our motivation is brain connectivity analysis, which studies functional and structural brain architectures through neurophysiological measures of brain activities and synchronizations (Fornito et al., 2013). Accumulating evidence has suggested that, compared to a healthy brain, the brain connectivity network alters in the presence of numerous neurological disorders, for example, Alzheimer's disease and autism spectrum disorder, among many others. Such alterations are believed to hold crucial insights into disease pathologies (Fox and Greicius, 2010). A typical brain connectivity study collects imaging scans, such as functional magnetic resonance imaging or diffusion tensor imaging, from groups of subjects with and without the disorder. Based on the imaging scans, a network is constructed for each individual subject, with the nodes corresponding to a common set of brain regions, and the edges encoding the functional or structural associations between the regions. A fundamental scientific question of interest is to compare the brain networks and to identify local connectivity patterns that alter between the two populations. Network comparison is equally interesting in many other scientific areas as well, for instance, clinical genomics, where it is of crucial interest to understand and compare gene regulatory networks of patients with and without cancer (Luscombe et al., 2004).
In the context of brain connectivity analysis, there has been a rich literature on network estimation methods (Ahn et al., 2015; Qiu et al., 2016; Wang et al., 2016; Zhu and Li, 2018, among many others). Recently, Zou et al. (2017) and Lan et al. (2018) studied estimation of the covariance matrix of a multivariate vector as a function of the similarity measure of the covariates, or a function of the adjacency matrix. There is, however, a relative paucity of inference methods, especially simultaneous inference for individual links. Even though both can produce, in effect, a concise representation of the network structure, network inference is a fundamentally different problem from network estimation. Among the few existing network inference solutions, Kim et al. (2014) studied a number of two-sample tests based on network summary metrics or generalized linear models. However, they only compared two networks globally, without any inference on the individual links of the networks. Moreover, some of their tests resorted to bootstrap or permutation, which is computationally intensive. Ginestet et al. (2017) characterized the geometry of the space of undirected networks with edge weights, and developed an analog of the classical two-sample test for network empirical means. However, they again focused on the global test of two entire networks. Chen et al. (2015) developed a method to detect differentially expressed connectivity subnetworks under different clinical conditions. They resorted to a permutation test, and controlled the family-wise error rate. Xia et al. (2015) first encoded the connectivity network by a partial correlation matrix computed from vector-valued data under a normal distribution. They then proposed a multiple testing procedure to compare the partial correlation matrices from the two populations, along with a proper false discovery control. Xia and Li (2019) further extended the test to matrix-valued data under a matrix normal distribution. In both cases, the test statistics were constructed based on the vector or matrix-valued data, which, as we explain next, may not be directly observable. Moreover, the underlying data distribution may not always be normal or matrix normal. Durante and Dunson (2018) developed a fully Bayesian solution for network comparison, which is very flexible and can handle the data format of our problem, but it requires specification of a series of prior distributions and can be computationally intensive.
Applications such as brain connectivity analysis actually raise new challenges for network inference. First, the observed data come in the form of p × p matrices, where p is the number of network nodes. Each such matrix encodes the network structure for one individual subject, and a collection of network samples are observed. For instance, in brain structural connectivity, what one observes are the numbers of white matter fibers between pairs of brain anatomical regions. This matrix of counts forms a network observation for one subject, with brain regions constituting the nodes and the fiber counts the links, and we observe multiple such count-valued networks for multiple subjects. This is fundamentally different from the data format studied in most existing network methods, where a network structure usually takes the form of a covariance or precision matrix of some vector-valued or matrix-valued data. This fundamental difference in terms of the available data format thus requires a completely new problem formulation and inferential procedure. Second, in a multitude of applications including brain connectivity analysis, the sample size is usually very small, e.g., in the tens. This calls for a testing procedure that is powerful enough to detect differentially expressed links under a limited sample size. In this article, we address the problem of comparing two populations of network data, more precisely, the two population means of networks. We aim to consider both global and simultaneous inferences, tackle the new data format, and explicitly enhance the power of the test.
Specifically, suppose we observe two groups of samples, $\{S_{1,l}\}_{l=1}^{n_1}$ and $\{S_{2,l}\}_{l=1}^{n_2}$, where $S_{d,l}$ denotes the observed symmetric p × p network data for the lth sample in the dth group, $n_d$ is the total number of network samples in the dth group, l = 1, …, nd, and d = 1, 2. Suppose $S_{d,l} \sim \mathcal{F}_d$, where $\mathcal{F}_d$ is some distribution with a symmetric mean matrix $s_d = (s_{d,i,j})_{p \times p}$. Our goal is to test whether the two population means are the same:
$$H_0 : s_1 = s_2 \quad \text{versus} \quad H_1 : s_1 \neq s_2. \tag{1}$$
If the global null in (1) is rejected, we further aim to identify at which locations the two mean matrices are different. That is, we wish to simultaneously test:
$$H_{0,i,j} : s_{1,i,j} = s_{2,i,j} \quad \text{versus} \quad H_{1,i,j} : s_{1,i,j} \neq s_{2,i,j}, \quad 1 \le i < j \le p. \tag{2}$$
In Xia et al. (2015), the observed data $X_{d,l}$ is a vector that represents the expressions of multiple genes for two groups of patients with long and short term survival, and is assumed to follow a normal distribution with covariance matrix Σd. Let Rd denote the corresponding partial correlation matrix, i.e., the standardized version of the inverse covariance matrix $\Sigma_d^{-1}$, d = 1, 2. Then the network structure is encoded by Rd, and the problem becomes testing if R1 = R2. Xia and Li (2019) followed a similar setup, except that the observed data $X_{d,l}$ becomes a matrix, which represents brain temporal neural activity measures collected at multiple brain locations for two groups of patients with and without attention deficit hyperactivity disorder. It is assumed to follow a matrix normal distribution with the covariance Σd ⊗ Λd, and the network is still encoded by the standardized version of the inverse covariance matrix. The key difference in our setting is that we do not always observe $X_{d,l}$ directly, but only $S_{d,l}$. This difference in data format completely distinguishes our method from nearly all existing solutions such as Xia et al. (2015) and Xia and Li (2019). Moreover, we do not impose that the underlying data follow a normal or matrix normal distribution. Instead, we consider a general class of distributions $\mathcal{F}_d$ satisfying some moment conditions. Our method works for many different types of network links, for instance, binary links, where $\mathcal{F}_d$ has a light, sub-Gaussian-type tail, or count links, where $\mathcal{F}_d$ has a heavier, polynomial-type tail.
For the global test (1), we develop a global test statistic taken as the maximum of a set of individual test statistics. We then derive its limiting null distribution, and show that the resulting global test is asymptotically minimax optimal in terms of power. For the simultaneous test (2), we first develop a multiple testing procedure, and show that it asymptotically controls the false discovery rate at the pre-specified level. Next we propose a method to substantially enhance the power of the simultaneous inference procedure for (2). Specifically, we extend the grouping-adjusting-pooling idea of Xia et al. (2019a), and modify it for our inference of network data.
Our proposal differs from the existing solutions and makes several useful contributions. First, to the best of our knowledge, there has been no solution directly targeting simultaneous hypothesis testing of individual links for network data in the format of Sd,l. Our method bridges this gap, and offers a timely solution to a range of scientific applications where this form of problem and data is commonly encountered. Second, our global test statistic is constructed as the maximum of the individual test statistics over all links. This type of maximum statistic enjoys various advantages and has been commonly employed in the hypothesis testing literature (e.g., Cai et al., 2013; Xia et al., 2019b). However, the derivation of its asymptotics, as well as the properties of the subsequent multiple testing procedure, is far from trivial in our new context of network comparison. Moreover, we remark that, in some network data applications, the individual test statistics may be correlated, and a global test statistic that utilizes such correlations may result in a more powerful test. However, this may not always be the case. For instance, in our brain connectivity application, the nodes are usually the brain anatomical regions, which can scatter at distant locations of the brain. As a result, there is no obvious correlation structure for the individual test statistics built on the pairs of brain regions. Therefore, we do not explicitly impose or employ any correlation structure when constructing the global test statistic. On the other hand, in our power enhancement procedure, we implicitly utilize the fact that some individual test statistics may be correlated and clustered. We then use a data-driven approach to find such clusters and incorporate this information in our test. Finally, the power enhancement approach we develop is particularly useful in numerous applications, e.g., brain connectivity analysis, where the sample size is limited. Although motivated by Xia et al. (2019a), our enhancement method differs from Xia et al. (2019a) considerably in several ways. We explicitly compare the two power enhancement procedures in Section 4.5. Overall, we believe our method provides a useful addition to the general toolbox of network inference.
We adopt the following notation throughout this article. For a symmetric matrix Ad, let λmax(Ad) and λmin(Ad) denote the largest and smallest eigenvalues of Ad, respectively. For a set $\mathcal{A}$, let $|\mathcal{A}|$ denote its cardinality. For two sequences of real numbers {an} and {bn}, write an = O(bn) if there exists a constant C such that |an| ≤ C|bn| holds for all n, write an = o(bn) if limn→∞ an/bn = 0, and write an ≍ bn if there are positive constants c and C such that c ≤ an/bn ≤ C for all n. Write n = n1n2/(n1 + n2) and assume that n1 ≍ n2.
The rest of the article is organized as follows. Section 2 presents the moment conditions for the distribution of Sd,l and shows that they are easily satisfied by numerous types of network data. Section 3 develops the global testing and the simultaneous testing for the two-sample network comparison, and Section 4 studies power enhancement; both are key to our proposal. Section 5 presents the simulations, and Section 6 presents two brain connectivity analysis examples as illustration. The Supplementary Material collects additional lemmas and the proofs.
2. Moment Conditions and Examples
We begin with some moment conditions imposed on the distribution of Sd,l. We then give a number of examples showing that those conditions are easily satisfied by numerous types of network data.
2.1. Moment conditions
We assume that the distribution of the network data Sd,l satisfies one of the following two conditions: a sub-Gaussian-type tail, or a polynomial-type tail, as stated below.
(C1) (Sub-Gaussian-tail). Suppose that log p = o(n1/5), and that there exist some constants η > 0 and K > 0 such that, for d = 1, 2,
$$\max_{1 \le i < j \le p} E\left( \exp\left[ \eta \{S_{d,l,i,j} - s_{d,i,j}\}^2 / \mathrm{Var}(S_{d,l,i,j}) \right] \right) \le K.$$
(C2) (Polynomial-tail). Suppose that $p \le c n^{\gamma_0}$ for some constants γ0, c > 0, and that there exist some constants ϵ > 0 and K > 0 such that, for d = 1, 2,
$$\max_{1 \le i < j \le p} E\left( \left| \{S_{d,l,i,j} - s_{d,i,j}\} / \mathrm{Var}(S_{d,l,i,j})^{1/2} \right|^{4\gamma_0 + 2 + \epsilon} \right) \le K.$$
We first comment that both conditions are common, and similar conditions have often been assumed in the high-dimensional setting (Cai et al., 2014; Van de Geer et al., 2014). These moment conditions are much weaker than the Gaussian assumption usually required in the testing literature (Schott, 2007). Next we discuss a number of network examples that satisfy the above moment conditions, including Bernoulli and mixture Bernoulli data, Poisson data, and correlation and partial correlation data. Furthermore, we discuss some examples where the distributions are heavy-tailed, but, after some data transformation, still satisfy the moment conditions. Examples include transformed normal count data and transformed Wishart count data.
2.2. Network data examples
The first example is the binary network, arguably the most common network data type, where each link is a binary indicator. The Bernoulli distribution is often assumed; i.e., for Sd,l = (Sd,l,i,j)p×p, Sd,l,i,j follows a Bernoulli distribution with mean sd,i,j, where u < sd,i,j < 1 − u for a constant 0 < u < 1, l = 1, …, nd, d = 1, 2, and 1 ≤ i < j ≤ p. In this case, Sd,l satisfies the sub-Gaussian-tail condition in (C1), e.g., with η = 1 and K = (1 − u) exp{u(1 − u)−1} + u exp{(1−u)u−1}. The same holds true for the mixture Bernoulli distribution as discussed in Durante and Dunson (2018). That is, for some integer H > 0 and randomly selected mixture weights {ϕ1, …, ϕH} subject to $\sum_{h=1}^{H} \phi_h = 1$ and ϕh > 0, $P(S_{d,l,i,j} = x) = \sum_{h=1}^{H} \phi_h (s_{d,i,j}^{(h)})^x (1 - s_{d,i,j}^{(h)})^{1-x}$, with $u < s_{d,i,j}^{(h)} < 1 - u$ for some constant 0 < u < 1, x = 0, 1, h = 1, …, H, l = 1, …, nd, d = 1, 2, and 1 ≤ i < j ≤ p. For this example, Sd,l again satisfies the sub-Gaussian-tail condition in (C1), with η = 1 and K = (1 − u) exp{u(1 − u)−1} + u exp{(1 − u)u−1}.
The second example is the correlation network, another equally common network data type. In brain functional connectivity analysis and many other applications, the network is often encoded by a Pearson correlation or a partial correlation matrix. Take the Pearson correlation network as an example. The functional imaging data are usually summarized as a spatial-temporal matrix. That is, for the lth subject in the dth group, the observed data is of the form $X_{d,l} \in \mathbb{R}^{t_d \times p}$, where p is the number of brain regions, and td is the number of repeated measures. Then the brain functional connectivity network is encoded by the sample correlation matrix Sd,l, with $S_{d,l,i,j} = \widehat{\mathrm{corr}}\{X_{d,l,(\cdot,i)}, X_{d,l,(\cdot,j)}\}$, where Xd,l,(·,j) denotes the jth column of the matrix Xd,l, and each column is centered by its sample mean in the computation (Fornito et al., 2013). Next we show that, as long as Xd,l satisfies one of the conditions in Lemma 1, Sd,l satisfies the sub-Gaussian-tail condition (C1).
Lemma 1. Suppose Xd,l satisfies one of the following conditions: (i) log p = o(t1/5), and there exist constants η′ > 0, K′ > 0 such that E(exp[η′{Xd,l,i,j − E(Xd,l,i,j)}2/Var(Xd,l,i,j)]) ≤ K′, where t = max{t1, t2} and t1 ≍ t2; (ii) $p \le c' t^{\gamma_0'}$ for some constants $\gamma_0', c' > 0$, and there exist constants ϵ′ > 0, K′ > 0 such that $E\left( \left| \{X_{d,l,i,j} - E(X_{d,l,i,j})\} / \mathrm{Var}(X_{d,l,i,j})^{1/2} \right|^{4\gamma_0' + 2 + \epsilon'} \right) \le K'$, for i = 1, …, p, j = 1, …, td. Then Sd,l satisfies the sub-Gaussian-tail condition in (C1), with η = 1/4 and K = 2, as t → ∞.
We remark that a similar result to Lemma 1 can be obtained for the partial correlation network, by using the inverse regression techniques as in Liu (2013). Xia and Li (2019) tackled network comparison assuming Xd,l is directly observable and follows a matrix normal distribution. Lemma 1 suggests that the test we develop later is still applicable when Xd,l is available, even though it may not be as powerful as the test of Xia and Li (2019) in that case. On the other hand, the main focus of this article is to develop a test that compares two networks even when Xd,l is not observed, but only Sd,l is. As such, our test is more general than that of Xia and Li (2019).
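To make the construction concrete, the following is a minimal sketch in R of the correlation-network construction described above, with a simulated spatial-temporal matrix standing in for real imaging data.

```r
# A minimal sketch: one subject's t_d x p spatial-temporal matrix X is turned
# into a p x p Pearson correlation network S. The simulated X is illustrative.
set.seed(1)
t_d <- 150                          # number of repeated measures
p   <- 10                           # number of brain regions
X   <- matrix(rnorm(t_d * p), t_d, p)
S   <- cor(X)                       # p x p network; S[i, j] is the link S_{d,l,i,j}
```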
The third example is the count network, another common network data type, where each link is a count. For instance, in brain structural connectivity analysis, the link is the number of white matter fibers between anatomical brain regions. The Poisson distribution is often imposed; i.e., Sd,l,i,j follows a Poisson distribution with mean sd,i,j, where 0 < u1 < sd,i,j < u2, l = 1, …, nd, d = 1, 2, 1 ≤ i < j ≤ p. For any constant ϵ > 0, let M be the smallest integer that is no smaller than 4γ0 + 2 + ϵ, where γ0 is as defined in (C2). Then Sd,l satisfies the polynomial-tail condition (C2), with K upper bounded by a finite constant that depends only on u1, u2, and the quantities $\left\{ {M \atop i} \right\}$, i = 1, …, M, where $\left\{ {M \atop i} \right\}$ is the number of ways to partition a set of M objects into i non-empty subsets, i.e., the Stirling number of the second kind.
We next consider some examples where the original network data $G_{d,l} \sim \mathcal{G}_d$, and $\mathcal{G}_d$ is some heavy-tailed distribution that only differs in the mean matrix between the two groups. In such cases, testing the means of the original samples is equivalent to testing the means of the transformed data, Sd,l,i,j = f(Gd,l,i,j), where f is some one-to-one transformation function. One example is the log-normal count network. After the logarithmic transformation of Gd,l, the transformed data Sd,l follows a normal distribution, and thus both (C1) and (C2) are satisfied. This can be further extended to the transformed normal mixture network. Another example is the transformed Wishart count network, where the transformed data Sd,l follows a Wishart distribution. In this case, Sd,l satisfies the sub-Gaussian-tail condition (C1). Moreover, the testing problems (1) and (2) are then closely related to the covariance matrix testing problems studied in Li and Chen (2012) and Cai et al. (2013). The key difference between our method and the existing ones is that we only observe Sd,l, but not the original vector-valued samples. This example can be further extended to the case of the product of Gaussian mixtures network, or the Wishart mixtures network.
3. Two-sample Test on Network Data
We begin with the construction of a test statistic for the two testing problems (1) and (2). We then develop a global testing procedure for (1), and a simultaneous testing procedure for (2). For each test, we derive its corresponding asymptotic properties.
3.1. Test statistics
We first observe that the testing problem (1) is equivalent to testing $H_0 : \max_{1 \le i < j \le p} |s_{1,i,j} - s_{2,i,j}| = 0$. This motivates us to construct the test statistic based on
$$W_{i,j} = \bar{S}_{1,i,j} - \bar{S}_{2,i,j}, \quad 1 \le i < j \le p,$$
where $\bar{S}_{d,i,j} = n_d^{-1} \sum_{l=1}^{n_d} S_{d,l,i,j}$. We standardize Wi,j, and estimate the variances of Sd,l,i,j for the two groups by
$$V_{d,i,j} = \frac{1}{n_d} \sum_{l=1}^{n_d} \left( S_{d,l,i,j} - \bar{S}_{d,i,j} \right)^2, \quad d = 1, 2, \tag{3}$$
respectively. This leads to our test statistic,
$$T_{i,j} = \frac{W_{i,j}}{\sqrt{V_{1,i,j}/n_1 + V_{2,i,j}/n_2}}, \quad 1 \le i < j \le p. \tag{4}$$
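To fix ideas, the following R sketch computes the entry-wise statistics in (3) and (4), assuming the two samples are stored as p × p × nd arrays (an assumed data layout), with Vd,i,j computed in the nd-denominator form of (3).

```r
# A minimal sketch of the statistics in (3)-(4). S1 and S2 are assumed to be
# p x p x n_d arrays of observed networks, one slice per subject.
two_sample_stats <- function(S1, S2) {
  n1 <- dim(S1)[3]; n2 <- dim(S2)[3]
  Sbar1 <- apply(S1, c(1, 2), mean)               # entry-wise sample means
  Sbar2 <- apply(S2, c(1, 2), mean)
  V1 <- apply(S1, c(1, 2), var) * (n1 - 1) / n1   # variance estimates in (3)
  V2 <- apply(S2, c(1, 2), var) * (n2 - 1) / n2
  W <- Sbar1 - Sbar2
  W / sqrt(V1 / n1 + V2 / n2)                     # matrix of T_{i,j} in (4)
}
# Usage: Tmat <- two_sample_stats(S1, S2); the upper triangle holds T_{i,j}.
```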
3.2. Global test
In brain connectivity analysis and many other applications, it is generally postulated that the differences between two network structures concentrate on a small number of brain regions. This translates to a sparse alternative in our global test. Correspondingly, we construct the global test statistic as
$$M_n = \max_{1 \le i < j \le p} T_{i,j}^2.$$
Let Σd denote the q × q covariance matrix of vech(Sd,l), where q = p(p − 1)/2, and vech(·) is the operator that stacks the upper triangular part of Sd,l, excluding the diagonal, into a vector. Let $\Gamma_d = (r_{d,i,j})_{q \times q}$ denote the corresponding correlation matrix. We introduce two conditions.
(A1) $C_0^{-1} \le \lambda_{\min}(\Sigma_d) \le \lambda_{\max}(\Sigma_d) \le C_0$ for some constant C0 > 0, d = 1, 2.
(A2) $\max_{d=1,2} \max_{1 \le i < j \le q} |r_{d,i,j}| \le r$ for some constant 0 < r < 1.
Both conditions are mild. In particular, Condition (A1) implies that $\max_{1 \le j \le q} s_j(\alpha_0) \le K (\log q)^{2 + 2\alpha_0}$ for some constant K > 0, where $s_j(\alpha_0) = |\{i : \max_{d=1,2} |r_{d,i,j}| \ge c_q\}|$ and cq is a correlation order that depends on q, with a common choice of $c_q = (\log q)^{-1-\alpha_0}$ for some α0 > 0. In other words, it allows each network entry to be highly correlated with at most $K (\log q)^{2 + 2\alpha_0}$ other entries. For high-dimensional vector-valued data, such a condition on the eigenvalues of the covariance matrix is commonly imposed (Bickel et al., 2008; Rothman et al., 2008; Yuan, 2010; Cai et al., 2014). Condition (A2) is also mild, because if max1≤i<j≤q |rd,i,j| = 1, then Γd is singular. We next obtain the limiting distribution of our test statistic Mn.
Theorem 1. Suppose that (A1)-(A2), and one of (C1) and (C2), hold. Then under the null hypothesis H0 in (1), for any $t \in \mathbb{R}$,
$$P\left\{ M_n - 2 \log q + \log \log q \le t \right\} \to \exp\left\{ -\pi^{-1/2} \exp(-t/2) \right\}, \quad \text{as } n_d, q \to \infty.$$
Based on this limiting null distribution, we define the asymptotic α-level test as
$$\Psi_\alpha = I\left\{ M_n \ge q_\alpha + 2 \log q - \log \log q \right\},$$
where qα = −log π − 2 log log(1 − α)−1.
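A sketch of the resulting global test in R, using the critical value implied by Theorem 1 and qα above; Tmat is the matrix of statistics from the earlier sketch.

```r
# A minimal sketch of the global test Psi_alpha: reject H0 in (1) when M_n
# exceeds q_alpha + 2 log q - log log q.
global_test <- function(Tmat, alpha = 0.05) {
  p <- nrow(Tmat)
  q <- p * (p - 1) / 2
  Mn <- max(Tmat[upper.tri(Tmat)]^2)                 # global statistic M_n
  q_alpha <- -log(pi) - 2 * log(log(1 / (1 - alpha)))
  list(Mn = Mn, reject = (Mn >= q_alpha + 2 * log(q) - log(log(q))))
}
```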
We next study the power and the asymptotic optimality of the test Ψα. Toward that end, define the sparsity of s1 − s2 as kq = |{(i, j) : s1,i,j − s2,i,j ≠ 0, 1 ≤ i < j ≤ p}|. We also introduce a class of pairs (s1, s2) whose maximal standardized difference exceeds a detection threshold,
$$\mathcal{U}(c_0) = \left\{ (s_1, s_2) : \max_{1 \le i < j \le p} \frac{|s_{1,i,j} - s_{2,i,j}|}{\sqrt{\mathrm{Var}(S_{1,l,i,j})/n_1 + \mathrm{Var}(S_{2,l,i,j})/n_2}} \ge c_0 (\log q)^{1/2} \right\}.$$
Theorem 2. Suppose that one of (C1) and (C2) holds. Then there exists a sufficiently large constant c0 > 0 such that
$$\inf_{(s_1, s_2) \in \mathcal{U}(c_0)} P\left( \Psi_\alpha = 1 \right) \to 1, \quad \text{as } n_d, q \to \infty.$$
Furthermore, suppose that kq = o(qr) for some r < 1/2. Let α, β > 0 and α + β = 1. Then there exists a constant c0 > 0 such that, for all sufficiently large nd and q,
$$\inf_{(s_1, s_2) \in \mathcal{U}(c_0)} \sup_{T_\alpha \in \mathcal{T}_\alpha} P\left( T_\alpha = 1 \right) \le 1 - \beta,$$
where $\mathcal{T}_\alpha$ is the set of all α-level tests, i.e., $P(T_\alpha = 1) \le \alpha$ under H0 for all $T_\alpha \in \mathcal{T}_\alpha$.
This theorem shows that the null hypothesis in (1) can be rejected by Ψα with high probability if the pair of network means belongs to the class $\mathcal{U}(c_0)$. In addition, under the mild sparsity condition kq = o(qr), the lower bound rate of (log q)1/2 cannot be further improved, because for a sufficiently small c0, any α-level test is unable to reject the null correctly uniformly over $\mathcal{U}(c_0)$ with probability tending to 1. Hence, the global test Ψα attains the power minimax optimality asymptotically.
3.3. Simultaneous test
We next develop a multiple testing procedure for (2) based on the test statistic Ti,j in (4). Let h be the threshold level such that H0,i,j is rejected if |Ti,j| ≥ h. Let $\mathcal{H}_0 = \{(i,j) : s_{1,i,j} = s_{2,i,j}, 1 \le i < j \le p\}$ be the set of true nulls, and $\mathcal{H}_1 = \{(i,j) : s_{1,i,j} \neq s_{2,i,j}, 1 \le i < j \le p\}$ the set of true alternatives. Denote by $R_0(h) = \sum_{(i,j) \in \mathcal{H}_0} I(|T_{i,j}| \ge h)$ and $R(h) = \sum_{1 \le i < j \le p} I(|T_{i,j}| \ge h)$ the total numbers of false positives and rejections, respectively. Then we define the false discovery proportion and false discovery rate by
$$\mathrm{FDP}(h) = \frac{R_0(h)}{R(h) \vee 1}, \quad \mathrm{FDR}(h) = E\{\mathrm{FDP}(h)\}.$$
An ideal choice of h would reject as many true positives as possible while controlling the FDP at the pre-specified level α. That is, we would select h0 = inf {h : 0 ≤ h ≤ (2 log q)1/2, FDP(h) ≤ α}. Since R0(h) is unknown, we estimate it conservatively by 2q{1 − Φ(h)}, where Φ(·) is the standard normal cumulative distribution function. This leads to our multiple testing procedure, summarized in Algorithm 1.
We next show that this testing procedure controls the FDR and FDP asymptotically at the pre-specified level. For notational simplicity, we write $\mathrm{FDP} = \mathrm{FDP}(\hat{h})$ and $\mathrm{FDR} = \mathrm{FDR}(\hat{h})$, where $\hat{h}$ is obtained in Algorithm 1. Define $q_0 = |\mathcal{H}_0|$ and $q_1 = |\mathcal{H}_1|$.
Algorithm 1.
Simultaneous inference with FDR control.
Step 1: Compute the test statistics Ti,j in (4), 1 ≤ i < j ≤ p.
Step 2: For a given level 0 < α < 1, compute the threshold $\hat{h} = \inf\left[ h \in [0, (2 \log q)^{1/2}] : 2q\{1 - \Phi(h)\} \le \alpha \max\{R(h), 1\} \right]$. If $\hat{h}$ does not exist, set $\hat{h} = (2 \log q)^{1/2}$.
Step 3: Reject H0,i,j if $|T_{i,j}| \ge \hat{h}$, 1 ≤ i < j ≤ p.
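A minimal sketch of Algorithm 1 in R, with the q statistics collected in a vector and R0(h) replaced by its conservative estimate 2q{1 − Φ(h)} as described above.

```r
# Find the smallest threshold h whose estimated FDP,
# 2q{1 - Phi(h)} / max{R(h), 1}, is at most alpha.
fdr_threshold <- function(tvec, alpha = 0.05) {
  q <- length(tvec)
  hmax  <- sqrt(2 * log(q))
  hgrid <- seq(0, hmax, length.out = 1000)   # grid over [0, (2 log q)^{1/2}]
  fdp_hat <- sapply(hgrid, function(h)
    2 * q * (1 - pnorm(h)) / max(sum(abs(tvec) >= h), 1))
  ok   <- which(fdp_hat <= alpha)
  hhat <- if (length(ok) > 0) hgrid[min(ok)] else hmax
  which(abs(tvec) >= hhat)                   # indices of rejected hypotheses
}
```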
We further introduce some conditions.
(B1) $\left|\left\{ (i,j) \in \mathcal{H}_1 : |s_{1,i,j} - s_{2,i,j}| / \{\mathrm{Var}(S_{1,l,i,j})/n_1 + \mathrm{Var}(S_{2,l,i,j})/n_2\}^{1/2} \ge (\log q)^{1/2 + \rho} \right\}\right| \ge \left\{ 1/(\pi^{1/2} \alpha) + \delta \right\} (\log q)^{1/2}$, for some constant δ > 0 and any sufficiently small constant ρ > 0.
(B2) $\left|\left\{ (i,j) : 1 \le i < j \le q, \max_{d=1,2} |r_{d,i,j}| \ge (\log q)^{-2 - \xi} \right\}\right| = o(q^{1 + \nu})$ for some constants ξ > 0 and 0 < ν < (1 − r)/(1 + r).
(B3) $q_0 \ge c_1 q$ for some constant c1 > 0.
Condition (B1) on $\mathcal{H}_1$ is mild, as it only requires a small number of entries of s1 and s2 to have a standardized difference of the order (log q)1/2+ρ for any sufficiently small constant ρ > 0. Condition (B2) is mild, as it requires that not too many Sd,l,i,j be highly correlated, but still allows the number of highly correlated pairs to grow in the order of o(q1+ν). Condition (B3) is also a natural and mild assumption, because if it does not hold, i.e., q0 = o(q), then we can simply reject all the hypotheses. As a result, we would have |R0| = q0, |R| = q, and the FDR would tend to zero. Under these conditions, we obtain the asymptotic properties of our multiple testing procedure in terms of false discovery control.
Theorem 3. Suppose that (A2), (B1)-(B3), and one of (C1) and (C2) hold, with $p \le c n^{\gamma_0}$ for some constants γ0, c > 0. Then,
$$\frac{\mathrm{FDR}}{\alpha q_0 / q} \to 1, \quad \text{and} \quad \frac{\mathrm{FDP}}{\alpha q_0 / q} \to 1 \text{ in probability}, \quad \text{as } (n_1, n_2, q) \to \infty.$$
4. Power Enhancement
In brain connectivity analysis and many other applications, the sample size nd is often small, whereas the number of nodes p can be moderate to large. This results in a limited power for the proposed test. We explore in this section an explicit power enhancement method that has the potential to substantially improve the power of the simultaneous inference developed in Section 3.3. We borrow the idea of grouping, adjusting and pooling (GAP) first proposed in Xia et al. (2019a). However, our method differs from Xia et al. (2019a) in many ways, including a different, and actually less restrictive, assumption, a different set of primary and auxiliary statistics, and a different modification of the multiple testing procedure. We show that the modified procedure is asymptotically more powerful, while still controlling the FDR and FDP asymptotically. We obtain these properties assuming the sub-Gaussian-tail condition (C1). Parallel results can be obtained under the polynomial-tail condition (C2), but are technically more involved. We begin by describing the intuition behind our power enhancement solution, then derive the proper auxiliary statistic for our inference problem. We then develop the modified simultaneous testing procedure, and study its asymptotic properties in terms of power improvement and false discovery control. We also compare in detail our method with the GAP method of Xia et al. (2019a).
4.1. Intuition
We recognize that there exists additional information in the data that is potentially useful to improve the simultaneous testing procedure of Algorithm 1. We first discuss our intuition, then use a simple example to illustrate where the auxiliary information lies and how it can facilitate our multiple testing procedure.
In a multitude of applications including brain connectivity analysis, it is often believed that the difference between the two networks under different biological conditions is small. This means s1 − s2 is sparse. Accordingly, one can find a baseline matrix s0 such that s1 − s0 and s2 − s0 are individually sparse. Let $\mathcal{S}_d$ denote the support of sd − s0, d = 1, 2, and $\mathcal{U} = \mathcal{S}_1 \cup \mathcal{S}_2$ denote the union support. Note that the set of alternative hypotheses $\mathcal{H}_1$ defined in Section 3.3 is the same as $\mathcal{U}$ if s1,i,j ≠ s2,i,j for every $(i,j) \in \mathcal{U}$. In general, $\mathcal{H}_1$ is a proper subset of $\mathcal{U}$. Since s1 − s0 and s2 − s0 are both sparse, the cardinality of $\mathcal{U}$ is small. Moreover, the following relationship holds true:
$$\mathcal{H}_1 \subseteq \mathcal{U}, \quad \text{or equivalently,} \quad \mathcal{U}^c \subseteq \mathcal{H}_0.$$
Therefore, knowledge about $\mathcal{U}$ is useful for narrowing down the search in multiple testing. In other words, if one can find a way to identify possible entries (i, j) in $\mathcal{U}$, it would provide useful information about the set of true alternatives $\mathcal{H}_1$, or equivalently, the set of true nulls $\mathcal{H}_0$. As a consequence, it can potentially increase the power of the testing procedure.
A key observation is that, while the test statistic is built on the difference between $\bar{S}_{1,i,j}$ and $\bar{S}_{2,i,j}$ as defined in Section 3.1, the sum of $\bar{S}_{1,i,j}$ and $\bar{S}_{2,i,j}$ can provide crucial information about $\mathcal{U}$. Consider a toy example where the network data is binary, and Sd,l,i,j follows a Bernoulli distribution with mean sd,i,j, l = 1, …, nd, d = 1, 2, 1 ≤ i < j ≤ p. Assume that s1,i,j = s2,i,j = s0,i,j = 0.1 for 80% of the (i, j) pairs, s1,i,j = s2,i,j = s0,i,j = 0.9 for 10% of the (i, j) pairs, and for the rest of the (i, j) pairs, s1,i,j, s2,i,j ~ Uniform(0.1, 0.9) and s0,i,j = 0.1. In this example, for the pairs $(i,j) \notin \mathcal{U}$, the sum of s1,i,j and s2,i,j is either very small, i.e., 0.2, or very large, i.e., 1.8. Meanwhile, for the pairs $(i,j) \in \mathcal{U}$, the sum falls strictly in between. Hence, this sum contains useful information about $\mathcal{U}$, and can potentially enhance the power of the multiple testing procedure.
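The separation in this toy example is easy to verify numerically; the short R snippet below is purely illustrative.

```r
# For pairs outside the union support, s1 + s2 is exactly 0.2 or 1.8; for
# pairs inside, it falls strictly in between.
set.seed(1)
q   <- 1000
grp <- sample(c("low", "high", "alt"), q, replace = TRUE, prob = c(0.8, 0.1, 0.1))
s1  <- ifelse(grp == "low", 0.1, ifelse(grp == "high", 0.9, runif(q, 0.1, 0.9)))
s2  <- ifelse(grp == "alt", runif(q, 0.1, 0.9), s1)
range(s1[grp != "alt"] + s2[grp != "alt"])   # only the values 0.2 and 1.8
range(s1[grp == "alt"] + s2[grp == "alt"])   # strictly inside (0.2, 1.8)
```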
Based on the above discussion, we can see that the more sparsity structure information the auxiliary statistics capture, the more information they provide about the union support $\mathcal{U}$, and the more substantial the power gain the test can achieve. In general, the sparser the true difference s1 − s2, the more information the auxiliary statistics can offer.
4.2. Auxiliary statistics
We next formally construct the auxiliary statistic that provides useful information about the union support $\mathcal{U}$. It is important that the auxiliary statistic be constructed so that it is asymptotically independent of the test statistic Ti,j in (4). This way, the null distribution of Ti,j is not distorted by the incorporation of the auxiliary statistic.
Recall Vd,i,j in (3) is the sample variance of Sd,l,i,j. We construct the auxiliary statistic as
$$A_{i,j} = \frac{n_1 \bar{S}_{1,i,j}/V_{1,i,j} + n_2 \bar{S}_{2,i,j}/V_{2,i,j}}{\left( n_1/V_{1,i,j} + n_2/V_{2,i,j} \right)^{1/2}}, \quad 1 \le i < j \le p,$$
where $\bar{S}_{d,i,j}$ is the sample mean defined in Section 3.1. The next proposition shows that the test statistic Ti,j and the auxiliary statistic Ai,j are asymptotically independent under the null hypothesis.
Proposition 1. Suppose (C1) holds with log q = o(n1/c) for some c > 5. For any constants M > 0 and C > 0, we have
$$\frac{P\left\{ |T_{i,j}| \ge h, A_{i,j} \ge a \right\}}{G(h) \, P\left\{ A_{i,j} \ge a \right\}} \to 1,$$
uniformly for $0 \le h \le M (\log q)^{1/2}$, $|a| \le C (\log q)^{1/2}$, and 1 ≤ i < j ≤ p, with G(h) = 2{1 − Φ(h)}. Furthermore, for all 0 ≤ k ≤ CN with an integer constant N,
$$\frac{P\left\{ |T_{i,j}| \ge h, A_{i,j} \in I_k \right\}}{G(h) \, P\left\{ A_{i,j} \in I_k \right\}} \to 1,$$
uniformly for $0 \le h \le M (\log q)^{1/2}$ and 1 ≤ i < j ≤ p, where $I_k$ denotes the interval between two adjacent grid points of the grid constructed in Section 4.3.
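In code, the auxiliary statistic takes one extra line beyond the quantities already computed for (4). The sketch below follows the variance-weighted form displayed above, as reconstructed: the weights nd/Vd,i,j make Ai,j asymptotically uncorrelated with Ti,j under the null.

```r
# A minimal sketch of the auxiliary statistics A_{i,j}, reusing the same
# p x p x n_d array layout assumed earlier.
aux_stats <- function(S1, S2) {
  n1 <- dim(S1)[3]; n2 <- dim(S2)[3]
  Sbar1 <- apply(S1, c(1, 2), mean); Sbar2 <- apply(S2, c(1, 2), mean)
  V1 <- apply(S1, c(1, 2), var) * (n1 - 1) / n1
  V2 <- apply(S2, c(1, 2), var) * (n2 - 1) / n2
  (n1 * Sbar1 / V1 + n2 * Sbar2 / V2) / sqrt(n1 / V1 + n2 / V2)
}
```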
4.3. Power enhanced simultaneous test
Based on (Ti,j, Ai,j), we now modify the simultaneous testing procedure of Algorithm 1. We first describe the main idea. We next summarize the modified testing procedure in Algorithm 2. Finally, we discuss some specific choices of the key parameters of the algorithm.
Since there are in total q = p(p − 1)/2 tests to carry out simultaneously, we rearrange the pairs {(Ti,j, Ai,j), 1 ≤ i < j ≤ p} into {(Ti, Ai), i = 1, …, q}. After obtaining all the p-values, pi = 2{1 − Φ(|Ti|)}, from Algorithm 1, our basic idea is to adjust those p-values by $p_i^{adj} = p_i / w_i$, with wi being the adjusting weights, i = 1, …, q. We utilize the auxiliary statistics Ai to compute the adjusting weights wi by groups. Specifically, we consider a set of grid values, $\mathcal{A} = \{ c (\log p)^{1/2} : c = C_1, C_1 + 1/N, C_1 + 2/N, \ldots, C_2 \}$, where C1, C2 and N are some pre-specified constants. We divide the index set {1, …, q} into K groups according to the auxiliary statistics (A1, …, Aq). As an example, we take K = 3. That is, we choose two grid points a1 < a2 in $\mathcal{A}$, and obtain K = 3 groups of indices, $\mathcal{G}_1 = \{i : A_i < a_1\}$, $\mathcal{G}_2 = \{i : a_1 \le A_i < a_2\}$, and $\mathcal{G}_3 = \{i : A_i \ge a_2\}$. For each group $\mathcal{G}_k$, we compute its cardinality, $m_k = |\mathcal{G}_k|$. We also estimate the proportion, πk, of alternatives in $\mathcal{G}_k$, k = 1, …, K. To do so, we employ the method of Schweder and Spjøtvoll (1982) and Storey (2002) to obtain an estimate $\hat{\pi}_k$ first, then stabilize it by $\hat{\pi}_k^{\epsilon} = \min\{\max(\hat{\pi}_k, \epsilon), 1 - \epsilon\}$, where ϵ is a small positive number; we set ϵ = 10−5. Then for all the indices in $\mathcal{G}_k$, we compute the group-wise adjusting weight:
$$w_i = \frac{\hat{\pi}_k^{\epsilon} / (1 - \hat{\pi}_k^{\epsilon})}{q^{-1} \sum_{k'=1}^{K} m_{k'} \, \hat{\pi}_{k'}^{\epsilon} / (1 - \hat{\pi}_{k'}^{\epsilon})}, \quad i \in \mathcal{G}_k. \tag{5}$$
This idea of adjusting the weights wi by groups is motivated by our intuition in Section 4.1. After obtaining the weights, we adjust the p-values and apply the Benjamini-Hochberg (BH) procedure (Benjamini and Hochberg, 1995) to the adjusted p-values $p_i^{adj} = p_i / w_i$. Finally, we search all possible choices of (a1, a2) among $\mathcal{A}$, and find the one that yields the largest number of rejections. We apply BH again to the adjusted p-values under this choice of (a1, a2) to obtain the final adjusted rejection region. We summarize this modified simultaneous testing procedure in Algorithm 2.
We discuss some specific choices of the parameters in Algorithm 2. First, the number of groups K is usually set at K = 3. As shown in Xia et al. (2019a), when K ≤ 4, there is little additional power gain, but a more expensive computation. Second, the constants C1 and C2 can be chosen so that $C_1 (\log p)^{1/2}$ equals the smallest value of the auxiliary statistics and $C_2 (\log p)^{1/2}$ equals the largest value of the auxiliary statistics. If the absolute values of the smallest and largest auxiliary statistics exceed $16 (\log p)^{1/2}$, we truncate at C1 = −16 and C2 = 16 to stabilize and expedite the computation. We note that, if the network data are non-negative, such as the binary and Poisson network data, then both C1 and C2 are non-negative. By contrast, in Xia et al. (2019a), C1 and C2 were fixed at −4 and 4. Finally, N can be any integer for the theoretical validity. Numerically, a larger value of N implies a more precise grid search, but at the cost of a heavier computational burden. We choose N such that the gap between two adjacent grid points, (log p)1/2/N, approximately equals 0.1.
Algorithm 2.
Adjusted simultaneous inference with FDR control and power enhancement.
Step 1: (1.1) Compute the test statistics {Ti} and the p-values pi = 2{1 − Φ(|Ti|)}, i = 1, …, q. (1.2) Compute the auxiliary statistics {Ai}. (1.3) Choose the constants C1, C2 and N as described in Section 4.3. (1.4) Construct the grid $\mathcal{A} = \{ c (\log p)^{1/2} : c = C_1, C_1 + 1/N, \ldots, C_2 \}$.
Step 2: For each pair of grid points a1 < a2 in $\mathcal{A}$: (2.1) divide {1, …, q} into the groups $\mathcal{G}_1, \mathcal{G}_2, \mathcal{G}_3$; (2.2) estimate the stabilized proportions $\hat{\pi}_k^{\epsilon}$, k = 1, 2, 3; (2.3) compute the weights wi in (5) and the adjusted p-values $p_i^{adj} = p_i / w_i$; (2.4) apply the BH procedure at level α to $\{p_i^{adj}\}$ and record the number of rejections.
Step 3: (3.1) Find the pair $(\hat{a}_1, \hat{a}_2)$ that yields the largest number of rejections in Step 2. (3.2) Form the corresponding groups and weights. (3.3) Compute the final adjusted p-values $\{p_i^{adj}\}$ and their order statistics $p_{(1)}^{adj} \le \cdots \le p_{(q)}^{adj}$. (3.4) Apply the BH procedure, i.e., compute the cutoff $\hat{k} = \max\{k : p_{(k)}^{adj} \le \alpha k / q\}$, and reject the hypotheses with $p_i^{adj} \le p_{(\hat{k})}^{adj}$.
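One inner pass of Algorithm 2, for a fixed pair of grid points, can be sketched in R as follows; the Storey-type proportion estimate with tuning parameter λ = 0.5 is an assumption made for illustration.

```r
# A minimal sketch of Steps 2.1-2.4 for fixed a1 < a2: group by the auxiliary
# statistics, estimate and stabilize the alternative proportions, form the
# normalized weights in (5), and apply BH to the adjusted p-values.
weighted_bh <- function(pval, A, a1, a2, alpha = 0.05, eps = 1e-5) {
  q   <- length(pval)
  grp <- cut(A, c(-Inf, a1, a2, Inf), labels = FALSE)     # K = 3 groups
  pi_hat <- sapply(1:3, function(k) {                     # Storey-type estimate
    pk <- pval[grp == k]
    min(max(1 - sum(pk > 0.5) / (0.5 * max(length(pk), 1)), eps), 1 - eps)
  })
  m    <- tabulate(grp, nbins = 3)
  odds <- pi_hat / (1 - pi_hat)
  w    <- odds[grp] / (sum(m * odds) / q)                 # weights in (5)
  p_adj  <- pmin(pval / w, 1)
  sorted <- sort(p_adj)
  khat <- max(c(0, which(sorted <= alpha * (1:q) / q)))   # BH cutoff
  if (khat > 0) which(p_adj <= sorted[khat]) else integer(0)
}
```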
4.4. FDR control and power enhancement
We next show that the modified inference of Algorithm 2 is asymptotically more powerful than Algorithm 1, while still asymptotically controlling the false discovery.
Denote by $\{p_i^{adj}\}_{i=1}^{q}$ the adjusted p-values from Algorithm 2, and by $p_{(1)}^{adj} \le \cdots \le p_{(q)}^{adj}$ the ordered adjusted p-values. The corresponding adjusted FDP is
$$\mathrm{FDP}_{adj} = \frac{\sum_{i \in \mathcal{H}_0} I\left( p_i^{adj} \le p_{(\hat{k})}^{adj} \right)}{\max\left\{ \sum_{i=1}^{q} I\left( p_i^{adj} \le p_{(\hat{k})}^{adj} \right), 1 \right\}},$$
where $\hat{k}$ is the cutoff obtained from Step 3.4 of Algorithm 2, and I(·) is the indicator function. Accordingly, FDRadj = E(FDPadj). The next theorem shows that the modified procedure can still control the FDR and FDP asymptotically.
Theorem 4. Suppose (A2), (B1)-(B3), and (C1) hold, with $p \le c n^{\gamma_0}$ for some constants γ0, c > 0. Then,
$$\limsup_{(n_1, n_2, q) \to \infty} \mathrm{FDR}_{adj} \le \alpha, \quad \text{and} \quad P\left( \mathrm{FDP}_{adj} \le \alpha + \epsilon \right) \to 1 \text{ for any } \epsilon > 0.$$
Next, denote the power of the testing procedures of Algorithms 1 and 2 by Ψ and Ψadj, respectively. That is,
$$\Psi = \frac{1}{q_1} E\left\{ \sum_{(i,j) \in \mathcal{H}_1} I\left( |T_{i,j}| \ge \hat{h} \right) \right\}, \quad \Psi_{adj} = \frac{1}{q_1} E\left\{ \sum_{i \in \mathcal{H}_1} I\left( p_i^{adj} \le p_{(\hat{k})}^{adj} \right) \right\}.$$
The next theorem shows that, by incorporating the auxiliary statistics Ai,j, the modified simultaneous testing procedure of Algorithm 2 is asymptotically more powerful than Algorithm 1, which is solely based on the test statistics Ti,j.
Theorem 5. Suppose the same conditions as in Theorem 4 hold. Then,
$$\liminf_{(n_1, n_2, q) \to \infty} \left( \Psi_{adj} - \Psi \right) \ge 0.$$
4.5. Comparison to GAP
Although motivated by the GAP method of Xia et al. (2019a), our power enhancement procedure is considerably different. While GAP tackled the problem of mean comparison for vector-valued samples, we target the problem of network mean comparison. This leads to a different set of test and auxiliary statistics, as well as a number of additional intrinsic differences.
First, the two methods impose different assumptions. A key requirement for GAP to enhance the power is that the parameters of interest from each group are individually sparse. In our setup, however, the parameters may all be nonzero. For instance, in a binary network or a count network, all the entries of both means s1 and s2 are usually positive. As such, the means may not be individually sparse. Our procedure instead only requires that the difference of the two means, s1 − s2, be sparse, which reasonably holds and is often imposed in numerous applications including brain connectivity analysis (Zhu and Li, 2018).
Second, the two methods differ in terms of the range of the auxiliary statistics that contribute most to the power enhancement. Consider the case when K = 3. In Xia et al. (2019a), since both means are assumed to be individually sparse, the tests that are more likely to be adjusted and rejected are those with the corresponding auxiliary statistics either being negative and small, or positive and large. That is, the power enhancement hinges more on the tests in $\mathcal{G}_1$ and $\mathcal{G}_3$, whose auxiliary statistics are small or large. However, in our setup, the individual means s1 and s2 can both be dense and their entries all positive. Instead, we only assume that s1 − s2 is sparse. Take a binary brain connectivity network as an example. The observed networks are often sparse, in that most links are zero, since it is known that brain connections are energy consuming and biological units tend to minimize energy-consuming activities (Raichle and Gusnard, 2002; Bullmore and Sporns, 2009). This translates to small connection probabilities for most entries of s1 and s2, while all these probabilities are positive. Moreover, the difference of the means between the two populations is often sparse, which translates to equal connection probabilities for most entries of s1 and s2, or equivalently zero difference for most entries of s1 − s2. This is similar to the toy example we discuss in Section 4.1. For such cases, as a consequence of Algorithm 2, the tests whose corresponding auxiliary statistics are too small or too large would be adjusted so that they are less likely to be rejected. Instead, those tests whose auxiliary statistics are in between would be adjusted so that they are more likely to be rejected. In other words, the power enhancement in our setup may hinge more on $\mathcal{G}_2$, rather than $\mathcal{G}_1$ and $\mathcal{G}_3$.
Third, due to the above difference, the grid construction in Step 1.4 of Algorithm 2 is noticeably different from that of GAP in Xia et al. (2019a). Specifically, in Xia et al. (2019a), to ensure the inclusion of the important locations in $\mathcal{G}_1$ and $\mathcal{G}_3$, the constants C1 and C2 can simply be fixed at −4 and 4, respectively, so that the upper bound of the small negative auxiliary statistics and the lower bound of the large positive auxiliary statistics are attained within the grid $\mathcal{A}$. By contrast, for our problem, the upper bound of the auxiliary statistics in the union support can go beyond the bound in Xia et al. (2019a), i.e., $4 (\log p)^{1/2}$, and the lower bound of the auxiliary statistics in the union support can be non-negative. Since the tests in $\mathcal{G}_2$ are more likely to be adjusted and rejected, we need a more thorough grid construction, and we choose the constants C1 and C2 based on the smallest and largest values of the auxiliary statistics, as described in Section 4.3.
5. Simulations
We first present the simulation setup, where we consider different network structures, sparsity levels, network sizes and sample sizes. We then investigate the empirical performance of the global test, and compare the two simultaneous tests, Algorithms 1 and 2.
5.1. Setup
We consider p × p networks, with two network sizes, p = 100 and 200. This results in q = 100 × 99/2 = 4950 and q = 200 × 199/2 = 19900 links, respectively. We consider five common network settings: the Bernoulli distribution, with the binary alternative links generated from a power-law distribution, a stochastic block model, or an Erdös-Rényi model, the Bernoulli mixture distribution, and the transformed Wishart distribution. For each network setting, we further consider three sparsity levels.
- Bernoulli: Select the sets $\mathcal{H}_1$ and $\mathcal{H}_0$ of the q hypotheses according to the following models generated by the R package igraph, with $|\mathcal{H}_1| = k_q$. Here kq is a parameter that controls the sparsity level, and is specified later.
  - Power-law distribution: p nodes and kq/2 edges, with the power-law exponent of the degree distribution set to 2.1, and all other parameters set to their default values.
  - Stochastic block model: 2 blocks and a diagonal Bernoulli rate matrix, with the diagonal values set to kq/(2q).
  - Erdös-Rényi model: p nodes and kq/2 edges.
  - Let $\mathcal{H}_0 = \{(i,j) : 1 \le i < j \le p\} \setminus \mathcal{H}_1$. For $(i,j) \in \mathcal{H}_0$, generate Sd,l,i,j ~ Bernoulli(0.3). For $(i,j) \in \mathcal{H}_1$, generate Sd,l,i,j ~ Bernoulli(rd,i,j), where r1,i,j is set to 0.5 with probability 0.1, and 0.8 otherwise, whereas r2,i,j is set to 0.8 with probability 0.1, and 0.5 otherwise.
- Bernoulli mixture: Generate $\mathcal{H}_1$ and $\mathcal{H}_0$ in the same way as before. Generate Sd,l,i,j ~ Bernoulli(rd,i,j), where rd,i,j = πi,j rd,1,i,j + (1 − πi,j) rd,2,i,j, with πi,j ~ Uniform(0, 1). For $(i,j) \in \mathcal{H}_0$, rd,1,i,j = rd,2,i,j = 0.3, d = 1, 2. For $(i,j) \in \mathcal{H}_1$, r1,1,i,j is set to 0.5 with probability 0.1, and 0.7 otherwise, whereas r2,1,i,j is set to 0.7 with probability 0.1, and 0.5 otherwise, and rd,2,i,j = rd,1,i,j + 0.2.
- Wishart with logarithm transformation: Select the sets $\mathcal{H}_1$ and $\mathcal{H}_0$ of the q hypotheses, uniformly and randomly, with $|\mathcal{H}_1| = k_q$. Generate Σd with nonzero entries on $\mathcal{H}_1$ that differ between d = 1, 2, and add a common diagonal shift δI to ensure positive definiteness, where I is the identity matrix. Generate the Wishart matrix Wd,l from m = 300 independent N(0, Σd) vectors, scaled by m−1, and set the observed count network as Gd,l = round{exp(Wd,l)}, where round(·) rounds a number to the nearest integer; the tests are then applied to the log-transformed data Sd,l = log(Gd,l).
For each network structure, the parameter kq controls the sparsity level, and we examine three levels, kq = 0.2q, 0.15q and 0.1q, where q is the total number of the network links.
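As an illustration, the R sketch below generates one Bernoulli power-law configuration. The igraph generator sample_pa() and the simplified alternative probabilities (fixed at 0.8 versus 0.5) are assumptions for illustration, not necessarily the exact settings described above.

```r
# A minimal sketch of the Bernoulli power-law setting: a scale-free graph
# supplies the alternative set H1, and two groups of symmetric binary networks
# are drawn with Bernoulli(0.3) links off H1 and unequal link probabilities on H1.
library(igraph)
set.seed(1)
p <- 100; q <- p * (p - 1) / 2; kq <- round(0.1 * q); n <- 25
g  <- sample_pa(p, power = 2.1, m = ceiling(kq / (2 * p)), directed = FALSE)
H1 <- t(apply(as_edgelist(g), 1, sort))        # alternative links, with i < j
gen_group <- function(n, rH1) {
  replicate(n, {
    S <- matrix(rbinom(p * p, 1, 0.3), p, p)   # null links
    S[H1] <- rbinom(nrow(H1), 1, rH1)          # alternative links (upper triangle)
    S[lower.tri(S)] <- t(S)[lower.tri(S)]      # symmetrize
    S
  }, simplify = "array")
}
S1 <- gen_group(n, 0.8)                        # group 1, a p x p x n array
S2 <- gen_group(n, 0.5)                        # group 2
```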
5.2. Results
First, we investigate the empirical size of the proposed global test Ψα for the global testing problem (1). For this problem, the population network means are equal under the null hypothesis, and we set s1 = s2, with the sample size n1 = n2 = 500. We also compare with the global testing method aSPU developed by Kim et al. (2014), implemented in the R package aSPU. Table 1 reports the empirical size of the two tests, in percentage, based on 1000 data replications under the significance level α = 5%. It is clearly seen from the table that our proposed global testing procedure controls the type I error reasonably well, while the aSPU method has a slight, though not severe, size inflation in some cases. In addition, we report the computation time of each method, in seconds, averaged over the three sparsity levels and all replications. It is seen from the last two columns of the table that the average computation time of aSPU is much longer than that of our method; e.g., for p = 100 and p = 200, it is about 9 and 15 times that of our method, respectively. This is because Kim et al. (2014) did not derive the theoretical null distribution of their test statistic, but instead employed permutation to obtain the critical value, which results in a more time-consuming procedure. We also note that, in this setting, the sample size is much smaller than the total number of hypotheses q, but is larger than the sample size we use in the multiple testing simulations. This is due to the relatively slow convergence rate of the normal approximation for Bernoulli data and of the maximum-type statistic.
Table 1:
The empirical size and computation time for our global test Ψα and the aSPU test of Kim et al. (2014). The empirical size is in percentage based on 1000 data replications. The computation time is in seconds averaged over three sparsity levels and all replications. The significance level is α = 5%, and the sample size is n1 = n2 = 500.
| Network structure | Method | p = 100, 0.2q | p = 100, 0.15q | p = 100, 0.1q | p = 200, 0.2q | p = 200, 0.15q | p = 200, 0.1q | Time, p = 100 | Time, p = 200 |
|---|---|---|---|---|---|---|---|---|---|
| Power-law | Ψα | 4.1 | 5.1 | 5.6 | 5.0 | 3.6 | 4.9 | 0.52 | 1.99 |
| Power-law | aSPU | 5.8 | 6.7 | 6.0 | 5.8 | 5.6 | 5.7 | 4.86 | 32.6 |
| Stochastic block | Ψα | 4.3 | 5.4 | 5.2 | 5.9 | 4.4 | 5.1 | 0.56 | 2.00 |
| Stochastic block | aSPU | 6.2 | 6.1 | 5.5 | 6.2 | 7.0 | 5.4 | 4.66 | 32.8 |
| Erdös-Rényi | Ψα | 5.3 | 5.4 | 4.1 | 4.8 | 5.4 | 5.4 | 0.52 | 2.01 |
| Erdös-Rényi | aSPU | 5.9 | 7.5 | 5.2 | 6.1 | 5.3 | 6.4 | 4.74 | 32.3 |
| Bernoulli mixture | Ψα | 6.2 | 5.1 | 4.9 | 4.8 | 3.4 | 5.5 | 0.51 | 2.06 |
| Bernoulli mixture | aSPU | 6.8 | 6.3 | 5.4 | 7.0 | 5.4 | 6.0 | 4.77 | 32.9 |
| Transformed Wishart | Ψα | 4.9 | 4.5 | 5.7 | 4.2 | 4.8 | 4.5 | 0.54 | 2.23 |
| Transformed Wishart | aSPU | 6.5 | 6.9 | 6.3 | 6.5 | 6.8 | 5.7 | 4.42 | 28.1 |
Next, we examine the empirical FDR and the empirical power of the simultaneous testing procedures for the multiple testing problem (2). We consider two sample sizes, n1 = n2 = 100 and n1 = n2 = 25, where the latter mimics the real data setting in which the sample size is very limited. We apply both Algorithms 1 and 2, one without and one with the proposed power enhancement. Table 2 reports the empirical FDR and power, both in percentage, based on 100 replications under the significance level α = 5%, for the Bernoulli network structure. Table 3 reports the results for the Bernoulli mixture and the Wishart with logarithm transformation. It is seen that, in all cases, the empirical FDRs are generally controlled under the nominal level by both algorithms. Algorithm 2 is slightly more conservative than Algorithm 1, mainly due to the normalization step of the weight calculation in (5). A similar phenomenon was also observed in Xia et al. (2019a). For the empirical power, Algorithm 2 achieves a clear improvement over Algorithm 1, without sacrificing the size of the test. This is mainly due to the utilization of the auxiliary information in Algorithm 2. Furthermore, the performance under the varying sample sizes confirms the power enhancement of Algorithm 2, as theoretically established in Section 4.4. We also observe that the power gain becomes more substantial when the true difference s1 − s2 becomes sparser, which agrees with our intuition explained in Section 4.1.
Table 2:
The empirical FDR and the empirical power for the simultaneous testing procedures, Algorithms 1 and 2. The results are in percentages based on 100 data replications. The significance level is α = 5%. The network structure is Bernoulli.
In each row, the four blocks of three columns correspond to (p, n1 = n2) = (100, 100), (100, 25), (200, 100), and (200, 25), from left to right; within each block, the sub-columns correspond to the sparsity levels kq = 0.2q, 0.15q, 0.1q.

| Network structure | Method | 0.2q | 0.15q | 0.1q | 0.2q | 0.15q | 0.1q | 0.2q | 0.15q | 0.1q | 0.2q | 0.15q | 0.1q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bernoulli, power-law: empirical FDR | Algorithm 1 | 4.1 | 4.5 | 4.7 | 6.2 | 6.3 | 7.2 | 4.2 | 4.4 | 4.9 | 5.8 | 6.5 | 7.5 |
| Bernoulli, power-law: empirical FDR | Algorithm 2 | 2.6 | 2.6 | 2.9 | 3.5 | 4.6 | 5.3 | 2.2 | 2.3 | 2.5 | 3.5 | 4.9 | 5.1 |
| Bernoulli, power-law: empirical power | Algorithm 1 | 88.7 | 87.0 | 84.7 | 42.2 | 40.8 | 39.7 | 88.7 | 87.0 | 84.9 | 41.6 | 40.5 | 39.9 |
| Bernoulli, power-law: empirical power | Algorithm 2 | 92.1 | 91.7 | 90.9 | 54.8 | 54.1 | 53.4 | 92.3 | 91.8 | 91.0 | 54.2 | 53.9 | 53.2 |
| Bernoulli, stochastic block: empirical FDR | Algorithm 1 | 4.3 | 4.4 | 4.8 | 6.1 | 6.4 | 7.7 | 4.2 | 4.5 | 4.9 | 5.8 | 6.3 | 7.6 |
| Bernoulli, stochastic block: empirical FDR | Algorithm 2 | 2.8 | 2.7 | 3.0 | 3.5 | 4.5 | 5.5 | 2.2 | 2.3 | 2.5 | 3.5 | 5.0 | 5.0 |
| Bernoulli, stochastic block: empirical power | Algorithm 1 | 89.0 | 87.1 | 84.8 | 41.5 | 40.4 | 40.0 | 89.0 | 87.0 | 84.9 | 41.4 | 40.4 | 40.1 |
| Bernoulli, stochastic block: empirical power | Algorithm 2 | 92.2 | 91.7 | 90.8 | 54.5 | 54.5 | 54.0 | 92.5 | 91.9 | 90.9 | 54.2 | 53.9 | 53.4 |
| Bernoulli, Erdös-Rényi: empirical FDR | Algorithm 1 | 4.1 | 4.4 | 4.8 | 6.0 | 6.0 | 7.3 | 4.0 | 4.4 | 4.8 | 5.9 | 5.2 | 7.3 |
| Bernoulli, Erdös-Rényi: empirical FDR | Algorithm 2 | 2.3 | 2.6 | 2.9 | 3.9 | 4.1 | 5.7 | 2.1 | 2.2 | 2.4 | 3.7 | 4.5 | 5.1 |
| Bernoulli, Erdös-Rényi: empirical power | Algorithm 1 | 88.0 | 86.9 | 84.7 | 44.1 | 41.8 | 40.6 | 88.1 | 86.8 | 84.5 | 44.4 | 42.0 | 40.8 |
| Bernoulli, Erdös-Rényi: empirical power | Algorithm 2 | 91.8 | 91.3 | 90.6 | 54.7 | 54.4 | 53.3 | 91.9 | 91.4 | 90.5 | 54.6 | 54.1 | 53.6 |
Table 3:
The empirical FDR and the empirical power for the simultaneous testing procedures, Algorithms 1 and 2. The results are in percentages based on 100 data replications. The significance level is α = 5%. The network structures are Bernoulli mixture and transformed Wishart.
In each row, the four blocks of three columns correspond to (p, n1 = n2) = (100, 100), (100, 25), (200, 100), and (200, 25), from left to right; within each block, the sub-columns correspond to the sparsity levels kq = 0.2q, 0.15q, 0.1q.

| Network structure | Method | 0.2q | 0.15q | 0.1q | 0.2q | 0.15q | 0.1q | 0.2q | 0.15q | 0.1q | 0.2q | 0.15q | 0.1q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bernoulli mixture: empirical FDR | Algorithm 1 | 4.0 | 4.4 | 4.8 | 6.1 | 6.0 | 7.4 | 4.0 | 4.4 | 4.8 | 6.0 | 5.8 | 7.5 |
| Bernoulli mixture: empirical FDR | Algorithm 2 | 1.4 | 1.7 | 2.0 | 3.2 | 4.2 | 5.0 | 1.3 | 1.5 | 1.7 | 2.9 | 4.3 | 4.5 |
| Bernoulli mixture: empirical power | Algorithm 1 | 88.3 | 87.2 | 85.6 | 41.8 | 40.8 | 41.1 | 88.2 | 87.1 | 85.7 | 41.6 | 40.7 | 41.2 |
| Bernoulli mixture: empirical power | Algorithm 2 | 93.8 | 93.6 | 93.6 | 54.2 | 54.3 | 54.1 | 93.5 | 93.6 | 93.5 | 53.9 | 54.2 | 54.1 |
| Transformed Wishart: empirical FDR | Algorithm 1 | 4.2 | 4.6 | 5.1 | 5.0 | 5.6 | 6.6 | 4.2 | 4.6 | 4.9 | 4.9 | 5.3 | 5.9 |
| Transformed Wishart: empirical FDR | Algorithm 2 | 1.6 | 1.8 | 2.0 | 2.4 | 2.8 | 3.7 | 1.6 | 1.9 | 2.0 | 1.8 | 2.1 | 2.6 |
| Transformed Wishart: empirical power | Algorithm 1 | 63.5 | 65.9 | 69.6 | 44.1 | 46.8 | 50.6 | 52.6 | 55.7 | 60.4 | 37.5 | 40.2 | 43.1 |
| Transformed Wishart: empirical power | Algorithm 2 | 70.9 | 73.8 | 78.4 | 50.3 | 54.5 | 59.9 | 59.8 | 63.9 | 69.9 | 41.4 | 45.2 | 50.0 |
6. Brain Connectivity Analysis
We illustrate our method with two brain connectivity analysis examples.
6.1. Structural connectivity analysis
The first example is a brain structural connectivity analysis of diffusion tensor images (DTI). DTI is a magnetic resonance imaging technique that measures the diffusion of water molecules to map white matter tractography in the brain. The data we analyze is the KKI-42 dataset, available at http://openconnecto.me/data/public/MR/archive/, and its detailed description can be found in Landman et al. (2011). This dataset consists of 21 subjects with no history of neurological conditions, aged 22 to 61 years. Each subject received two resting-state DTI scans in a scan-rescan imaging session. For simplicity, we treat the data as if those images were from independent samples, which is common in the analysis of this dataset (Wang et al., 2017). This results in a total sample size of 42 for this study. Brain regions are constructed following the Desikan Atlas (Desikan et al., 2006), leading to p = 68 regions equally divided between the left and right hemispheres. Each DTI image has been preprocessed and summarized in the form of a 68 × 68 network, where the edges record the total number of white matter fibers between the pair of nodes. It is also equally common to focus on the form of a binary network, where the edges become binary indicators of the presence or absence of white matter fibers (Wang et al., 2017). We partition the subjects into two age groups: those who are younger than 30 years, and those who are 30 or older. Age 30 is a transition period, usually known as the "age 30 transition," when the first phase of early adulthood comes to a close and the basis for the next life structure is formed. Moreover, this partition yields about the same number of subjects in each group, with n1 = 22 for the younger-than-30 age group, and n2 = 20 for the 30-or-older age group. We study the age-related difference in structural connectivity patterns, which is of broad interest, as aging is the main risk factor for the progressive loss of both structure and function of brain neurons (Morrison and Hof, 1997).
We apply both multiple testing procedures, Algorithms 1 and 2, to this dataset, first for the binary network, then for the count network with a logarithm transformation. We set the significance level at 0.05. For the binary network, out of the total of 2278 links, Algorithm 1 identifies 2 significantly different links, whereas the power-enhanced Algorithm 2 identifies 8 links, including the first link found by Algorithm 1 plus 7 additional links. For the count network, Algorithm 1 identifies 4 significantly different links, whereas Algorithm 2 identifies 15 links, including all the links found by Algorithm 1 plus 11 additional links. These results agree with both our theory and simulations, in that Algorithm 2 is usually able to recognize more significant links than Algorithm 1. Table 4 reports the links identified by the two algorithms for both types of network data. Some links found by our power-enhanced procedure agree with the neuroscience literature; for instance, the link between the left fusiform and the left temporal pole under the count network. The temporal pole, also known as Brodmann area 38, is a paralimbic region involved in high-level semantic memory and socio-emotional processing. The fusiform gyrus is part of the temporal and occipital lobes in Brodmann area 37, and is linked with various neural pathways related to recognition. Li et al. (2013) also found significant differences in structural connectivity patterns between the left fusiform and the left temporal pole, for young subjects (18 to 23 years old) versus middle-aged and old subjects (30 to 58, and 61 to 89 years old). Meanwhile, other links found by our procedure require further scientific validation; for instance, the links between the left temporal pole and the orbitofrontal cortex. The latter is a prefrontal cortex region in the frontal lobe of the brain involved in the cognitive process of decision-making.
Table 4:
Structural connectivity analysis of the KKI-42 dataset. Reported are the significantly different links found by Algorithms 1 and 2 for the binary and count network data, respectively.
| Binary network, Algorithm 1 | Binary network, Algorithm 2 | Count network, Algorithm 1 | Count network, Algorithm 2 |
|---|---|---|---|
| r.posteriorcingulate ↔ l.superiorparietal | r.posteriorcingulate ↔ l.superiorparietal | r.corpuscallosum ↔ l.superiorparietal | r.corpuscallosum ↔ l.superiorparietal |
| r.posteriorcingulate ↔ l.supramarginal | r.precuneus ↔ l.postcentral | l.isthmuscingulate ↔ l.posteriorcingulate | l.isthmuscingulate ↔ l.posteriorcingulate |
| - | r.caudalanteriorcingulate ↔ r.lingual | r.caudalmiddlefrontal ↔ r.rostralmiddlefrontal | r.caudalmiddlefrontal ↔ r.rostralmiddlefrontal |
| - | r.posteriorcingulate ↔ l.caudalmiddlefrontal | l.lateralorbitofrontal ↔ l.superiorfrontal | l.lateralorbitofrontal ↔ l.superiorfrontal |
| - | l.lateraloccipital ↔ l.parsopercularis | - | r.posteriorcingulate ↔ l.precuneus |
| - | r.superiorparietal ↔ l.precentral | - | r.caudalmiddlefrontal ↔ r.parstriangularis |
| - | r.paracentral ↔ l.superiorparietal | - | l.fusiform ↔ l.temporalpole |
| - | l.bankssts ↔ l.frontalpole | - | l.entorhinal ↔ l.lateralorbitofrontal |
| - | - | - | r.corpuscallosum ↔ l.precuneus |
| - | - | - | l.caudalmiddlefrontal ↔ l.pericalcarine |
| - | - | - | r.bankssts ↔ r.postcentral |
| - | - | - | l.lateralorbitofrontal ↔ l.temporalpole |
| - | - | - | l.parsopercularis ↔ l.rostralmiddlefrontal |
| - | - | - | l.medialorbitofrontal ↔ l.temporalpole |
| - | - | - | l.corpuscallosum ↔ l.superiorparietal |
6.2. Functional connectivity analysis
The second example is a brain functional connectivity analysis of functional magnetic resonance images (fMRI). Functional MRI measures blood oxygen level dependent signals, and provides a tool to study the brain functional connectivity network. The data we analyze is the ADHD-200 dataset, available at http://neurobureau.projects.nitrc.org/ADHD200/Data.html. A more detailed description can be found in Ahn et al. (2015). Attention deficit hyperactivity disorder (ADHD) is one of the most commonly diagnosed child-onset neurodevelopmental disorders, with an estimated childhood prevalence of 5−10% worldwide (Pelham et al., 2007). This data consists of 96 subjects with ADHD and 91 normal controls. Each subject received a resting-state fMRI scan, and each brain image is parcellated using the Anatomical Automatic Labeling (AAL) atlas with p = 116 regions (Tzourio-Mazoyer et al., 2002). The resulting data is a spatial-temporal matrix, which is then turned into a Pearson correlation matrix or a partial correlation matrix to represent the brain functional connectivity network. Both correlation measures are frequently used in functional connectivity analysis (Bullmore and Sporns, 2009). We use both measures to study the difference in functional connectivity patterns between the two groups of subjects with and without ADHD.
We again apply both multiple testing procedures, Algorithms 1 and 2, to this dataset, first for the Pearson correlation network, then for the partial correlation network. For the Pearson correlation network, Algorithm 1 identifies no significantly different links, whereas the power-enhanced Algorithm 2 identifies 3 links. For the partial correlation network, Algorithm 1 again identifies no significantly different links, whereas the power-enhanced Algorithm 2 identifies 5 links. Table 5 reports the links identified by the two algorithms for both types of network data. One brain region where the differentiating links concentrate is the cerebellum. The cerebellum is responsible for motor control and cognitive functions such as attention and language, and dysfunction of the cerebellum in ADHD patients has been reported (Toplak et al., 2006). We also remark that fewer links are found here than in Xia and Li (2019). This is because the data in the format of a spatial-temporal matrix analyzed in Xia and Li (2019) carry more information than the data in the format of a correlation matrix. Nevertheless, the focus of this article is to develop inferential tests for scientific applications where only the data format of a symmetric network matrix is available.
Table 5:
Functional connectivity analysis of the ADHD-200 dataset. Reported are the significantly different links found by Algorithms 1 and 2 for the Pearson correlation and partial correlation network data, respectively.
| Pearson correlation, Algorithm 1 | Pearson correlation, Algorithm 2 | Partial correlation, Algorithm 1 | Partial correlation, Algorithm 2 |
|---|---|---|---|
| - | r.frontal.sup ↔ r.frontal.med.orb | - | l.paracentral.lobule ↔ r.paracentral.lobule |
| - | r.cerebelum6 ↔ r.cerebelum8 | - | r.frontal.sup.orb ↔ r.frontal.mid.orb |
| - | l.cerebelum8 ↔ vermis7 | - | r.frontal.inf.orb ↔ l.temporal.pole.sup |
| - | - | - | r.fusiform ↔ r.cerebelum6 |
| - | - | - | l.frontal.sup ↔ l.frontal.mid |
7. Conclusion
In this article, we develop both global and simultaneous inference methods for network comparisons when the data are observed in the form of p × p matrices, each of which encodes the network structure for an individual subject. This data format is different from those studied in the existing network literature, and leads to a different set of testing procedures and the associated theory. In addition, we propose a power enhancement approach to tackle the challenge of limited sample size in numerous applications.
In this article, we have focused on the scenario where a symmetric matrix encodes the network structure. In principle, our methods can be extended to the asymmetric matrix scenario as well, with corresponding modifications to the total number of tests and the related theoretical properties. In the interest of space, we leave this extension for future research.
Acknowledgement
Xia’s research was partially supported by NSFC grants 11771094, 11690013 and The Recruitment Program of Global Experts Youth Project. Li’s research was partially supported by NSF grant DMS-1613137 and NIH grants R01AG061303, R01AG062542 and R01AG034570.
Supplementary Material
The additional lemmas and theorem proofs are available in the online supplementary material.
References
- Ahn M, Shen H, Lin W, and Zhu H (2015). A sparse reduced rank framework for group analysis of functional neuroimaging data. Statistica Sinica, 25:295–312.
- Benjamini Y and Hochberg Y (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57:289–300.
- Bickel PJ and Levina E (2008). Regularized estimation of large covariance matrices. The Annals of Statistics, 36:199–227.
- Bullmore E and Sporns O (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10:186–198.
- Cai TT, Liu W, and Xia Y (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108:265–277.
- Cai TT, Liu W, and Xia Y (2014). Two-sample test of high dimensional means under dependency. Journal of the Royal Statistical Society, Series B, 76:349–372.
- Chen S, Kang J, Xing Y, and Wang G (2015). A parsimonious statistical method to detect groupwise differentially expressed functional connectivity networks. Human Brain Mapping, 36:5196–5206.
- Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL, Dale AM, Maguire RP, Hyman BT, et al. (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31:968–980.
- Durante D and Dunson DB (2018). Bayesian inference and testing of group differences in brain networks. Bayesian Analysis, 13:29–58.
- Fornito A, Zalesky A, and Breakspear M (2013). Graph analysis of the human connectome: Promise, progress, and pitfalls. NeuroImage, 80:426–444.
- Fox MD and Greicius M (2010). Clinical applications of resting state functional connectivity. Frontiers in Systems Neuroscience, 4.
- Ginestet CE, Li J, Balachandran P, Rosenberg S, and Kolaczyk ED (2017). Hypothesis testing for network data in functional neuroimaging. The Annals of Applied Statistics, 11(2):725–750.
- Kim J, Wozniak JR, Mueller BA, Shen X, and Pan W (2014). Comparison of statistical tests for group differences in brain functional networks. NeuroImage, 101:681–694.
- Lan W, Fang Z, Wang H, and Tsai C-L (2018). Covariance matrix estimation via network structure. Journal of Business & Economic Statistics, 36(2):359–369.
- Landman BA, Huang AJ, Gifford A, Vikram DS, Lim IAL, Farrell JA, Bogovic JA, Hua J, Chen M, Jarso S, et al. (2011). Multi-parametric neuroimaging reproducibility: a 3-T resource study. NeuroImage, 54:2854–2866.
- Li J and Chen SX (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40:908–940.
- Li X, Pu F, Fan Y, Niu H, Li S, and Li D (2013). Age-related changes in brain structural covariance networks. Frontiers in Human Neuroscience, 7:98.
- Liu W (2013). Gaussian graphical model estimation with false discovery rate control. The Annals of Statistics, 41:2948–2978.
- Luscombe NM, Madan Babu M, Yu H, Snyder M, Teichmann SA, and Gerstein M (2004). Genomic analysis of regulatory network dynamics reveals large topological changes. Nature, 431:308–312.
- Morrison JH and Hof PR (1997). Life and death of neurons in the aging brain. Science, 278:412–419.
- Pelham WE, Foster EM, and Robb JA (2007). The economic impact of attention-deficit/hyperactivity disorder in children and adolescents. Ambulatory Pediatrics, 7(1, Supplement):121–131.
- Qiu H, Han F, Liu H, and Caffo B (2016). Joint estimation of multiple graphical models from high dimensional time series. Journal of the Royal Statistical Society, Series B, 78:487–504.
- Raichle ME and Gusnard DA (2002). Appraising the brain's energy budget. Proceedings of the National Academy of Sciences, 99:10237–10239.
- Rothman AJ, Bickel PJ, Levina E, and Zhu J (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494–515.
- Schott JR (2007). Some high-dimensional tests for a one-way MANOVA. Journal of Multivariate Analysis, 98:1825–1839.
- Schweder T and Spjøtvoll E (1982). Plots of p-values to evaluate many tests simultaneously. Biometrika, 69:493–502.
- Storey JD (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64:479–498.
- Toplak ME, Dockstader C, and Tannock R (2006). Temporal information processing in ADHD: Findings to date and new methods. Journal of Neuroscience Methods, 151(1):15–29.
- Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, and Joliot M (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage, 15(1):273–289.
- Van de Geer S, Bühlmann P, Ritov Y, and Dezeure R (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42:1166–1202.
- Wang L, Zhang Z, and Dunson D (2017). Common and individual structure of multiple networks. arXiv preprint arXiv:1707.06360.
- Wang Y, Kang J, Kemmer PB, and Guo Y (2016). An efficient and reliable statistical method for estimating functional connectivity in large scale brain networks using partial correlation. Frontiers in Neuroscience, 10:1–17.
- Xia Y, Cai T, and Cai TT (2015). Testing differential networks with applications to the detection of gene-gene interactions. Biometrika, 102:247–266.
- Xia Y, Cai TT, and Sun W (2019a). GAP: a general framework for information pooling in two-sample sparse inference. Journal of the American Statistical Association, to appear.
- Xia Y and Li L (2019). Matrix graph hypothesis testing and application in brain connectivity alternation detection. Statistica Sinica, 29:303–328.
- Xia Y, Li L, Lockhart SN, and Jagust WJ (2019b). Simultaneous covariance inference for multimodal integrative analysis. Journal of the American Statistical Association, accepted.
- Yuan M (2010). High dimensional inverse covariance matrix estimation via linear programming. Journal of Machine Learning Research, 11:2261–2286.
- Zhu Y and Li L (2018). Multiple matrix Gaussian graphs estimation. Journal of the Royal Statistical Society, Series B, 80:927–950.
- Zou T, Lan W, Wang H, and Tsai C-L (2017). Covariance regression analysis. Journal of the American Statistical Association, 112(517):266–281.