Author manuscript; available in PMC: 2021 Jul 23.
Published in final edited form as: Biometrika. 2020 Jul 8;108(1):1–16. doi: 10.1093/biomet/asaa059

The asymptotic distribution of modularity in weighted signed networks

Rong Ma 1, Ian Barnett 1
PMCID: PMC8300091  NIHMSID: NIHMS1637765  PMID: 34305154

Summary

Modularity is a popular metric for quantifying the degree of community structure within a network. The distribution of the largest eigenvalue of a network’s edge weight or adjacency matrix is well studied and is frequently used as a substitute for modularity when performing statistical inference. However, we show that the largest eigenvalue and modularity are asymptotically uncorrelated, which suggests the need for inference directly on modularity itself when the network size is large. To this end, we derive the asymptotic distributions of modularity in the case where the network’s edge weight matrix belongs to the Gaussian orthogonal ensemble, and study the statistical power of the corresponding test for community structure under some alternative models. We empirically explore universality extensions of the limiting distribution and demonstrate the accuracy of these asymptotic distributions through Type I error simulations. We also compare the empirical powers of the modularity based tests with some existing methods. Our method is then used to test for the presence of community structure in two real data applications.

Keywords: Asymptotic distribution, Community detection, Modularity, Network data analysis

1. Introduction

Many scientific and social systems are composed of large numbers of interacting elements. These systems can be conceptualized as networks where nodes represent elements in the system and network edges represent interactions between elements. Networks appear consistently across scientific domains, ranging from protein interaction networks within a living cell to social networks of people communicating within society (Wasserman & Faust, 1994; Boccaletti et al., 2006). Networks frequently divide into communities, or groups of nodes that cluster together. Detecting these network communities is a well-studied problem, with the most popular methods revolving around the maximization of a function known as modularity over all possible partitions of the network into communities (Newman & Girvan, 2004; Newman, 2006a,b; Good et al., 2010; Chen et al., 2014). For most moderately large networks, enumerating all possible divisions into communities to find the maximum is not feasible. Many methods have been developed that aim to find optimal or near-optimal solutions with low computational complexity (Agarwal & Kempe, 2008; Lancichinetti & Fortunato, 2009).

One of the most well-known approaches for identifying network community structure is the spectral approach proposed by Newman (2006a,b). If we consider an undirected random graph $G(E, V)$ where $|V| = n$, whose signed edge weight matrix is $W_G$, we define its modularity using the Newman–Girvan definition as:

$$Q(G) = \mathrm{sgn}(u_1)^\top W_G\, \mathrm{sgn}(u_1), \tag{1}$$

where $u_1 \in \mathbb{R}^n$ is the eigenvector corresponding to the largest eigenvalue, $\lambda_1(W_G)$, of $W_G$ and $\mathrm{sgn}(u_1) \in \{0, \pm 1\}^n$ is the vector of signs of $u_1$. Since by definition $Q(G)$ only depends on the weight matrix $W_G$, throughout the paper we will not distinguish $Q(G)$ and $Q(W_G)$. In this setting, a common choice of null model considers $W$ to be a Wigner matrix. In addition, the treatment of networks with signed weights is distinct from networks with positive weights with respect to modularity. For example, Traag & Bruggeman (2009) considered community detection in complex networks with positive and negative links using a generalized Potts model.
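As an illustration of definition (1), the following minimal sketch computes $Q$ for a small signed network with NumPy. The function name `modularity` and the toy matrix `W_toy` are assumptions introduced here for illustration; they are not part of the original analysis.

```python
import numpy as np

def modularity(W):
    """Return (Q, u1, lam1) for a symmetric weight matrix W, following (1)."""
    W = np.asarray(W, dtype=float)
    # eigh returns eigenvalues in ascending order, so the last column
    # holds the eigenvector of the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(W)
    u1 = eigvecs[:, -1]
    s = np.sign(u1)  # entries in {0, +1, -1}
    return float(s @ W @ s), u1, float(eigvals[-1])

# Hypothetical toy network: nodes {0, 1} and {2, 3} form two groups, with
# weight +1 within groups and -1 across groups.
W_toy = np.array([[0, 1, -1, -1],
                  [1, 0, -1, -1],
                  [-1, -1, 0, 1],
                  [-1, -1, 1, 0]], dtype=float)
Q_toy, u1_toy, lam1_toy = modularity(W_toy)
print(Q_toy, lam1_toy, np.sign(u1_toy))
```

In this toy case the sign pattern of $u_1$ recovers the two groups up to a global sign flip, which leaves $Q$ unchanged.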

Although modularity is frequently used, interpreting modularity tends to be a subjective exercise, most frequently done without the aid of statistical inference. In cases where inference is performed, simulation from some assumed null distribution is required (Dwyer et al., 2014; Rizkallah et al., 2016; Telesford et al., 2016; Lichoti et al., 2016; Springer et al., 2017; Zhang & Chen, 2016). For very large networks, simulation of the null distribution is not computationally feasible. Because of this, some have looked towards analytical and asymptotic inference solutions based on the spectral decomposition:

$$Q(G) = \sum_{i=1}^n \lambda_i(W_G)\{\mathrm{sgn}(u_1)^\top u_i\}^2, \tag{2}$$

where $u_i \in \mathbb{R}^n$ is the eigenvector corresponding to the $i$th largest eigenvalue $\lambda_i(W_G)$. Because $\lambda_1$ has a disproportionately large role in $Q(G)$ and because $\lambda_1(W_G)$ frequently can be well-modelled by a Tracy–Widom distribution (Tracy & Widom, 1994) for general Wigner matrices (Tao & Vu, 2011), there are some methods that use $\lambda_1$ as a proxy for $Q(G)$ when performing inference (Bickel & Sarkar, 2016; Lei, 2016). While approximating modularity with $\lambda_1$ is a tempting alternative, as we will show the smaller terms in equation (2) play a nontrivial role in the null distribution of modularity and should not be ignored.

In this paper, we derive the asymptotic distribution of modularity defined in (1) as $n \to \infty$ under the Gaussian orthogonal ensemble random matrices. Weighted networks with signed edges such as correlation networks can be well-modelled by Gaussian orthogonal ensemble random matrices under a variety of null models. Correlation networks frequently appear in many contexts such as stock market price networks (Chi et al., 2010), brain activity networks based on functional magnetic resonance imaging (Bullmore & Sporns, 2009), and gene expression networks (Langfelder & Horvath, 2008), to name a few. We demonstrate the convergence rate and accuracy of this distribution through simulations, and also analytically and numerically explore the statistical power of the associated tests under some alternatives. In addition, we perform tests for modularity in two data examples: a U.S. congressional voting network and a morphological network of the human cranium.

Throughout the paper, we denote $S^n = \{(x_1, \dots, x_{n+1}) \in \mathbb{R}^{n+1} : x_1^2 + x_2^2 + \cdots + x_{n+1}^2 = 1\}$, and $O(n)$ as the orthogonal group, consisting of all the $n \times n$ orthogonal matrices. For a symmetric matrix $W \in \mathbb{R}^{n \times n}$, we denote $\lambda_1(W) \ge \lambda_2(W) \ge \cdots \ge \lambda_n(W)$ as its ordered eigenvalues. The function $\mathrm{sgn}(\cdot)$ returns the sign of an object (scalar, vector or matrix). We denote $\to_d$ as convergence in distribution and $\to$ as a.s. convergence. For any vector $x = (x_1, \dots, x_n)$, we denote its $\ell_1$ norm as $\|x\|_1 = \sum_{i=1}^n |x_i|$, and its $\ell_2$ norm as $\|x\|_2 = (\sum_{i=1}^n x_i^2)^{1/2}$.

2. The asymptotic distribution of network modularity

2·1. The limit distribution under the Gaussian orthogonal ensemble setting

We first study the asymptotic distribution of $Q(W)$ under the Gaussian orthogonal ensemble setting, where $W$ is a standard Wigner matrix representing signed edge weights, whose upper off-diagonal entries and the diagonal entries are jointly independent with $W_{ij} \sim N(0, 1)$ for $i > j$ and $W_{ij} \sim N(0, 2)$ for $i = j$. Our first main result concerns the limiting distribution of the modularity $Q(W)$.
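A matrix from this ensemble can be drawn by symmetrizing a square matrix of independent standard normals. The sketch below is one common construction, not necessarily the one used for the simulations in this paper; the helper name `sample_goe` is an assumption introduced here.

```python
import numpy as np

def sample_goe(n, rng):
    """Draw an n x n symmetric matrix with N(0, 1) off-diagonal and N(0, 2) diagonal entries."""
    G = rng.standard_normal((n, n))
    # (G + G^T) / sqrt(2): off-diagonal variance (1 + 1) / 2 = 1,
    # diagonal variance 4 / 2 = 2.
    return (G + G.T) / np.sqrt(2.0)

rng = np.random.default_rng(0)
W = sample_goe(400, rng)
off_diag = W[np.triu_indices(400, k=1)]
# Empirical variances, expected to be close to 1 and 2 respectively.
print(off_diag.var(), np.diag(W).var())
```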

Theorem 1.

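For a random sample $W \in \mathbb{R}^{n \times n}$ from the GOE, let $Q = \mathrm{sgn}(u_1)^\top W\, \mathrm{sgn}(u_1)$, where $u_1 \in \mathbb{R}^n$ is the first eigenvector of $W$. Then, for all $x \in \mathbb{R}$, we have

$$\mathrm{pr}\{n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2) \le x\} \to \Phi\left\{\frac{x}{2^{1/2}(1 - 2/\pi)}\right\}, \quad \text{as } n \to \infty, \tag{3}$$

where $\Phi(x)$ is the cumulative distribution function for the standard normal random variable. In particular,

$$Q = A_n + B_n, \tag{4}$$

where for any small constant $\epsilon > 0$, it holds that $\mathrm{cov}\{A_n/n, n^{-5/6}(B_n - 2n^{1/2}\|u_1\|_1^2)\} = O(n^{-1/6+\epsilon})$ and

$$\frac{A_n}{n} \to_d N\{0, 2(1 - 2/\pi)^2\}, \qquad n^{-5/6}(B_n - 2n^{1/2}\|u_1\|_1^2) \to_d \frac{2}{\pi} TW_1, \tag{5}$$

where $TW_1$ is the Tracy–Widom distribution.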

Remark 1. From the above theorem, the limit distribution of the normalized statistic $n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2)$ is normal $N\{0, 2(1 - 2/\pi)^2\}$ as $n \to \infty$. In other words, for large $n$, the modularity $Q$ is roughly distributed around the center $2n^{1/2}\|u_1\|_1^2$ with the standard error $2^{1/2}(1 - 2/\pi)n$. In particular, for the Gaussian orthogonal ensemble, it can be shown that the functional $\|u_1\|_1^2$ of the first eigenvector $u_1$ satisfies $|\|u_1\|_1^2/n - 2/\pi| = O_P(n^{-1/2})$. As a result, one has $Q/n = 4n^{1/2}/\pi + O_P(1)$, which means $Q/n$ is concentrated around $4n^{1/2}/\pi$, with a constant order fluctuation.
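The centring in Remark 1 follows from a short substitution, spelled out here for convenience; the numerical value for $n = 1000$ is only an illustration. Using the bound on $\|u_1\|_1^2/n$,

$$\frac{2n^{1/2}\|u_1\|_1^2}{n} = 2n^{1/2}\{2/\pi + O_P(n^{-1/2})\} = \frac{4n^{1/2}}{\pi} + O_P(1),$$

and, since $n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2) = O_P(1)$ by Theorem 1,

$$\frac{Q}{n} = \frac{Q - 2n^{1/2}\|u_1\|_1^2}{n} + \frac{2n^{1/2}\|u_1\|_1^2}{n} = \frac{4n^{1/2}}{\pi} + O_P(1).$$

For example, with $n = 1000$ the centre $4n^{1/2}/\pi$ is approximately 40·26.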

Remark 2. From the second statement in Theorem 1, we know that Q can be decomposed into two weakly dependent parts. The first part is asymptotically a centred normal, whereas the second part can be characterized by a shifted and scaled Tracy–Widom random variable. In particular, according to the characterization in (5), the standardized statistic can be decomposed as

$$n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2) = \frac{A_n}{n} + \frac{B_n - 2n^{1/2}\|u_1\|_1^2}{n}, \tag{6}$$

where $n^{-1}(B_n - 2n^{1/2}\|u_1\|_1^2) = O(n^{-1/6})$. Hence the contribution to the asymptotic variance from $B_n$ diminishes as $n \to \infty$, at the rate of $n^{-1/6}$. In other words, the term $A_n$ is responsible for the asymptotic variance whereas the term $B_n$ only contributes to the asymptotic mean of $Q$.

The proof of the above theorem relies on the key observation that

$$Q(W) = \lambda_1(W)\{\mathrm{sgn}(u_1)^\top u_1\}^2 + \sum_{i=2}^n \lambda_i(W)\{\mathrm{sgn}(u_1)^\top u_i\}^2, \tag{7}$$

where the two terms can be treated separately. In fact, by setting $B_n = \lambda_1(W)\{\mathrm{sgn}(u_1)^\top u_1\}^2$ and $A_n = \sum_{i=2}^n \lambda_i(W)\{\mathrm{sgn}(u_1)^\top u_i\}^2$, the statement (5) in Theorem 1 can be proved by carefully analysing the joint distribution of the Gaussian orthogonal ensemble eigenvalues, eigenvectors and their functionals. In particular, to show the asymptotic normality of $A_n/n$, we adopted several technical tools including the Haar measure on the orthogonal group $O(n)$, a Berry–Esseen bound for exchangeable pairs of random vectors, the semicircle law and the eigenvalue rigidity result for Wigner matrices. We leave the detailed proof of Theorem 1 to the Appendix.
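The split of $Q$ into $A_n$ and $B_n$ can be checked numerically on a single draw. The following is a minimal sketch; the helper name `split_modularity`, the seed and the matrix size are assumptions made for illustration only.

```python
import numpy as np

def split_modularity(W):
    """Return (A_n, B_n, ||u_1||_1^2) from the eigendecomposition of symmetric W."""
    eigvals, eigvecs = np.linalg.eigh(W)      # ascending eigenvalues
    u1 = eigvecs[:, -1]
    s = np.sign(u1)
    proj_sq = (eigvecs.T @ s) ** 2            # {sgn(u_1)^T u_i}^2 for every i
    B = float(eigvals[-1] * proj_sq[-1])      # leading term of (7)
    A = float(np.sum(eigvals[:-1] * proj_sq[:-1]))  # remaining n - 1 terms
    l1_sq = float(np.sum(np.abs(u1)) ** 2)
    return A, B, l1_sq

rng = np.random.default_rng(1)
n = 200
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2.0)                  # Gaussian orthogonal ensemble draw
A, B, l1_sq = split_modularity(W)
s = np.sign(np.linalg.eigh(W)[1][:, -1])
Q = float(s @ W @ s)
stat = (Q - 2.0 * np.sqrt(n) * l1_sq) / n     # standardized statistic in (6)
print(A / n, (B - 2.0 * np.sqrt(n) * l1_sq) / n, stat)
```

Because $\mathrm{sgn}(u_1)^\top u_1 = \|u_1\|_1$, the returned `B` equals $\lambda_1(W)\|u_1\|_1^2$, and the two printed components add up to the standardized statistic.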

2·2. Second-order correction using convolution

Practically, as the variance contribution from $B_n$ diminishes at a very slow rate, it could be far from precise to use $N\{0, 2(1 - 2/\pi)^2\}$ as an approximation of the empirical distribution of $n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2)$. Instead, by Theorem 1, we suggest taking the $n^{-1/6}$ order term into account and using the convolution of independent normal $N\{0, 2(1 - 2/\pi)^2\}$ and rescaled Tracy–Widom distribution $2n^{-1/6}\pi^{-1}TW_1$, as a finite sample approximation of the limiting distribution. Hereafter we denote the cumulative distribution function of the convolution as $F$. The empirical performance of such convolutional approximation is assessed in §3.
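Quantiles of $F$ can be approximated by Monte Carlo. The sketch below is one possible way to do this and makes a loud assumption: instead of sampling $TW_1$ exactly, it substitutes the rescaled largest eigenvalue $n_0^{1/6}\{\lambda_1 - 2n_0^{1/2}\}$ of simulated Gaussian orthogonal ensemble matrices of a moderate size `n0`, which is only an approximation to $TW_1$. The function names and the parameter `n0` are hypothetical, and this is not claimed to be the procedure used for the tables in §3.

```python
import numpy as np

def sample_goe(n, rng):
    """Symmetric Gaussian matrix with N(0, 1) off-diagonal and N(0, 2) diagonal entries."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / np.sqrt(2.0)

def approx_tw1_draws(m, n0, rng):
    """Approximate TW_1 draws via n0^{1/6} (lambda_1 - 2 n0^{1/2}) for GOE matrices of size n0."""
    out = np.empty(m)
    for k in range(m):
        lam1 = np.linalg.eigvalsh(sample_goe(n0, rng))[-1]
        out[k] = n0 ** (1.0 / 6.0) * (lam1 - 2.0 * np.sqrt(n0))
    return out

def convolution_draws(n, m, rng, n0=100):
    """Draws from N{0, 2(1 - 2/pi)^2} plus an independent 2 n^{-1/6} pi^{-1} TW_1 term."""
    sd = np.sqrt(2.0) * (1.0 - 2.0 / np.pi)
    normal_part = sd * rng.standard_normal(m)
    tw_part = 2.0 * n ** (-1.0 / 6.0) / np.pi * approx_tw1_draws(m, n0, rng)
    return normal_part + tw_part

rng = np.random.default_rng(2)
draws = convolution_draws(n=500, m=2000, rng=rng)
q95 = np.quantile(draws, 0.95)   # Monte Carlo estimate of F^{-1}(0.95) for n = 500
print(draws.mean(), q95)
```

The Tracy–Widom component has a negative mean, so the simulated convolution is shifted to the left of the centred normal limit.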

Moreover, in Table 1, we numerically evaluate the correlation between An and Bn. The vanishing correlation in this case provides another justification of our use of convolution for the second-order approximation.

Table 1.

Empirical correlation between $A_n$ and $B_n$. Each correlation is estimated over $10^5$ iterations

n 50 100 500 1000
cor($A_n$, $B_n$) 0·021 0·016 0·005 0·003

2·3. Universality implied by random matrix theory

Although the limit distribution (3) of Theorem 1 was proven under the standard Gaussian orthogonal ensemble setting, the analysis only relies on the joint distribution of the eigenvalues and the eigenvectors of the Gaussian orthogonal ensemble, as yielded by the proof in the Appendix. In this section, we discuss the potential universality of our results, or its generalizability to other matrix ensembles.

In connection to the recent achievements in random matrix theory, it has been shown that the asymptotic behaviour of the eigenvalues and eigenvectors of many important classes of random matrices is the same as that of the Gaussian orthogonal ensemble. For example, the well-known semicircle law has been obtained for sample covariance matrices (Bai & Yin, 1988), sample correlation matrices (Jiang, 2004b), Erdős-Rényi graphs (Erdős et al., 2013), random regular graphs (Bauerschmidt et al., 2017), generalized Wigner matrices (Tao & Vu, 2010; Erdős et al., 2012b) and deformed Wigner matrices (Knowles & Yin, 2013b); universality results for eigenvectors have been obtained for generalized Wigner matrices (Tao & Vu, 2011; Knowles & Yin, 2013a; Bourgade & Yau, 2017) and, more recently, for sample covariance matrices (Bloemendal et al., 2016; Ding, 2019).

For the matrix ensembles whose spectral behaviour deviates significantly from that of the Gaussian orthogonal ensemble, we admit that the same limit distribution (3) would not hold in general. For example, when $p/n \to \gamma \in (0, \infty)$, it is well known that the limiting eigenvalue distribution for the sample covariance matrices is a nonsymmetric Marchenko–Pastur law (Marchenko & Pastur, 1967). In this case, (3) becomes questionable as some of the calculations, such as Equation (A6) in the Appendix, will no longer hold. However, we do want to emphasize that the analytical framework developed in this paper is generic, and can be applied to derive the asymptotic distributions under other settings, although the calculation of some relevant quantities, such as those paralleling Lemmas A1, A2, and A4, might be technically challenging.

2·4. Comparison with $\lambda_1(W)$

In addition to the modularity statistic studied in this paper, some other statistics have been proposed for the purpose of community detection, especially the largest eigenvalue $\lambda_1(W)$ (Bickel & Sarkar, 2016; Lei, 2016). The distribution of the largest eigenvalue of the Wigner matrix is well understood. Tracy & Widom (1994) first derived this distribution, and given the ostensibly prominent role that $\lambda_1$ plays in $Q$, some have used the close relationship between $\lambda_1$ and $Q$ in order to test for the presence of community structure in networks (Bickel & Sarkar, 2016; Lei, 2016). Here we investigate how close a proxy $\lambda_1$ is to $Q$ to see if this approximation is justified. To evaluate this, we consider the correlation $\mathrm{cor}(Q/n, n^{1/6}\lambda_1)$. The following theorem provides a negative answer by showing the asymptotic uncorrelatedness between $Q/n$ and $n^{1/6}\lambda_1$.

Theorem 2.

Under the condition of Theorem 1, it holds that, for any ϵ > 0,

$$\mathrm{cov}(Q/n, n^{1/6}\lambda_1) = O(n^{-1/6+\epsilon}). \tag{8}$$

In Fig. 1, we show the scatter-plots of $n^{1/6}\lambda_1$ and $Q/n$ for various $n$, based on 10000 simulations from a standard Wigner matrix as defined in § 2·1. A clear decrease in the empirical correlation can be observed as $n$ increases, indicating the poor asymptotic approximation of $Q$ by $\lambda_1$.

Fig. 1.

Relationship between modularity and the largest eigenvalue of the modularity matrix. Standard Wigner matrices were generated 10000 times for each n to produce the scatter-plots and corresponding correlation estimates.
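A scaled-down version of the simulation behind Fig. 1 might look as follows. This is only a sketch: the sizes, the 300 replications and the helper name `simulate_pairs` are assumptions chosen so that the example runs quickly, far fewer than the 10000 replications reported above.

```python
import numpy as np

def simulate_pairs(n, reps, rng):
    """Return arrays of Q/n and n^{1/6} lambda_1 over independent GOE draws."""
    q_over_n = np.empty(reps)
    scaled_lam1 = np.empty(reps)
    for r in range(reps):
        G = rng.standard_normal((n, n))
        W = (G + G.T) / np.sqrt(2.0)          # standard Wigner matrix as in § 2·1
        eigvals, eigvecs = np.linalg.eigh(W)  # ascending eigenvalues
        s = np.sign(eigvecs[:, -1])
        q_over_n[r] = s @ W @ s / n
        scaled_lam1[r] = n ** (1.0 / 6.0) * eigvals[-1]
    return q_over_n, scaled_lam1

rng = np.random.default_rng(3)
for n in (25, 100):
    q, lam = simulate_pairs(n, reps=300, rng=rng)
    # Empirical correlation between Q/n and n^{1/6} lambda_1 at this n.
    print(n, np.corrcoef(q, lam)[0, 1])
```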

2·5. Statistical power for community detection

Our next result concerns the statistical power of the test based on the normalized modularity $n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2)$ and its limiting distribution obtained under the Gaussian orthogonal ensemble null. Specifically, for a given signed edge weight matrix $W$, we calculate its first eigenvector $u_1$, and reject the null hypothesis whenever $n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2) > \Phi^{-1}(1 - \alpha)$ for some desired level $\alpha \in (0, 1)$. Naturally, one could replace $\Phi^{-1}(1 - \alpha)$ by $F^{-1}(1 - \alpha)$ for better finite sample performance. For the alternative model, we consider the following deformed Gaussian orthogonal ensemble model where the signed edge weight matrix $W = \Theta + Z \in \mathbb{R}^{n \times n}$ is symmetric, with $\Theta$ incorporating the underlying community structure and $Z$ being a standard Wigner matrix. Examples of $\Theta$ include blockwise constant matrices or block-wise diagonal matrices extensively studied under the stochastic block models (Lei et al., 2015; Zhang & Zhou, 2016; Hu et al., 2020), and some general low rank matrices commonly considered for studying the spectral clustering algorithms (Lu & Zhou, 2016; Löffler et al., 2019).
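The test described above can be sketched in a few lines. Two assumptions are made here and should be read as such: the cutoff is taken to be the $(1 - \alpha)$ quantile of the limiting law $N\{0, 2(1 - 2/\pi)^2\}$ from Theorem 1, that is, the standard normal quantile multiplied by $2^{1/2}(1 - 2/\pi)$, and the two-block signal `Theta` is a hypothetical example of an alternative. The function name `modularity_test` is also introduced only for illustration.

```python
import numpy as np
from statistics import NormalDist

def modularity_test(W, alpha=0.05):
    """One-sided test of no community structure based on the normal limit in (3)."""
    n = W.shape[0]
    eigvals, eigvecs = np.linalg.eigh(W)      # ascending eigenvalues
    u1 = eigvecs[:, -1]
    s = np.sign(u1)
    Q = float(s @ W @ s)
    stat = (Q - 2.0 * np.sqrt(n) * np.sum(np.abs(u1)) ** 2) / n
    # Assumption: the cutoff is the (1 - alpha) quantile of N{0, 2(1 - 2/pi)^2}.
    sd = np.sqrt(2.0) * (1.0 - 2.0 / np.pi)
    cutoff = sd * NormalDist().inv_cdf(1.0 - alpha)
    return stat, cutoff, bool(stat > cutoff)

rng = np.random.default_rng(4)
n = 100
G = rng.standard_normal((n, n))
Z = (G + G.T) / np.sqrt(2.0)                  # standard Wigner (GOE) noise
labels = np.where(np.arange(n) < n // 2, 1.0, -1.0)
Theta = np.outer(labels, labels)              # hypothetical two-block signal
stat, cutoff, reject = modularity_test(Theta + Z)
print(stat, cutoff, reject)
```

Replacing the normal quantile by a Monte Carlo quantile of $F$ would give the finite sample version of the same test.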

Theorem 3.

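Suppose $W = \Theta + Z \in \mathbb{R}^{n \times n}$, where $Z$ is a standard Wigner matrix and $\Theta$ is some fixed symmetric matrix. Then, as long as $\lambda_1(\Theta) \ge C_0 n$ and $\lambda_n(\Theta) > C_1 n$ for some universal constants $C_0, C_1 > 0$, we have $\mathrm{pr}\{n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2) > \Phi^{-1}(1 - \alpha)\} \to 1$ and $\mathrm{pr}\{n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2) > F^{-1}(1 - \alpha)\} \to 1$.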

Remark 3. The alternative model considered in Theorem 3 covers a wide range of scenarios where community structure is present in the network edge weights. In particular, the theorem only requires the matrix Θ to have sufficiently large global signal to be detectable from the noisy observations, and there is no need to specify the community structure incorporated in Θ.

3. Simulations

3·1. Empirical quantile assessment

In this section, we conduct simulation studies to empirically evaluate our derived limiting distribution in the previous section. We generate the Gaussian orthogonal ensemble matrices whose dimension $n$ varies from 50 to 5000, and calculate their modularities defined by (1). We compare the empirical distributions of the standardized statistic in (6) based on simulated modularities against their theoretical quantiles $q_\alpha$ corresponding to probabilities $\alpha$ varying from 0·01 to 0·99. Specifically, we evaluate the tail probabilities of the standardized statistic at $q_\alpha$ for different $n$ and cutoffs $\alpha$. For the theoretical distribution, we consider two distributions, namely, the normal distribution $N\{0, 2(1 - 2/\pi)^2\}$ obtained in Theorem 1, and its second-order correction $F$. In particular, in the latter case, the theoretical quantiles are obtained numerically by generating 100000 samples. Table 2 and Table 3 show the empirical quantiles based on 100000 rounds of simulations. Comparing these two tables, it is clear that the convolution $F$ provides a better approximation of the empirical distribution.

Table 2.

Empirical probabilities at $N\{0, 2(1 - 2/\pi)^2\}$ quantiles: the Gaussian orthogonal ensemble case

n 50 100 500 1000 2000 5000
α = 0·01 0·0061 0·0054 0·0059 0·0056 0·0056 0·0057
α = 0·05 0·0210 0·0226 0·0251 0·0260 0·0268 0·0283
α = 0·25 0·0914 0·1042 0·1295 0·1364 0·1470 0·1602
α = 0·50 0·2033 0·2313 0·2927 0·3104 0·3322 0·3564
α = 0·75 0·3745 0·4221 0·5162 0·5469 0·5743 0·6054
α = 0·95 0·6668 0·7231 0·8134 0·8397 0·8588 0·8808
α = 0·99 0·8337 0·8758 0·9326 0·9455 0·9551 0·9652

Table 3.

Empirical probabilities at F quantiles: the Gaussian orthogonal ensemble case

n 50 100 500 1000 2000 5000
α = 0·01 0·0191 0·0154 0·0125 0·0128 0·0103 0·0114
α = 0·05 0·0813 0·0709 0·0580 0·0570 0·0534 0·0526
α = 0·25 0·1835 0·2016 0·2293 0·2350 0·2353 0·2427
α = 0·50 0·4078 0·4314 0·4689 0·4804 0·4886 0·4889
α = 0·75 0·6720 0·6922 0·7257 0·7320 0·7464 0·7405
α = 0·95 0·9179 0·9286 0·9432 0·9448 0·9469 0·9482
α = 0·99 0·9808 0·9843 0·9891 0·9882 0·9889 0·9888

3·2. Universality of the limit distribution

In this section, we empirically evaluate the validity of our theoretical limit distribution as well as its second-order approximation under some matrix ensembles other than the Gaussian orthogonal ensemble. In particular, motivated by both theoretical and practical interest, we consider (i) symmetric random matrices with heavy-tailed, nonsymmetric distributions such as the exponential distribution, Exp(1); (ii) the adjacency matrix of sparse Erdős–Rényi random graph (Erdős et al., 2012a, 2013) with $p = n^{-1/4}$; and (iii) the sample correlation matrix $R_n$ of $N$ independent observations from $N(0, I_n)$ with $N = n^{5/2}$. In cases (i) and (ii), the entries of the random matrices are normalized to match the first two moments of the Gaussian orthogonal ensemble. In case (iii), the modularity is calculated from the normalized matrix $N^{1/2}(R_n - I_n)$. In Tables 4–6, we show the empirical tail probabilities evaluated at different quantiles of the convolution $F$. The results concerning tail probabilities evaluated at the quantiles of $N\{0, 2(1 - 2/\pi)^2\}$ are put in the Supplementary Material. Our numerical results suggest the universality of our limit distribution as well as its second-order approximation over a wide range of non-Gaussian orthogonal ensemble random matrices, which implies its strong potential for practical applications beyond the Gaussian orthogonal ensemble setting.

Table 4.

Empirical probabilities at F quantiles: heavy-tailed nonsymmetric distribution Exp(1)

n 50 100 500 1000 2000 5000
α = 0·01 0·0884 0·0796 0·0253 0·0197 0·0155 0·0130
α = 0·05 0·1900 0·1839 0·0957 0·0795 0·0689 0·0616
α = 0·25 0·4387 0·4572 0·3513 0·3204 0·3002 0·2800
α = 0·50 0·6479 0·6790 0·6086 0·5825 0·5570 0·5338
α = 0·75 0·8271 0·8540 0·8236 0·8078 0·7920 0·7773
α = 0·95 0·9615 0·9732 0·9689 0·9666 0·9619 0·9595
α = 0·99 0·9912 0·9950 0·9948 0·9937 0·9928 0·9920

Table 6.

Empirical probabilities at $F$ quantiles: sample correlation matrix ($N = n^{5/2}$)

n 20 50 75 100 150 200
α = 0·01 0·0033 0·0068 0·0059 0·0073 0·0076 0·0083
α = 0·05 0·0176 0·0310 0·0343 0·0383 0·0480 0·0465
α = 0·25 0·1280 0·1791 0·2075 0·2186 0·2356 0·2373
α = 0·50 0·3289 0·4156 0·4561 0·4648 0·4880 0·4905
α = 0·75 0·6164 0·6914 0·7219 0·7308 0·7479 0·7550
α = 0·95 0·9197 0·9364 0·9450 0·9491 0·9532 0·9516
α = 0·99 0·9829 0·9872 0·9912 0·9896 0·9919 0·9908

3·3. Empirical power assessment

Under certain alternative models for weighted signed networks that suggest community structure, we numerically assess and compare powers of the modularity based tests and some other methods for community detection. Specifically, we consider the following deformed/spiked Gaussian orthogonal ensemble model where the signed edge weight matrix $W = \beta u u^\top + D + Z \in \mathbb{R}^{n \times n}$ is symmetric, with $\beta \in \mathbb{R}$, $u \in S^{n-1}$, $D$ a diagonal matrix, and $Z$ a standard Wigner matrix. In particular, for each $n \in \{100, 200, 300, 400, 500, 600\}$, we set $\beta = \sqrt{n}$, and set $u$ such that its first $n/2$ coordinates are $n^{-1/2}$ and the rest of the coordinates are $-n^{-1/2}$. This implies two clusters of nodes of equal size, where the within-group and cross-group edge weights are two distinct values. The diagonal entries of $D$ are randomly generated from $[-\sqrt{n}, \sqrt{n}]$, to increase heterogeneity. Table 7 shows the empirical powers of: (i) Modularity Test I, the test based on the normalized modularity and its Gaussian limiting distribution in Theorem 1; (ii) Modularity Test II, the test based on the normalized modularity and the convolutional approximation $F$, whose quantiles are obtained numerically as in previous sections; (iii) Largest Eigenvalue Test, the test based on $\lambda_1(W)$ and its Tracy–Widom limiting distribution (Johnstone & Ma, 2012), and (iv) Entrywise Maximum Test, the test based on the entrywise maxima $\max_{1 \le i < j \le n} |\{\mathrm{cov}(W)\}_{ij}|$ and its Gumbel limiting distribution (Jiang, 2004a; Hu et al., 2020). The details of the Largest Eigenvalue Test and the Entrywise Maximum Test and their asymptotic validity under the null model are demonstrated in the Supplementary Material. The empirical powers of these methods at level $\alpha = 0.05$ are calculated from 100000 rounds of simulations. From Table 7, we find that the Modularity Tests I and II are more powerful than the Largest Eigenvalue Test and the Entrywise Maximum Test when $n$ is large ($n \ge 400$), while the Entrywise Maximum Test is more powerful for smaller $n$.

Table 7.

Empirical powers of four different methods at level α = 0.05

n 50 100 200 400 600 800
Modularity Test I 0·3791 0·4779 0·5819 0·6653 0·7106 0·7358
Modularity Test II 0·4432 0·5569 0·6493 0·7164 0·7525 0·7793
Largest Eigenvalue Test 0·5471 0·5804 0·6159 0·6510 0·6788 0·6952
Entrywise Maximum Test 0·7686 0·7108 0·6655 0·6206 0·5977 0·5969
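The alternative model used in this subsection can be generated as sketched below. The text only says that the diagonal of $D$ is randomly generated from $[-\sqrt{n}, \sqrt{n}]$; the uniform distribution used here is an assumption, as are the helper name `sample_spiked` and the seed.

```python
import numpy as np

def sample_spiked(n, rng):
    """Draw W = beta u u^T + D + Z with the settings described in this subsection."""
    beta = np.sqrt(n)
    u = np.full(n, n ** -0.5)
    u[n // 2:] *= -1.0                         # second half of the nodes gets the opposite sign
    # Assumption: the diagonal of D is drawn uniformly on [-sqrt(n), sqrt(n)].
    D = np.diag(rng.uniform(-np.sqrt(n), np.sqrt(n), size=n))
    G = rng.standard_normal((n, n))
    Z = (G + G.T) / np.sqrt(2.0)               # standard Wigner (GOE) noise
    signal = beta * np.outer(u, u)
    return signal + D + Z, signal, u

rng = np.random.default_rng(5)
W, signal, u = sample_spiked(100, rng)
# Within-group and cross-group entries of the rank-one signal term.
print(signal[0, 1], signal[0, 99])
```

With $\beta = \sqrt{n}$ the signal entries are $\pm n^{-1/2}$, so the within-group and cross-group mean weights differ only in sign.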

4. Real data analysis

4·1. Analysis of US congressional voting networks

Annual voting records for individuals in the U.S. House of Representatives provide a commonly used example of a highly modular network, where nodes stand for representatives and edge weights correspond with the correlation between the voting records of pairs of representatives. Recently, evidence of increased partisan polarization has been observed based on increased modularity in more recent annual voting networks (Neal, 2018). We let $W$ be a centred and scaled correlation matrix with zeroes in the diagonal based on the 1984 congressional voting records. We removed congressmen with more than 50% votes unrecorded for the year for a total of $n = 431$ congressmen in our network. Letting $\mathrm{sgn}(u_1)$ determine community membership, representatives were strongly divided based on party affiliation, with 96·9% Democrat and 3·1% Republican membership in one community, and 77·6% Republican and 22·4% Democrat membership in the other community. Modularity was very large, with $Q/n - 2\|u_1\|_1^2/n^{1/2} = 3143$, which based on Theorem 1 provides overwhelming statistical evidence of community structure.

Given the nature of partisan politics, it is unsurprising that there was strong evidence to reject a null hypothesis of no community structure in congress. Thus we also explore the less obvious question of whether there are additional communities beyond the Republican and Democrat divide. By restricting the data to the 205 congressmen in the Republican-dominated community, namely, 77·6% Republican, we recentred and rescaled weights over this subset and applied Theorem 1 again. There again was overwhelming statistical evidence of additional community structure as $Q/n - 2\|u_1\|_1^2/n^{1/2} = 985$. The largest eigenvalue and the entrywise maximum-based tests led to consistent conclusions with p-values less than $1·0 \times 10^{-4}$ in both cases. Therefore, this Republican dominated subset divides into two additional communities: one community with a majority, 58·7%, of Democrats, and another with a large majority, 88·1%, of Republicans. This is evidence of a substantial subset of moderate Democrats that, despite being initially clustered with Republicans based on their voting record, also demonstrated sufficient differences from the Republicans to warrant belonging to a separate and distinct community.

4·2. Network structure of the human cranium

Morphological networks of the human cranium define nodes to be anatomically defined measurements between landmark points on the cranium of a particular individual. Edge weights are defined by Pearson correlations between cranial measurements for each pair of landmarks. In Fig. 2, the corresponding correlation network demonstrates blocks of cranial landmarks with nested correlation structure, such as what can be observed for landmarks 1 through 24. Due to different cranial landmarks developing simultaneously on the cranium for each individual, and therefore subject to the same environmental factors throughout development, this nested structure is an expected feature of this morphological network. Network nestedness occurs when interactions of less connected nodes form proper subsets of the interactions of more connected nodes. Modularity is a type of nestedness where there is no distinct hierarchical structure separating nodes with a low degree from nodes with a high degree within a community, and so modularity can be interpreted as an intermediate form of nestedness.

Fig. 2.

Correlation network of landmark measurements of the human cranium. Cranial landmarks are discrete anatomical points that are homologous across humans. A sample of 1367 male crania was used to calculate the Pearson correlations between each pair of 44 cranial landmark measurements.

Cantor et al. (2017) constructed morphological networks of the human crania using 1367 males to calculate correlations between each pair of 44 different landmark measurements. They found significant statistical evidence of nestedness in the resulting correlation network. We apply Theorem 1 to this network after proper normalization and find that there is overwhelming statistical evidence of community structure ($Q/n - 2\|u_1\|_1^2/n^{1/2} = 257$). Similarly, tests for the same null hypothesis based on the asymptotic distributions of the first eigenvalue and the entrywise maximum both lead to the same conclusion with both p-values less than $1·0 \times 10^{-4}$. This implies that human crania tend to contain clusters of landmarks, likely spatially close to one another, that grow together in parallel throughout development.

5. Discussion

Our numerical results show that, although it significantly improves upon the original limit distribution, the second-order approximation still seems insufficient for applications with small sample sizes. Hence, it would be interesting to find a more accurate higher-order approximation for the limit distribution in Theorem 1.

In Reichardt & Bornholdt (2006) and Fortunato & Barthelemy (2007), it was shown that the modularity defined as in (1) has its own limits, such as being unable to find community structure in networks with many small communities. To address the issue, Arenas et al. (2008) proposed a generalized modularity which includes a resolution parameter. Consequently, it would also be of interest to extend our analysis to the generalized modularity (Newman, 2016).

In §2·1, due to the complicated dependence structure between the error term $n^{1/2}(\|u_1\|_1^2/n - 2/\pi)$ and the first order fluctuation $A_n/n$, our current analytical framework can only lead us to the limiting distribution of the normalized $n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2)$. We admit this is mainly due to the limitation of our technical tools, and, in light of Remark 1, it is of interest whether a direct limiting distribution for $n^{-1}(Q - 4n^{3/2}/\pi)$ can be obtained. Some numerical comparisons of $n^{-1}Q$, $n^{-1}(Q - 2n^{1/2}\|u_1\|_1^2)$ and $n^{-1}(Q - 4n^{3/2}/\pi)$ are presented in our Supplementary Material, which suggest that a test based on $n^{-1}(Q - 4n^{3/2}/\pi)$ could be more powerful against certain alternatives. We leave the more rigorous theoretical investigations for future research.

Supplementary Material


Table 5.

Empirical probabilities at $F$ quantiles: sparse Erdős-Rényi random graph ($p = n^{-1/4}$)

n 50 100 500 1000 2000 5000
α = 0·01 0·0006 0·0015 0·0067 0·0094 0·0110 0·0128
α = 0·05 0·0059 0·0121 0·0374 0·0465 0·0557 0·0606
α = 0·25 0·0681 0·1085 0·2093 0·2424 0·2646 0·2784
α = 0·50 0·2261 0·3035 0·4464 0·4921 0·5181 0·5344
α = 0·75 0·4958 0·5776 0·7069 0·7403 0·7648 0·7759
α = 0·95 0·8524 0·8899 0·9354 0·9458 0·9541 0·9562
α = 0·99 0·9633 0·9733 0·9870 0·9894 0·9913 0·9913

Acknowledgement

The authors are grateful to the editor, associate editor and two referees for their comments and suggestions that significantly improved the quality of the manuscript. This research was supported by the National Institute of Mental Health of the National Institutes of Health (R01MH116884 (IB)).

Appendix

Throughout, for sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n = o(b_n)$ (or $a_n = o_P(b_n)$) if $\lim_n a_n/b_n = 0$ (in probability), and write $a_n = O(b_n)$ (or $a_n = O_P(b_n)$), $a_n \lesssim b_n$ or $b_n \gtrsim a_n$ if there exists a constant $C$ such that $a_n \le C b_n$ for all $n$ (in probability). We write $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $a_n \gtrsim b_n$. For a set $A$, we denote $|A|$ as its cardinality. Lastly, $C, C_0, C_1, \dots$ are constants that may vary from place to place.

In the following, we prove Theorem 1 in the main paper. The proofs of the other theorems and some technical lemmas are collected in the Supplementary Material.

Proof of Theorem 1.

We first recall some important results concerning the eigenvectors and the eigenvalues of the Gaussian orthogonal ensemble. Specifically, the eigenvectors $u_1(W), \dots, u_n(W)$ are uniformly distributed on the half-sphere $S_+^{n-1} = \{x = (x_1, \dots, x_n) \in S^{n-1} : x_1 > 0\}$, and the joint distribution of $(u_1(W), \dots, u_n(W))$ is the Haar measure on the orthogonal group $O(n)$, with each column multiplied by $-1$ or $1$ so that the columns all belong to $S_+^{n-1}$ (O’Rourke et al., 2016). An immediate consequence is the following proposition characterizing the joint distribution of the eigenvectors of $W$.

Proposition A1.

Let $v$ be a random vector uniformly distributed on $S^{n-1}$. Then $v$ has the same distribution as $\left(\xi_1/(\sum_{j=1}^n \xi_j^2)^{1/2}, \dots, \xi_n/(\sum_{j=1}^n \xi_j^2)^{1/2}\right)$ where $\xi_1, \dots, \xi_n$ are independently and identically drawn from $N(0, 1)$.
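Proposition A1 also gives a simple recipe for simulation. The sketch below draws one such vector and checks numerically that $\|v\|_1^2/n$ is close to $2/\pi$, the value used in Part I of the proof; the helper name `uniform_on_sphere`, the dimension and the seed are assumptions for illustration.

```python
import numpy as np

def uniform_on_sphere(n, rng):
    """Draw v uniformly on S^{n-1} by normalizing a standard Gaussian vector."""
    xi = rng.standard_normal(n)
    return xi / np.sqrt(np.sum(xi ** 2))

rng = np.random.default_rng(6)
n = 10000
v = uniform_on_sphere(n, rng)
ratio = np.sum(np.abs(v)) ** 2 / n   # ||v||_1^2 / n, expected to be close to 2/pi
print(np.sum(v ** 2), ratio, 2.0 / np.pi)
```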

Another well-known fact related to the modularity under the Gaussian orthogonal ensemble is the limiting distribution of its largest eigenvalue λ1(W), derived in the seminal works of Tracy and Widom (Tracy & Widom, 1994, 1996).

Theorem A4 (The Tracy–Widom Law).

Let $\lambda_1(W)$ denote the largest eigenvalue of $W$ where $W$ is a sample from the Gaussian orthogonal ensemble with dimension $n \times n$, then $\mathrm{pr}\{n^{1/6}(\lambda_1(W) - 2n^{1/2}) \le s\} \to F(s)$, where $F(s)$ denotes the Tracy–Widom distribution.

Proof of Theorem 1.

By definition and eigen-decomposition of W, we have

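$$\begin{aligned} Q/n &= \mathrm{sgn}(u_1)^\top W\, \mathrm{sgn}(u_1)/n \\ &= \lambda_1(W)\{\mathrm{sgn}(u_1)^\top u_1\}^2/n + \sum_{i=2}^n \lambda_i(W)\{\mathrm{sgn}(u_1)^\top u_i\}^2/n \\ &= \lambda_1(W)\|u_1\|_1^2/n + \sum_{i=2}^n \lambda_i(W)\{\mathrm{sgn}(u_1)^\top u_i\}^2/n \\ &\equiv B_n/n + A_n/n. \end{aligned} \tag{A1}$$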

The proof is separated into three parts. Firstly, we show that Bn/n in (A1), after proper centring and scaling, converges weakly to a Tracy-Widom distribution. Secondly, we show that An/n is asymptotically normal. Finally, we deal with the covariance between the two terms.

Part I.

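By Proposition A1, we know that $u_1(W), u_2(W), \ldots, u_n(W)$ are independent and have the same distribution as

$$\left(\frac{\xi_1}{(\sum_{j=1}^n \xi_j^2)^{1/2}}, \ldots, \frac{\xi_n}{(\sum_{j=1}^n \xi_j^2)^{1/2}}\right), \tag{A2}$$

where $\xi_1, \ldots, \xi_n$ are independently and identically distributed standard normal random variables. Therefore,

$$\|u_1\|_1^2/n = \frac{(\sum_{j=1}^n |\xi_j|)^2}{n\sum_{j=1}^n \xi_j^2} = \frac{n}{\sum_{j=1}^n \xi_j^2}\left(\frac{\sum_{j=1}^n |\xi_j|}{n}\right)^2.$$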

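On the one hand, the $\xi_j^2$ are independent $\chi^2$ random variables, which satisfy a sub-exponential tail bound. By a standard concentration inequality for sub-exponential random variables, such as Proposition 5.16 in Vershynin (2010), we have, for any $\epsilon > 0$,

$$\operatorname{pr}\left\{\left|\frac{1}{n}\sum_{i=1}^n \xi_i^2 - E(\xi_i^2)\right| > \left(\frac{\log(1/\epsilon)}{n}\right)^{1/2}\right\} < \epsilon^c,$$

for some constant $c > 0$. On the other hand, a standard concentration inequality for sub-Gaussian random variables yields, for any $\epsilon > 0$,

$$\operatorname{pr}\left\{\left|\frac{1}{n}\sum_{i=1}^n |\xi_i| - E(|\xi_i|)\right| > \left(\frac{\log(1/\epsilon)}{n}\right)^{1/2}\right\} < \epsilon^c,$$

for some constant $c > 0$. Thus, with probability at least $1 - O(\epsilon^c)$ for some $c > 0$,

$$\begin{aligned} \left|\frac{n}{\sum_{j=1}^n \xi_j^2}\left(\frac{\sum_{j=1}^n |\xi_j|}{n}\right)^2 - \frac{2}{\pi}\right| &\le \left|\frac{n}{\sum_{j=1}^n \xi_j^2} - 1\right|\frac{2}{\pi} + \frac{n}{\sum_{j=1}^n \xi_j^2}\left|\left(\frac{\sum_{j=1}^n |\xi_j|}{n}\right)^2 - (E|\xi_i|)^2\right| \\ &\le \frac{2}{\pi}\left(\frac{\log(1/\epsilon)}{n}\right)^{1/2} + 2\left(\frac{2}{\pi}\right)^{1/2}\left\{\frac{\log(1/\epsilon)}{n} + \left(\frac{\log(1/\epsilon)}{n}\right)^{1/2}\right\} \\ &\le C(n^{-1}\log \epsilon^{-1})^{1/2}. \end{aligned} \tag{A3}$$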
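The concentration of $\|u_1\|_1^2/n$ around $2/\pi$ in (A3) can be seen directly from the Gaussian representation (A2). The snippet below is a rough numerical illustration, not part of the proof; the function name `l1_norm_sq_over_n` is hypothetical.

```python
import numpy as np

def l1_norm_sq_over_n(n, rng):
    """||u||_1^2 / n for u uniform on the sphere, computed from normalised Gaussians."""
    xi = rng.standard_normal(n)
    return np.abs(xi).sum() ** 2 / (n * np.sum(xi ** 2))

rng = np.random.default_rng(3)
values = {n: l1_norm_sq_over_n(n, rng) for n in (100, 10_000, 1_000_000)}
limit = 2.0 / np.pi                      # (E|xi|)^2 / E(xi^2) for a standard normal xi
```

The entries of `values` should approach `limit` as $n$ grows, at roughly the $n^{-1/2}$ rate suggested by (A3).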

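By Theorem A4 and (A3), we have

$$n^{1/6}\{\lambda_1(W)\|u_1\|_1^2/n - 2\|u_1\|_1^2/n^{1/2}\} \overset{d}{\to} \frac{2}{\pi}\,TW_1. \tag{A4}$$

In other words, $\lambda_1(W)\|u_1\|_1^2/n - 2\|u_1\|_1^2/n^{1/2} = O_P(n^{-1/6})$.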

Part II.

We denote the second term in (A1) as

$$A_n/n = n^{-1}\sum_{i=2}^n \lambda_i(W)\{\operatorname{sgn}(u_1)^{\top}u_i\}^2.$$

Denote by $\gamma_j$, for $j = 1, \ldots, n$, the classical location of the $j$th eigenvalue, scaled by $n^{1/2}$, under the semicircle law ordered in increasing order. In other words,

$$n\int_{-\infty}^{\gamma_j} \rho_{sc}(x)\,dx = j \quad (j = 1, \ldots, n),$$

where $\rho_{sc}(x) = (2\pi)^{-1}\{(4 - x^2)_+\}^{1/2}$ is the semicircle law. Define

$$\Omega_0 = n^{-1/2}\sum_{i=2}^n \gamma_i\{\operatorname{sgn}(u_1)^{\top}u_i\}^2. \tag{A5}$$

In what follows, we show that $\Omega_0$ is asymptotically normal with variance $2(1 - 2/\pi)^2$, and then conclude by verifying $|\Omega_0 - A_n/n| \to 0$ in probability.
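The classical locations $\gamma_j$ entering (A5) have no closed form, but they are easy to compute by inverting the semicircle distribution function. A minimal sketch follows, using the closed-form distribution function and bisection; the function names are hypothetical.

```python
import numpy as np

def semicircle_cdf(x):
    """CDF of the semicircle law with density (2*pi)^{-1} * sqrt((4 - x^2)_+)."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x ** 2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

def classical_locations(n, iters=60):
    """Solve n * F(gamma_j) = j for j = 1, ..., n by bisection on [-2, 2]."""
    target = np.arange(1, n + 1) / n
    lo, hi = np.full(n, -2.0), np.full(n, 2.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = semicircle_cdf(mid) < target      # root lies to the right of mid
        lo, hi = np.where(below, mid, lo), np.where(below, hi, mid)
    return 0.5 * (lo + hi)

gamma = classical_locations(1000)
mean_square = np.mean(gamma ** 2)                 # close to the second moment of the semicircle law, 1
```

The average of $\gamma_j^2$ being close to one is the fact used later when the limiting variance of $\Omega_0$ is computed.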

Asymptotic normality of Ω0.

The proof of asymptotic normality depends on the following key observations about a single $\operatorname{sgn}(u_1)^{\top}u_i$.

Lemma A1.

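Suppose $(u_1, \ldots, u_n)$ follows the Haar measure on the orthogonal group $O(n)$. Then for any $i = 2, \ldots, n$, it holds that $\operatorname{sgn}(u_1)^{\top}u_i \overset{d}{\to} N(0, 1 - 2/\pi)$. In particular, we have $\operatorname{sgn}(u_1)^{\top}u_i = N_i + O_P(\log n/n^{1/2})$, where the $N_i$ are drawn independently from $N(0, \sigma^2)$ and $\sigma^2 = 1 - 2/\pi + o(1)$.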
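The variance $1 - 2/\pi$ in Lemma A1 has a simple heuristic explanation: by orthogonality, $\sum_{i=1}^n \{\operatorname{sgn}(u_1)^{\top}u_i\}^2 = \|\operatorname{sgn}(u_1)\|_2^2 = n$, while the first term equals $\|u_1\|_1^2 \approx 2n/\pi$, so the remaining $n - 1$ squared projections average to about $1 - 2/\pi$. The sketch below checks this numerically; sampling a Haar matrix via a sign-corrected QR factorisation is a standard device, and the helper name `haar_orthogonal` is hypothetical.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix from the QR factorisation of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))        # column-wise sign fix so that the law is Haar

rng = np.random.default_rng(4)
n = 1000
u = haar_orthogonal(n, rng)
s = np.sign(u[:, 0])                      # sgn(u_1)
proj = s @ u                              # sgn(u_1)^T u_i, i = 1, ..., n

total = np.sum(proj ** 2)                 # equals ||sgn(u_1)||^2 = n by orthogonality
second_moment = np.mean(proj[1:] ** 2)    # average of {sgn(u_1)^T u_i}^2 over i >= 2
target = 1.0 - 2.0 / np.pi
```

This only illustrates the second moment; the Gaussian limit itself is the content of the lemma.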

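Our next result concerns the relation between two elements $\operatorname{sgn}(u_1)^{\top}u_i$ and $\operatorname{sgn}(u_1)^{\top}u_j$, where $i, j \in \{2, \ldots, n\}$ and $i \ne j$. In particular, we show that $(\operatorname{sgn}(u_1)^{\top}u_i, \operatorname{sgn}(u_1)^{\top}u_j)$ is an isotropic vector.

Lemma A2.

Suppose $(u_1, \ldots, u_n)$ follows the Haar measure on the orthogonal group $O(n)$. Then for any $i, j \in \{2, \ldots, n\}$ with $i \ne j$, it holds that $E\{\operatorname{sgn}(u_1)^{\top}u_i\,\operatorname{sgn}(u_1)^{\top}u_j\} = 0$ and $E[\{\operatorname{sgn}(u_1)^{\top}u_i\}^2\{\operatorname{sgn}(u_1)^{\top}u_j\}^2] = (1 - 2/\pi)^2 + o(1)$.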

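Now without loss of generality we assume $n$ is even, namely, $n = 2m$ for some integer $m > 0$. Define

$$\gamma_i\{\operatorname{sgn}(u_1)^{\top}u_i\}^2 = \zeta_i^2, \quad \text{for } i = 1, \ldots, m,$$

and

$$-\gamma_{n-i+1}\{\operatorname{sgn}(u_1)^{\top}u_{n-i+1}\}^2 = \eta_i^2, \quad \text{for } i = 1, \ldots, m.$$

Hence

$$\Omega_0 = n^{-1/2}\sum_{i=2}^n \gamma_i\{\operatorname{sgn}(u_1)^{\top}u_i\}^2 = m^{-1/2}\sum_{i=2}^m (\zeta_i^2 - \eta_i^2)/2^{1/2} + n^{-1/2}\gamma_n\{\operatorname{sgn}(u_1)^{\top}u_n\}^2.$$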

Set $\alpha_i = (\zeta_i^2 - \eta_i^2)/2^{1/2}$ for $i = 2, \ldots, m$. It is easy to check that

$$E(\alpha_i) = 0, \quad E(\alpha_i\alpha_j) = 0, \tag{A6}$$

using Lemma A2, the exchangeability of the Haar measure on $O(n)$ and the symmetry $\gamma_i = -\gamma_{n-i+1}$. The asymptotic normality of $\Omega_0$ can be obtained from the following central limit theorem for symmetric isotropic random vectors and the fact that, by Lemma A1, $n^{-1/2}\gamma_n\{\operatorname{sgn}(u_1)^{\top}u_n\}^2 \to 0$.

Lemma A3.

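Suppose that $X = (X_1, \ldots, X_n) \in \mathbb{R}^n$ has a distribution that is invariant under reflections in the coordinate hyperplanes and

$$E(X_i) = 0, \quad E(X_i^2) = \sigma_i^2 < \infty, \quad E(X_iX_j) = 0$$

for $i, j \in \{1, \ldots, n\}$ and $i \ne j$. Let $\theta = (\theta_1, \ldots, \theta_n) \in S^{n-1}$ be a fixed vector and $\sigma_\theta^2 = \sum_{i=1}^n \theta_i^2\sigma_i^2$. Then

$$\sup_t \left|\operatorname{pr}\left(\sum_{i=1}^n \theta_iX_i \le \sigma_\theta t\right) - \Phi(t)\right| \le 2\left\{\sigma_\theta^{-4}\sum_{i,j}\theta_i^2\theta_j^2E(X_i^2X_j^2) - 1\right\}^{1/2} + (8/\pi)^{1/4}\left[\sigma_\theta^{-3}\left\{\max_i E(|X_i|^3)\right\}\sum_{i=1}^n |\theta_i|^3\right]^{1/2}.$$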

Lemma A4.

For all $i = 2, \ldots, m$, it holds that $E(\alpha_i^2) = 2\gamma_i^2(1 - 2/\pi)^2 + o(1)$. For any fixed $i, j \in \{2, \ldots, m\}$ and $i \ne j$, it holds that $\operatorname{cov}(\alpha_i^2, \alpha_j^2) = o(1)$.

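Write $\Omega_0 = \theta^{\top}\alpha + O(n^{-1/2})$, where $\theta = (m^{-1/2}, \ldots, m^{-1/2})$ and $\alpha = (\alpha_2, \ldots, \alpha_m)$. Denote $\sigma_i^2 = E\alpha_i^2$. It then follows that $\sigma_\theta^2 = m^{-1}\sum_{i=2}^m \sigma_i^2 \equiv \sigma_{sum}^2/m$. Combining Lemmas A3 and A4, we have

$$\begin{aligned} \sup_t \left|\operatorname{pr}(\theta^{\top}\alpha \le m^{-1/2}\sigma_{sum}t) - \Phi(t)\right| &\le \frac{2m}{\sigma_{sum}^2}\left(\frac{1}{m^2}\sum_{2 \le i,j \le m} E\alpha_i^2\alpha_j^2 - \frac{1}{m^2}\sum_{2 \le i,j \le m}\sigma_i^2\sigma_j^2\right)^{1/2} + Cm^{-1/4} \\ &= \frac{2m}{\sigma_{sum}^2}\left\{\frac{1}{m^2}\sum_{i=2}^m \operatorname{var}(\alpha_i^2) + \frac{1}{m^2}\sum_{2 \le i \ne j \le m}\operatorname{cov}(\alpha_i^2, \alpha_j^2)\right\}^{1/2} + Cm^{-1/4} \end{aligned}$$

for some constant $C > 0$. Now note that

$$\lim_{m\to\infty}\frac{\sigma_{sum}^2}{m} = \lim_{m\to\infty}\frac{2(1 - 2/\pi)^2}{m}\sum_{i=2}^m \gamma_i^2 = 2(1 - 2/\pi)^2, \quad \frac{1}{m^2}\sum_{i=2}^m \operatorname{var}(\alpha_i^2) \le \frac{1}{m}\max_i E(\alpha_i^4) = O(1/m), \tag{A7}$$

and, using Lemma A4,

$$\frac{1}{m^2}\sum_{2 \le i \ne j \le m}\operatorname{cov}(\alpha_i^2, \alpha_j^2) = \frac{m-1}{m}\operatorname{cov}(\alpha_1^2, \alpha_2^2) = o(1).$$

It follows that

$$\sup_t \left|\operatorname{pr}(\theta^{\top}\alpha \le m^{-1/2}\sigma_{sum}t) - \Phi(t)\right| = o(1).$$

Since (A7) holds, Slutsky’s theorem yields

$$\frac{\Omega_0}{2^{1/2}(1 - 2/\pi)} \overset{d}{\to} N(0, 1). \tag{A8}$$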

Asymptotic normality of An/n.

We need the following results obtained by Erdős et al. (2012b).

Lemma A5 (Rigidity of Eigenvalues).

For generalized Wigner matrices, if $\gamma_j$ is the classical location of the $j$th eigenvalue under the semicircle law ordered in increasing order, then the scaled $j$th eigenvalue $\lambda_j/n^{1/2}$ is close to $\gamma_j$ in the sense that, for some positive constants $C, c$,

$$\operatorname{pr}\left[\exists j: |\lambda_j/n^{1/2} - \gamma_j| \ge (\log n)^{c\log\log n}\{\min(j, n-j+1)\}^{-1/3}n^{-2/3}\right] \le C\exp\{-(\log n)^{c\log\log n}\}$$

for sufficiently large n.
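Rigidity can be visualised by comparing the ordered, scaled eigenvalues of a single simulated matrix with the classical locations. The sketch below assumes the scaling with $N(0,1)$ off-diagonal and $N(0,2)$ diagonal entries and obtains $\gamma_j$ by interpolating the inverse semicircle distribution function on a fine grid; the names are illustrative.

```python
import numpy as np

def semicircle_cdf(x):
    """CDF of the semicircle law with density (2*pi)^{-1} * sqrt((4 - x^2)_+)."""
    return 0.5 + x * np.sqrt(4.0 - x ** 2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

rng = np.random.default_rng(5)
n = 500
g = rng.standard_normal((n, n))
w = (g + g.T) / np.sqrt(2.0)                        # GOE-type matrix (assumed scaling)
lam_scaled = np.linalg.eigvalsh(w) / np.sqrt(n)     # ascending eigenvalues scaled by n^{1/2}

# Classical locations: gamma_j solves F(gamma_j) = j / n, found by inverse interpolation.
grid = np.linspace(-2.0, 2.0, 200_001)
gamma = np.interp(np.arange(1, n + 1) / n, semicircle_cdf(grid), grid)

max_gap = np.max(np.abs(lam_scaled - gamma))        # largest deviation over all j
```

In such experiments the deviations are typically largest near the spectral edges, in line with the factor $\{\min(j, n-j+1)\}^{-1/3}$ in the lemma.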

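As a consequence, for any $j = 1, \ldots, n$, we have

$$|\lambda_j/n^{1/2} - \gamma_j| = o_P\left(\{\min(j, n-j+1)\}^{-1/3}n^{-2/3+\delta}\right)$$

for any small $\delta > 0$. In other words, the eigenvalue is near its classical location with an error of at most $n^{-1}(\log n)^{C\log\log n}$ for generalized Wigner matrices in the bulk, and the estimate deteriorates by a factor $(n/j)^{1/3}$ near the edge $j \ll n$. Hence, for any sufficiently small $\delta > 0$,

$$|A_n/n - \Omega_0| \le n^{-1/2}\sum_{i=2}^n |\lambda_i(W)/n^{1/2} - \gamma_i|\{\operatorname{sgn}(u_1)^{\top}u_i\}^2 = n^{-1/2}\sum_{i=2}^n \{\operatorname{sgn}(u_1)^{\top}u_i\}^2\,o_P(n^{-2/3+\delta}).$$

Since $\{\operatorname{sgn}(u_1)^{\top}u_i\}^2 = O_P(1)$, the right-hand side is of order $n^{-1/6+\delta}$ in probability, and therefore $|A_n/n - \Omega_0| \to 0$ in probability. The asymptotic normality of $A_n/n$ then follows from Slutsky’s theorem and (A8).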
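The limiting variance $2(1 - 2/\pi)^2 \approx 0.264$ of $A_n/n$ can be compared with a Monte Carlo estimate. This is a rough sketch under the assumed scaling with $N(0,1)$ off-diagonal and $N(0,2)$ diagonal entries; the function name `a_n_over_n` and the choices of $n$ and of the number of replications are arbitrary.

```python
import numpy as np

def a_n_over_n(n, rng):
    """A_n / n = n^{-1} * sum_{i>=2} lambda_i {sgn(u_1)^T u_i}^2 for one simulated matrix."""
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0)          # GOE-type matrix (assumed scaling)
    lam, u = np.linalg.eigh(w)            # ascending order: the last column is u_1
    s = np.sign(u[:, -1])                 # sgn(u_1)
    proj = s @ u[:, :-1]                  # projections on the remaining eigenvectors
    return np.sum(lam[:-1] * proj ** 2) / n

rng = np.random.default_rng(6)
samples = np.array([a_n_over_n(200, rng) for _ in range(300)])
empirical_var = samples.var(ddof=1)
limit_var = 2.0 * (1.0 - 2.0 / np.pi) ** 2
```

With a few hundred replications the estimate is only accurate to within a few hundredths, so it should be read as a sanity check rather than a verification.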

Part III.

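By definition, we have

$$\operatorname{cov}\left(A_n/n,\, n^{-5/6}(B_n - 2n^{1/2}\|u_1\|_1^2)\right) = \operatorname{cov}\left(n^{-1}\sum_{i=2}^n \lambda_i\{\operatorname{sgn}(u_1)^{\top}u_i\}^2,\, (\lambda_1 - 2n^{1/2})\|u_1\|_1^2\,n^{-5/6}\right).$$

It suffices to control $\operatorname{cov}(\lambda_i\{\operatorname{sgn}(u_1)^{\top}u_i\}^2,\, n^{-1/3}(\lambda_1/n^{1/2} - 2)\|u_1\|_1^2)$ for any $i = 2, \ldots, n$. Now we define $E$ through

$$\operatorname{cov}\left[\lambda_i\{\operatorname{sgn}(u_1)^{\top}u_i\}^2,\, n^{-1/3}(\lambda_1/n^{1/2} - 2)\|u_1\|_1^2\right] = \operatorname{cov}\left\{\gamma_i(\operatorname{sgn}(u_1)^{\top}u_i)^2,\, n^{1/6}(\lambda_1/n^{1/2} - 2)\|u_1\|_1^2\right\} + E.$$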

Lemma A6.

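Under the conditions of Theorem 1, for any small constant $\epsilon > 0$, it holds that $|E| = o(n^{-1/6+2\epsilon})$ and $\operatorname{cov}\{\gamma_i(\operatorname{sgn}(u_1)^{\top}u_i)^2,\, n^{1/6}(\lambda_1/n^{1/2} - 2)\|u_1\|_1^2\} = O(n^{-1/2+\epsilon})$.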

Applying Lemma A6 to the above equation, we complete the third part of our proof.
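The asymptotic uncorrelatedness established in Part III can be examined empirically by simulating both terms jointly. The sketch below is an illustration only: it assumes the scaling with $N(0,1)$ off-diagonal and $N(0,2)$ diagonal entries, the function name `both_terms` is hypothetical, and the values of $n$ and the number of replications are arbitrary.

```python
import numpy as np

def both_terms(n, rng):
    """Return (A_n / n, n^{-5/6} (B_n - 2 n^{1/2} ||u_1||_1^2)) for one simulated matrix."""
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0)                  # GOE-type matrix (assumed scaling)
    lam, u = np.linalg.eigh(w)                    # ascending order: last entry is lambda_1
    u1 = u[:, -1]
    l1_sq = np.abs(u1).sum() ** 2                 # ||u_1||_1^2
    proj = np.sign(u1) @ u[:, :-1]                # sgn(u_1)^T u_i for the other eigenvectors
    a_term = np.sum(lam[:-1] * proj ** 2) / n
    b_term = n ** (-5.0 / 6.0) * (lam[-1] - 2.0 * np.sqrt(n)) * l1_sq
    return a_term, b_term

rng = np.random.default_rng(7)
pairs = np.array([both_terms(200, rng) for _ in range(300)])
corr = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]   # sample correlation of the two terms
```

A sample correlation near zero is consistent with, but of course does not prove, the covariance bound in Lemma A6.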

Footnotes

Supplementary material

Supplementary material available at Biometrika online includes proofs of the other theorems and technical lemmas, as well as some supplementary tables and figures.

References

  1. Agarwal G & Kempe D (2008). Modularity-maximizing graph communities via mathematical programming. The European Physical Journal B 66, 409–418.
  2. Arenas A, Fernandez A & Gomez S (2008). Analysis of the structure of complex networks at different resolution levels. New Journal of Physics 10, 053039.
  3. Bai Z-D & Yin Y-Q (1988). Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue of a Wigner matrix. The Annals of Probability, 1729–1741.
  4. Bauerschmidt R, Knowles A & Yau H-T (2017). Local semicircle law for random regular graphs. Communications on Pure and Applied Mathematics 70, 1898–1960.
  5. Bickel PJ & Sarkar P (2016). Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78, 253–273.
  6. Bloemendal A, Knowles A, Yau H-T & Yin J (2016). On the principal components of sample covariance matrices. Probability Theory and Related Fields 164, 459–552.
  7. Boccaletti S, Latora V, Moreno Y, Chavez M & Hwang D-U (2006). Complex networks: Structure and dynamics. Physics Reports 424, 175–308.
  8. Bourgade P & Yau H-T (2017). The eigenvector moment flow and local quantum unique ergodicity. Communications in Mathematical Physics 350, 231–278.
  9. Bullmore E & Sporns O (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10, 186.
  10. Cantor M, Pires MM, Marquitti FM, Raimundo RL, Sebastián-González E, Coltri PP, Perez SI, Barneche DR, Brandt DY, Nunes K et al. (2017). Nestedness across biological scales. PLoS ONE 12, e0171691.
  11. Chen M, Kuzmin K & Szymanski BK (2014). Community detection via maximization of modularity and its variants. IEEE Transactions on Computational Social Systems 1, 46–65.
  12. Chi KT, Liu J & Lau FC (2010). A network perspective of the stock market. Journal of Empirical Finance 17, 659–667.
  13. Ding X (2019). Singular vector distribution of sample covariance matrices. Advances in Applied Probability 51, 236–267.
  14. Dwyer DB, Harrison BJ, Yücel M, Whittle S, Zalesky A, Pantelis C, Allen NB & Fornito A (2014). Large-scale brain network dynamics supporting adolescent cognitive control. Journal of Neuroscience 34, 14096–14107.
  15. Erdős L, Knowles A, Yau H-T & Yin J (2012a). Spectral statistics of Erdős–Rényi graphs II: Eigenvalue spacing and the extreme eigenvalues. Communications in Mathematical Physics 314, 587–640.
  16. Erdős L, Knowles A, Yau H-T & Yin J (2013). Spectral statistics of Erdős–Rényi graphs I: Local semicircle law. The Annals of Probability 41, 2279–2375.
  17. Erdős L, Yau H-T & Yin J (2012b). Rigidity of eigenvalues of generalized Wigner matrices. Advances in Mathematics 229, 1435–1515.
  18. Fortunato S & Barthelemy M (2007). Resolution limit in community detection. Proceedings of the National Academy of Sciences 104, 36–41.
  19. Good BH, De Montjoye Y-A & Clauset A (2010). Performance of modularity maximization in practical contexts. Physical Review E 81, 046106.
  20. Hu J, Zhang J, Qin H, Yan T & Zhu J (2020). Using maximum entry-wise deviation to test the goodness-of-fit for stochastic block models. Journal of the American Statistical Association, 1–30.
  21. Jiang T (2004a). The asymptotic distributions of the largest entries of sample correlation matrices. The Annals of Applied Probability 14, 865–880.
  22. Jiang T (2004b). The limiting distributions of eigenvalues of sample correlation matrices. Sankhyā: The Indian Journal of Statistics, 35–48.
  23. Johnstone IM & Ma Z (2012). Fast approach to the Tracy–Widom law at the edge of GOE and GUE. The Annals of Applied Probability 22, 1962.
  24. Knowles A & Yin J (2013a). Eigenvector distribution of Wigner matrices. Probability Theory and Related Fields 155, 543–582.
  25. Knowles A & Yin J (2013b). The isotropic semicircle law and deformation of Wigner matrices. Communications on Pure and Applied Mathematics 66, 1663–1749.
  26. Lancichinetti A & Fortunato S (2009). Community detection algorithms: a comparative analysis. Physical Review E 80, 056117.
  27. Langfelder P & Horvath S (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559.
  28. Lei J (2016). A goodness-of-fit test for stochastic block models. The Annals of Statistics 44, 401–424.
  29. Lei J & Rinaldo A (2015). Consistency of spectral clustering in stochastic block models. The Annals of Statistics 43, 215–237.
  30. Lichoti JK, Davies J, Kitala PM, Githigia SM, Okoth E, Maru Y, Bukachi SA & Bishop RP (2016). Social network analysis provides insights into African swine fever epidemiology. Preventive Veterinary Medicine 126, 1–10.
  31. Löffler M, Zhang AY & Zhou HH (2019). Optimality of spectral clustering for Gaussian mixture model. arXiv preprint arXiv:1911.00538.
  32. Lu Y & Zhou HH (2016). Statistical and computational guarantees of Lloyd’s algorithm and its variants. arXiv preprint arXiv:1612.02099.
  33. Marchenko VA & Pastur LA (1967). Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik 114, 507–536.
  34. Neal ZP (2018). A sign of the times? Weak and strong polarization in the US Congress, 1973–2016. Social Networks.
  35. Newman ME (2006a). Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74, 036104.
  36. Newman ME (2006b). Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103, 8577–8582.
  37. Newman ME (2016). Equivalence between modularity optimization and maximum likelihood methods for community detection. Physical Review E 94, 052315.
  38. Newman ME & Girvan M (2004). Finding and evaluating community structure in networks. Physical Review E 69, 026113.
  39. O’Rourke S, Vu V & Wang K (2016). Eigenvectors of random matrices: a survey. Journal of Combinatorial Theory, Series A 144, 361–442.
  40. Reichardt J & Bornholdt S (2006). When are networks truly modular? Physica D: Nonlinear Phenomena 224, 20–26.
  41. Rizkallah J, Benquet P, Wendling F, Khalil M, Mheich A, Dufor O & Hassan M (2016). Brain network modules of meaningful and meaningless objects. In Biomedical Engineering (MECBME), 2016 3rd Middle East Conference on. IEEE.
  42. Springer A, Kappeler PM & Nunn CL (2017). Dynamic vs. static social networks in models of parasite transmission: predicting Cryptosporidium spread in wild lemurs. Journal of Animal Ecology 86, 419–433.
  43. Tao T & Vu V (2010). Random matrices: Localization of the eigenvalues and the necessity of four moments. arXiv preprint arXiv:1005.2901.
  44. Tao T & Vu V (2011). Random matrices: universality of local eigenvalue statistics. Acta Mathematica 206, 127.
  45. Telesford QK, Lynall M-E, Vettel J, Miller MB, Grafton ST & Bassett DS (2016). Detection of functional brain network reconfiguration during task-driven cognitive states. NeuroImage 142, 198–210.
  46. Traag VA & Bruggeman J (2009). Community detection in networks with positive and negative links. Physical Review E 80, 036115.
  47. Tracy CA & Widom H (1994). Level-spacing distributions and the Airy kernel. Communications in Mathematical Physics 159, 151–174.
  48. Tracy CA & Widom H (1996). On orthogonal and symplectic matrix ensembles. Communications in Mathematical Physics 177, 727–754.
  49. Vershynin R (2010). Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027.
  50. Wasserman S & Faust K (1994). Social network analysis: Methods and applications, vol. 8. Cambridge University Press.
  51. Zhang AY & Zhou HH (2016). Minimax rates of community detection in stochastic block models. The Annals of Statistics 44, 2252–2280.
  52. Zhang J & Chen Y (2016). A hypothesis testing framework for modularity based network community detection. Statistica Sinica 27, 437–456.
