Summary
Modularity is a popular metric for quantifying the degree of community structure within a network. The distribution of the largest eigenvalue of a network’s edge weight or adjacency matrix is well studied and is frequently used as a substitute for modularity when performing statistical inference. However, we show that the largest eigenvalue and modularity are asymptotically uncorrelated, which suggests the need for inference directly on modularity itself when the network size is large. To this end, we derive the asymptotic distributions of modularity in the case where the network’s edge weight matrix belongs to the Gaussian orthogonal ensemble, and study the statistical power of the corresponding test for community structure under some alternative models. We empirically explore universality extensions of the limiting distribution and demonstrate the accuracy of these asymptotic distributions through Type I error simulations. We also compare the empirical powers of the modularity based tests with some existing methods. Our method is then used to test for the presence of community structure in two real data applications.
Keywords: Asymptotic distribution, Community detection, Modularity, Network data analysis
1. Introduction
Many scientific and social systems are composed of large numbers of interacting elements. These systems can be conceptualized as networks where nodes represent elements in the system and network edges represent interactions between elements. Networks appear consistently across scientific domains, ranging from protein interaction networks within a living cell to social networks of people communicating within society (Wasserman & Faust, 1994; Boccaletti et al., 2006). Networks frequently divide into communities, or groups of nodes that cluster together. Detecting these network communities is a well-studied problem, with the most popular methods revolving around the maximization of a function known as modularity over all possible partitions of the network into communities (Newman & Girvan, 2004; Newman, 2006a,b; Good et al., 2010; Chen et al., 2014). For most moderately large networks, enumerating all possible divisions into communities to find the maximum is not feasible. Many methods have been developed that aim to find optimal or near-optimal solutions with low computational complexity (Agarwal & Kempe, 2008; Lancichinetti & Fortunato, 2009).
One of the most well-known approaches for identifying network community structure is the spectral approach proposed by Newman (2006a,b). Consider an undirected random graph G(E, V) with |V| = n and signed edge weight matrix $W_G$. Using the Newman–Girvan definition, we define its modularity as
$$Q(G) = \operatorname{sgn}(u_1)^{\top} W_G \operatorname{sgn}(u_1), \tag{1}$$
where $u_1$ is the eigenvector corresponding to the largest eigenvalue, $\lambda_1(W_G)$, of $W_G$ and $\operatorname{sgn}(u_1) \in \{0, \pm 1\}^n$ is the vector of signs of $u_1$. Since by definition Q(G) only depends on the weight matrix $W_G$, throughout the paper we will not distinguish Q(G) and Q($W_G$). In this setting, a common choice of null model considers W to be a Wigner matrix. In addition, with respect to modularity, the treatment of networks with signed weights is distinct from that of networks with positive weights. For example, Traag & Bruggeman (2009) considered community detection in complex networks with positive and negative links using a generalized Potts model.
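As a concrete illustration, the spectral modularity described above can be computed in a few lines. The following is a minimal sketch and not part of the original analysis: it assumes the form Q(W) = sgn(u1)ᵀ W sgn(u1), relies on NumPy's symmetric eigensolver, and the function name `spectral_modularity` and the six-node toy network are hypothetical.

```python
import numpy as np

def spectral_modularity(W):
    """Return (Q, s) with s = sgn(u1) and Q = s' W s.

    u1 is the eigenvector for the largest eigenvalue of the symmetric
    matrix W. The overall sign of u1 is arbitrary, but Q is unchanged
    when s is replaced by -s.
    """
    eigvals, eigvecs = np.linalg.eigh(W)  # eigenvalues in ascending order
    u1 = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
    s = np.sign(u1)
    return float(s @ W @ s), s

# Hypothetical toy network: two groups of three nodes, weight +1 within
# groups, -1 across groups, and a zero diagonal.
labels = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
W_toy = np.outer(labels, labels) - np.eye(6)

Q_toy, s_toy = spectral_modularity(W_toy)
print(Q_toy)                   # 30.0
print(np.abs(s_toy @ labels))  # 6.0: the sign vector recovers the two groups
```

In this toy case the leading eigenvector is proportional to the group labels, so the sign vector splits the nodes exactly into the two planted groups.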
Although modularity is frequently used, interpreting modularity tends to be a subjective exercise, most frequently done without the aid of statistical inference. In cases where inference is performed, simulation from some assumed null distribution is required (Dwyer et al., 2014; Rizkallah et al., 2016; Telesford et al., 2016; Lichoti et al., 2016; Springer et al., 2017; Zhang & Chen, 2016). For very large networks, simulation of the null distribution is not computationally feasible. Because of this, some have looked towards analytical and asymptotic inference solutions based on the spectral decomposition:
$$Q(G) = \sum_{i=1}^{n} \lambda_i(W_G)\,\{u_i^{\top} \operatorname{sgn}(u_1)\}^2, \tag{2}$$
where $u_i$ is the eigenvector corresponding to the ith largest eigenvalue $\lambda_i(W_G)$. Because λ1 has a disproportionately large role in Q(G) and because $\lambda_1(W_G)$ can frequently be well modelled by a Tracy–Widom distribution (Tracy & Widom, 1994) for general Wigner matrices (Tao & Vu, 2011), some methods use λ1 as a proxy for Q(G) when performing inference (Bickel & Sarkar, 2016; Lei, 2016). While approximating modularity with λ1 is a tempting alternative, as we will show, the smaller terms in equation (2) play a nontrivial role in the null distribution of modularity and should not be ignored.
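The spectral decomposition can be checked numerically. The sketch below is illustrative only: it assumes the decomposition Q = Σ λi (uiᵀ sgn(u1))², draws one matrix with N(0, 1) off-diagonal and N(0, 2) diagonal entries via (G + Gᵀ)/√2, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# One symmetric Gaussian matrix: off-diagonal variance 1, diagonal variance 2.
g = rng.standard_normal((n, n))
W = (g + g.T) / np.sqrt(2.0)

lam, U = np.linalg.eigh(W)        # ascending order
lam, U = lam[::-1], U[:, ::-1]    # reorder so lam[0] is the largest eigenvalue

s = np.sign(U[:, 0])              # sgn(u1)
Q_direct = s @ W @ s              # quadratic form

proj_sq = (U.T @ s) ** 2          # squared projections of sgn(u1) on each eigenvector
Q_spectral = np.sum(lam * proj_sq)

leading = lam[0] * proj_sq[0]     # term involving lambda_1 only
rest = Q_spectral - leading       # contribution of the remaining n - 1 terms

print(np.isclose(Q_direct, Q_spectral))
print(leading / n, rest / n)
```

The leading term dominates the magnitude of Q, while the remaining terms contribute a fluctuation of the same order as the spread of Q/n, which is why they matter for the null distribution.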
In this paper, we derive the asymptotic distribution of modularity defined in (1) as n → ∞ for Gaussian orthogonal ensemble random matrices. Weighted networks with signed edges such as correlation networks can be well modelled by Gaussian orthogonal ensemble random matrices under a variety of null models. Correlation networks appear in many contexts such as stock market price networks (Chi et al., 2010), brain activity networks based on functional magnetic resonance imaging (Bullmore & Sporns, 2009), and gene expression networks (Langfelder & Horvath, 2008), to name a few. We demonstrate the convergence rate and accuracy of this distribution through simulations, and also analytically and numerically explore the statistical power of the associated tests under some alternatives. In addition, we perform tests for modularity in two data examples: a U.S. congressional voting network and a morphological network of the human cranium.
Throughout the paper, we denote $S^{n-1} = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$ as the unit sphere, and O(n) as the orthogonal group, consisting of all the n × n orthogonal matrices. For a symmetric matrix $W \in \mathbb{R}^{n \times n}$, we denote λ1(W) ≥ λ2(W) ≥ · · · ≥ λn(W) as its ordered eigenvalues. The function sgn(·) returns the sign of an object (scalar, vector or matrix). We denote →d as convergence in distribution and → as a.s. convergence. For any vector x = (x1, …, xn), we denote its ℓ1 norm as $\|x\|_1 = \sum_{i=1}^{n} |x_i|$, and its ℓ2 norm as $\|x\|_2 = (\sum_{i=1}^{n} x_i^2)^{1/2}$.
2. The asymptotic distribution of network modularity
2·1. The limit distribution under the Gaussian orthogonal ensemble setting
We first study the asymptotic distribution of Q(W) under the Gaussian orthogonal ensemble setting, where W is a standard Wigner matrix representing signed edge weights, whose upper off-diagonal entries and the diagonal entries are jointly independent with Wij ~ N(0, 1) for i > j and Wij ~ N(0, 2) for i = j. Our first main result concerns the limiting distribution of the modularity Q(W).
Theorem 1.
For a random sample from the GOE, let , where is the first eigenvector of W. Then, for all , we have
| (3) |
where Φ(x) is the cumulative distribution function for the standard normal random variable. In particular,
| (4) |
where for any small constant ϵ > 0, it holds that and
| (5) |
where is the Tracy–Widom distribution.
Remark 1. From the above theorem, the limit distribution of the normalized statistic is normal $N\{0, 2(1 - 2/\pi)^2\}$ as n → ∞. In other words, for large n, the modularity Q is roughly distributed around the center with the standard error $2^{1/2}(1 - 2/\pi)n$. In particular, for the Gaussian orthogonal ensemble, it can be shown that the functional of the first eigenvector u1 satisfies . As a result, one has $Q/n = 4n^{1/2}/\pi + O_P(1)$, which means Q/n is concentrated around $4n^{1/2}/\pi$, with a constant order fluctuation.
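The concentration of Q/n around 4n^{1/2}/π can be examined with a small Monte Carlo experiment. This is a hedged sketch rather than the simulation design used in the paper: the sample size, number of replications, seed and helper names (`sample_goe`, `modularity`) are arbitrary choices, and the matrices are generated as (G + Gᵀ)/√2 so that off-diagonal entries are N(0, 1) and diagonal entries are N(0, 2).

```python
import numpy as np

def sample_goe(n, rng):
    """Symmetric Gaussian matrix: N(0, 1) off the diagonal, N(0, 2) on it."""
    g = rng.standard_normal((n, n))
    return (g + g.T) / np.sqrt(2.0)

def modularity(W):
    """Q(W) = sgn(u1)' W sgn(u1) for the leading eigenvector u1 (assumed form)."""
    _, vecs = np.linalg.eigh(W)
    s = np.sign(vecs[:, -1])
    return float(s @ W @ s)

rng = np.random.default_rng(2021)
n, reps = 200, 200

q_over_n = np.array([modularity(sample_goe(n, rng)) / n for _ in range(reps)])
center = 4.0 * np.sqrt(n) / np.pi

print(f"mean of Q/n     : {q_over_n.mean():.3f}")
print(f"4 sqrt(n) / pi  : {center:.3f}")
print(f"std dev of Q/n  : {q_over_n.std(ddof=1):.3f}")
```

At moderate n the sample mean sits slightly away from 4n^{1/2}/π, in line with the slowly vanishing second-order term discussed in Remark 2, while the standard deviation stays of constant order.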
Remark 2. From the second statement in Theorem 1, we know that Q can be decomposed into two weakly dependent parts. The first part is asymptotically a centred normal, whereas the second part can be characterized by a shifted and scaled Tracy–Widom random variable. In particular, according to the characterization in (5), the standardized statistic can be decomposed as
| (6) |
where . Hence the contribution to the asymptotic variance from Bn diminishes as n → ∞, at the rate of $n^{-1/6}$. In other words, the term An is responsible for the asymptotic variance whereas the term Bn only contributes to the asymptotic mean of Q.
The proof of the above theorem relies on the key observation that
$$Q = \lambda_1(W)\,\|u_1\|_1^2 + \sum_{i=2}^{n} \lambda_i(W)\,\{u_i^{\top} \operatorname{sgn}(u_1)\}^2, \tag{7}$$
where the two terms can be treated separately. In fact, by setting $A_n = \sum_{i=2}^{n} \lambda_i(W)\{u_i^{\top} \operatorname{sgn}(u_1)\}^2$ and $B_n = \lambda_1(W)\|u_1\|_1^2$, the statement (5) in Theorem 1 can be proved by carefully analysing the joint distribution of the Gaussian orthogonal ensemble eigenvalues, eigenvectors and their functionals. In particular, to show the asymptotic normality of An/n, we adopt several technical tools including the Haar measure on the orthogonal group O(n), a Berry–Esseen bound for exchangeable pairs of random vectors, the semicircle law and the eigenvalue rigidity result for Wigner matrices. We leave the detailed proof of Theorem 1 to the Appendix.
2·2. Second-order correction using convolution
Practically, as the variance contribution from Bn diminishes at a very slow rate, it could be far from precise to use $N\{0, 2(1 - 2/\pi)^2\}$ as an approximation of the empirical distribution of . Instead, by Theorem 1, we suggest taking the $n^{-1/6}$ order term into account and using the convolution of independent normal $N\{0, 2(1 - 2/\pi)^2\}$ and rescaled Tracy–Widom distribution , as a finite sample approximation of the limiting distribution. Hereafter we denote the cumulative distribution function of the convolution as F. The empirical performance of this convolution approximation is assessed in §3.
Moreover, in Table 1, we numerically evaluate the correlation between An and Bn. The vanishing correlation in this case provides another justification of our use of convolution for the second-order approximation.
Table 1.
Empirical correlation between An and Bn. Each correlation is estimated over $10^5$ iterations
| n | 50 | 100 | 500 | 1000 |
|---|---|---|---|---|
| cor(An, Bn) | 0·021 | 0·016 | 0·005 | 0·003 |
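An experiment of the kind summarized in Table 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: An is taken to be the sum over i ≥ 2 of λi (uiᵀ sgn(u1))², Bn is taken to be λ1 ‖u1‖₁², and far fewer replications than 10^5 are used, so the estimate is noisier than the tabulated values. The helper name `a_and_b` is hypothetical.

```python
import numpy as np

def a_and_b(W):
    """Split Q(W) into A_n (terms i >= 2) and B_n (leading term), as assumed above."""
    lam, U = np.linalg.eigh(W)            # ascending eigenvalues
    s = np.sign(U[:, -1])                 # sgn(u1)
    proj_sq = (U.T @ s) ** 2
    b_n = lam[-1] * proj_sq[-1]           # lambda_1 * ||u_1||_1^2
    a_n = np.sum(lam[:-1] * proj_sq[:-1])
    return a_n, b_n

rng = np.random.default_rng(3)
n, reps = 100, 2000

pairs = np.empty((reps, 2))
for r in range(reps):
    g = rng.standard_normal((n, n))
    W = (g + g.T) / np.sqrt(2.0)          # N(0, 1) off-diagonal, N(0, 2) diagonal
    pairs[r] = a_and_b(W)

corr = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
print(f"estimated cor(A_n, B_n) at n = {n}: {corr:.3f}")
```

With 2000 replications the Monte Carlo standard error of the correlation is roughly 0.02, so only the order of magnitude is comparable with Table 1.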
2·3. Universality implied by random matrix theory
Although the limit distribution (3) of Theorem 1 was proven under the standard Gaussian orthogonal ensemble setting, the analysis only relies on the joint distribution of the eigenvalues and the eigenvectors of the Gaussian orthogonal ensemble, as shown by the proof in the Appendix. In this section, we discuss the potential universality of our results, that is, their generalizability to other matrix ensembles.
In connection with recent achievements in random matrix theory, it has been shown that the asymptotic behaviour of the eigenvalues and eigenvectors of many important classes of random matrices is the same as that of the Gaussian orthogonal ensemble. For example, the well-known semicircle law has been obtained for sample covariance matrices (Bai & Yin, 1988), sample correlation matrices (Jiang, 2004b), Erdős–Rényi graphs (Erdős et al., 2013), random regular graphs (Bauerschmidt et al., 2017), generalized Wigner matrices (Tao & Vu, 2010; Erdős et al., 2012b) and deformed Wigner matrices (Knowles & Yin, 2013b); universality results for eigenvectors have been obtained for generalized Wigner matrices (Tao & Vu, 2011; Knowles & Yin, 2013a; Bourgade & Yau, 2017) and, more recently, for sample covariance matrices (Bloemendal et al., 2016; Ding, 2019).
For matrix ensembles whose spectral behaviour deviates significantly from that of the Gaussian orthogonal ensemble, we admit that the same limit distribution (3) would not hold in general. For example, when p/n → γ ∈ (0, ∞), it is well known that the limiting eigenvalue distribution for the sample covariance matrices is a nonsymmetric Marchenko–Pastur law (Marchenko & Pastur, 1967). In this case, (3) becomes questionable as some of the calculations, such as Equation (A6) in the Appendix, will no longer hold. However, we do want to emphasize that the analytical framework developed in this paper is generic, and can be applied to derive the asymptotic distributions under other settings, although the calculation of some relevant quantities, such as those paralleling Lemmas A1, A2 and A4, might be technically challenging.
2·4. Comparison with λ1(W)
In addition to the modularity statistic studied in this paper, some other statistics have been proposed for the purpose of community detection, especially the largest eigenvalue λ1(W) (Bickel & Sarkar, 2016; Lei, 2016). The distribution of the largest eigenvalue of the Wigner matrix is well understood. Tracy & Widom (1994) first derived this distribution, and given the ostensibly prominent role that λ1 plays in Q, some have used the close relationship between λ1 and Q in order to test for the presence of community structure in networks (Bickel & Sarkar, 2016; Lei, 2016). Here we investigate how good a proxy λ1 is for Q to see if this approximation is justified. To evaluate this, we consider the correlation $\mathrm{cor}(Q/n, n^{1/6}\lambda_1)$. The following theorem provides a negative answer by showing the asymptotic uncorrelatedness between Q/n and $n^{1/6}\lambda_1$.
Theorem 2.
Under the condition of Theorem 1, it holds that, for any ϵ > 0,
| (8) |
In Fig. 1, we show the scatter-plots of $n^{1/6}\lambda_1$ and Q/n for various n, based on 10000 simulations from a standard Wigner matrix as defined in §2·1. A clear decrease in the empirical correlation can be observed as n increases, indicating the poor asymptotic approximation of Q by λ1.
Fig. 1.
Relationship between modularity and the largest eigenvalue of the modularity matrix. Standard Wigner matrices were generated 10000 times for each n to produce the scatter-plots and corresponding correlation estimates.
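The empirical correlation between Q/n and n^{1/6}λ1 can be estimated with a short simulation. The sketch below is not the code behind Fig. 1: it assumes Q(W) = sgn(u1)ᵀ W sgn(u1), uses smaller n and fewer replications than the figure, and the helper name `corr_q_lambda1` is hypothetical.

```python
import numpy as np

def corr_q_lambda1(n, reps, rng):
    """Monte Carlo estimate of cor(Q/n, n^(1/6) * lambda_1) under the Gaussian null."""
    q = np.empty(reps)
    l1 = np.empty(reps)
    for r in range(reps):
        g = rng.standard_normal((n, n))
        W = (g + g.T) / np.sqrt(2.0)       # N(0, 1) off-diagonal, N(0, 2) diagonal
        lam, U = np.linalg.eigh(W)         # ascending eigenvalues
        s = np.sign(U[:, -1])
        q[r] = (s @ W @ s) / n
        l1[r] = n ** (1.0 / 6.0) * lam[-1]
    return np.corrcoef(q, l1)[0, 1]

rng = np.random.default_rng(11)
results = {n: corr_q_lambda1(n, reps=500, rng=rng) for n in (50, 200)}
for n, c in results.items():
    print(f"n = {n:4d}: estimated correlation = {c:.3f}")
```

Because the decay predicted by Theorem 2 is slow, a wider range of n and more replications are needed to see the trend as clearly as in the scatter-plots.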
2·5. Statistical power for community detection
Our next result concerns the statistical power of the test based on the normalized modularity and its limiting distribution obtained under the Gaussian orthogonal ensemble null. Specifically, for a given signed edge weight matrix W, we calculate its first eigenvector u1, and reject the null hypothesis whenever for some desired level α ∈ (0, 1). Naturally, one could replace Φ−1(1 − α) by F−1(1 − α) for better finite sample performance. For the alternative model, we consider the following deformed Gaussian orthogonal ensemble model where the signed edge weight matrix is symmetric, with Θ incorporating the underlying community structure and Z being a standard Wigner matrix. Examples of Θ include blockwise constant matrices or block-wise diagonal matrices extensively studied under the stochastic block models (Lei et al., 2015; Zhang & Zhou, 2016; Hu et al., 2020), and some general low rank matrices commonly considered for studying the spectral clustering algorithms (Lu & Zhou, 2016; Löffler et al., 2019).
Theorem 3.
Suppose , where Z is a standard Wigner matrix and Θ is some fixed symmetric matrix. Then, as long as λ1(Θ) ≥ C0√n and λn(Θ) > −C1√n for some universal constants C0, C1 > 0, we have and .
Remark 3. The alternative model considered in Theorem 3 covers a wide range of scenarios where community structure is present in the network edge weights. In particular, the theorem only requires the matrix Θ to have sufficiently large global signal to be detectable from the noisy observations, and there is no need to specify the community structure incorporated in Θ.
3. Simulations
3·1. Empirical quantile assessment
In this section, we conduct simulation studies to empirically evaluate the limiting distribution derived in the previous section. We generate Gaussian orthogonal ensemble matrices whose dimension n varies from 50 to 5000, and calculate their modularities defined by (1). We compare the empirical distributions of the standardized statistic in (6) based on simulated modularities against their theoretical quantiles qα corresponding to probabilities α varying from 0·01 to 0·99. Specifically, we evaluate the tail probabilities of the standardized statistic at qα for different n and cutoffs α. For the theoretical distribution, we consider two distributions, namely, the normal distribution $N\{0, 2(1 - 2/\pi)^2\}$ obtained in Theorem 1, and its second-order correction F. In particular, in the latter case, the theoretical quantiles are obtained numerically by generating 100000 samples. Table 2 and Table 3 show the empirical quantiles based on 100000 rounds of simulations. Comparing these two tables, it is clear that the convolution F provides a better approximation of the empirical distribution.
Table 2.
Empirical probabilities at $N\{0, 2(1 - 2/\pi)^2\}$ quantiles: the Gaussian orthogonal ensemble case
| n | 50 | 100 | 500 | 1000 | 2000 | 5000 |
|---|---|---|---|---|---|---|
| α = 0·01 | 0·0061 | 0·0054 | 0·0059 | 0·0056 | 0·0056 | 0·0057 |
| α = 0·05 | 0·0210 | 0·0226 | 0·0251 | 0·0260 | 0·0268 | 0·0283 |
| α = 0·25 | 0·0914 | 0·1042 | 0·1295 | 0·1364 | 0·1470 | 0·1602 |
| α = 0·50 | 0·2033 | 0·2313 | 0·2927 | 0·3104 | 0·3322 | 0·3564 |
| α = 0·75 | 0·3745 | 0·4221 | 0·5162 | 0·5469 | 0·5743 | 0·6054 |
| α = 0·95 | 0·6668 | 0·7231 | 0·8134 | 0·8397 | 0·8588 | 0·8808 |
| α = 0·99 | 0·8337 | 0·8758 | 0·9326 | 0·9455 | 0·9551 | 0·9652 |
Table 3.
Empirical probabilities at F quantiles: the Gaussian orthogonal ensemble case
| n | 50 | 100 | 500 | 1000 | 2000 | 5000 |
|---|---|---|---|---|---|---|
| α = 0·01 | 0·0191 | 0·0154 | 0·0125 | 0·0128 | 0·0103 | 0·0114 |
| α = 0·05 | 0·0813 | 0·0709 | 0·0580 | 0·0570 | 0·0534 | 0·0526 |
| α = 0·25 | 0·1835 | 0·2016 | 0·2293 | 0·2350 | 0·2353 | 0·2427 |
| α = 0·50 | 0·4078 | 0·4314 | 0·4689 | 0·4804 | 0·4886 | 0·4889 |
| α = 0·75 | 0·6720 | 0·6922 | 0·7257 | 0·7320 | 0·7464 | 0·7405 |
| α = 0·95 | 0·9179 | 0·9286 | 0·9432 | 0·9448 | 0·9469 | 0·9482 |
| α = 0·99 | 0·9808 | 0·9843 | 0·9891 | 0·9882 | 0·9889 | 0·9888 |
3·2. Universality of the limit distribution
In this section, we empirically evaluate the validity of our theoretical limit distribution as well as its second-order approximation under some matrix ensembles other than the Gaussian orthogonal ensemble. In particular, as cases of both theoretical and practical interest, we consider (i) symmetric random matrices with heavy-tailed, nonsymmetric distributions such as the exponential distribution, Exp(1); (ii) the adjacency matrix of a sparse Erdős–Rényi random graph (Erdős et al., 2012a, 2013) with $p = n^{-1/4}$; and (iii) the sample correlation matrix Rn of N independent observations from N(0, In) with $N = n^{5/2}$. In cases (i) and (ii), the entries of the random matrices are normalized to match the first two moments of the Gaussian orthogonal ensemble. In case (iii), the modularity is calculated from the normalized matrix $N^{1/2}(R_n - I_n)$. In Tables 4–6, we show the empirical tail probabilities evaluated at different quantiles of the convolution F. The results concerning tail probabilities evaluated at the quantiles of $N\{0, 2(1 - 2/\pi)^2\}$ are put in the Supplementary Material. Our numerical results suggest the universality of our limit distribution as well as its second-order approximation over a wide range of random matrices beyond the Gaussian orthogonal ensemble, which implies its strong potential for practical applications outside the Gaussian orthogonal ensemble setting.
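To make the normalization in cases (i) and (ii) concrete, the following sketch generates both ensembles with off-diagonal entries centred and scaled to mean 0 and variance 1. The details are assumptions made for illustration and are not specified in the text: in case (i) the diagonal is scaled to variance 2, in case (ii) the diagonal is left at zero as for a graph without self-loops, and the function names are hypothetical.

```python
import numpy as np

def symmetric_from_upper(m):
    """Symmetric matrix built from the strict upper triangle of m (zero diagonal)."""
    upper = np.triu(m, k=1)
    return upper + upper.T

def exp_wigner(n, rng):
    """Case (i): centred Exp(1) entries, variance 1 off-diagonal, 2 on the diagonal."""
    off = symmetric_from_upper(rng.exponential(1.0, (n, n)) - 1.0)
    diag = np.sqrt(2.0) * (rng.exponential(1.0, n) - 1.0)
    return off + np.diag(diag)

def er_wigner(n, rng):
    """Case (ii): Erdos-Renyi adjacency with p = n^(-1/4), standardized entrywise."""
    p = n ** (-0.25)
    adj = (rng.random((n, n)) < p).astype(float)
    z = (adj - p) / np.sqrt(p * (1.0 - p))
    return symmetric_from_upper(z)       # zero diagonal: an assumption

rng = np.random.default_rng(5)
n = 400
iu = np.triu_indices(n, k=1)
W_exp, W_er = exp_wigner(n, rng), er_wigner(n, rng)

for name, W in (("Exp(1)", W_exp), ("Erdos-Renyi", W_er)):
    print(name, round(W[iu].mean(), 3), round(W[iu].var(), 3))
```

Both matrices can then be passed to the same modularity computation used for the Gaussian case, which is what makes a direct comparison against the quantiles of F possible.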
Table 4.
Empirical probabilities at F quantiles: heavy-tailed nonsymmetric distribution Exp(1)
| n | 50 | 100 | 500 | 1000 | 2000 | 5000 |
|---|---|---|---|---|---|---|
| α = 0·01 | 0·0884 | 0·0796 | 0·0253 | 0·0197 | 0·0155 | 0·0130 |
| α = 0·05 | 0·1900 | 0·1839 | 0·0957 | 0·0795 | 0·0689 | 0·0616 |
| α = 0·25 | 0·4387 | 0·4572 | 0·3513 | 0·3204 | 0·3002 | 0·2800 |
| α = 0·50 | 0·6479 | 0·6790 | 0·6086 | 0·5825 | 0·5570 | 0·5338 |
| α = 0·75 | 0·8271 | 0·8540 | 0·8236 | 0·8078 | 0·7920 | 0·7773 |
| α = 0·95 | 0·9615 | 0·9732 | 0·9689 | 0·9666 | 0·9619 | 0·9595 |
| α = 0·99 | 0·9912 | 0·9950 | 0·9948 | 0·9937 | 0·9928 | 0·9920 |
Table 6.
Empirical probabilities at F quantiles: sample correlation matrix ($N = n^{5/2}$)
| n | 20 | 50 | 75 | 100 | 150 | 200 |
|---|---|---|---|---|---|---|
| α = 0·01 | 0·0033 | 0·0068 | 0·0059 | 0·0073 | 0·0076 | 0·0083 |
| α = 0·05 | 0·0176 | 0·0310 | 0·0343 | 0·0383 | 0·0480 | 0·0465 |
| α = 0·25 | 0·1280 | 0·1791 | 0·2075 | 0·2186 | 0·2356 | 0·2373 |
| α = 0·50 | 0·3289 | 0·4156 | 0·4561 | 0·4648 | 0·4880 | 0·4905 |
| α = 0·75 | 0·6164 | 0·6914 | 0·7219 | 0·7308 | 0·7479 | 0·7550 |
| α = 0·95 | 0·9197 | 0·9364 | 0·9450 | 0·9491 | 0·9532 | 0·9516 |
| α = 0·99 | 0·9829 | 0·9872 | 0·9912 | 0·9896 | 0·9919 | 0·9908 |
3·3. Empirical power assessment
Under certain alternative models for weighted signed networks that suggest community structure, we numerically assess and compare the powers of the modularity-based tests and some other methods for community detection. Specifically, we consider the following deformed/spiked Gaussian orthogonal ensemble model where the signed edge weight matrix is symmetric, with , $u \in S^{n-1}$, D a diagonal matrix, and Z a standard Wigner matrix. In particular, for each n ∈ {100, 200, 300, 400, 500, 600}, we set β = √n, and set u such that its first n/2 coordinates are $n^{-1/2}$ and the rest of the coordinates are $-n^{-1/2}$. This implies two clusters of nodes of equal size, where the within-group and cross-group edge weights are two distinct values. The diagonal entries of D are randomly generated from [−√n, √n], to increase heterogeneity. Table 7 shows the empirical powers of: (i) Modularity Test I, the test based on the normalized modularity and its Gaussian limiting distribution in Theorem 1; (ii) Modularity Test II, the test based on the normalized modularity and the convolutional approximation F, whose quantiles are obtained numerically as in previous sections; (iii) Largest Eigenvalue Test, the test based on λ1(W) and its Tracy–Widom limiting distribution (Johnstone & Ma, 2012); and (iv) Entrywise Maximum Test, the test based on the entrywise maxima $\max_{1 \le i \ne j \le n} |\{\mathrm{cov}(W)\}_{ij}|$ and its Gumbel limiting distribution (Jiang, 2004a; Hu et al., 2020). The details of the Largest Eigenvalue Test and the Entrywise Maximum Test and their asymptotic validity under the null model are given in the Supplementary Material. The empirical powers of these methods at level α = 0·05 are calculated from 100000 rounds of simulations. From Table 7, we find that the Modularity Tests I and II are more powerful than the Largest Eigenvalue Test and the Entrywise Maximum Test when n is large (n ≥ 400), while the Entrywise Maximum Test is more powerful for smaller n.
Table 7.
Empirical powers of four different methods at level α = 0·05
| n | 50 | 100 | 200 | 400 | 600 | 800 |
|---|---|---|---|---|---|---|
| Modularity Test I | 0·3791 | 0·4779 | 0·5819 | 0·6653 | 0·7106 | 0·7358 |
| Modularity Test II | 0·4432 | 0·5569 | 0·6493 | 0·7164 | 0·7525 | 0·7793 |
| Largest Eigenvalue Test | 0·5471 | 0·5804 | 0·6159 | 0·6510 | 0·6788 | 0·6952 |
| Entrywise Maximum Test | 0·7686 | 0·7108 | 0·6655 | 0·6206 | 0·5977 | 0·5969 |
4. Real data analysis
4·1. Analysis of US congressional voting networks
Annual voting records for individuals in the U.S. House of Representatives provide a commonly used example of a highly modular network, where nodes stand for representatives and edge weights correspond to the correlation between the voting records of pairs of representatives. Recently, evidence of increased partisan polarization has been observed based on increased modularity in more recent annual voting networks (Neal, 2018). We let W be a centred and scaled correlation matrix with zeros on the diagonal based on the 1984 congressional voting records. We removed congressmen with more than 50% of votes unrecorded for the year, leaving a total of n = 431 congressmen in our network. Letting sgn(u1) determine community membership, representatives were strongly divided based on party affiliation, with 96·9% Democrat and 3·1% Republican membership in one community, and 77·6% Republican and 22·4% Democrat membership in the other community. Modularity was very large, with , which based on Theorem 1 provides overwhelming statistical evidence of community structure.
Given the nature of partisan politics, it is unsurprising that there was strong evidence to reject a null hypothesis of no community structure in Congress. Thus we also explore the less obvious question of whether there are additional communities beyond the Republican and Democrat divide. Restricting the data to the 205 congressmen in the Republican-dominated community, namely, the one that is 77·6% Republican, we recentred and rescaled weights over this subset and applied Theorem 1 again. There again was overwhelming statistical evidence of additional community structure as . The largest eigenvalue and the entrywise maximum-based tests led to consistent conclusions with p-values less than $1·0 \times 10^{-4}$ in both cases. Therefore, this Republican-dominated subset divides into two additional communities: one community with a majority, 58·7%, of Democrats, and another with a large majority, 88·1%, of Republicans. This is evidence of a substantial subset of moderate Democrats who, despite being initially clustered with Republicans based on their voting record, also demonstrated sufficient differences from the Republicans to warrant belonging to a separate and distinct community.
4·2. Network structure of the human cranium
Morphological networks of the human cranium define nodes to be anatomically defined measurements between landmark points on the cranium of a particular individual. Edge weights are defined by Pearson correlations between cranial measurements for each pair of landmarks. In Fig. 2, the corresponding correlation network demonstrates blocks of cranial landmarks with nested correlation structure, such as what can be observed for landmarks 1 through 24. Because different cranial landmarks develop simultaneously on the cranium of each individual, and are therefore subject to the same environmental factors throughout development, this nested structure is an expected feature of this morphological network. Network nestedness occurs when interactions of less connected nodes form proper subsets of the interactions of more connected nodes. Modularity is a type of nestedness where there is no distinct hierarchical structure separating nodes with a low degree from nodes with a high degree within a community, and so modularity can be interpreted as an intermediate form of nestedness.
Fig. 2.
Correlation network of landmark measurements of the human cranium. Cranial landmarks are discrete anatomical points that are homologous across humans. A sample of 1367 male crania was used to calculate the Pearson correlations between each pair of 44 cranial landmark measurements.
Cantor et al. (2017) constructed morphological networks of the human crania using 1367 males to calculate correlations between each pair of 44 different landmark measurements. They found significant statistical evidence of nestedness in the resulting correlation network. We apply Theorem 1 to this network after proper normalization and find that there is overwhelming statistical evidence of community structure . Similarly, tests for the same null hypothesis based on the asymptotic distributions of the first eigenvalue and the entrywise maximum both lead to the same conclusion with both p-values less than $1·0 \times 10^{-4}$. This implies that human crania tend to contain clusters of landmarks, likely spatially close to one another, that grow together in parallel throughout development.
5. Discussion
Our numerical results show that, although it significantly improves upon the original limit distribution, the second-order approximation still seems insufficient for applications with small sample sizes. Hence, it would be interesting to find a more accurate higher-order approximation for the limit distribution in Theorem 1.
In Reichardt & Bornholdt (2006) and Fortunato & Barthelemy (2007), it was shown that the modularity defined as in (1) has its own limits; for example, it is unable to find community structure in networks with many small communities. To address the issue, Arenas et al. (2008) proposed a generalized modularity which includes a resolution parameter. Consequently, it would also be of interest to extend our analysis to the generalized modularity (Newman, 2016).
In §2·1, due to the complicated dependence structure between the error term and the first-order fluctuation An/n, our current analytical framework can only lead us to the limiting distribution of the normalized . We admit this is mainly due to the limitation of our technical tools, and, in light of Remark 1, it is of interest whether a direct limiting distribution for $n^{-1}(Q - 4n^{3/2}/\pi)$ can be obtained. Some numerical comparisons of $n^{-1}Q$, and $n^{-1}(Q - 4n^{3/2}/\pi)$ are presented in our Supplementary Material, which suggest that a test based on $n^{-1}(Q - 4n^{3/2}/\pi)$ could be more powerful against certain alternatives. We leave the more rigorous theoretical investigations for future research.
Supplementary Material
Table 5.
Empirical probabilities at F quantiles: sparse Erdős–Rényi random graph ($p = n^{-1/4}$)
| n | 50 | 100 | 500 | 1000 | 2000 | 5000 |
|---|---|---|---|---|---|---|
| α = 0·01 | 0·0006 | 0·0015 | 0·0067 | 0·0094 | 0·0110 | 0·0128 |
| α = 0·05 | 0·0059 | 0·0121 | 0·0374 | 0·0465 | 0·0557 | 0·0606 |
| α = 0·25 | 0·0681 | 0·1085 | 0·2093 | 0·2424 | 0·2646 | 0·2784 |
| α = 0·50 | 0·2261 | 0·3035 | 0·4464 | 0·4921 | 0·5181 | 0·5344 |
| α = 0·75 | 0·4958 | 0·5776 | 0·7069 | 0·7403 | 0·7648 | 0·7759 |
| α = 0·95 | 0·8524 | 0·8899 | 0·9354 | 0·9458 | 0·9541 | 0·9562 |
| α = 0·99 | 0·9633 | 0·9733 | 0·9870 | 0·9894 | 0·9913 | 0·9913 |
Acknowledgement
The authors are grateful to the editor, associate editor and two referees for their comments and suggestions that significantly improved the quality of the manuscript. This research was supported by the National Institute of Mental Health of the National Institutes of Health (R01MH116884 (IB)).
Appendix
Throughout, for sequences {an} and {bn}, we write an = o(bn) (or an = oP(bn)) if limn an/bn = 0 (in probability), and write an = O(bn) (or an = OP(bn)), an ≲ bn or bn ≳ an if there exists a constant C such that an ≤ Cbn for all n (in probability). We write $a_n \asymp b_n$ if an ≲ bn and an ≳ bn. For a set A, we denote |A| as its cardinality. Lastly, C, C0, C1, … are constants that may vary from place to place.
In the following, we prove Theorem 1 in the main paper. The proofs of the other theorems and some technical lemmas are collected in the Supplementary Material.
Proof of Theorem 1.
We first recall some important results concerning the eigenvectors and the eigenvalues of the Gaussian orthogonal ensemble. Specifically, the eigenvectors u1(W), …, un(W) are uniformly distributed on the half-sphere , and the joint distribution of (u1(W), …, un(W)) is the Haar measure on the orthogonal group O(n), with each column multiplied by −1 or 1 so that the columns all belong to (O’Rourke et al., 2016). An immediate consequence is the following proposition characterizing the joint distribution of the eigenvectors of W.
Proposition A1.
Let v be a random vector uniformly distributed on $S^{n-1}$. Then v has the same distribution as $(\xi_1, \ldots, \xi_n)/(\xi_1^2 + \cdots + \xi_n^2)^{1/2}$, where ξ1, …, ξn are independently and identically drawn from N(0, 1).
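The representation of a uniform vector on the sphere as a normalized Gaussian vector is easy to simulate. The sketch below is illustrative only (the function name `uniform_on_sphere` and the dimension are arbitrary); it also checks numerically that ‖v‖₁²/n is close to 2/π, the constant that appears in the centring 4n^{1/2}/π of Remark 1.

```python
import numpy as np

def uniform_on_sphere(n, rng):
    """Draw v uniformly from the unit sphere by normalizing a standard Gaussian vector."""
    xi = rng.standard_normal(n)
    return xi / np.linalg.norm(xi)

rng = np.random.default_rng(8)
n = 10_000
v = uniform_on_sphere(n, rng)

l1_sq_over_n = np.abs(v).sum() ** 2 / n
print(f"||v||_2        = {np.linalg.norm(v):.6f}")
print(f"||v||_1^2 / n  = {l1_sq_over_n:.4f}")
print(f"2 / pi         = {2.0 / np.pi:.4f}")
```

The agreement follows from the law of large numbers, since E|ξ| = (2/π)^{1/2} for a standard normal ξ.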
Another well-known fact related to the modularity under the Gaussian orthogonal ensemble is the limiting distribution of its largest eigenvalue λ1(W), derived in the seminal works of Tracy and Widom (Tracy & Widom, 1994, 1996).
Theorem A4 (The Tracy–Widom Law).
Let λ1(W) denote the largest eigenvalue of W, where W is a sample from the Gaussian orthogonal ensemble with dimension n × n. Then $\mathrm{pr}\{n^{1/6}(\lambda_1(W) - 2n^{1/2}) \le s\} \to F(s)$, where F(s) denotes the Tracy–Widom distribution.
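The scaling in the Tracy–Widom law can be explored by simulation. The following is a rough sketch with arbitrary n, seed and number of replications; for reference, the Tracy–Widom distribution for the orthogonal case has mean approximately −1·21 and standard deviation approximately 1·27, and finite-n samples are only expected to be close to these values.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 500

scaled = np.empty(reps)
for r in range(reps):
    g = rng.standard_normal((n, n))
    W = (g + g.T) / np.sqrt(2.0)              # N(0, 1) off-diagonal, N(0, 2) diagonal
    lam1 = np.linalg.eigvalsh(W)[-1]          # largest eigenvalue
    scaled[r] = n ** (1.0 / 6.0) * (lam1 - 2.0 * np.sqrt(n))

print(f"sample mean    : {scaled.mean():.3f}")
print(f"sample std dev : {scaled.std(ddof=1):.3f}")
```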
Proof of Theorem 1.
By definition and eigen-decomposition of W, we have
| (A1) |
The proof is separated into three parts. Firstly, we show that Bn/n in (A1), after proper centring and scaling, converges weakly to a Tracy-Widom distribution. Secondly, we show that An/n is asymptotically normal. Finally, we deal with the covariance between the two terms.
Part I.
By Proposition A1, we know that u1(W), u2(W), …, un(W) are independent and have the same distribution as
| (A2) |
where ξ1, …, ξn are independently and identically distributed standard normal random variables. Therefore,
On the one hand, $\xi_1^2, \ldots, \xi_n^2$ are independent $\chi^2$ random variables, which satisfy a sub-exponential tail bound. By a standard concentration inequality for sub-exponential random variables, such as Proposition 5.16 in Vershynin (2010), we have, for any ϵ > 0,
for some constant c > 0. On the other hand, a standard concentration inequality for sub-Gaussian random variables yields, for any ϵ > 0,
for some constant c > 0. Thus with probability at least 1 – O(ϵc) for some c > 0,
| (A3) |
By Theorem A3, we have
| (A4) |
In other words, .
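The two concentration steps in Part I can be illustrated numerically. The sketch below assumes, for illustration only, that the quantity being controlled is the squared $\ell_1$ norm of $u_1(W)$ divided by $n$, which by the Gaussian representation of a uniform vector on the sphere equals $(\sum_j |\xi_j|)^2 / (n \sum_j \xi_j^2)$. Under that assumption the sub-exponential bound gives $n^{-1}\sum_j \xi_j^2 \approx 1$, the sub-Gaussian bound gives $n^{-1}\sum_j |\xi_j| \approx (2/\pi)^{1/2}$, and the ratio should therefore be close to $2/\pi$. The function name is hypothetical.

```python
import math
import random


def l1_norm_squared_over_n(n, rng):
    """Return ||u||_1^2 / n for u = xi / ||xi||_2 with xi standard Gaussian."""
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    l1 = sum(abs(x) for x in xi)      # concentrates around n * sqrt(2 / pi)
    l2_sq = sum(x * x for x in xi)    # concentrates around n
    return l1 * l1 / (n * l2_sq)


rng = random.Random(0)
n = 20000
value = l1_norm_squared_over_n(n, rng)
limit = 2.0 / math.pi
print(value, limit)
```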
Part II.
We denote the second term in (A1) as
Denote by $\gamma_j$, for $j = 1, \ldots, n$, the classical location of the $j$th eigenvalue, scaled by $n^{1/2}$, under the semicircle law, with the locations ordered increasingly. In other words,
where $\rho_{sc}(x) = (2\pi)^{-1}\{(4 - x^2)_{+}\}^{1/2}$ is the semicircle law. Define
| (A5) |
In what follows, we show that $\Omega_0$ is asymptotically normal with variance $2(1 - 2/\pi)^2$, and then conclude by verifying that $|\Omega_0 - A_n/n| \to 0$ in probability.
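For concreteness, the classical locations $\gamma_j$ can be computed numerically. The sketch below assumes the common convention that $\gamma_j$ solves $\int_{-2}^{\gamma_j} \rho_{sc}(x)\,dx = j/n$; this convention is an assumption made here for illustration and other conventions shift the locations slightly. It uses the closed-form distribution function of the semicircle law and inverts it by bisection. The function names are hypothetical.

```python
import math


def semicircle_cdf(x):
    """Distribution function of the semicircle law with density sqrt(4 - x^2) / (2 pi)."""
    if x <= -2.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi


def classical_locations(n, tol=1e-12):
    """Solve semicircle_cdf(gamma_j) = j / n for j = 1, ..., n by bisection (assumed convention)."""
    gammas = []
    for j in range(1, n + 1):
        lo, hi = -2.0, 2.0
        target = j / n
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if semicircle_cdf(mid) < target:
                lo = mid
            else:
                hi = mid
        gammas.append(0.5 * (lo + hi))
    return gammas


n = 1000
gammas = classical_locations(n)
# The semicircle law has second moment 1, so this average should be close to 1.
second_moment = sum(g * g for g in gammas) / n
print(gammas[0], gammas[-1], second_moment)
```

The second-moment check matters for the variance of $\Omega_0$: the factor multiplying $2(1 - 2/\pi)^2$ is the second moment of the semicircle law, which equals one.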
Asymptotic normality of Ω0.
The proof of asymptotic normality depends on the following key observations about a single .
Lemma A1.
Suppose $(u_1, \ldots, u_n)$ follows the Haar measure on the orthogonal group $O(n)$. Then for any $i = 2, \ldots, n$, it holds that . In particular, we have , where are drawn independently from $N(0, \sigma^2)$ and $\sigma^2 = 1 - 2/\pi + o(1)$.
Our next result concerns the relation between two elements and where $i, j \in \{2, \ldots, n\}$, $i \ne j$. In particular, we show that is an isotropic vector.
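The variance constant $\sigma^2 = 1 - 2/\pi$ can be checked by simulation. The sketch below assumes, as an illustrative identification that is not confirmed by the text above, that the quantity in Lemma A1 is the inner product of $u_i$ with the sign vector of $u_1$. It generates the first two columns of a Haar-distributed orthogonal matrix by applying Gram–Schmidt to two independent Gaussian vectors, and compares the empirical variance of $u_2^{\top}\mathrm{sign}(u_1)$ with $1 - 2/\pi$. All function names are hypothetical.

```python
import math
import random


def first_two_haar_columns(n, rng):
    """First two columns of a Haar orthogonal matrix via Gram-Schmidt on Gaussian vectors."""
    g1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    g2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm1 = math.sqrt(sum(x * x for x in g1))
    u1 = [x / norm1 for x in g1]
    # Remove the component of g2 along u1, then normalize.
    proj = sum(a * b for a, b in zip(g2, u1))
    resid = [b - proj * a for a, b in zip(u1, g2)]
    norm2 = math.sqrt(sum(x * x for x in resid))
    u2 = [x / norm2 for x in resid]
    return u1, u2


def sign(x):
    return 1.0 if x >= 0.0 else -1.0


rng = random.Random(1)
n, reps = 100, 2000
samples = []
for _ in range(reps):
    u1, u2 = first_two_haar_columns(n, rng)
    samples.append(sum(b * sign(a) for a, b in zip(u1, u2)))

mean = sum(samples) / reps
variance = sum((s - mean) ** 2 for s in samples) / (reps - 1)
target = 1.0 - 2.0 / math.pi
print(mean, variance, target)
```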
Lemma A2.
Suppose $(u_1, \ldots, u_n)$ follows the Haar measure on the orthogonal group $O(n)$. Then for any $i, j \in \{2, \ldots, n\}$ with $i \ne j$, it holds that and .
Now without loss of generality we assume n is even, namely, n = 2m for some integer m > 0. Define
and
Hence
Set for i = 2, …, m. It is easy to check
| (A6) |
using Lemma A2, the exchangeability of the Haar measure on $O(n)$, and the symmetry $\gamma_i = -\gamma_{n-i+1}$. The asymptotic normality of $\Omega_0$ can be obtained from the following central limit theorem for symmetric isotropic random vectors and the fact that, by Lemma A1, .
Lemma A3.
Suppose that has a distribution that is invariant under reflections in the coordinate hyperplanes and
for $i, j \in \{1, \ldots, n\}$ and $i \ne j$. Let $\theta = (\theta_1, \ldots, \theta_n) \in S^{n-1}$ be a fixed vector and . Then
Lemma A4.
For all i = 2, …, m, it holds that . For any fixed i, j ∈ {2, …, m} and i ≠ j, it holds that .
Let $\Omega_0 = \theta^{\top}\alpha + O(n^{-1/2})$, where $\theta = (1/\sqrt{m}, \ldots, 1/\sqrt{m})^{\top}$ and $\alpha = (\alpha_2, \ldots, \alpha_m)^{\top}$. Denote . It then follows that . Combining Lemmas A3 and A4, we have
for some constant C > 0. Now note that
| (A7) |
and using Lemma A4
it follows that
Since (A7) holds, Slutsky’s theorem yields
| (A8) |
Asymptotic normality of An/n.
We need the following results obtained by Erdős et al. (2012b).
Lemma A5 (Rigidity of Eigenvalues).
For generalized Wigner matrices, if $\gamma_j$ is the classical location of the $j$th eigenvalue under the semicircle law, ordered increasingly, then the scaled $j$th eigenvalue $\lambda_j/n^{1/2}$ is close to $\gamma_j$ in the sense that, for some positive constants $C$ and $c$,
for sufficiently large n.
As a consequence, for any j = 1, …, n, we have
for any small δ > 0. In other words, the eigenvalue is near its classical location with an error of at most $n^{-1}(\log n)^{C \log\log n}$ for generalized Wigner matrices in the bulk, and the estimate deteriorates by a factor $(n/j)^{1/3}$ near the edge $j \ll n$. As a consequence, for any sufficiently small δ > 0,
Note that , we have |An/n − Ω0| → 0 in probability. So the asymptotic normality of Ω follows from Slutsky’s theorem and (A8).
Part III.
By definition, we have
It suffices to control for any i = 2, …, n. Now we define
Lemma A6.
Under the conditions of Theorem 1, for any small constant ϵ > 0, it holds that and .
Applying Lemma A6 to the above equation, we complete the third part of our proof.
Footnotes
Supplementary material
Supplementary material available at Biometrika online includes proofs of the other theorems and technical lemmas, as well as some supplementary tables and figures.
References
- Agarwal G & Kempe D (2008). Modularity-maximizing graph communities via mathematical programming. The European Physical Journal B 66, 409–418.
- Arenas A, Fernandez A & Gomez S (2008). Analysis of the structure of complex networks at different resolution levels. New Journal of Physics 10, 053039.
- Bai Z-D & Yin Y-Q (1988). Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue of a Wigner matrix. The Annals of Probability, 1729–1741.
- Bauerschmidt R, Knowles A & Yau H-T (2017). Local semicircle law for random regular graphs. Communications on Pure and Applied Mathematics 70, 1898–1960.
- Bickel PJ & Sarkar P (2016). Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78, 253–273.
- Bloemendal A, Knowles A, Yau H-T & Yin J (2016). On the principal components of sample covariance matrices. Probability Theory and Related Fields 164, 459–552.
- Boccaletti S, Latora V, Moreno Y, Chavez M & Hwang D-U (2006). Complex networks: Structure and dynamics. Physics Reports 424, 175–308.
- Bourgade P & Yau H-T (2017). The eigenvector moment flow and local quantum unique ergodicity. Communications in Mathematical Physics 350, 231–278.
- Bullmore E & Sporns O (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10, 186.
- Cantor M, Pires MM, Marquitti FM, Raimundo RL, Sebastián-González E, Coltri PP, Perez SI, Barneche DR, Brandt DY, Nunes K et al. (2017). Nestedness across biological scales. PLoS ONE 12, e0171691.
- Chen M, Kuzmin K & Szymanski BK (2014). Community detection via maximization of modularity and its variants. IEEE Transactions on Computational Social Systems 1, 46–65.
- Chi KT, Liu J & Lau FC (2010). A network perspective of the stock market. Journal of Empirical Finance 17, 659–667.
- Ding X (2019). Singular vector distribution of sample covariance matrices. Advances in Applied Probability 51, 236–267.
- Dwyer DB, Harrison BJ, Yücel M, Whittle S, Zalesky A, Pantelis C, Allen NB & Fornito A (2014). Large-scale brain network dynamics supporting adolescent cognitive control. Journal of Neuroscience 34, 14096–14107.
- Erdős L, Knowles A, Yau H-T & Yin J (2012a). Spectral statistics of Erdős–Rényi graphs II: Eigenvalue spacing and the extreme eigenvalues. Communications in Mathematical Physics 314, 587–640.
- Erdős L, Knowles A, Yau H-T & Yin J (2013). Spectral statistics of Erdős–Rényi graphs I: Local semicircle law. The Annals of Probability 41, 2279–2375.
- Erdős L, Yau H-T & Yin J (2012b). Rigidity of eigenvalues of generalized Wigner matrices. Advances in Mathematics 229, 1435–1515.
- Fortunato S & Barthelemy M (2007). Resolution limit in community detection. Proceedings of the National Academy of Sciences 104, 36–41.
- Good BH, De Montjoye Y-A & Clauset A (2010). Performance of modularity maximization in practical contexts. Physical Review E 81, 046106.
- Hu J, Zhang J, Qin H, Yan T & Zhu J (2020). Using maximum entry-wise deviation to test the goodness-of-fit for stochastic block models. Journal of the American Statistical Association, 1–30.
- Jiang T (2004a). The asymptotic distributions of the largest entries of sample correlation matrices. The Annals of Applied Probability 14, 865–880.
- Jiang T (2004b). The limiting distributions of eigenvalues of sample correlation matrices. Sankhyā: The Indian Journal of Statistics, 35–48.
- Johnstone IM & Ma Z (2012). Fast approach to the Tracy–Widom law at the edge of GOE and GUE. The Annals of Applied Probability 22, 1962.
- Knowles A & Yin J (2013a). Eigenvector distribution of Wigner matrices. Probability Theory and Related Fields 155, 543–582.
- Knowles A & Yin J (2013b). The isotropic semicircle law and deformation of Wigner matrices. Communications on Pure and Applied Mathematics 66, 1663–1749.
- Lancichinetti A & Fortunato S (2009). Community detection algorithms: a comparative analysis. Physical Review E 80, 056117.
- Langfelder P & Horvath S (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559.
- Lei J (2016). A goodness-of-fit test for stochastic block models. The Annals of Statistics 44, 401–424.
- Lei J & Rinaldo A (2015). Consistency of spectral clustering in stochastic block models. The Annals of Statistics 43, 215–237.
- Lichoti JK, Davies J, Kitala PM, Githigia SM, Okoth E, Maru Y, Bukachi SA & Bishop RP (2016). Social network analysis provides insights into African swine fever epidemiology. Preventive Veterinary Medicine 126, 1–10.
- Löffler M, Zhang AY & Zhou HH (2019). Optimality of spectral clustering for Gaussian mixture model. arXiv preprint arXiv:1911.00538.
- Lu Y & Zhou HH (2016). Statistical and computational guarantees of Lloyd’s algorithm and its variants. arXiv preprint arXiv:1612.02099.
- Marchenko VA & Pastur LA (1967). Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik 114, 507–536.
- Neal ZP (2018). A sign of the times? Weak and strong polarization in the US Congress, 1973–2016. Social Networks.
- Newman ME (2006a). Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74, 036104.
- Newman ME (2006b). Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103, 8577–8582.
- Newman ME (2016). Equivalence between modularity optimization and maximum likelihood methods for community detection. Physical Review E 94, 052315.
- Newman ME & Girvan M (2004). Finding and evaluating community structure in networks. Physical Review E 69, 026113.
- O’Rourke S, Vu V & Wang K (2016). Eigenvectors of random matrices: a survey. Journal of Combinatorial Theory, Series A 144, 361–442.
- Reichardt J & Bornholdt S (2006). When are networks truly modular? Physica D: Nonlinear Phenomena 224, 20–26.
- Rizkallah J, Benquet P, Wendling F, Khalil M, Mheich A, Dufor O & Hassan M (2016). Brain network modules of meaningful and meaningless objects. In Biomedical Engineering (MECBME), 2016 3rd Middle East Conference on. IEEE.
- Springer A, Kappeler PM & Nunn CL (2017). Dynamic vs. static social networks in models of parasite transmission: predicting Cryptosporidium spread in wild lemurs. Journal of Animal Ecology 86, 419–433.
- Tao T & Vu V (2010). Random matrices: Localization of the eigenvalues and the necessity of four moments. arXiv preprint arXiv:1005.2901.
- Tao T & Vu V (2011). Random matrices: universality of local eigenvalue statistics. Acta Mathematica 206, 127.
- Telesford QK, Lynall M-E, Vettel J, Miller MB, Grafton ST & Bassett DS (2016). Detection of functional brain network reconfiguration during task-driven cognitive states. NeuroImage 142, 198–210.
- Traag VA & Bruggeman J (2009). Community detection in networks with positive and negative links. Physical Review E 80, 036115.
- Tracy CA & Widom H (1994). Level-spacing distributions and the Airy kernel. Communications in Mathematical Physics 159, 151–174.
- Tracy CA & Widom H (1996). On orthogonal and symplectic matrix ensembles. Communications in Mathematical Physics 177, 727–754.
- Vershynin R (2010). Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027.
- Wasserman S & Faust K (1994). Social network analysis: Methods and applications, vol. 8. Cambridge University Press.
- Zhang AY & Zhou HH (2016). Minimax rates of community detection in stochastic block models. The Annals of Statistics 44, 2252–2280.
- Zhang J & Chen Y (2016). A hypothesis testing framework for modularity based network community detection. Statistica Sinica 27, 437–456.