Super-Resolution Community Detection for Layer-Aggregated Multilayer Networks

Dane Taylor; Rajmonda S Caceres; Peter J Mucha

doi:10.1103/PhysRevX.7.031056

Author manuscript; available in PMC: 2018 Feb 12.

Published in final edited form as: Phys Rev X. 2017 Sep 26;7(3):031056. doi: 10.1103/PhysRevX.7.031056

PMCID: PMC5809009 NIHMSID: NIHMS918895 PMID: 29445565

Abstract

Applied network science often involves preprocessing network data before applying a network-analysis method, and there is typically a theoretical disconnect between these steps. For example, it is common to aggregate time-varying network data into windows prior to analysis, and the trade-offs of this preprocessing are not well understood. Focusing on the problem of detecting small communities in multilayer networks, we study the effects of layer aggregation by developing random-matrix theory for modularity matrices associated with layer-aggregated networks with N nodes and L layers, which are drawn from an ensemble of Erdős–Rényi networks with communities planted in subsets of layers. We study phase transitions in which eigenvectors localize onto communities (allowing their detection) and which occur for a given community provided its size surpasses a detectability limit K^*. When layers are aggregated via a summation, we obtain $K^{*} \propto O (\sqrt{N L} / T)$ , where T is the number of layers across which the community persists. Interestingly, if T is allowed to vary with L, then summation-based layer aggregation enhances small-community detection even if the community persists across a vanishing fraction of layers, provided that T/L decays more slowly than 𝒪(L^−1/2). Moreover, we find that thresholding the summation can, in some cases, cause K^* to decay exponentially, decreasing by orders of magnitude in a phenomenon we call super-resolution community detection. In other words, layer aggregation with thresholding is a nonlinear data filter enabling detection of communities that are otherwise too small to detect. Importantly, different thresholds generally enhance the detectability of communities having different properties, illustrating that community detection can be obscured if one analyzes network data using a single threshold.

Subject Areas: Complex Systems, Interdisciplinary Physics, Statistical Physics

I. INTRODUCTION

Network-based modeling provides a powerful framework for analyzing high-dimensional data sets and complex systems [1]. Often, a network is best represented by a set of network layers that encode different types of interactions, such as categorical social ties [2] or a network at different instances in time [3], and an important pursuit involves extending network theory to the multilayer setting [4,5]. Sometimes, however, a multilayer framework can require too much computational overhead or can represent an over-modeling (e.g., when the layers are correlated, either in terms of the edge overlap [6] or other properties [7–9]), and it can be beneficial to aggregate layers [9–11]. In particular, aggregation provides a crucial step for analyzing temporal network data, which is often binned into time windows [12,13] (see Fig. 1). Layer aggregation and other types of network preprocessing (e.g., sparsification [14], network inference [15], and denoising [16,17]) can greatly influence the resulting network structure, which in turn influences the outcomes of network analyses and their many applications. In general, there remains a significant need for improved theoretical understanding for how such network preprocessing influences network-analysis methodology.

FIG. 1 — Preprocessing networks (including multilayer representations of temporal networks) often involves aggregating network data into bins (or time windows). We study how many layers must contain a community in order for aggregation to enhance its detection and introduce layer aggregation with thresholding as a filter enabling super-resolution community detection.

We study the effects of layer aggregation on community detection, one of the widely used methods for studying social, biological, and physical networks [18–21]. Communities are typically studied as dense subgraphs and can represent, for example, coordinating neurons in the brain [13] or a social clique [22] in a social network. (Hereafter, we restrict our usage of the term “clique” to the graph-theoretical meaning of a subgraph with all-to-all coupling.) Of particular interest is the detection of small-scale communities, which is a paradigmatic pursuit for anomaly detection within the fields of signal processing and cybersecurity [23–28]. In this context, small communities can represent anomalous events such as attacks [23], intrusions [24], and fraud [25].

Given these and many other applications, there is great interest in understanding fundamental limitations on community detection [11,26–36]. We highlight recent detectability results for multilayer [10,11,37] and temporal networks [29]. It is worth noting that much of the detectability research has focused on large-scale communities whose sizes are 𝒪(N), where N is the number of nodes in the network [29–35], and the phase transitions are typically driven by varying the prevalence (e.g., edge density) of the communities. In contrast, detectability phase transitions for small communities can also be induced by varying their size K [11,26–28] and are thus a type of resolution limit [36]. We note that the literatures on detectability and resolution limits have developed independently, and there is a need for a better understanding of the relationship between these topics. In particular, a planted clique in a single-layer Erdős-Rényi (ER) network is detectable via a spectral analysis only if its size K surpasses a detectability limit $K^{*} \propto O (\sqrt{N})$ [26], in which case, a dominant eigenvector (in this case, that corresponding to the second-largest eigenvalue of the adjacency matrix) localizes onto the clique. Extending previous research for the detectability of a clique planted in single-layer networks [26–28] and a clique that persists across all layers of a multilayer network [11], herein we study the detectability of small communities (including, but not limited to, cliques) planted in a subset of layers in a multilayer network.

With the application of detecting small communities in mind, we study the effects of layer aggregation as a network preprocessing step. We first ask a foundational question: Across how many layers must a community persist in order for layer aggregation to benefit detection? To this end, we study a multilayer network model in which small communities are hidden in network layers generated as ER networks with N nodes and L layers with (possibly) heterogeneous edge probabilities. We study detectability phase transitions wherein eigenvectors localize onto communities, which we analyze by developing random matrix theory for the eigenvectors of modularity matrices associated with an aggregation of the layers. When the aggregation is given by summation of the adjacency matrices, the detectability phase transition occurs when a community’s size K ≪ N surpasses a critical value $K^{*} \propto \sqrt{N L} / T$ , where T is the number of layers across which a community persists. Note that if T depends on L, then summation-based layer aggregation benefits small-community detection even if the fraction T/L of layers containing the community vanishes, provided that the fraction decays more slowly than 𝒪(L^−1/2).

We additionally study network preprocessing via thresholding; that is, we threshold a summation of layers’ adjacency matrices at some value L̃ so that there exists an unweighted edge between two nodes in the aggregated network if and only if there exist at least L̃ edges between them across the L layers. While it is well known that thresholding can be used to simultaneously sparsify and dichotomize a network, here we introduce thresholding as a nonlinear data filter [38] for enhancing small-community detection. Specifically, we find that thresholding can, in some cases, reduce K^* by orders of magnitude, revealing communities that are otherwise too small to detect. We call this phenomenon super-resolution community detection and show, for clique detection in sparse networks, that K^* decays exponentially with $T^{2} / L$ for threshold L̃ = T. Importantly, we find that different thresholds enhance the detection of communities with different properties (e.g., size and edge density), illustrating how community structure can be obscured if one uses a single threshold, which is an important insight for network preprocessing in general.

The remainder of this paper is organized as follows. In Sec. II, we further specify our model. In Sec. III, we study the effects of layer aggregation on detectability phase transitions characterized by eigenvector localization. In Sec. IV, we highlight implications of our findings with a numerical experiment involving small-community detection in a temporal network. We provide a discussion in Sec. V.

II. MODEL

A. Multilayer networks with planted small communities

We generate L network layers with N nodes so that each layer l ∈ {1,…, L} is an ER random graph with edge probability p_l ∈ (0, 1), which is allowed to vary across the layers. We plant R communities via the following process. For r ∈ {1,…, R}, uniformly at random, we select a set 𝒯_r ⊂ {1,…, L} of layers and a set 𝒦_r ⊂ 𝒱 = {1,…, N} of nodes, and we define an edge probability ρ_r. The variable K_r = |𝒦_r| ≪ N denotes the size of community r, and we refer to T_r = |𝒯 _r| as its persistence across network layers. Then, for each r, we construct a dense subgraph between nodes 𝒦_r in layers 𝒯_r by first removing edges between them occurring under the ER model and creating new edges with probability ρ_r. To ensure that the communities are denser than the remaining network, we assume ρ_r > 〈p_l〉, where 〈·〉 denotes the mean value across all layers. We allow self-edges in both the ER model and the planted communities. We note that the layers are not required to have a particular ordering, and the community is not restricted only to consecutive layers. Moreover, we restrict our study to nonoverlapping communities by assuming that the communities involve different nodes so that 𝒦_r ∩ 𝒦_s = ∅ for any r ≠ s. We leave open the study of eigenvector localization in the case of overlapping communities. Finally, we assume Σ_rK_r ≪ N so that only a small fraction of nodes are involved in communities, making them anomalous structures.
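As a minimal sketch of this construction (an illustration, not the authors' code), the following Python fragment generates a small multilayer network with a single planted community. The function name `planted_multilayer`, the edge-set representation, the 0-indexed nodes and layers, and all parameter values are assumptions made only for this example.

```python
import random

def planted_multilayer(n, layer_probs, community, planted_layers, rho, seed=0):
    """Return a list of edge sets, one per layer.

    Each layer is an ER graph (self-edges allowed) with its own edge
    probability; inside `planted_layers`, edges among `community` nodes
    are drawn with probability `rho` instead of the layer's probability.
    """
    rng = random.Random(seed)
    members = set(community)
    layers = []
    for l, p in enumerate(layer_probs):
        edges = set()
        for i in range(n):
            for j in range(i, n):  # j >= i, so self-edges are included
                inside = i in members and j in members and l in planted_layers
                if rng.random() < (rho if inside else p):
                    edges.add((i, j))
        layers.append(edges)
    return layers

# Hypothetical toy parameters: N = 60, L = 4, K = 5, T = 2, clique (rho = 1).
layers = planted_multilayer(
    n=60, layer_probs=[0.05, 0.08, 0.05, 0.1],
    community=[0, 1, 2, 3, 4], planted_layers={1, 3}, rho=1.0)
```

Drawing each within-community edge once with probability ρ_r is equivalent to removing the ER edges and redrawing them, which is why the sketch uses a single Bernoulli draw per node pair.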

B. Layer-aggregation methods

Layer aggregation is a preprocessing step for multilayer networks that can be used to reduce data size and/or as a data filter to benefit network-analysis outcomes such as community detection. Following the approach in Ref. [10], we study two methods for aggregating layers of a multilayer network:

The summation network corresponds to the weighted adjacency matrix Ā = Σ_lA⁽^l⁾, where A⁽^l⁾ denotes the symmetric adjacency matrix encoding each network layer l ∈ {1,…, L}.
The family of thresholded networks represented by unweighted adjacency matrices {Â⁽^L̃⁾} are obtained by applying a threshold L̃ ∈ {1,…, L} to the entries {Ā_ij} of matrix Ā,
${\hat{A}}_{i j}^{(\tilde{L})} = \begin{cases} 1, & \text{if } {\bar{A}}_{i j} \geq \tilde{L}, \\ 0, & \text{otherwise.} \end{cases}$ (1)

Note that thresholding dichotomizes the network, and one can vary L̃ to tunably sparsify the network.
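The two aggregation methods can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the list-of-lists matrix representation, and the three toy layers are assumptions introduced here.

```python
def summation_network(layers):
    """Entrywise sum of the layers' adjacency matrices (lists of lists)."""
    n = len(layers[0])
    return [[sum(A[i][j] for A in layers) for j in range(n)] for i in range(n)]

def thresholded_network(layers, threshold):
    """Eq. (1): unweighted edge iff the summed weight is >= threshold."""
    A_bar = summation_network(layers)
    return [[1 if a >= threshold else 0 for a in row] for row in A_bar]

# Three hypothetical 3-node layers (symmetric 0/1 matrices).
layers = [
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    [[0, 1, 1], [1, 0, 0], [1, 0, 0]],
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
]
A_bar = summation_network(layers)          # [[0, 3, 1], [3, 0, 1], [1, 1, 0]]
A_hat_2 = thresholded_network(layers, 2)   # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

In this toy case only the edge between nodes 0 and 1, which appears in all three layers, survives the threshold L̃ = 2.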

III. DETECTABILITY OF SMALL COMMUNITIES WITH EIGENVECTOR LOCALIZATION

We now develop random matrix theory to analyze how layer aggregation affects small-community detection. In Sec. III A, we present results for aggregation by summation, studying the fraction of layers that must contain a community in order for layer aggregation to enhance detection. In Sec. III B, we present results for layer aggregation with thresholding, highlighting that certain threshold values can yield super-resolution community detection.

A. Layer aggregation via summation

1. Random matrix theory for modularity matrices

We first describe the statistical properties of matrix entries {Ā_ij}. For edges (i, j) ∉ ∪_r{𝒦_r × 𝒦_r}, {Ā_ij} are independent and identically distributed (i.i.d.) random variables following a Poisson binomial distribution, P(Ā_ij = a) = f_PB(a; L, {p_l}), where

f_{PB} (a; L, \{p_{l}\}) = \sum_{S \in \mathcal{S}_{a}} \prod_{l \in S} p_{l} \prod_{m \in \{1, \dots, L\} \setminus S} (1 - p_{m}),

(2)

and 𝒮_a denotes the set of $\binom{L}{a}$ different subsets of layers {1,…, L} that have cardinality a (i.e., 𝒮₁ ={{1},{2},…}, 𝒮₂ = {{1, 2}, {1, 3},…}, and so on). We note that f_PB(a; L, {p_l}) has mean L〈p_l〉 and variance L〈p_l(1 − p_l)〉. When the edge probability is identical across the layers (i.e., p_l = p), then Eq. (2) simplifies to the binomial distribution,

f (a; L, p) = \binom{L}{a} p^{a} {(1 - p)}^{L - a},

(3)

with mean Lp and variance Lp(1 − p).
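Evaluating Eq. (2) by enumerating subsets is expensive for large L. One common alternative, sketched below, builds the same distribution by adding one layer at a time. The function name and the probabilities are illustrative assumptions; the sketch also checks the stated mean and variance numerically and computes the binomial special case of Eq. (3) for comparison.

```python
from math import comb

def poisson_binomial_pmf(probs):
    """Return [P(sum = a) for a = 0..L] for independent Bernoulli(p_l) trials."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for a, mass in enumerate(pmf):
            nxt[a] += mass * (1.0 - p)   # this layer has no edge
            nxt[a + 1] += mass * p       # this layer has an edge
        pmf = nxt
    return pmf

probs = [0.01, 0.02, 0.015, 0.005]       # hypothetical {p_l}, L = 4
pmf = poisson_binomial_pmf(probs)
mean = sum(a * f for a, f in enumerate(pmf))
var = sum((a - mean) ** 2 * f for a, f in enumerate(pmf))

# Identical probabilities: the result should match the binomial pmf, Eq. (3).
binomial_case = poisson_binomial_pmf([0.3] * 5)
expected = [comb(5, a) * 0.3**a * 0.7 ** (5 - a) for a in range(6)]
```

Here `mean` and `var` should agree with L〈p_l〉 = Σ_l p_l and L〈p_l(1 − p_l)〉 = Σ_l p_l(1 − p_l) up to floating-point error.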

For within-community edges (i, j) ∈ {𝒦_r × 𝒦_r} associated with community r, the entries {Ā_ij} are i.i.d. random variables following $f_{PB} (a; L, {q_{l}^{(r)}})$ , where $q_{l}^{(r)} = ρ_{r}$ for l ∈ 𝒯_r and otherwise $q_{l}^{(r)} = p_{l}$ . It follows that the entries have mean T_rρ_r +Σ_{l∈{1,…, L}\𝒯 _r}p_l and variance T_rρ_r(1−ρ_r)+Σ_{l∈{1,…, L}\𝒯 _r}p_l(1−p_l). Because the layers 𝒯_r are selected uniformly at random, the expected mean and variance across all possible choices for 𝒯_r are given by T_rρ_r + (L − T_r)〈p_l〉 and T_rρ_r(1 − ρ_r) + (L − T_r)〈p_l(1 − p_l)〉, respectively.

We now study the spectra of the modularity matrix [39],

\bar{B} = \bar{A} - L 〈 p_{l} 〉 1 1^{T},

(4)

based on an ER null model in which each edge has expected weight L〈p_l〉. Importantly, this null model does not use knowledge that edges (i, j) between nodes i, j ∈ 𝒦_r have different expected edge probability [i.e., T_rρ_r + (L − T_r)〈p_l〉 vs L〈p_l〉], which respects our assumption that it is unknown which nodes are in the hidden community. We note that one could also define the ER null model with the observed mean edge probability $L 〈 p_{l} 〉 + \sum_{r} [(K_{r}^{2} T_{r}) / N^{2} L] (ρ_{r} - 〈 p_{l} 〉)$ to account for the slight increase in overall edge probability due to the presence of small communities. However, this change does not affect the position of the dominant eigenvalues relative to the bulk, which is the relevant issue for community detectability, as we will see below. In particular, since $∣ (K_{r}^{2} T_{r}) / N^{2} L ∣ ≪ 1$ for each r, even the shift of the single associated eigenvalue within the bulk is negligible; therefore, we focus on the null model with expected edge weight L〈p_l〉.

We develop random matrix theory based on the analysis in Refs. [27,40]. To this end, we note that B̄ can be written in the form

\bar{B} = 〈 \bar{B} 〉 + X,

(5)

where

〈 \bar{B} 〉 = \sum_{r} θ_{r} u^{(r)} {(u^{(r)})}^{T}

(6)

is a rank-R matrix with eigenvalues given by

θ_{r} = T_{r} K_{r} (ρ_{r} - 〈 p_{l} 〉),

(7)

and {u⁽^r⁾} are normalized indicator vectors for the R communities that have entries

u_{i}^{(r)} = \begin{cases} \sqrt{1 / K_{r}}, & i \in \mathcal{K}_{r} \\ 0, & \text{otherwise.} \end{cases}

(8)

The random matrix X has zero-mean entries X_ij with variance T_rρ_r(1−ρ_r)+(L–T_r)〈p_l(1–p_l)〉 if (i,j)∈𝒦_r×𝒦_r, and L〈p_l(1 − p_l)〉 otherwise. In the N → ∞ limit, and assuming the sizes {K_r} grow more slowly than N, the $\sum_{r} K_{r}^{2} ≪ N^{2}$ matrix entries corresponding to communities become negligible and X converges to a Wigner matrix [41]. This allows us to use known results for the limiting dominant eigenvector of low-rank perturbations of Wigner matrices with variance 1/N. Specifically, we define $γ = 1 / \sqrt{N L 〈 p_{l} (1 - p_{l}) 〉}$ so that the matrix γX has entries with variance 1/N in the limit. We similarly define

{\bar{θ}}_{r} = γ θ_{r} = \frac{T_{r} K_{r}}{\sqrt{N L}} \frac{ρ_{r} - 〈 p_{l} 〉}{\sqrt{〈 p_{l} (1 - p_{l}) 〉}}

(9)

so that γB̄ = Σ_rθ̄_ru⁽^r⁾(u⁽^r⁾)^T + γX. It follows that the limiting N → ∞ dominant eigenvectors {v⁽^r⁾} of γB̄ (and of B̄ since scalar multiplication does not affect eigenvectors) satisfy [40,42]

{∣ 〈 v^{(r)}, u^{(r)} 〉 ∣}^{2} = \begin{cases} 1 - 1 / {\bar{θ}}_{r}^{2}, & {\bar{θ}}_{r} > 1 \\ 0, & \text{otherwise.} \end{cases}

(10)

Note we assume that the dominant eigenvectors have been suitably enumerated so that v⁽^r⁾ corresponds to the eigenvector localizing on community r. The value θ̄_r = 1 identifies critical points at which there is a phase transition in eigenvector localization and detectability for community r, and this gives the critical community size

K_{r}^{*} = \sqrt{T_{r}^{- 2} N L} \frac{\sqrt{〈 p_{l} (1 - p_{l}) 〉}}{ρ_{r} - 〈 p_{l} 〉} .

(11)

In other words, a small community can be detected using a dominant eigenvector v⁽^r⁾ of B̄ only when $K_{r} > K_{r}^{*}$ . We note that setting L = T_r = 1, ρ_r = 1, and p_l = p in Eq. (11) recovers $K_{r}^{*} = \sqrt{N p / (1 - p)}$ , which describes the detectability transition for a single planted clique in a single-layer network [26].
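Equations (10) and (11) are simple enough to evaluate directly. The sketch below does so in Python; the function names are hypothetical, and the example assumes identical layer probabilities p_l = p = 0.01 with N = 10⁴, L = 16, T_r = 2, and ρ_r = 1, which are illustrative values.

```python
from math import sqrt

def k_star_summation(N, L, T, rho, p_mean, p_var_mean):
    """Eq. (11): critical community size for the summation network.

    p_mean is <p_l>; p_var_mean is <p_l (1 - p_l)>.
    """
    return sqrt(N * L) / T * sqrt(p_var_mean) / (rho - p_mean)

def overlap_summation(K, N, L, T, rho, p_mean, p_var_mean):
    """Eq. (10): predicted |<v, u>|^2 for a community of size K."""
    # theta_bar in Eq. (9) equals K / K*, since K* solves theta_bar = 1.
    theta_bar = K / k_star_summation(N, L, T, rho, p_mean, p_var_mean)
    return 1.0 - 1.0 / theta_bar**2 if theta_bar > 1.0 else 0.0

p = 0.01
params = dict(N=10**4, L=16, T=2, rho=1.0, p_mean=p, p_var_mean=p * (1 - p))
k_star = k_star_summation(**params)                      # roughly 20 nodes
overlaps = [overlap_summation(K, **params) for K in (6, 26, 86)]
```

With these values the critical size is about 20 nodes, so K = 6 lies below the transition (zero overlap), whereas K = 26 and K = 86 lie above it.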

We highlight two important consequences of Eq. (11). First, if the community persists across some fixed fraction of the layers, T_r(L) = cL, then $K_{r}^{*} \propto \sqrt{N / L}$ ; therefore, if N, p, and T_r/L are held fixed and L increases, then $K_{r}^{*}$ vanishes with scaling 𝒪(L^−1/2). This square-root scaling behavior is similar to that obtained for detection in layer aggregation of large-scale communities that persist across all layers [10]. Second, for fixed N and p, a community of fixed size K_r and persistence T_r will become impossible to detect as L increases because $K_{r}^{*}$ increases with scaling 𝒪(L^1/2). This result highlights the importance of knowing which layers potentially contain the community since the aggregation of layers lacking the community can severely inhibit its detection.

Digging further, one can let T_r vary with L and then ask how $K_{r}^{*}$ depends on the scaling behavior for T_r. For T_r ∝ L^β, Eq. (11) implies $K_{r}^{*} \propto L^{1 / 2 - β}$ so that as L → ∞,

K_{r}^{*} \to \begin{cases} 0, & β > 1 / 2 \\ \infty, & β < 1 / 2. \end{cases}

(12)

In other words, T_r, the number of layers containing the community, must increase with L at least as 𝒪(L^1/2); otherwise, summation-based layer aggregation will inhibit (rather than promote) small-community detection. Note that T_r ∝ L^1/2 is a critical case in which $K_{r}^{*}$ is independent of L. We highlight that Eq. (12) is somewhat surprising since summation-based aggregation benefits detection even if the fraction T_r/L of layers containing the community vanishes with L, provided that it decays more slowly than 𝒪(L^−1/2).
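For concreteness, the scaling behind Eq. (12) can be written out by substituting T_r = cL^β into Eq. (11). The constant c is introduced here only for illustration and is assumed not to depend on L.

```latex
K_{r}^{*}
  = \frac{\sqrt{N L}}{T_{r}}
    \frac{\sqrt{\langle p_{l}(1 - p_{l}) \rangle}}{\rho_{r} - \langle p_{l} \rangle}
  = \frac{\sqrt{N}}{c}
    \frac{\sqrt{\langle p_{l}(1 - p_{l}) \rangle}}{\rho_{r} - \langle p_{l} \rangle}
    \, L^{1/2 - \beta},
\qquad
\frac{T_{r}}{L} = c \, L^{\beta - 1}.
```

The exponent 1/2 − β is negative exactly when β > 1/2. In that regime the fraction T_r/L ∝ L^{β−1} may still vanish (for β < 1), but it does so more slowly than L^{−1/2}.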

2. Numerical validation and scaling behavior

We support Eqs. (10) and (11) in Fig. 2, using numerical experiments with N = 10⁴ nodes and edge probabilities {p_l} drawn from a Gaussian distribution with mean p = 0.01 and standard deviation σ_p = 0.001. We focus on the case of clique detection (i.e., ρ = 1), hiding the clique in T = 2 of the L = 16 layers. In Fig. 2(a), we plot the entries { $v_{i}^{(r)}$ } (symbols) of the dominant eigenvector of the modularity matrix for the summation network as well as the entries { $u_{i}^{(r)}$ } for the indicator vector, which are nonzero only for nodes i ∈ 𝒦 involved in the clique. We show results for community sizes K_r ∈ {6, 26, 86}, which respectively place the system below, just above, and well above the phase transition. The illustration highlights that as K increases, vector v⁽^r⁾ aligns with u⁽^r⁾. We quantify this localization phenomenon by plotting in Fig. 2(b) observed (symbols) and predicted values of |〈v, u〉|² given by Eq. (10) (curve). Note that the values of |〈v⁽^r⁾, u⁽^r⁾〉|² depict a phase transition that occurs at a critical subgraph size $K_{r}^{*}$ given by Eq. (11): |〈v⁽^r⁾, u⁽^r⁾〉|² > 0 when $K_{r} > K_{r}^{*}$ , whereas |〈v, u〉|² = 0 when $K_{r} \leq K_{r}^{*}$ . This phase transition in eigenvector localization drives a phase transition for community detection based on v⁽^r⁾. Arrows indicate the values of K_r used in panel (a).

In Fig. 3(a), we compare observed (symbols) and predicted values of |〈v, u〉|² given by Eq. (10) (curves) for varying K_r with T_r ∈ {1, 2, 4, 8}. Open symbols indicate the parameters used in Fig. 2, whereas filled symbols indicate the mean value of |〈v, u〉|² for 10 trials in which the layers’ edge probabilities {p_l} are drawn uniformly from [0, 0.02]. Note that as T_r increases, the curves shift to the left, illustrating that as the community persists across more layers, the localization phenomenon is stronger and the hidden community is easier to detect. In Fig. 3(b), we study the dependence of $K_{r}^{*}$ on the number of layers, L, and we compare the effect of keeping T_r fixed vs allowing T_r to grow with L. Specifically, we set either T_r(L) = 20 or T_r(L) = L, and we plot the value of $K_{r}^{*}$ given by Eq. (11). Note that if the community persists across a fraction of the layers—that is, T_r(L) = cL for some constant c—then $K_{r}^{*}$ vanishes with scaling 𝒪(L^−1/2). However, if T_r is held fixed, then $K_{r}^{*}$ increases with scaling 𝒪(L^1/2).

In summary, these experiments illustrate how layer aggregation through summation can enhance small-community detection if the community persists across sufficiently many layers, but it can obscure detection if the community is present in too few layers. We will see in the next section that thresholding the summation can help overcome this problem, potentially reducing the detectability limit by orders of magnitude to yield super-resolution community detection.

B. Thresholding as a nonlinear data filter

1. Random matrix theory for modularity matrices

We now study layer aggregation with thresholding as a filter that enhances small-community detection. We begin by solving for effective edge probabilities for the thresholding process [10]. Thresholding the summation Σ_lA⁽^l⁾ at L̃ yields a binary adjacency matrix Â⁽^L̃⁾ with entries ${\hat{A}}_{i j}^{(\tilde{L})} \in \{0, 1\}$ indicating whether or not Ā_ij ≥ L̃. For edges (i, j) ∉ ∪_r{𝒦_r × 𝒦_r}, Ā_ij follows a Poisson binomial distribution f_PB(a; L, {p_l}) given by Eq. (2), and the inequality is satisfied with probability

{\hat{p}}^{(\tilde{L})} = P [{\bar{A}}_{i j} \geq \tilde{L}] = 1 - F_{PB} (\tilde{L} - 1, L, \{p_{l}\}),

(13)

where F_PB(a, L, {p_l}) is the associated cumulative distribution function (CDF). For edges (i, j) ∈ {𝒦_r × 𝒦_r}, Ā_ij follows a Poisson binomial distribution $f_{PB} (a; L, {q_{l}^{(r)}})$ given by Eq. (2), and the inequality is satisfied with probability

{\hat{ρ}}_{r}^{(\tilde{L})} = P [{\bar{A}}_{i j} \geq \tilde{L}] = 1 - F_{PB} (\tilde{L} - 1, L, \{q_{l}^{(r)}\}),

(14)

where $q_{l}^{(r)} = ρ_{r}$ for l ∈ 𝒯_r and otherwise $q_{l}^{(r)} = p_{l}$ . In the case of a clique (i.e., ρ_r = 1), Eq. (14) can be written as

{\hat{ρ}}_{r}^{(\tilde{L})} = 1 - F_{PB} (\tilde{L} - T_{r} - 1, L - T_{r}, \{p_{l}\}_{l \notin \mathcal{T}_{r}}) .

(15)
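The effective edge probabilities in Eqs. (13)–(15) can be computed numerically from the Poisson binomial CDF. The following is a minimal sketch under stated assumptions: the function names are hypothetical, layers are 0-indexed, and the example uses identical probabilities p_l = 0.01 with L = 16 and a clique planted in T_r = 5 layers.

```python
def poisson_binomial_cdf(a, probs):
    """P(sum <= a) for independent Bernoulli(p) trials; 0 if a < 0."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return sum(pmf[: a + 1]) if a >= 0 else 0.0

def effective_probabilities(threshold, layer_probs, planted_layers, rho):
    """Return (p_hat, rho_hat) from Eqs. (13) and (14)."""
    # q_l = rho in the planted layers and p_l elsewhere.
    q = [rho if l in planted_layers else p for l, p in enumerate(layer_probs)]
    p_hat = 1.0 - poisson_binomial_cdf(threshold - 1, layer_probs)
    rho_hat = 1.0 - poisson_binomial_cdf(threshold - 1, q)
    return p_hat, rho_hat

layer_probs = [0.01] * 16            # L = 16 layers, identical p_l for simplicity
planted = set(range(5))              # T_r = 5 (layers 0..4 in this sketch)
p_hat, rho_hat = effective_probabilities(5, layer_probs, planted, rho=1.0)
```

For a clique, every within-community pair already has weight T_r, so `rho_hat` equals one whenever the threshold does not exceed T_r, while `p_hat` is the (small) probability that a background pair reaches the threshold by chance.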

Given the effective edge probabilities for the network and a community (i.e., p̂⁽^L̃⁾ and ${\hat{ρ}}_{r}^{(\tilde{L})}$ , respectively), it is straightforward to study the detectability limits of a community for thresholded networks using Eqs. (10) and (11). In particular, we treat the thresholded network as a single layer, substituting L = T_r = 1 together with 〈p_l〉 ↦ p̂⁽^L̃⁾ and ρ_r ↦ ${\hat{ρ}}_{r}^{(\tilde{L})}$ , to obtain

{∣ 〈 {\hat{v}}^{(r)}, u^{(r)} 〉 ∣}^{2} = \begin{cases} 1 - 1 / {\hat{θ}}_{r}^{2}, & {\hat{θ}}_{r} > 1 \\ 0, & \text{otherwise,} \end{cases}

(16)

where v̂⁽^r⁾ is a dominant eigenvector of modularity matrix

\hat{B} = {\hat{A}}^{(\tilde{L})} - {\hat{p}}^{(\tilde{L})} 1 1^{T}

(17)

and ${\hat{θ}}_{r} = K_{r} ({\hat{ρ}}_{r}^{(\tilde{L})} - {\hat{p}}^{(\tilde{L})}) / \sqrt{N {\hat{p}}^{(\tilde{L})} (1 - {\hat{p}}^{(\tilde{L})})}$ . Setting θ̂_r = 1 gives a detectability limit for each community r in terms of the effective edge probabilities p̂⁽^L̃⁾ and ${\hat{ρ}}_{r}^{(\tilde{L})}$ ,

{\hat{K}}_{r}^{*} = \frac{\sqrt{N {\hat{p}}^{(\tilde{L})} (1 - {\hat{p}}^{(\tilde{L})})}}{{\hat{ρ}}_{r}^{(\tilde{L})} - {\hat{p}}^{(\tilde{L})}} .

(18)

Equations (16)–(18) illustrate that the detectability limits for thresholded networks depend only on the effective edge probabilities; however, these depend sensitively on the choice of threshold L̃.
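To see how strongly the threshold can change the detectability limit, one can evaluate Eq. (18) next to Eq. (11). The sketch below does this for a clique (ρ_r = 1) under the simplifying assumption of identical layer probabilities p_l = p, which ignores any spread in {p_l}. The function names and the chosen values N = 10⁴, L = 16, T_r = 10, p = 0.01 are illustrative.

```python
from math import comb, sqrt

def binomial_tail(threshold, L, p):
    """P(Binomial(L, p) >= threshold), summed directly for accuracy."""
    return sum(comb(L, a) * p**a * (1 - p) ** (L - a)
               for a in range(max(threshold, 0), L + 1))

def k_hat_star(N, L, T, p, threshold):
    """Eq. (18) for a clique (rho = 1) with identical layer probabilities p."""
    p_hat = binomial_tail(threshold, L, p)              # Eq. (13)
    rho_hat = binomial_tail(threshold - T, L - T, p)    # Eq. (15)
    return sqrt(N * p_hat * (1 - p_hat)) / (rho_hat - p_hat)

def k_star(N, L, T, p):
    """Eq. (11) for a clique with identical layer probabilities p."""
    return sqrt(N * L) / T * sqrt(p * (1 - p)) / (1 - p)

N, L, T, p = 10**4, 16, 10, 0.01
ratio = k_hat_star(N, L, T, p, threshold=T) / k_star(N, L, T, p)
```

Summing the upper tail directly, rather than computing one minus the CDF, avoids losing the very small value of p̂ to floating-point cancellation. For these parameters `ratio` comes out many orders of magnitude below one.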

Importantly, ${\hat{K}}_{r}^{*}$ given by Eq. (18) can potentially be orders of magnitude smaller than $K_{r}^{*}$ given by Eq. (11), a phenomenon we call super-resolution detection. In addition to numerical experiments that will follow below, we further study this phenomenon by comparing ${\hat{K}}_{r}^{*}$ and $K_{r}^{*}$ for network parameters wherein we can obtain deeper insight. We consider clique detection (i.e., ρ_r = 1) in a sparse network (i.e., p_l ≪ 1) and focus on the threshold value L̃ = T_r to obtain

{\hat{K}}_{r}^{*} \approx \sqrt{N} \sqrt{{\hat{p}}^{(T_{r})}} .

(19)

Using these assumptions also in Eqs. (13) and (15), we find the effective edge probabilities p̂^(T_r)=1–F_PB(T_r–1, L,{p_l}) and ${\hat{ρ}}_{r}^{(T_{r})} = 1$ . Furthermore, we apply Hoeffding’s inequality [43] to obtain p̂^(T_r) ≤ e^{−2L(〈p_l〉−T_r/L)^2}. Noting 0 < 〈p_l〉 ≪ T_r/L, we find the 〈p_l〉 → 0 limiting bound

{\hat{p}}^{(T_{r})} \leq e^{- 2 T_{r}^{2} / L},

(20)

illustrating that p̂^(T_r) and ${\hat{K}}_{r}^{*}$ decay exponentially with $T_{r}^{2} / L$ . On the other hand, we use the sparsity assumption in Eq. (11) to obtain

K_{r}^{*} \approx \frac{\sqrt{N L 〈 p_{l} 〉}}{\sqrt{T_{r}^{2}}} .

(21)

Thus, in this case, $K_{r}^{*}$ decays as $O (1 / \sqrt{T_{r}^{2} / L})$ , whereas ${\hat{K}}_{r}^{*}$ decays exponentially (i.e., considerably faster) with $T_{r}^{2} / L$ .
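As a quick numerical sanity check of the Hoeffding step, the sketch below compares the exact tail probability p̂^(T_r) with the bound e^{−2L(T_r/L − p)²} for a few sparse values of p, assuming identical layer probabilities. The values L = 16, T_r = 5, and the list of p are illustrative assumptions.

```python
from math import comb, exp

def binomial_tail(threshold, L, p):
    """P(Binomial(L, p) >= threshold)."""
    return sum(comb(L, a) * p**a * (1 - p) ** (L - a)
               for a in range(threshold, L + 1))

L, T = 16, 5
rows = []
for p in (0.001, 0.01, 0.05):
    p_hat = binomial_tail(T, L, p)                 # exact tail, Eq. (13)
    hoeffding = exp(-2 * L * (T / L - p) ** 2)     # bound before the p -> 0 limit
    rows.append((p, p_hat, hoeffding))

limit_bound = exp(-2 * T**2 / L)                   # Eq. (20)
```

In these sparse cases the exact tail sits well below the Hoeffding bound, so the bound is loose. Its role in the argument is to expose the exponential dependence on T_r²/L rather than to give a sharp estimate.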

2. Numerical validation and super-resolution detection

We now support Eqs. (13)–(18) with numerical experiments and illustrate that certain thresholds lead to super-resolution community detection. We consider the detection of a dense subgraph that is hidden in both (a) a dense network with 〈p_l〉 = 0.5 and (b) a sparse network with 〈p_l〉 = 0.01. Both networks were constructed with N = 10⁴, σ_p = 0.001, ρ_r = 1, L = 16, and T_r = 5.

In Fig. 4, we compare observed (symbols) and predicted values (curves) of the effective edge probabilities p̂⁽^L̃⁾ given by Eq. (13) and ${\hat{ρ}}_{r}^{(\tilde{L})}$ given by Eq. (14) as a function of the threshold L̃. Note in both panels that the effective edge probability p̂ ⁽^L̃⁾ of the background network always decays with increasing L̃. In contrast, the effective edge probability between nodes in the community depends on whether or not $\tilde{L} > T_{r}$ : ${\hat{ρ}}_{r}^{(\tilde{L})} = 1$ when L̃ ≤ T_r since ρ_r = 1, whereas ${\hat{ρ}}_{r}^{(\tilde{L})}$ decays with increasing L̃ for L̃ > T_r. Importantly, the rate of decay depends on the network’s mean edge density 〈p_l〉: ρ̂ ⁽^L̃⁾ slowly decreases for the dense network, whereas it abruptly drops for the sparse network.

In Fig. 5, we plot observed (symbols) and predicted values (curves) for |〈v⁽^r⁾, u⁽^r⁾〉|² given by Eq. (16) vs K for different choices of L̃. The parameters used are identical to those of Fig. 4, and panels (a) and (b) again depict results for 〈p_l〉 = 0.5 and 〈p_l〉 = 0.01, respectively. We highlight several important observations. First, note in both panels that L̃ = T_r = 5 yields better detectability than L̃ = 1. However, when L̃ > T_r, we find contrasting results for sparse and dense networks. For the sparse network shown in Fig. 5(b), the hidden community becomes harder to detect when L̃ > T_r (see curve for L̃ = 16), which intuitively occurs because ${\hat{ρ}}_{r}^{(\tilde{L})}$ rapidly decays and the thresholded networks will no longer contain a dense subgraph. On the other hand, for the dense network depicted in Fig. 5(a), increasing L̃ can improve detectability when L̃ > T_r (see curve for L̃ = 10).

We now present an experiment highlighting the occurrence of super-resolution community detection for certain threshold values. In Fig. 6, we study the dependence of the critical community size ${\hat{K}}_{r}^{*}$ on the threshold L̃. We plot ${\hat{K}}_{r}^{*}$ given by Eq. (18) as a function of L̃ for p ∈ {0.01, 0.05, 0.2, 0.5}, N = 10⁴, ρ = 1, σ_p = 0.001, L = 16, and either (a) T_r = 5 or (b) T_r = 10. Note that for the sparsest network, i.e., p = 0.01, the minimum value of ${\hat{K}}_{r}^{*}$ occurs when L̃ = T_r (vertical dashed line). Interestingly, as the mean edge density p = 〈p_l〉 increases, the threshold L̃ at which ${\hat{K}}_{r}^{*}$ attains its minimum value shifts from L̃ = T_r towards L̃ = L. The horizontal lines on the right edge of the panels indicate $K_{r}^{*}$ given by Eq. (11) for the summation network.

Importantly, note that for a wide range of parameters, ${\hat{K}}_{r}^{*}$ for the thresholded networks is significantly smaller than $K_{r}^{*}$ for the corresponding summation networks. In particular, one can observe for p = 0.01 and L̃/L = T_r/L in Fig. 6(b) that ${\hat{K}}_{r}^{*}$ is many orders of magnitude smaller than $K_{r}^{*}$ [ $O (10^{- 6})$ times here]. In other words, thresholding the summation can dramatically improve detectability as compared to summation without thresholding. This surprising result contrasts with our previous findings for the detectability of large communities that persist across all layers [10], where it was found that thresholding always inhibited detection (although optimal thresholds were found to minimize inhibition).

IV. SMALL-COMMUNITY DETECTION IN TIME-VARYING NETWORKS

We now present an experiment involving small-community detection in time-varying networks to highlight several practical insights following from our theoretical results. Note that unlike Sec. III, where there were no restrictions on which layers a community persists, we now assume that each community persists across consecutive layers. We conducted experiments for a synthetic temporal network with N = 10⁴ nodes and L = 32 time layers, each of which is drawn from an ER network with edge probability p_l, which we drew from a Gaussian distribution with mean p = 0.01 and standard deviation σ_p = 0.001. We then planted R = 4 communities, each involving K_r = K = 8 nodes, in the following sets of layers: 𝒯₁ = {3, 4, 5} for community 1, 𝒯₂ = {7,…, 15} for community 2, 𝒯₃ = {18,…, 22} for community 3, and 𝒯₄ = {24,…, 30} for community 4. In Fig. 7(a), we provide a representative illustration of the temporal network, where we indicate in which layers the communities are present. We also illustrate by the shaded region an example time window, or bin, 𝒲_w(t) = {t − (w − 1)/2,…, t + (w − 1)/2} for t ∈ {(w − 1)/2, L − (w − 1)/2}, that contains layers to be aggregated.

We first consider aggregation by summation. In Fig. 7(b), we illustrate by color the values |〈v⁽^r⁾, u⁽^r⁾〉|² for the aggregation of layers across bins 𝒲_w(t). In particular, we show Eq. (10) under the variable substitutions T_r(𝒲_w(t)) ↦ T and w ↦ L, where T_r(𝒲_w(t)) = |𝒲_w(t) ∩ 𝒯 _r| is the number of layers in which community r is present in bin 𝒲_w(t). We show results for several bin widths w ∈ {1, 3, 5, 7, 9}. The green arrows indicate, for each r, the bin location and w value at which |〈v⁽^r⁾, u⁽^r⁾〉|² obtains its maximum. As expected, |〈v⁽^r⁾, u⁽^r⁾〉|² obtains its maximum for each community r when the bin 𝒲_w(t) is exactly the set of layers in which community r is present, 𝒲_w(t) = 𝒯 _r (i.e., when T_r = w).
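The windowed version of Eq. (10) can be sketched as follows. This is an illustration only: the function names are hypothetical, layer probabilities are taken as identical (p_l = p = 0.01), and ρ_r = 1 is assumed because ρ_r is not stated for this experiment, so the numbers need not match Fig. 7(b).

```python
from math import sqrt

def window(t, w):
    """Layers in the bin W_w(t) of odd width w centered on layer t."""
    half = (w - 1) // 2
    return set(range(t - half, t + half + 1))

def overlap_in_window(t, w, community_layers, K, N, p, rho=1.0):
    """Eq. (10) with T -> |W_w(t) ∩ T_r| and L -> w, for identical p_l = p."""
    T = len(window(t, w) & community_layers)
    if T == 0:
        return 0.0
    theta_bar = T * K * (rho - p) / sqrt(N * w * p * (1 - p))
    return 1.0 - 1.0 / theta_bar**2 if theta_bar > 1.0 else 0.0

# Layer sets of the four planted communities (1-indexed, as in the text).
communities = {1: set(range(3, 6)), 2: set(range(7, 16)),
               3: set(range(18, 23)), 4: set(range(24, 31))}
common = dict(community_layers=communities[1], K=8, N=10**4, p=0.01)
single_layer = overlap_in_window(t=4, w=1, **common)   # one layer in isolation
matched_bin = overlap_in_window(t=4, w=3, **common)    # bin equals T_1 = {3, 4, 5}
wide_bin = overlap_in_window(t=4, w=9, **common)       # bin much wider than T_1
```

Under these assumptions community 1 is undetectable from a single layer and from the wide bin, but it has a positive predicted overlap when the bin coincides with its set of layers, in line with the qualitative behavior described above.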

Before studying aggregation by summation and thresholding, we first make several important observations using Fig. 7. First, note that for w = 1 in panel (b), no communities are detectable. In other words, all communities are undetectable if the layers are studied in isolation. However, they can be detected if the layers are binned into time windows. Second, because the optimal bin size w is unique to every community (i.e., because they have different persistence T_r ∈ [3, 9]), there is no bin size that is best for all communities. In fact, detectability requires $K_{r} > K_{r}^{*}$ given by Eq. (11), which requires that, for each community, w is not too large or too small. For example, community 1 is only detectable when w = 3, and community 3 is only detectable when w ∈ [3, 7].

One final important observation for Fig. 7(b) is that even when communities are detectable, the values |〈v⁽^r⁾, u⁽^r⁾〉|² are not very large—specifically, |〈v⁽^r⁾, u⁽^r⁾〉|² ≤ 0.7 in all cases. This can be problematic since detection error rates increase as |〈v⁽^r⁾, u⁽^r⁾〉|² decreases, approaching 100% error as |〈v⁽^r⁾, u⁽^r⁾〉|² → 0. (See Ref. [27] for an analysis of error rates based on a hypothesis-testing framework for clique detection in single-layer networks.) Because |〈v⁽^r⁾, u⁽^r⁾〉|² remains small for community 1 for all choices of w, it effectively remains undetectable by summation-based layer aggregation.
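The quantity |〈v⁽^r⁾, u⁽^r⁾〉|² can also be measured numerically on a toy example. The sketch below sums a few ER layers with a planted clique and compares the dominant eigenvector of a modularity-type matrix with the normalized community indicator vector. It is an illustration under stated assumptions rather than the procedure behind Fig. 7: the matrix B = A − p̂(11ᵀ − I), with p̂ the empirical mean edge weight, is our assumed null model; the sizes (200 nodes, a 20-node clique, 3 layers) are hypothetical and far smaller than in the experiment; and all function names are ours.

```python
import numpy as np

def er_layer(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdős–Rényi graph without self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def plant_clique(A, nodes):
    """Return a copy of A in which `nodes` are fully connected."""
    A = A.copy()
    A[np.ix_(nodes, nodes)] = 1.0
    A[nodes, nodes] = 0.0  # keep the diagonal empty
    return A

def localization(A_sum, nodes):
    """|<v, u>|^2 for the dominant eigenvector v of B and community indicator u."""
    n = A_sum.shape[0]
    p_hat = A_sum.sum() / (n * (n - 1))            # empirical mean edge weight
    # Assumed null model: subtract the mean weight from every off-diagonal entry.
    B = A_sum - p_hat * (np.ones((n, n)) - np.eye(n))
    _, vecs = np.linalg.eigh(B)                    # eigenvalues in ascending order
    v = vecs[:, -1]                                # dominant eigenvector (unit norm)
    u = np.zeros(n)
    u[nodes] = 1.0 / np.sqrt(len(nodes))           # normalized indicator vector
    return float(np.dot(v, u) ** 2)

rng = np.random.default_rng(0)
n, p, w = 200, 0.05, 3                             # toy sizes (hypothetical)
nodes = np.arange(20)                              # planted community
layers = [plant_clique(er_layer(n, p, rng), nodes) for _ in range(w)]
print(localization(np.sum(layers, axis=0), nodes))
```

Values near one indicate that the eigenvector is localized on the community, whereas values near zero indicate that it carries essentially no information about it.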

We now illustrate layer aggregation with thresholding as a filter that can allow greatly improved small-community detection for the temporal network shown in Fig. 7(a), including the accurate recovery of community 1. In Fig. 8, we plot |〈v̂⁽^r⁾, u⁽^r⁾〉|² given by Eq. (16) with the variable substitutions T_r(𝒲_w(t)) ↦ T and w ↦ L into Eqs. (13)–(18). Results reflect the aggregation of layers into bins 𝒲_w(t) for each of the four communities r ∈ {1, 2, 3, 4} and with bin sizes w ∈ {1, 3, 5, 7, 9}. Panels (a)–(c) indicate results for different thresholds, L̃ ∈ {w, 0.8w, 0.5w}.

Our first observation for Fig. 8 is that none of the communities can be detected (for any threshold) if the layers are analyzed in isolation (see results for window size w = 1). This result is similar to that shown in Fig. 7(b) for summation without thresholding (i.e., whenever w = 1, we find |〈v̂⁽^r⁾, u⁽^r⁾〉|² = |〈v⁽^r⁾, u⁽^r⁾〉|² = 0). In other words, the detectability of communities is only made possible through layer aggregation.

Our next observation is that the values |〈v̂⁽^r⁾, u⁽^r⁾〉|² are either zero or close to one, in sharp contrast to the values of |〈v⁽^r⁾, u⁽^r⁾〉|² shown in Fig. 7(b), which take many values across the range [0, 0.7]. In other words, in this experiment, the use of thresholding as a filter causes small communities to be either strongly detected or not detected at all; there is no middle ground of weak detection, as there is for layer aggregation without thresholding. This is important since error rates for community detection vanish as |〈v̂⁽^r⁾, u⁽^r⁾〉|² → 1 [27].

Our final observation is that different threshold values enhance the detectability of different communities. For example, community 1 is detectable when w = 3 for L̃ ≥ 0.8w but not for L̃ = 0.5w [compare panels (a) and (b) to panel (c)]. Similarly, community 3 is detectable when w = 9 for L̃ ≤ 0.8w but not for L̃ = w [compare panels (b) and (c) to panel (a)]. Interestingly, in this experiment, we were able to identify a combination of parameters (L̃, w) that allows accurate detection of all four communities; that is, |〈v̂⁽^r⁾, u⁽^r⁾〉|² ≈ 1 for bin 𝒲_w(t) only when community r is present in time layer t [i.e., t ∈ 𝒯_r]; otherwise, |〈v̂⁽^r⁾, u⁽^r⁾〉|² ≈ 0. We highlight these values of (L̃, w) in panel (b) with a violet box. However, we stress that these “best” values for (L̃, w) arise in this experiment because the communities are identical in size (K_r = 8) and density (ρ_r = 1) and relatively similar in persistence (i.e., T_r ∈ [3, 9]). In general, one should not expect a single choice of parameters (L̃, w) to work well for all communities, since the detectability-limit criterion given by Eq. (18) depends on a complex interplay between the network and community parameters {p_l}, ρ_r, T_r, K_r, L, and L̃.
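The filtering step itself is simple to state in code. The following minimal sketch aggregates the layers of one bin by summation and then applies a threshold L̃, assuming that an edge is kept when it appears in at least L̃ of the layers in the bin. The toy sizes (50 nodes, an 8-node clique present in all w = 3 layers, p = 0.1) and the function names are hypothetical and chosen only to keep the example fast.

```python
import numpy as np

def er_layer(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdős–Rényi graph without self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

def aggregate_sum(layers):
    """Summation: entry (i, j) counts the layers of the bin containing edge (i, j)."""
    return np.sum(layers, axis=0)

def aggregate_threshold(layers, L_tilde):
    """Assumed rule: keep edge (i, j) iff it appears in at least L_tilde layers."""
    return (aggregate_sum(layers) >= L_tilde).astype(int)

rng = np.random.default_rng(0)
n, p, w = 50, 0.1, 3                 # toy sizes (hypothetical)
clique = np.arange(8)                # community present in every layer of the bin
layers = []
for _ in range(w):
    A = er_layer(n, p, rng)
    A[np.ix_(clique, clique)] = 1    # plant the clique
    A[clique, clique] = 0            # no self-loops
    layers.append(A)

# Thresholds mirroring panels (a)-(c): w, 0.8w, and 0.5w.
for L_tilde in (w, 0.8 * w, 0.5 * w):
    A_hat = aggregate_threshold(layers, L_tilde)
    kept_clique = A_hat[np.ix_(clique, clique)].sum() // 2
    kept_total = A_hat.sum() // 2
    print(L_tilde, kept_clique, kept_total)
```

In this toy setting, every clique edge survives each threshold because the clique is present in all layers of the bin, while a background edge survives the strictest threshold only if it appears independently in every layer, which is why thresholding acts as a sparsifying filter.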

V. DISCUSSION

There is considerable need to better understand how network preprocessing affects network-analysis methodologies. Herein, we studied how different methods for layer aggregation affect the detectability of small-scale communities in multilayer networks (including multilayer representations of temporal networks). Small-community detection is widely used for anomaly detection in network data [23–28]; in cybersecurity, for example, it allows detection of harmful events such as attacks [23], intrusions [24], and fraud [25]. Understanding limitations on small-community detection provides insight into the detectability of these harmful activities. Although most networks inherently change in time, previous theory for limitations on small-community detection has been restricted to single-layer networks [26,27] or summation-based aggregation [11]. We highlight that our model and analysis generalize these previous works in several ways: (i) a community has edge probability ρ ∈ (0, 1] and is not necessarily a clique, (ii) a community can persist across a subset of layers, (iii) the mean edge probability p_l can vary across network layers, and (iv) the multilayer or temporal network can simultaneously contain several communities.

Motivated in this way, we developed random matrix theory [27,40] to analyze detectability phase transitions in which the dominant eigenvectors of modularity matrices associated with layer-aggregated multilayer networks localize onto communities, thereby allowing their detection. We developed theory for when a community with K_r ≪ N nodes is hidden (i.e., planted) in T_r ≤ L layers of a multilayer network with N nodes and L layers. We found a detectability phase transition to occur for a given community r when its size K_r surpasses a detectability limit. When layers are aggregated by summation, the detectability limit $K_{r}^{*}$ is given by Eq. (11) and has the scaling behavior $K_{r}^{*} \propto \sqrt{N L} / T_{r}$ . Surprisingly, if L is allowed to vary, this implies that summation-based aggregation enhances community detection even if the community exists in a vanishing fraction T_r/L of layers, provided that T_r/L decays more slowly than 𝒪(L^−1/2). This result is surprising since layer aggregation still benefits community detection despite the fact that most layers carry no information about the community.
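The condition on T_r/L follows directly from the summation scaling. As a short worked example, assume (for illustration only) a power-law persistence T_r = cL^α with constants c > 0 and 0 < α ≤ 1; the symbols c and α are ours and do not appear in the analysis above.

```latex
\begin{align}
K_r^{*} \;\propto\; \frac{\sqrt{N L}}{T_r}
        \;=\; \frac{\sqrt{N}}{c}\, L^{\frac{1}{2}-\alpha},
\qquad
\frac{T_r}{L} \;=\; c\, L^{\alpha - 1}.
\end{align}
% K_r^* decreases with L if and only if alpha > 1/2, which is the same as
% T_r / L = c L^{alpha - 1} decaying more slowly than L^{-1/2}.
% Example: for alpha = 3/4, quadrupling L multiplies K_r^* by 4^{-1/4} \approx 0.71,
% even though the fraction of layers containing the community shrinks as L^{-1/4}.
```

For α < 1/2 the exponent 1/2 − α is positive, so adding layers raises the detectability limit and aggregation obscures the community.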

We also introduced and studied the utility of layer aggregation with thresholding as a nonlinear data filter to enhance small-community detection. Our analysis [particularly, Eq. (18)] revealed that in addition to implementing sparsification and dichotomization, thresholding can allow super-resolution community detection, whereby the detectability limit decreases by several orders of magnitude (see Fig. 6). In particular, we showed in Sec. III B that ${\hat{K}}_{r}^{*}$ decays exponentially with $\sqrt{L} / T_{r}$ for clique detection in layer-aggregated sparse networks filtered by threshold L̃ = T_r.

To illustrate practical implications of our results, in Sec. IV we presented an experiment involving the detection of small communities in a time-varying network, highlighting the following key insights:

Aggregating time layers into appropriate-sized bins can allow the detection of small communities that would otherwise be undetectable (that is, if the layers were considered in isolation or if all layers were aggregated).
Layer aggregation by summation enhances community detection if the community persists across sufficiently many [specifically, 𝒪(L^1/2)] layers; otherwise, it can obscure detection.
Layer aggregation with thresholding is a filter that can allow super-resolution community detection of small communities that are otherwise too small for detection.
The threshold that best enhances the detection of a small community depends on many parameters, and the detection of multiple communities should, in general, utilize multiple thresholds.

We have thus provided a theoretical framework supporting how small-community detection in temporal network data can be improved through network preprocessing in which network layers are binned into time windows and are aggregated using summation with thresholding. This filtering, however, should not be approached as a “one-size-fits-all” procedure. In particular, we find that there exist optimal time window sizes w and layer-aggregation strategies that, in general, are unique to each community (i.e., depending on its size, density, persistence across the layers, etc.). While it is important to consider a range of window sizes and layer-aggregation methods, this leads to an unavoidable trade-off between computational cost and sufficient exploration of different parameters.

Before concluding, we discuss implications of our work regarding the topic of eigenvector localization in complex networks, which is an important topic in network science [44,45] for the study of centrality [46–48], spatial analysis [49], and core-periphery structure [50,51]. In particular, there is growing interest in extending these ideas to time-varying [52] and multilayer networks [53]. Recently, Ref. [54] showed that an Anderson-localization-type transition occurs for material transport on several real-world networks (e.g., interconnected ponds of melting sea ice, porous human bone, and resistor networks) and noted that they did not observe the wave interference and scattering effects that typically occur for Anderson localization (a widely studied phenomenon in which eigenfunctions localize onto defects in disordered materials [55,56]). Reference [54] found the phase transition to coincide with a phase transition in network connectivity due to eigenvector localization onto different connected components. Our work complements these findings, showing that a similar localization phenomenon can be brought on by small communities—that is, localization does not necessarily require network fragmentation. (We note in passing that connected components can be interpreted as one, and perhaps the strictest, notion of a community.) Future research should further explore the connection between community-based and connected-component-based eigenvector localization on networks, and their relationship to Anderson localization in materials. (See Refs. [57,58] for related research using network-based models for disordered and composite materials.)

Finally, we highlight other extensions to our work that would be interesting to pursue. Motivated by applications for data fusion, recent research [11] considered weighted averaging of adjacency matrices, which allows the weights for the different network layers to be optimized. It would be interesting to extend our research to weighted averages, which should be fairly straightforward by redefining 〈·〉 in Eqs. (9)–(11) with weights. We leave open the joint optimization of weighting and thresholding. It would also be interesting to use our method to study the temporal behavior of communities [59], such as a set of nodes that form a recurring community in different time windows (i.e., periodically or stochastically).

Acknowledgments

D. T. and P. J. M. were supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health (Grant No. R01HD075712) and a James S. McDonnell Foundation 21st Century Science Initiative–Complex Systems Scholar Award (No. 220020315). R. S. C. was supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8721-05-C-0002 and/or No. FA8702-15-D-0001. Interpretations, opinions, and conclusions of this work are those of the authors and do not reflect the official position of these funding agencies.

References

1. Newman MEJ. The Structure and Function of Complex Networks. SIAM Rev. 2003;45:167.
2. Lewis K, Kaufman J, Gonzalez M, Wimmer A, Christakis N. Tastes, Ties, and Time: A New Social Network Dataset Using Facebook.com. Soc Networks. 2008;30:330.
3. Holme P, Saramäki J. Temporal Networks. Phys Rep. 2012;519:97.
4. Boccaletti S, Bianconi G, Criado R, Del Genio C, Gómez-Gardenes J, Romance M, Sendina-Nadal I, Wang Z, Zanin M. The Structure and Dynamics of Multilayer Networks. Phys Rep. 2014;544:1. doi: 10.1016/j.physrep.2014.07.001.
5. Kivelä M, Arenas A, Barthelemy M, Gleeson JP, Moreno Y, Porter MA. Multilayer Networks. J Complex Netw. 2014;2:203.
6. Menichetti G, Remondini D, Bianconi G. Correlations Between Weights and Overlap in Ensembles of Weighted Multiplex Networks. Phys Rev E. 2014;90:062817. doi: 10.1103/PhysRevE.90.062817.
7. De Domenico M, Nicosia V, Arenas A, Latora V. Structural Reducibility of Multilayer Networks. Nat Commun. 2015;6:6864. doi: 10.1038/ncomms7864.
8. Kleineberg KK, Boguna M, Serrano MA, Papadopoulos F. Hidden Geometric Correlations in Real Multiplex Networks. Nat Phys. 2016;12:1076. doi: 10.1103/PhysRevLett.118.218301.
9. Stanley N, Shai S, Taylor D, Mucha PJ. Clustering Network Layers with the Strata Multilayer Stochastic Block Model. IEEE Trans Network Sci Eng. 2016;3:95. doi: 10.1109/TNSE.2016.2537545.
10. Taylor D, Shai S, Stanley N, Mucha PJ. Enhanced Detectability of Community Structure in Multilayer Networks through Layer Aggregation. Phys Rev Lett. 2016;116:228301. doi: 10.1103/PhysRevLett.116.228301.
11. Nayar H, Miller BA, Geyer K, Caceres RS, Smith ST, Nadakuditi RR. Improved Hidden Clique Detection by Optimal Linear Fusion of Multiple Adjacency Matrices. 49th Asilomar Conference on Signals, Systems and Computers, 1520; New York: IEEE; 2015.
12. Mucha PJ, Richardson T, Macon K, Porter MA, Onnela JP. Community Structure in Time-Dependent, Multiscale, and Multiplex Networks. Science. 2010;328:876. doi: 10.1126/science.1184819.
13. Bassett DS, Wymbs NF, Porter MA, Mucha PJ, Carlson JM, Grafton ST. Dynamic Reconfiguration of Human Brain Networks During Learning. Proc Natl Acad Sci USA. 2011;108:7641. doi: 10.1073/pnas.1018985108.
14. Chung F, Zhao W. A Sharp PageRank Algorithm with Applications to Edge Ranking and Graph Sparsification. Proceedings of the 2010 International Workshop on Algorithms and Models for the Web-Graph; London: Nature Publishing; 2010. pp. 2–14.
15. Hill SM, et al. Inferring Causal Molecular Networks: Empirical Assessment through a Community-Based Effort. Nat Methods. 2016;13:310. doi: 10.1038/nmeth.3773.
16. Clauset A, Moore C, Newman MEJ. Hierarchical Structure and the Prediction of Missing Links in Networks. Nature (London) 2008;453:98. doi: 10.1038/nature06830.
17. Newman MEJ. Measurement Errors in Network Data. arXiv:1703.07376.
18. Fortunato S. Community Detection in Graphs. Phys Rep. 2010;486:75.
19. Rosvall M, Bergstrom CT. Maps of Random Walks on Complex Networks Reveal Community Structure. Proc Natl Acad Sci USA. 2008;105:1118. doi: 10.1073/pnas.0706851105.
20. Lancichinetti A, Fortunato S, Radicchi F. Benchmark Graphs for Testing Community Detection Algorithms. Phys Rev E. 2008;78:046110. doi: 10.1103/PhysRevE.78.046110.
21. Sales-Pardo M, Guimera R, Moreira AA, Amaral LAN. Extracting the Hierarchical Organization of Complex Systems. Proc Natl Acad Sci USA. 2007;104:15224. doi: 10.1073/pnas.0703740104.
22. Moody J. Peer Influence Groups: Identifying Dense Clusters in Large Networks. Soc Networks. 2001;23:261.
23. Mavroeidis D, Batina L, van Laarhoven T, Marchiori E. PCA, Eigenvector Localization and Clustering for Side-Channel Attacks on Cryptographic Hardware Devices. Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Berlin: Springer; 2012. pp. 253–268.
24. Ding Q, Katenka N, Barford P, Kolaczyk E, Crovella M. Intrusion as (Anti) Social Communication: Characterization and Detection. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; New York: ACM; 2012. pp. 886–894.
25. Chen S, Gangopadhyay A. A Novel Approach to Uncover Health Care Frauds Through Spectral Analysis. IEEE International Conference on Healthcare Informatics; New York: IEEE; 2013. pp. 499–504.
26. Alon N, Krivelevich M, Sudakov B. Finding a Large Hidden Clique in a Random Graph. Random Struct Algorithm. 1998;13:457.
27. Nadakuditi RR. On Hard Limits of Eigen-Analysis Based Planted Clique Detection. IEEE Statistical Signal Processing Workshop; New York: IEEE; 2012. p. 129.
28. Miller BA, Beard MS, Wolfe PJ, Bliss NT. Spectral Framework for Anomalous Subgraph Detection. IEEE Trans Signal Process. 2015;63:4191.
29. Ghasemian A, Zhang P, Clauset A, Moore C, Peel L. Detectability Thresholds and Optimal Algorithms for Community Structure in Dynamic Networks. Phys Rev X. 2016;6:031005.
30. Kawamoto T, Kabashima Y. Detectability of the Spectral Method for Sparse Graph Partitioning. Europhys Lett. 2015;112:40007. doi: 10.1103/PhysRevE.91.062803.
31. Decelle A, Krzakala F, Moore C, Zdeborová L. Inference and Phase Transitions in the Detection of Modules in Sparse Networks. Phys Rev Lett. 2011;107:065701. doi: 10.1103/PhysRevLett.107.065701.
32. Nadakuditi RR, Newman MEJ. Graph Spectra and the Detectability of Community Structure in Networks. Phys Rev Lett. 2012;108:188701. doi: 10.1103/PhysRevLett.108.188701.
33. Radicchi F. Detectability of Communities in Heterogeneous Networks. Phys Rev E. 2013;88:010801. doi: 10.1103/PhysRevE.88.010801.
34. Peixoto TP. Eigenvalue Spectra of Modular Networks. Phys Rev Lett. 2013;111:098701. doi: 10.1103/PhysRevLett.111.098701.
35. Chen PY, Hero AO. Phase Transitions in Spectral Community Detection. IEEE Trans Signal Process. 2015;63:4339.
36. Fortunato S, Barthelemy M. Resolution Limit in Community Detection. Proc Natl Acad Sci USA. 2007;104:36. doi: 10.1073/pnas.0605965104.
37. Chen PY, Hero AO. Multilayer Spectral Graph Clustering via Convex Layer Aggregation: Theory and Algorithms. arXiv:1708.02620.
38. Anderson BDO, Moore JB. Optimal Filtering. Prentice-Hall; Englewood Cliffs, New Jersey: 1979.
39. Newman MEJ, Girvan M. Finding and Evaluating Community Structure in Networks. Phys Rev E. 2004;69:026113. doi: 10.1103/PhysRevE.69.026113.
40. Benaych-Georges F, Nadakuditi RR. The Eigenvalues and Eigenvectors of Finite, Low Rank Perturbations of Large Random Matrices. Adv Math. 2011;227:494.
41. Bai Z, Silverstein JW. Spectral Analysis of Large Dimensional Random Matrices. Springer; New York: 2010.
42. Capitaine M, Donati-Martin C, Féral D. The Largest Eigenvalues of Finite Rank Deformation of Large Wigner Matrices: Convergence and Nonuniversality of the Fluctuations. Ann Prob. 2009;37:1.
43. Hoeffding W. Probability Inequalities for Sums of Bounded Random Variables. J Am Stat Assoc. 1963;58:13.
44. Méndez-Bermúdez JA, Alcazar-Lopez A, Martinez-Mendoza AJ, Rodrigues FA, Peron TKDM. Universality in the Spectral and Eigenfunction Properties of Random Networks. Phys Rev E. 2015;91:032122. doi: 10.1103/PhysRevE.91.032122.
45. Pastor-Satorras R, Castellano C. Distinct Types of Eigenvector Localization in Networks. Sci Rep. 2016;6:18847. doi: 10.1038/srep18847.
46. Martin T, Zhang X, Newman MEJ. Localization and Centrality in Networks. Phys Rev E. 2014;90:052808. doi: 10.1103/PhysRevE.90.052808.
47. Kawamoto T. Localized Eigenvectors of the Non-Backtracking Matrix. J Stat Mech. 2016:023404.
48. Nassar H, Kloster K, Gleich DF. Strong Localization in Personalized Page Rank Vectors. Algorithms and Models for the Web Graph; Proceedings of the 2015 Workshop on Algorithms for the Web-Graph; Cham: Springer; 2015. pp. 190–202.
49. Cucuringu M, Blondel VD, Van Dooren P. Extracting Spatial Information from Networks with Low-Order Eigenvectors. Phys Rev E. 2013;87:032803.
50. Barucca P, Tantari D, Lillo F. Centrality Metrics and Localization in Core-Periphery Networks. J Stat Mech. 2016;2:023401.
51. Suweis S. Effect of Localization on the Stability of Mutualistic Ecological Networks. Nat Commun. 2015;6:10179. doi: 10.1038/ncomms10179.
52. Taylor D, Myers SA, Clauset A, Porter MA, Mucha PJ. Eigenvector-Based Centrality Measures for Temporal Networks. Multiscale Modeling Sim. 2017;15:537. doi: 10.1137/16M1066142.
53. Méndez-Bermúdez JA, de Arruda GF, Rodrigues FA, Moreno Y. Scaling Properties of Multilayer Random Networks. Phys Rev E. 2017;96:012307. doi: 10.1103/PhysRevE.96.012307.
54. Murphy NB, Cherkaev E, Golden KM. Anderson Transition for Classical Transport in Composite Materials. Phys Rev Lett. 2017;118:036401. doi: 10.1103/PhysRevLett.118.036401.
55. Anderson PW. Localized Magnetic States in Metals. Phys Rev. 1961;124:41.
56. Abrahams E, Anderson PW, Licciardello DC, Ramakrishnan TV. Scaling Theory of Localization: Absence of Quantum Diffusion in Two Dimensions. Phys Rev Lett. 1979;42:673.
57. Shi F, Wang S, Forest MG, Mucha PJ. Percolation-Induced Exponential Scaling in the Large Current Tails of Random Resistor Networks. Multiscale Modeling Sim. 2013;11:1298.
58. Shi F, Wang S, Forest MG, Mucha PJ, Zhou R. Network-Based Assessments of Percolation-Induced Current Distributions in Sheared Rod Macromolecular Dispersions. Multiscale Modeling Sim. 2014;12:249.
59. Sekara V, Stopczynski A, Lehmann S. Fundamental Structures of Dynamic Social Networks. Proc Natl Acad Sci USA. 2016;113:9977. doi: 10.1073/pnas.1602803113.

[R1] 1.Newman MEJ. The Structure and Function of Complex Networks. SIAM Rev. 2003;45:167. [Google Scholar]

[R2] 2.Lewis K, Kaufman J, Gonzalez M, Wimmer A, Christakis N. Tastes, Ties, and Time: A New Social Network Dataset Using Facebook.com. Soc Networks. 2008;30:330. [Google Scholar]

[R3] 3.Holme P, Saramäki J. Temporal Networks. Phys Rep. 2012;519:97. [Google Scholar]

[R4] 4.Boccaletti S, Bianconi G, Criado R, Del Genio C, Gómez-Gardenes J, Romance M, Sendina-Nadal I, Wang Z, Zanin M. The Structure and Dynamics of Multilayer Networks. Phys Rep. 2014;544:1. doi: 10.1016/j.physrep.2014.07.001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.Kivelä M, Arenas A, Barthelemy M, Gleeson JP, Moreno Y, Porter MA. Multilayer Networks. J Complex Netw. 2014;2:203. [Google Scholar]

[R6] 6.Menichetti G, Remondini D, Bianconi G. Correlations Between Weights and Overlap in Ensembles of Weighted Multiplex Networks. Phys Rev E. 2014;90:062817. doi: 10.1103/PhysRevE.90.062817. [DOI] [PubMed] [Google Scholar]

[R7] 7.De Domenico M, Nicosia V, Arenas A, Latora V. Structural Reducibility of Multilayer Networks. Nat Commun. 2015;6:6864. doi: 10.1038/ncomms7864. [DOI] [PubMed] [Google Scholar]

[R8] 8.Kleineberg KK, Boguna M, Serrano MA, Papadopoulos F. Hidden Geometric Correlations in Real Multiplex Networks. Nat Phys. 2016;12:1076. doi: 10.1103/PhysRevLett.118.218301. [DOI] [PubMed] [Google Scholar]

[R9] 9.Stanley N, Shai S, Taylor D, Mucha PJ. Clustering Network Layers with the Strata Multilayer Stochastic Block Model. IEEE Trans Network Sci Eng. 2016;3:95. doi: 10.1109/TNSE.2016.2537545. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Taylor D, Shai S, Stanley N, Mucha PJ. Enhanced Detectability of Community Structure in Multilayer Networks through Layer Aggregation. Phys Rev Lett. 2016;116:228301. doi: 10.1103/PhysRevLett.116.228301. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Nayar H, Miller BA, Geyer K, Caceres RS, Smith ST, Nadakuditi RR. Improved Hidden Clique Detection by Optimal Linear Fusion of Multiple Adjacency Matrices. 49th Asilomar Conference on Signals, Systems and Computers, 1520; New York: IEEE; 2015. [Google Scholar]

[R12] 12.Mucha PJ, Richardson T, Macon K, Porter MA, Onnela JP. Community Structure in Time-Dependent, Multiscale, and Multiplex Networks. Science. 2010;328:876. doi: 10.1126/science.1184819. [DOI] [PubMed] [Google Scholar]

[R13] 13.Bassett DS, Wymbs NF, Porter MA, Mucha PJ, Carlson JM, Grafton ST. Dynamic Reconfiguration of Human Brain Networks During Learning. Proc Natl Acad Sci USA. 2011;108:7641. doi: 10.1073/pnas.1018985108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Chung F, Zhao W. A Sharp PageRank Algorithm with Applications to Edge Ranking and Graph Sparsification. Proceedings of the 2010 International Workshop on Algorithms and Models for the Web-Graph; London: Nature Publishing; 2010. pp. 2–14. [Google Scholar]

[R15] 15.Hill SM, et al. Inferring Causal Molecular Networks: Empirical Assessment through a Community-Based Effort. Nat Methods. 2016;13:310. doi: 10.1038/nmeth.3773. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] 16.Clauset A, Moore C, Newman MEJ. Hierarchical Structure and the Prediction of Missing Links in Network. Nature (London) 2008;453:98. doi: 10.1038/nature06830. [DOI] [PubMed] [Google Scholar]

[R17] 17.Newman MEJ. Measurement Errors in Network Data. arXiv:1703.07376. [Google Scholar]

[R18] 18.Fortunato S. Community Detection in Graphs. Phys Rep. 2010;486:75. [Google Scholar]

[R19] 19.Rosvall M, Bergstrom CT. Maps of Random Walks on Complex Networks Reveal Community Structure. Proc Natl Acad Sci USA. 2008;105:1118. doi: 10.1073/pnas.0706851105. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] 20.Lancichinetti A, Fortunato S, Radicchi F. Benchmark Graphs for Testing Community Detection Algorithms. Phys Rev E. 2008;78:046110. doi: 10.1103/PhysRevE.78.046110. [DOI] [PubMed] [Google Scholar]

[R21] 21.Sales-Pardo M, Guimera R, Moreira AA, Amaral LAN. Extracting the Hierarchical Organization of Complex Systems. Proc Natl Acad Sci USA. 2007;104:15224. doi: 10.1073/pnas.0703740104. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] 22.Moody J. Peer Influence Groups: Identifying Dense Clusters in Large Networks. Soc Networks. 2001;23:261. [Google Scholar]

[R23] 23.Mavroeidis D, Batina L, van Laarhoven T, Marchiori E. PCA, Eigenvector Localization and Clustering for Side-Channel Attacks on Cryptographic Hardware Devices. Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Berlin: Springer; 2012. pp. 253–268. [Google Scholar]

[R24] 24.Ding Q, Katenka N, Barford P, Kolaczyk E, Crovella M. Intrusion as (Anti) Social Communication: Characterization and Detection. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; New York: ACM; 2012. pp. 886–894. [Google Scholar]

[R25] 25.Chen S, Gangopadhyay A. A Novel Approach to Uncover Health Care Frauds Through Spectral Analysis. IEEE International Conference on Healthcare Informatics; New York: IEEE; 2013. pp. 499–504. [Google Scholar]

[R26] 26.Alon N, Krivelevich M, Sudakov B. Finding a Large Hidden Clique in a Random Graph. Random Struct Algorithm. 1998;13:457. [Google Scholar]

[R27] 27.Nadakuditi RR. On Hard Limits of Eigen-Analysis Based Planted Clique Detection. IEEE Statistical Signal Processing Workshop; New York: IEEE; 2012. p. 129. [Google Scholar]

[R28] 28.Miller BA, Beard MS, Wolfe PJ, Bliss NT. Spectral Framework for Anomalous Subgraph Detection. IEEE Trans Signal Process. 2015;63:4191. [Google Scholar]

[R29] 29.Ghasemian A, Zhang P, Clauset A, Moore C, Peel L. Detectability Thresholds and Optimal Algorithms for Community Structure in Dynamic Networks. Phys Rev X. 2016;6:031005. [Google Scholar]

[R30] 30.Kawamoto T, Kabashima Y. Detectability of the Spectral Method for Sparse Graph Partitioning. Europhys Lett. 2015;112:40007. doi: 10.1103/PhysRevE.91.062803. [DOI] [PubMed] [Google Scholar]

[R31] 31.Decelle A, Krzakala F, Moore C, Zdeborová L. Inference and Phase Transitions in the Detection of Modules in Sparse Networks. Phys Rev Lett. 2011;107:065701. doi: 10.1103/PhysRevLett.107.065701. [DOI] [PubMed] [Google Scholar]

[R32] 32.Nadakuditi RR, Newman MEJ. Graph Spectra and the Detectability of Community Structure in Networks. Phys Rev Lett. 2012;108:188701. doi: 10.1103/PhysRevLett.108.188701. [DOI] [PubMed] [Google Scholar]

[R33] 33.Radicchi F. Detectability of Communities in Heterogeneous Networks. Phys Rev E. 2013;88:010801. doi: 10.1103/PhysRevE.88.010801. [DOI] [PubMed] [Google Scholar]

[R34] 34.Peixoto TP. Eigenvalue Spectra of Modular Networks. Phys Rev Lett. 2013;111:098701. doi: 10.1103/PhysRevLett.111.098701. [DOI] [PubMed] [Google Scholar]

[R35] 35.Chen PY, Hero AO. Phase Transitions in Spectral Community Detection. IEEE Trans Signal Process. 2015;63:4339. [Google Scholar]

[R36] 36.Fortunato S, Barthelemy M. Resolution Limit in Community Detection. Proc Natl Acad Sci USA. 2007;104:36. doi: 10.1073/pnas.0605965104. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] 37.Chen PY, Hero AO. Multilayer Spectral Graph Clustering via Convex Layer Aggregation: Theory and Algorithms. arXiv:1708.02620. [Google Scholar]

[R38] 38.Anderson BDO, Moore JB. Optimal Filtering. Prentice-Hall; Englewood Cliffs, New Jersey: 1979. [Google Scholar]

[R39] 39.Newman MEJ, Girvan M. Finding and Evaluating Community Structure in Networks. Phys Rev E. 2004;69:026113. doi: 10.1103/PhysRevE.69.026113. [DOI] [PubMed] [Google Scholar]

[R40] 40.Benaych-Georges F, Nadakuditi RR. The Eigenvalues and Eigenvectors of Finite, Low Rank Perturbations of Large Random Matrices. Adv Math. 2011;227:494. [Google Scholar]

[R41] 41.Bai Z, Silverstein JW. Spectral Analysis of Large Dimensional Random Matrices. Springer; New York: 2010. [Google Scholar]

[R42] 42.Capitaine M, Donati-Martin C, Féral D. The Largest Eigenvalues of Finite Rank Deformation of Large Wigner Matrices: Convergence and Nonuniversality of the Fluctuations. Ann Prob. 2009;37:1. [Google Scholar]

[R43] 43.Hoeffding W. Probability Inequalities for Sums of Bounded Random Variables. J Am Stat Assoc. 1963;58:13. [Google Scholar]

[R44] 44.Méndez-Bermúdez JA, Alcazar-Lopez A, Martinez-Mendoza AJ, Rodrigues FA, Peron TKDM. Universality in the Spectral and Eigenfunction Properties of Random Networks. Phys Rev E. 2015;91:032122. doi: 10.1103/PhysRevE.91.032122. [DOI] [PubMed] [Google Scholar]

[R45] 45.Pastor-Satorras R, Castellano C. Distinct Types of Eigenvector Localization in Networks. Sci Rep. 2016;6:18847. doi: 10.1038/srep18847. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R46] 46.Martin T, Zhang X, Newman MEJ. Localization and Centrality in Networks. Phys Rev E. 2014;90:052808. doi: 10.1103/PhysRevE.90.052808. [DOI] [PubMed] [Google Scholar]

[R47] 47.Kawamoto T. Localized Eigenvectors of the Non-Backtracking Matrix. J Stat Mech. 2016:023404. [Google Scholar]

[R48] 48.Nassar H, Kloster K, Gleich DF. Strong Localization in Personalized Page Rank Vectors. Algorithms and Models for the Web Graph; Proceedings of the 2015 Workshop on Algorithms for the Web-Graph; Cham: Springer; 2015. pp. 190–202. [Google Scholar]

[R49] 49.Cucuringu M, Blondel VD, Van Dooren P. Extracting Spatial Information from Networks with Low-Order Eigenvectors. Phys Rev E. 2013;87:032803. [Google Scholar]

[R50] 50.Barucca P, Tantari D, Lillo F. Centrality Metrics and Localization in Core-Periphery Networks. J Stat Mech. 2016;2:023401. [Google Scholar]

[R51] 51.Suweis S. Effect of Localization on the Stability of Mutualistic Ecological Networks. Nat Commun. 2015;6:10179. doi: 10.1038/ncomms10179. [DOI] [PMC free article] [PubMed] [Google Scholar]

52. Taylor D, Myers SA, Clauset A, Porter MA, Mucha PJ. Eigenvector-Based Centrality Measures for Temporal Networks. Multiscale Modeling Sim. 2017;15:537. doi: 10.1137/16M1066142.

53. Méndez-Bermúdez JA, de Arruda GF, Rodrigues FA, Moreno Y. Scaling Properties of Multilayer Random Networks. Phys Rev E. 2017;96:012307. doi: 10.1103/PhysRevE.96.012307.

54. Murphy NB, Cherkaev E, Golden KM. Anderson Transition for Classical Transport in Composite Materials. Phys Rev Lett. 2017;118:036401. doi: 10.1103/PhysRevLett.118.036401.

55. Anderson PW. Localized Magnetic States in Metals. Phys Rev. 1961;124:41.

56. Abrahams E, Anderson PW, Licciardello DC, Ramakrishnan TV. Scaling Theory of Localization: Absence of Quantum Diffusion in Two Dimensions. Phys Rev Lett. 1979;42:673.

57. Shi F, Wang S, Forest MG, Mucha PJ. Percolation-Induced Exponential Scaling in the Large Current Tails of Random Resistor Networks. Multiscale Modeling Sim. 2013;11:1298.

58. Shi F, Wang S, Forest MG, Mucha PJ, Zhou R. Network-Based Assessments of Percolation-Induced Current Distributions in Sheared Rod Macromolecular Dispersions. Multiscale Modeling Sim. 2014;12:249.

59. Sekara V, Stopczynski A, Lehmann S. Fundamental Structures of Dynamic Social Networks. Proc Natl Acad Sci USA. 2016;113:9977. doi: 10.1073/pnas.1602803113.
