Significance
Real-world networks are complex, comprising vast webs of interconnected elements performing a diverse array of social and biological functions. Common among many networks, however, is the pressure to be efficiently compressed—either in the brain or in the genetic code. But just as files on a computer can be compressed to differing degrees, what makes one network more compressible than another? To answer this question, we adapt tools from information theory to quantify the compressibility of a network. Studying real-world and model networks, we find that hierarchical organization—with tight clustering and heterogeneous degrees—increases compressibility, enabling compressed representations across scales. Generally, our framework provides an information-theoretic method for investigating the interplay between network structure and compression.
Keywords: information theory, complex networks, rate distortion, compression
Abstract
Many complex networks depend upon biological entities for their preservation. Such entities, from human cognition to evolution, must first encode and then replicate those networks under marked resource constraints. Networks that survive are those that are amenable to constrained encoding—or, in other words, are compressible. But how compressible is a network? And what features make one network more compressible than another? Here, we answer these questions by modeling networks as information sources before compressing them using rate-distortion theory. Each network yields a unique rate-distortion curve, which specifies the minimal amount of information that remains at a given scale of description. A natural definition then emerges for the compressibility of a network: the amount of information that can be removed via compression, averaged across all scales. Analyzing an array of real and model networks, we demonstrate that compressibility increases with two common network properties: transitivity (or clustering) and degree heterogeneity. These results indicate that hierarchical organization—which is characterized by modular structure and heterogeneous degrees—facilitates compression in complex networks. Generally, our framework sheds light on the interplay between a network’s structure and its capacity to be compressed, enabling investigations into the role of compression in shaping real-world networks.
Complex networks are often encoded in biology and, thereby, utilized and replicated by biological systems. The brain encodes language (1), knowledge (2), music (3), social (4, 5), and transportation networks (6); the human mind uses these internal representations to engage in linguistic communication, build on existing understanding, sing a victorious melody, strengthen a valuable friendship, and walk the covered holloways (7). Similarly, biological networks among molecular and cellular components are encoded at various scales in genetic material (8–11), and evolution uses these encodings to propagate network topologies in a surviving species. From brains to genes, the biological materials that encode complex networks operate under marked constraints on time, energy, metabolism, and physical extent, among others. Such constraints determine which networks persist into the future—in particular, those whose topology can be efficiently encoded. These shared constraints raise a fundamental question: How does the structure of a network facilitate efficient encodings?
Encoding a network (indeed, encoding any piece of information) involves a natural trade-off between simplicity and accuracy. One could construct a simple representation that omits the fine-scale details of a network. Or one could build a representation that captures a network’s intricate structure, but is complicated and unwieldy. An efficient encoding strikes an optimal balance between simplicity and accuracy; that is, it is a compression (12, 13). In fact, compression—a foundational branch of information theory—has provided key insights into optimal network representations, yielding principled algorithms for constructing coarse-grained maps of complex systems (14–16).
Building upon this progress, here, we investigate how the structure of complex networks facilitates compression. Intuitively, just as natural images are easier to compress than white noise due to their visual patterns and regularities, so, too, should networks with strong structural regularities be more compressible than random networks. But do homogeneous topologies, such as those found in lattice-like networks, make systems more compressible, or is compression facilitated by the hierarchical organization found in many real networks? To answer these questions, here, we develop a framework for quantifying the compressibility of complex networks. Applying our framework to several real and model networks, we identify specific network features that facilitate compression. Together, these results elucidate how a network’s topology impacts its compressibility and suggest that many real-world networks may be shaped by the pressure to be compressed.
Rate-Distortion Theory of Network Clustering
In compression (13), one begins with an information source, a sequence of items that defines the object of interest. For networks, the details of information flow often vary from one context to another. Therefore, a logical choice for the information source is a random walk, which contains all of the details about the structure of a network and nothing more (14).
One then seeks to reduce the amount of information in the sequence, which can be accomplished in two complementary, yet distinct, ways. In lossless compression, one removes statistical redundancy in the sequence while maintaining an exact description of the network. This approach has provided important information-theoretic perspectives on the problem of community detection, wherein one constructs a coarse-grained representation at a specific scale of description (14). By contrast, here, we seek to quantify the compressibility of a network itself, without selecting a desired scale. To do so, we employ rate-distortion theory, the foundation of lossy compression. In lossy compression, rather than removing statistical redundancy in the sequence, one instead removes redundant features of the network directly. As we will see, directly coarse-graining the network will enable tractable strategies for compressing networks across all scales and, in doing so, will allow us to develop an intuitive definition for network compressibility.
Compressing Random Walks.
To see how compression unfolds in practice, consider the network in Fig. 1A. A random walk on the network defines a sequence of nodes x_1, x_2, …, with each node x_t transitioning to one of its four neighbors uniformly at random. The rate at which this sequence generates information is given by the entropy S, which (because there are four possible nodes at each step) equals 2 bits (see Materials and Methods for a definition of S). To reduce the amount of information in the sequence, we can construct a coarse-grained representation by clustering nodes together (14–16). This clustering yields a new sequence z_1, z_2, …, where z_t is the cluster containing node x_t (Fig. 1B), which communicates information at a rate equal to the mutual information I(x; z) (12, 13, 15, 16). If the clusters are chosen deterministically, as is common (4, 14, 17), then the conditional entropy H(z | x) vanishes, and the information rate simplifies to the entropy of the clustered sequence, I(x; z) = H(z).
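As a concrete sketch (not the authors' implementation), the entropy rate of an unbiased random walk on an unweighted, undirected graph follows directly from the degree sequence; here the complete graph on five nodes serves as a hypothetical stand-in for a 4-regular network like that of Fig. 1A:

```python
import math

def walk_entropy(edges):
    """Entropy rate (bits per step) of a random walk on an unweighted,
    undirected graph: S = sum_i (k_i / 2E) * log2(k_i)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    two_E = sum(degree.values())  # 2E equals the sum of all degrees
    return sum(k / two_E * math.log2(k) for k in degree.values())

# Complete graph on 5 nodes: every node has degree 4.
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(walk_entropy(edges))  # -> 2.0 bits, as in the text
```

With four equally likely neighbors at every step, the walk generates log2(4) = 2 bits per step, matching the value quoted above.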
Fig. 1.
Rate-distortion theory of random walks on networks. (A) A simple network with N nodes, each with constant degree k = 4. A random walk generates information at a rate of S = log k = 2 bits. (B) Network clusterings across various scales of description. (B, Top) For n = N clusters, each containing its own node, the sequence communicates the full 2 bits of information. (B, Middle) For n = 3 clusters, each corresponding to one of the three modules in the original network, the information rate drops below 2 bits. (B, Bottom) For n = 1 cluster containing the entire network, the sequence no longer communicates information. (C) Schematic of the optimal information rate R as a function of the scale of description s for networks that are either more compressible (black) or less compressible (gray). For more compressible networks, one can achieve a lower information rate at a given scale of description (vertical line), and one can achieve a more fine-grained description for a given information rate (horizontal line).
Consider, for example, a trivial clustering in which each node belongs to its own cluster (Fig. 1 B, Top). In this case, we maintain a complete description of the network, but we have not reduced the information rate, since H(z) = S = 2 bits. By contrast, consider the opposite setting in which all nodes belong to the same large cluster (Fig. 1 B, Bottom). Now, we have reduced the information rate to zero (H(z) = 0 bits), but all details about the network structure have been lost. Between these two extremes lies a range of clusterings (such as that in Fig. 1 B, Middle), each inducing its own information rate and yielding a unique distortion of the network structure.
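Under the Markov approximation introduced below, the information rate of a clustered walk can be computed from the lumped transition probabilities. The following is a minimal sketch (a hypothetical helper, not the authors' code), again using the complete graph on five nodes; it reproduces both extremes, with the trivial clustering retaining the full 2 bits and the single all-encompassing cluster communicating nothing:

```python
import math
from collections import defaultdict

def clustered_rate(edges, cluster):
    """Markov-approximation information rate (bits per step) of a random
    walk on an unweighted, undirected graph after nodes are merged
    according to `cluster` (a node -> cluster-label mapping)."""
    w = defaultdict(float)   # total edge weight between cluster pairs
    k = defaultdict(float)   # total degree of each cluster
    for u, v in edges:
        cu, cv = cluster[u], cluster[v]
        w[(cu, cv)] += 1.0
        w[(cv, cu)] += 1.0   # a within-cluster edge adds 2 to w[(c, c)]
        k[cu] += 1.0
        k[cv] += 1.0
    two_E = sum(k.values())
    rate = 0.0
    for (cu, _), weight in w.items():
        p = weight / k[cu]               # lumped transition probability
        rate -= (k[cu] / two_E) * p * math.log2(p)
    return rate

edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]  # complete graph K5
print(clustered_rate(edges, {i: i for i in range(5)}))  # trivial clustering: 2 bits
print(clustered_rate(edges, {i: 0 for i in range(5)}))  # one big cluster: 0 bits
```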
Scale as a Measure of Distortion.
Building representations that strike an optimal balance between minimizing information rate while also minimizing distortion is precisely the purview of rate-distortion theory (12, 13). As in any rate-distortion problem, one must choose a specific definition for the distortion of the object of interest. When clustering a network, a natural choice for the distortion presents itself: the scale of description. Specifically, for a network with N nodes and a clustering with n clusters, we define the scale to be s = (N - n + 1)/N. For example, if n = N, then we have an exact fine-grained description of the network at a scale s = 1/N (Fig. 1 B, Top), whereas if n = 1, then one cluster encloses the entire network and s = 1 (Fig. 1 B, Bottom).
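In code, the scale of description is a one-liner; the convention below (finest scale s = 1/N at n = N clusters, coarsest scale s = 1 at n = 1) is the one assumed in this sketch:

```python
def scale(n_clusters, N):
    """Scale of description for a clustering of an N-node network
    into n_clusters clusters: s = (N - n_clusters + 1) / N."""
    return (N - n_clusters + 1) / N

print(scale(5, 5), scale(3, 5), scale(1, 5))  # -> 0.2 0.6 1.0
```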
At each scale s (equivalently, for each number of clusters n), we seek to identify the clustering that minimizes the information rate I(x; z). This optimal information rate, denoted R(s), defines a unique rate-distortion curve for each network (Fig. 1C). If a network is easier to compress, then at each scale s, one should be able to find a clustering that is more efficient, thereby reducing the information rate R (Fig. 1C, vertical line); similarly, for a given information rate R, one should be able to construct a more fine-grained clustering, thereby decreasing the scale s (Fig. 1C, horizontal line). Thus, in order to quantify the compressibility of a network, we must first be able to compute its rate-distortion curve.
Computing the Rate-Distortion Curve of a Network
Computing the rate-distortion curve of a network—in particular, doing so efficiently to enable applications to large systems—poses two distinct challenges. First, we must estimate the mutual information I(x; z) for different clusterings; and second, we must identify the clusterings that minimize this information rate across all scales.
Although estimating mutual information is generally difficult (18), the simplicity of our setup allows for tractable upper and lower bounds (Materials and Methods). Of particular interest is the upper bound on I(x; z), which follows by approximating the clustered sequence z as Markovian [a property that is not guaranteed, even though the original random walk is Markovian (13)]. Rather than minimizing the information rate directly, we instead minimize this upper bound, thereby yielding an upper bound on the rate-distortion curve. For simplicity, in what follows, we often refer to the upper bound as the information rate and to the resulting curve as the rate-distortion curve R(s).
To compute R(s)—that is, to find clusterings that minimize the information rate—we employ a greedy clustering algorithm. Beginning with n = N clusters, each containing its own node, we combine the pair of clusters that yields the largest reduction in the information rate. Repeating this agglomerative process across all scales (until only one cluster remains), we arrive at an estimate for the rate-distortion curve R(s). To speed up the calculation, rather than searching through all pairs of clusters at each step, we only consider a limited number of pairs chosen via principled heuristics (Materials and Methods). Importantly, these heuristics do not affect the definitions of the information-theoretic quantities, such as the information rate and its upper bound. In practice, not only do these heuristics enable applications to large networks, they also improve the accuracy of the rate-distortion estimates themselves (SI Appendix, Fig. S1).
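A brute-force version of this agglomerative procedure, stripped of the paper's pair-selection heuristics, can be sketched as follows; it evaluates every candidate merge at each step and is therefore only practical for toy graphs (the five-node complete graph below is an arbitrary example):

```python
import math
from collections import defaultdict
from itertools import combinations

def rate(edges, cluster):
    """Markov-approximation information rate (bits/step) of the clustered walk."""
    w = defaultdict(float)   # edge counts between cluster pairs
    k = defaultdict(float)   # total degree of each cluster
    for u, v in edges:
        cu, cv = cluster[u], cluster[v]
        w[(cu, cv)] += 1.0
        w[(cv, cu)] += 1.0
        k[cu] += 1.0
        k[cv] += 1.0
    two_E = sum(k.values())
    return -sum(k[cu] / two_E * (wt / k[cu]) * math.log2(wt / k[cu])
                for (cu, _), wt in w.items())

def rate_distortion_curve(nodes, edges):
    """Greedily merge, at each step, the pair of clusters whose merger
    yields the lowest information rate."""
    cluster = {v: v for v in nodes}
    curve = [rate(edges, cluster)]
    while len(set(cluster.values())) > 1:
        labels = sorted(set(cluster.values()))
        best_rate, best_cluster = None, None
        for a, b in combinations(labels, 2):
            trial = {v: (a if c == b else c) for v, c in cluster.items()}
            r = rate(edges, trial)
            if best_rate is None or r < best_rate:
                best_rate, best_cluster = r, trial
        curve.append(best_rate)
        cluster = best_cluster
    return curve  # rates at n = N, N - 1, ..., 1 clusters

edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]  # toy graph: K5
curve = rate_distortion_curve(range(5), edges)
print(curve)
```

For this toy graph the curve starts at the entropy (2 bits at the finest scale) and falls to zero once everything is merged into a single cluster.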
We are now prepared to compute the rate-distortion curve for a specific system. In Fig. 2A, we plot the upper and lower bounds on the rate-distortion curve for Zachary’s karate club network (19). As is true for all networks (Materials and Methods), the two bounds are exact at both the minimum scale s = 1/N (when the information rate simply equals the entropy of random walks S) and the maximum scale s = 1 (when the information rate is zero). Moreover, the two bounds remain close across all intermediate scales (Fig. 2A), demonstrating that the upper bound provides a good approximation to the true rate-distortion curve R(s). To understand how the rate-distortion curve depends on the structure of a network, however, it helps to examine the properties of optimal compressions themselves.
Fig. 2.
Properties of optimal clusterings. (A, Left) Upper bound (solid line) and lower bound (dashed line) on the optimal information rate R as a function of the scale of description s for Zachary’s karate club network (19). (A, Right) Across all scales, the optimal compression includes one large cluster, which we illustrate at several scales, including s = 0.5 and 0.75. (B) Size of the largest cluster in a compression, normalized by the size of the network N, as a function of the scale s for the real networks in SI Appendix, Table S1 (20–23). The median over real networks (solid line) matches the largest possible normalized cluster size s, indicating that (across all scales) most networks admit one large cluster of maximal size. (C) Illustration of edges within the one large cluster (blue), on the boundary of the cluster (purple), and outside the cluster (red). (D) Fraction of the edges emanating from the large cluster that either connect to nodes outside the cluster (purple) or remain within the cluster (blue) as a function of the scale s. (E) Average degree of nodes inside (blue) and outside (red) the large cluster, normalized by the average degree of the network, as a function of the scale s. In D and E, solid lines and shaded regions represent averages and one-SD error bars, respectively, over the real networks (SI Appendix, Table S1) (20–23), and dashed lines correspond to clusters with nodes selected at random.
Properties of Optimal Compressions
Using the framework developed above, we are ultimately interested in studying compression in real systems. The networks chosen for analysis span from communication networks (including semantic, language, and music networks) and information networks (including hyperlinks on the web and citations in science) to social networks, animal and protein interactions, transportation networks, and structural and functional connections in the brain (Materials and Methods; SI Appendix, Table S1) (20–23). Although these networks encompass a wide range of systems bridging several orders of magnitude in size, they are all encoded biologically, either in genetic material or in the neural code.
Emergence of One Large Cluster.
To begin, we compute the rate-distortion curve R(s) for each of the above networks, and we confirm that these upper bounds provide good approximations to the true rate-distortion curves (SI Appendix, Fig. S2). In the process of computing R(s), our compression algorithm also provides estimates for the optimal clusterings over all scales. Examining the structure of these compressions, we find a striking consistency across different networks. As can be observed in Zachary’s karate club (Fig. 2 A, Right), rather than dividing the network into multiple clusters of moderate size, optimal compressions tend to comprise one large cluster containing N - n + 1 nodes and n - 1 minimal clusters, each containing one node. In fact, among the networks studied, this tendency to form one large cluster is a nearly ubiquitous feature of optimal compressions (Fig. 2B).
We remark that the clustering that minimizes the information rate need not (and, indeed, does not) provide a faithful characterization of a network’s community structure, as is the goal in community detection (14–16). Instead, we find that optimal compressions seek to identify the group of nodes that can be combined to maximally reduce the information rate. By dividing the network into two parts—one inside the large cluster and the other outside—the challenge of compressing random walks thus resembles the graph-partitioning problem (24), which has generated key insights about the modular structure of networks across scales (17). This simplification, in turn, allows us to develop analytic predictions about the properties of optimal compressions and the structures of compressible networks.
Information Rate of Optimal Compressions.
Although our framework is general, applying to any weighted, directed network (Materials and Methods), in order to make analytic progress, here, we focus on the special case of an unweighted, undirected network with adjacency matrix A. For such a network, the entropy of random walks takes the simple form S = (1/2E) Σ_i k_i log k_i, where k_i is the degree of node i, E is the number of edges in the network, and the logarithm is base two such that information is measured in bits.
Now consider forming one large cluster C. One can show (Materials and Methods) that the information rate of the clustered network is given by
R = (1/2E) [ Σ_{i∉C} k_i log k_i + k_C log k_C - 2 Σ_{i∉C} k_i^C log k_i^C - 2E_C log(2E_C) ],   [1]
where k_C = Σ_{i∈C} k_i is the sum of the degrees of the nodes in C, k_i^C is the number of edges connecting nodes in C to a given node i outside of C, and E_C is the number of edges connecting nodes within C.
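Eq. 1 can be evaluated directly from edge counts. The sketch below (hypothetical helper names; the complete graph on five nodes with C = {0, 1} is an arbitrary toy choice) computes the clustered information rate from the closed form:

```python
import math

def eq1_rate(edges, C):
    """Information rate (bits/step) after merging the nodes in C into one
    cluster, per Eq. 1, for an unweighted, undirected graph."""
    C = set(C)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    two_E = sum(degree.values())
    k_C = sum(degree[i] for i in C)                      # total degree of C
    E_C = sum(1 for u, v in edges if u in C and v in C)  # edges inside C
    total = k_C * math.log2(k_C)
    if E_C:
        total -= 2 * E_C * math.log2(2 * E_C)
    for i in degree:
        if i in C:
            continue
        k_iC = sum(1 for u, v in edges
                   if (u == i and v in C) or (v == i and u in C))
        total += degree[i] * math.log2(degree[i])
        if k_iC:
            total -= 2 * k_iC * math.log2(k_iC)
    return total / two_E

edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]  # K5
print(eq1_rate(edges, {0, 1}))  # -> 1.7 bits
```

As a sanity check, merging the entire graph into one cluster drives the rate to zero, consistent with Fig. 1 B, Bottom.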
Information Content of Different Edges.
Using Eq. 1, can we predict the properties of the optimal cluster C? More broadly, can we anticipate the types of network topologies that facilitate compression? To answer these questions, it helps to group the edges in a network into three distinct categories (Fig. 2C): those connecting nodes within C, those connecting nodes outside of C, and those on the boundary of C (connecting nodes within C to nodes outside of C). We can gauge which type of edge is preferred over the others by comparing their contributions to the information rate (Eq. 1). An optimal compression will maximize the number of edges that are informationally preferred (contributing only weakly to the information rate), while limiting edges that are informationally costly.
For example, adding an edge within C increases the information rate less than adding an edge on the boundary of C (say, connecting C to a node i outside the cluster), provided the cluster C is sufficiently large (SI Appendix). Thus, edges within the large cluster are informationally preferred to those on the boundary, suggesting that the large cluster will seek to combine groups of nodes that are tightly connected to one another and sparsely connected to the rest of the network. Indeed, in real networks, we find that among the edges emanating from the large cluster, the proportion that connects to the rest of the network is much smaller than chance (Fig. 2D). This proportion of edges leaving the cluster is a well-studied quantity, known as the conductance or Cheeger constant of a network (17). Thus, networks with low conductance—such as those with modular structure and strong transitivity (the tendency for nodes to form triangles, also known as clustering)—should be highly compressible (17, 25). This is our first hypothesis about the impact of network structure on compressibility.
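Under the convention used here (the fraction of a cluster's edge endpoints that cross its boundary), conductance is straightforward to compute; the graph and cluster below are toy choices, not taken from the paper:

```python
def conductance(edges, C):
    """Fraction of edge endpoints of cluster C that leave the cluster:
    phi(C) = cut(C, rest) / k_C, where k_C is the total degree of C."""
    C = set(C)
    cut = sum(1 for u, v in edges if (u in C) != (v in C))   # boundary edges
    k_C = sum((u in C) + (v in C) for u, v in edges)         # total degree of C
    return cut / k_C

edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]  # K5
print(conductance(edges, {0, 1}))  # -> 0.75: 6 boundary edges, k_C = 8
```

A tightly knit module embedded in a sparse surround would score far lower, which is precisely the structure that optimal compressions favor.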
We now consider an edge connecting two nodes i and j outside of C. As before, one can show (SI Appendix) that, for a large cluster C, such an edge increases the information rate more than an edge within the cluster, hence demonstrating that edges within the large cluster are informationally preferred to those outside the cluster. In turn, this preference for the large cluster to include as many edges as possible suggests that C will favor high-degree nodes over low-degree nodes, which we confirm in real networks (Fig. 2E). This result leads to our second hypothesis: Networks should be more compressible if they have heterogeneous degrees (or heavy-tailed degree distributions), containing “rich clubs” of high-degree hub nodes (26, 27). Given the predictions that modular and heterogeneous topologies facilitate compression, we now propose a quantitative definition for the compressibility of a network.
Quantifying Network Compressibility
Intuitively, a network should be compressible if one can achieve a large reduction in the information rate at a given scale (Fig. 1C). However, rather than choosing a specific scale s (equivalently, a specific number of clusters n), we would like our definition of compressibility to be a property of the network itself. We therefore define the compressibility C of a network to be the amount of information that can be removed via compression, averaged across all scales,
C = (1/N) Σ_s [S - R(s)],   [2]

where the sum runs over the N scales of description s = 1/N, 2/N, …, 1.
Visually, the compressibility represents the area above a network’s rate-distortion curve (Fig. 3A). In practice, plugging our tractable upper bound on the rate-distortion curve into Eq. 2 yields a lower bound on C, which (for simplicity) we will refer to as compressibility.
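Given a rate-distortion curve sampled at the N scales, Eq. 2 reduces to the average gap below the entropy. The hard-coded curve below is an illustrative greedy estimate for the five-node complete graph (toy numbers for this sketch, not published results):

```python
def compressibility(rates):
    """Eq. 2: average information removed across scales, taking the rate
    at the finest scale (n = N clusters) as the entropy S."""
    S = rates[0]
    return sum(S - r for r in rates) / len(rates)

# Greedy rate estimates for K5 at n = 5, 4, 3, 2, 1 clusters.
curve = [2.0, 1.7, 1.2245, 0.6490, 0.0]
print(round(compressibility(curve), 3))
```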
Fig. 3.
Quantifying compressibility. (A) The compressibility of a network (shaded region) is the area between the rate-distortion curve (solid line) and the entropy of random walks S (dashed line). (B) A k-regular network, characterized only by the requirement that all nodes have constant degree k. (C) Rate-distortion curves for k-regular networks with different degrees k. (D) Compressibility of k-regular networks versus degree k. In C and D, solid lines and data points are averages over 50 randomly generated networks of equal size, and dashed lines indicate analytic predictions (Eqs. 3 and 4). (E) Compressibility versus average degree for the real networks in SI Appendix, Table S1 (20–23). We note that average degree is plotted on a log scale. Dashed line indicates a logarithmic fit. For the largest networks, data points and error bars represent means and SDs over 50 randomly sampled subnetworks (Materials and Methods).
To make the notion of compressibility concrete, consider the class of random k-regular networks (Fig. 3B). On average, these networks have no structure (besides the requirement that nodes have uniform degree k), which allows us to derive an analytic approximation for the rate-distortion curve (SI Appendix),
R(s) ≈ (1 - s)^2 log k - s log s + s(1 - s) log N.   [3]
Each individual network, however, contains small structural variations, such as groups of nodes that are more tightly connected than expected. Generating random -regular networks and computing their rate-distortion curves directly, we find that optimal compressions are able to capitalize on these structural variations (SI Appendix, Fig. S3), thereby achieving lower information rates than the approximation in Eq. 3 (Fig. 3C). By contrast, as the degree increases, the networks become uniform in structure, and the analytic approximation becomes exact (Fig. 3C).
Using Eq. 3, one can predict the compressibility of k-regular networks. Specifically, noting that the entropy of k-regular networks is S = log k (Materials and Methods), and approximating the average in Eq. 2 by an integral over s, we arrive at the analytic form
C ≈ (2/3) log k - (1/6) log N - 1/(4 ln 2),   [4]
which we verify numerically (Fig. 3D). We note that the compressibility grows logarithmically with the degree k, reflecting the fact that networks with larger degrees have more information to be removed via compression (Materials and Methods). Indeed, computing the compressibility of the real networks in SI Appendix, Table S1 (20–23), we find precisely the same logarithmic dependence on the average degree (Fig. 3E). Furthermore, we verify that this logarithmic dependence generalizes to directed versions of the networks (SI Appendix, Fig. S5) and is not simply due to our clustering heuristics (SI Appendix, Fig. S6). These results demonstrate that the compressibility of a network increases predictably with average degree. But how does compressibility depend on the topology of a complex network?
Impact of Network Structure on Compressibility
Based on the properties of optimal compressions (Fig. 2), we hypothesized that the compressibility of a network should increase with both 1) transitivity and 2) degree heterogeneity. To investigate the impact of transitivity on compressibility, we consider a class of stochastic block networks (Fig. 4A), wherein nodes are grouped into modules of equal size, and a specified fraction of the edges in the network connect nodes within the same module. We find that optimal compressions take advantage of this modular structure by clustering together nodes within the same module (SI Appendix, Fig. S3). Indeed, strengthening the modular structure—that is, increasing the fraction of within-module edges—decreases the rate-distortion curve (Fig. 4B). We therefore find that compressibility increases with both modularity (Fig. 4C) and transitivity (Fig. 4D). Importantly, these results on stochastic block networks generalize to real networks, with increases in transitivity yielding significant improvements in network compressibility (Fig. 4E).
Fig. 4.
Compressibility increases with transitivity and degree heterogeneity. (A) Stochastic block network, characterized by dense connectivity within modules and sparse connectivity between modules. (B) Rate-distortion curves for Erdös-Rényi (ER) networks (black line) and stochastic block networks (colored lines) with 10 modules and different fractions f of within-module edges. Undulations in the rate-distortion curves result from compressing each of the 10 modules (SI Appendix, Fig. S3). (C) Compressibility of stochastic block networks versus the fraction of within-module edges f. (D) Compressibility of stochastic block networks (colored points) and Erdös-Rényi networks (black point) versus transitivity (quantified by the average clustering coefficient). In B–D, data reflect averages over 50 randomly generated networks of equal size and average degree. (E) Compressibility versus transitivity for the real networks in SI Appendix, Table S1 (20–23) with a linear best fit (dashed line). (F) Scale-free network, characterized by a power-law degree distribution and the presence of high-degree hubs. (G) Rate-distortion curves for Erdös-Rényi networks (black line) and scale-free networks (colored lines) with different scale-free exponents γ. (H) Compressibility of scale-free networks versus the scale-free exponent γ. (I) Compressibility of scale-free networks (colored points) and Erdös-Rényi networks (black point) versus degree heterogeneity. In G–I, data reflect averages over 50 networks generated by using the static model (28), each of equal size and average degree. (J) Compressibility versus degree heterogeneity for the real networks in SI Appendix, Table S1 (20–23) with a linear best fit (dashed line). In E and J, for the largest networks, data points and error bars represent means and SDs over 50 randomly sampled subnetworks (Materials and Methods).
To examine the dependence of compressibility on degree heterogeneity, we study scale-free networks (Fig. 4F), which have heavy-tailed degree distributions characterized by a power-law exponent γ (27). Optimal compressions exploit this heterogeneous structure by clustering together high-degree hub nodes (SI Appendix, Fig. S3). As γ decreases, accentuating the heterogeneity in node degrees, the rate-distortion curve increases at small scales and decreases at intermediate and large scales (Fig. 4G). Both of these rate-distortion effects serve to improve the compressibility of scale-free networks (Fig. 4H). Moreover, rather than indirectly investigating the impact of heavy-tailed structure via the scale-free exponent γ, we can directly quantify the degree heterogeneity of a given network as ⟨|k_i - k_j|⟩/⟨k⟩, where ⟨|k_i - k_j|⟩ is the absolute difference in degrees averaged over all pairs of nodes and ⟨k⟩ is the average degree. We find that the compressibility of scale-free networks grows linearly with degree heterogeneity (Fig. 4I), a result that generalizes to real networks (Fig. 4J). Furthermore, we confirm that the dependencies of compressibility on both transitivity and degree heterogeneity extend to directed networks (SI Appendix, Fig. S5) and are robust to our choice of clustering heuristics (SI Appendix, Fig. S6).
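The pairwise heterogeneity measure can be computed directly from a degree sequence; the star and regular sequences below are toy examples chosen for this sketch:

```python
from itertools import combinations

def heterogeneity(degrees):
    """Mean absolute degree difference over all node pairs, divided by
    the mean degree."""
    mean_k = sum(degrees) / len(degrees)
    pairs = list(combinations(degrees, 2))
    return sum(abs(a - b) for a, b in pairs) / (len(pairs) * mean_k)

print(heterogeneity([4, 1, 1, 1, 1]))  # star: hub plus four leaves -> 0.75
print(heterogeneity([4, 4, 4, 4, 4]))  # 4-regular: perfectly homogeneous -> 0.0
```

A star graph, with one hub and many leaves, scores high, while any regular network scores exactly zero, matching the intuition that hubs drive heterogeneity.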
The above results demonstrate that network compressibility increases with both transitivity and degree heterogeneity, the two defining features of hierarchical structure (29). Indeed, in networks with explicit hierarchical organization (such as those examined in ref. 29), we verify that optimal compressions capitalize on both modular structure and heterogeneous degrees in order to reduce the information rate (SI Appendix, Fig. S3). The high compressibility of hierarchical networks highlights a key distinction between lossy and lossless compression. In lossless compression, a network is more compressible if it has lower entropy S, thereby admitting a more concise exact encoding (12, 13). The networks with the lowest entropies (and therefore the highest compressibilities from a lossless perspective) are those with homogeneous structure, such as Erdös-Rényi and k-regular networks (30). By contrast, lossy compression exploits structural regularities to remove redundant features of a network (Fig. 2), much like real-space renormalization (31). This direct coarse-graining renders hierarchical networks, which have strong structural regularities, highly compressible; similarly, it renders homogeneous networks, which have little to no structure, highly incompressible (Fig. 4 and SI Appendix, Fig. S3).
Finally, by focusing on specific families of networks, we discover variations in compressibility that reflect a network’s specific function. Road networks, for example, exhibit the lowest transitivity and degree heterogeneity, and therefore the lowest compressibility, among the networks studied. This low compressibility is likely due to the fact that, unlike the other networks, road networks are confined to exist in two dimensions, severely constraining their topology (32). Besides road networks, we find that protein interactions have the lowest transitivity and brain networks have the lowest degree heterogeneity, leading both classes of networks to be relatively incompressible. Interestingly, these two families are unique among the networks studied in that they are only encoded genetically and need not be represented cognitively by a human or animal. By contrast, language networks are highly compressible, perhaps reflecting the primary function of language as a means for encoding and communicating information. Thus, although many networks are encoded biologically, the pressure for these encodings to be efficient manifests to varying degrees in different families of networks, yielding a spectrum of compressibilities.
Discussion
Complex networks perform an astonishing array of functions, which are supported by a multitude of topological structures. Many networks, however, are unified by a common constraint: that they rely on biological entities to encode them and pass them on. Encoding a network efficiently—that is, striking an optimal balance between simplicity and accuracy—requires compression, an insight that has provided information-theoretic perspectives on network structure (14–16). Naturally, some networks should be more compressible than others, with structural regularities enabling efficient representations across multiple scales. To investigate this hypothesis, here, we introduce a rate-distortion theory of network compression (Fig. 1) and propose a quantitative definition for the compressibility of a network (Eq. 2; Fig. 3A).
Applying our framework to a number of real and model networks, we demonstrate that network compressibility increases with both transitivity and degree heterogeneity (Fig. 4). Importantly, these two features are frequently observed across an array of real-world networks, from social, scientific, and biological interactions (29, 33, 34) to the internet (2), language (29), music (35), and the brain (36). Moreover, the combination of transitivity (with tightly connected modules) and heterogeneous degrees (with well-connected hubs) defines hierarchical organization (29), which has been shown to support multiscale representations of complex networks (37, 38) and enable efficient information processing in neural and communication systems (30, 39). In fact, when encoding information about the world, the brain itself often employs hierarchical representations (40–42). Our results lend to these perspectives an additional outlook on the role of hierarchical structure: that it supports the efficient compression of complex networks.
The interplay between network structure and compressibility opens the door for a number of future directions. For example, given that transitivity and heterogeneous degrees are nearly ubiquitous features of information, social, and biological networks (2, 29, 33–36), it is tempting to suspect that these networks have been shaped, at least in part, by the pressure to be compressed. Future work could directly address this hypothesis by investigating whether real-world networks, from language and music to protein interactions and the internet, have evolved over time to become more compressible. From a complementary perspective, one could develop methods for designing artificial networks that are optimally compressible. What might such optimally compressible networks look like? And how close to optimal are the networks that we observe in nature and society? The framework presented here provides the quantitative tools to begin answering these questions.
Materials and Methods
Entropy of Random Walks.
Given a (possibly weighted, directed) network with adjacency matrix $A$, the probability of a random walk transitioning from node $i$ to node $j$ is $P_{ij} = A_{ij}/k_i$, where $k_i = \sum_j A_{ij}$ is the (out) degree of node $i$ (Fig. 1A). The entropy of random walks is given by
$$H = -\sum_i \pi_i \sum_j P_{ij} \log_2 P_{ij}, \qquad [5]$$
where $\pi$ is the stationary distribution defined by the condition $\pi^\top P = \pi^\top$ (which we note is uniquely defined if the network is strongly connected and aperiodic). For undirected networks, Eq. 5 simplifies significantly. In this case, the stationary distribution is proportional to the node degrees, $\pi_i = k_i/2E$, where $E$ is the number of edges in the network, and, thus, the entropy takes the form
$$H = -\sum_i \frac{k_i}{2E} \sum_j \frac{A_{ij}}{k_i} \log_2 \frac{A_{ij}}{k_i}. \qquad [6]$$
If, in addition, the network is unweighted and the nodes have uniform degree $k$ (as in the $k$-regular networks in Fig. 3), then the entropy equals $\log_2 k$. For example, in the simple network in Fig. 1, the nodes have uniform degree four, and thus the entropy is 2 bits.
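To make this concrete, the unweighted, undirected case can be sketched in a few lines of Python (our own illustration, not the authors' released code; the dict-of-neighbor-sets representation and the function name are our choices):

```python
import math

def walk_entropy(adj):
    """Entropy (in bits) of a random walk on an unweighted, undirected
    network, given as a dict mapping each node to its set of neighbors.
    Uses H = (1/2E) * sum_i k_i * log2(k_i), which follows from Eq. 6
    because -sum_j P_ij log2 P_ij = log2(k_i) when edges are unweighted."""
    two_e = sum(len(nbrs) for nbrs in adj.values())  # 2E = total degree
    return sum(len(nbrs) * math.log2(len(nbrs)) for nbrs in adj.values()) / two_e

# A 4-regular ring lattice: each node connects to its two nearest
# neighbors on either side, so the entropy should be log2(4) = 2 bits.
n = 8
ring = {i: {(i - 2) % n, (i - 1) % n, (i + 1) % n, (i + 2) % n} for i in range(n)}
print(walk_entropy(ring))  # 2.0
```

This reproduces the 2-bit entropy of the uniform-degree-four example mentioned above.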
Bounding the Information Rate.
After clustering a network, a random walk $x = x_1, x_2, \ldots$ gives rise to a new sequence $y = y_1, y_2, \ldots$, where $y_t$ is the cluster containing node $x_t$ (Fig. 1B). The information rate $R$ of this sequence is given by the mutual information $I(x; y)$, which for deterministic clusterings (such as those considered here) is equivalent to the entropy $H(y)$. However, even though the random walk $x$ is Markovian (yielding a simple form for the entropy [Eq. 5]), the clustered sequence $y$ need not be (13), and, thus, it is generally difficult to derive an analytic form for $R$.
Despite this hurdle, there exist simple bounds on the information rate $R$, summarized by the inequalities
$$H(y_{t+1} \mid x_t) \;\le\; R \;\le\; H(y_{t+1} \mid y_t), \qquad [7]$$
where $H(y_{t+1} \mid x_t)$ and $H(y_{t+1} \mid y_t)$ are the conditional entropies of $y_{t+1}$ on $x_t$ and $y_t$, respectively (13). These bounds are tight at the minimum scale, when each cluster contains one node, and so $R = H$. The bounds are also tight at the maximum scale, when there is one cluster, and so $R = 0$.
To compute the lower bound at intermediate scales, we begin with the conditional probability of node $i$ in the random walk transitioning to cluster $j$ in the clustered sequence, $p(j \mid i) = \sum_{l \in j} P_{il}$. Then, the lower bound is given by
$$H(y_{t+1} \mid x_t) = -\sum_i \pi_i \sum_j p(j \mid i) \log_2 p(j \mid i), \qquad [8]$$
where the second sum runs over all clusters $j$. Similarly, to compute the upper bound, we consider the probability of one cluster $i$ transitioning to another cluster $j$,
$$Q_{ij} = \frac{1}{\Pi_i} \sum_{k \in i} \pi_k \sum_{l \in j} P_{kl}, \qquad [9]$$
where $\Pi_i = \sum_{k \in i} \pi_k$ is the stationary distribution over clusters. We then arrive at the following upper bound,
$$H(y_{t+1} \mid y_t) = -\sum_i \Pi_i \sum_j Q_{ij} \log_2 Q_{ij}, \qquad [10]$$
which is exact if the clustered sequence is Markovian. In practice, when estimating the optimal information rate for a network, we minimize the upper bound in Eq. 10 over clusterings, resulting in an upper bound on the rate-distortion curve.
The upper bound simplifies significantly for unweighted, undirected networks. In this case, the cluster transition probabilities take the form $Q_{ij} = A^c_{ij}/k^c_i$, where $A^c$ is the induced network of clusters and $k^c_i = \sum_j A^c_{ij}$ is the sum of the degrees of the nodes in cluster $i$. Recalling that the stationary distribution simplifies to $\Pi_i = k^c_i/2E$, one can manipulate Eq. 10 into the form
$$H(y_{t+1} \mid y_t) = -\frac{1}{2E} \sum_{ij} A^c_{ij} \log_2 \frac{A^c_{ij}}{k^c_i}. \qquad [11]$$
Under the further simplification of a clustering with one large cluster and the remaining clusters containing one node each (Fig. 2), this upper bound can be fashioned into Eq. 1.
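Read together, Eqs. 8–11 translate directly into code. The following is a minimal sketch for unweighted, undirected networks (our own illustration under that assumption; `rate_bounds` and the dict-of-neighbor-sets format are our choices, not the released implementation):

```python
import math

def rate_bounds(adj, clusters):
    """Return (lower, upper) bounds on the information rate R in bits,
    i.e., H(y_{t+1}|x_t) and H(y_{t+1}|y_t), for an unweighted, undirected
    network `adj` (dict of neighbor sets) coarse-grained by `clusters`
    (a list of disjoint node sets covering the network)."""
    label = {v: c for c, nodes in enumerate(clusters) for v in nodes}
    two_e = sum(len(nbrs) for nbrs in adj.values())  # 2E

    # Lower bound (Eq. 8): entropy of the next cluster given the node.
    lower = 0.0
    for v, nbrs in adj.items():
        p = {}  # p(j | v) = fraction of v's edges entering cluster j
        for u in nbrs:
            p[label[u]] = p.get(label[u], 0) + 1 / len(nbrs)
        lower -= (len(nbrs) / two_e) * sum(q * math.log2(q) for q in p.values())

    # Upper bound (Eq. 11): entropy of the induced cluster network.
    n_c = len(clusters)
    a_c = [[0] * n_c for _ in range(n_c)]  # A^c: endpoints between clusters
    for v, nbrs in adj.items():
        for u in nbrs:
            a_c[label[v]][label[u]] += 1
    k_c = [sum(row) for row in a_c]  # k^c: total degree of each cluster
    upper = -sum(a_c[i][j] * math.log2(a_c[i][j] / k_c[i])
                 for i in range(n_c) for j in range(n_c) if a_c[i][j]) / two_e
    return lower, upper

# Two triangles joined by a single edge, clustered into the two triangles.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(rate_bounds(adj, [{0, 1, 2}, {3, 4, 5}]))
```

With one cluster per node, both bounds reduce to the walk entropy $H$; with a single cluster, both vanish, matching the tightness conditions stated after Eq. 7.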
Clustering Algorithm.
To compute the rate-distortion curve, we use an agglomerative clustering algorithm. Beginning with $N$ clusters (corresponding to the minimum scale), each containing an individual node, we iteratively combine pairs of clusters until we eventually arrive at one large cluster containing the entire network (corresponding to the maximum scale). At each step, we greedily select the pair of clusters to combine that minimizes the information-rate upper bound (Eq. 10). However, rather than searching through all pairs of clusters at each iteration (which would limit applications to small networks), we instead focus on a subset of pairs chosen through one of two heuristics.
The first heuristic, motivated by the observation that optimal clusterings tend to combine clusters with large degrees (Fig. 2E), selects the pairs of clusters $i$ and $j$ with the largest combined stationary probabilities $\Pi_i + \Pi_j$. For unweighted, undirected networks, we note that this choice is equivalent to selecting the pairs of clusters with the largest combined degrees, since $\Pi_i = k^c_i/2E$. The second heuristic, motivated by the fact that optimal compressions tend to form clusters with tight intracluster connectivity (Fig. 2D), selects the pairs of clusters $i$ and $j$ with the largest combined joint transition probabilities $\Pi_i Q_{ij} + \Pi_j Q_{ji}$. For unweighted, undirected networks, we remark that this second heuristic is equivalent to selecting the pairs of clusters with the largest number of connecting edges, since $\Pi_i Q_{ij} = A^c_{ij}/2E$. In practice, we consider a fixed number of candidate pairs of clusters at each iteration. In SI Appendix, Fig. S1, we compare these two heuristics to the brute-force approach that searches through all pairs of clusters at each iteration of the clustering algorithm. In addition to significantly speeding up the algorithm, we find that these two heuristics often yield more accurate estimates of the rate-distortion curve than the brute-force implementation.
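As a toy end-to-end illustration of this procedure (our own sketch, not the released implementation: it uses only the first, degree-based heuristic, a candidate pool of ten pairs, and the unweighted, undirected upper bound of Eq. 11):

```python
import math
from itertools import combinations

def bound(adj, clusters):
    """Upper bound (Eq. 11) on the information rate, in bits, for an
    unweighted, undirected network clustered into a list of node sets."""
    label = {v: c for c, nodes in enumerate(clusters) for v in nodes}
    two_e = sum(len(nbrs) for nbrs in adj.values())
    n_c = len(clusters)
    a_c = [[0] * n_c for _ in range(n_c)]  # induced cluster network A^c
    for v, nbrs in adj.items():
        for u in nbrs:
            a_c[label[v]][label[u]] += 1
    return -sum(a_c[i][j] * math.log2(a_c[i][j] / sum(a_c[i]))
                for i in range(n_c) for j in range(n_c) if a_c[i][j]) / two_e

def greedy_rate_curve(adj, n_pairs=10):
    """Agglomerative clustering: starting from singleton clusters, greedily
    merge the candidate pair that minimizes the information-rate bound.
    Candidates are the n_pairs pairs with the largest combined degrees."""
    def degree(cluster):
        return sum(len(adj[v]) for v in cluster)

    def merged(clusters, pair):
        return ([c for k, c in enumerate(clusters) if k not in pair]
                + [clusters[pair[0]] | clusters[pair[1]]])

    clusters = [{v} for v in adj]
    rates = [bound(adj, clusters)]  # with N singleton clusters, rate = H
    while len(clusters) > 1:
        candidates = sorted(
            combinations(range(len(clusters)), 2),
            key=lambda p: degree(clusters[p[0]]) + degree(clusters[p[1]]),
            reverse=True)[:n_pairs]
        best = min(candidates, key=lambda p: bound(adj, merged(clusters, p)))
        clusters = merged(clusters, best)
        rates.append(bound(adj, clusters))
    return rates  # one estimate per scale, ending at 0 for one cluster

# Two triangles joined by an edge: the curve drops from H toward zero.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
rates = greedy_rate_curve(adj)
print(rates)
```

The sequence of bounds, one per scale, is an estimate of the rate-distortion curve; the minimum over many runs and heuristics tightens it.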
Network Datasets.
The networks analyzed in this paper are listed and described in SI Appendix, Table S1 (20–23). While we study unweighted, undirected versions of the networks in Figs. 2, 3E, and 4 E and J, similar results hold for directed versions of the networks (SI Appendix, Figs. S2 and S3). For networks below a threshold size, we perform analyses directly. For larger networks, we analyze 50 subnetworks of fixed size, each generated by performing a random walk beginning at a randomly selected node until the desired number of nodes has been reached. This sampling method has been shown to give accurate estimates of network statistics (43).
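The random-walk sampling step can be sketched as follows (our own illustration; `random_walk_sample` is a hypothetical helper name, and the target subnetwork size here is arbitrary):

```python
import random

def random_walk_sample(adj, n_nodes, seed=0):
    """Sample a subnetwork by walking from a random start node until
    `n_nodes` distinct nodes have been visited, then return the induced
    unweighted, undirected subnetwork. Assumes the walk can reach at
    least `n_nodes` nodes (e.g., the network is connected)."""
    rng = random.Random(seed)
    current = rng.choice(list(adj))
    visited = {current}
    while len(visited) < n_nodes:
        current = rng.choice(list(adj[current]))
        visited.add(current)
    # Induced subnetwork: keep only edges between sampled nodes.
    return {v: adj[v] & visited for v in visited}

# Ring of 20 nodes; sample an induced subnetwork of 5 nodes.
n = 20
ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
sub = random_walk_sample(ring, 5)
print(sorted(sub))
```

The induced subnetwork can then be fed directly into the entropy and rate computations described above.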
Data and Code Availability.
The data analyzed in this paper and the code used to perform the analyses are openly available at GitHub (https://github.com/ChrisWLynn/Network_compressibility).
Citation Diversity Statement.
Recent work in several fields of science has identified a bias in citation practices such that papers from women and other minorities are undercited relative to the number of such papers in the field (44–49). Here, we sought to proactively consider choosing references that reflect the diversity of the field in thought, form of contribution, gender, and other factors. We obtained predicted gender of the first and last author of each reference by using databases that store the probability of a name being carried by a woman (48, 50). By this measure (and excluding self-citations to the first and last authors of our current paper), our references contain woman(first)/woman(last), man/woman, woman/man, and man/man. This method is limited in that 1) names, pronouns, and social media profiles used to construct the databases may not, in every case, be indicative of gender identity; and 2) it cannot account for intersex, nonbinary, or transgender people. Second, we obtained the predicted racial/ethnic category of the first and last author of each reference by databases that store the probability of a first and last name being carried by an author of color (51, 52). By this measure (and excluding self-citations), our references contain author of color(first)/author of color(last), white author/author of color, author of color/white author, and white author/white author. This method is limited in that 1) names, Census entries, and Wikipedia profiles used to make the predictions may not be indicative of racial/ethnic identity; and 2) it cannot account for Indigenous and mixed-race authors, or those who may face differential biases due to the ambiguous racialization or ethnicization of their names. We look forward to future work that could help us to better understand how to support equitable practices in science.
Acknowledgments
We thank Christopher Kroninger, Dr. Lia Papadopoulos, Dr. Pragya Srivastava, Mathieu Ouellet, and Dale Zhou for helpful feedback on earlier versions of this manuscript. C.W.L. was supported by the James S. McDonnell Foundation Century Science Initiative Understanding Dynamic and Multiscale Systems–Postdoctoral Fellowship Award. This work was also supported by the John D. and Catherine T. MacArthur Foundation; the Institute for Scientific Interchange Foundation; the Paul G. Allen Family Foundation; Army Research Laboratory Grant W911NF-10-2-0022; Army Research Office Grants Bassett-W911NF-14-1-0679, Falk-W911NF-18-1-0244, Grafton-W911NF-16-1-0474, and DCIST-W911NF-17-2-0181; the Office of Naval Research; National Institute of Mental Health Grants 2-R01-DC-009209-11, R01-MH112847, R01-MH107235, and R21-MH-106799; and NSF Grants PHY-1554488, BCS-1631550, and NCS-FO-1926829.
Footnotes
The authors declare no competing interest.
This article is a PNAS Direct Submission. A.-L.B. is a guest editor invited by the Editorial Board.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2023473118/-/DCSupplemental.
References
- 1. Sizemore A. E., Karuza E. A., Giusti C., Bassett D. S., Knowledge gaps in the early growth of semantic feature networks. Nat. Hum. Behav. 2, 682 (2018).
- 2. Vázquez A., Pastor-Satorras R., Vespignani A., Large-scale topological and dynamical properties of the Internet. Phys. Rev. E 65, 066130 (2002).
- 3. Liu X. F., Chi K. T., Small M., Complex network structure of musical compositions: Algorithmic generation of appealing music. Physica A 389, 126–132 (2010).
- 4. Girvan M., Newman M. E. J., Community structure in social and biological networks. Proc. Natl. Acad. Sci. U.S.A. 99, 7821–7826 (2002).
- 5. Brush E. R., Krakauer D. C., Flack J. C., Conflicts of interest improve collective computation of adaptive social structures. Sci. Adv. 4, e1603311 (2018).
- 6. Kalakoski V., Saariluoma P., Taxi drivers’ exceptional memory of street names. Mem. Cognit. 29, 634–638 (2001).
- 7. Lynn C. W., Bassett D. S., How humans learn and represent networks. Proc. Natl. Acad. Sci. U.S.A. 117, 29407–29415 (2020).
- 8. Gavin A.-C., et al., Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636 (2006).
- 9. Lynn C. W., Bassett D. S., The physics of brain network structure, function and control. Nat. Rev. Phys. 1, 318–332 (2019).
- 10. Vértes P. E., et al., Gene transcription profiles associated with inter-modular hubs and connection distance in human functional magnetic resonance imaging networks. Philos. Trans. R. Soc. B 371, 20150362 (2016).
- 11. Whitaker K. J., et al., Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome. Proc. Natl. Acad. Sci. U.S.A. 113, 9105–9110 (2016).
- 12. Shannon C. E., A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
- 13. Cover T. M., Thomas J. A., Elements of Information Theory (John Wiley & Sons, Hoboken, NJ, 2012).
- 14. Rosvall M., Bergstrom C. T., Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. U.S.A. 105, 1118–1123 (2008).
- 15. Rosvall M., Bergstrom C. T., An information-theoretic framework for resolving community structure in complex networks. Proc. Natl. Acad. Sci. U.S.A. 104, 7327–7331 (2007).
- 16. Slonim N., Atwal G. S., Tkačik G., Bialek W., Information-based clustering. Proc. Natl. Acad. Sci. U.S.A. 102, 18297–18302 (2005).
- 17. Leskovec J., Lang K. J., Dasgupta A., Mahoney M. W., Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6, 29–123 (2009).
- 18. Archer E., Park I. M., Pillow J. W., Bayesian and quasi-Bayesian estimators for mutual information from discrete data. Entropy 15, 1738–1755 (2013).
- 19. Zachary W. W., An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 452–473 (1977).
- 20. Lynn C. W., Bassett D. S., Network compressibility. GitHub. https://github.com/ChrisWLynn/Network_compressibility. Deposited 10 November 2020.
- 21. Kunegis J., KONECT: The Koblenz network collection. KONECT. http://konect.cc/. Accessed 1 August 2020.
- 22. Leskovec J., Krevl A., Stanford Large Dataset Collection. SNAP. https://snap.stanford.edu/data. Accessed 1 August 2020.
- 23. Batagelj V., Mrvar A., Pajek datasets. http://vlado.fmf.uni-lj.si/pub/networks/data/. Accessed 1 August 2020.
- 24. Buluç A., Meyerhenke H., Safro I., Sanders P., Schulz C., “Recent advances in graph partitioning” in Algorithm Engineering, Kliemann L., Sanders P., Eds. (Lecture Notes in Computer Science, Springer, Cham, Switzerland, 2016), vol. 9220, pp. 117–158.
- 25. Newman M. E. J., Reinert G., Estimating the number of communities in a network. Phys. Rev. Lett. 117, 078301 (2016).
- 26. Benson A. R., Gleich D. F., Leskovec J., Higher-order organization of complex networks. Science 353, 163–166 (2016).
- 27. Barabási A.-L., Albert R., Emergence of scaling in random networks. Science 286, 509–512 (1999).
- 28. Goh K.-I., Kahng B., Kim D., Universal behavior of load distribution in scale-free networks. Phys. Rev. Lett. 87, 278701 (2001).
- 29. Ravasz E., Barabási A.-L., Hierarchical organization in complex networks. Phys. Rev. E 67, 026112 (2003).
- 30. Lynn C. W., Papadopoulos L., Kahn A. E., Bassett D. S., Human information processing in complex networks. Nat. Phys. 16, 965–973 (2020).
- 31. Efrati E., Wang Z., Kolan A., Kadanoff L. P., Real-space renormalization in statistical mechanics. Rev. Mod. Phys. 86, 647 (2014).
- 32. Sperry M. M., Telesford Q. K., Klimm F., Bassett D. S., Rentian scaling for the measurement of optimal embedding of complex networks into physical space. J. Complex Netw. 5, 199–218 (2017).
- 33. Tomassini M., Luthi L., Empirical analysis of the evolution of a scientific collaboration network. Physica A 385, 750–764 (2007).
- 34. Ravasz E., “Detecting hierarchical modularity in biological networks” in Computational Systems Biology, Ireton R., Montgomery K., Bumgarner R., Samudrala R., McDermott J., Eds. (Methods in Molecular Biology, Humana Press, Totowa, NJ, 2009), vol. 541, pp. 145–160.
- 35. Farbood M. M., Heeger D. J., Marcus G., Hasson U., Lerner Y., The neural processing of hierarchical structure in music and speech at different timescales. Front. Neurosci. 9, 157 (2015).
- 36. Bassett D. S., et al., Hierarchical organization of human cortical networks in health and schizophrenia. J. Neurosci. 28, 9239–9248 (2008).
- 37. Sales-Pardo M., Guimera R., Moreira A. A., Amaral L. A. N., Extracting the hierarchical organization of complex systems. Proc. Natl. Acad. Sci. U.S.A. 104, 15224–15229 (2007).
- 38. Rosvall M., Bergstrom C. T., Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS One 6, e18209 (2011).
- 39. Bassett D. S., et al., Efficient physical embedding of topologically complex information processing networks in brains and computer circuits. PLoS Comput. Biol. 6, e1000748 (2010).
- 40. Balaguer J., Spiers H., Hassabis D., Summerfield C., Neural mechanisms of hierarchical planning in a virtual subway network. Neuron 90, 893–903 (2016).
- 41. Diaconescu A. O., et al., Inferring on the intentions of others by hierarchical Bayesian learning. PLoS Comput. Biol. 10, e1003810 (2014).
- 42. Friston K., Hierarchical models in the brain. PLoS Comput. Biol. 4, e1000211 (2008).
- 43. Leskovec J., Faloutsos C., “Sampling from large graphs” in KDD’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, New York, NY, 2006), pp. 631–636.
- 44. Mitchell S. M. L., Lange S., Brus H., Gendered citation patterns in international relations journals. Int. Stud. Perspect. 14, 485–492 (2013).
- 45. Dion M. L., Sumner J. L., Mitchell S. M. L., Gendered citation patterns across political science and social science methodology fields. Polit. Anal. 26, 312–327 (2018).
- 46. Caplar N., Tacchella S., Birrer S., Quantitative evaluation of gender bias in astronomical publications from citation counts. Nat. Astron. 1, 1–5 (2017).
- 47. Maliniak D., Powers R., Walter B. F., The gender citation gap in international relations. Int. Organ. 67, 889–922 (2013).
- 48. Dworkin J. D., et al., The extent and drivers of gender imbalance in neuroscience reference lists. Nat. Neurosci. 23, 918–926 (2020).
- 49. Bertolero M. A., et al., Racial and ethnic imbalance in neuroscience reference lists and intersections with gender. bioRxiv [Preprint] (2020). https://doi.org/10.1101/2020.10.12.336230 (Accessed 1 November 2020).
- 50. Zhou D., et al., Diversity statement and code notebook (v1.1, 2020). https://github.com/dalejn/cleanBib. Accessed 1 November 2020.
- 51. Ambekar A., Ward C., Mohammed J., Male S., Skiena S., “Name-ethnicity classification from open sources” in KDD’09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, New York, NY, 2009), pp. 49–58.
- 52. Sood G., Laohaprapanon S., Predicting race and ethnicity from the sequence of characters in a name. arXiv [Preprint] (2018). https://arxiv.org/abs/1805.02109 (Accessed 1 November 2020).