Abstract
In this letter the authors discuss the relationship between structure and random walk dynamics in directed complex networks, with an emphasis on identifying whether a topological hub is also a dynamical hub. They establish the necessary conditions for networks to be topologically and dynamically fully correlated (e.g., word adjacency and airport networks), and show that in this case Zipf’s law is a consequence of the match between structure and dynamics. They also show that real-world neuronal networks and the world wide web are not fully correlated, implying that their more intensely connected nodes are not necessarily highly active.
We address the relationship between structure and dynamics in complex networks by taking the steady-state distribution of the frequency of visits to nodes (a dynamical feature) obtained by performing random walks (Ref. 1) along the networks. A complex network (Refs. 2–5) is taken as a graph with directed edges and associated weights, which are represented in terms of the weight matrix $W$. The $N$ nodes in the network are numbered as $i = 1, 2, \ldots, N$, and a directed edge with weight $M$, extending from node $j$ to node $i$, is represented as $W(i,j) = M$. No self-connections (loops) are considered. The in and out strengths of a node $i$, abbreviated as $\mathrm{is}(i)$ and $\mathrm{os}(i)$, correspond to the sum of the weights of its in- and outbound connections, respectively, i.e., $\mathrm{is}(i) = \sum_j W(i,j)$ and $\mathrm{os}(i) = \sum_j W(j,i)$. The stochastic matrix $S$ for such a network is

$$S(i,j) = \frac{W(i,j)}{\mathrm{os}(j)}. \qquad (1)$$
The matrix $S$ is assumed to be irreducible; i.e., any node can be reached from any other node, which allows the definition of a unique and stable steady state. An agent, placed at any initial node $j$, chooses among the outbound edges of node $j$ and moves to node $i$ with probability equal to $S(i,j)$. This step is repeated a large number of times $T$, and the frequency of visits to each node $i$ is calculated as $v(i) = (\text{number of visits to node } i)/T$. In the steady state (i.e., after a long time period $T$), $v = Sv$, and the frequency of visits to each node along the random walk may be calculated in terms of the eigenvector of $S$ associated with the unit eigenvalue (e.g., Ref. 6). For proper statistical normalization we set $\sum_i v(i) = 1$. The dominant eigenvector of the stochastic matrix has theoretically and experimentally been verified to be remarkably similar to the corresponding eigenvector of the weight matrix, implying that the adopted random walk model shares several features with other types of dynamics, including linear and nonlinear summations of activations and flow in networks.
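The construction above can be illustrated with a minimal sketch. It assumes the convention that the entry in row $i$, column $j$ of the weight matrix is the weight of the edge from node $j$ to node $i$. The 3-node matrix is made up for illustration, and the function names are hypothetical. The steady state is obtained here by a "lazy" power iteration, $v \leftarrow (v + Sv)/2$, which has the same fixed point as $v = Sv$. This is one possible numerical choice, not necessarily the procedure used for the results reported in this letter.

```python
import random


def stochastic_matrix(W):
    """Return S with S[i][j] = W[i][j] / os(j); W[i][j] is the weight of edge j -> i."""
    n = len(W)
    out_strength = [sum(W[i][j] for i in range(n)) for j in range(n)]
    return [[W[i][j] / out_strength[j] for j in range(n)] for i in range(n)]


def steady_state(S, iters=5000):
    """Approximate the solution of v = S v with sum(v) = 1.

    Uses the lazy iteration v <- (v + S v) / 2, which shares its fixed
    point with v = S v and also converges for periodic irreducible chains.
    """
    n = len(S)
    v = [1.0 / n] * n
    for _ in range(iters):
        Sv = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [0.5 * (a + b) for a, b in zip(v, Sv)]
    return v


def simulate_walk(S, steps, start=0, seed=0):
    """Frequency of visits to each node along a single random walk."""
    rng = random.Random(seed)
    n = len(S)
    counts = [0] * n
    j = start
    for _ in range(steps):
        # From node j, move to node i with probability S[i][j].
        j = rng.choices(range(n), weights=[S[i][j] for i in range(n)])[0]
        counts[j] += 1
    return [c / steps for c in counts]


# Hypothetical strongly connected 3-node network (illustrative weights only).
W = [[0, 1, 2],
     [3, 0, 1],
     [1, 2, 0]]
S = stochastic_matrix(W)
v = steady_state(S)
freq = simulate_walk(S, steps=100_000)
print("steady state:   ", [round(x, 3) for x in v])
print("walk frequency: ", [round(x, 3) for x in freq])
```

For a sufficiently long walk, the empirical frequencies are expected to approach the eigenvector solution, in line with the steady-state argument above.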
In addition to providing a modeling approach intrinsically compatible with dynamics involving successive visits to nodes by single or multiple agents, as is the case with world wide web (WWW) navigation, text writing, and transportation systems, random walks are directly related to diffusion. More specifically, as time progresses, the frequency of visits to each network node approaches the activity values that would be obtained from the traditional diffusion equation, and full congruence between such frequencies and activity diffusion is reached at the equilibrium state of the random walk process. Diffusion, in turn, plays an important role in a large number of linear and nonlinear dynamic systems, including disease spreading and pattern formation. Random walks are also intrinsically connected to Markov chains, electrical circuits, flows in networks, and even dynamical models such as the Ising model. For such reasons, random walks have become one of the most important and general models of dynamics in physics and other areas, constituting a primary choice for investigating dynamics in complex networks.
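The correlations between activity (the frequency of visits to nodes, $v$) and topology (out strength $\mathrm{os}$ or in strength $\mathrm{is}$) can be quantified in terms of the Pearson correlation coefficient $r$. For full activity-topology correlation in directed networks, i.e., $|r| = 1$ between $v$ and $\mathrm{os}$ or between $v$ and $\mathrm{is}$, it is enough that (i) the network is strongly connected, i.e., $S$ is irreducible, and (ii) for any node, the in strength is equal to the out strength. The proof of this statement is as follows. Because the network is strongly connected, its stochastic matrix $S$ has a unique eigenvector associated with the unit eigenvalue, which describes the steady state, i.e., $v = Sv$. Since $S(i,j) = W(i,j)/\mathrm{os}(j)$, the $i$th element of the vector $S\,\mathrm{os}$ is given as

$$[S\,\mathrm{os}](i) = \sum_j S(i,j)\,\mathrm{os}(j) = \sum_j \frac{W(i,j)}{\mathrm{os}(j)}\,\mathrm{os}(j) = \sum_j W(i,j) = \mathrm{is}(i). \qquad (2)$$

By hypothesis, $\mathrm{is}(i) = \mathrm{os}(i)$ for any $i$ and, therefore, both $\mathrm{os}$ and $\mathrm{is}$ are eigenvectors of $S$ associated with the unit eigenvalue. Since this eigenvector is unique up to a multiplicative constant, $v$ is proportional to $\mathrm{os} = \mathrm{is}$, implying full correlation between frequency of visits and both in and out strengths.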
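The statement proved above can be checked numerically on a small case. The sketch below uses a hypothetical 4-node directed network built as a sum of weighted cycles, so that in strength equals out strength at every node although the weight matrix is not symmetric. The network, the function names, and the lazy power iteration are illustrative assumptions, not part of the original analysis. The entry in row $i$, column $j$ is again the weight of the edge from node $j$ to node $i$.

```python
def steady_state(W, iters=2000):
    """Approximate v = S v, with S[i][j] = W[i][j] / os(j), by lazy power iteration."""
    n = len(W)
    out_strength = [sum(W[i][j] for i in range(n)) for j in range(n)]
    v = [1.0 / n] * n
    for _ in range(iters):
        Sv = [sum(W[i][j] / out_strength[j] * v[j] for j in range(n))
              for i in range(n)]
        v = [0.5 * (a + b) for a, b in zip(v, Sv)]
    return v


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical balanced network: cycles 0->1->2->0 (weight 2),
# 0->2->3->0 (weight 1), and 1->3->1 (weight 3). Not symmetric.
W = [[0, 0, 2, 1],
     [2, 0, 0, 3],
     [1, 2, 0, 0],
     [0, 3, 1, 0]]
n = len(W)
in_strength = [sum(W[i][j] for j in range(n)) for i in range(n)]   # row sums
out_strength = [sum(W[i][j] for i in range(n)) for j in range(n)]  # column sums

v = steady_state(W)
r_in = pearson(v, in_strength)
r_out = pearson(v, out_strength)
print("in strength: ", in_strength)
print("out strength:", out_strength)
print("r(v, is) =", r_in, " r(v, os) =", r_out)
```

Because conditions (i) and (ii) hold for this network, the steady-state vector is proportional to the strengths and both correlation coefficients come out equal to 1 up to numerical precision.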
An implication of this derivation is that for perfectly correlated networks, the frequency of symbols produced by random walks will be equal to the out strength or in strength distributions. Therefore, an out strength scale-free (Ref. 3) network must produce sequences obeying Zipf’s law (Ref. 7) and vice versa. If, on the other hand, the node strength distribution is Gaussian, the frequency of visits to nodes will also follow a Gaussian function; that is to say, the strength distribution of the nodes is replicated in the node activation. Although the correlation between node strength and random walk dynamics in undirected networks has been established before (Ref. 8), including full correlation (Refs. 9 and 10), the findings reported here are more general since they apply to any directed weighted network, such as the WWW and the airport network. Indeed, the correlation conditions for undirected networks can be understood as a particular case of the conditions above.
A fully correlated network will have $|r| = 1$. We obtained $r = 1$ for texts by Darwin (Ref. 11) and Wodehouse (Ref. 12) and for the network of airports in the USA (Ref. 13). The word association network was obtained by representing each distinct word as a node, while the edges were established by the sequence of immediately adjacent words in the text after the removal of stopwords (Ref. 14) and lemmatization (Ref. 15). More specifically, the fact that word $i$ has been followed by word $j$, $M$ times during the text, is represented as $W(j,i) = M$. Zipf’s law is known to apply to this type of network (Ref. 16). The airport network presents a link between two airports if there exists at least one flight between them. The number of flights performed in one month was used as the strength of the edges.
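The construction of the word network can be sketched as follows. The token list is a made-up example, assumed to be already stripped of stopwords and lemmatized, and the function names are hypothetical. The sketch also illustrates an elementary property of networks built from a single sequence: every occurrence of a word except the first token has a predecessor and every occurrence except the last has a successor, so in and out strengths can differ by at most one at any node. This is consistent with condition (ii) being essentially satisfied for long texts.

```python
from collections import Counter


def adjacency_weights(tokens):
    """W[(j, i)] = number of times word i is immediately followed by word j."""
    W = Counter()
    for i, j in zip(tokens, tokens[1:]):
        if i != j:  # self-connections (loops) are not considered
            W[(j, i)] += 1
    return W


def strengths(W):
    """Return (in strength, out strength) per word from the edge weights."""
    in_s, out_s = Counter(), Counter()
    for (j, i), m in W.items():
        out_s[i] += m  # edge leaves word i
        in_s[j] += m   # edge enters word j
    return in_s, out_s


# Hypothetical preprocessed token sequence (illustration only).
tokens = "worm move soil worm eat leaf worm move leaf soil".split()
W = adjacency_weights(tokens)
in_s, out_s = strengths(W)
imbalance = {w: in_s[w] - out_s[w] for w in sorted(set(tokens))}
print(imbalance)
```

In this example only the first word of the sequence and the last one are unbalanced, each by a single unit.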
We obtained $r$ for various real networks (Table I), including the fully correlated networks mentioned above. To interpret these data, we recall that a small $r$ means that a hub (large in or out strength) in topology is not necessarily a center of activity. Notably, in all cases considered $r$ is greater for the in strength than for the out strength. This may be understood with a trivial example of a node from which a high number of links emerge (implying large out strength) but which has only very few inbound links. This node, in a random walk model, will be rarely occupied and thus cannot be a center of activity, though it will strongly affect the rest of the network by sending activation to many other targets. Understanding why a hub in terms of in strength may fail to be very active is more subtle. Consider a central node receiving links from many other nodes arranged in a circle, i.e., the central node has a large in strength while the surrounding nodes have small in strength. In other words, if a node receives several links from nodes with low activity, this node will likewise be fairly inactive.

In order to further analyze the latter case, we may examine the correlations between the frequency of visits to each node and the cumulative hierarchical in and out strengths of that node. The hierarchical degree (Refs. 17–19) of a network node provides a natural extension of the traditional concept of node degree. The immediate neighbors of a node $i$ are called the first hierarchical level of $i$. The subsequent hierarchical levels are obtained as follows. The level $h+1$ contains the neighbors of the nodes of level $h$. The cumulative hierarchical out strength of a node $i$ at the hierarchical level $h$ corresponds to the sum of the weights of the edges extending from the hierarchical level $h-1$ to the level $h$, plus the out strengths obtained from hierarchy 1 to $h-1$. Similarly, the cumulative in strength of a node $i$ at hierarchical level $h$ is the sum of the weights of the edges from hierarchical level $h$ to the previous level $h-1$, plus the in strengths obtained from hierarchy 1 to $h-1$. Here node $i$ itself is taken as level 0, so that the traditional in and out strengths are, respectively, the cumulative hierarchical in and out strengths at hierarchical level 1 (see Supplementary Methods in Ref. 20 for an illustration of hierarchical levels). Because complex networks are also small world structures, it suffices to consider hierarchies up to two or three levels.
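One possible reading of the cumulative hierarchical strengths is sketched below, under assumptions that go beyond what is stated in the text. The reference node is taken as level 0. Level $h$ is taken to contain the nodes first reached at distance $h$ along outbound edges (for out strengths) or inbound edges (for in strengths). Only edges between consecutive levels are accumulated. The function names and the 4-node example network are hypothetical, and the entry in row $t$, column $s$ of the weight matrix is the weight of the edge from node $s$ to node $t$.

```python
def cumulative_out_strengths(W, node, max_level):
    """Cumulative hierarchical out strengths of `node` for levels 1..max_level.

    Assumption: level h holds the nodes first reached after h outbound steps,
    and level h adds the weights of edges going from level h-1 to level h.
    """
    n = len(W)
    visited = {node}
    ring = {node}  # level 0 is the reference node itself
    total = 0
    result = []
    for _ in range(max_level):
        next_ring = {t for s in ring for t in range(n)
                     if W[t][s] > 0 and t not in visited}
        total += sum(W[t][s] for s in ring for t in next_ring)
        result.append(total)
        visited |= next_ring
        ring = next_ring
    return result


def cumulative_in_strengths(W, node, max_level):
    """Same construction along inbound edges, obtained by transposing W."""
    n = len(W)
    W_T = [[W[j][i] for j in range(n)] for i in range(n)]
    return cumulative_out_strengths(W_T, node, max_level)


# Hypothetical 4-node weighted directed network (illustration only).
W = [[0, 0, 2, 1],
     [2, 0, 0, 3],
     [1, 2, 0, 0],
     [0, 3, 1, 0]]
out_levels = cumulative_out_strengths(W, node=0, max_level=3)
in_levels = cumulative_in_strengths(W, node=0, max_level=3)
print("cumulative out strengths of node 0:", out_levels)
print("cumulative in strengths of node 0: ", in_levels)
```

Under these assumptions the value at level 1 reduces to the traditional out or in strength of the node, as required by the definition.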
Table I.
| | Cortex | C. elegans | Airports | Darwin | Wodehouse | WWW |
|---|---|---|---|---|---|---|
| No. nodes | 53 | 191 | 280 | 3678 | 3705 | 10 810 |
| No. edges | 826 | 2449 | 4160 | 22 095 | 16 939 | 158 102 |
| CC | | | | | | |
| IS1 | | | | | | |
| IS2 | | | | | | |
| IS3 | | | | | | |
| IS4 | | | | | | |
| OS1 | | | | | | |
| OS2 | | | | | | |
| OS3 | | | | | | |
| OS4 | | | | | | |
| $r$ (IS1) | 0.83 | 0.78 | 1.00 | 1.00 | 1.00 | 0.15 |
| $r$ (IS2) | 0.58 | 0.84 | 0.33 | 0.86 | 0.82 | 0.09 |
| $r$ (IS3) | 0.24 | 0.43 | 0.11 | 0.42 | 0.43 | 0.13 |
| $r$ (IS4) | 0.24 | 0.35 | 0.08 | 0.20 | 0.22 | 0.11 |
| $r$ (OS1) | 0.39 | 0.20 | 1.00 | 1.00 | 1.00 | 0.00 |
| $r$ (OS2) | 0.30 | 0.01 | 0.33 | 0.87 | 0.81 | |
| $r$ (OS3) | | | 0.11 | 0.42 | 0.43 | |
| $r$ (OS4) | | | 0.07 | 0.20 | 0.22 | |
For the least correlated network analyzed, viz., the largest strongly connected cluster in the network of WWW links in the domain of Ref. 21 (Massey University, New Zealand) (Refs. 22 and 23), activity could not be related to in strength at any hierarchical level. Because the Pearson coefficient corresponds to a single real value, it cannot adequately express the coexistence of the many relationships between activity and degrees present in this specific network, nor its possibly heterogeneous topologies. Very similar results were obtained for other WWW networks, which indicates that the reasons why topological hubs are not highly active cannot be identified at the present moment (see, however, the discussion of more highly correlated networks below).
However, for the two neuronal structures of Table I that are not fully correlated (the network defined by the interconnectivity between cortical regions of the cat, Ref. 24, and the network of synaptic connections in C. elegans, Ref. 25), activity was shown to increase with the cumulative first and second hierarchical in strengths. In the cat cortical network, each cortical region is represented as a node, and the interconnections are reflected by the network edges. Significantly, in a previous paper (Ref. 26), it was shown that when connections between cortex and thalamus were included, the correlation between activity and outdegree increased significantly. This could be interpreted as a result of increased efficiency, with the topological hubs becoming highly active. Furthermore, for the fully correlated networks, such as the word associations obtained for texts by Darwin and Wodehouse, activity increased approximately with the square of the cumulative second hierarchical in strength (see Supplementary Fig. 2 in Ref. 20). In addition, the correlations obtained for these two authors are markedly distinct, as the work of Wodehouse is characterized by a substantially steeper increase of the frequency of visits for large in strength values (see Supplementary Fig. 3 in Ref. 20). Therefore, the results considering higher cumulative hierarchical degrees may serve as a feature for authorship identification.
In conclusion, we have established (i) a set of conditions for full correlation between topological and dynamical features of directed complex networks and demonstrated that (ii) Zipf’s law can be naturally derived for fully correlated networks. Result (i) is of fundamental importance for studies relating the dynamics and connectivity in networks, with critical practical implications. For instance, it not only demonstrates that hubs of connectivity may not correspond to hubs of activity but also provides a sufficient condition for achieving full correlation. Result (ii) is also of fundamental importance as it relates two of the most important concepts in complex systems, namely, Zipf’s law and scale-free networks. Although both involve power laws, these two key concepts have mostly been studied independently of each other. The result reported in this work paves the way for important additional investigations, especially by showing that Zipf’s law may be a consequence of dynamics taking place in scale-free systems. In the cases where the network is not fully correlated, the Pearson coefficient may be used as a characterizing parameter. For a network with very small correlation, such as the WWW links between the pages in a New Zealand domain analyzed here, the reasons for hubs failing to be active could not be identified, probably because of the substantially higher complexity and heterogeneity of this network, including varying levels of clustering coefficients, as compared to the neuronal networks.
Acknowledgments
This work was financially supported by FAPESP and CNPq (Brazil). Luciano da F. Costa acknowledges grants 05/00587-5 (FAPESP) and 308231/03-1 (CNPq).
REFERENCES
- 1. Brémaud P., Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues (Springer-Verlag, New York, 1999).
- 2. Watts D. J. and Strogatz S. H., Nature (London) 393, 440 (1998), doi:10.1038/30918.
- 3. Barabási A. L. and Albert R., Science 286, 509 (1999), doi:10.1126/science.286.5439.509.
- 4. Newman M. E. J., SIAM Rev. 45, 167 (2003), doi:10.1137/S003614450342480.
- 5. Boccaletti S., Latora V., Moreno Y., Chavez M., and Hwang D., Phys. Rep. 424, 175 (2006), doi:10.1016/j.physrep.2005.10.009.
- 6. Eriksen K. A., Simonsen I., Maslov S., and Sneppen K., Phys. Rev. Lett. 90, 148701 (2003), doi:10.1103/PhysRevLett.90.148701.
- 7. Newman M. E. J., Contemp. Phys. 46, 323 (2005), doi:10.1080/00107510500052444.
- 8. Eisler Z. and Kertész J., Phys. Rev. E 71, 057104 (2005), doi:10.1103/PhysRevE.71.057104.
- 9. Noh J. D. and Rieger H., Phys. Rev. Lett. 92, 118701 (2004), doi:10.1103/PhysRevLett.92.118701.
- 10. Wu A.-C., Xu X.-J., Wu Z.-X., and Wang Y.-H., Chin. Phys. Lett. 24, 577 (2007), doi:10.1088/0256-307X/24/2/077.
- 11. Darwin C., The Formation of Vegetable Mould through the Action of Worms, with Observations on their Habits (Murray, London, 1881).
- 12. Wodehouse P. G., The Pothunters (A & C Black, London, 1902).
- 13. Bureau of Transportation Statistics: Airline On-Time Performance Data, 2006 (http://www.bts.gov).
- 14. Baeza-Yates R. and Ribeiro-Neto B., Modern Information Retrieval (Addison-Wesley, New York, 1999).
- 15. Mitkov R., The Oxford Handbook of Computational Linguistics (Oxford University Press, New York, 2003).
- 16. Zipf G. K., Human Behaviour and the Principle of Least Effort (Addison-Wesley, Reading, 1949).
- 17. da F. Costa L., Phys. Rev. Lett. 93, 098702 (2004), doi:10.1103/PhysRevLett.93.098702.
- 18. da F. Costa L. and Sporns O., Eur. Phys. J. B 48, 567 (2005), doi:10.1140/epjb/e2006-00017-1.
- 19. da F. Costa L. and Silva F. N., J. Statistical Phys. 125, 841 (2006).
- 20. da F. Costa L., Sporns O., Antiqueira L., Nunes M. G. V., and Oliveira O. N., Jr., e-print arXiv:physics/0611247.
- 21. massey.ac.nz
- 22. The Academic Web Link Database Project: New Zealand University Web Sites, 2006 (http://cybermetrics.wlv.ac.uk/database/).
- 23. Thelwall M., Cybermetrics 6/7 (2003).
- 24. Scannell J. W., Burns G. A. P. C., Hilgetag C. C., O’Neil M., and Young M. P., Cereb. Cortex 9, 277 (1999).
- 25. White J. G., Southgate E., Thompson J. N., and Brenner S., Philos. Trans. R. Soc. London, Ser. B 314, 1 (1986), doi:10.1098/rstb.1986.0056.
- 26. da F. Costa L. and Sporns O., Appl. Phys. Lett. 89, 013903 (2006), doi:10.1063/1.2219736.