Abstract
Networks are widely used as a tool for describing diverse real complex systems and have been successfully applied to many fields. The distance between networks is one of the most fundamental concepts for properly classifying real networks, detecting temporal changes in network structures, and effectively predicting their temporal evolution. However, this distance has rarely been discussed in the theory of complex networks. Here, we propose a graph distance between networks based on a Laplacian matrix that reflects the structural and dynamical properties of networked dynamical systems. Our results indicate that the Laplacian-based graph distance effectively quantifies the structural difference between complex networks. We further show that our approach successfully elucidates the temporal properties underlying temporal networks observed in the context of face-to-face human interactions.
Describing real complex systems as coupled dynamical systems is a basic and powerful approach for enhancing our understanding of their dynamical properties. If we can appropriately evaluate the distance between graphs or networks, this distance can be used as a fundamental and effective tool to characterise differences in the dynamical properties of two different coupled dynamical systems, because couplings or network structures affect collective behaviour of the coupled dynamical systems including convergence of consensus dynamics1,2,3, controllability of coupled linear systems4,5,6,7,8, and basin stability of coupled, especially nonlinear, dynamical systems9,10.
Although several definitions of the distance between networks have been proposed11, they all simply focus on the differences in the number of nodes and links. For example, the Hamming distance is the sum of the simple differences between the adjacency matrices of two graphs11, and the graph edit distance is the minimum cost for transforming one graph into another by deletions and insertions of nodes and links12. These distances simply focus on the differences in the number of nodes and links, and lack the perspective of both dynamical and structural properties of complex networks. Here, we propose a graph distance for complex networks that reflects both their structural and dynamical properties, focusing on the Laplacian matrix that represents the coupling topology. We define a graph distance between networks through comparing their Laplacian matrices. Because the Laplacian matrix contains significant information about the structural and dynamical properties of networks, the Laplacian-based graph distance directly reflects the differences in both the structural and dynamical properties characterised by the Laplacian matrix. We show that the Laplacian-based graph distance is effectively applied to mathematical models and several real networks, even in a situation where their sizes differ from one another. We further apply the proposed distance to a real temporal network whose structure changes with time13,14. Our results indicate that the distance for complex networks is a key concept for understanding the dynamical properties of temporal networks.
Graph Distance for Complex Networks
Let A be an N × N adjacency matrix of a given undirected network in which Aij = 1 if the node i connects to the node j and Aij = 0 otherwise. The Laplacian matrix associated with the adjacency matrix A is defined by
where K is the diagonal matrix, whose ith diagonal element is the degree of the ith node ki. The Laplacian matrix appears in several contexts of network science, such as detecting community structures in the networks15,16, consensus dynamics1, and describing diffusive coupling of nonlinear dynamical systems9,10. For example, a network consisting of connected N identical dynamical systems is governed by
where xi is the multidimensional state vector of the ith dynamical system, ε is the coupling strength, F determines the intrinsic dynamics of each isolated uncoupled dynamical system, Jij is a coupling coefficient which satisfies , and H is the coupling function that defines the dynamics of the links. Equation (2) covers several coupling forms and indicates that the Laplacian matrix is crucial for estimating the stability of the various coupled dynamical systems9. In this paper, even though we mainly focus on the situation where the coupling matrix J = {Jij} is just the Laplacian matrix L, our approach can be naturally extended to other types of the coupling matrix J.
On the other hand, the detection problem of communities in networks is formulated as one of discrete optimisation problems. A well-known technique for detecting communities is spectral partitioning based on the direct relation between the structural properties of the network and the spectrum of the Laplacian matrix16. In this sense, the Laplacian matrix reflects both structural and dynamical properties of the networked dynamical systems.
In our approach utilising the spectrum of the Laplacian matrix, we propose a graph distance between two networks by calculating the distance between the spectra of the corresponding Laplacian matrices. Let G(i) be the ith network with N(i) nodes and L(i) be its Laplacian matrix. The Laplacian matrix L(i) is rewritten by , where Λ(i) is the diagonal matrix , is the rth eigenvalue , and is the eigenvector corresponding to the rth eigenvalue of G(i). The Laplacian matrix is a semi-definite matrix, whose smallest eigenvalue, λ1, is zero and the eigenvector corresponding to λ1 is 1 which is a column vector with N ones. Let be the distribution of the elements in the rth eigenvector of the ith network. Given two networks G(i) and G(j), calculating the distance between pairs of the distributions and , we define the distance between the two networks as
where is the distance between and , and 1 < Mij ≤ min(N(i), N(j)). Throughout this paper, Mij = min(N(i), N(j)). To calculate , we introduce the Kruglov distance with the absolute value function from the viewpoint of computational convenience (see Methods and Supplementary Information). On the right-hand side of Eq. (3), is not included in the summation because both eigenvectors corresponding to zero eigenvalues are , and do not give information concerning differences between two networks. Figure 1a shows the schematic diagram of Eq. (3). We call the distance defined by Eq. (3) the spectral graph distance (See Supplementary Information for detail of properties of the spectral graph distance).
Results and Discussion
Classifying network structures
To evaluate how the spectral graph distance performs, we first apply our method to the Watts–Strogatz (WS) model, which can generate small-world and random networks by rewiring edges in the initial ring lattices with a probability p17. If p = 0, the WS model generates ring lattices, but if p = 1, the WS model generates random networks. In the experiments, a network G(po) is generated from the WS model with a fixed rewiring probability, po, in advance. We then calculate distances between G(po) and networks G(p), which are generated from the WS model with several values of the rewiring probability, p. If the proposed distance appropriately evaluates the distance between G(po) and G(p), d(G(po), G(p)) should take a minimal value when p = po.
Figure 1b shows the results of the distances between G(po) and G(p). Here, the spectral graph distance takes its minimum value when po ≈ p. On the other hand, the Hamming distance does not take its minimum value even when the value of p is close to that of po. This result clearly shows that our distance effectively evaluates the difference between the two networks, whereas a simple distance focusing on the number of nodes and links cannot appropriately reflect the differences between the networks’ structural properties.
Next, we apply our method to real networks. Figure 2a shows the distance matrix in which the real networks as well as mathematical models are listed on the vertical and horizontal axes, and colours show their inter-network distances. In Fig. 2b, each network is arranged in the two-dimensional Euclidean space such that their spectral graph distances are preserved as much as possible using multidimensional scaling (see for example ref. 18 and Supplementary Information). The hierarchical tree described on the right-hand side of Fig. 2a is obtained by the classical hierarchical clustering method (see also Methods).
As Fig. 2a,b shows, the real networks are mainly classified on the basis of their spectral graph distances into two groups: one consists of biological and technological networks, and the other consists mainly of social networks, including the dolphin social network, the network of words and the network of football games. These results agree with the previous research19, which discussed why the social networks differ from other types of real networks. On the other hand, the one-dimensional lattice is relatively far from other networks because its structure is regular and different from those of many real biological, technological and social networks.
Analyzing temporal properties in face-to-face human interactions
We next apply our method to temporal networks. One example is a contact network in which face-to-face interactions between human individuals are recorded using radio frequency identifiers20,21. The temporal network is described as a set of networks , where G(t) is a set of nodes and links that are observed within a certain period of time from t × Δt to (t + 1) × Δt. The index, t, of G(t) corresponds to the discrete temporal index. In real contact networks, individuals are likely to have many contacts in the daytime, but few during the night. If we simply focus on temporal changes in the number of links and nodes, these temporal networks seemingly have a one-day cycle (see also Supplementary Information). However, even if the temporal changes in the number of nodes and links are periodic, this does not directly indicate that network structures of the temporal networks also change periodically with time. This periodicity in the structures of the temporal networks is considerably important, especially when we evaluate the dynamical properties of temporal networks, because the structural properties characterised by the Laplacian matrix directly affect the dynamical properties2,14,22.
If the temporal change in the structures of the contact networks is periodic, there exists a non-zero positive constant τ* such that the network structure of G(t) is equivalent to that of for all values of t. We distinguish this periodicity in the structures of the temporal networks from the periodicity simply observed from the temporal changes in the number of nodes and links, and then we call the former structural periodicity. The final purpose here is to evaluate the structural periodicity of these real temporal networks.
The spectral graph distance enables us to determine whether the temporal networks have structural periodicity, by evaluating temporal differences between the network, G(t), and its m-nearest neighbours for all values of t as follows:
where t ≠ t′ and is a set of m-nearest neighbours of G(t) given by the spectral graph distance. If the contact network exhibits structural periodic behaviour, the values of τ(t, t′) should be cτ* (c = 1, 2,…), where τ* is its period, but if the contact network exhibits random behaviour, τ(t, t′) takes several values. We simply describe τ(t, t′) as τ hereafter.
In the experiments, we apply the proposed method to three types of temporal networks of contacts between individuals observed in a hospital, a high school and a science gallery20,21,23. Each contact network is divided into T smaller networks every 60 min, and the number of networks, T, in the dataset of the hospital data is 97, that of the high school is 202, and that of the science gallery is 1,929. Evaluating the value of Eq. (4) for the networks G(t) (t = 1, …, T), we first obtain the probability distribution of the value of τ, P(τ). Next, we calculate the probability distribution, P(τr), of the temporal difference, τr, between G(t) and , which is randomly selected from . In the real contact networks, due to few contacts in the night time, the contact networks observed in the daytime are likely to be selected, even in the case of random selections, and thereby, the occurrence frequency of τ is biased. To avoid such a bias, we calculate the Z score (P(τ) − 〈P(τr)〉)/σ(τr), where 〈P(τr)〉 is the expected value of P(τr) and σ(τr) is its standard deviation.
From Fig. 3, we find peaks at τ = 24, 48 and 72 in the results of the contact networks observed in the hospital and the high school. This result indicates that these contact networks exhibit recurring 24-h patterns. These results agree with intuition; most nodes in the contact network observed in the hospital are staff members including doctors and nurses who work there and patients who are admitted. Many of them participate in the temporal network every day. In the ward, the staff members work according to their fixed schedule and the temporal networks observed in the hospital are likely to exhibit periodicity. In the case of the high school, the students also behave according to a certain fixed schedule. In such cases, the network structure of G(t) is likely to be similar to that of , where τ*(> 0) is the period of the contact network. The spectral graph distance effectively elucidates the structural periodicity in the contact networks. On the other hand, we also find the peaks at τ = 24, 48, 72, 168, and 192 in the results of the science gallery to be against intuition; most nodes in the contact networks observed in the science gallery are visitors and their network structures seem to be randomly varied every day. However, our results reveal that even if the visitors change day by day, the temporal networks exhibit recurring patterns.
In summary, we have proposed the spectral graph distance that can evaluate the distance between complex networks on the basis of their spectra of the Laplacian matrices. The spectral graph distance was successfully applied to the classification of several networks and to the analysis of the contact networks, and has elucidated the temporal property underlying real contact networks, namely that which we call structural periodicity. In particular, revealing temporal properties underlying temporal networks would involve several applications, such as time series analysis of networks24,25, prediction and prevention of the spread of infectious diseases on temporal networks26,27,28, and elucidation of the relationship between functions and network structures in many real systems29,30. The spectral graph distance could be quite an important tool for answering these open questions, and carrying out more in-depth discussion on the proper distance between complex networks would give us a novel perspective on the dynamical properties of temporal networks. In addition to these applications, the notion of the spectral distance between networks is important and useful when we assess the properties of networks, for example, quantum networks31,32. The application of our distance to the quantum networks is one of the most important future works.
Methods
Distance between distributions
In Eq. (3), we need to calculate the distances between distributions and , which are the distributions of elements in the rth eigenvectors of the Laplacian matrices corresponding to the ith and jth networks, G(i) and G(j). Let be a cumulative distribution function of the elements in the rth eigenvector, , of the Laplacian matrix of the ith network. The cumulative distribution function is calculated by , where H(y) is the step function in which H(y) is unity if y ≥ 0, and zero otherwise. Although there are several choices for the distance between distributions, we use the following:
To numerically estimate the distribution function , one needs to determine a suitable bin size, but this selection is complicated because of a wide variety of distributions of elements in the eigenvectors obtained from complex networks. However, the cumulative distribution function, , is directly calculated from the elements of the eigenvector without any parameter. In our method, the elements in the eigenvector are normalised such that their minimum and maximum values are zero and unity. Transforming each eigenvector into the distribution function of its elements enables us to compare two eigenvectors of different sizes.
Hierarchical clustering
We used a classical hierarchical clustering to classify several complex networks. We first calculated the spectral graph distances between the networks. In the initial step, the individual networks are considered as clusters, that is, each cluster includes only one network. Next, using the obtained spectral graph distances, we merged the two closest clusters until only one cluster remained on the basis of the classical Ward’s method33. This hierarchical clustering is visualised by using a tree-like diagram.
We calculated the spectral graph distance between real networks and then applied the classical hierarchical clustering. Here, the real networks are the adjacency network of adjectives and nouns in the David Copperfield (Word)16, a social network of 62 dolphins (Dolphin)34, human relations between 34 members of the Zachary karate club (Karate)35, a co-appearance network between 77 characters in the famous novel Les Miserables (LesMis)36, the neural network of C. elegans (NeuralNet)17, the power grid (Powergrid)17, the C. elegans metabolic network (Metabolic)37, the Internet (Internet)38, the e-mail network (E-mail)39, the network of users of the Pretty-Good-Privacy algorithm (PGP)40, the protein interaction network of yeast (Yeast)41, and the airline network (Airline)42,43. We also used the networks generated from the WS model17 and the Barabási–Albert (BA) model44. In the WS model, the networks are generated with the rewiring probabilities p = 0 (1-Dim. Lattice), p = 0.05 (WS(p = 0.05)) and p = 1 (WS(p = 1)), and the initial one-dimensional lattice has N = 5,000 nodes, with the degree of each node being 10. In the BA model, a single new node with links is repeatedly added to the initial complete graph consisting of nodes. The network grows until the number of nodes reaches 5,000, and is set to three (BA). The texts in parentheses in the above list of real networks correspond to the labels of real networks in Fig. 2.
Additional Information
How to cite this article: Shimada, Y. et al. Graph distance for complex networks. Sci. Rep. 6, 34944; doi: 10.1038/srep34944 (2016).
Supplementary Material
Acknowledgments
This research is supported by the Aihara Innovative Mathematical Modelling Project, the Japan Society for the Promotion of Science (JSPS) through the “Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program),” initiated by the Council for Science and Technology Policy (CSTP). The research of Y.S. is supported by Grant-in-Aid for Research Activity Start-up (No. 26880020) and Grant-in-Aid for Young Scientists B (No. 16K16126) from JSPS. The research of T.I. is partially supported by a Grant-in-Aid for Challenging Exploratory Research (No. 24650116) from JSPS. The research of T.I. and Y.S. is also supported by a Grant-in-Aid for Scientific Research (C) (Generative Research Fields) (No.15KT0112) from JSPS. The research of K.A. is also supported by CREST, JST and JSPS KAKENHI 15H05707.
Footnotes
Author Contributions Y.S. conceived the study, performed the numerical simulations and prepared the manuscript. All authors discussed the results, drew conclusions and edited the manuscript.
References
- Olfati-Saber R., Fax J. A. & Murray R. M. Consensus and cooperation in networked multi-agent systems. P. IEEE 95, 215–233 (2007). [Google Scholar]
- Baronchelli A. & Daz-Guilera A. Consensus in networks of mobile communicating agents. Phys. Rev. E 85, 016113 (2012). [DOI] [PubMed] [Google Scholar]
- Estrada E. & Vargas-Estrada E. How peer pressure shapes consensus, leadership, and innovations in social groups. Sci. Rep. 3, 2905 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu Y.-Y., Slotine J.-J. & Barabási A.-L. Controllability of complex networks. Nature 473, 167–173 (2011). [DOI] [PubMed] [Google Scholar]
- Galbiati M., Delpini D. & Battiston S. The power to control. Nat. Phys. 9, 126–128 (2013). [Google Scholar]
- Sun J. & Motter A. E. Controllability transition and nonlocality in network control. Phys. Rev. Lett. 110, 208701 (2013). [DOI] [PubMed] [Google Scholar]
- Jia T. et al. Emergence of bimodality in controlling complex networks. Nat. Commun. 4, 2002 (2013). [DOI] [PubMed] [Google Scholar]
- Menichetti G., Dall’Asta L. & Bianconi G. Network controllability is determined by the density of low in-degree and out-degree nodes. Phys. Rev. Lett. 113, 078701 (2014). [DOI] [PubMed] [Google Scholar]
- Pecora L. M. & Carroll T. L. Master stability functions for synchronized coupled systems. Phys. Rev. Lett. 80, 2109–2112 (1998). [Google Scholar]
- Menck P. J., Heitzig J., Marwan N. & Kurths J. How basin stability complements the linear-stability paradigm. Nat. Phys. 9, 89–92 (2013). [Google Scholar]
- Deza M. M. & Deza E. Encyclopedia of distances 2nd ed. (Springer-Verlag, Berlin Heidelberg, 2013). [Google Scholar]
- Gao X., Xiao B., Tao D. & Li X. A survey of graph edit distance. Pattern Anal. Appl. 13, 113–129 (2010). [Google Scholar]
- Holme P. & Saramäki J. Temporal networks. Phys. Rep. 519, 97–125 (2012). [Google Scholar]
- Masuda N., Klemm K. & Eguluz V. M. Temporal networks: slowing down diffusion by long lasting interactions. Phys. Rev. Lett. 111, 188701 (2013). [DOI] [PubMed] [Google Scholar]
- Fortunato S. Community detection in graphs. Phys. Rep. 486, 75–174 (2010). [Google Scholar]
- Newman M. E. J. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104 (2006). [DOI] [PubMed] [Google Scholar]
- Watts D. J. & Strogatz S. H. Collective dynamics of ‘small-world’networks. Nature 393, 440–442 (1998). [DOI] [PubMed] [Google Scholar]
- Cox T. & Cox M. Multidimensional scaling (Chapman & Hall/CRC, Boca Raton, 2001). [Google Scholar]
- Newman M. & Park J. Why social networks are different from other types of networks. Phys. Rev. E 68, 036122 (2003). [DOI] [PubMed] [Google Scholar]
- Isella L. et al. What’s in a crowd? Analysis of face-to-face behavioral networks. J. Theor. Biol. 271, 166–180 (2011). [DOI] [PubMed] [Google Scholar]
- Vanhems P. et al. Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PLoS ONE 8, e73970 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fujiwara N., Kurths J. & Daz-Guilera A. Synchronization in networks of mobile oscillators. Phys. Rev. E 83, 025101 (2011). [DOI] [PubMed] [Google Scholar]
- Fournet J. & Barrat A. Contact patterns among high school students. PLoS ONE 9, e107878 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shimada Y., Ikeguchi T. & Shigehara T. From networks to time series. Phys. Rev. Lett. 109, 158701 (2012). [DOI] [PubMed] [Google Scholar]
- Weng T., Zhao Y., Small M. & Huang D. D. Time-series analysis of networks: Exploring the structure with random walks. Phys. Rev. E 90, 022804 (2014). [DOI] [PubMed] [Google Scholar]
- Holme P. Epidemiologically optimal static networks from temporal network data. PLoS Comput. Biol. 9, e1003142 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Holme P. Information content of contact-pattern representations and predictability of epidemic outbreaks. Sci. Rep. 5, 14462 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pastor-Satorras R., Castellano C., Van Mieghem P. & Vespignani A. Epidemic processes in complex networks. Rev. Mod. Phys. 87, 925–979 (2015). [Google Scholar]
- Iwayama K. et al. Characterizing global evolutions of complex systems via intermediate network representations. Sci. Rep. 2, 423 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sporns O. Networks of the brain (MIT Press, Cambridge, MA, 2010). [Google Scholar]
- Paparo G. D. & Martin-Delgado M. A. Google in a Quantum Network. Sci. Rep. 2, 444 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Paparo G. D., Müller M., Comellas F. & Martin-Delgado M. A. Quantum Google in a Complex Network. Sci. Rep. 3, 2773 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tan P.-N., Steinbach M. & Kumar V. Introduction to data mining (Addison-Wesley, 2006). [Google Scholar]
- Girvan M. & Newman M. E. Community structure in social and biological networks. P. Natl. Acad. Sci. USA 99, 7821–7826 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zachary W. W. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 452–473 (1977). [Google Scholar]
- Knuth D. E. The stanford graphBase: A platform for combinatorial computing (Addison-Wesley, 1993). [Google Scholar]
- Duch J. & Arenas A. Community detection in complex networks using extremal optimization. Phys. Rev. E 72, 027104 (2005). [DOI] [PubMed] [Google Scholar]
- Newman M. E. J. Network data. http://www-personal.umich.edu/mejn/netdata/ (Date of access: 15th October 2013). [Google Scholar]
- Guimerà R., Danon L., Daz-Guilera A., Giralt F. & Arenas A. Self-similar community structure in a network of human interactions. Phys. Rev. E 68, 065103(R) (2003). [DOI] [PubMed] [Google Scholar]
- Boguná M., Pastor-Satorras R., Daz-Guilera A. & Arenas A. Models of social networks based on social distance attachment. Phys. Rev. E 70, 056122 (2004). [DOI] [PubMed] [Google Scholar]
- Jeong H., Mason S. P., Barabasi A. L. & Oltvai Z. N. Lethality and centrality in protein networks. Nature 411, 41–42 (2001). [DOI] [PubMed] [Google Scholar]
- Handcock M. S., Hunter D. R., Butts C. T., Goodreau S. M. & Morris M. Statnet: software tools for the statistical modeling of network data. Seattle, WA. http://statnetproject.org. (Date of access: 11th March 2013) (2003). [Google Scholar]
- Batagelj V. & Mrvar A. Pajek datasets http://vlado.fmf.uni-lj.si/pub/networks/data/. (Date of access: 15th October 2013) (2006). [Google Scholar]
- Barabási A. & Albert R. Emergence of scaling in random networks. Science 286, 509–512 (1999). [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.