Scientific Reports. 2015 Jun 11;5:10350. doi: 10.1038/srep10350

Measuring multiple evolution mechanisms of complex networks

Qian-Ming Zhang 1,2,4, Xiao-Ke Xu 3,a, Yu-Xiao Zhu 1,4, Tao Zhou 1,4,b
PMCID: PMC4464182  PMID: 26065382

Abstract

Numerous concise models, such as preferential attachment, have been put forward to reveal the evolution mechanisms of real-world networks. These studies show that real-world networks are usually driven jointly by a hybrid of multiple mechanisms rather than by a single pure mechanism. To simulate real networks accurately, some researchers have proposed hybrid models that mix multiple evolution mechanisms. Nevertheless, how the mechanisms in such a hybrid jointly influence network evolution is not yet clear. In this study, we introduce two methods (link prediction and likelihood analysis) to measure multiple evolution mechanisms of complex networks. Through extensive experiments on artificial networks, which can be controlled to follow multiple mechanisms with different weights, we find that the method based on likelihood analysis performs much better and gives very accurate estimates. Finally, we apply this method to real-world networks from different domains (including technology networks and social networks) and different countries (e.g., USA and China) to see how popularity and clustering co-evolve. We find that most of them are affected by both popularity and clustering, but with quite different weights.


Many social and technological networks evolve over time after they are established. Previous studies have revealed that real networks possess many different structural features, such as various degree distributions1, different levels of clustering2, the presence or absence of communities3, assortative or disassortative mixing patterns4, long or short average shortest distances, and so on, which have attracted much attention to building models that mimic network evolution5,6. Meanwhile, many latent mechanisms have been proposed, such as rich-get-richer7, good-get-richer8, stability constraints9, homophily10 and clustering11. However, a single pure mechanism is usually insufficient to depict real-world networks precisely, because of these different structural features. Therefore, researchers have mixed different mechanisms to obtain better simulations, such as the mixture of clustering and preferential attachment11,12, popularity and randomness13, popularity and similarity14, topological distance and geographical distance15, and so on. In all, networks are likely to be driven by multiple mechanisms, which raises a question: is it possible to measure the contribution of each mechanism to network evolution?

A rudimentary way to evaluate a network model or its underlying mechanism is to compare selected structural features. A model is considered better than another if the network it generates is closer to the target network in terms of those features. But such a method cannot be well validated, since there is no fair standard for selecting representative features from countless structural features. Without considering any specific structural feature, we previously proposed a method based on likelihood analysis to evaluate network models fairly16. In that method, we calculate the likelihood of each newly created link according to the model’s mechanism, and then multiply these values together to get the likelihood of the set of new links. For a group of models, the one giving the highest likelihood is considered the most suitable. This method is inspired by the link prediction approach, which aims at estimating the likelihood of the existence of a link based on the observed links17. By this definition, if the principle of a link prediction algorithm is consistent with the mechanism of a given network, the algorithm should provide accurate predictions. Therefore, one can also evaluate latent mechanisms according to the prediction results of the corresponding link prediction algorithms18,19. In this paper, we consider both likelihood analysis and link prediction because neither depends on any specific structural feature. To our knowledge, these methods have only been applied to judge which of a series of mechanisms is better, and have never been applied to measure the contributions of multiple mechanisms to network evolution.

The core idea of the above methods is to estimate the likelihood that links appear, which inspires us to measure the contributions of multiple mechanisms by calculating the likelihood using all the mechanisms simultaneously. We therefore design a formula that re-calculates the likelihood of every link by assigning each mechanism a tunable weight. The optimal group of weights is the one that maximizes the likelihood of all links (likelihood analysis method) or the prediction accuracy (link prediction method). To test the effectiveness of the methods, we produce numerous model networks that can be controlled to follow multiple mechanisms, such as popularity, clustering and randomness, with different weights. By comparing the estimated contributions with the known weights, we find that both methods are effective in judging which mechanism is stronger. In particular, the one based on likelihood analysis gives very accurate estimates. Further, we discuss the advantages of the likelihood analysis method and the disadvantages of the link prediction method that lead to its worse performance. Finally, we apply the likelihood analysis method to different kinds of real-world networks to see how popularity and clustering co-evolve in real complex networks. These networks are collected from different domains, including technology networks and social networks, and from different countries, e.g., USA and China. The results show that most of these networks evolve under both mechanisms but with quite different weights.

The main contributions are twofold. On the theoretical side, we show that the multiple mechanisms of complex systems can be measured quantitatively, and we provide a unified, efficient and extensible measurement method. On the empirical side, we find some interesting properties of real-life networks. For example, the clustering mechanism is present in all the social networks studied, while on platforms designed mainly for social activities (Facebook and Flickr) the clustering effect is much stronger than on platforms where the primary demand of users is not social interaction, such as watching videos on Youtube and reading blogs on ScienceNet. In addition, we show that the evolving mechanisms may change remarkably over time for some real networks (e.g., the Internet): links associated with new nodes are created for different reasons than links between old nodes. This is usually ignored in known models, but agrees with some experimental studies on the Internet20,21.

Results

Measurement methods

Consider two snapshots of an evolving undirected network at times $t_1$ and $t_2$ ($t_1 < t_2$), denoted by $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ respectively, where $V_1$ ($V_2$) and $E_1$ ($E_2$) are the sets of nodes and links. The set of new links is $E^{new} = E_2 \setminus E_1$. In the following we first introduce two previous methods of evaluating underlying mechanisms in network evolution, and then present how we measure the contributions of multiple mechanisms.

One method is based on likelihood analysis16, whose key idea is to estimate the likelihood of each new link by multiplying the probabilities of selecting its two endpoints. For example, if the links are all randomly created, the likelihood of each link Inline graphic can be calculated by Inline graphic where Inline graphic is the number of nodes of the network. Then, we can get the likelihood of all the new links according to Inline graphic. For a group of models, we can calculate Inline graphic for each of them, and the one with the highest likelihood Inline graphic is considered the most suitable.

The other method is based on link prediction18,19. A link prediction index assigns a score, following a certain principle, to each non-observed link, including new links Inline graphic and nonexistent links Inline graphic (Inline graphic, where Inline graphic is the universal set containing all Inline graphic links). We can then rank these links in descending order of score. A link prediction index is good if it ranks the new links higher than the nonexistent links. To quantify this, we introduce the AUC value (area under the receiver operating characteristic curve17,27), which is discussed in detail in Materials and Methods. We then assume that a mechanism is more suitable to depict the network evolution if the corresponding link prediction algorithm results in a higher AUC.

As described above, the key point of both methods is to estimate the likelihoods of links. We are motivated to re-estimate the likelihood by considering all the mechanisms together, with tunable parameters (which must sum to 1) indicating their contributions. According to probability theory, we define the likelihood of link Inline graphic as the expectation of the likelihoods over all the mechanisms, written as

graphic file with name srep10350-m23.jpg

where Inline graphic is the number of considered mechanisms. Thus, for the method based on likelihood analysis, we expect the group of parameters that maximizes Inline graphic to indicate the contribution of each mechanism. Similarly, for the method based on link prediction, the group of parameters that maximizes the prediction accuracy (AUC) indicates the contribution of each mechanism.
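
In symbols, the definition just described can be sketched as follows. The notation is assumed for illustration and is not necessarily the authors': $l_{xy}$ is the likelihood of link $(x, y)$, $l^{(m)}_{xy}$ its likelihood under mechanism $m$, $\lambda_m$ the weight of mechanism $m$, and $M$ the number of mechanisms.

```latex
% Weighted expectation over mechanisms (assumed notation)
l_{xy} = \sum_{m=1}^{M} \lambda_m \, l^{(m)}_{xy},
\qquad \sum_{m=1}^{M} \lambda_m = 1, \quad \lambda_m \ge 0 .

% Likelihood of all new links, and the weights that maximize it
L = \prod_{(x,y) \in \text{new links}} l_{xy},
\qquad
\hat{\lambda} = \arg\max_{\lambda} \sum_{(x,y) \in \text{new links}} \log l_{xy} .
```

Maximizing the sum of logarithms is equivalent to maximizing the product and avoids numerical underflow when there are many new links.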

Comparisons between the two methods

To examine the effectiveness of the measurement methods, we apply them to model networks whose evolution can be controlled. Two well-known mechanisms, popularity and clustering, are considered first. Popularity means that nodes with higher degree are more attractive, while clustering means that links which form more triangles are preferred. The model network starts from a loop of five nodes. It grows following two rules at each step:

  1. add one new node with one new link which connects this new node to one old node;

  2. add Inline graphic links, but self-loops and multi-links are not accepted.

Every new link is created following either popularity mechanism or clustering mechanism, which is controlled by a tunable parameter Inline graphic ranging from 0 to 1. Inline graphic means all the links are created following popularity mechanism, while Inline graphic means all the links are created following clustering mechanism.

To implement the popularity mechanism, we choose preferential attachment, as described by Barabási and Albert7. They defined the probability of selecting node Inline graphic for new links as Inline graphic. Similarly, for the clustering mechanism we use the number of common neighbors to measure the likelihood of creating a link between Inline graphic and Inline graphic. In detail, we first select a node Inline graphic for the new link, and then select the other node preferentially according to the probability Inline graphic, where Inline graphic is the set of neighbors of Inline graphic. Node Inline graphic is selected randomly, to differ from the popularity mechanism. Notice that the new link added together with the new node at each step cannot be created by the clustering mechanism, because the new node has no neighbors yet. So we randomly select an old node to form this new link, to differ from preferential attachment. By tuning Inline graphic from 0 to 1 in steps of 0.1, we produce 100 model networks for every Inline graphic. The question then reduces to estimating the value Inline graphic for each model network through equation (1).
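
The growth rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `grow_network`, `extra_links` and `p_cluster` are hypothetical. The number of extra links per step is left as a parameter. Both endpoints of a popularity link are drawn in proportion to degree, and the new node's link is drawn uniformly at random when the clustering branch is chosen; both choices are one reading of the text.

```python
import random

def grow_network(steps, extra_links, p_cluster, seed=0):
    """Grow a toy network mixing popularity and clustering (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from a loop (ring) of five nodes, stored as adjacency sets.
    adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

    def connect(u, v):
        adj[u].add(v)
        adj[v].add(u)

    def by_degree(nodes):
        # Preferential attachment: pick a node with probability proportional to degree.
        return rng.choices(nodes, weights=[len(adj[n]) for n in nodes])[0]

    for _ in range(steps):
        # Rule 1: add one new node with one link to an old node.
        old = list(adj)
        new = len(adj)
        if rng.random() < p_cluster:
            target = rng.choice(old)   # clustering cannot apply: no neighbors yet
        else:
            target = by_degree(old)
        adj[new] = set()
        connect(new, target)

        # Rule 2: add extra links, rejecting self-loops and multi-links.
        nodes = list(adj)
        added = tries = 0
        while added < extra_links and tries < 1000:
            tries += 1
            if rng.random() < p_cluster:
                # Clustering: random first node, second node weighted by common neighbors.
                u = rng.choice(nodes)
                cands = [v for v in nodes if v != u and v not in adj[u]]
                weights = [len(adj[u] & adj[v]) for v in cands]
                if sum(weights) == 0:
                    continue
                v = rng.choices(cands, weights=weights)[0]
            else:
                # Popularity: both endpoints chosen proportionally to degree.
                u, v = by_degree(nodes), by_degree(nodes)
                if u == v or v in adj[u]:
                    continue
            connect(u, v)
            added += 1
    return adj
```

Calling `grow_network(1000, 3, 0.3)` would produce one model network in which roughly 30% of link-creation events follow the clustering branch; repeating with different seeds gives the ensemble over which estimates are averaged.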

Link prediction method

Corresponding to the popularity mechanism, there is a link prediction index named the Preferential Attachment (PA) index, defined as the product of the degrees of the two nodes, written as Inline graphic7,17,22. The Common Neighbor (CN) index22 accords with the clustering mechanism and is written as Inline graphic. Notice that many node pairs have the same number of common neighbors, or none, which leads to indistinguishable Inline graphic and the degeneracy of states23. To tackle this problem while keeping the predictive power of the CN index unchanged, we add a small random number Inline graphic to every Inline graphic, rewritten as Inline graphic. Because Inline graphic is much larger than Inline graphic, we must normalize Inline graphic and Inline graphic when we combine them. Otherwise Inline graphic will have no effect unless it is strengthened. Thus we define the hybrid index as

graphic file with name srep10350-m53.jpg

where Inline graphic and Inline graphic are the values normalized by the means Inline graphic and Inline graphic respectively. In detail, Inline graphic, and Inline graphic, where Inline graphic is the mean value of Inline graphic. By tuning Inline graphic from 0 to 1, we can easily find the optimal Inline graphic that maximizes the prediction accuracy (AUC). Note that the CN index cannot work if either endpoint of a new link appears after Inline graphic. So we remove all new links with such nodes when implementing the link prediction method. For consistency, such new links are also ignored when applying the likelihood analysis method.
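
A hedged sketch of such a hybrid index is given below. It assumes the convex form λ·CN + (1 − λ)·PA on mean-normalized scores, with λ = 1 meaning pure CN (as in the Fig. 3 caption), and it takes the means over all unconnected node pairs. The function name `hybrid_scores` is hypothetical, and the small random tie-breaking term added to the CN index is omitted for determinism.

```python
from itertools import combinations

def hybrid_scores(adj, lam):
    """Score every unconnected pair with a CN/PA mix; lam = 1 is pure CN (assumed form)."""
    pairs = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    # Common Neighbor index: number of shared neighbors.
    cn = {p: len(adj[p[0]] & adj[p[1]]) for p in pairs}
    # Preferential Attachment index: product of the two degrees.
    pa = {p: len(adj[p[0]]) * len(adj[p[1]]) for p in pairs}
    # Normalize each index by its mean so that neither dominates (guard against zero mean).
    mean_cn = sum(cn.values()) / len(pairs) or 1.0
    mean_pa = sum(pa.values()) / len(pairs) or 1.0
    return {p: lam * cn[p] / mean_cn + (1 - lam) * pa[p] / mean_pa for p in pairs}
```

Scanning `lam` over a grid and keeping the value with the highest AUC gives the link prediction estimate of the clustering contribution.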

Likelihood analysis method

This method16 defines the likelihood of a link Inline graphic as the multiplication of the likelihoods of selecting node Inline graphic and Inline graphic. Thus, Inline graphic can be easily defined as Inline graphic, and Inline graphic can be defined as Inline graphic. Then the likelihood of Inline graphic has the format

graphic file with name srep10350-m73.jpg

This model aims to maximize the likelihood of all the new links, written as

graphic file with name srep10350-m74.jpg

Thus, we can also obtain the optimal Inline graphic which maximizes Inline graphic. Notice that if Inline graphic, Inline graphic will be meaningless. Please see the solution in Materials and Methods, where we also define Inline graphic if we consider new links without the limitation of new nodes.
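
The optimization can be illustrated with a simple grid search. The sketch below rests on assumptions that should be read as placeholders rather than the paper's formulas: the PA likelihood of a link is the product of the two degree-proportional selection probabilities; the CN likelihood picks the first endpoint uniformly and the second in proportion to common neighbors, summed over both orders; all likelihoods are computed on one fixed snapshot; and a small `floor` stands in for the treatment of zero likelihoods described in Materials and Methods. The names `link_likelihoods` and `best_lambda` are hypothetical.

```python
import math

def link_likelihoods(adj, x, y):
    """Return (l_cn, l_pa) for a candidate link (x, y) under the assumed forms."""
    n = len(adj)
    total_deg = sum(len(nb) for nb in adj.values())
    # PA: each endpoint is selected with probability degree / total degree.
    l_pa = (len(adj[x]) / total_deg) * (len(adj[y]) / total_deg)

    def pick_second(a, b):
        # Probability of choosing b after a, proportional to common neighbors.
        cn_ab = len(adj[a] & adj[b])
        norm = sum(len(adj[a] & adj[z]) for z in adj if z != a)
        return cn_ab / norm if norm else 0.0

    # CN: first endpoint uniform (1/n), second by common neighbors, either order.
    l_cn = (pick_second(x, y) + pick_second(y, x)) / n
    return l_cn, l_pa

def best_lambda(adj, new_links, grid=101, floor=1e-12):
    """Grid-search the weight lam in [0, 1] that maximizes the log-likelihood."""
    parts = [link_likelihoods(adj, x, y) for x, y in new_links]

    def loglik(lam):
        return sum(math.log(max(lam * c + (1 - lam) * p, floor)) for c, p in parts)

    lams = [i / (grid - 1) for i in range(grid)]
    return max(lams, key=loglik)
```

Because the mixed log-likelihood is concave in the weight, a one-dimensional grid is sufficient here; a standard bounded optimizer could replace it for finer resolution.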

In Fig. 1, we present the trends of the AUC values (subfigures (a) and (b)) and Inline graphic (subfigures (c)-(h)) with increasing Inline graphic. The contributions of the popularity mechanism and the clustering mechanism can be estimated from the peak values. We can see that the optimal Inline graphic obtained from both methods increases as Inline graphic grows. For intuitive observation, we plot the relation between Inline graphic and the optimal Inline graphic in Fig. 2(a). The likelihood analysis method gives very accurate estimates, while the link prediction method fails when Inline graphic is large. The reasons for this failure are threefold: (i) the CN mechanism embodies the principle of preferential attachment to some extent; (ii) the link prediction method describes individual links too roughly; (iii) the link prediction model is not appropriate for measuring the mechanisms’ contributions.

Figure 1.


Measuring popularity and clustering with the link prediction method and the likelihood analysis method. The contributions are estimated from the peak values. Subfigures (a) and (b) present the average AUC values from the link prediction method, obtained by averaging 100 implementations over 100 model networks. The others present the values of Inline graphic from the likelihood analysis method, where each curve corresponds to one model network. Inline graphic corresponds to the coefficient in equation (1). Inline graphic denotes the contribution of the clustering mechanism in the model networks. Because the likelihoods for the networks are not of the same order of magnitude, we use 12xxx instead of the exact values; 12xxx means an uncertain value above 11999 and below 13000.

Figure 2.


Correlation between the optimal Inline graphic and Inline graphic. Inline graphic is the known proportion of clustering mechanism compared to popularity mechanism. Inline graphic is the estimated value by the measurement method in this paper. Subfigure (a) represents the comparison between link prediction method and likelihood analysis method, where no new links with new nodes are considered. Subfigure (b) only shows the results of likelihood analysis method without the limitation of new nodes.

First, the CN mechanism embodies the principle of preferential attachment, because two nodes with large degrees have a higher chance of having common neighbors. The PA mechanism, however, never considers the number of common neighbors shared by a node pair. When Inline graphic is small, few new links are constrained to form triangles. It is then easy to distinguish the CN mechanism from the PA mechanism, because most new links share few or no common neighbors. When Inline graphic becomes larger, although the formation of triangles becomes common, the new links with many common neighbors also tend to have high-degree endpoints. There also exist many new links with few common neighbors but high-degree endpoints. These links lead to the failure of the link prediction method. We explain this in detail through an example, along with the third reason. However, this problem, which is caused by the network model, restricts the link prediction method but does not influence the likelihood analysis method. This is due to the advantages of the likelihood analysis method, which are discussed below.

The second reason is that the link prediction method describes links more roughly than the likelihood analysis method does. For example, suppose there are two pairs of unconnected nodes Inline graphic and Inline graphic, which both have two common neighbors, but the degrees of Inline graphic and Inline graphic are much higher than those of Inline graphic and Inline graphic. The probabilities that these links appear are obviously quite different, but the CN index assigns them the same value, i.e., Inline graphic. In contrast, the likelihood analysis method can clearly distinguish them by applying probabilistic methods. Following the definition, we get the likelihoods,

graphic file with name srep10350-m96.jpg

and Inline graphic in the similar form, which are proved in Materials and Methods. Obviously, Inline graphic is far different from Inline graphic, because Inline graphic and Inline graphic are much larger than both Inline graphic and Inline graphic.

Finally, in the link prediction method, each new link needs to be compared with all the (sampled) nonexistent links, so that we can find the link prediction index that ranks the new links highest relative to the nonexistent links. But when we try to improve the new links’ rankings by tuning Inline graphic, there are always some new links whose rankings fall because the rankings of some nonexistent links improve. That is to say, the nonexistent links, which are indispensable in the link prediction model, become barriers to measuring the mechanisms’ contributions. By comparison, the likelihood analysis method optimizes the overall likelihood of the new links as a whole. Many studies have discussed properties that emerge only at the global level but vanish at the individual level, such as the function of organs and the power-law distribution of displacement at the group level but not at the individual level24. In our case, although the new links are created following the CN mechanism when Inline graphic, some of them might seem to follow the PA mechanism because they have high-degree endpoints. Unless we consider the overall likelihood of these links, we cannot obtain an accurate estimate. Moreover, the likelihood analysis method is free of the effect of the nonexistent links. In fact, many unlinked node pairs are deemed likely to be linked. These node pairs would lead us astray if they were treated as the reference standard, as in the link prediction method.

For clarity, we generate a small network following the CN mechanism to explain this failure. As shown in Fig. 3, new links are marked by red dashed lines and NewInline graphic. We also select six nonexistent links, marked by NonInline graphic, for comparison. Clearly, a node pair with high Inline graphic usually has high Inline graphic, which is caused by the embodied preferential attachment principle. This effect makes the estimation difficult. At first, we rank the links according to Inline graphic: Non1 and Non2 are behind only New1. Then we introduce Inline graphic: the rankings of New2 and New3 improve owing to their larger Inline graphic, while Non2, with lower Inline graphic, gets a lower ranking. The prediction accuracy benefits from these changes. However, we also need to notice the change in Non1, which lowers the accuracy. Non1 has both high Inline graphic and Inline graphic but is a nonexistent link. This is the uncontrollable effect referred to above. With such links as the reference standard, it is difficult to obtain an accurate estimate.

Figure 3.


Example network driven by the clustering mechanism only, and comparisons between the new links and some selected nonexistent links. Red dashed links represent new links created following the clustering mechanism. NewInline graphic represents the IDs of new links, while NonInline graphic represents the IDs of nonexistent links. The two end nodes of a link are labeled Inline graphic and Inline graphic. Inline graphic is the number of common neighbors between Inline graphic and Inline graphic, corresponding to the Common Neighbor index. Inline graphic is calculated through the Preferential Attachment index. The numbers in the “rankCN” column are the rankings based on Inline graphic (corresponding to λ = 1), while those in the “rankHyb” column are the rankings based on Inline graphic (corresponding to Inline graphic).

In summary, the likelihood analysis method wins because of two advantages: its exact description of individual links, and its global perspective on all the new links. Both are indispensable. By comparison, the link prediction method is limited by its rough description of individual links and by the uncontrollable effect of nonexistent links. To be more rigorous, we redefine the CN index to describe individual links more accurately by Inline graphic, which has the same form as the equation of the likelihood analysis method. But it still fails, as shown in Figure S1 in the Supporting Information. This result implies that the effect of the nonexistent links is the main reason.

In Fig. 2(b), we show another advantage of the likelihood analysis method. Due to the drawback of link prediction model, we do not consider the new links with new nodes in Fig. 2(a), but such new links do not limit the effectiveness of likelihood analysis method. Actually, they can improve the accuracy of the estimation a little bit.

Verification through model networks with more mechanisms

To check that the result is not specific to two mechanisms, we examine the likelihood analysis method on model networks driven by more mechanisms. We introduce a randomness mechanism, in which the endpoints of new links are all selected at random. As before, the model networks start evolving from a loop of five nodes. At each step, one new node with one new link and three other links are added. Every link is created following the randomness mechanism with probability Inline graphic, the clustering mechanism with probability Inline graphic or the popularity mechanism with probability Inline graphic, where Inline graphic, and Inline graphic.

By calculating Inline graphic through equation (4), we can plot every group of estimated values Inline graphic, Inline graphic, Inline graphic in a three-dimensional space. As shown in Fig. 4, red spots denote the estimated values, while green rectangles show the locations of the theoretical values. The close agreement again reflects the accuracy of the estimates given by the likelihood analysis method.
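
With three mechanisms the weights live on a simplex, and the same maximization can be sketched as a grid search over it. The code below is an illustration only. It assumes the per-link likelihoods under each mechanism have already been computed and are passed in as triples; the function name `best_weights`, the grid step and the `floor` guard for zero likelihoods are all hypothetical choices.

```python
import math

def best_weights(parts, step=0.05, floor=1e-12):
    """Grid-search weights (w_random, w_cluster, w_popular) on the simplex.

    parts: one (l_random, l_cluster, l_popular) triple per new link.
    """
    k = round(1 / step)
    best, best_ll = None, -math.inf
    for i in range(k + 1):
        for j in range(k + 1 - i):
            # The three weights are non-negative and sum to 1 by construction.
            w = (i / k, j / k, (k - i - j) / k)
            ll = sum(
                math.log(max(sum(wi * li for wi, li in zip(w, p)), floor))
                for p in parts
            )
            if ll > best_ll:
                best, best_ll = w, ll
    return best
```

For instance, `best_weights([(1, 0, 0), (0, 1, 0)])` returns `(0.5, 0.5, 0.0)`: one link explained only by randomness and one only by clustering split the weight evenly between those two mechanisms.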

Figure 4.


The fitting degree of the estimated contribution and the theoretical values Inline graphic, Inline graphic and Inline graphic. Red spots denote the estimated values resulted from likelihood analysis method. Green rectangles mean the theoretical values.

Measuring popularity and clustering for real networks

Inspired by the effectiveness of the measurement method, we try to understand how popularity and clustering mechanism affect real-world networks. We collected nine networks including internet, social networks, communication networks and collaboration networks. Each of them is divided into two parts based on time stamps — observed links and new links (see details in Materials and Methods and Table 1).

Table 1. Basic information of the real networks. Nodes and Links are the numbers of nodes and links before Inline graphic. The clustering coefficient25 and the assortativity coefficient4 are given next, followed by the average degree of the network. Degree heterogeneity is defined as Inline graphic. New nodes and New links are the numbers of new nodes and links during Inline graphic. The last column is the number of new links among old nodes only.

Networks | Nodes | Links | Clustering coefficient | Assortativity coefficient | Average degree | Degree heterogeneity | New nodes | New links | New links among old nodes
AS | 22960 | 49545 | 0.354 | −0.196 | 4.32 | 62.34 | 2143 | 9723 | 6346
Internet | 23670 | 47079 | 0.334 | −0.202 | 3.98 | 64.63 | 1856 | 5333 | 2824
SN | 39748 | 249685 | 0.271 | −0.163 | 12.56 | 33.44 | 692 | 8213 | 6541
Epinions | 117719 | 640152 | 0.251 | −0.07 | 10.88 | 21.21 | 13861 | 71058 | 29548
Youtube | 1022090 | 2690294 | 0.177 | −0.033 | 5.26 | 90.03 | 116409 | 300149 | 122287
Flickr | 1486725 | 11786888 | 0.379 | −0.02 | 15.86 | 50.59 | 4060 | 64734 | 57882
FB | 59699 | 735380 | 0.25 | 0.181 | 24.64 | 3.47 | 4032 | 81710 | 70850
FBC | 43590 | 165070 | 0.130 | 0.22 | 7.57 | 3.15 | 2223 | 18342 | 15249
Coauthor | 10093 | 15432 | 0.704 | −0.017 | 3.06 | 4.66 | 838 | 1716 | 459

By calculating the likelihood of new links with equation (4), we can also easily find the optimal Inline graphic for every real network, indicated by the peaks of the blue dashed curves in Fig. 5. The clustering mechanism is present in all the social networks, but plays different roles. The clustering effect is much stronger on Facebook and Flickr, platforms designed mainly for social activities, where people tend to form clusters. By contrast, on Youtube, ScienceNet and Epinions, the clustering effect is weaker than the popularity effect, because the primary demand of their users is not social interaction but watching videos (Youtube), reading blogs (ScienceNet) and rating products (Epinions). This makes sense, because people who have better resources (e.g., excellent videos, great blogs) also hold greater appeal. In the collaboration network (Coauthor), clustering and popularity also co-exist. The existence of the clustering mechanism is natural, because many scientists have their own groups, in which advisors and students usually collaborate with each other. The popularity mechanism is also plausible, because famous groups are better able to attract researchers. In the next experiment, we will see that the clustering effect is a little stronger after researchers have created their first link.

Figure 5.


The optimal Inline graphic of likelihood analysis method for real networks. Blue dash curves represent the likelihood calculated through new links without the limitation of new nodes, while red curves represent the likelihood calculated through new links without new nodes.

We further study the mechanisms for the new links among old nodes only, to observe the effect of new users. As shown by the red curves in Fig. 5, the optimal Inline graphic tends to fall at different positions compared with the blue dashed curves. The differences are not obvious in the online social platforms, but are significant in the technology networks and the collaboration network. Such differences show that the evolving mechanisms may change remarkably over time, and that links associated with new nodes are created for different reasons than links between old nodes. This result for the Internet is in accordance with previous experimental results20,21. Similarly, in the collaboration network, after a researcher joins a new group, he or she will develop more collaborations with other members.

Discussion

Analyzing network evolution is not only a fundamental problem but also a long-standing challenge in network science. Previous studies focused on uncovering new mechanisms or improving known ones. In this paper, we posed a new question: how to quantitatively measure the contributions of multiple mechanisms that simultaneously affect the evolution of complex networks. Motivated by previous studies, we compared two measurement methods, based on link prediction and likelihood analysis respectively. Although the core idea of both is to estimate the likelihood of newly created links, the link prediction method fails in some cases. By analyzing their differences, we found that the likelihood analysis method successfully captures both the characteristics of new links at the individual level and their overall properties at the group level. In fact, many studies have discussed features or functions that emerge at the group level but vanish at the individual level, such as the function of organs, the collective behavior of ant colonies, and the power-law distribution of displacement at the group level but not at the individual level24. As a result, the likelihood analysis method is able to produce very accurate estimates.

The likelihood analysis method is promising because it is highly extensible. Given a mechanism, the likelihood of new links can be easily estimated by computing the probabilities of choosing the two endpoints. Moreover, this method is very efficient. Most of the computing time is consumed by maximizing the likelihood, which is a mature problem in engineering. Therefore, it is possible to trace the evolution of complex systems in real time.

From the results on the real-world networks, we can clearly observe the combined action of popularity and clustering. The results match our intuition, but are more informative. For example, a network with a high clustering coefficient25 is not necessarily driven by the clustering mechanism; the high clustering may be the byproduct of another mechanism, such as spatially preferential attachment26. Moreover, the value of the clustering coefficient usually depends on the scale of the network, i.e., large-scale networks usually have smaller clustering coefficients than small-scale networks. Neither of these cases limits the likelihood analysis method, because the measurement of the links is based directly on the probability of selecting the endpoints under the given mechanism. In addition, we showed that the evolving mechanisms may change remarkably over time for some real networks. Owing to the efficiency of the likelihood analysis method, it is possible to trace the evolution of the networks and even of the mechanisms. Our results suggest that the multiple mechanisms of complex networks can be measured in a quantitative, unified and efficient way. In future, we expect that the framework in this study can be used to provide insights into complex systems.

Materials and Methods

Link Prediction Method

Given Inline graphic, a link prediction index can assign every non-observed link (including Inline graphic and Inline graphic) a score, according to which we can rank these links in descending order. An index is regarded as better if it ranks the links in Inline graphic higher than another index does. This is how we seek the optimal Inline graphic in this paper.

To compare the indices quantitatively, we introduce the AUC (area under the receiver operating characteristic curve27) to measure the accuracy of prediction based on the rankings. It can be interpreted as the probability that a randomly chosen new link (a link in Inline graphic) is given a higher score than a randomly chosen nonexistent link. In the implementation, among Inline graphic independent comparisons, if the new link has the higher score Inline graphic times and the new link and the nonexistent link have the same score Inline graphic times, we define the AUC value as17:

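$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n},$$

where $n$ is the number of independent comparisons, $n'$ is the number of comparisons in which the new link has the higher score, and $n''$ is the number of comparisons in which the two scores are equal.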

If all the scores are generated from an independent and identical distribution, the AUC value should be about 0.5. Therefore, the degree to which the AUC value exceeds 0.5 indicates how much better the algorithm performs than pure chance. Note that the calculation of AUC is based on statistical sampling, so the result of equation (5) will be closer to the true value if we assign Inline graphic a larger number. We have discussed the proper value of Inline graphic in the book Link Prediction28. That is, if we expect to get the AUC value with an error of less than 0.001 at the 90% confidence level, Inline graphic should be no less than 672400. So in our experiments, we set Inline graphic. The derivation is presented in the Supplementary Information.
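The sampling procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `sampled_auc`, its arguments, and the fixed random seed are assumptions made here, and the scores are taken to be plain lists of numbers.

```python
import random


def sampled_auc(new_scores, nonexistent_scores, n, seed=0):
    """Estimate AUC from n independent random comparisons.

    Each comparison draws one new link and one nonexistent link at random
    and compares their scores; a tie counts as one half.
    (Hypothetical helper; names and signature are assumptions.)
    """
    rng = random.Random(seed)
    higher = 0  # comparisons where the new link scores strictly higher
    ties = 0    # comparisons where both links score the same
    for _ in range(n):
        s_new = rng.choice(new_scores)
        s_non = rng.choice(nonexistent_scores)
        if s_new > s_non:
            higher += 1
        elif s_new == s_non:
            ties += 1
    return (higher + 0.5 * ties) / n
```

For the sample size, a worst-case normal approximation (variance of a single comparison at most 1/4, 90% two-sided quantile taken as 1.64) gives a bound of (1.64 / (2 × 0.001))² = 672400 comparisons, which agrees with the figure quoted above. The authors' own derivation is in the Supplementary Information and may proceed differently.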

Likelihood Analysis Method

In this method, we need to consider three cases for a chosen link Inline graphic: (i) either Inline graphic or Inline graphic is a new node, which appears after Inline graphic; (ii) both Inline graphic and Inline graphic are new nodes; (iii) both of them are old nodes.

For the popularity mechanism, if one of them is a new node, say Inline graphic, then Inline graphic, where Inline graphic. If both of them are new nodes, Inline graphic. If both of them are old nodes, Inline graphic.

For the clustering mechanism, if Inline graphic and/or Inline graphic are new nodes, they share no common neighbors. We then define, according to the implementation of the clustering mechanism, Inline graphic if one of them is a new node, and Inline graphic if both of them are new nodes. If both Inline graphic and Inline graphic are old nodes, Inline graphic. Note that if Inline graphic and Inline graphic do not share any common neighbors, Inline graphic needs to be modified to keep Inline graphic away from 0. In that case, we redefine Inline graphic for two reasons: (i) Inline graphic cannot be 0, or else the product will be 0 too; (ii) Inline graphic must be small and may vary across networks. We therefore adopt a value that is no larger than the probability of selecting one node under the popularity mechanism.

For the randomness mechanism, if one of Inline graphic and Inline graphic is a new node, Inline graphic. If both of them are new nodes, Inline graphic. If both of them are old nodes, Inline graphic.
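To illustrate how per-link probabilities such as those above could be turned into an estimate of mechanism weights, the following sketch maximizes a log-likelihood over a single mixing weight. Several things here are assumptions rather than the paper's stated procedure: the linear mixture `lam * p_pop + (1 - lam) * p_clu`, the restriction to two mechanisms, the function names, and the use of a simple grid search in place of whatever optimizer the authors used. The inputs are assumed to be strictly positive per-link probabilities, in line with the requirement above that no probability may be 0.

```python
import math


def log_likelihood(lam, p_pop, p_clu):
    """Log-likelihood of the new links under an assumed linear mixture.

    p_pop[i] and p_clu[i] are the (strictly positive) probabilities of the
    i-th new link under the popularity and clustering mechanisms.
    """
    return sum(math.log(lam * a + (1.0 - lam) * b)
               for a, b in zip(p_pop, p_clu))


def best_weight(p_pop, p_clu, steps=1000):
    """Grid search for the mixing weight in [0, 1] that maximizes the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda lam: log_likelihood(lam, p_pop, p_clu))
```

A weight near 1 would indicate that the new links are better explained by popularity, and a weight near 0 by clustering. Because the logarithm of a linear function of the weight is concave, the grid search has a single maximum to find, and a finer grid or a standard one-dimensional optimizer could be substituted.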

Proof of Equation (5)

The proof of Inline graphicInline graphic can be reduced to proving Inline graphic. The number of common neighbors of Inline graphic and Inline graphic equals the number of 2-step paths between them, denoted as Inline graphic, where Inline graphic if the path Inline graphic exists, namely Inline graphic is a common neighbor of Inline graphic and Inline graphic; otherwise Inline graphic. Then Inline graphic. Given the nodes Inline graphic and Inline graphic, Inline graphic can be regarded as the number of 2-step paths (Inline graphic). That is to say, both Inline graphic and Inline graphic must be neighbors of Inline graphic. Therefore, the number of 2-step paths equals Inline graphic because Inline graphic, namely Inline graphic. Moreover, since Inline graphic if Inline graphic is not directly connected to Inline graphic, we eventually obtain Inline graphic.
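The counting fact the proof relies on, that the number of common neighbors of two nodes equals the number of 2-step paths between them, can be checked numerically on a small graph. The sketch below is an illustration only: the helper names and the toy adjacency matrix are hypothetical, and expressing the path count as a product of adjacency entries is a framing chosen here rather than notation taken from the paper.

```python
def two_step_path_counts(adj):
    """Matrix whose (x, y) entry counts the 2-step paths x-z-y."""
    n = len(adj)
    return [[sum(adj[x][z] * adj[z][y] for z in range(n)) for y in range(n)]
            for x in range(n)]


def common_neighbors(adj, x, y):
    """Count common neighbors of x and y by intersecting neighbor sets."""
    nbrs_x = {z for z, linked in enumerate(adj[x]) if linked}
    nbrs_y = {z for z, linked in enumerate(adj[y]) if linked}
    return len(nbrs_x & nbrs_y)


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant node 3 attached to 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
paths = two_step_path_counts(adj)
degrees = [sum(row) for row in adj]
```

On this graph every entry of `paths` agrees with the direct common-neighbor count, and summing the off-diagonal entries gives the sum of k(k − 1) over all node degrees k, since each node z with degree k is the middle of k(k − 1) ordered 2-step paths between distinct endpoints.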

Data Description

We collect nine networks and divide each of them into two parts, observed links and future links (corresponding to Inline graphic and Inline graphic, respectively, as defined in the previous section), based on the time stamps. The basic features are listed in Table 1.

  1. AS: An autonomous system (AS) within the Internet is a collection of connected Internet Protocol networks and routers under the control of one entity. The Route Views Project collected the Internet topology at the AS level at many different times; here we use the data of June 2006 to compose the Observed Links and that of December 2006 to compose the Future Links21,29.

  2. Internet: The Internet can be viewed as a collection of autonomous systems (AS), snapshots of which were created weekly by CAIDA (Center for Applied Internet Data Analysis). Mislove downloaded the entire history of their measurements, which covers the period from January 5th, 2004 until July 9th, 200730. In this paper, we choose November 20th, 2006 as the cutoff between Observed Links and Future Links, so that the number of future links is approximately 10% of the number of observed links.

  3. SN: ScienceNet (www.sciencenet.cn) is a virtual community for Chinese-speaking scientists. The data, consisting of two snapshots (July 22nd, 2013 and August 12th, 2013), were newly crawled from the website by Xing Yu.

  4. Epinion: Epinions (www.epinions.com) is an online product rating site where users are connected by trust or distrust relationships. For simplicity, we neglect the types of connections. The earliest link in the initial data31 was collected on September 1st, 2001, while the latest was on August 11th, 2003.

  5. Youtube: YouTube (www.youtube.com) is a popular video-sharing site that also involves a social network. The initial data, consisting of links created before January 15th, 2007, were collected by Mislove30.

  6. Flickr: Flickr (www.flickr.com) is a photo-sharing site based on a social network. The data were collected by Mislove et al.32 and consist of Inline graphic users and Inline graphic links in total. Here we use only a small sample, selecting the links with time stamps 2006-11-02 and 2006-11-03. The links created on 2006-11-03 are treated as future links, and the remaining links compose the observed network.

  7. FB: Facebook (www.facebook.com) is a social networking service with over one billion users. The initial data in33 were crawled between January 20th, 2009 and January 22nd, 2009. The time of link establishment is recorded as a UNIX time stamp unless it cannot be determined. We set all the undetermined time stamps to 1.

  8. FBC: These data are also from www.facebook.com but differ from the friendships in FB. Here, if a user Inline graphic posts to another user Inline graphic's wall on Facebook, a directed link is created from Inline graphic to Inline graphic. Since users may write multiple posts on a wall, including their own wall, the network collected in33 allows multiple edges and loops. In this paper, we remove the loops and redundant edges (multiple edges which have appeared before).

  9. Coauthor: This is a collaboration network from the e-print arXiv, which covers scientific collaborations between authors whose papers are submitted to the High Energy Physics - Theory category. The data cover papers in the period from January 1993 to April 200334. Note that two authors may collaborate multiple times, which is represented simply by a single unweighted link in this paper. The time stamps are determined by their first collaboration.
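The preprocessing described for these datasets (dropping loops, collapsing repeated edges to their first appearance, and splitting by a time stamp) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `split_by_time`, the `(u, v, t)` edge format, the treatment of every network as undirected, and the choice to place links at exactly the cutoff time in the observed part are all assumptions made here.

```python
def split_by_time(edges, cutoff):
    """Split time-stamped edges into observed and future link sets.

    edges: iterable of (u, v, t) tuples; node ids must be mutually comparable.
    Self-loops are dropped, and repeated edges keep only their earliest time
    stamp. Links are treated as undirected (an assumption of this sketch).
    """
    first_seen = {}
    for u, v, t in edges:
        if u == v:
            continue  # remove loops
        key = (u, v) if u <= v else (v, u)  # canonical undirected key
        if key not in first_seen or t < first_seen[key]:
            first_seen[key] = t  # keep the first appearance only
    observed = {e for e, t in first_seen.items() if t <= cutoff}
    future = {e for e, t in first_seen.items() if t > cutoff}
    return observed, future
```

For a directed network such as FBC, the canonical-key step would have to be removed so that the two directions of a link remain distinct.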

Additional Information

How to cite this article: Zhang, Q.-M. et al. Measuring multiple evolution mechanisms of complex networks. Sci. Rep. 5, 10350; doi: 10.1038/srep10350 (2015).

Supplementary Material

Supporting Information
srep10350-s1.pdf (36.9KB, pdf)

Acknowledgments

We acknowledge the useful discussion with Junming Huang and Wen-Qiang Wang. We also want to thank Xing Yu for collecting data. This work is jointly supported by the National Natural Science Foundation of China under Nos. 11222543 and 11205042. XKX was supported by the National Natural Science Foundation of China (Nos. 61004104, 61374170) and CCF-Tencent Open Research Fund (AGR20130112). QMZ and YXZ acknowledge the support from the Program of Outstanding PhD Candidate in Academic Research by UESTC (Nos. YBXSZC20131034 and YBXSZC20131035) and China Scholarship Council (No. 201306070064 and 201206070003).

Footnotes

Author Contributions Z.Q.M., X.X.K. and Z.T. designed the experiments. Z.Q.M. and Z.Y.X. implemented the experiments. Z.Q.M. and X.X.K. interpreted the experimental findings. Z.Q.M. and Z.T. wrote the main manuscript, which was revised by all authors.

References

  1. Amaral L. A. N., Scala A., Barthélémy M. & Stanley H. E. Classes of small-world networks. Proc. Natl. Acad. Sci. USA 97, 11149–11152 (2000).
  2. Szabó G., Alava M. & Kertész J. Clustering in complex networks. Lect. Notes Phys. 650, 139–162 (2004).
  3. Backstrom L., Huttenlocher D. P., Kleinberg J. M. & Lan X. Group formation in large social networks: membership, growth, and evolution. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (ACM, KDD) pp 44–54 (2006).
  4. Newman M. E. J. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002).
  5. Mitzenmacher M. A brief history of generative models for power law and lognormal distributions. Internet Mathematics 1, 226–251 (2004).
  6. Goldenberg A., Zheng A. X., Fienberg S. E. & Airoldi E. M. A survey of statistical network models. Found. Trends Mach. Learn. 2, 129–233 (2010).
  7. Barabási A. L. & Albert R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
  8. Zhou T., Medo M., Cimini G., Zhang Z. K. & Zhang Y. C. Emergence of scale-free leadership structure in social recommender systems. PLoS ONE 6, e20648 (2011).
  9. Perotti J. I., Billoni O. V., Tamarit F. A., Chialvo D. R. & Cannas S. A. Emergent self-organized complex network topology out of stability constraints. Phys. Rev. Lett. 103, 108701 (2009).
  10. McPherson M., Smith-Lovin L. & Cook J. Birds of a feather: homophily in social networks. Annu. Rev. Sociol. 27, 415–444 (2001).
  11. Newman M. E. J. Clustering and preferential attachment in growing networks. Phys. Rev. E. 64, 025102 (2001).
  12. Holme P. & Kim B. J. Growing scale-free networks with tunable clustering. Phys. Rev. E. 65, 026107 (2002).
  13. Liu Z., Lai Y. C., Ye N. & Dasgupta P. Connectivity distribution and attack tolerance of general networks with both preferential and random attachments. Phys. Lett. A 303, 337–344 (2002).
  14. Papadopoulos F., Kitsak M., Serrano M. A., Boguñá M. & Krioukov D. Popularity versus similarity in growing networks. Nature 489, 537–540 (2012).
  15. Xie Y. B. et al. Geographical networks evolving with an optimal policy. Phys. Rev. E. 75, 036106 (2007).
  16. Wang W. Q., Zhang Q. M. & Zhou T. Evaluating network models: a likelihood analysis. Europhys. Lett. 98, 28004 (2012).
  17. Lü L. & Zhou T. Link Prediction in Complex Networks: a Survey. Physica A 390, 1150–1170 (2011).
  18. Zhang Q. M., Lü L., Wang W. Q., Zhu Y. X. & Zhou T. Potential theory for directed networks. PLoS ONE 8, e55437 (2013).
  19. Cannistraci C. V., Alanis-Lobato G. & Ravasi T. From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks. Sci. Rep. 3, 1613 (2013).
  20. Carmi S., Havlin S., Kirkpatrick S., Shavitt Y. & Shir E. A model of Internet topology using k-shell decomposition. Proc. Natl. Acad. Sci. USA 104, 11150–11154 (2007).
  21. Zhang G. Q., Zhang G. Q., Yang Q. F., Cheng S. Q. & Zhou T. Evolution of the Internet and its cores. New J. Phys. 10, 13027 (2008).
  22. Liben-Nowell D. & Kleinberg J. The link prediction problem for social networks. J. Am. Soc. Inf. Sci. Tec. 58, 1019–1031 (2007).
  23. Zhou T., Lü L. & Zhang Y. C. Predicting missing links via local information. Eur. Phys. J. B. 71, 623–630 (2009).
  24. Yan X. Y., Han X. P., Wang B. H. & Zhou T. Diversity of individual mobility patterns and emergence of aggregated scaling laws. Sci. Rep. 3, 2678 (2013).
  25. Watts D. J. & Strogatz S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
  26. Barthélemy M. Crossover from Scale-Free to Spatial Networks. Europhys. Lett. 63, 915 (2003).
  27. Hanley J. A. & McNeil B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29 (1982).
  28. Lü L. & Zhou T. [Appendix A.1 The Selection of n in Calculating AUC] Link Prediction [261–264] (Higher Education Press, Beijing, 2013).
  29. Meyer D. University of Oregon RouteViews Project. (2005) Available at: http://www.routeviews.org/ (Date of access: 1st May 2014)
  30. Mislove A. E. Online social networks: Measurement, analysis, and applications to distributed information systems. Ph.D. dissertation. Rice University, Department of Computer Science (2009).
  31. Massa P. & Avesani P. Controversial users demand local trust metrics: an experimental study on epinions.com community. Proceedings of the 20th national conference on Artificial intelligence - Volume 1 pp 121–126 (2005).
  32. Mislove A., Koppula H. S., Gummadi K. P., Druschel P. & Bhattacharjee B. Growth of the Flickr social network. Proceedings of the 1st ACM SIGCOMM Workshop on Social Networks (ACM) pp 25–30 (2008).
  33. Viswanath B., Mislove A., Cha M. & Gummadi K. P. On the evolution of user interaction in facebook. Proceedings of the 2nd ACM Workshop on Online Social Networks (ACM) pp 37–42 (2009).
  34. Leskovec J., Kleinberg J. & Faloutsos C. Graph evolution: densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (ACM TKDD) 1, 2 (2007).

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
