Skip to main content
Proceedings of the National Academy of Sciences of the United States of America logoLink to Proceedings of the National Academy of Sciences of the United States of America
. 2020 Dec 30;118(2):e2013825118. doi: 10.1073/pnas.2013825118

True scale-free networks hidden by finite size effects

Matteo Serafino a, Giulio Cimini b,c,a,d, Amos Maritan e,f, Andrea Rinaldo g,h,1, Samir Suweis e,f,i, Jayanth R Banavar j,k, Guido Caldarelli l,m,d,n,1
PMCID: PMC7812829  PMID: 33380456

Significance

The generalized scale invariance of complex networks, whose trademark feature is the power law distributions of key structural properties like node degree, has recently been questioned on the basis of statistical testing of samples from model and real data. This has important implications on the dynamic origins of network self-organization and consequently, on the general interpretation of their function and resilience. However, a well-known mechanism of departure from scale invariance is the presence of finite size effects. Developed for critical phenomena, finite size scaling analysis assesses whether an underlying scale invariance is clouded by a sample limited in size. Our approach sorts out when we may reject the hypothesis that the inherent structure of networks is scale invariant.

Keywords: network form, degree distribution, power laws, finite size scaling, statistical physics

Abstract

We analyze about 200 naturally occurring networks with distinct dynamical origins to formally test whether the commonly assumed hypothesis of an underlying scale-free structure is generally viable. This has recently been questioned on the basis of statistical testing of the validity of power law distributions of network degrees. Specifically, we analyze by finite size scaling analysis the datasets of real networks to check whether the purported departures from power law behavior are due to the finiteness of sample size. We find that a large number of the networks follows a finite size scaling hypothesis without any self-tuning. This is the case of biological protein interaction networks, technological computer and hyperlink networks, and informational networks in general. Marked deviations appear in other cases, especially involving infrastructure and transportation but also in social networks. We conclude that underlying scale invariance properties of many naturally occurring networks are extant features often clouded by finite size effects due to the nature of the sample data.


Networks play a vital role in the development of predictive models of physical, biological, and social collective phenomena (13). A quite remarkable feature of many real networks is that they are believed to be approximately scale free: the fraction of nodes with k incident links (the degree) follows a power law p(k)kλ for sufficiently large value of k (4, 5). The value of the exponent λ as well as deviations from power law scaling provides invaluable information on the mechanisms underlying the formation of the network such as small degree saturation, variations in the local fitness to compete for links, and high degree cutoffs owing to the finite size of the network. Indeed, real networks are not infinitely large, and the largest degree of any network cannot be larger than the number of nodes. Finite size scaling (FSS) (612), firstly developed in the field of critical phenomena and renormalization group, is a useful tool for analyzing deviations from pure power law behavior as due to finite size effects. Here, we show that despite the essential differences between networks and critical phenomena, FSS provides a powerful framework for analyzing the scale-free nature of empirical networks.

The search of ubiquitous emergent properties occurring in several different systems and transcending the specific system details is a recurrent theme in statistical physics and complexity science (13). Indeed, the presence and the type of such “universal” law give insights on the driving processes or on the characteristic properties of the observed system. Notably, complex systems have the propensity to display “power law”-like relationship in many diverse observables (such as event sizes and centrality distribution, to name a few). In particular, the power law shape of the degree distribution, which is the hallmark of scale-free networks, leads to important emergent attributes such as self-similarity in the network topology, robustness to random failures, and fragility to targeted attacks. Notably, scale invariance extends far beyond the degree distribution, affecting many other quantities as weighted degree, betweenness (14), and degree–degree distance (15).

In the last decade, the existence of such power laws in complex networks [but also in other areas (16); e.g., law in language (17)] has been questioned (18). A reason of the shift in such conclusion is in the availability of larger (and new) datasets and especially, in improved statistical methods. Recently, Broido and Clauset (19) fitted a power law model to the degree distribution of a variety of empirical networks and suggested that scale-free networks are rare. Voitalov et al. (20) rebutted that scale-free networks are not as rare if deviations from pure power law behavior are permitted in the small degree regime. The different conclusions may depend on very fine but critical assumptions at the basis of the statistical test for the power law hypothesis. Moreover, a crucial point that is typically ignored but represents the condition for the proper use of maximum likelihood methods is the independence of the empirical observations (21). In this work, we tackle the problem of detecting power laws in networks from a different perspective, based on the machinery of FSS.

Statistical physics of critical phenomena teaches us that a system at criticality exhibits power law singularities of physical quantities, such as for example, the compressibility, the specific heat, and the density difference between the liquid and vapor, as well as the latent heat. Water at its critical point exhibits fluctuations at all scales between the molecular length scale and the size of the container, which could be macroscopically large. Moreover, one finds thoroughly mixed droplets of water and bubbles of gas. Indeed, any large part of the system looks like the whole—the system is self-similar. The length scale of these droplets and bubbles extends from the molecular scale up to the correlation length, which is a measure of the size of the largest droplet or bubble. The divergence of the correlation length in the vicinity of a phase transition at the thermodynamic limit thus suggests that properties near the critical point can be accurately described within an effective theory involving only long-range collective fluctuations of the system. However, both in experiments and in numerical simulations, the infinite size limit cannot be reached, and thus, one observes deviations from the predicted thermodynamics limit behavior. The FSS ansatz has been developed precisely to infer the singular behavior (i.e., the exponents determining the universality classes) of the physical properties of a system in the thermodynamic limit, having only information on the system properties at finite sizes.

FSS has yet a more general validity and does not require the existence of a phase transition or an evolution process. Indeed, even though it was initially used to study finite systems near the critical point of the corresponding infinite system, FSS can be actually applied to describe structures that are self-similar when observed in a certain range of scales. As an example, we consider a Cantor set where we stop the procedure to divide intervals in three parts, removing the middle one at a scale s0=3m. This corresponds to a fractal structure on scales between s0 and one and to a nonfractal structure on scale smaller than s0. If we measure the total length, L(s), of the set with a stick of length s=3n, we find L(s)=s1DF(s/s0) where F(x)=1 when x>1, whereas F(x)=x1D when x<1 and D=log32 is the Hausdorff–Besicovitch (or fractal) dimension of the Cantor set. Another illustration of FSS analysis is given by the truncated geometrical series S(x,N)=0N1xn. When x is close to one, it is easy to see that S(x,N)=t1F(tN), where t=1x and F(z)=1ez. As a matter of fact, the FSS approach has been used to test scale invariance (and self-similarity) also for noncritical systems such as (just to mention some very famous examples) polymers in confined geometries (22) and interfaces (23, 24). In view of the above, FSS can also be implemented on well-established models of scale-free networks [like, e.g., the Barabási–Albert model (4) or the Bak–Tang–Wiesenfeld toy model of self-organized criticality (25)] where the scale-free behavior is not an emergent property at a critical point. Whether or not the same hypotheses hold for real-world network does not undermine the possibility of applying FSS to them.

Employing the FSS machinery to test whether empirical networks display scale-free behavior in their degree distribution is not, however, straightforward. Unlike for physical systems, representations of a network at different scales are typically not available. Thus, in order to test whether a network shows a power law distribution of its degree k, we have constructed smaller size subsamples, effective representations of the underlying population, drawn in an unbiased manner. We then use the characteristics of the large original network as well as the derived subnetworks to test the scale-free hypothesis. Fig. 1 shows an illustration of this procedure for a snapshot of the structure of the internet at the level of autonomous systems (26). FSS of Networks provides a brief summary of FSS applied to network topology. Ratio of Moments Test presents an independent method of determining whether networks are scale free based on analyses of the size dependence of the ratio of moments of the degree distributions. Subsampling and Scaling Region provides information on the sampling scheme used to build subnetworks and on the region selected for the scaling analysis.

Fig. 1.

Fig. 1.

An illustrative example of the concept of how the underlying true scale invariance in a network may be clouded by a scale imposed by the sample size. If the degree distribution P(k) of the network is scale free, then small subsamples of the network will have the same distribution (i.e., the degree structure of the network will not be altered apart from deviations at high values of k where the cutoff because of sample size operates). Specifically, the P(k) vs. k log–log plots show the largest sample (Left); a reduced sample (Center), where for comparison the largest distribution is shown via gray dots; and the smallest subsample, where the two previous distributions are shown for comparative purposes (gray dots; Right). Any Anderson–Darling-like test of the sample being drawn from a scale-free distribution would fail. The network in this example is a snapshot of the structure of the internet at the level of autonomous systems (26).

In Results, we test the scale-free hypothesis (the power law behavior in the degree distributions) on around 200 large empirical networks [those considered in refs. 19 and 20]. Remarkably, we find that such a venerable hypothesis cannot be rejected for many (but not all) networks. Moreover, the two scaling exponents for such networks satisfy an additional scaling relationship, which derives from the shape of the degree cross-over in scale-free networks. We benchmark our results against the quality measure of the well-known scale-free graph introduced by Barabási and Albert (4). Further, we show that FSS allows discerning pure power laws from log-normal and Weibull distributions. In conclusion, our results support the claim that scale invariance is indeed a feature of many real networks, with finite size effects accounting for quantifiable deviations.

FSS of Networks

A scale-free network is postulated to have a degree distribution p(k)kλ beyond some lower-degree cutoff kmin. For an infinitely sized network, since kmin1, the exponent λ>1 in order for p(k) to be normalizable. In what follows, we will consider the cumulative distribution P(k)=kp(q)dqkγ where γ=λ1>0.

Networks are of course not infinitely large. In a network comprising N nodes, k can be at most equal to N1. This is the intrinsic limit on k given by the network size. Thus, it is plausible that, below some kc (cross-over value), the degree distribution follows a power law behavior as would be expected for an infinite network but falls more rapidly beyond kc. The FSS hypothesis states that

P(k,N)=kγf(kNd), [1]

where d<0. The remarkable simplifying feature of the scaling hypothesis is that P is not an arbitrary function of the two variables k and N but rather, k and N combine in a nontrivial manner to create a composite variable. The behavior of the system is fully defined by the two exponents, γ and d, and the scaling function f. The exponent d<0 so that, for an infinite size network (N), the argument of f approaches zero. A pure power law decay of P(k,N) with k for very large N requires that f(x) constant as x0. The additional normalization condition is f(x)0 sufficiently fast when x1. The finite size effects are quantified by the behavior of the function f as its argument increases (e.g., when kkc). For a network with a finite number of nodes, the degree distribution does not follow a pure power law but is modified by the function f (ref. 27 also has a discussion of finiteness in the context of growing network models).

A powerful way of assessing whether a network is scale invariant is to confirm the validity of the scaling hypothesis and determine the two exponents and the scaling function f by using the collapse plot technique. One may recast Eq. 1 as

P(k,N)kγ=f(kNd). [2]

Then, the path forward is simple. For networks belonging to the same class but with different N, one optimally selects two fitting parameters γ and d by seeking to collapse plots of P(k,N)kγ vs. kNd for different N on top of each other (28). The fidelity of the collapse plot provides a measure of self-similarity and scale-free behavior, the optimal parameters are the desired exponents, and the collapsed curve is a plot of the scaling function.

We start out with a single representation of an empirical network with N nodes. For purposes of the scaling collapse plot, we seek additional representative networks of smaller sizes. In order to accomplish this, we obtained the mean degree distributions of multiple subnetworks of sizes N4, N2, and 3N4, which were then collapsed onto each other and the original network to create a master curve. The quality S of the collapse plot is then measured as the mean square distance of the data from the master curve in units of SEs. S is thus like a reduced χ2 test and should be around one if the data really collapse to a single curve and much larger otherwise (29).

Note that as a measure of the size of a network (or subnetwork), one may use the number of nodes N or alternatively, the number of links E. The scaling function in this case reads as follows:

P(k,E)kγ=fE(kEdE), [3]

where the exponent γ is the same as before and the exponent dE<0 ought to be equal to the previously introduced exponent d for networks satisfying the FSS hypothesis (see the next section).

Ratio of Moments Test

A simple alternative and independent test of the scale-free hypothesis is to study the size dependence of the ratio between the ith and the (i1) th moments of k, for various i. The ith moment ki is defined to be

ki=kmindkki1kγf(kNd)Nd(iγ) [4]

provided i>γ. Instead, if iγ, ki converges to a constant value for N. Therefore, when i1>γ,

ki/ki1Nd, [5]

independently of i. Thus, for a scale-free network, a log–log plot of the ratio of consecutive moments vs. N is a straight line with slope d. Likewise,

ki=kmindkki1kγfE(kEdE)EdE(iγ) [6]

when i>γ; otherwise, ki goes to a constant for E. Therefore, when i1>γ,

ki/ki1EdE. [7]

The exponents d and dE are not independent for scale-free networks. On the one hand, Eqs. 4 and 6 imply ENd/dE. On the other, in general kE/NNd/dE1. Due to the above equations, k is constant for scale-free networks with γ>1, implying that d=dE. Thus, the difference between d and dE values (that we statistically assess through their Z score) provides an independent quality measure of the scale-free attributes of a network.

Subsampling and Scaling Region

In order to generate a subnetwork of a given size n<N, we pick n nodes at random among the N nodes of the original network, removing all of the other nodes and the links originating from them. It is well known that the subsampling procedure modifies the shape of the degree distribution of the network. In particular, subnetworks of scale-free networks are not scale free because of deviations at low k values (30) [this happens independently of the sampling scheme adopted (31)]. The problem of the left tail of the distribution, however, applies more generally because deviations from the scale-free behavior at low degrees are rather common in empirical and network models. Therefore, we perform the scaling analysis described in FSS of Networks and Ratio of Moments Test only for kkmin, where the lower bound of the scaling region kmin is chosen such that the empirical distribution of the original network and its best power law fit [with exponent Γ, computed with the maximum likelihood method of Clauset et al. (18)] (Materials and Methods) are as similar as possible above kmin (32). In SI Appendix, we show that this allows us to get rid of any deviations induced by the subsampling scheme. However, when the empirical distribution of the network deviates substantially from a power law over its entire domain, then the estimated kmin can become very large and may even diverge. In these cases, the number of nodes n* of the (sub-)network with kkmin becomes very small or vanishing, yielding an unstable or undefined collapse. We thus use n*lnN as a condition on the minimum number of nodes in each (sub-)network for the feasibility of the scaling analysis.

Results

To sum up, two independent statistical tests of the scale-free attributes of a network explained in FSS of Networks and Ratio of Moments Test are the quality of the collapse S (i.e., the reduced χ2 between data and master curve) and the compatibility of d and dE (measured through their Z score). SI Appendix, Fig. S1 outlines the flow of the analysis. In line with Broido and Clauset (19) and Voitalov et al. (20), we use these tests to define a classification for the degree distribution of empirical networks:

  • SSF (strong scale free) if S1 and ZddE1,

  • WSF (weak scale free) if S3 and ZddE3,

  • NSF (nonscale free) otherwise or when n*<lnN for the original network or any of its subnetworks.

Note the nestedness of the classification, for which an SSF network is also WSF.

Power Law and Poisson Distribution.

We start analyzing the reference cases of Barabási–Albert (4) and Erdős–Rényi (33) models whose behavior is known. In the former case, p(k)k3, whereas in the latter case, p(k)Poissonk¯(k). Fig. 2 shows that for a realization of the Barabási–Albert graph, the degree distributions of the (sub-)networks result in a collapse of very high quality. The power law exponent γ yielding the best collapse is consistent with the value Γ obtained by maximum likelihood fitting the degree distribution of the mother network with a power law (18). Additionally, the moments ratios are indeed parallel lines, with compatible slopes d and dE. More robust statistics are obtained by analyzing 1,000 realizations of the Barabási–Albert model (Fig. 3). Within this sample, 98% of the networks are classified as SSF, while 2% are classified as WSF. The estimated scaling exponents are all consistent with each other among the different realizations.

Fig. 2.

Fig. 2.

Scaling analysis on a numerical realization of the Barabási–Albert model. The network has N=104 nodes, and the minimum node degree is kmin=14. The best power law fit on this network yields Γ=1.89±0.02. Note that this value is smaller than Γ=2 because of deviations from the pure power law at small k’s: indeed, the theoretical p(k) in the Barabási–Albert model goes as [k(k+1)(k+2)]1 (34). A–C show results of the scaling analysis using the number of nodes as for Eqs. 2 and 5. A reports the dependence of various moment ratios on N; fitting these slopes yields d=0.358±0.035. B shows the collapse of the cumulative degree distributions when scaled with N. The best collapse is obtained with γ=1.89±0.06 and yields S=0.67. C shows how the quality of the collapse reported in A varies on moving away from the optimal value of γ. D and E further show results of the scaling analysis using the number of links as for Eqs. 3 and 7. In this case, the moment ratio test of D returns dE=0.351±0.031, while the best collapse of the cumulative degree distributions reported in E is obtained with γ=1.89±0.05 and yields S=0.66.

Fig. 3.

Fig. 3.

Empirical distribution of the quality of collapse S obtained from FSS analysis on 1,000 realizations of the Barabási–Albert graph (same parameters as Fig. 2). The distribution is well fitted by a log normal with μ=0.70±0.1 and σ=0.414±0.009.

For the Erdős–Rényi model, the estimated kmin for the degree distribution is so large that it is not possible to have (sub-)networks with number of nodes n*lnN (in principle, for this network, the kmin estimated from the Kolmogorov–Smirnov test should be larger than the largest degree of the network). As such, the Erdős–Rényi graph is classified as NSF. We obtained the same outcome in an ensemble of 1,000 realization of this network model.

Alternative Fat-Tail Distributions.

While the power law is the only distribution featuring scale invariance, there are other distributions characterized by a fat right tail that can resemble a power law in finite systems. Hence, determining which of these distributions better fits empirical network data is often a nontrivial task. In particular, the classic approach based on P values computed from a Kolmogorov–Smirnov test (Materials and Methods) is able to rule out some competing hypothesis but not to confirm one (18). Moreover, the hypothesis testing approach may fail when applied to regularly varying distributions (20). It is, therefore, meaningful to put our FSS approach to the test of alternative fat-tail distributions. Here, we consider the representative cases of the log-normal and Weibull distributions. The log-normal distribution p(lnk)=Normal(μ,σ) is characterized by parameters μ and σ the mean and SD, respectively, of the variable’s natural logarithm. For large values of σ, this distribution is highly skewed and features a fat tail for large k values. The Weibull distribution p(k)=(h/lh)kh1exp(k/l)h is characterized by parameters h (shape) and l (scale). The fat tail in this case appears for h0. We use the Viger–Latapy algorithm (35) to generate networks with these degree distributions.

Fig. 4 shows the scaling analysis for a realization of a network with log-normal p(k) and for another realization with Weibull p(k). In both cases, we observe that the quality of the collapse is poor and that the moment ratios are not parallel lines. Therefore, both networks are classified as NSF. Moreover, S as a function of γ does not show any minimum in the region around Γ (the minimum does exist but is located elsewhere). This means that the exponent estimated by FSS γ and that obtained from maximum likelihood power law fitting Γ are substantially different: the outcome of the scaling analysis is not consistent in this case. However, the result depends much on the choices of parameters characterizing the distribution. Indeed, Fig. 5 shows that the percentage of networks classified NSF decreases by increasing σ in the log-normal case, as well as by decreasing h in the Weibull case—up to a point where the variance of the distributions becomes so large that the scaling analysis can hardly distinguish these distributions from power laws at finite N. For these cases, the value of γ that minimizes S is indeed compatible with Γ.

Fig. 4.

Fig. 4.

Scaling analysis (with N) on a numerical realization of a log-normal graph with (σ,μ)=(0.8,1.8) (A, 1; B, 1; and C, 1) and of a Weibull graph with (h,l)=(0.6,1.8) (A, 2; B, 2; and C, 2). In both cases, the network has N=104 nodes. Log-normal graph: the best power law fit is obtained with Γ=2.90±0.12, the moment ratio tests yield d=0.209±0.033 and dE=0.208±0.033, and the best collapse is obtained with γ=2.40±0.37 and yields S=5.047. Weibull graph: the best power law fit is obtained with Γ=3.43±0.08, the moment ratio tests yield d=0.230±0.037 and dE=0.219±0.036, and the best collapse is obtained with γ=1.933±1.055 and yields S=3.271.

Fig. 5.

Fig. 5.

Outcome of the scaling analysis (with N) on log-normal and Weibull networks as a function of the parameters of the degree distributions (μ,σ) and (l,h), respectively. A, 1 and 2 show the percentage of networks classified as SSF, WSF, and NSF for varying σ at fixed μ=1 and for varying h at fixed l=3.5, respectively. These statistics are computed over ensembles of 2,000 networks for each choice of parameters σ and h. B, 1 and 2 show representative instances of the distribution in the range of parameters analyzed, whereas C, 1 and 2 display the corresponding value of the variance of the distribution. Note that we do not report results for varying μ at fixed σ nor for varying l at fixed h because we observe almost no dependency of the classification on these parameters.

Real-World Networks.

At last, we move to real network data. We consider a large set of empirical networks taken from the Index of Complex Networks (ICON) as well as from the Koblenz Network Collection (KONECT). These are the datasets used by Broido and Clauset (19) and Voitalov et al. (20). Materials and Methods has a discussion on how we built the dataset. Overall, we have networks belonging to 10 different categories: biological (protein–protein interaction), social (i.e., friendship and communication), affiliation, authorship (including coauthorship), citation, text (i.e., lexical), annotation (i.e., feature, folksonomy, rating), hyperlink, computer, and infrastructure. Fig. 6 shows results of the FSS analysis for selected network instances, whereas Fig. 7 and Table 1 summarize results of the scaling analysis for all of the networks considered. The main outcomes of the analysis are the following.

  • Fig. 7A: the scaling exponents d and dE obtained from the moment ratio test are compatible in most of the cases.

  • Fig. 7B: the value of γ computed from FSS is often in good agreement with Γ obtained from the maximum likelihood power law fit of the degree distribution (18).

  • Fig. 7C: the exponents γ and d of the scaling function are not independent but satisfy a universal relation d(γ+1)1, which derives from the nature of the degree cross-over in scale-free networks—namely, the maximum degree for which the power law behavior holds. According to Eq. 1, this is the value kc for which the scaling function f(x)0 [graphically speaking, when the master curve P(k)kγ falls down], corresponding to x1 whence kcNd. The analysis presented in Fig. 7C suggests that kcN1/(γ+1), and in agreement with theoretical results, we find that also the maximum degree of the network kmax scales in the same way (SI Appendix). However, this scaling behavior is somehow different from the kcN1/γ as predicted by hand-waving argument (4042), likely due to inner correlations in the networks that modify the value of the cross-over (41).

  • No particular relation between quality of collapse S and estimated exponent γ is found nor any clusterization of networks amenable to categories within the plane defined by these two variables (SI Appendix). However, this result is obtained when the different network categories are well balanced in the dataset because networks that are very similar tend instead to cluster together. This is for instance the case of protein interaction networks belonging to different species. In order to remove this artificial clustering effect, we have not considered in our dataset these (and other) cases of very similar networks nor repetitions of the same network (SI Appendix). This is the main reason why our dataset is apparently smaller than that used by Broido and Clauset (19).

  • Overall, as shown in Table 1, the 185 networks of our dataset are classified as SSF in 27% of the cases, WSF for 23%, and NSF for 50%. This classification, however, does vary substantially among the different network categories. On the one hand, biological networks are very often classified at least as WSF. The same happens for computer and hyperlink networks, with outliers given by the Gnutella peer-to-peer file sharing network [that has the same character of social networks (43)] and by some hyperlink networks restricted to specific domains, respectively. Citation and text networks are few in our analysis but are often scale free. On the other hand, infrastructure networks (i.e., road and flights network) are rarely scale free (with the notable exception of air traffic control systems), possibly because of the heavy cost of establishing a connection. Between these two extremes, there are the social and other kinds of networks (for instance, the well-known discussion of the Facebook case presented in refs. 44 and 45 and that of other information sharing social networks presented in ref. 46].

Fig. 6.

Fig. 6.

Scaling analysis (with N) on four real network instances. (A, 1; B, 1; C, 1; and D, 1) The 2005 version of the proteome-scale map for human binary protein–protein interactions (N=1,706, E=3,155) (36). (A, 2; B, 2; C, 2; and D, 2) The word adjacency graph extracted from the English text The Origin of Species by C. Darwin (N=7,724, E=46,281) (37). (A, 3; B, 3; C, 3; and D, 3) (Symmetrized) snapshot of the internet structure at the level of autonomous systems in 2007 (N=26,475, E=53,381) (38). (A, 4; B, 4; C, 4; and D, 4) The collaboration graph of authors of scientific papers from database systems and logic programming (DBLP) computer science bibliography (N=13,14,050, E=107,24,828) (39). A–C are analogous to those reported in Figs. 2 and 4, whereas D visually shows the classic plots of P(k) in double-logarithmic scale together with the plot of the estimated slope γ using FSS analysis.

Fig. 7.

Fig. 7.

Visual summary of results from the FSS analysis, in which each network dataset is represented as a point in a specific plane. A shows the relation between d and dE resulting from the moment ratio test, with the solid black line representing the identity. The other two panels refer to the scaling analysis with N. B shows the relation between γ computed from FSS and Γ from the maximum likelihood power law fit of the degree distribution (Materials and Methods). The solid line again represents the identity. C shows the relation between the exponents γ and d of the scaling function, with the solid black line representing the curve d=(γ+1)1 (the text has details).

Table 1.

Classification of empirical networks (split into categories)

Total Affiliation Annotation Authorship Biological Citation Computer Hyperlink Infrastructure Social Text
No. 185 8 38 15 30 5 13 14 12 39 11
SSF, % 27 63 21 27 40 40 39 22 0 13 55
WSF, % 23 12 24 20 30 0 38 21 17 18 27
NSF, % 50 25 55 53 30 60 23 57 83 69 18

For each category, we report the total number of networks and the percentages of SSF, WSF, and NSF instances. Detailed results on each network analyzed are in Dataset S1.

Discussion

Since the onset of network science, scale invariance of complex networks has been regarded as a universal feature present in real data (18, 4751) as well as reproduced in models (4, 34, 52–55). Thus, the recent claim by Broido and Clauset (19) that scale-free networks are rare created a stir, strengthening previous claims along the same direction (16, 18, 56). Voitalov et al. (20) replied to these arguments fitting data to generalized power laws: that is, regularly varying distributions p(k)=l(k)kλ [where l(k) is a function that varies slowly at infinity and thus, does not affect the power law tail]. By allowing deviations from the pure power law distribution at low k, they argued that scale-free networks are definitely not rare. Gerlach and Altmann (21) very recently touched on this issue, showing that correlations present in the data can lead to false rejections of statistical laws when using standard maximum likelihood recipes (in the case of networks, this can be important in the presence of degree–degree correlations).

In this work, we go beyond statistical arguments and apply powerful tools from the study of critical phenomena in physics to analyze a wide range of model and empirical networks. Here, we have shown that many of these networks spontaneously, without fine tuning, satisfy the FSS hypothesis, which in turn, supports the claim that complex networks are inherently scale free.

While a direct comparison with the results previously discussed would be interesting, the final results would not be meaningful, given the differences in the underlying hypotheses of the different models. We have shown how different hypotheses can lead to distinct results. The hypothesis underlying our approach, which came from results previously obtained in the field of statistical mechanics and critical phenomena, goes beyond the applications they were initially designed for and does not require the existence of a critical point. Together with previous work, our methodology fits in the bag of tools that a researcher can use in order to assess the scale-free character of a network.

Our scaling analysis is based on the extraction of small representations of the networks using a random node selection scheme. Of course, an intrinsic limitation of any rescaling method applied to network data is the impossibility to consider system sizes spanning orders of magnitude. As a further general remark, finding a robust method to rescale [or coarse grain (57, 58)] a network is still an open issue in the literature since networks are not embedded in any Euclidean space. Commonly used approaches lack generality since they are based on the choice of the embedding geometric space (59) or on the average path length (60). In order to avoid ad hoc assumptions, we decided to follow the simplest (although not necessarily the most accurate) scheme. As shown in SI Appendix, by averaging over many extractions of the subnetwork we are able to preserve the degree distribution of the original network, which is what we are interested in. Finally, note our claims regarding the self-similarity of the degree distribution, but we restrain ourselves in making general conclusions about the overall self-similarity of networks—this would involve the study of other quantities such as clustering, average path length, and so on (61).

Materials and Methods

Here, we report the steps to test the FSS hypothesis of Eq. 2 together with the moments ratio test of Eq. 5. Note that in order to test Eqs. 3 and 7, one uses the number of edges E (e) associated with each (sub-)network of size N (n) and replaces d with dE.

FSS Analysis.

Given an undirected network of size N, our analysis is based on the following steps.

  • 1)

    We compute the degree distribution p(k,N) and use the method of Clauset et al. (18, 32) to estimate the best-fitting power law parameters Γ+1 and kmin.

  • 2)

    We generate an ensemble of 100 subnetworks for each size n{N4,N2,3N4}. Each subsample is obtained by picking n nodes at random from the original network and by deleting all of the other nodes and the links incident to them. We then compute the mean degree distribution p(k,n) over each subnetwork ensemble.

  • 3)

    Both for the original network and for each subnetwork, we check whether the (average) number of nodes n* with kkmin is larger than lnN. If this condition is not met, we classify the network as NSF, and the analysis ends. Otherwise, we proceed by removing the region below kmin in both p(k,N) and each p(k,n) and renormalize them afterward. As explained in the text, this allows us to get rid of deviations at low degrees, including those induced by the subsampling (SI Appendix).

  • 4)

    Using the moment ratio test, we determine d (and its associated error) as follows. We compute a given moment ratio ki/ki1 on each (sub-)network of size n and use least squares to fit ln(ki/ki1) vs. lnn. We then average the resulting fit slope over different choices of the moments (indexed by i) to obtain d. Note that since this test is computationally less expensive than the collapse analysis (see below), we use more than four subnetwork sizes. In particular, we use 20 equally spaced values of n[N4,N], for each of which we compute the moments ratio (and associated error used as fit weight) over an ensemble of 100 n-sized subnetworks built as described above.

  • 5)

    For each (sub-)network size n{N4,N2,3N4,N}, we obtain the cumulative degree distribution P(k,n). We then determine the exponents γ and d (and their associated errors) that maximize the quality of the collapse plot (see below). Notably, the scaling exponent d obtained from the collapse is always compatible with that obtained from the moment ratio test. Hence, in order to decrease the computational cost of the method, one can in principle vary only γ while keeping d fixed at the value obtained from the moment ratios fit.

Quality of Collapse.

We now describe the procedure for deriving the master curve of the scaling function from the cumulative degree distributions of the various subnetworks, following the steps described in refs. 29 and 62. The key premise is that when these distributions are properly rescaled, they can be fitted by a single (master) curve. The quality of the collapse plot is then measured as the distance of the data from the master curve, and the collapse is good if all of the rescaled distributions overlap onto each other.

In practice for each (sub-)network size n{N4,N2,3N4,N}, we have the set {j} of ordered points for the cumulative degree distribution in the form {(kj,P(kj,n))}j. After applying the scaling laws, we have

xnj=kjndynj=P(kj,n)kjγ

so that xnj is the rescaled jth degree in the distribution of the n-sized subnetwork and ynj is the rescaled value of such distribution relative to the jth degree. We also assign an error on the latter quantity as dynj=dP(kj,n)kjγ, where dP(kj,n) is the Poisson error on the count P(kj,n) (SI Appendix).

The master curve Y is the function best fitting all these points. We define the quality of the collapse as

S=13|M|(n,j)M(ynjYnj)2dynj2+dYnj2, [8]

where Ynj and dYnj are the estimated position and SE of the master curve at xnj, while M is the set of terms of the sum (roughly, the set of points for which the curves for the various n overlap).

For each xnj, in order to define Ynj and dYnj we first need to select a set of points mnj as follows. In each of the other sets nn, we select (and put in mnj) the two points j and j+1 that best approximate xnj from below and above [i.e., the two points such that xnjxnjxn(j+1)]. If this procedure fails to select two points for each nn, then Ynj and dYnj are undefined at xnj, which thus does not contribute to S (this happens if set n is alone in this region of x and is the master curve by itself). Otherwise, we compute Ynj and dYnj using a linear fit through the selected points in (n,l)mnj, so that Ynj is the value of that straight line at xnj and dYnj is the associated SE:

Ynj=WxxWyWxWxyη+xnjWWxyWxWyη [9]
dYnj2=1η(Wxx2xnjWx+xnj2W), [10]

where wnl=1/dynl2 for the fit weights and W=(nl)mnjwnl, Wx=(nl)mnjwnlxnl, Wy=(nl)mnjwnlynl, Wxx=(nl)mnjwnlxnl2, Wxy=(nl)mnjwnlxnlynl, and η=WWxxWx2 for the fit parameters.

The quality of the collapse S measures the mean square distance of the sets to the master curve in units of SEs, analogously to a χ2 test (29). The number of degrees of freedom can be estimated by noting that each of the |M| points of the sum of S has in turn three intrinsic degrees of freedom: |m| points as described above (six in our case) minus two from computing mean and variance of Y, minus one. Hence, by using 3|M| as normalization factor, S should be around one if the data really collapse to a single curve and much larger otherwise.

We optimize the quality S of the collapse by varying the scaling exponents γ in the interval Γ0.5γΓ+0.5 and d in the interval d0.1γd+0.1. The errors associated with γ and d are estimated with an S+1 analysis: Δγ is such that S(γ+Δγ)=S(γ)+1, and Δd is such that S(d+Δd)=S(d)+1.

Dataset

We extract a collection of real network data from the ICON at https://icon.colorado.edu/ as well as the KONECT at https://west.uni-koblenz.de/konect. The full list of networks we consider together with detailed results of the FSS analysis is reported in Dataset S1. To define the dataset, we select networks (removing duplicates appearing in both ICON and KONECT) according to the following criteria.

First, to allow for a reliable scaling analysis, we only use networks with N>1,000 and E>1,000 (for computational reasons, we did not consider networks with more than 50 million links). We then include undirected networks, as well as the undirected version of both directed and bipartite networks. Similarly, we consider binary networks as well as the binarized version of weighted and multiedge networks. We, however, ignore networks that are marked as incomplete in the database. Importantly, among database entries that possibly represent the same real-world network we select only one (or at most, a few) entry, and consistently, we do the same for temporal networks (when there is only one snapshot, we ignore the time stamp of links).

In practice, in KONECT we select only the Wikipedia-related networks in English language. For ICON, the implications are more profound. We ignore interactomes of the same species extracted from different experiments, the (almost 100) fungal growth networks, the (more than 100) Norwegian boards of directors graphs, the (more than 100) Center for Applied Internet Data Analysis snapshots denoting autonomous system relationships on the internet, networks of software function for Callgraphs, and digital circuits ITC99 and ISCAS89. We consider only one instance of Gnutella peer-to-peer file sharing network, as well as a few instances of the (more than 50) within-college Facebook social networks and of the (about 50) US states road networks. Among the (more than 100) Kyoto Encyclopedia of Genes and Genomes metabolic networks, we select 17 species trying to balance the different taxonomies.

Thus, in our analysis, we do employ the same data source used by Broido and Clauset (19), but we avoid overrepresented network instances. As explained in the text, this procedure removes the clustering of similar networks as shown in the SI Appendix and leads to less biased conclusions on the scale-free nature of networks belonging to different categories.

Supplementary Material

Supplementary File
Supplementary File
pnas.2013825118.sd01.xlsx (59.4KB, xlsx)

Acknowledgments

G. Cimini and G. Caldarelli acknowledge support from European Project SoBigData++ Grant GA. 871042. G. Caldarelli acknowledges support from Humane-AI-Net grant 952026. A.M. acknowledges support from the University of Padova through “Excellence Project 2018” of the Cariparo Foundation.

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2013825118/-/DCSupplemental.

Data Availability.

All raw network data are available from ICON and KONECT, while results are in Dataset S1.

References

  • 1.Barabási A. L., Network Science (Cambridge University Press, 2016). [Google Scholar]
  • 2.Caldarelli G., Scale-Free Networks (Oxford University Press, 2007). [Google Scholar]
  • 3.Cimini G., et al. , The statistical physics of real-world networks. Nat. Rev. Phys. 1, 58–71 (2019). [Google Scholar]
  • 4.Barabási A. L., Albert R., Emergence of scaling in random networks. Science 286, 509–512 (1999). [DOI] [PubMed] [Google Scholar]
  • 5.Klarreich E., Scant evidence of power laws found in real-world networks. Quanta Magazine, 15 February 2018. https://www.quantamagazine.org/scant-evidence-of-power-laws-found-in-real-world-networks-20180215/. Accessed 15 December 2020.
  • 6.Fisher M. E., The theory of equilibrium critical phenomena. Rep. Prog. Phys. 30, 615–730 (1967). [Google Scholar]
  • 7.Stanley H. E., Introduction to Phase Transitions and Critical Phenomena (Oxford University Press, 1971). [Google Scholar]
  • 8.Binder K., Heermann D., Monte Carlo Simulation in Statistical Physics: An Introduction (Springer-Verlag, 1992). [Google Scholar]
  • 9.Kim J. K., De Souza A. J., Landau D. P., Numerical computation of finite size scaling functions: An alternative approach to finite size scaling. Phys. Rev. E 54, 2291–2297 (1996). [DOI] [PubMed] [Google Scholar]
  • 10.Stanley H. E., Scaling, universality, and renormalization: Three pillars of modern critical phenomena. Rev. Mod. Phys. 71, S358–S366 (1999). [Google Scholar]
  • 11.Redner S., A Guide to First-Passage Processes (Cambridge University Press, 2001). [Google Scholar]
  • 12.Corral Á., Garcia-Millan R., Font-Clos F., Exact derivation of a finite-size scaling law and corrections to scaling in the geometric Galton-Watson process. PloS One 11, e0161586 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Goldenfeld N., Kadanoff L. P., Simple lessons from complexity. Science 284, 87–89 (1999). [DOI] [PubMed] [Google Scholar]
  • 14.Barrat A., Barthélemy M., Pastor-Satorras R., Vespignani A., The architecture of complex weighted networks. Proc. Natl. Acad. Sci. U.S.A. 101, 3747–3752 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zhou B., Meng X., Stanley H. E., Power-law distribution of degree-degree distance: A better representation of the scale-free property of complex networks. Proc. Natl. Acad. Sci. U.S.A. 117, 14812–14818 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Stumpf M. P. H., Porter M. A., Critical truths about power laws. Science 335, 665–666 (2012). [DOI] [PubMed] [Google Scholar]
  • 17.Moreno-Sánchez I., Font-Clos F., Corral Á., Large-scale analysis of Zipf’s law in English texts. PloS One 11, e0147073 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Clauset A., Shalizi C. R., Newman M. E., Power-law distributions in empirical data. SIAM Rev. 51, 661–703 (2009). [Google Scholar]
  • 19.Broido A. D., Clauset A., Scale-free networks are rare. Nat. Commun. 10, 1017 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Voitalov I., van der Hoorn P., van der Hofstad R., Krioukov D., Scale-free networks well done. Phys. Rev. Res. 1, 033034 (2019). [Google Scholar]
  • 21.Gerlach M., Altmann E. G., Testing statistical laws in complex systems. Phys. Rev. Lett. 122, 168301 (2019). [DOI] [PubMed] [Google Scholar]
  • 22.De Gennes P. G., Gennes P. G., Scaling Concepts in Polymer Physics (Cornell University Press, 1979). [Google Scholar]
  • 23.Safran S., Statistical Thermodynamics of Surfaces, Interfaces, and Membranes (CRC Press, 2018). [Google Scholar]
  • 24.Nelson D. R., Piran T., Weinberg S., Statistical Mechanics of Membranes and Surfaces (World Scientific, 2004). [Google Scholar]
  • 25.Bak P., Tang C., Wiesenfeld K., Self-organized criticality: An explanation of the 1/f noise. Phys. Rev. Lett. 59, 381–384 (1987). [DOI] [PubMed] [Google Scholar]
  • 26.Newman M., Autonomous systems (dimacs10). http://www-personal.umich.edu/∼mejn/netdata/. Accessed 15 December 2020.
  • 27.Krapivsky P. L., Redner S., Finiteness and fluctuations in growing networks. J. Phys. Math. Gen. 35, 9517–9534 (2002). [Google Scholar]
  • 28.Bhattacharjee S. M., Seno F., A measure of data collapse for scaling. J. Phys. Math. Gen. 34, 6375 (2001). [Google Scholar]
  • 29.Houdayer J., Hartmann A. K., Low-temperature behavior of two-dimensional Gaussian Ising spin glasses. Phys. Rev. B 70, 014418 (2004). [Google Scholar]
  • 30.Stumpf M. P. H., Wiuf C., May R. M., Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proc. Natl. Acad. Sci. U.S.A. 102, 4221–4224 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Lee S. H., Kim P. J., Jeong H., Statistical properties of sampled networks. Phys. Rev. E 73, 016102 (2006). [DOI] [PubMed] [Google Scholar]
  • 32.Alstott J., Bullmore E., Plenz D., powerlaw: A python package for analysis of heavy-tailed distributions. PloS One 9, e85777 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Erdös P., Rényi A., On random graphs. Publ. Math. Debr. 6, 290–297 (1959). [Google Scholar]
  • 34.Dorogovtsev S. N., Mendes J. F. F., Samukhin A. N., Structure of growing networks with preferential linking. Phys. Rev. Lett. 85, 4633–4636 (2000). [DOI] [PubMed] [Google Scholar]
  • 35.Viger F., Latapy M., “Efficient and simple generation of random simple connected graphs with prescribed degree sequence” in Computing and Combinatorics, Wang L., Ed. (Springer, Berlin, Germany, 2005), pp. 440–449. [Google Scholar]
  • 36.Stelzl U., et al. , A human protein-protein interaction network: A resource for annotating the proteome. Cell 122, 957–968 (2005). [DOI] [PubMed] [Google Scholar]
  • 37.Milo R., et al. , Superfamilies of evolved and designed networks. Science 303, 1538–1542 (2004). [DOI] [PubMed] [Google Scholar]
  • 38.Leskovec J., Kleinberg J., Faloutsos C., Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1, 2 (2007). [Google Scholar]
  • 39.Ley M., “The dblp computer science bibliography: Evolution, research issues, perspectives” in String Processing and Information Retrieval, Laender A. H. F., Oliveira A. L., Eds. (Springer, Berlin, Germany, 2002), pp. 1–10. [Google Scholar]
  • 40.Dorogovtsev S. N., Mendes J. F. F., Evolution of networks. Adv. Phys. 51, 1079–1187 (2002). [Google Scholar]
  • 41.Boguñá M., Pastor-Satorras R., Vespignani A., Cut-offs and finite size effects in scale-free networks. Euro. Phys. J. B 38, 205–209 (2004). [Google Scholar]
  • 42.Aiello W., Chung F., Lu L., A random graph model for power law graphs. Exp. Math. 10, 53–66 (2001). [Google Scholar]
  • 43.Wang Y., Yun X., Li Y., “Analyzing the characteristics of Gnutella overlays” in Fourth International Conference on Information Technology (ITNG’07) (IEEE, New York, NY, 2007), pp. 1095–1100. [Google Scholar]
  • 44.Gjoka M., Kurant M., Butts C. T., Markopoulou A., “Walking in Facebook: A case study of unbiased sampling of osns” in 2010 Proceedings IEEE INFOCOM (IEEE, New York, NY, 2010), pp. 1–9. [Google Scholar]
  • 45.Ugander J., Karrer B., Backstrom L., Marlow C., The anatomy of the Facebook social graph. https://arxiv.org/abs/1111.4503 (18 November 2011).
  • 46.Zhou T., Medo M., Cimini G., Zhang Z. K., Zhang Y. C., Emergence of scale-free leadership structure in social recommender systems. PloS One 6, e20648 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Faloutsos M., Faloutsos P., Faloutsos C., “On power-law relationships of the Internet topology” in ACM SIGCOMM Computer Communication Review (ACM Press, New York, NY, 1999), vol. 29, pp. 251–262. [Google Scholar]
  • 48.Adamic L. A., Huberman B. A., Power-law distribution of the world wide web. Science 287, 2115 (2000). [Google Scholar]
  • 49.Caldarelli G., Marchetti R., Pietronero L., The fractal properties of Internet. Europhys. Lett. 52, 386–391 (2000). [Google Scholar]
  • 50.Goh K. I., Oh E., Jeong H., Kahng B., Kim D., Classification of scale-free networks. Proc. Natl. Acad. Sci. U.S.A. 99, 12583–12588 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Mislove A., Marcon M., Gummadi K. P., Druschel P., Bhattacharjee B., “Measurement and analysis of online social networks” in Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, IMC ’07 (ACM, New York, NY, 2007), pp. 29–42. [Google Scholar]
  • 52.Bianconi G., Barabási A., Bose-Einstein condensation in complex network. Phys. Rev. Lett. 86, 5632–5635 (2001). [DOI] [PubMed] [Google Scholar]
  • 53.Caldarelli G., Capocci A., De Los Rios P., Muñoz M. A., Scale-free networks from varying vertex intrinsic fitness. Phys. Rev. Lett. 89, 258702 (2002). [DOI] [PubMed] [Google Scholar]
  • 54.Medo M., Cimini G., Gualdi S., Temporal effects in the growth of networks. Phys. Rev. Lett. 107, 238701 (2011). [DOI] [PubMed] [Google Scholar]
  • 55.Mitzenmacher M., A brief history of generative models for power-law and log-normal. Internet Math. 1, 226–251 (2004). [Google Scholar]
  • 56.Tanaka R., Scale-rich metabolic networks. Phys. Rev. Lett. 94, 168101 (2005). [DOI] [PubMed] [Google Scholar]
  • 57.Song C., Havlin S., Makse H. A., Origins of fractality in the growth of complex networks. Nat. Phys. 2, 275–281 (2006). [Google Scholar]
  • 58.Papadopoulos F., Kitsak M., Serrano M. Á., Boguná M., Krioukov D., Popularity versus similarity in growing networks. Nature 489, 537–540 (2012). [DOI] [PubMed] [Google Scholar]
  • 59.García-Pérez G., Boguñá M., Serrano M. A., Multiscale unfolding of real networks by geometric renormalization. Nat. Phys. 14, 583–589 (2018). [Google Scholar]
  • 60.Song C., Havlin S., Makse H. A., Self-similarity of complex networks. Nature 433, 392–395 (2005). [DOI] [PubMed] [Google Scholar]
  • 61.Kim J. S., Goh K. I., Kahng B., Kim D., Fractality and self-similarity in scale-free networks. New J. Phys. 9, 177 (2007). [Google Scholar]
  • 62.Melchert O., autoscale.py–A program for automatic finite-size scaling analyses: A user’s guide. https://arxiv.org/abs/0910.5403 (28 October 2009).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary File
Supplementary File
pnas.2013825118.sd01.xlsx (59.4KB, xlsx)

Data Availability Statement

All raw network data are available from ICON and KONECT, while results are in Dataset S1.


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

RESOURCES