PLoS One. 2011 Jul 27;6(7):e22557. doi: 10.1371/journal.pone.0022557

Fast Computing Betweenness Centrality with Virtual Nodes on Large Sparse Networks

Jing Yang, Yingwu Chen
Editor: Marco Tomassini
PMCID: PMC3144890  PMID: 21818337

Abstract

Betweenness centrality is an essential index for the analysis of complex networks. However, the calculation of betweenness centrality is quite time-consuming: the fastest known algorithm uses $O(nm + n^2 \log n)$ time and $O(n + m)$ space for weighted networks, where $n$ and $m$ are the number of nodes and edges in the network, respectively. By inserting virtual nodes into the weighted edges and transforming the shortest path problem into a breadth-first search (BFS) problem, we propose an algorithm that can compute the betweenness centrality in $O(\bar{w} D n^2)$ time for integer-weighted networks, where $\bar{w}$ is the average weight of edges and $D$ is the average degree in the network. Considerable time can be saved with the proposed algorithm when $\bar{w} < \log n / D + 1$, indicating that it is suitable for lightly weighted large sparse networks. A similar concept of virtual node transformation can be used to calculate other shortest path based indices such as closeness centrality, graph centrality, stress centrality, and so on. Numerical simulations on various randomly generated networks reveal that it is feasible to use the proposed algorithm in large network analysis.

Introduction

Networks, especially complex networks, have been extensively studied during the last decade [1]–[3]. Owing to the ability to gather and analyze large-scale data using computers and communication networks, studies on networks with millions of vertices (nodes) are now quite common. The shift from simple small graphs to large complex networks has increasingly contributed new findings of critical phenomena and the development of theories in many fields, such as the scale-free distribution of network degrees [4], [5], burstiness of human behaviors [6], vulnerability of internet networks [7], [8], and so on [1]–[3], [9].

However, the computation of several network properties, such as the shortest paths, betweenness centrality and closeness centrality, is hindered by high computational complexity [3], [10]. As a result, many large-scale networks are regarded as unweighted when the above measures are reported [2], [3]. Considerable effort has been made to improve the efficiency of algorithms for calculating those network properties [10], [11]. Take the betweenness centrality, for example [12], [13]: for a weighted network $G$ with $n$ nodes and $m$ edges, the naive algorithm requires $O(n^3)$ time and $O(n^2)$ storage, regardless of the algorithm used to find the shortest paths. A much faster algorithm proposed by Brandes [14], on the other hand, can calculate the betweenness centrality in $O(nm + n^2 \log n)$ time and $O(n + m)$ space when the shortest paths are calculated by Dijkstra's algorithm implemented with a Fibonacci heap. Parallel algorithms have also been proposed to improve the efficiency of calculating betweenness centrality [10], [11], [15]–[21]: for example, Bader and Madduri [10] proposed a betweenness centrality algorithm for high-end shared-memory symmetric multiprocessor and multithreaded architectures, with which it is “possible” to achieve the computation in $O((nm + n^2 \log n)/p)$ time with access conflicts, where $p$ is the number of processors used. However, parallel algorithms require much more complex programming and are highly dependent on the hardware: for example, in their study [10], Bader and Madduri used an IBM p5 570 with 16 processors and 20 GB of RAM. Such equipment is not readily available to most network researchers.

To circumvent the large time complexity of calculating betweenness centrality, we propose a new algorithm for integer-weighted networks in this paper. By replacing the weighted edges with connected virtual nodes, the new algorithm computes the betweenness centrality in $O(\bar{w} D n^2)$ time and $O(\bar{w} D n)$ space, with $\bar{w}$ and $D$ being the average edge weight and average degree of the network, respectively.

Methods

Brandes' Algorithm

Given a network $G = (V, E)$, with $n = |V|$ the number of nodes and $m = |E|$ the number of edges, we consider for the purpose of this study strongly connected networks [22] with no self-loops. Let $W$ be the weight matrix of $G$, where $w_{ij}$ is the weight on edge $(i, j)$. In practice, $w_{ij}$ can be distances between airports, information flows between computers, traffic loads between cities, etc.

Let $\sigma_{st}$ denote the number of shortest paths from node $s$ to $t$, and $\sigma_{st}(v)$ be the number of shortest paths from $s$ to $t$ that pass through $v$. The betweenness centrality of node $v$ is then defined as [13], [14]:

$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$$ (1)

From the definition we can see that betweenness centrality is the sum, over all pairs of nodes, of the fraction of shortest paths passing through the node. High betweenness centrality indicates that a node can reach others (or be reached by others) with relatively short paths, or that the node lies on a considerable fraction of the shortest paths connecting others. In many fields, the betweenness centrality can be regarded as a measure of the extent to which the node has control over information flowing between others, and it is thus a core index for evaluating the importance of nodes in the network [13], [23]. For example, in the study of network vulnerability to attacks, the removal of nodes with the highest betweenness centrality is shown to be one of the most harmful strategies that can break down the networks [8].

A straightforward way of calculating the betweenness centrality then uses the following steps:

Step 1 Compute the length and number of shortest paths between all pairs of nodes;

Step 2 For each node $v$, calculate $\delta_{st}(v) = \sigma_{st}(v) / \sigma_{st}$ (pair dependency) for each pair and sum them up.
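The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: nodes are labelled `0..n-1`, the network is undirected, edges are given as hypothetical `(u, v, w)` tuples with positive integer weights, and Step 1 uses the Floyd–Warshall recurrence with path counting, which is a choice made here and not one prescribed by the paper.

```python
def naive_betweenness(n, edges):
    """Naive betweenness: all-pairs shortest paths, then explicit summation.

    n     -- number of nodes, labelled 0..n-1 (assumed convention)
    edges -- iterable of (u, v, w) tuples, undirected, positive integer w
    """
    INF = float("inf")
    dist = [[INF] * n for _ in range(n)]
    sigma = [[0] * n for _ in range(n)]  # sigma[s][t]: number of shortest s-t paths
    for i in range(n):
        dist[i][i], sigma[i][i] = 0, 1
    for u, v, w in edges:
        dist[u][v] = dist[v][u] = w
        sigma[u][v] = sigma[v][u] = 1

    # Step 1: lengths and numbers of shortest paths between all pairs.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == k or j == k or i == j:
                    continue
                through = dist[i][k] + dist[k][j]
                if through < dist[i][j]:
                    dist[i][j] = through
                    sigma[i][j] = sigma[i][k] * sigma[k][j]
                elif through == dist[i][j] and through < INF:
                    sigma[i][j] += sigma[i][k] * sigma[k][j]

    # Step 2: sum the pair dependencies sigma_st(v) / sigma_st over all pairs.
    bc = [0.0] * n
    for v in range(n):
        for s in range(n):
            for t in range(n):
                if s == t or s == v or t == v or sigma[s][t] == 0:
                    continue
                if dist[s][v] + dist[v][t] == dist[s][t]:
                    bc[v] += sigma[s][v] * sigma[v][t] / sigma[s][t]
    return bc


# Path 0 -(1)- 1 -(2)- 2: node 1 lies on the only path between 0 and 2,
# counted once for each ordered pair.
print(naive_betweenness(3, [(0, 1, 1), (1, 2, 2)]))  # [0.0, 2.0, 0.0]
```

The triple loop in Step 2 is the cubic summation that the rest of this section sets out to avoid.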

Obviously, the complexity of the naive algorithm is dominated by the second step, which requires $O(n^3)$ time for the summation and $O(n^2)$ storage for the pair dependencies. To introduce Brandes' algorithm, we first define the set of predecessors of node $v$ on the shortest paths from $s$:

$$P_s(v) = \{u \in V : (u, v) \in E,\ d(s, v) = d(s, u) + w_{uv}\}$$ (2)

where $d(s, v)$ is the distance of the shortest path from $s$ to $v$. Then the number of shortest paths from $s$ to $v$ can be calculated as:

$$\sigma_{sv} = \sum_{u \in P_s(v)} \sigma_{su}$$ (3)

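To eliminate the need for explicit summation of all pair dependencies, Brandes [14] defines the dependency of a source node $s$ on node $v$ as the sum of the pair dependencies $\delta_{st}(v) = \sigma_{st}(v) / \sigma_{st}$ over all targets:

$$\delta_{s\bullet}(v) = \sum_{t \in V} \delta_{st}(v)$$ (4)

$\delta_{s\bullet}(v)$ has the recursive property that

$$\delta_{s\bullet}(v) = \sum_{x :\, v \in P_s(x)} \frac{\sigma_{sv}}{\sigma_{sx}} \left(1 + \delta_{s\bullet}(x)\right)$$ (5)

Note that $\delta_{s\bullet}(v)$ is merely a partial sum of Eq. (1); the betweenness centrality can then be expressed as:

$$C_B(v) = \sum_{s \neq v \in V} \delta_{s\bullet}(v)$$ (6)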

The summation of pair dependencies is thus reduced to the accumulation of dependencies defined by Eq. (5). Specifically, given the shortest paths from $s$ in $G$, the array storing $\delta_{s\bullet}(v)$ for all nodes can be recursively calculated according to Eq. (5) by traversing the nodes in non-increasing order of their distances from $s$. An illustrative algorithm is shown in Algorithm 1. The calculation for Step 2 now takes $O(nm)$ time and $O(n + m)$ space, so the complexity of computing betweenness centrality is determined by the shortest path algorithm used in Step 1. Using Dijkstra's algorithm implemented with a Fibonacci heap [24], which requires $O(m + n \log n)$ time for the single source shortest path problem [25], the betweenness centrality can be computed by Brandes' algorithm in $O(nm + n^2 \log n)$ time and $O(n + m)$ space on weighted networks [14].
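As an illustration of the accumulation scheme, the following Python sketch pairs Dijkstra's algorithm with the dependency recursion of Eq. (5). It is not the authors' C code: it assumes an undirected network given as hypothetical `(u, v, w)` tuples with positive weights and nodes labelled `0..n-1`, and it uses the binary heap from the standard `heapq` module instead of a Fibonacci heap, so each source costs $O(m \log n)$ rather than $O(m + n \log n)$.

```python
import heapq


def brandes_weighted(n, edges):
    """Betweenness via Dijkstra plus dependency accumulation (Eq. 5)."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    bc = [0.0] * n
    for s in range(n):
        dist = [None] * n              # shortest distance from s
        sigma = [0] * n                # number of shortest paths from s
        pred = [[] for _ in range(n)]  # predecessors on shortest paths
        done = [False] * n
        order = []                     # nodes in non-decreasing distance
        dist[s], sigma[s] = 0, 1
        heap = [(0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if done[v]:
                continue               # stale heap entry
            done[v] = True
            order.append(v)
            for x, w in adj[v]:
                nd = d + w
                if dist[x] is None or nd < dist[x]:
                    dist[x], sigma[x], pred[x] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, x))
                elif nd == dist[x]:
                    sigma[x] += sigma[v]
                    pred[x].append(v)

        # Accumulate dependencies, most distant nodes first.
        delta = [0.0] * n
        while order:
            x = order.pop()
            for v in pred[x]:
                delta[v] += sigma[v] / sigma[x] * (1 + delta[x])
            if x != s:
                bc[x] += delta[x]
    return bc


# Triangle in which the direct edge 0-2 (weight 2) ties with the path via 1.
print(brandes_weighted(3, [(0, 1, 1), (1, 2, 1), (0, 2, 2)]))  # [0.0, 1.0, 0.0]
```

Node 1 carries half of the shortest paths between 0 and 2 in each direction, which gives the value 1.0.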

Computing Betweenness Centrality with Virtual Nodes

Brandes' algorithm has greatly reduced the computational burden for betweenness centrality. However, the time complexity is still too high for networks with millions of nodes, since the shortest path algorithm remains costly. In this section, we propose a new algorithm that can reduce the time complexity in Step 1, such that the betweenness centrality can be calculated within reasonable time under certain conditions.

Replacement of Weighted Edges

Our new algorithm originates from the idea that an integer-weighted network can be broken down into a simple unweighted network with virtual nodes, such that the calculation of shortest paths in Step 1 can be solved as a breadth-first search (BFS) problem.

Algorithm 1: Brandes' algorithm [14].

$C_B(v) \leftarrow 0,\ \forall v \in V$
for $s \in V$ do
  $[P, \sigma, S]$ = single source shortest path algorithm()
  /* $P(v)$: set of predecessors of $v$ on shortest paths from $s$ to $v$; */
  /* $\sigma(v)$: array storing the number of shortest paths from $s$ to $v$; */
  /* $S$: stack storing the nodes so that they are popped in non-increasing order of distance from $s$; */
  /* accumulate dependency from the most distant nodes */
  $\delta(v) \leftarrow 0,\ \forall v \in V$
  while $S$ not empty do
    pop $x \leftarrow S$
    for $v \in P(x)$ do $\delta(v) \leftarrow \delta(v) + \frac{\sigma(v)}{\sigma(x)} (1 + \delta(x))$
    if $x \neq s$ then $C_B(x) \leftarrow C_B(x) + \delta(x)$
  end
end

Figure 1 illustrates the representation of an undirected weighted network by an undirected unweighted network with three additional virtual nodes. The two edges with weights 3 and 2 are replaced by 3 and 2 unit edge segments, with two and one virtual nodes inserted, respectively. The number of virtual nodes to be inserted on a weighted edge $(i, j)$ is then $w_{ij} - 1$.

Figure 1. Illustration of representing the weighted network (a) by an unweighted network with virtual nodes (b).

Let $G' = (V', E')$ be the unweighted representation of $G$ with virtual nodes, where $V' = V \cup V_v$ with $V_v$ the set of virtual nodes. The number of virtual nodes in $G'$ is then $|V_v| = m(\bar{w} - 1)$, where $\bar{w}$ is the average edge weight.
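The replacement of weighted edges can be sketched as follows. The function name, the `(u, v, w)` edge tuples and the convention that virtual nodes receive the labels `n, n+1, ...` are assumptions made for this illustration only.

```python
def expand_with_virtual_nodes(n, edges):
    """Replace each edge of integer weight w by w unit edges.

    Returns (n_total, unit_edges), where real nodes keep labels 0..n-1 and
    virtual nodes are labelled n, n+1, ... (assumed convention).
    """
    unit_edges = []
    next_id = n
    for u, v, w in edges:
        prev = u
        for _ in range(w - 1):          # w - 1 virtual nodes on this edge
            unit_edges.append((prev, next_id))
            prev = next_id
            next_id += 1
        unit_edges.append((prev, v))    # last unit segment reaches v
    return next_id, unit_edges


# Three edges with weights 3, 2 and 1: average weight 2, so m(w - 1) = 3
# virtual nodes are inserted and the node count grows from 3 to 6.
n_total, unit_edges = expand_with_virtual_nodes(3, [(0, 1, 3), (1, 2, 2), (0, 2, 1)])
print(n_total, len(unit_edges))  # 6 6
```

The number of unit edges equals the sum of the original weights, which is the quantity that drives the cost of each BFS on the expanded network.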

Virtual Node Based Algorithm for Betweenness Centrality

Obviously, the insertion of virtual nodes does not change the distances between pairs of nodes in $V$, and consequently the number of shortest paths between these nodes is the same as in $G$. The calculation of shortest paths on the unweighted representation $G'$ can then be solved by the BFS algorithm, instead of the traditional Dijkstra's algorithm.

However, before applying the BFS on $G'$ to calculate the betweenness centrality for nodes in $G$, there is at least one problem to be solved: to use the existing theory on the summation of pair dependencies in Algorithm 1, the predecessors of nodes in $V$ recorded during the shortest path calculation in $G'$ should be kept the same as if they were calculated by any shortest path algorithm in $G$. This can be achieved as follows: suppose the BFS finds a shortest path from $s$ to $t$: $s \to \cdots \to u \to v_1 \to v_2 \to t$, where $v_1$, $v_2$ are two virtual nodes inserted on edge $(u, t)$, then the predecessor of $v_1$, which is $u$, can be passed through $v_2$ to the next non-virtual node $t$:

$$P_s(v_1) = \{u\} \;\Rightarrow\; P_s(v_2) = P_s(v_1) = \{u\} \;\Rightarrow\; u \in P_s(t)$$

An implementation of the above process is presented in Algorithm 2. The steps for the accumulation of dependency are identical to those in Brandes' algorithm and are therefore omitted.

Algorithm 2: Virtual node algorithm for betweenness centrality

$C_B(v) \leftarrow 0,\ \forall v \in V$
for $s \in V$ do
  $S \leftarrow$ empty stack;
  $P(v) \leftarrow$ empty list, $\forall v \in V'$;
  $\sigma(v) \leftarrow 0$, $\forall v \in V'$; $\sigma(s) \leftarrow 1$;
  $d(v) \leftarrow -1$, $\forall v \in V'$; $d(s) \leftarrow 0$;
  $Q \leftarrow$ empty queue;
  enqueue $s \rightarrow Q$;
  while $Q$ not empty do
    dequeue $v \leftarrow Q$;
    push $v \rightarrow S$;
    foreach neighbor $x$ of $v$ do
      if $d(x) < 0$ then /* visit $x$ the first time */
        enqueue $x \rightarrow Q$;
        $d(x) \leftarrow d(v) + 1$;
      end
      if $d(x) = d(v) + 1$ then
        $\sigma(x) \leftarrow \sigma(x) + \sigma(v)$;
        if $v \in V$ then
          append $v \rightarrow P(x)$;
        else /* if $v$ is a virtual node, retrieve the latest non-virtual node as predecessor */
          append $P(v) \rightarrow P(x)$;
        end
      end
    end
  end
  accumulate dependency() /* as shown in Algorithm 1 */
end
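A runnable sketch of the virtual node idea is given below. It is an interpretation of Algorithm 2 and not the authors' C implementation, and it rests on several assumptions: the network is undirected with hypothetical `(u, v, w)` edge tuples and positive integer weights, real nodes are labelled `0..n-1` with virtual nodes appended after them, and only real nodes are placed on the stack so that the accumulation step of Algorithm 1 touches each real predecessor exactly once.

```python
from collections import deque


def vn_betweenness(n, edges):
    """Betweenness of the n real nodes via BFS on the expanded network."""
    # Build G': each edge of weight w becomes a chain of w unit edges.
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        prev = u
        for _ in range(w - 1):
            adj.append([])
            new = len(adj) - 1          # label of the new virtual node
            adj[prev].append(new)
            adj[new].append(prev)
            prev = new
        adj[prev].append(v)
        adj[v].append(prev)
    total = len(adj)

    bc = [0.0] * n
    for s in range(n):
        dist = [-1] * total
        sigma = [0] * total
        pred = [[] for _ in range(total)]   # always holds real nodes only
        stack = []
        dist[s], sigma[s] = 0, 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            if v < n:
                stack.append(v)             # assumption: skip virtual nodes
            for x in adj[v]:
                if dist[x] < 0:             # first visit
                    dist[x] = dist[v] + 1
                    queue.append(x)
                if dist[x] == dist[v] + 1:
                    sigma[x] += sigma[v]
                    if v < n:
                        pred[x].append(v)
                    else:                   # pass on the last real node
                        pred[x].extend(pred[v])

        # Accumulation of dependencies as in Algorithm 1, on real nodes.
        delta = [0.0] * n
        while stack:
            x = stack.pop()
            for v in pred[x]:
                delta[v] += sigma[v] / sigma[x] * (1 + delta[x])
            if x != s:
                bc[x] += delta[x]
    return bc


print(vn_betweenness(3, [(0, 1, 1), (1, 2, 1), (0, 2, 2)]))  # [0.0, 1.0, 0.0]
```

On this triangle the result matches what a Dijkstra-based computation gives for the weighted network, since the virtual node on edge 0-2 only relays node 0 (or node 2) as predecessor.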

Note that in Algorithm 2, we do not need to calculate shortest paths between virtual nodes. Each BFS then requires $O(n + \bar{w} m)$ time. For the sake of clarity, let $D$ be the average degree of nodes in $G$ such that $m = O(Dn)$; then we have $O(n + \bar{w} m) = O(\bar{w} D n)$. The computation of betweenness centrality with virtual nodes (the VN algorithm) is dominated by the BFS and has a time complexity of $O(\bar{w} D n^2)$, and needs $O(\bar{w} D n)$ space.

Compared with Brandes' algorithm, we can see that the VN algorithm will perform better when $\bar{w} D n^2 < D n^2 + n^2 \log n$, that is, $\bar{w} < \log n / D + 1$. We henceforth denote $\bar{w}^* = \log n / D + 1$ as the critical threshold for the average edge weight on a network; if $\bar{w} < \bar{w}^*$, the VN algorithm will be able to calculate the betweenness centrality faster than Brandes' algorithm. Figure 2 shows the distribution of $\bar{w}^*$ over the domain of combinations of different network sizes and average degrees. We can see that the advantage of the VN algorithm becomes evident when the network is large and sparse. For example, when the network size is 1 million ($n = 10^6$) and the average degree is 5, the VN algorithm would be faster for networks with $\bar{w} < 5$; for the same average degree, $\bar{w}^*$ increases to 7 when the network size reaches 1 billion ($n = 10^9$). For an average degree of 10, $\bar{w}^*$ lies beyond 3 for networks larger than 1 million. Note that many large-scale networks are reported to have rather small average degrees; for example, the mobile communication network reported in [26] contains 4.6 million nodes with an average degree of 3.04. The Internet [27], math co-authorship network [28], and power grid [29] reported in [1] are found to have average degrees of 3.5–4.1, 3.9 and 2.7, respectively. Networks with low integer weights are also reported in the literature; for example, the neural network of the Caenorhabditis elegans worm [29], the communication network of the online community [30], and the political support network of the US Senate [31] have average edge weights of 3.74, 2.95 and 3.74, respectively.
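The threshold values quoted above can be checked with a short calculation. The sketch assumes that the logarithm is taken to base 2, an assumption that reproduces the quoted values of about 5, 7 and 3; the function name is chosen here for illustration.

```python
import math


def critical_weight(n, avg_degree):
    """Critical average edge weight log2(n) / D + 1 (base 2 is assumed)."""
    return math.log2(n) / avg_degree + 1


for n, d in [(10**6, 5), (10**9, 5), (10**6, 10)]:
    print(n, d, round(critical_weight(n, d), 2))
```

Under this reading, a network whose average edge weight lies below the returned value falls in the regime where the VN algorithm is expected to be faster.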

Figure 2. Critical threshold for average weights ($\bar{w}^*$) on networks with specified network size ($n$) and average degree ($D$).

Results and Discussion

Numerical Experiments

To evaluate the algorithms, we generate scale-free networks [32] with different network sizes and edge weights, and test the execution times of the VN algorithm and Brandes' algorithm on these networks. The algorithms are coded in C and run on a PC with an Intel Core 2 Quad CPU (2.66 GHz, 6 MB) and 6 GB of RAM. All running times reported below are averages over 100 simulations.

It is intuitive that when few edges in the network are weighted, the VN algorithm will calculate the betweenness centrality approximately as fast as the BFS, which is much faster than Brandes' algorithm. For example, when the network size is 100,000, the average degree is set to 2, and 1000 edges are weighted with random numbers generated from 1 to 10, the execution time for Brandes' algorithm is 8460 seconds, while the VN algorithm needs only 3830 seconds, around 1.3 hours less. Since $\bar{w} \to 1$ when $n$ becomes large, more time can be expected to be saved in larger networks with a fixed number of weighted edges. We ran the VN algorithm and Brandes' algorithm on networks with 1% of edges weighted as 2, and the execution times are presented in Figure 3(a). We can see that the difference in execution time becomes larger as the network size increases. When the network size is 50,000, the VN algorithm is 3 and 1.5 times faster than Brandes' algorithm for average network degrees of 2 and 10, respectively.

Figure 3. Running time of the VN algorithm and Brandes' algorithm.

(a) Networks with average degree $D = 2$ and $D = 10$, 1% of the network edges are weighted with $w = 2$; (b) Networks with average degree Inline graphic, all edges are weighted with Inline graphic.

The above results reveal that the VN algorithm is much faster on large sparse networks with a limited number of weighted edges. However, we should note that the VN algorithm is quite sensitive to the average degree and weight sum of the network: for any network with Inline graphic, the VN algorithm will not outperform Brandes' algorithm as long as Inline graphic. To illustrate the sensitivity of the VN algorithm, we run the algorithms on networks with Inline graphic and Inline graphic, and the difference in running times between the two algorithms decreases quickly, as expected (Figure 3(b)).

Discussion

By replacing the weighted edges with connected virtual nodes, we propose the VN algorithm to calculate the betweenness centrality in weighted networks with the BFS rather than shortest path algorithms. The VN algorithm uses $O(\bar{w} D n^2)$ time and $O(\bar{w} D n)$ space. Theoretically, the VN algorithm outperforms Brandes' algorithm when $\bar{w} < \log n / D + 1$, indicating that when the average edge weight is low, considerable time can be saved on large sparse networks. The simulation study confirms that when $\bar{w} < \log n / D + 1$, more time can be saved as the network grows large.

We should note that the VN algorithm is quite sensitive to the density and weight of the networks: it can hardly outperform Brandes' algorithm when the network is dense and weighted with large values. Moreover, the theoretical threshold value $\bar{w}^*$ could be even lower in practice, since the VN algorithm requires more space. Despite these limitations, given the evidence that large-scale networks in real life are mostly sparse, and that BFS is much easier to implement than the Fibonacci heap based shortest path algorithms, the VN algorithm is expected to save analysis time in many scenarios. In addition, the VN algorithm can easily be generalized to calculate other shortest path based network properties, such as closeness centrality [33], graph centrality [34], stress centrality [35], and so on. We therefore recommend that network researchers use the VN algorithm when the studied network is large, sparse, and lightly weighted, but continue to use Brandes' algorithm otherwise.
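As one possible illustration of such a generalization, the sketch below computes closeness centrality with the same virtual node expansion. It is not taken from the paper: it assumes an undirected, connected network given as hypothetical `(u, v, w)` tuples with positive integer weights, and it takes closeness as the reciprocal of the sum of distances to all other real nodes.

```python
from collections import deque


def vn_closeness(n, edges):
    """Closeness of the n real nodes via BFS on the expanded network."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        prev = u
        for _ in range(w - 1):          # insert w - 1 virtual nodes
            adj.append([])
            new = len(adj) - 1
            adj[prev].append(new)
            adj[new].append(prev)
            prev = new
        adj[prev].append(v)
        adj[v].append(prev)

    closeness = []
    for s in range(n):
        dist = [-1] * len(adj)
        dist[s] = 0
        queue = deque([s])
        total = 0                       # sum of distances to real nodes
        while queue:
            v = queue.popleft()
            if v < n:
                total += dist[v]        # virtual nodes are not counted
            for x in adj[v]:
                if dist[x] < 0:
                    dist[x] = dist[v] + 1
                    queue.append(x)
        closeness.append(1.0 / total if total > 0 else 0.0)
    return closeness


# Path 0 -(1)- 1 -(2)- 2: distance sums are 4, 3 and 5.
print(vn_closeness(3, [(0, 1, 1), (1, 2, 2)]))
```

The BFS distances on the expanded network equal the weighted distances on the original one, so no predecessor bookkeeping is needed for this index.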

Supporting Information

Both the Brandes' algorithm and the VN algorithm are written in C and are available upon request from the author.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This work is funded by the National Science Foundation (no. 70971131). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Albert R, Barabasi AL. Statistical mechanics of complex networks. Reviews of Modern Physics. 2002;74:47–97.
2. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU. Complex networks: Structure and dynamics. Physics Reports. 2006;424:175–308.
3. Newman MEJ. The structure and function of complex networks. SIAM Review. 2003;45:167–256.
4. Albert R, Jeong H, Barabasi AL. Diameter of the world-wide web. Nature. 1999;401:130–131.
5. Liljeros F, Edling CR, Nunes Amaral LA, Stanley HE, Aberg Y. Social networks: The web of human sexual contacts. Nature. 2001;411:907–908. doi: 10.1038/35082140.
6. Barabasi AL. The origin of bursts and heavy tails in human dynamics. Nature. 2005;435:207–211. doi: 10.1038/nature03459.
7. Albert R, Jeong H, Barabasi AL. Error and attack tolerance of complex networks. Nature. 2000;406:378–382. doi: 10.1038/35019019.
8. Holme P, Kim BJ, Yoon CN, Han SK. Attack vulnerability of complex networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics. 2002;65:056109/1–056109/14. doi: 10.1103/PhysRevE.65.056109.
9. Pastor-Satorras R, Vespignani A. Epidemic spreading in scale-free networks. Physical Review Letters. 2001;86:3200–3203. doi: 10.1103/PhysRevLett.86.3200.
10. Bader DA, Madduri K. Parallel algorithms for evaluating centrality indices in real-world networks. In: Proceedings of the International Conference on Parallel Processing. 2006. pp. 539–547.
11. Madduri K, Bader DA. Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis. In: IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium. 2009.
12. Freeman LC. A set of measures of centrality based on betweenness. Sociometry. 1977;40:35–41.
13. Scott J. Social Network Analysis: A Handbook. SAGE Publications; 2000.
14. Brandes U. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology. 2001;25:163–177.
15. Bader DA, Kintali S, Madduri K, Mihail M. Approximating betweenness centrality. 2007.
16. Jiang K, Ediger D, Bader DA. Generalizing k-betweenness centrality using short paths and a parallel multithreaded implementation. In: Proceedings of the International Conference on Parallel Processing. 2009. pp. 542–549.
17. Madduri K, Ediger D, Jiang K, Bader DA, Chavarria-Miranda D. A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets. In: IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium. 2009.
18. Puzis R, Elovici Y, Dolev S. Fast algorithm for successive computation of group betweenness centrality. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics. 2007;76. doi: 10.1103/PhysRevE.76.056709.
19. Tan G, Sreedhar VC, Gao GR. Analysis and performance results of computing betweenness centrality on IBM Cyclops64. Journal of Supercomputing. 2009:1–24.
20. Tan G, Tu D, Sun N. A parallel algorithm for computing betweenness centrality. In: Proceedings of the International Conference on Parallel Processing. 2009. pp. 340–347.
21. Tu D, Tan G. Characterizing betweenness centrality algorithm on multi-core architectures. In: Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009. 2009. pp. 182–189.
22. Schwartz N, Cohen R, Ben-Avraham D, Barabasi AL, Havlin S. Percolation in directed scale-free networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics. 2002;66:015104/1–015104/4. doi: 10.1103/PhysRevE.66.015104.
23. Newman MEJ. A measure of betweenness centrality based on random walks. Social Networks. 2005;27:39–54.
24. Fredman ML, Tarjan RE. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM. 1987;34:596–615.
25. Dijkstra EW. A note on two problems in connexion with graphs. Numerische Mathematik. 1959;1:269–271.
26. Onnela JP, Saramaki J, Hyvonen J, Szabo G, Lazer D, et al. Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences of the United States of America. 2007;104:7332–7336. doi: 10.1073/pnas.0610245104.
27. Yook SH, Jeong H, Barabasi AL. Modeling the internet's large-scale topology. Proceedings of the National Academy of Sciences of the United States of America. 2002;99:13382–13386. doi: 10.1073/pnas.172501399.
28. Barabasi AL, Jeong H, Neda Z, Ravasz E, Schubert A, et al. Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications. 2002;311:590–614.
29. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393:440–442. doi: 10.1038/30918.
30. Panzarasa P, Opsahl T, Carley KM. Patterns and dynamics of users' behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology. 2009;60:911–932.
31. Skvoretz J. Complexity theory and models for social networks. Complexity. 2003:47–55.
32. Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. doi: 10.1126/science.286.5439.509.
33. Sabidussi G. The centrality index of a graph. Psychometrika. 1966;31:581–603. doi: 10.1007/BF02289527.
34. Hage P, Harary F. Eccentricity and centrality in networks. Social Networks. 1995;17:57–63.
35. Shimbel A. Structural parameters of communication networks. The Bulletin of Mathematical Biophysics. 1953;15:501–507.

Articles from PLoS ONE are provided here courtesy of PLOS